You are working with two DataFrames in a Spark SQL environment: df1 with the schema (id: int, name: string) and df2 with the schema (id: int, age: int). To analyze customer data, you must perform a left join between them. The result must include every customer name from df1, whether or not df2 has a matching age; where a match exists, the age must be included as well. Because the dataset is very large, the query should also minimize computational resources. Which of the following statements both accurately describes the result of a left join between df1 and df2 and suggests the most efficient execution plan? Choose the best option.
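As a reminder of what a left join returns, here is a minimal plain-Python sketch of the semantics. It does not use Spark. The sample rows are hypothetical, `left_join` is an illustrative helper rather than a Spark API, and it assumes the join key is `id` (the only column the two schemas share), which the question does not state explicitly.

```python
# Hypothetical sample rows; the real df1 and df2 are Spark DataFrames.
df1_rows = [(1, "Alice"), (2, "Bob"), (3, "Cara")]  # (id, name)
df2_rows = [(1, 34), (3, 27), (4, 51)]              # (id, age)


def left_join(left, right):
    """Return (id, name, age) for every left row; age is None when unmatched."""
    # Index the right side by id so each left row is matched by lookup.
    ages_by_id = {}
    for row_id, age in right:
        ages_by_id.setdefault(row_id, []).append(age)

    result = []
    for row_id, name in left:
        # An unmatched left row is kept, with None standing in for SQL NULL.
        for age in ages_by_id.get(row_id, [None]):
            result.append((row_id, name, age))
    return result


joined = left_join(df1_rows, df2_rows)
print(joined)
# [(1, 'Alice', 34), (2, 'Bob', None), (3, 'Cara', 27)]
```

Every row of the left input appears in the output, id 2 carries a null age, and id 4 (present only on the right) is dropped. The sketch illustrates result semantics only; how Spark physically executes the join is a separate matter that depends on the sizes of the inputs.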