
Answer-first summary for fast verification
Answer: storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))
The question asks which code block fails to perform an inner join on `storeId` and `employeeId`. Let's analyze each option:

- **Option A**: Uses `Seq(col("storeId"), col("employeeId"))`, which is a sequence of `Column` objects. When `join` is given a sequence of join columns, it expects a `Seq[String]` of column names, not `Column` objects. No `join` overload accepts a `Seq[Column]`, so this fails with a type mismatch at compile time.
- **Option B**: Correctly uses `Seq("storeId", "employeeId")` to join on columns that have the same name in both DataFrames. The join type defaults to inner.
- **Option C**: Defines the join condition with `===` and `and`, which yields a single valid `Column` expression. The join type defaults to inner.
- **Option D**: Uses the correct `Seq[String]` of column names and explicitly specifies the `"inner"` join type. Works as intended.
- **Option E**: Aliases both DataFrames and references their columns through the aliases, as in `col("s.storeId")` and `col("e.storeId")`. Spark resolves a qualified name such as `s.storeId` against the alias `s`, and `col("s.storeId")` is equivalent to `$"s.storeId"`, so the condition is a valid `Column` expression. The join type defaults to inner.

**Option A** is the only block that fails, because it passes `Column` objects where column names are expected. Options B, C, D, and E are all valid inner joins. B and D keep a single copy of each join column in the result, while C and E keep the join columns from both DataFrames.
Author: LeetQuiz Editorial Team
Which of the following code blocks does not correctly return a new DataFrame resulting from an inner join between storesDF and employeesDF on columns storeId and employeeId?
A
storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))
B
storesDF.join(employeesDF, Seq("storeId", "employeeId"))
C
storesDF.join(employeesDF, storesDF("storeId") === employeesDF("storeId") and storesDF("employeeId") === employeesDF("employeeId"))
D
storesDF.join(employeesDF, Seq("storeId", "employeeId"), "inner")
E
storesDF.alias("s").join(employeesDF.alias("e"), col("s.storeId") === col("e.storeId") and col("s.employeeId") === col("e.employeeId"))
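One way to check the options is to run them against a small local Spark session. The sketch below is illustrative only: the object name `JoinOptionsCheck`, the sample rows, and the extra `storeName` and `employeeName` columns are assumptions made for the example and are not part of the question. It assumes Spark SQL is on the classpath. Option A is left commented out because it does not compile.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical harness; names and sample data are assumptions for illustration.
object JoinOptionsCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-options-check")
      .getOrCreate()
    import spark.implicits._

    // Two tiny DataFrames that share exactly one (storeId, employeeId) pair.
    val storesDF = Seq((1, 10, "Downtown"), (2, 20, "Uptown"))
      .toDF("storeId", "employeeId", "storeName")
    val employeesDF = Seq((1, 10, "Ana"), (3, 30, "Ben"))
      .toDF("storeId", "employeeId", "employeeName")

    // Option A: rejected by the compiler, since join has no overload taking Seq[Column].
    // storesDF.join(employeesDF, Seq(col("storeId"), col("employeeId")))

    // Option B: join on column names, inner by default.
    val b = storesDF.join(employeesDF, Seq("storeId", "employeeId"))

    // Option C: explicit Column condition built from each DataFrame.
    val c = storesDF.join(employeesDF,
      storesDF("storeId") === employeesDF("storeId") and
        storesDF("employeeId") === employeesDF("employeeId"))

    // Option D: join on column names with an explicit join type.
    val d = storesDF.join(employeesDF, Seq("storeId", "employeeId"), "inner")

    // Option E: Column condition that refers to columns through the aliases.
    val e = storesDF.alias("s").join(employeesDF.alias("e"),
      col("s.storeId") === col("e.storeId") and
        col("s.employeeId") === col("e.employeeId"))

    // Every valid option keeps only the single matching row.
    Seq(b, c, d, e).foreach(df => assert(df.count() == 1))

    // Name-based joins keep one copy of each join column; condition-based joins keep both.
    assert(b.columns.length == 4 && d.columns.length == 4)
    assert(c.columns.length == 6 && e.columns.length == 6)

    spark.stop()
  }
}
```

Uncommenting the Option A line should make compilation fail, which is the behavior the question is testing for.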