
Answer-first summary for fast verification
Answer: on = [col("column1"), col("column2")]
The question asks which on argument cannot be used with DataFrame.join() to join two DataFrames a and b on column1 and column2. When both DataFrames contain columns with the same names, Spark must be able to tell which DataFrame each column reference belongs to, and Column-based on arguments are expected to be join conditions rather than bare columns.

Option Analysis
A: on=[a.column1 == b.column1, a.column2 == b.column2] ✅ Works. Each comparison names a column from a specific DataFrame, so there is no ambiguity.
B: on=[col("column1"), col("column2")] ❌ Fails. These are bare column references, not equality conditions, and because both DataFrames have columns with these names, Spark cannot tell whether "column1" refers to a or b.
C: on=[col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")] ✅ Works. The alias-qualified names (a.column1, b.column1) remove the ambiguity, provided the DataFrames were aliased as "a" and "b".
D: on=["column1", "column2"] ✅ Works. This is the concise syntax for an equi-join on columns that have identical names in both DataFrames; Spark matches them by name, and each key column appears only once in the result.
Author: LeetQuiz Editorial Team
Which of the following argument pairs cannot be used in DataFrame.join() to perform an inner join between two DataFrames (aliased as "a" and "b") on two key columns named column1 and column2?
A
on = [a.column1 == b.column1, a.column2 == b.column2]
B
on = [col("column1"), col("column2")]
C
on = [col("a.column1") == col("b.column1"), col("a.column2") == col("b.column2")]
D
on = ["column1", "column2"]