
Answer-first summary for fast verification
Answer: storesDF.filter( (col("sqft") <= 25000) & (col("customerSatisfaction") >= 30) )
The correct answer must combine the two conditions with the `&` operator, reference each column through the `col` function, and wrap each comparison in parentheses. Option A uses the Python `and` keyword, which does not work on Spark Column objects: Python tries to convert the first Column to a boolean, and PySpark raises an error. Option B uses `or`, which fails for the same reason and would not express the required AND condition in any case. Option C refers to the bare names `sqft` and `customerSatisfaction` instead of using `col`, relies on `and`, and has unbalanced parentheses. Option E also refers to bare names and has unbalanced parentheses, so it is not valid Python. Option D uses `&` and encloses each comparison in parentheses, which is required because `&` binds more tightly than `<=` and `>=` in Python. Option D is therefore the correct answer.
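The role of the parentheses can be checked without a Spark session. The following is a minimal sketch that uses only Python's standard `ast` module to show how Python itself groups each form of the expression; the helper name `top_level` is an illustrative assumption, and the strings are parsed only, never executed, so `col` does not need to be defined.

```python
import ast


def top_level(expr: str) -> str:
    """Return the name of the outermost AST node Python builds for expr."""
    return type(ast.parse(expr, mode="eval").body).__name__


with_parens = '(col("sqft") <= 25000) & (col("customerSatisfaction") >= 30)'
without_parens = 'col("sqft") <= 25000 & col("customerSatisfaction") >= 30'
with_and = 'col("sqft") <= 25000 and col("customerSatisfaction") >= 30'

# With parentheses, the outermost operation is `&` applied to two comparisons.
print(top_level(with_parens))     # BinOp

# Without them, `&` binds first (25000 & col(...)), and the whole expression
# becomes a chained comparison rather than an AND of two conditions.
print(top_level(without_parens))  # Compare

# `and` is a boolean operator that Python evaluates by converting its
# operands to bool, which Spark Column objects do not allow.
print(top_level(with_and))        # BoolOp
```

Only the parenthesized form gives `filter` a single `&` expression whose two sides are the intended comparisons.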
Author: LeetQuiz Editorial Team
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25000 AND the value in column customerSatisfaction is greater than or equal to 30?
A
storesDF.filter(col("sqft") <= 25000 and col("customerSatisfaction") >= 30)
B
storesDF.filter(col("sqft") <= 25000 or col("customerSatisfaction") >= 30)
C
storesDF.filter(sqft) <= 25000 and customerSatisfaction >= 30)
D
storesDF.filter( (col("sqft") <= 25000) & (col("customerSatisfaction") >= 30) )
E
storesDF.filter(sqft <= 25000) & customerSatisfaction >= 30)