
Explanation:
The correct answer must combine the two conditions with the & operator, wrap each comparison in parentheses, and reference the columns through the col function. Option A uses the Python and keyword, which tries to convert a Column to a Python bool and raises an error instead of building a combined Column expression. Option B uses or, which fails for the same reason and would in any case express an OR condition, not the AND the question requires. Options C and E reference sqft and customerSatisfaction as bare Python names instead of using col, and both have unbalanced parentheses, so neither is valid Python. Option D is correct: it uses & for the AND condition and parenthesizes each comparison, which is required because & binds more tightly than <= and >= in Python.
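To see why the parentheses in Option D matter, Python's own parser can show how the condition is grouped with and without them. The following is a minimal sketch using only the standard ast module; it parses the expressions as text and never evaluates them, so no Spark session is needed. The helper name top_level and the variable names are illustrative assumptions, not part of the question.

```python
import ast

# Option D's condition with the parentheses removed (hypothetical variant).
unparenthesized = 'col("sqft") <= 25000 & col("customerSatisfaction") >= 30'
# Option D's condition as written in the question.
parenthesized = '(col("sqft") <= 25000) & (col("customerSatisfaction") >= 30)'


def top_level(expr: str) -> ast.expr:
    """Parse an expression string and return the root node of its syntax tree."""
    return ast.parse(expr, mode="eval").body


loose = top_level(unparenthesized)
tight = top_level(parenthesized)

# Without parentheses, the root is one chained comparison:
#   col("sqft") <= (25000 & col("customerSatisfaction")) >= 30
print(type(loose).__name__)                 # Compare
print(type(loose.comparators[0]).__name__)  # BinOp (the 25000 & col(...) part)

# With parentheses, the root is the & operation joining two comparisons.
print(type(tight).__name__)                 # BinOp
print(type(tight.op).__name__)              # BitAnd
```

Python evaluates a chained comparison such as a <= b >= c as (a <= b) and (b >= c), so the unparenthesized form ends up needing a Column's truth value, the same conversion that makes Option A fail.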
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25000 AND the value in column customerSatisfaction is greater than or equal to 30?
A
storesDF.filter(col("sqft") <= 25000 and col("customerSatisfaction") >= 30)
B
storesDF.filter(col("sqft") <= 25000 or col("customerSatisfaction") >= 30)
C
storesDF.filter(sqft) <= 25000 and customerSatisfaction >= 30)
D
storesDF.filter( (col("sqft") <= 25000) & (col("customerSatisfaction") >= 30) )
E
storesDF.filter(sqft <= 25000) & customerSatisfaction >= 30)