
Answer-first summary for fast verification
Answer: df.select('sales').summary() print(B)
'summary' is a DataFrame method, so the column must first be selected into a single-column DataFrame with 'select'; calling 'summary' on that result returns, by default, the count, mean, stddev, min, approximate 25%, 50%, and 75% percentiles, and max. Option B does this correctly. Options A and C are incorrect because df.sales and df['sales'] are both valid PySpark syntax, but each returns a Column object, and a Column has no 'summary' method. Option D is incorrect for the same reason: df['sales'] is a Column, which has no 'describe' method, and the question asks for 'summary' in any case. Note that 'describe' does exist on Spark DataFrames, where it reports count, mean, stddev, min, and max without the percentiles.
Author: LeetQuiz Editorial Team
Given a Spark DataFrame 'df' with a numerical column 'sales', write a code snippet that computes the summary statistics for the 'sales' column using the .summary() method and explain the output.
A
df.sales.summary() print(A)
B
df.select('sales').summary() print(B)
C
df['sales'].summary() print(C)
D
df['sales'].describe() print(D)