
Answer-first summary for fast verification
Answer: No
The question asks whether `df.summary()` calculates the min, max, mean, and standard deviation for both string and numeric columns. According to the PySpark documentation, `df.summary()` by default reports count, mean, stddev, min, the approximate 25%, 50%, and 75% percentiles, and max. For numeric columns, all of these are populated. For string columns, count is populated and min and max are reported in lexicographical order, but mean and stddev are not meaningful for text and typically come back as null for non-numeric values. The community discussion is split, but the most upvoted comments, which cite the official Spark documentation, conclude that `df.summary()` does not fully meet the goal for string columns. Therefore, the correct answer is 'No' (B).
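The reasoning above can be illustrated with a minimal plain-Python sketch. This is an analogy only: it does not call Spark, the sample values and the `basic_stats` helper are hypothetical, and it only shows why min and max are defined for text (by lexicographical order) while mean and standard deviation are not.

```python
import statistics

# Hypothetical sample values standing in for one string column and one numeric column.
names = ["delta", "alpha", "charlie", "bravo"]
amounts = [10.0, 20.0, 30.0, 40.0]


def basic_stats(values):
    """Return min, max, mean and sample stddev; mean/stddev are None for non-numeric data."""
    result = {"min": min(values), "max": max(values)}
    if all(isinstance(v, (int, float)) for v in values):
        result["mean"] = statistics.mean(values)
        result["stddev"] = statistics.stdev(values)  # sample standard deviation
    else:
        # Arithmetic on text is undefined, so no mean or stddev can be produced.
        result["mean"] = None
        result["stddev"] = None
    return result


string_stats = basic_stats(names)    # min/max follow lexicographical order
numeric_stats = basic_stats(amounts)  # all four statistics are defined
print(string_stats)
print(numeric_stats)
```

In an actual Fabric notebook, the equivalent check would be to display the result of `df.summary()` and inspect the mean and stddev rows for the string columns.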
Author: LeetQuiz Editorial Team
You have a Fabric tenant containing a new semantic model in OneLake. You use a Fabric notebook to load the data into a Spark DataFrame. You need to evaluate the data by calculating the minimum, maximum, mean, and standard deviation values for all string and numeric columns.
You implement the following PySpark code:
df.summary()
Does this solution meet the goal?
A. Yes
B. No