
Answer-first summary for fast verification
Answer: The approx_count_distinct() operation cannot determine an exact number of distinct values in a column.
The code block uses `approx_count_distinct()`, which is intended for approximate counts, not exact counts. The correct function for an exact count of distinct values is `countDistinct()`. Option E accurately states that `approx_count_distinct()` cannot provide an exact count. The other options are incorrect because: the `alias()` operation is valid (B is wrong), exact counts are achievable with `countDistinct()` (C is wrong), and `approx_count_distinct()` can be used as a standalone function (D is wrong). Option A is misleading because adjusting the `rsd` parameter still results in an approximation, not an exact count.
Author: LeetQuiz Editorial Team
Identify the error in the following code block intended to return the exact number of distinct values in the division column of DataFrame storesDF:
Code block:
storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct"))
A
The approx_count_distinct() operation needs a second argument to set the rsd parameter to ensure it returns the exact number of distinct values.
B
There is no alias() operation for the approx_count_distinct() operation's output.
C
There is no way to return an exact distinct count in Spark because the data is distributed across partitions.
D
The approx_count_distinct() operation is not a standalone function; it should be used as a method on a Column object.
E
The approx_count_distinct() operation cannot determine an exact number of distinct values in a column.