
Consider a scenario where you need to transform large volumes of data using Apache Spark in Azure Databricks. The data includes customer purchase histories and needs to be aggregated by region and month. Which of the following approaches would be most efficient for this task, considering the need for parallel processing and scalability?
A. Use a for-loop to iterate through the data and aggregate it sequentially.
B. Leverage Spark's DataFrame API to perform groupBy operations on region and month, followed by aggregation functions like sum and count.
C. Export the data to a CSV file and use a local Python script to perform the aggregation.
D. Use a single SQL query to perform the aggregation directly on the source database.
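For reference, here is a minimal PySpark sketch of the approach described in option B. The column names (`region`, `purchase_date`, `amount`), the input path, and the helper DataFrame name are all hypothetical, chosen only to illustrate the groupBy-and-aggregate pattern:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession is usually provided as `spark`;
# building one here keeps the sketch self-contained.
spark = SparkSession.builder.appName("purchase-aggregation").getOrCreate()

# Hypothetical source: purchase records with columns
# `region`, `purchase_date` (timestamp), and `amount`.
purchases = spark.read.parquet("/mnt/data/purchases")

monthly_by_region = (
    purchases
    # Truncate each purchase timestamp to the first day of its month.
    .withColumn("month", F.date_trunc("month", F.col("purchase_date")))
    # Group by region and month, then aggregate in parallel across executors.
    .groupBy("region", "month")
    .agg(
        F.sum("amount").alias("total_sales"),
        F.count("*").alias("purchase_count"),
    )
)

monthly_by_region.show()
```

Because `groupBy` triggers a distributed shuffle, Spark partitions the aggregation work across the cluster's executors, which is what gives option B its parallelism and scalability advantage over the sequential loop (A), the single-machine script (C), or pushing the full workload onto the source database (D).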