Consider a scenario where you need to transform large volumes of data using Apache Spark in Azure Databricks. The data includes customer purchase histories and must be aggregated by region and month. Which of the following approaches would be the most efficient for this task, given the need for parallel processing and scalability?