
Answer-first summary for fast verification
Answer: Azure Databricks with the 'partitionBy' function in Spark SQL
Azure Databricks is an analytics platform built on Apache Spark, and Spark's core model is to split a dataset into partitions that are processed in parallel across the nodes of a cluster. The 'partitionBy' method, called on a DataFrameWriter (for example, df.write.partitionBy(...)), splits the output by the values of one or more columns, so each distinct combination of values is written to its own directory and later queries that filter on those columns can skip the rest. To change how a DataFrame is partitioned in memory, Spark provides the separate 'repartition' method. The other options address different needs: the Azure Data Factory 'ForEach' activity iterates over a collection of items to orchestrate activities, hierarchical namespaces in Azure Data Lake Storage Gen2 organize files into directories, and windowing in Azure Stream Analytics groups streaming events by time. None of them partitions a dataset for distributed processing the way Spark does.
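As a rough illustration of what partitioning by column means, the sketch below models in plain Python the Hive-style directory layout (column=value) that Spark produces when a DataFrame is written with partitionBy. The helper names (partition_dir, split_by_columns, write_partitioned) and the sample rows are hypothetical and exist only for this example. Only write_partitioned touches Spark, and it assumes it is handed an existing Spark DataFrame.

```python
from collections import defaultdict


def partition_dir(row, cols):
    """Build a Hive-style directory name for a row, e.g. 'country=US/year=2024'."""
    return "/".join(f"{c}={row[c]}" for c in cols)


def split_by_columns(rows, cols):
    """Simplified pure-Python model: group rows into chunks by partition directory."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[partition_dir(row, cols)].append(row)
    return dict(chunks)


def write_partitioned(df, path, cols):
    """Write a Spark DataFrame with one directory per distinct combination of `cols`.

    Assumes `df` is a Spark DataFrame; not called in this sketch.
    """
    df.write.partitionBy(*cols).mode("overwrite").parquet(path)


# Hypothetical sample data, for illustration only.
rows = [
    {"country": "US", "year": 2023, "amount": 10},
    {"country": "US", "year": 2024, "amount": 20},
    {"country": "DE", "year": 2024, "amount": 30},
]

chunks = split_by_columns(rows, ["country", "year"])
for directory in sorted(chunks):
    print(directory, len(chunks[directory]))
```

On a Databricks cluster, a call such as write_partitioned(df, path, ["country", "year"]) would produce the same kind of layout on storage, which is what lets a later query that filters on country or year read only the matching directories.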
Author: LeetQuiz Editorial Team
In a data pipeline that involves processing large volumes of data, you need to split the data into smaller chunks for more efficient processing. Which Azure service and feature would you use to achieve this?
A. Azure Data Factory with the 'ForEach' activity
B. Azure Data Lake Storage Gen2 with hierarchical namespaces
C. Azure Databricks with the 'partitionBy' function in Spark SQL
D. Azure Stream Analytics with the 'Windowing' feature