
Answer-first summary for fast verification
Answer: Use Azure Data Lake Storage Gen2 for raw data storage and Azure Databricks for processing. Partition data by date and use Azure Data Factory for job scheduling.
Azure Data Lake Storage Gen2 adds a hierarchical namespace on top of Azure Blob Storage, which makes it well suited to large-scale analytics workloads such as a 100 TB dataset: directories can be renamed, secured, and deleted as single operations rather than per-blob. Partitioning the data by date lets queries prune irrelevant partitions instead of scanning the full dataset, and it simplifies storage management (for example, archiving or deleting old partitions). Azure Data Factory provides managed scheduling, triggers, dependency handling, and monitoring for batch pipelines, making it the natural orchestration layer on top of Databricks processing.
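The date-partitioning layout can be sketched with a small helper that builds a Hive-style partition path under an ADLS Gen2 container. This is a minimal illustration; the `root` prefix and the `year=/month=/day=` scheme are assumptions chosen for the example, not a fixed convention required by the service:

```python
from datetime import date

def partition_path(d: date, root: str = "raw/events") -> str:
    """Build a Hive-style date partition path under an ADLS Gen2 container.

    Partitioning by year/month/day lets downstream queries prune
    whole directories instead of scanning the full 100 TB dataset.
    """
    return f"{root}/year={d.year}/month={d.month:02d}/day={d.day:02d}"
```

For example, `partition_path(date(2024, 5, 1))` returns `"raw/events/year=2024/month=05/day=01"`; a Spark writer in Databricks would produce the same layout with `partitionBy("year", "month", "day")`.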
Author: LeetQuiz Editorial Team
You are tasked with developing a batch processing solution using Azure Data Lake Storage Gen2 and Azure Databricks. The solution must handle a large dataset of over 100 TB. Describe the steps you would take to ensure efficient data processing and storage management. Consider aspects such as data partitioning, storage optimization, and job scheduling.
A
Use Azure Data Lake Storage Gen2 for raw data storage and Azure Databricks for processing. Partition data by date and use Azure Data Factory for job scheduling.
B
Use Azure Blob Storage for raw data storage and Azure Databricks for processing. Do not partition data and use cron jobs for scheduling.
C
Use Azure Data Lake Storage Gen2 for raw data storage and Azure Databricks for processing. Partition data by date and use Azure Databricks for job scheduling.
D
Use Azure Blob Storage for raw data storage and Azure Databricks for processing. Partition data by date and use Azure Data Factory for job scheduling.
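The practical difference between option A (partitioned by date) and option B (unpartitioned) can be illustrated with a small sketch. Plain Python stands in here for the partition pruning a query engine such as Spark would perform; the function name and layout are illustrative assumptions, not a real API:

```python
from datetime import date, timedelta

def partitions_to_scan(start: date, end: date) -> list[str]:
    """List the date partitions a query over [start, end] must read.

    With date partitioning (option A), a 7-day query touches 7
    directories; without partitioning (option B), the engine has no
    choice but to scan every file in the 100 TB dataset.
    """
    days = (end - start).days + 1
    return [f"date={(start + timedelta(i)).isoformat()}" for i in range(days)]
```

A week-long query, `partitions_to_scan(date(2024, 1, 1), date(2024, 1, 7))`, resolves to just 7 partition directories regardless of total dataset size, which is the core reason the correct answer partitions by date.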