
Answer-first summary for fast verification
Answer: B. Evaluate the nature of the queries, the available storage capacity, and the data's partitioning scheme, suggesting file sizes ranging from 64MB to 1GB to balance query performance and storage efficiency.
The correct answer is B because it is the only option that weighs the factors affecting both ingestion and query performance together: the nature of the queries, the available storage, and the data's partitioning scheme. Its recommended range of 64MB to 1GB leaves room to choose smaller files for selective queries and larger files for full scans, which is how it balances performance and efficiency. Option A is too narrow, focusing only on data volume and storage costs without considering query performance. Option C is inflexible, applying a one-size-fits-all 256MB file size that may not suit all data or query types. Option D overlooks query performance and scalability, focusing solely on storage capacity and recommending no range at all.
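To make the trade-off in option B concrete, here is a minimal Python sketch of the arithmetic involved. It is an illustration only, not part of the question: the helper names (`clamp_target_file_size`, `files_per_partition`) and the 10 GB partition are hypothetical, and the only values taken from the answer are the 64MB and 1GB bounds. How a chosen target size is then applied to a Delta table (for example through a compaction job) is deliberately left out.

```python
import math

MB = 1024 * 1024
MIN_FILE_BYTES = 64 * MB     # lower bound from option B
MAX_FILE_BYTES = 1024 * MB   # upper bound from option B (1 GB)


def clamp_target_file_size(requested_bytes: int) -> int:
    """Keep a requested target file size inside the 64MB-1GB range."""
    return max(MIN_FILE_BYTES, min(MAX_FILE_BYTES, requested_bytes))


def files_per_partition(partition_bytes: int, target_file_bytes: int) -> int:
    """Estimate how many files one partition yields at a given target size."""
    target = clamp_target_file_size(target_file_bytes)
    return max(1, math.ceil(partition_bytes / target))


# Hypothetical 10 GB partition, sized two different ways.
partition_bytes = 10 * 1024 * MB

# Smaller files: more files to track, but selective queries can skip more data.
selective_query_files = files_per_partition(partition_bytes, 128 * MB)

# Larger files: fewer files to open, which tends to favor large scans.
full_scan_files = files_per_partition(partition_bytes, 1024 * MB)

print(selective_query_files, full_scan_files)
```

The same partition yields eight times as many files at 128MB as at 1GB, which is why the answer ties the choice to query patterns and partitioning rather than to a single fixed size.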
Author: LeetQuiz Editorial Team
In the context of optimizing a data pipeline that ingests large volumes of data into Delta Lake on Microsoft Azure, you are tasked with determining the most appropriate file sizes to enhance both ingestion performance and query efficiency. Considering the need to balance cost, compliance, and scalability, which of the following factors should you prioritize when deciding on the file sizes, and what would be the most reasonable range for these file sizes to ensure optimal performance? Choose the best option.
A. Focus solely on the volume of data ingested, recommending file sizes strictly between 64MB and 128MB to minimize storage costs.
B. Evaluate the nature of the queries, the available storage capacity, and the data's partitioning scheme, suggesting file sizes ranging from 64MB to 1GB to balance query performance and storage efficiency.
C. Base the decision exclusively on the data's partitioning scheme, enforcing a uniform file size of 256MB for all data to simplify management, regardless of query patterns or storage constraints.
D. Consider only the available storage capacity, ignoring the potential impact on query performance and scalability, with no specific range recommended for file sizes.