
Answer-first summary for fast verification
Answer: Create a dedicated Delta table optimized for the Databricks SQL service, with carefully chosen partitioning columns and file sizes that match the most frequent query patterns, and consider using Z-ordering for columns frequently used in WHERE clauses.
The correct answer is C because it addresses both performance and consistency by tailoring the optimization strategy to the Databricks SQL service's actual workload: partitioning columns and file sizes are chosen to match the most frequent query patterns, and Z-ordering co-locates data for columns commonly filtered in WHERE clauses. This approach also accounts for cost efficiency and scalability. Option A is insufficient because automatic optimizations alone may not meet the SQL service's specific needs. Option B applies generic best practices while ignoring the actual query patterns and performance requirements, so the optimizations may not help the queries that matter. Option D is incorrect because it forgoes Delta Lake's optimization capabilities entirely, which is likely to lead to poor performance and inconsistency as the dataset grows.
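To make option C concrete, the approach can be sketched in Databricks SQL. The schema, table, and column names below are illustrative assumptions, not part of the question; the pattern is: partition on a low-cardinality column that matches the most common filter, then Z-order on a high-cardinality column that also appears in WHERE clauses.

```sql
-- Hypothetical table for illustration: partitioned by a low-cardinality
-- date column that matches the most frequent query filter.
CREATE TABLE sales_analytics.events (
  event_id   STRING,
  user_id    STRING,
  event_date DATE,
  amount     DOUBLE
)
USING DELTA
PARTITIONED BY (event_date)
TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',  -- coalesce small files on write
  'delta.autoOptimize.autoCompact'   = 'true'   -- compact small files after write
);

-- Co-locate data for a high-cardinality column used in WHERE clauses.
-- Z-ordering complements partitioning; it does not replace it.
OPTIMIZE sales_analytics.events
ZORDER BY (user_id);
```

Note the division of labor: partitioning prunes whole directories for coarse filters (here, by date), while Z-ordering enables data skipping within files for finer-grained predicates (here, by user).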
Author: LeetQuiz Editorial Team
In a scenario where a large dataset is stored in Delta Lake and is both frequently updated and queried by multiple teams using Databricks SQL service, what is the BEST approach to optimize the dataset for performance and consistency while considering cost efficiency and scalability? Choose the most appropriate option from the following:
A
Rely solely on Delta Lake's automatic optimization features, such as auto-compaction and auto-indexing, without any manual intervention.
B
Implement manual partitioning and indexing based on general best practices, ignoring the specific query patterns and performance requirements of the Databricks SQL service.
C
Create a dedicated Delta table optimized for the Databricks SQL service, with carefully chosen partitioning columns and file sizes that match the most frequent query patterns, and consider using Z-ordering for columns frequently used in WHERE clauses.
D
Avoid any form of optimization to minimize overhead during data ingestion, assuming that Delta Lake's default settings will suffice for all use cases.