
A large dataset stored in Delta Lake is frequently updated and queried by multiple teams through the Databricks SQL service. What is the BEST approach to optimize the dataset for performance and consistency while remaining cost-efficient and scalable? Choose the most appropriate option from the following:
A. Rely solely on Delta Lake's automatic optimization features, such as auto-compaction and auto-indexing, without any manual intervention.
B. Implement manual partitioning and indexing based on general best practices, ignoring the specific query patterns and performance requirements of the Databricks SQL service.
C. Create a dedicated Delta table optimized for the Databricks SQL service, with partitioning columns and file sizes chosen to match the most frequent query patterns, and consider Z-ordering on columns frequently used in WHERE clauses (see the sketch after the options).
D. Avoid any form of optimization to minimize overhead during data ingestion, assuming that Delta Lake's default settings will suffice for all use cases.
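
For reference, here is a minimal Databricks SQL sketch of the approach described in option C. The table and column names (events, event_date, user_id) are hypothetical, and the target file size is an illustrative value, not a recommendation:

```sql
-- Create a Delta table partitioned on a low-cardinality column that
-- matches the most frequent query filter (a date column is assumed here).
CREATE TABLE IF NOT EXISTS events (
  event_id   BIGINT,
  user_id    BIGINT,
  event_date DATE,
  payload    STRING
)
USING DELTA
PARTITIONED BY (event_date);

-- Compact small files and co-locate rows on a high-cardinality column
-- that appears often in WHERE clauses, so file-level statistics can
-- skip more data at query time.
OPTIMIZE events
ZORDER BY (user_id);

-- Optionally pin the target file size produced by compaction
-- (128 MB shown here purely for illustration).
ALTER TABLE events
SET TBLPROPERTIES ('delta.targetFileSize' = '134217728');
```

Partitioning on a date column keeps partitions coarse and cheap to maintain, while Z-ordering covers the high-cardinality filter column without the cost of over-partitioning.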