
Databricks Certified Data Engineer - Professional
When designing a data model for time-series data in a lakehouse, which approach is most effective for optimizing query performance for time-based aggregations?
Explanation:
The correct answer is C: implement a star schema with a time dimension table that includes various time hierarchies. This approach is the most effective because:
- Star Schema Efficiency: A star schema organizes data into a central fact table linked to dimension tables, one of which is a time dimension. Time-series measurements live in the fact table and reference the time dimension by key, so time-based queries and aggregations need only a single join.
- Time Hierarchies: Storing hierarchy levels (such as year, month, and day) as columns in the time dimension lets queries group at any of those granularities, which is essential for detailed time-based queries.
- Performance Benefits: Because the time attributes are precomputed in the dimension table, filtering and aggregating by time does not require deriving date parts from raw timestamps in the fact table, which improves query performance.
- Data Integrity: The model keeps the relationships between time, events, and entities consistent, which is crucial for reliable analysis and reporting.
In summary, a star schema with a detailed time dimension table is the best choice for managing time-series data in a lakehouse, offering both performance and flexibility.
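The structure described above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for a lakehouse SQL engine, purely so the example runs anywhere. The table names (`dim_time`, `fact_readings`), the column names, and the sample data are hypothetical and do not come from the exam question.

```python
import sqlite3
from datetime import date, timedelta


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory star schema: one fact table plus a time dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Time dimension: one row per day, hierarchy levels stored as columns.
        CREATE TABLE dim_time (
            date_key  INTEGER PRIMARY KEY,  -- e.g. 20240131
            full_date TEXT    NOT NULL,
            year      INTEGER NOT NULL,
            quarter   INTEGER NOT NULL,
            month     INTEGER NOT NULL,
            day       INTEGER NOT NULL
        );
        -- Fact table: measurements that reference the time dimension by key.
        CREATE TABLE fact_readings (
            date_key  INTEGER NOT NULL REFERENCES dim_time(date_key),
            sensor_id TEXT    NOT NULL,
            value     REAL    NOT NULL
        );
        """
    )

    # Populate the time dimension for Q1 2024 (31 + 29 + 31 = 91 days).
    start = date(2024, 1, 1)
    time_rows = []
    for offset in range(91):
        d = start + timedelta(days=offset)
        date_key = d.year * 10000 + d.month * 100 + d.day
        quarter = (d.month - 1) // 3 + 1
        time_rows.append((date_key, d.isoformat(), d.year, quarter, d.month, d.day))
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?, ?, ?)", time_rows)

    # Hypothetical sample measurements.
    fact_rows = [
        (20240115, "s1", 10.0),
        (20240120, "s1", 20.0),
        (20240210, "s1", 5.0),
        (20240305, "s2", 7.5),
    ]
    conn.executemany("INSERT INTO fact_readings VALUES (?, ?, ?)", fact_rows)
    return conn


def monthly_totals(conn: sqlite3.Connection) -> list:
    """Aggregate fact values by the year/month levels of the time dimension."""
    return conn.execute(
        """
        SELECT t.year, t.month, SUM(f.value)
        FROM fact_readings AS f
        JOIN dim_time AS t ON t.date_key = f.date_key
        GROUP BY t.year, t.month
        ORDER BY t.year, t.month
        """
    ).fetchall()


if __name__ == "__main__":
    for year, month, total in monthly_totals(build_demo_db()):
        print(year, month, total)
```

Switching the granularity only requires changing the `GROUP BY` columns (for example, to `t.year, t.quarter`); the fact table itself is untouched. This is the flexibility the time hierarchies are meant to provide.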