Databricks Certified Data Engineer - Professional


When designing a data model for time-series data in a lakehouse, which approach is most effective for optimizing query performance for time-based aggregations?




Explanation:

The correct answer is C: implement a star schema with a time dimension table that includes various time hierarchies. This approach suits time-based aggregations because:

  1. Star Schema Efficiency: A star schema organizes data into a central fact table joined to dimension tables, including a time dimension. Each time-stamped event in the fact table carries a key into the time dimension, so a time-based aggregation reduces to a join followed by a group-by on that dimension's attributes.

  2. Time Hierarchies: Storing hierarchy levels (such as year, month, and day) as columns in the time dimension lets queries aggregate at any of those granularities without re-deriving date parts from raw timestamps.

  3. Performance Benefits: Because the time attributes are precomputed, filters and groupings by period operate on simple column values, which can make time-based filtering and aggregation more efficient than computing those attributes at query time.

  4. Data Integrity: A single shared time dimension gives every fact table the same definition of each period, which keeps the relationships between time, events, and entities consistent across analysis and reporting.
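The points above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for a lakehouse SQL engine, so it says nothing about Delta Lake or Databricks-specific behavior. The table names (`dim_time`, `fact_readings`), the column names, the integer `date_key` format, and the sample readings are all hypothetical and chosen for illustration only.

```python
import sqlite3
from datetime import date, timedelta


def build_time_dimension(start: date, days: int):
    """Yield one row per calendar day with its year/quarter/month/day attributes."""
    for offset in range(days):
        d = start + timedelta(days=offset)
        # Hypothetical surrogate key in YYYYMMDD form, e.g. 20240131.
        date_key = d.year * 10000 + d.month * 100 + d.day
        quarter = (d.month - 1) // 3 + 1
        yield (date_key, d.isoformat(), d.year, quarter, d.month, d.day)


conn = sqlite3.connect(":memory:")

# Time dimension: one row per day, hierarchy levels stored as plain columns.
conn.execute(
    """CREATE TABLE dim_time (
           date_key  INTEGER PRIMARY KEY,
           full_date TEXT,
           year      INTEGER,
           quarter   INTEGER,
           month     INTEGER,
           day       INTEGER)"""
)

# Fact table: each event points at the time dimension through date_key.
conn.execute(
    """CREATE TABLE fact_readings (
           date_key  INTEGER REFERENCES dim_time(date_key),
           sensor_id TEXT,
           reading   REAL)"""
)

# 60 days starting 2024-01-01 covers all of January and February 2024.
conn.executemany(
    "INSERT INTO dim_time VALUES (?, ?, ?, ?, ?, ?)",
    build_time_dimension(date(2024, 1, 1), 60),
)

# Made-up sample readings, for illustration only.
readings = [
    (20240115, "s1", 10.0),
    (20240120, "s1", 14.0),
    (20240203, "s1", 20.0),
    (20240210, "s2", 30.0),
]
conn.executemany("INSERT INTO fact_readings VALUES (?, ?, ?)", readings)

# Monthly aggregation: join to the dimension, then group by its hierarchy columns.
monthly = conn.execute(
    """SELECT t.year, t.month, COUNT(*) AS n, AVG(f.reading) AS avg_reading
       FROM fact_readings AS f
       JOIN dim_time AS t ON t.date_key = f.date_key
       GROUP BY t.year, t.month
       ORDER BY t.year, t.month"""
).fetchall()

print(monthly)
```

Changing the granularity only means grouping by a different dimension column (for example `t.year, t.quarter`); the fact table and its timestamps stay untouched.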

In summary, a star schema with a detailed time dimension table is the best choice for managing time-series data in a lakehouse, offering both performance and flexibility.