Databricks Certified Data Engineer - Professional

Databricks Certified Data Engineer - Professional

Get started today

Ultimate access to all questions.


The business intelligence team has a dashboard tracking retail store metrics with the following schema:

store_id INT, 
total_sales_gtd FLOAT, 
avg_daily_sales_gtd FLOAT, 
total_sales_ytd FLOAT, 
avg_daily_sales_ytd FLOAT, 
previous_day_sales FLOAT, 
total_sales_7d FLOAT,
avg_daily_sales_7d FLOAT, 
updated TIMESTAMP

The Lakehouse contains a validated products_per_order table with near real-time incremental updates, having these fields:

store_id INT, 
order_id INT, 
product_id INT, 
quantity INT, 
price FLOAT,
order_timestamp TIMESTAMP

Since long-term sales trend reporting is less volatile, analysts only need daily refreshes for this dashboard. Given that the dashboard will be queried interactively by multiple users throughout the day, it must return results quickly while minimizing compute costs per materialization.

What solution would meet these end-user requirements while effectively controlling costs?




Explanation:

The correct solution is A because it aligns with the requirement of refreshing data once daily. Configuring a nightly batch job to overwrite the table ensures that the dashboard uses precomputed aggregated values, which allows queries to return results quickly without incurring high compute costs during frequent interactive queries. Options B (streaming) and C (view) would process data on-the-fly, leading to slower performance and higher compute costs. Option D (Delta Cache) caches raw data but does not precompute aggregations, so it would still require expensive computations for each query.