
Databricks Certified Data Engineer - Professional
The business intelligence team has a dashboard tracking retail store metrics with the following schema:
store_id INT,
total_sales_gtd FLOAT,
avg_daily_sales_gtd FLOAT,
total_sales_ytd FLOAT,
avg_daily_sales_ytd FLOAT,
previous_day_sales FLOAT,
total_sales_7d FLOAT,
avg_daily_sales_7d FLOAT,
updated TIMESTAMP
The Lakehouse contains a validated products_per_order table with near real-time incremental updates, having these fields:
store_id INT,
order_id INT,
product_id INT,
quantity INT,
price FLOAT,
order_timestamp TIMESTAMP
Since long-term sales trend reporting is less volatile, analysts only need daily refreshes for this dashboard. Given that the dashboard will be queried interactively by multiple users throughout the day, it must return results quickly while minimizing compute costs per materialization.
What solution would meet these end-user requirements while effectively controlling costs?
Explanation:
The correct answer is A, because it matches the requirement that the dashboard refresh only once per day. A nightly batch job that overwrites the table computes the aggregates once per materialization, so interactive queries throughout the day read precomputed values and return quickly without repeating the aggregation. Option B (streaming) would keep compute running to deliver updates far more often than the daily refresh analysts need, and option C (view) would recompute the aggregations each time the dashboard is queried; both cost more, and the view also responds more slowly. Option D (Delta Cache) caches raw data but does not precompute aggregations, so each query would still pay for the expensive computation.
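
To make the "precompute once, read many times" idea concrete, here is a minimal plain-Python sketch of the aggregation a nightly job might materialize. It is an illustration under stated assumptions, not the exam's reference solution: the names `OrderLine` and `build_store_metrics` are hypothetical, sales are assumed to be `quantity * price`, daily averages are assumed to divide by calendar days in the window, and the `_gtd` columns are left out because the source does not define that window. On Databricks the same logic would more likely be a scheduled Spark job whose result overwrites the dashboard table.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class OrderLine:
    """One row of products_per_order (field names taken from the question)."""
    store_id: int
    order_id: int
    product_id: int
    quantity: int
    price: float
    order_timestamp: datetime


def _window_total(lines, start, end):
    """Sum quantity * price for lines whose order date is in [start, end]."""
    return sum(
        line.quantity * line.price
        for line in lines
        if start <= line.order_timestamp.date() <= end
    )


def build_store_metrics(order_lines, run_date):
    """Return one precomputed row per store, covering days up to run_date - 1.

    Assumption: the job runs after midnight, so the last complete day is
    the day before run_date.
    """
    yesterday = run_date - timedelta(days=1)
    year_start = date(yesterday.year, 1, 1)
    week_start = yesterday - timedelta(days=6)  # 7 days including yesterday
    days_ytd = (yesterday - year_start).days + 1

    # Group order lines by store so each store is aggregated independently.
    by_store = defaultdict(list)
    for line in order_lines:
        by_store[line.store_id].append(line)

    rows = []
    for store_id, lines in sorted(by_store.items()):
        total_ytd = _window_total(lines, year_start, yesterday)
        total_7d = _window_total(lines, week_start, yesterday)
        rows.append({
            "store_id": store_id,
            "total_sales_ytd": total_ytd,
            "avg_daily_sales_ytd": total_ytd / days_ytd,
            "previous_day_sales": _window_total(lines, yesterday, yesterday),
            "total_sales_7d": total_7d,
            "avg_daily_sales_7d": total_7d / 7,
            "updated": datetime.now(),
        })
    # A nightly job would replace the whole dashboard table with these rows.
    return rows
```

Because the rows are rebuilt in full each night, the dashboard only ever scans one small row per store, which is why this approach keeps interactive queries cheap.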