
Answer-first summary for fast verification
Answer: Implement a nightly batch job to pre-calculate the required metrics and store them in a summary table, overwriting the data with each update.
The most cost-effective and performant approach is to implement a **nightly batch job** that overwrites a pre-aggregated summary (Gold layer) table.

**Why this is the best choice:**

* **Performance:** By pre-calculating metrics like QTD, YTD, and 7-day averages, the dashboard queries a small, highly aggregated table rather than scanning the raw `sales_details` table. This keeps query latency low even under high user concurrency.
* **Cost:** Since the business only requires daily updates, a single batch run per day is significantly cheaper than maintaining a continuous Structured Streaming cluster or triggering compute for every dashboard refresh.
* **Best practices:** Databricks recommends serving BI tools from highly aggregated Gold tables to minimize compute cost and maximize concurrency support.

**Analysis of the other options:**

* **Structured Streaming (A):** Incurs unnecessary cost for near-real-time freshness that the business constraints explicitly state is not needed.
* **Materialized View (C):** A valid technical option, but a simple batch ETL process is the more standard, universally available, and cost-predictable method for daily refreshes within the scope of the Professional certification.
* **Webhook refresh (D):** With high user concurrency, every dashboard refresh would trigger redundant compute, driving up cost and creating performance bottlenecks.
* **Delta Cache (E):** Caching speeds up reads, but the SQL Warehouse must still perform the complex aggregations over raw data for every query, which is less efficient than reading pre-computed results.
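In Databricks this nightly job would typically be a scheduled Spark job that aggregates `sales_details` and writes the result in overwrite mode. As a language-agnostic sketch of the pre-aggregation logic only, here is a minimal pure-Python version; the row values, the `build_summary` helper, and the `run_date` parameter are all hypothetical, not part of the original scenario:

```python
from datetime import date, timedelta

# Hypothetical raw rows mimicking sales_details (all values made up).
rows = [
    {"store_id": 1, "quantity": 2, "price": 10.0, "order_date": date(2024, 3, 14)},
    {"store_id": 1, "quantity": 1, "price": 5.0,  "order_date": date(2024, 3, 15)},
    {"store_id": 1, "quantity": 3, "price": 4.0,  "order_date": date(2024, 3, 15)},
]

def build_summary(rows, run_date):
    """Pre-compute the dashboard metrics once, so the BI tool never
    has to scan or aggregate the raw table itself."""
    revenue_by_day = {}
    for r in rows:
        day = r["order_date"]
        revenue_by_day[day] = revenue_by_day.get(day, 0.0) + r["quantity"] * r["price"]

    yesterday = run_date - timedelta(days=1)
    # Rolling 7-day window ending yesterday; missing days count as 0 revenue.
    last_7 = [revenue_by_day.get(yesterday - timedelta(days=i), 0.0) for i in range(7)]

    return {
        "previous_day_sales": revenue_by_day.get(yesterday, 0.0),
        "rolling_7_day_avg": sum(last_7) / 7,
    }

# The nightly job would overwrite the summary table with this single row.
summary = build_summary(rows, run_date=date(2024, 3, 16))
```

Because the job overwrites the summary on each run, the dashboard always reads one small, current table, which is exactly what makes this option cheap under high concurrency.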
Author: LeetQuiz Editorial Team
A Business Intelligence (BI) team is developing a dashboard to monitor key retail sales metrics. The dashboard must display historical trends, including quarterly and yearly totals (and daily averages), sales for the previous day, and a rolling 7-day average.
The source data is a Lakehouse table, sales_details, which receives near real-time updates. It contains columns such as store_id, order_id, product_id, quantity, price, and order_timestamp.
The following constraints apply:

- Business stakeholders only need the metrics refreshed once per day.
- The dashboard will be accessed by a large number of concurrent users.
- Compute costs should be kept as low as possible.
Which architectural approach best balances performance, user experience, and cost-effectiveness for this scenario?
A
Use Structured Streaming to build a live dashboard that queries the sales_details table directly to ensure data is always current.
B
Implement a nightly batch job to pre-calculate the required metrics and store them in a summary table, overwriting the data with each update.
C
Create a Materialized View on the sales_details table and use this view as the primary source for the dashboard.
D
Configure a webhook to trigger an incremental update on the sales_details table whenever the dashboard is refreshed by a user.
E
Leverage Delta Cache to store the entire sales_details table in memory on the SQL Warehouse for faster query execution.