
Answer-first summary for fast verification
Answer: Set up a nightly batch job to recalculate metrics and overwrite them in a table with each update.
Data engineers should understand the difference between exposing results as a view and materializing them in a table on Databricks, and how each choice affects compute and storage costs.
Opt for a view when:
- Your query is straightforward. Views are computed on demand, so the query is re-computed on every read; complex queries with joins and subqueries can escalate compute costs if they are queried frequently.
- You aim to cut down on storage expenses, since views don't require extra storage.
Choose a gold table when:
- The table serves multiple downstream queries, which avoids re-computing complex ad-hoc queries repeatedly.
- Query results should be computed incrementally from a continuously or incrementally growing data source.
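The view-versus-table trade-off described above can be sketched in a few lines. This is a minimal illustration, not Databricks code: Python's built-in sqlite3 module stands in for a SQL warehouse, the `amount` column on `sales_cleaned` is an assumption (the source does not list that table's columns), the `updated` column is left out for brevity, and the names `ytd_sales_view`, `ytd_sales_table`, `refresh_ytd_table`, and `us_toys_total` are hypothetical.

```python
import sqlite3

# sqlite3 is only a stand-in engine; the view-versus-table behaviour shown
# here is generic SQL, not a Databricks API.
conn = sqlite3.connect(":memory:")
conn.execute(
    # The 'amount' column is an assumption for illustration.
    "CREATE TABLE sales_cleaned (country_code TEXT, category TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_cleaned VALUES (?, ?, ?)",
    [("US", "toys", 10.0), ("US", "toys", 5.0), ("DE", "books", 7.5)],
)

AGG_SQL = """
    SELECT country_code, category, SUM(amount) AS ytd_total_sales
    FROM sales_cleaned
    GROUP BY country_code, category
"""

# View: stores only the query text, so every read re-runs the aggregation.
conn.execute(f"CREATE VIEW ytd_sales_view AS {AGG_SQL}")


def refresh_ytd_table(connection):
    """Hypothetical nightly job: recompute once and overwrite the stored result."""
    connection.execute("DROP TABLE IF EXISTS ytd_sales_table")
    connection.execute(f"CREATE TABLE ytd_sales_table AS {AGG_SQL}")


def us_toys_total(connection, relation):
    """Read the US/toys total from the given view or table."""
    row = connection.execute(
        f"SELECT ytd_total_sales FROM {relation} "
        "WHERE country_code = 'US' AND category = 'toys'"
    ).fetchone()
    return row[0]


refresh_ytd_table(conn)

# New sales arrive after the nightly refresh.
conn.execute("INSERT INTO sales_cleaned VALUES ('US', 'toys', 20.0)")

view_total = us_toys_total(conn, "ytd_sales_view")    # aggregation runs again now
table_total = us_toys_total(conn, "ytd_sales_table")  # reads the stored snapshot

refresh_ytd_table(conn)
refreshed_total = us_toys_total(conn, "ytd_sales_table")  # snapshot after overwrite
```

Reading the view pays for the aggregation on every query and always reflects the latest rows; reading the table is a cheap lookup of the last stored result and only changes when the refresh job overwrites it.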
Author: LeetQuiz Editorial Team
The data engineering team maintains a Silver table named 'sales_cleaned' with sales data appended in near real-time. They aim to create a Gold-layer entity from 'sales_cleaned' to compute the year-to-date (YTD) sales amount, featuring the schema: country_code STRING, category STRING, ytd_total_sales FLOAT, updated TIMESTAMP. The metrics need daily recalculation but are queried frequently by various business teams, prompting a need to minimize costs and latency. Which solution best fits these requirements?
A. Define the new entity as a global temporary view for shared computing resources among notebooks or jobs.
B. Set up a nightly batch job to recalculate metrics and overwrite them in a table with each update.
C. Establish multiple tables, one for each business team, to enable quick and efficient metric queries.
D. Define the new entity as a view to prevent persisting results upon each metric recalculation.
E. All the above solutions are suitable, leveraging Databricks' Delta Caching feature.