
Explanation:
In the Databricks Lakehouse (medallion) architecture, Bronze tables are the first layer of data processing. The correct relationship between Bronze tables and raw data is:
Bronze tables contain raw data with a schema applied.
Let's analyze each option:
A. Bronze tables contain less data than raw data files. ❌ Incorrect - Bronze tables typically contain the same raw data as the source files, just structured with a schema. They don't necessarily contain less data.
B. Bronze tables contain more truthful data than raw data. ❌ Incorrect - Bronze tables maintain the raw data as-is, so they contain the same level of truthfulness as the original raw data.
C. Bronze tables contain aggregates while raw data is unaggregated. ❌ Incorrect - Bronze tables are not aggregated; they contain the raw, unaggregated data. Aggregation typically happens in Silver or Gold layers.
D. Bronze tables contain a less refined view of data than raw data. ❌ Incorrect - Bronze tables are actually more refined than raw data because they have a schema applied, making the data more structured and queryable.
E. Bronze tables contain raw data with a schema applied. ✅ Correct - This is the accurate description. Bronze tables take raw data files (like JSON, CSV, Parquet, etc.) and apply a schema to make them queryable in a tabular format while preserving the raw data as-is.
This understanding is fundamental to the Databricks Lakehouse architecture where data flows through Bronze → Silver → Gold layers, with each layer adding more value and refinement.
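To make the idea of "raw data with a schema applied" concrete, here is a minimal conceptual sketch in plain Python. It is not Databricks or Spark code. The sample records, the `BRONZE_SCHEMA` mapping, and the `to_bronze` helper are all hypothetical names invented for illustration; a real Bronze table would normally be written with Spark and Delta Lake. The sketch only shows that applying a schema gives the records typed columns without dropping, deduplicating, or aggregating anything.

```python
import json

# Hypothetical raw records, as they might arrive in a JSON-lines file.
# Every value is a string and the last record is a duplicate.
RAW_LINES = [
    '{"order_id": "1001", "amount": "19.99", "country": "US"}',
    '{"order_id": "1002", "amount": "5.00", "country": "DE"}',
    '{"order_id": "1002", "amount": "5.00", "country": "DE"}',
]

# Hypothetical schema: column name -> Python type used to parse the raw value.
BRONZE_SCHEMA = {"order_id": int, "amount": float, "country": str}


def to_bronze(raw_lines, schema):
    """Apply a schema to raw records; no filtering, deduplication, or aggregation."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Cast each raw value to the type the schema declares for its column.
        rows.append({col: cast(record[col]) for col, cast in schema.items()})
    return rows


bronze_rows = to_bronze(RAW_LINES, BRONZE_SCHEMA)

# Aggregation belongs to a later layer: here, a total per country
# computed from the Bronze rows.
total_by_country = {}
for row in bronze_rows:
    country = row["country"]
    total_by_country[country] = total_by_country.get(country, 0.0) + row["amount"]

print(len(RAW_LINES), len(bronze_rows))
print(bronze_rows[0])
print(total_by_country)
```

The Bronze rows have the same count as the raw lines, duplicate included, which matches the reasoning against options A and C: the schema changes how the data can be queried, not how much of it there is.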
Which of the following describes the relationship between Bronze tables and raw data?
A. Bronze tables contain less data than raw data files.
B. Bronze tables contain more truthful data than raw data.
C. Bronze tables contain aggregates while raw data is unaggregated.
D. Bronze tables contain a less refined view of data than raw data.
E. Bronze tables contain raw data with a schema applied.