
Which of the following describes the relationship between Bronze tables and raw data?
A
Bronze tables contain less data than raw data files.
B
Bronze tables contain more truthful data than raw data.
C
Bronze tables contain aggregates while raw data is unaggregated.
D
Bronze tables contain a less refined view of data than raw data.
E
Bronze tables contain raw data with a schema applied.
Explanation:
In the Databricks Lakehouse architecture, Bronze tables represent the first layer of data processing. The correct relationship between Bronze tables and raw data is:
Bronze tables contain raw data with a schema applied.
Let's analyze each option:
A. Bronze tables contain less data than raw data files. ❌ Incorrect - Bronze tables typically contain the same raw data as the source files, just structured with a schema. They don't necessarily contain less data.
B. Bronze tables contain more truthful data than raw data. ❌ Incorrect - Bronze tables maintain the raw data as-is, so they contain the same level of truthfulness as the original raw data.
C. Bronze tables contain aggregates while raw data is unaggregated. ❌ Incorrect - Bronze tables are not aggregated; they hold the raw, unaggregated records. Aggregation typically happens in the Gold layer, after the Silver layer has cleaned and validated the data.
D. Bronze tables contain a less refined view of data than raw data. ❌ Incorrect - Bronze tables are actually more refined than raw data because they have a schema applied, making the data more structured and queryable.
E. Bronze tables contain raw data with a schema applied. ✅ Correct - This is the accurate description. Bronze tables take raw data files (such as JSON, CSV, or Parquet) and apply a schema so the data can be queried in tabular form, while the records themselves are preserved as-is.
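To make option E concrete, here is a minimal sketch in plain Python of what "raw data with a schema applied" could look like. It is not Databricks or Spark code: the sample records, the `BRONZE_SCHEMA` mapping, and the `to_bronze` helper are all hypothetical names chosen for illustration. The point is only that every raw record is kept, with column names and types imposed on it.

```python
import json

# Hypothetical raw input: newline-delimited JSON as it might arrive from a source system.
# Every value is a string, and one record appears twice.
RAW_LINES = [
    '{"order_id": "1", "amount": "19.99", "country": "US"}',
    '{"order_id": "2", "amount": "5.00", "country": "DE"}',
    '{"order_id": "2", "amount": "5.00", "country": "DE"}',
]

# Hypothetical schema: column name -> Python type to cast to.
BRONZE_SCHEMA = {"order_id": int, "amount": float, "country": str}


def cast_or_none(value, target_type):
    """Cast a raw value to the schema type; missing or unparseable values become None."""
    try:
        return target_type(value)
    except (TypeError, ValueError):
        return None


def to_bronze(raw_lines, schema):
    """Apply the schema to every raw record without filtering, deduplicating, or aggregating."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({col: cast_or_none(record.get(col), typ) for col, typ in schema.items()})
    return rows


bronze = to_bronze(RAW_LINES, BRONZE_SCHEMA)
```

In this sketch `bronze` has exactly as many rows as `RAW_LINES`, duplicate included, which is why options A and C do not describe a Bronze table.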
This understanding is fundamental to the Databricks Lakehouse architecture where data flows through Bronze → Silver → Gold layers, with each layer adding more value and refinement.
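The Bronze → Silver → Gold flow can be sketched in the same spirit. The example below is plain Python rather than Spark, and it assumes one common reading of the layers: Silver deduplicates and drops invalid rows, and Gold aggregates. The row data and the `to_silver` and `to_gold` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical Bronze rows: typed, but still containing a duplicate and a missing amount.
bronze_rows = [
    {"order_id": 1, "amount": 20.0, "country": "US"},
    {"order_id": 2, "amount": 5.0, "country": "DE"},
    {"order_id": 2, "amount": 5.0, "country": "DE"},   # duplicate record
    {"order_id": 3, "amount": None, "country": "US"},  # missing amount
    {"order_id": 4, "amount": 10.0, "country": "US"},
]


def to_silver(rows):
    """Drop rows with a missing amount and keep only the first row per order_id."""
    seen = set()
    silver = []
    for row in rows:
        if row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append(row)
    return silver


def to_gold(rows):
    """Aggregate the cleaned rows into a total amount per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver_rows = to_silver(bronze_rows)
gold_totals = to_gold(silver_rows)
```

Here the five Bronze rows become three Silver rows, and the Gold result is one total per country. Each step trades raw detail for a more refined, more directly usable view.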