
In the context of implementing the medallion architecture on the Databricks Lakehouse Platform for processing IoT sensor data, consider the following scenario: A data engineering team has successfully ingested raw data into bronze tables, processed and enriched it into silver tables, and finally aggregated it into gold tables for business reporting. Given the need to optimize for cost, compliance with data governance policies, and scalability for future growth, which of the following statements best describes the differences between silver and gold tables in this architecture, and correctly identifies which workloads would typically use bronze and gold tables as sources? Choose the best option from the four provided.
A
Silver tables are created by applying business logic and aggregations to bronze tables, making them suitable for business reporting. Gold tables are created by further cleaning and enriching silver tables, making them suitable for ad-hoc analytics. Both batch and real-time workloads typically use silver tables as sources, while gold tables are rarely used as sources.
B
Silver tables are optimized for machine learning and are created by aggregating data from bronze tables. Gold tables are optimized for analytics and reporting and are created by further enriching silver tables. Real-time streaming workloads typically use bronze tables as sources, while batch reporting workloads use gold tables as sources.
C
Silver tables are derived from bronze tables and contain raw, unprocessed data for data science experimentation. Gold tables are derived from silver tables and contain cleaned, enriched, and aggregated data for operational reporting. Batch ETL jobs typically use gold tables as sources, while real-time analytics workloads use bronze tables as sources.
D
Silver tables are created by cleaning and enriching data from bronze tables and are typically used as sources for ad-hoc analytics and data exploration. Gold tables are created by aggregating and applying business logic to silver tables, and are typically used as sources for business intelligence dashboards and machine learning models. Batch ETL jobs often use bronze tables as sources, while both batch and real-time analytics workloads use gold tables as sources.
Explanation:
Option D is the best answer. Silver tables are produced by cleaning and enriching the raw data held in bronze tables, which makes them suitable for ad-hoc analytics and data exploration. Gold tables are produced by aggregating silver data and applying business logic, which makes them the business-ready source for business intelligence dashboards and machine learning models. Option D also correctly identifies that batch ETL jobs often read from bronze tables, while both batch and real-time analytics workloads read from gold tables. The other options misassign the layers: A swaps the roles of silver and gold, B describes silver tables as aggregates, and C describes silver tables as raw, unprocessed data. Reference: https://docs.databricks.com/aws/en/lakehouse/medallion
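
To make the layer boundaries in the explanation concrete, here is a minimal sketch of the bronze-to-silver-to-gold flow. It is an illustration under stated assumptions, not Databricks code: it uses plain Python lists and dicts instead of Spark or Delta tables, and the sensor records, the DEVICE_SITE lookup, and the to_silver and to_gold helpers are all hypothetical names invented for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw IoT readings as they might land in a bronze table:
# stored as ingested, including a duplicate and a malformed record.
bronze = [
    {"device_id": "s1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00"},
    {"device_id": "s1", "temp_c": "22.5", "ts": "2024-01-01T00:01:00"},
    {"device_id": "s1", "temp_c": "22.5", "ts": "2024-01-01T00:01:00"},  # duplicate
    {"device_id": "s2", "temp_c": "n/a", "ts": "2024-01-01T00:00:00"},   # malformed
    {"device_id": "s2", "temp_c": "18.0", "ts": "2024-01-01T00:01:00"},
]

# Hypothetical reference data used to enrich each reading with its site.
DEVICE_SITE = {"s1": "plant-a", "s2": "plant-b"}


def to_silver(rows):
    """Clean (cast types, drop malformed rows, deduplicate) and enrich."""
    seen = set()
    silver_rows = []
    for row in rows:
        try:
            temp = float(row["temp_c"])
        except ValueError:
            continue  # drop readings whose temperature cannot be parsed
        key = (row["device_id"], row["ts"])
        if key in seen:
            continue  # drop exact re-deliveries of the same reading
        seen.add(key)
        silver_rows.append({
            "device_id": row["device_id"],
            "site": DEVICE_SITE.get(row["device_id"], "unknown"),
            "temp_c": temp,
            "ts": row["ts"],
        })
    return silver_rows


def to_gold(silver_rows):
    """Aggregate with business logic: average temperature per site."""
    temps_by_site = defaultdict(list)
    for row in silver_rows:
        temps_by_site[row["site"]].append(row["temp_c"])
    return {site: mean(temps) for site, temps in temps_by_site.items()}


silver = to_silver(bronze)  # row-level, cleaned and enriched
gold = to_gold(silver)      # aggregated, ready for reporting
```

In this sketch, silver still holds one row per valid reading, which is why it suits exploration, while gold holds one value per site, which is the shape a dashboard would typically query.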