
Databricks Certified Data Engineer - Associate
You are managing a production Delta Lake table, sales_data, which is used by multiple downstream analytics jobs. The table is partitioned by region and sale_date. Your team now needs to ingest new data files that include two additional columns, discount_code (string) and promotion_flag (boolean), which were not present in the original schema.
Requirements:
Zero Downtime: The table must remain available for reads and writes during the schema update.
Data Consistency: All new data ingested must include the new columns, and existing queries should not break.
Auditability: You must be able to track when and how the schema was changed.
Best Practices: The solution should follow Delta Lake best practices for schema evolution and production data management.
Which approach best satisfies all requirements? Select the best answer and explain why the other options are less suitable.
Correct Answer:
Use the ALTER TABLE sales_data ADD COLUMNS SQL command to add the new columns, then ingest the new data files.
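A minimal sketch of this approach, using the table and column names from the scenario:

```sql
-- Add the new columns explicitly; in Delta Lake this is a metadata-only
-- change recorded in the transaction log, so the table stays available
-- for concurrent reads and writes. Existing rows return NULL for the
-- new columns.
ALTER TABLE sales_data
  ADD COLUMNS (discount_code STRING, promotion_flag BOOLEAN);
```

After the schema is updated, the new data files can be ingested normally and will populate the added columns.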
Reference Explanation:
Zero Downtime: ALTER TABLE ... ADD COLUMNS is an online operation in Delta Lake, so the table remains available for reads and writes.
Data Consistency: Adding columns explicitly ensures the schema is updated before new data is ingested, preventing schema mismatch errors.
Auditability: Delta Lake transaction logs record all schema changes, so you can track when and how the schema was altered.
Best Practices: Explicit schema management is recommended in production to avoid accidental schema drift and maintain control over table structure.
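To verify the auditability point above, the schema change can be inspected in the table's history, for example:

```sql
-- The Delta transaction log records the ADD COLUMNS operation;
-- DESCRIBE HISTORY surfaces when it ran, who ran it, and the
-- operation parameters, which satisfies the audit requirement.
DESCRIBE HISTORY sales_data;
```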
Why other options are less suitable:
B: While mergeSchema can automatically evolve the schema, it is less auditable and can introduce accidental schema changes if not carefully managed, which is not recommended for production.
C: Dropping and recreating the table causes downtime, risks data loss, and is not necessary for adding columns.
D: Maintaining two tables and unioning them complicates data management, increases maintenance overhead, and is not scalable for production.
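For contrast, option B's automatic schema evolution would look roughly like the following. This is shown only to illustrate why it is harder to control; new_sales_files is a hypothetical staging view over the incoming files:

```sql
-- Option B (not recommended here): enable automatic schema merging for
-- the session, then insert. Any unexpected column in the source data
-- silently becomes part of the table schema, with no explicit review step.
SET spark.databricks.delta.schema.autoMerge.enabled = true;

INSERT INTO sales_data
SELECT * FROM new_sales_files;  -- hypothetical view over the new data files
```

Because the schema change happens as a side effect of the write rather than as a deliberate DDL statement, it is easier to introduce accidental schema drift in production.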