
Answer-first summary for fast verification
Answer: Establish a new Delta table with the updated schema (renamed and additional fields) to serve the application. Simultaneously, create a **view** under the original table name that aliases the new fields back to their original names, ensuring legacy queries continue to function without modification.
### Why This Is the Best Practice

* **Zero Disruption:** By redirecting the original table name to a **view** that aliases the updated columns back to their original names, existing BI dashboards and ML pipelines continue to run without any code changes. They remain unaware that the underlying physical schema has changed.
* **Minimal Overhead:** Creating a view is a metadata-only operation. Unlike cloning, it does not duplicate data or increase storage costs. The new Delta table remains the single source of truth for both the new application and legacy consumers.
* **Decoupling:** This approach separates the physical storage layer from the logical presentation layer, a standard practice in robust data modeling.

### Why the Other Options Fail

* **Option B:** Relying on users to manually update their queries or to use Time Travel is error-prone and causes significant friction across the organization.
* **Option C:** `DEEP CLONE` creates a full physical copy of the data, doubling storage costs and introducing the complexity of keeping two physical tables in sync.
* **Option D:** Replacing the primary table with a logical view of the *source logic* (rather than of the new table) can break Delta-specific features such as **Change Data Feed** and **Time Travel** that downstream consumers may depend on.
* **Option E:** Overwriting the table immediately breaks every existing query that references the old column names, since Delta Lake enforces schema validation at read and write time.
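The aliasing pattern itself is plain SQL. Below is a minimal, hedged sketch using Python's built-in `sqlite3` so it can run anywhere; in Databricks you would issue the equivalent `CREATE TABLE` / `CREATE OR REPLACE VIEW` statements through Spark SQL against the metastore. All table and column names here (`daily_sales`, `cust_id`, `region`, etc.) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# New physical table for the application: "customer_id" has been renamed
# to "cust_id", and a new "region" column has been added.
conn.execute(
    "CREATE TABLE daily_sales_v2 (cust_id INTEGER, amount REAL, region TEXT)"
)
conn.execute("INSERT INTO daily_sales_v2 VALUES (1, 99.5, 'EMEA')")

# View under the ORIGINAL table name, aliasing the renamed column back to
# its original name and exposing only the original columns.
conn.execute(
    """
    CREATE VIEW daily_sales AS
    SELECT cust_id AS customer_id, amount
    FROM daily_sales_v2
    """
)

# A legacy query, completely unchanged, still resolves against the old names.
row = conn.execute("SELECT customer_id, amount FROM daily_sales").fetchone()
print(row)  # -> (1, 99.5)
```

Because the view is pure metadata, both the new application (reading `daily_sales_v2`) and legacy consumers (reading `daily_sales`) see a single physical copy of the data.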
Author: LeetQuiz Editorial Team
A data engineering team manages a suite of aggregate tables used by business intelligence dashboards, production machine learning models, and customer-facing applications. New requirements for a specific application necessitate renaming several existing fields and adding new columns to a heavily shared aggregate table.
Which strategy addresses these requirements while ensuring zero disruption to existing downstream consumers and minimizing administrative overhead?
A
Establish a new Delta table with the updated schema (renamed and additional fields) to serve the application. Simultaneously, create a view under the original table name that aliases the new fields back to their original names, ensuring legacy queries continue to function without modification.
B
Apply the schema modifications directly to the existing table and distribute a global notification to all stakeholders providing instructions on how to use Spark SQL aliases or Delta Time Travel to map the new schema back to legacy requirements.
C
Implement the updated schema in a new table and utilize Delta Lake's DEEP CLONE functionality to keep the original and updated tables synchronized, ensuring both sets of requirements are physically persisted.
D
Replace the existing physical table with a logical view containing the original query logic. Concurrently, create a separate physical table with the new schema specifically for the customer-facing application.
E
Perform an in-place OVERWRITE of the table to match the new specifications and update the table's metadata comments to warn users of the breaking schema changes and field renames.