
Answer-first summary for fast verification
Answer: The metadata created by the CLONE operation is referencing data files that were purged as invalid by the VACUUM command.
## Explanation

**Correct Answer: D**

**Why D is correct:**

1. **SHALLOW CLONE** creates a new table with its own separate metadata, but that metadata references the **same underlying data files** as the source table.
2. Type 1 SCD updates rewrite data files on the source table, so over a few weeks the source stops referencing many of the files the clone was created against.
3. When **VACUUM** is run on the source table, it physically deletes data files that the source no longer references and that are older than the retention threshold (default 7 days).
4. The shallow clone's metadata still points to those deleted files, so queries against the cloned table fail because the referenced data files no longer exist.

**Why other options are incorrect:**

- **A:** Type 1 SCD behavior (overwriting existing records) doesn't inherently prevent Delta Lake from maintaining consistency for cloned tables. The overwrites only matter here because they leave old files eligible for VACUUM.
- **B:** VACUUM doesn't automatically invalidate shallow clones. A clone breaks only when VACUUM removes data files that the clone still references.
- **C:** Tables created with SHALLOW CLONE are not automatically deleted after 7 days. The 7-day figure is the default retention threshold for VACUUM, not a lifetime for clones.
- **E:** VACUUM deletes files; it does not compact them. Running REFRESH won't help either, because it updates metadata and cannot recreate data files that have been physically deleted.

**Key Learning Points:**

- **SHALLOW CLONE** creates a metadata-only copy that references the original data files.
- **DEEP CLONE** copies both metadata and data files, creating an independent copy.
- **VACUUM** removes unreferenced data files older than the retention threshold (default 7 days).
- For development tables that need to persist, use **DEEP CLONE** or ensure VACUUM isn't run on source tables while clones are in use.
- If shallow clones must be long-lived, consider lengthening the retention period on the source tables.
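The failure described in the explanation can be reproduced in a toy model. The sketch below is plain Python, not Delta Lake code: `ToyTable`, `shallow_clone`, `deep_clone`, `overwrite`, `vacuum`, and `readable` are hypothetical names invented for illustration, file names such as `part-0` are made up, and the retention window is assumed to have already elapsed when `vacuum` runs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table's metadata (not a real Delta API)."""
    live: set = field(default_factory=set)        # files the current version references
    tombstones: set = field(default_factory=set)  # files logically removed, still on disk


def shallow_clone(source: ToyTable) -> ToyTable:
    # Copy metadata only: the clone points at the same physical files.
    return ToyTable(live=set(source.live))


def deep_clone(source: ToyTable, storage: set) -> ToyTable:
    # Copy the data too: the clone gets its own physical files.
    copies = {f"copy-of-{name}" for name in source.live}
    storage.update(copies)
    return ToyTable(live=copies)


def overwrite(table: ToyTable, storage: set, old: str, new: str) -> None:
    # A Type 1 update rewrites a file: the old one is tombstoned, not deleted.
    table.live.remove(old)
    table.tombstones.add(old)
    table.live.add(new)
    storage.add(new)


def vacuum(table: ToyTable, storage: set) -> None:
    # Physically delete tombstoned files (retention assumed to have passed).
    storage.difference_update(table.tombstones)
    table.tombstones.clear()


def readable(table: ToyTable, storage: set) -> bool:
    # A query succeeds only if every referenced file still exists.
    return table.live <= storage


# "storage" models the physical files shared by the source and its shallow clone.
storage = {"part-0", "part-1"}
source = ToyTable(live={"part-0", "part-1"})
dev_shallow = shallow_clone(source)
dev_deep = deep_clone(source, storage)

overwrite(source, storage, old="part-0", new="part-2")
before = readable(dev_shallow, storage)  # old file is tombstoned but still on disk
vacuum(source, storage)
after = readable(dev_shallow, storage)   # old file has been physically deleted
```

In this model the shallow clone is readable before `vacuum` and unreadable after it, while the deep clone and the source stay readable throughout, which mirrors the reasoning behind option D.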
Author: Keng Suppaseth
In order to prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work will reference clones of Delta Lake tables. After testing both DEEP and SHALLOW CLONE, development tables are created using SHALLOW CLONE.
A few weeks after initial table creation, the cloned versions of several tables implemented as Type 1 Slowly Changing Dimension (SCD) stop working. The transaction logs for the source tables show that VACUUM was run the day before.
Which statement describes why the cloned tables are no longer working?
A. Because Type 1 changes overwrite existing records, Delta Lake cannot guarantee data consistency for cloned tables.
B. Running VACUUM automatically invalidates any shallow clones of a table; DEEP CLONE should always be used when a cloned table will be repeatedly queried.
C. Tables created with SHALLOW CLONE are automatically deleted after their default retention threshold of 7 days.
D. The metadata created by the CLONE operation is referencing data files that were purged as invalid by the VACUUM command.
E. The data files compacted by VACUUM are not tracked by the cloned metadata; running REFRESH on the cloned table will pull in recent changes.