
Answer-first summary for fast verification
Answer: Delta Lake time travel is not efficient in terms of cost or latency when used for long-term auditing and data retention.
The correct answer is **B**.

While Delta Lake's **time travel** feature allows querying prior table versions, it is intended for short-term retention: by default, deleted data files are retained for 7 days and transaction log entries for 30 days. Using time travel for **long-term auditing** requires significantly increasing these retention settings, which leads to:

1. **Increased storage costs:** every Parquet file and log entry must be kept indefinitely, causing massive storage bloat.
2. **Performance degradation:** as the transaction log grows, operations such as querying table history or performing a `RESTORE` become increasingly slow and expensive.

In contrast, an **SCD Type 2** design models history explicitly using "effective" and "expiry" timestamps on each row. This allows efficient indexing and partitioning, providing performant, cost-controlled access to historical data without the overhead of an unbounded transaction log.

**Why the other options are incorrect:**

* **A:** Shallow clones copy only metadata; they do not solve the underlying performance issues or log bloat associated with long-term time travel.
* **C:** False. Delta Lake never modifies data files in place (updates write new files via a copy-on-write mechanism), so time travel can access any version still within the retention period, even on a Type 1 table.
* **D:** False. Delta Lake transactions are atomic: a multi-field update in an SCD Type 2 table either succeeds entirely or fails entirely, so a failed query cannot leave a partial update behind.
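To make the SCD Type 2 mechanics concrete, here is a minimal pure-Python sketch (not Delta Lake API code) of the effective/expiry-timestamp pattern the explanation describes: an upsert expires the current row and appends a new version, and a point-in-time lookup scans validity windows. The names (`AddressRow`, `scd2_upsert`, `address_as_of`) are illustrative, not from any library; in practice this logic would be a `MERGE INTO` on a Delta table.

```python
from dataclasses import dataclass
from typing import List, Optional

OPEN = None  # sentinel: row is still current (no expiry timestamp yet)

@dataclass
class AddressRow:
    customer_id: int
    street_address: str
    effective_ts: int               # e.g. epoch seconds
    expiry_ts: Optional[int] = OPEN

def scd2_upsert(rows: List[AddressRow], customer_id: int,
                new_address: str, ts: int) -> None:
    """Expire the customer's current row (if any), then append the new version."""
    for row in rows:
        if row.customer_id == customer_id and row.expiry_ts is OPEN:
            row.expiry_ts = ts
    rows.append(AddressRow(customer_id, new_address, effective_ts=ts))

def address_as_of(rows: List[AddressRow], customer_id: int,
                  ts: int) -> Optional[str]:
    """Point-in-time lookup: return the address whose validity window contains ts."""
    for row in rows:
        if (row.customer_id == customer_id
                and row.effective_ts <= ts
                and (row.expiry_ts is OPEN or ts < row.expiry_ts)):
            return row.street_address
    return None
```

Because every historical state lives in ordinary rows, an "as of" query is just a filter on the timestamp columns, which can be indexed or partitioned, rather than a replay of an ever-growing transaction log.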
Author: LeetQuiz Editorial Team
A data architect is considering using Delta Lake's time travel feature for long-term auditing of street addresses in a customers table. They plan to implement a Type 1 table architecture (where records are overwritten in place) and rely on time travel to retrieve historical states. A data engineer suggests that a Slowly Changing Dimension (SCD) Type 2 table would be a more performant and scalable solution. Which of the following statements provides the critical justification for the data engineer's suggestion?
A
Shallow clones can be utilized in conjunction with Type 1 tables to significantly accelerate historical queries for long-term versioning.
B
Delta Lake time travel is not efficient in terms of cost or latency when used for long-term auditing and data retention.
C
Time travel cannot access previous versions of Type 1 tables because Delta Lake updates modify the underlying data files in place.
D
SCD Type 2 tables require updating multiple fields in a single operation, which increases the risk of data corruption if a query fails during a partial update.