
Answer-first summary for fast verification
Answer: Setting `VACUUM` to retain 0 hours can delete files not part of the vacuumed version, including those from concurrent transactions.
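The correct answer is E. With a 0-hour retention period, `VACUUM` removes every data file that the vacuumed version does not reference, and that includes files a concurrent transaction has written but not yet committed. The junior engineer's `INSERT` therefore committed against data files that had already been deleted, which is why the table appeared empty even though no error was reported. This is why a retention interval of at least 7 days is recommended, especially where concurrent writes are possible. The other options do not fit the scenario: option A contradicts the statement that the source table had non-zero records; option B cannot hold, because with `spark.databricks.delta.retentionDurationCheck.enabled` set to true the `VACUUM ... RETAIN 0 HOURS` command would have been rejected; option C would leave some rows in the table rather than none; and option D is not supported by the table's history, which shows only `VACUUM` and `INSERT` actions.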
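The race described above can be illustrated with a small, self-contained Python sketch. It is a deliberately simplified toy model, not the real Delta Lake protocol: the `ToyTable` class, its methods, and the file name `part-0001.parquet` are all hypothetical, and times are plain hour counts. It only shows why a file written but not yet committed survives a 168-hour (7-day) retention window and does not survive a 0-hour one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical, simplified stand-in for a Delta table (not the real protocol)."""
    files: dict = field(default_factory=dict)    # file name -> hour it was written
    committed: set = field(default_factory=set)  # files referenced by the latest version

    def write_file(self, name, now):
        # A writer stages its data file first; the file is not yet committed.
        self.files[name] = now

    def commit(self, name):
        # The commit only records the file name; it does not re-check storage.
        self.committed.add(name)

    def vacuum(self, now, retain_hours):
        # Delete files that the latest version does not reference and that are
        # at least `retain_hours` old.
        for name, written_at in list(self.files.items()):
            if name not in self.committed and now - written_at >= retain_hours:
                del self.files[name]

    def readable_files(self):
        # Only committed files that still exist on storage can be read.
        return {name for name in self.committed if name in self.files}


def run_race(retain_hours):
    table = ToyTable()
    table.write_file("part-0001.parquet", now=0)    # INSERT stages its data file
    table.vacuum(now=1, retain_hours=retain_hours)  # VACUUM runs before the commit
    table.commit("part-0001.parquet")               # INSERT commits afterwards
    return table.readable_files()


print(run_race(retain_hours=0))    # set()
print(run_race(retain_hours=168))  # {'part-0001.parquet'}
```

In this sketch the retention window is the only thing protecting the uncommitted file, which mirrors why the explanation recommends keeping the interval at 7 days or more.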
Author: LeetQuiz Editorial Team
A senior data engineer executed a `VACUUM orders RETAIN 0 HOURS` command on the `orders` Delta table. During execution, a junior data engineer, unaware of the ongoing task, began ingesting data from a source table with non-zero records into the same `orders` Delta table. Post-ingestion, the junior engineer found the table empty without any error messages, only to later discover that an insert operation was performed in parallel with the vacuum. The table's history showed only `VACUUM` and `INSERT` actions. What could be the reason for this occurrence?
A
The source table was empty, resulting in no data being ingested into the orders Delta table.
B
The spark.databricks.delta.retentionDurationCheck.enabled configuration was set to true.
C
The data ingestion process into the orders Delta table was only partially completed.
D
All records were deleted from the orders Delta table by an unknown party.
E
Setting VACUUM to retain 0 hours can delete files not part of the vacuumed version, including those from concurrent transactions.