
Answer: Configure a scheduled job using a workflow orchestration tool like Apache Airflow to execute the DELETE command on the Delta table based on the retention policy.
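Delta Lake does not provide a built-in feature that automatically deletes rows once they pass a given age. Its retention settings, such as `delta.logRetentionDuration` and `delta.deletedFileRetentionDuration`, control how long table history and removed files are kept, not how long data stays in the table. The most effective and scalable approach is therefore a scheduled job that runs the DELETE command on the Delta table according to the retention policy, followed by VACUUM. DELETE only removes the expired rows from the current table version. The underlying files stay in storage, still reachable through time travel, until a VACUUM runs after the file retention period has elapsed. Both steps are needed for compliance and for any storage savings. Scheduling the job with a workflow orchestration tool like Apache Airflow makes it reliable, repeatable, and free of manual intervention, which makes it the BEST option among the choices provided. The other options fall short. Deleting files directly in storage (A) bypasses the transaction log and can corrupt the table. Time travel (B) reads old versions rather than purging anything. VACUUM on its own (D) removes only files the table no longer references, so it never deletes old rows that are still live.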
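A minimal sketch of the SQL such a scheduled job could issue. It assumes a hypothetical table `events` with a DATE column `event_date`. The helper names `years_before` and `retention_statements` are illustrative, not part of any library. In an Airflow task, each returned string would be passed to `spark.sql(...)` in order.

```python
from datetime import date
from typing import List


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier, mapping Feb 29 to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Only Feb 29 can fail here: the earlier year is not a leap year.
        return day.replace(year=day.year - years, day=28)


def retention_statements(table: str, date_column: str, today: date,
                         retention_years: int = 2,
                         vacuum_hours: int = 168) -> List[str]:
    """Build the statements a retention job would run, in order (hypothetical helper)."""
    cutoff = years_before(today, retention_years)
    return [
        # Step 1: logically remove expired rows; Delta commits a new table version.
        f"DELETE FROM {table} WHERE {date_column} < DATE '{cutoff.isoformat()}'",
        # Step 2: physically remove files that are no longer referenced and were
        # removed longer ago than the window (168 hours is Delta's 7-day default).
        f"VACUUM {table} RETAIN {vacuum_hours} HOURS",
    ]


for stmt in retention_statements("events", "event_date", date(2025, 3, 1)):
    print(stmt)
# → DELETE FROM events WHERE event_date < DATE '2023-03-01'
# → VACUUM events RETAIN 168 HOURS
```

VACUUM keeps files removed within the retention window, so the files tombstoned by today's DELETE are physically purged only by a run at least 168 hours later. With a daily or weekly schedule, the physical purge therefore lags the logical delete by roughly the retention window plus one schedule interval.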
Author: LeetQuiz Editorial Team
In the context of managing a Delta table with historical data, your organization requires implementing an automated data retention policy to delete data older than 2 years to comply with regulatory requirements. The solution must be cost-effective, scalable, and minimize manual intervention. Considering these constraints, which of the following approaches BEST leverages Delta Lake's features to achieve this goal? (Choose one option)
A. Manually delete the old data files from the underlying storage system, auditing the process to ensure compliance.
B. Utilize Delta Lake's time travel feature to automatically purge data older than the specified time frame without additional configuration.
C. Configure a scheduled job using a workflow orchestration tool like Apache Airflow to execute the DELETE command on the Delta table based on the retention policy.
D. Implement a custom script that periodically checks the Delta table for old data and uses the VACUUM command to remove it, bypassing the need for scheduling.
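The reason option D fails can be shown with a deliberately simplified toy model. This is not Delta Lake's implementation or API. `DataFile`, `toy_delete`, and `toy_vacuum` are hypothetical names. The model assumes each file holds only expired rows, so a DELETE removes the whole file from the table. The model captures one rule: VACUUM only purges files that have already been removed from the table and were removed longer ago than the retention window.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFile:
    """A data file in a toy Delta table (hypothetical model, not Delta's API)."""
    path: str
    # Hour at which a DELETE removed this file from the table; None means the
    # file is still part of the current table version.
    removed_at: Optional[float] = None


def toy_delete(files: List[DataFile], path: str, now: float) -> None:
    """Simulate DELETE removing a file whose rows are all expired."""
    for f in files:
        if f.path == path:
            f.removed_at = now


def toy_vacuum(files: List[DataFile], now: float,
               retain_hours: float = 168.0) -> List[str]:
    """Simulate VACUUM: return files removed more than `retain_hours` ago.
    Files still in the current table version are never touched."""
    return [
        f.path
        for f in files
        if f.removed_at is not None and now - f.removed_at > retain_hours
    ]


files = [DataFile("rows-from-2021.parquet"), DataFile("rows-from-2025.parquet")]
print(toy_vacuum(files, now=0))     # → []  VACUUM alone: old rows are still live
toy_delete(files, "rows-from-2021.parquet", now=0)
print(toy_vacuum(files, now=24))    # → []  removed, but inside the 168-hour window
print(toy_vacuum(files, now=200))   # → ['rows-from-2021.parquet']
```

In the model, the old file is only eligible for purging after a DELETE has removed it from the table and the retention window has passed. This is why a scheduled DELETE (option C), followed by VACUUM, is the workable combination.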