
Explanation:
Delta Lake has no built-in feature that automatically deletes rows once they pass a given age. Time travel only reads earlier table versions, and VACUUM only removes data files that the table no longer references and that are older than the retention threshold; neither selects rows by age. The most effective and scalable approach is therefore a scheduled job that runs the DELETE command against the Delta table according to the retention policy. DELETE removes rows logically: the underlying files stay available to time travel until a later VACUUM removes them, so the job should also run VACUUM if the data must be physically gone. Scheduling the job with a workflow orchestration tool such as Apache Airflow provides reliability and scalability and reduces manual intervention, which makes option C the BEST choice among those provided.
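As a rough illustration of the job body described above, the sketch below builds the DELETE statement that a scheduled task could submit. The table name `events`, the column `event_date`, and the helper `build_retention_delete` are hypothetical and chosen only for this example; the Spark session and the scheduler wiring are assumed to exist elsewhere.

```python
from datetime import date
from typing import Optional


def build_retention_delete(table: str, date_column: str,
                           retention_years: int = 2,
                           today: Optional[date] = None) -> str:
    """Return a SQL DELETE statement for rows older than the retention window.

    `table` and `date_column` are hypothetical names supplied by the caller.
    They are interpolated directly, so they should come from trusted config.
    """
    today = today or date.today()
    try:
        cutoff = today.replace(year=today.year - retention_years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        cutoff = today.replace(year=today.year - retention_years, day=28)
    return f"DELETE FROM {table} WHERE {date_column} < DATE '{cutoff.isoformat()}'"


# Fixed date so the result is reproducible.
statement = build_retention_delete("events", "event_date", 2, date(2024, 6, 15))
print(statement)

# In the scheduled job (assuming an active SparkSession named `spark`):
#   spark.sql(statement)        # logical delete of expired rows
#   spark.sql("VACUUM events")  # later removes unreferenced files past the threshold
```

An orchestrator such as Apache Airflow would call this on a fixed schedule (for example daily), so the cutoff date advances without manual intervention.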
Your organization manages a Delta table containing historical data and must implement an automated data retention policy that deletes data older than 2 years to comply with regulatory requirements. The solution must be cost-effective and scalable, and must minimize manual intervention. Given these constraints, which of the following approaches BEST leverages Delta Lake's features to achieve this goal? (Choose one option)
A
Manually delete the old data files from the underlying storage system and audit the process for compliance.
B
Utilize Delta Lake's time travel feature to automatically purge data older than the specified time frame without additional configuration.
C
Configure a scheduled job using a workflow orchestration tool like Apache Airflow to execute the DELETE command on the Delta table based on the retention policy.
D
Implement a custom script that periodically checks the Delta table for old data and uses the VACUUM command to remove it, bypassing the need for scheduling.