
Consider a scenario in which a data engineer managing large datasets in Delta Lake on Azure Databricks must archive historical data older than 3 years to comply with data retention policies, while minimizing the impact on query performance for current data. The dataset is partitioned by year and month. Which of the following approaches BEST leverages Delta Lake's partitioning to achieve this goal efficiently? Choose the correct option and explain why it is the best choice.
A. Implementing a custom script to scan and delete individual records older than 3 years, ensuring no current data is affected.
B. Using Delta Lake's VACUUM feature to automatically remove files not referenced by the Delta table, including those older than 3 years.
C. Identifying and dropping entire partitions that contain data older than 3 years, leveraging the partitioning schema for efficient data isolation.
D. Creating a separate Delta table for historical data and manually moving records older than 3 years to this table.
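
For reference, the partition-level approach described in option C can be sketched in PySpark. This is a minimal sketch, not a definitive implementation: the table name `sales_events`, the archive path, and the cutoff year are hypothetical, and it assumes a Delta table partitioned by `year` and `month` as stated in the question.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession named `spark` already exists;
# this line is only needed when running the sketch elsewhere.
spark = SparkSession.builder.getOrCreate()

SOURCE_TABLE = "sales_events"  # hypothetical Delta table, partitioned by year, month
ARCHIVE_PATH = "abfss://archive@examplestorage.dfs.core.windows.net/sales_events"  # hypothetical

cutoff_year = 2021  # e.g., current year minus 3; a month-level cutoff would need a
                    # compound predicate: (year < Y) | ((year == Y) & (month <= M))
old_data = F.col("year") <= cutoff_year

# 1. Copy the affected partitions to the archive location. Because the
#    filter is on a partition column, Spark prunes files and reads only
#    the old partitions, leaving current partitions untouched.
(spark.table(SOURCE_TABLE)
      .where(old_data)
      .write.format("delta")
      .mode("append")
      .partitionBy("year", "month")
      .save(ARCHIVE_PATH))

# 2. Delete those partitions from the live table. A predicate that aligns
#    with the partition columns lets Delta drop whole data files via
#    metadata, without rewriting files in current partitions.
DeltaTable.forName(spark, SOURCE_TABLE).delete(old_data)

# 3. Physically reclaim storage once the retention window allows
#    (168 hours matches Delta's default 7-day retention).
spark.sql(f"VACUUM {SOURCE_TABLE} RETAIN 168 HOURS")
```

Note the division of labor here: the DELETE removes the old partitions from the table's transaction log, while VACUUM only cleans up files that are already unreferenced. That is why option B on its own cannot satisfy the retention policy, and why options A and D, which operate record by record, forgo the file-level pruning that the year/month partitioning provides.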