
You are a data engineer working with a large dataset in Delta Lake on Microsoft Azure. Your organization must archive or delete old data efficiently to comply with data retention policies and reduce storage costs. The dataset includes a timestamp column recording when each record was created. Which of the following strategies should you implement to achieve this goal? Choose the single best option and explain why it is the most suitable.
A
Implement a data retention policy that randomly selects records for deletion based on a non-time-based column to ensure fairness in data removal.
B
Use a data retention policy that targets records based on a unique identifier column, assuming that older records have lower unique IDs.
C
Apply a data retention policy that uses the timestamp column to identify and manage records older than a specified retention period for archiving or deletion.
D
Create a data retention policy that focuses on the most frequently accessed records, regardless of their age, to improve query performance.
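The timestamp-based approach described in option C can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the table name `events`, the column name `created_at`, the 30-day retention period, and the helper functions are all hypothetical, and the sketch only computes the cutoff and builds the SQL text rather than running it against a real Delta table.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Return the instant before which records count as expired."""
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    return now - timedelta(days=retention_days)


def build_delete_statement(table: str, ts_column: str, cutoff: datetime) -> str:
    """Build a SQL DELETE for rows whose timestamp is older than the cutoff.

    The table and column names are assumed to be trusted constants,
    not user input, because they are interpolated directly into the SQL.
    """
    cutoff_literal = cutoff.strftime("%Y-%m-%d %H:%M:%S")
    return f"DELETE FROM {table} WHERE {ts_column} < TIMESTAMP '{cutoff_literal}'"


def split_by_age(records, ts_key, cutoff):
    """Split in-memory records into (expired, retained) by timestamp."""
    expired = [r for r in records if r[ts_key] < cutoff]
    retained = [r for r in records if r[ts_key] >= cutoff]
    return expired, retained


# Hypothetical values: a fixed "now" and a 30-day retention period.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
cutoff = retention_cutoff(now, retention_days=30)
statement = build_delete_statement("events", "created_at", cutoff)

sample = [
    {"id": 1, "created_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2024, 1, 15, tzinfo=timezone.utc)},
]
expired, retained = split_by_age(sample, "created_at", cutoff)
```

In a Spark session configured for Delta Lake, a statement like this could be passed to `spark.sql(...)`. For archiving rather than deletion, the matching rows would first be copied to a separate location using the same predicate. Note that a Delta `DELETE` removes rows from the current table version only; the underlying data files are physically removed later by `VACUUM`.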