
Answer-first summary for fast verification
Answer: Data should be partitioned by the topic field, allowing ACLs and delete statements to leverage partition boundaries.
The correct solution is to partition the Delta Lake table by the 'topic' field. Each topic's data then lives in its own partition directory, so the 'registration' topic (the one containing PII) is physically separated from the rest. Access Control Lists (ACLs) can be applied to the 'registration' partition directory alone to restrict access, and the 14-day retention rule can be enforced with a delete statement that filters on topic = 'registration', so only that partition is touched. Non-PII topics stay in their own partitions and are retained indefinitely. The other options fail for these reasons: A deletes all data biweekly, which violates indefinite retention for non-PII records; B partitions by a 'registration' field that does not exist in the schema ('registration' is a value of 'topic', not a column); and D isolates storage by the Kafka 'partition' field, which is a partition number within a topic and says nothing about whether a record contains PII.
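The partition-and-delete approach described above can be sketched as two Spark SQL statements, generated here from plain Python so the example runs without a cluster. This is a minimal sketch under stated assumptions, not a definitive implementation: the table name kafka_events and the helper names are hypothetical, and the timestamp column is assumed to hold epoch milliseconds (the question only says it is a LONG).

```python
from datetime import datetime, timedelta, timezone

TABLE = "kafka_events"      # hypothetical table name, not from the question
PII_TOPIC = "registration"  # topic that holds PII, per the explanation
RETENTION_DAYS = 14         # retention window for PII records


def create_table_sql(table: str = TABLE) -> str:
    """Build DDL for a Delta table partitioned by the topic column."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "  key BINARY, value BINARY, topic STRING,\n"
        "  `partition` LONG, `offset` LONG, `timestamp` LONG\n"
        ") USING DELTA\n"
        "PARTITIONED BY (topic)"
    )


def pii_delete_sql(now: datetime, table: str = TABLE) -> str:
    """Build a DELETE that only touches the PII topic's partition.

    Assumes `timestamp` is epoch milliseconds (an assumption, see lead-in).
    """
    cutoff = now - timedelta(days=RETENTION_DAYS)
    cutoff_ms = int(cutoff.timestamp() * 1000)
    return (
        f"DELETE FROM {table} "
        f"WHERE topic = '{PII_TOPIC}' AND `timestamp` < {cutoff_ms}"
    )


if __name__ == "__main__":
    print(create_table_sql())
    print(pii_delete_sql(datetime.now(timezone.utc)))
```

With a live SparkSession, each string could be passed to spark.sql(...). Because the delete filters on the partition column, Delta only needs to rewrite files under the 'registration' partition, and the other topics are left untouched.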
Author: LeetQuiz Editorial Team
How can you configure a Delta Lake table ingesting records from multiple Kafka topics (with schema: key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG) to meet these requirements:
A. All data should be deleted biweekly; Delta Lake's time travel functionality should be leveraged to maintain a history of non-PII information.
B. Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory.
C. Data should be partitioned by the topic field, allowing ACLs and delete statements to leverage partition boundaries.
D. Separate object storage containers should be specified based on the partition field, allowing isolation at the storage level.