How can you configure a Delta Lake table ingesting records from multiple Kafka topics (with schema: key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG) to meet these requirements:
Explanation:
The correct solution is to partition the Delta Lake table by the 'topic' field, so that each topic's data, including 'registration' (which contains PII), is stored in its own directory. Access Control Lists (ACLs) can then be applied to the 'registration' partition directory alone to restrict access. Retention is also efficient to enforce: a delete that filters on topic = 'registration' and removes records older than 14 days touches only that partition, while non-PII topics stay in their own partitions and are retained indefinitely. The other options fail for the following reasons: deleting all data biweekly (A) violates indefinite retention for non-PII topics; partitioning by a 'registration' field (B) is invalid because no such field exists in the schema; and isolating storage by Kafka partition (D) does not separate PII from non-PII data, since a Kafka partition number is unrelated to the topic's content.
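The partitioning and retention steps described above can be sketched as the two Spark SQL statements built below. This is a minimal illustration, not a definitive implementation: the table name kafka_events is hypothetical, and the cutoff calculation assumes the LONG 'timestamp' column holds epoch milliseconds (the usual Kafka convention, but not stated in the question).

```python
from datetime import datetime, timedelta, timezone

TABLE = "kafka_events"      # hypothetical table name, not from the question
PII_TOPIC = "registration"  # the PII topic named in the explanation
RETENTION_DAYS = 14         # retention window for PII records


def create_table_sql(table: str = TABLE) -> str:
    """Build DDL for a Delta table partitioned by the Kafka topic column."""
    # Column names are backtick-quoted because partition, offset and
    # timestamp can collide with SQL keywords.
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "  `key` BINARY, `value` BINARY, `topic` STRING,\n"
        "  `partition` LONG, `offset` LONG, `timestamp` LONG\n"
        ") USING DELTA\n"
        "PARTITIONED BY (`topic`)"
    )


def retention_delete_sql(now: datetime, table: str = TABLE) -> str:
    """Build a DELETE that touches only the PII topic's partition."""
    # Assumption: `timestamp` is epoch milliseconds.
    cutoff = now - timedelta(days=RETENTION_DAYS)
    cutoff_ms = int(cutoff.timestamp() * 1000)
    return (
        f"DELETE FROM {table} "
        f"WHERE `topic` = '{PII_TOPIC}' AND `timestamp` < {cutoff_ms}"
    )


if __name__ == "__main__":
    print(create_table_sql())
    print(retention_delete_sql(datetime.now(timezone.utc)))
```

Each string would be passed to spark.sql(...) in a session with Delta Lake enabled. Because the DELETE predicate includes the partition column, only files under the 'registration' partition are candidates for rewriting. Note that a Delta DELETE removes rows from the current table version only; the underlying data files are physically removed later by VACUUM.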