
Explanation:
The most efficient approach is Option B. Since the source system guarantees no late-arriving data, we can significantly reduce compute costs by filtering the account_history table for only the latest hourly window instead of scanning millions of rows.
MAX(last_login) within that filtered slice ensures that if a user has multiple updates within the hour, only the final state is merged into the target.MERGE command allows for efficient updates and inserts in a single ACID-compliant operation.Why other options are incorrect:
username is incorrect because the requirement is to update by user_id. Primary keys should be immutable; human-entered strings like usernames can change or collide.Ultimate access to all questions.
No comments yet.
A data engineering team manages an hourly batch job that ingests source system records into a Lakehouse. Given that the ingestion process ensures no data arrives late, and the target table contains millions of accounts but only receives tens of thousands of updates per hour, which approach is the most efficient for updating the account_current table with the latest value for each unique user_id?
A
Perform a full overwrite of the account_current table during each batch by querying the entire account_history table, grouping by user_id, and selecting the maximum last_updated value.
B
Filter the account_history table for records within the most recent hourly window, deduplicate using MAX(last_login) per user_id, and execute a MERGE statement to upsert these changes into the account_current table.
C
Utilize Auto Loader to monitor the account_history directory and configure a Structured Streaming trigger to process newly detected files into the account_current table using a batch update.
D
Filter the account_history table for the most recent hour's records, ensure deduplication based on username, and then execute a MERGE statement to update the account_current table.
E
Use Delta Lake version history to identify the difference between the two most recent versions of the account_history table and apply those specific changes to the account_current table.