
A data engineering team manages an hourly batch job that ingests source system records into a Lakehouse table named account_history, which retains the full history of records. A downstream table named account_current must hold only the most recent record for each unique user_id. The batch job is scheduled so that no data arrives late, and although account_current contains millions of accounts, it receives only tens of thousands of updates per hour. Which approach is the most efficient for updating the account_current table with the latest value for each unique user_id?
A
Perform a full overwrite of the account_current table during each batch by querying the entire account_history table, grouping by user_id, and selecting the record with the maximum last_updated value for each user_id.
B
Filter the account_history table for records within the most recent hourly window, deduplicate using MAX(last_login) per user_id, and execute a MERGE statement to upsert these changes into the account_current table.
C
Utilize Auto Loader to monitor the account_history directory and configure a Structured Streaming trigger to process newly detected files into the account_current table using a batch update.
D
Filter the account_history table for the most recent hour's records, deduplicate on username, and then execute a MERGE statement to update the account_current table.
E
Use Delta Lake version history to identify the difference between the two most recent versions of the account_history table and apply those specific changes to the account_current table.
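
For reference, the upsert described in option B could look roughly like the sketch below. This is a minimal illustration rather than a reference solution: the table names account_history and account_current come from the question, while the column names (username, last_login, last_updated), the assumption that last_updated is timestamp-typed, and the batch-window variables batch_start and batch_end are made up for the example.

```python
from pyspark.sql import SparkSession

# Assumes a Databricks-style environment where Delta Lake and SQL MERGE are available.
spark = SparkSession.builder.appName("account_current_hourly_upsert").getOrCreate()

# Hypothetical boundaries of the most recent hourly batch window.
batch_start = "2024-01-01 10:00:00"
batch_end = "2024-01-01 11:00:00"

spark.sql(f"""
    MERGE INTO account_current AS t
    USING (
        -- Restrict the source to records ingested in the current hour, then keep
        -- a single row per user_id: the one with the latest last_login.
        SELECT user_id, username, last_login, last_updated
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY user_id
                       ORDER BY last_login DESC
                   ) AS rn
            FROM account_history
            WHERE last_updated >= '{batch_start}'
              AND last_updated <  '{batch_end}'
        ) ranked
        WHERE rn = 1
    ) AS s
    ON t.user_id = s.user_id
    WHEN MATCHED THEN UPDATE SET
        username     = s.username,
        last_login   = s.last_login,
        last_updated = s.last_updated
    WHEN NOT MATCHED THEN INSERT (user_id, username, last_login, last_updated)
        VALUES (s.user_id, s.username, s.last_login, s.last_updated)
""")
```

Because the MERGE source is limited to the tens of thousands of rows from the most recent hour, only those rows are deduplicated and joined against account_current, avoiding a full rewrite of the multi-million-row target table.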