
Answer-first summary for fast verification
Answer: Filter the `account_history` table for records within the most recent hourly window, deduplicate using `MAX(last_login)` per `user_id`, and execute a `MERGE` statement to upsert these changes into the `account_current` table.
The most efficient approach is **Option B**. Since the source system guarantees no late-arriving data, we can significantly reduce compute costs by filtering the `account_history` table for only the latest hourly window instead of scanning millions of rows.

* **Efficiency**: By filtering for the current hour, you only process tens of thousands of rows rather than the entire history (millions of rows).
* **Deduplication**: Using an aggregation like `MAX(last_login)` within that filtered slice ensures that if a user has multiple updates within the hour, only the final state is merged into the target.
* **UPSERT semantics**: Delta Lake's `MERGE` command performs updates and inserts in a single ACID-compliant operation.

**Why the other options are incorrect:**

* **Option A**: Scanning the entire history every hour is highly inefficient and increasingly expensive as the dataset grows.
* **Option C**: While Auto Loader is useful for ingestion, it adds unnecessary architectural complexity here, since the files are already landed and processed by a batch job.
* **Option D**: Merging on `username` is incorrect because the requirement is to update by `user_id`. Merge keys should be immutable; human-entered strings like usernames can change or collide.
* **Option E**: Delta version history or Change Data Feed (CDF) would be over-engineering for this scenario; a simple timestamp filter on the existing batch load is more direct and easier to maintain.
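In production this would be a Spark SQL `MERGE INTO account_current USING ... ON target.user_id = source.user_id` against Delta tables. As a minimal, self-contained sketch of the same logic, the snippet below simulates the Option B pipeline in plain Python: filter `account_history` to the hourly window, keep the row with `MAX(last_login)` per `user_id`, then upsert into an in-memory `account_current`. The dict-based tables and the `hourly_upsert` helper are hypothetical stand-ins, not the Databricks API.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-ins for the Delta tables.
account_history = [
    {"user_id": 1, "username": "alice", "last_login": datetime(2024, 1, 1, 9, 5)},
    {"user_id": 1, "username": "alice", "last_login": datetime(2024, 1, 1, 9, 50)},
    {"user_id": 2, "username": "bob",   "last_login": datetime(2024, 1, 1, 9, 30)},
    {"user_id": 3, "username": "carol", "last_login": datetime(2024, 1, 1, 8, 10)},  # previous hour
]
account_current = {
    2: {"username": "bob", "last_login": datetime(2023, 12, 31, 9, 0)},
}

def hourly_upsert(history, current, window_start, window_end):
    """Filter to the latest hourly window, keep MAX(last_login) per user_id,
    then upsert (update-or-insert) the result into the current snapshot."""
    latest = {}
    for row in history:
        if not (window_start <= row["last_login"] < window_end):
            continue  # outside the window; the source guarantees no late data
        prev = latest.get(row["user_id"])
        if prev is None or row["last_login"] > prev["last_login"]:
            latest[row["user_id"]] = row  # dedupe: keep the final state per user
    for uid, row in latest.items():
        # MERGE semantics: update the matched key or insert a new one.
        current[uid] = {"username": row["username"], "last_login": row["last_login"]}
    return current

start = datetime(2024, 1, 1, 9, 0)
hourly_upsert(account_history, account_current, start, start + timedelta(hours=1))
```

After the run, user 1 holds only the 9:50 state (the two in-window rows were deduplicated), user 2's stale row is updated, and user 3 is untouched because their record falls outside the hourly window.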
Author: LeetQuiz Editorial Team
A data engineering team manages an hourly batch job that ingests source system records into a Lakehouse. Given that the ingestion process ensures no data arrives late, and the target table contains millions of accounts but only receives tens of thousands of updates per hour, which approach is the most efficient for updating the `account_current` table with the latest value for each unique `user_id`?
**A.** Perform a full overwrite of the `account_current` table during each batch by querying the entire `account_history` table, grouping by `user_id`, and selecting the maximum `last_updated` value.

**B.** Filter the `account_history` table for records within the most recent hourly window, deduplicate using `MAX(last_login)` per `user_id`, and execute a `MERGE` statement to upsert these changes into the `account_current` table.

**C.** Utilize Auto Loader to monitor the `account_history` directory and configure a Structured Streaming trigger to process newly detected files into the `account_current` table using a batch update.

**D.** Filter the `account_history` table for the most recent hour's records, ensure deduplication based on `username`, and then execute a `MERGE` statement to update the `account_current` table.

**E.** Use Delta Lake version history to identify the difference between the two most recent versions of the `account_history` table and apply those specific changes to the `account_current` table.