
An hourly batch job ingests data files from a cloud object storage container, with each batch containing all records generated by the source system within a given hour. The job is delayed sufficiently to account for late-arriving data. The schema includes: user_id BIGINT, username STRING, user_utc STRING, user_region STRING, last_login BIGINT, auto_pay BOOLEAN, last_updated BIGINT, where user_id is the unique key.
All new records are loaded into the account_history table, which retains the complete history in the same schema. The account_current table is a Type 1 table storing only the latest record per user_id.
Given millions of user accounts and tens of thousands of hourly records, what is the most efficient method to update the account_current table during each batch job?
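To make the Type 1 behaviour described above concrete, here is a minimal sketch in plain Python. It is an illustration only, not the source system's implementation: in-memory dictionaries stand in for the account_current table, the helper names latest_per_user and upsert_current are hypothetical, and the sample records carry only a subset of the schema. The idea is to reduce the hourly batch to one row per user_id (the one with the greatest last_updated) and then upsert only those rows, so the work scales with the batch size rather than with the full table. On a Delta Lake table, the same pattern would typically be expressed as a MERGE keyed on user_id.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def latest_per_user(batch: Iterable[Record]) -> Dict[int, Record]:
    """Keep only the most recent record per user_id within one batch."""
    latest: Dict[int, Record] = {}
    for rec in batch:
        uid = rec["user_id"]
        # Replace the stored record only if this one is strictly newer.
        if uid not in latest or rec["last_updated"] > latest[uid]["last_updated"]:
            latest[uid] = rec
    return latest


def upsert_current(current: Dict[int, Record], batch: Iterable[Record]) -> Dict[int, Record]:
    """Type 1 upsert: overwrite matching users, insert new ones, keep no history."""
    for uid, rec in latest_per_user(batch).items():
        existing = current.get(uid)
        # Insert unseen users; overwrite only when the batch row is at least as new.
        if existing is None or rec["last_updated"] >= existing["last_updated"]:
            current[uid] = rec
    return current


# Hypothetical sample data (subset of the schema, for illustration only).
current = {
    1: {"user_id": 1, "username": "ana", "last_updated": 100},
    2: {"user_id": 2, "username": "bo", "last_updated": 100},
}
batch = [
    {"user_id": 1, "username": "ana2", "last_updated": 150},
    {"user_id": 1, "username": "ana3", "last_updated": 200},
    {"user_id": 3, "username": "cy", "last_updated": 120},
]

upsert_current(current, batch)
```

After the call, user 1 holds the newest of its two batch rows, user 2 is untouched because it did not appear in the batch, and user 3 is inserted as a new row.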