A nightly batch job ingests data files from a cloud object storage container with a nested directory structure (YYYY/MM/DD). Each date's directory contains records processed by the source system on that date; some records may arrive late because of moderator-approval delays. Each record represents a user review with the schema:
user_id STRING,
review_id BIGINT,
product_id BIGINT,
review_timestamp TIMESTAMP,
review_text STRING
The ingestion job appends the previous day's data to a target table reviews_raw (same schema as the source). The next pipeline step performs a batch write that propagates only new records from reviews_raw to a deduplicated, validated, and enriched table.
Which solution minimizes compute costs for propagating this batch of data?
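To make the "propagate only new records" requirement concrete, here is a minimal, framework-agnostic sketch in plain Python. It assumes a set of already-propagated review_ids is persisted between runs (analogous to a streaming checkpoint), so each batch run touches only records it has not seen before instead of rescanning and rewriting the whole table. The function and variable names are illustrative, not part of the question; in a real Spark pipeline this role is typically played by an incremental read with checkpointed state rather than an in-memory set.

```python
def propagate_new_records(reviews_raw, target, seen_ids):
    """Append only records whose review_id has not been propagated yet.

    reviews_raw: list of dicts, each with a "review_id" key (source table)
    target:      list of dicts (the deduplicated downstream table)
    seen_ids:    set of review_ids already propagated (persisted checkpoint
                 state in a real pipeline; hypothetical here)
    Returns the number of records propagated in this batch.
    """
    new_records = [r for r in reviews_raw if r["review_id"] not in seen_ids]
    for record in new_records:
        # Validation and enrichment of the record would happen here.
        target.append(record)
        seen_ids.add(record["review_id"])
    return len(new_records)
```

Because only the delta is read and written, the per-run cost scales with the size of the new batch rather than with the full history of reviews_raw, which is the property the question asks you to optimize for.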