
Answer-first summary for fast verification
Answer (Option C): Each batch write will contain unique records, but the target table may still accumulate duplicates if a record was already written in a previous run.
The `dropDuplicates` transformation is applied to the **incoming DataFrame only** (note that it is a wide transformation requiring a shuffle, not a narrow one, but its scope is still limited to that DataFrame). It removes duplicate `(customer_id, order_id)` rows within that specific batch. When writing with `.mode("append")`, Spark (and Delta Lake) simply adds the new records to the existing table without performing any lookup or comparison against data already stored in the target. Therefore, while each individual batch is deduplicated before the write, duplicates can still accumulate in the `orders` table across runs. To achieve global deduplication over the entire table, a `MERGE` (upsert) operation is needed to check for existing records before inserting.
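The failure mode above can be sketched without Spark at all. The toy Python model below (the `(customer_id, order_id)` tuples are hypothetical sample keys) deduplicates each batch in isolation and then appends, mirroring `dropDuplicates` followed by `.mode("append")`:

```python
# Toy model (plain Python, no Spark) of why per-batch deduplication
# does not prevent duplicates across appended batches.

def drop_duplicates(batch):
    """Deduplicate within a single batch, like dropDuplicates
    applied to the incoming DataFrame only."""
    seen, out = set(), []
    for row in batch:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

table = []  # the target "orders" table

# Night 1: the batch contains an internal duplicate of (1, 100)
batch1 = [(1, 100), (1, 100), (2, 200)]
table.extend(drop_duplicates(batch1))  # append mode: no check against table

# Night 2: upstream re-sends (1, 100) in a later batch
batch2 = [(1, 100), (3, 300)]
table.extend(drop_duplicates(batch2))

print(table)
# → [(1, 100), (2, 200), (1, 100), (3, 300)]
# (1, 100) now appears twice, even though each batch was
# internally unique when written.
```

With Delta Lake, replacing the append with a `MERGE INTO orders ... WHEN NOT MATCHED THEN INSERT` keyed on `(customer_id, order_id)` would perform the missing lookup against existing rows and keep the table globally unique.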
Author: LeetQuiz Editorial Team
A nightly Spark batch job ingests Parquet data from an upstream source located at `/mnt/raw_orders/{{date}}`. The job applies `dropDuplicates(["customer_id", "order_id"])` to the incoming DataFrame before writing to the target table `orders` using the `append` mode. If the upstream system occasionally generates duplicate order entries across different batches, how will duplicate records be handled in the target table?
A. Existing records in the target table with matching keys will be overwritten by the incoming data.
B. The write job will deduplicate the union of the new data and the existing table data, ensuring the final table remains unique.
C. Each batch write will contain unique records, but the target table may still accumulate duplicates if a record was already written in a previous run.
D. The operation will fail with a constraint violation error if matching keys are detected in the target table.
E. The append operation will automatically filter out any incoming records that already exist in the target table.