© 2025 LeetQuiz All rights reserved.
Databricks Certified Data Engineer - Professional


An upstream system writes Parquet data in hourly batches to date-named directories. A nightly batch job processes the previous day's data (specified by the date variable) using this code:

(spark.read
    .format("parquet")
    .load(f"/mnt/raw_orders/{date}")
    .dropDuplicates(["customer_id", "order_id"])
    .write
    .mode("append")
    .saveAsTable("orders"))

Given that customer_id and order_id form a composite key for unique order identification, and the upstream system sometimes generates duplicate entries for the same order hours apart, which statement is accurate?
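The scenario can be modelled without a cluster. The following is a minimal plain-Python sketch, not Spark code: `dedupe_batch`, `nightly_append`, and `count_duplicate_keys` are hypothetical helpers, the sample rows are invented for illustration, and the list `orders` stands in for the target table. It assumes that `dropDuplicates` only compares rows within the DataFrame being written and that `mode("append")` does not inspect rows already in the table.

```python
from collections import Counter

KEY_FIELDS = ("customer_id", "order_id")


def dedupe_batch(rows, key_fields=KEY_FIELDS):
    """Keep one row per key within a single batch.

    This model keeps the first row seen; Spark does not promise which
    duplicate survives dropDuplicates.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def nightly_append(table, batch):
    """Model of the nightly job: de-duplicate the batch, then append blindly."""
    table.extend(dedupe_batch(batch))


def count_duplicate_keys(table, key_fields=KEY_FIELDS):
    """Count keys that appear more than once in the table."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in table)
    return sum(1 for n in counts.values() if n > 1)


# Invented sample data: one duplicate within day 1, one straddling midnight.
day1 = [
    {"customer_id": 1, "order_id": 100, "hour": 9},
    {"customer_id": 1, "order_id": 100, "hour": 14},  # same-day duplicate
    {"customer_id": 2, "order_id": 200, "hour": 23},
]
day2 = [
    {"customer_id": 2, "order_id": 200, "hour": 1},  # re-emitted after midnight
    {"customer_id": 3, "order_id": 300, "hour": 8},
]

orders = []  # stands in for the "orders" table
nightly_append(orders, day1)
nightly_append(orders, day2)

print(len(orders))                   # rows in the table after two nights
print(count_duplicate_keys(orders))  # keys present more than once
```

Under these assumptions, each nightly write is unique by key on its own, yet the table can still hold the same order twice when the upstream duplicate lands in a different date directory from the original.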
