Databricks Certified Data Engineer - Professional

A task orchestrator is configured to execute two hourly tasks. First, an external system writes Parquet data to a mounted directory at /mnt/raw_orders/. Following this data write, a Databricks job runs the following code:

(spark.readStream
  .format("parquet")
  .load("/mnt/raw_orders/")
  .withWatermark("time", "2 hours")
  .dropDuplicates(["customer_id", "order_id"])
  .writeStream
  .trigger(once=True)
  .toTable("orders"))

Given that customer_id and order_id form a composite key to uniquely identify orders, and the time field represents when the record was queued in the source system, which statement is true if the upstream system occasionally enqueues duplicate entries for the same order hours apart?
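For intuition, the interaction between withWatermark and dropDuplicates can be simulated in plain Python: Spark keeps each composite key in deduplication state only until the watermark (the maximum event time seen minus the delay) passes that key's event time, so a duplicate arriving more than two hours later can be written again. The sketch below is a simplification, not Spark's implementation: it advances the watermark per record rather than per micro-batch, and the function and sample data (dedupe_stream, the events list) are illustrative names, not part of the question.

```python
from datetime import datetime, timedelta

def dedupe_stream(events, delay=timedelta(hours=2)):
    """Simulate withWatermark("time", delay) + dropDuplicates on a key.

    events: list of (event_time, key) pairs in arrival order.
    A key's state is evicted once the watermark (max event time seen
    so far minus `delay`) passes that key's event time.
    """
    state = {}            # key -> event_time, kept while within the watermark
    max_event_time = None
    emitted = []
    for event_time, key in events:
        # The watermark advances with the maximum event time observed.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - delay
        # Evict state entries older than the watermark.
        state = {k: t for k, t in state.items() if t >= watermark}
        # Records arriving behind the watermark are dropped as late data.
        if event_time < watermark:
            continue
        # A key still in state is a duplicate; otherwise emit and remember it.
        if key not in state:
            state[key] = event_time
            emitted.append((event_time, key))
    return emitted

t0 = datetime(2025, 1, 1, 0, 0)
events = [
    (t0, ("c1", "o1")),                          # original order
    (t0 + timedelta(minutes=30), ("c1", "o1")),  # duplicate within 2h: dropped
    (t0 + timedelta(hours=3), ("c1", "o1")),     # duplicate 3h later: state evicted, written again
]
print(dedupe_stream(events))
```

Under these assumptions, the first and third records are both emitted: the three-hour gap exceeds the two-hour watermark delay, so the key's state has already been evicted when the late duplicate arrives. Only duplicates enqueued within the watermark window are guaranteed to be removed.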
