Databricks Certified Data Engineer - Professional Quiz - LeetQuiz

An upstream system writes Parquet data in hourly batches to date-named directories. A nightly batch job processes the previous day's data (specified by the `date` variable) using this code:

```python
(spark.read
    .format("parquet")
    .load(f"/mnt/raw_orders/{date}")
    .dropDuplicates(["customer_id", "order_id"])
    .write
    .mode("append")
    .saveAsTable("orders"))
```

Given that `customer_id` and `order_id` form a composite key uniquely identifying an order, and the upstream system sometimes generates duplicate entries for the same order hours apart, which statement is accurate?
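The mechanics the question probes can be sketched in plain Python (not Spark; all data values are invented): `dropDuplicates` removes duplicates only within the DataFrame read for the current run, and `mode("append")` performs no comparison against rows already in the target table, so a duplicate order arriving in a later day's directory survives.

```python
# Illustrative sketch: per-batch deduplication plus append-mode writes
# still permit duplicates that span batch boundaries.
orders_table = []  # stands in for the "orders" table (append semantics)

def nightly_job(batch):
    """Deduplicate on the composite key within this batch only,
    then append -- mirroring dropDuplicates + mode('append')."""
    seen = set()
    for row in batch:
        key = (row["customer_id"], row["order_id"])
        if key not in seen:
            seen.add(key)
            orders_table.append(row)  # no check against existing table rows

# Day 1: the same order appears twice hours apart -> deduped within the batch
nightly_job([
    {"customer_id": 1, "order_id": "A", "ts": "2024-01-01T02:00"},
    {"customer_id": 1, "order_id": "A", "ts": "2024-01-01T09:00"},
])
# Day 2: the upstream emits the same order again in the next day's directory
nightly_job([
    {"customer_id": 1, "order_id": "A", "ts": "2024-01-02T01:00"},
])

keys = [(r["customer_id"], r["order_id"]) for r in orders_table]
print(keys)  # -> [(1, 'A'), (1, 'A')]: a cross-batch duplicate remains
```

To deduplicate against rows already in the table, the job would need something like a Delta Lake `MERGE INTO` on the composite key rather than a blind append.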