Databricks Certified Data Engineer - Professional

In an ETL framework, an upstream Kafka source frequently produces duplicate records within a batch. A streaming query reads from this source and writes to a downstream Delta table using the default trigger interval. Given that the upstream system emits data only every 20 minutes, which strategy effectively removes duplicates before writing to the downstream Delta table while minimizing cost?
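The technique this question points toward is Structured Streaming's watermarked deduplication, typically written as `df.withWatermark("event_time", "30 minutes").dropDuplicates(["key", "event_time"])`, combined with an infrequent trigger (e.g. `availableNow`) since the source only emits every 20 minutes. As a study aid, here is a minimal pure-Python sketch of the watermark-plus-dedup semantics; the class name, the 30-minute delay, and the record shape are illustrative assumptions, not Spark APIs.

```python
from datetime import datetime, timedelta


class WatermarkDeduplicator:
    """Toy model of `withWatermark` + `dropDuplicates` semantics:
    seen-key state is retained only inside the watermark window,
    so the state store does not grow without bound."""

    def __init__(self, delay: timedelta):
        self.delay = delay                      # watermark delay threshold
        self.max_event_time = datetime.min      # highest event time seen so far
        self.seen = {}                          # key -> event time of first arrival

    def process_batch(self, records):
        """records: list of (key, event_time) tuples.
        Returns the records surviving deduplication, in arrival order."""
        out = []
        for key, ts in records:
            self.max_event_time = max(self.max_event_time, ts)
            if key not in self.seen:            # first time this key is seen
                self.seen[key] = ts
                out.append((key, ts))
        # Evict state older than the watermark (max event time - delay),
        # mirroring how Spark bounds dedup state.
        if self.max_event_time > datetime.min:
            watermark = self.max_event_time - self.delay
            self.seen = {k: t for k, t in self.seen.items() if t >= watermark}
        return out
```

In a real pipeline the same idea costs less than a full-table `MERGE` on every microbatch, because state is bounded by the watermark; pairing it with `trigger(availableNow=True)` lets the cluster spin down between the 20-minute upstream emissions.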