
Databricks Certified Data Engineer - Professional
In an ETL framework, a Kafka stream acting as an upstream system frequently produces duplicate values within a batch. The streaming query reads from this source and writes to a downstream Delta table using the default trigger interval. Given that the upstream system emits data every 20 minutes, which strategy effectively removes duplicates before saving to the downstream Delta table while minimizing costs?
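A common approach is to deduplicate with a watermark-bounded `dropDuplicates` and to replace the default continuous micro-batch trigger with `availableNow`, so the query runs as a scheduled batch job (e.g. every 20 minutes, matching the upstream cadence) instead of keeping a cluster running between arrivals. A minimal sketch, assuming hypothetical broker address, topic, checkpoint path, and table names (none of these appear in the original question):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dedup-sketch").getOrCreate()

# Read from Kafka; broker and topic are placeholder values.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

parsed = raw.select(
    F.col("key").cast("string").alias("event_id"),
    F.col("timestamp").alias("event_time"),
    F.col("value").cast("string").alias("payload"),
)

# Watermark bounds the deduplication state so it does not grow forever;
# duplicates sharing the same key and event time within the window are dropped.
deduped = (parsed
           .withWatermark("event_time", "30 minutes")
           .dropDuplicates(["event_id", "event_time"]))

query = (deduped.writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/dedup")  # hypothetical path
         .trigger(availableNow=True)  # process what is available, then stop
         .toTable("bronze_events"))   # hypothetical target Delta table
```

With `trigger(availableNow=True)` the query drains all currently available data and terminates, so an external scheduler (such as a Databricks job running every 20 minutes) can invoke it without paying for an always-on streaming cluster.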