Databricks Certified Data Engineer - Professional


You are required to implement deduplication logic in Spark Structured Streaming. Describe the steps you would take to ensure that duplicate records are identified and removed, including the use of watermarking and the `dropDuplicates` method.


Use the dropDuplicates method without using watermarking.


Use watermarking to limit the state store size and the dropDuplicates method to remove duplicate records.
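
The watermark-plus-`dropDuplicates` approach can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the helper name `dedupe_stream`, the column names `event_id` and `event_time`, and the 10-minute delay are all assumptions chosen for the example. The function expects a streaming PySpark DataFrame and calls only `withWatermark` and `dropDuplicates` on it.

```python
from typing import List


def dedupe_stream(events, key_cols: List[str],
                  event_time_col: str = "event_time",
                  delay: str = "10 minutes"):
    """Deduplicate a streaming DataFrame while keeping state bounded.

    `events` is assumed to be a streaming pyspark.sql.DataFrame that has
    the key columns and an event-time column (names here are hypothetical).
    """
    # Step 1: declare how late data may arrive. Once the watermark
    # (max event time seen minus `delay`) passes a record's timestamp,
    # Spark may discard the deduplication state kept for it.
    watermarked = events.withWatermark(event_time_col, delay)

    # Step 2: drop duplicates on the key columns plus the event-time
    # column. Including the event-time column in the subset is what
    # allows the watermark to bound the state store.
    return watermarked.dropDuplicates(list(key_cols) + [event_time_col])
```

On a real stream this might be called as `dedupe_stream(spark.readStream.table("events"), ["event_id"])`, where the table name is hypothetical. Without the watermark, `dropDuplicates` has to remember every key it has ever seen, so state grows without bound. If duplicates of the same key can carry different event timestamps, Spark 3.5 and later also provide `dropDuplicatesWithinWatermark`, which may be a better fit.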
