
You are required to implement deduplication logic in Spark Structured Streaming. Which approach ensures that duplicate records are identified and removed? Consider the use of watermarking and the dropDuplicates method.
A. Use the dropDuplicates method without using watermarking.
B. Use watermarking to limit the state store size and the dropDuplicates method to remove duplicate records.
C. Ignore duplicate records and focus only on the current data stream.
D. Use a batch query to handle deduplication.
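
The combination described in option B can be illustrated with a minimal pure-Python sketch. This is a toy model of the idea, not Spark's implementation. In Spark itself the pattern is usually written along the lines of `df.withWatermark("eventTime", "10 minutes").dropDuplicates(["id", "eventTime"])`, where the column names and the 10-minute delay are illustrative assumptions. The class `WatermarkedDeduplicator` and its fields are hypothetical names, and the model simplifies by using integer event times and advancing the watermark once per batch.

```python
from dataclasses import dataclass, field


@dataclass
class WatermarkedDeduplicator:
    """Toy model of streaming deduplication bounded by a watermark."""

    delay: int                                # allowed lateness, same units as event_time
    watermark: int = 0                        # records older than this are dropped as late
    seen: set = field(default_factory=set)    # stand-in for the state store

    def process_batch(self, batch):
        """Return records from `batch` that are neither late nor duplicates."""
        emitted = []
        max_event_time = self.watermark + self.delay
        for record_id, event_time in batch:
            max_event_time = max(max_event_time, event_time)
            if event_time < self.watermark:
                continue  # too late: its state may already have been evicted
            key = (record_id, event_time)
            if key in self.seen:
                continue  # duplicate of a record still held in state
            self.seen.add(key)
            emitted.append(key)

        # Advance the watermark after the batch; it never moves backwards.
        self.watermark = max(self.watermark, max_event_time - self.delay)
        # Evict state older than the watermark so the store stays bounded.
        self.seen = {k for k in self.seen if k[1] >= self.watermark}
        return emitted
```

Without the eviction step (the situation in option A), `seen` would grow for as long as the stream runs, because every key ever observed would have to be remembered.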