
Answer-first summary for fast verification
Answer: Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
Amazon Kinesis Data Streams provides at-least-once delivery. Network timeouts or retries during the PutRecord action can cause the same record to be delivered more than once. To achieve exactly-once processing semantics, embed a unique identifier (such as a UUID) in each record on the producer side so the consumer application can detect and discard duplicates during processing.
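A minimal sketch of this dedup pattern, assuming JSON-encoded records (the names `make_record` and `DedupingConsumer` are illustrative; in a real pipeline the producer would pass the encoded bytes to the Kinesis PutRecord API, and the seen-ID set would live in a durable store such as DynamoDB rather than in memory):

```python
import json
import uuid


def make_record(payload: dict) -> bytes:
    """Producer side: embed a unique ID before calling PutRecord."""
    record = {"id": str(uuid.uuid4()), "data": payload}
    return json.dumps(record).encode("utf-8")


class DedupingConsumer:
    """Consumer side: process each unique record ID at most once."""

    def __init__(self):
        self._seen_ids = set()   # in production, use a durable store
        self.processed = []

    def handle(self, raw: bytes) -> bool:
        record = json.loads(raw)
        if record["id"] in self._seen_ids:
            # Duplicate caused by a producer retry; discard it.
            return False
        self._seen_ids.add(record["id"])
        self.processed.append(record["data"])
        return True
```

If a network timeout makes the producer resend the same encoded record, the consumer drops the second copy because its embedded ID has already been seen, so each transaction is processed exactly once.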
Author: Ritesh Yadav
Question 21
A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company's application uses the PutRecord action to send data to Kinesis Data Streams. A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline. Which solution will meet this requirement?
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) data collection application to avoid duplicate processing of events.
C. Design the data source so events are not ingested into Kinesis Data Streams multiple times.
D. Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.