
Explanation:
Amazon Kinesis Data Streams provides at-least-once delivery semantics: a producer retry after a network outage or timeout can write the same record to the stream more than once. Exactly-once delivery therefore cannot be configured on the stream itself. The standard architectural pattern for exactly-once processing across the entire pipeline is to embed a unique primary key or ID in each record at the producer level (in the source) and have the consuming application use that ID to deduplicate records (idempotent processing), which is what option A describes.
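The pattern above can be sketched as follows. This is a minimal, self-contained illustration, not Kinesis API code: the `make_record` producer helper and `IdempotentConsumer` class are hypothetical names, and the in-memory `set` stands in for the durable deduplication store (for example, a DynamoDB table with a conditional write) that a real consumer would use.

```python
import uuid

def make_record(payload):
    # Producer side: embed a unique ID in each record at the source.
    return {"id": str(uuid.uuid4()), "payload": payload}

class IdempotentConsumer:
    """Consumer side: skip any record whose ID was already processed.

    An in-memory set stands in for a durable store here; in production
    the seen-ID check must survive consumer restarts.
    """
    def __init__(self):
        self._seen = set()
        self.processed = []

    def handle(self, record):
        if record["id"] in self._seen:
            return False  # duplicate delivery from a producer retry: drop it
        self._seen.add(record["id"])
        self.processed.append(record["payload"])
        return True

records = [make_record(x) for x in ("txn-1", "txn-2")]
consumer = IdempotentConsumer()
# Simulate at-least-once delivery: the first record arrives twice.
for rec in (records[0], records[0], records[1]):
    consumer.handle(rec)

print(consumer.processed)  # each payload is processed exactly once
```

Because the duplicate carries the same embedded ID as the original, the consumer drops it and each transaction affects downstream state exactly once, regardless of how many times Kinesis delivers the record.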
Question 21
A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company's application uses the PutRecord action to send data to Kinesis Data Streams. A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline. Which solution will meet this requirement?
A
Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
B
Update the checkpoint configuration of the Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) data collection application to avoid duplicate processing of events.
C
Design the data source so events are not ingested into Kinesis Data Streams multiple times.
D
Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.