
Explanation (correct answer: C):
Using AWS Glue with an open data lake format (Apache Hudi, Delta Lake, or Apache Iceberg) is the most operationally efficient way to maintain historical change records (slowly changing dimensions / CDC) in an Amazon S3 data lake. These formats natively support ACID transactions, upserts, and time-travel queries over the full change history, without the complex streaming infrastructure, multiple Lambda functions, or DynamoDB intermediaries that the other options require.
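To make the condition-check logic concrete: in a Glue ETL job against an Iceberg, Hudi, or Delta table this is a single MERGE operation, but the underlying update-or-insert decision can be sketched in plain Python. All record shapes and field names below are illustrative, not part of any AWS API.

```python
# Minimal sketch of the upsert (merge) logic that open table formats such as
# Apache Iceberg, Apache Hudi, and Delta Lake provide natively (e.g. via a
# MERGE INTO statement in a Glue ETL job). Simulated here in plain Python to
# show the condition checks: update when the key exists, insert when it does
# not. Field names ("id", "city") are hypothetical.

def upsert(table: dict, changes: list) -> dict:
    """Apply change records to a table keyed by id, returning the new state."""
    for row in changes:
        key = row["id"]
        if key in table:        # existing row -> update action
            table[key].update(row)
        else:                   # new row -> insert action
            table[key] = dict(row)
    return table

# Current state of the dimension table, keyed by id.
current = {
    1: {"id": 1, "city": "Austin"},
    2: {"id": 2, "city": "Boston"},
}

# Incoming change records: one update (id 1) and one insert (id 3).
changed = [
    {"id": 1, "city": "Dallas"},
    {"id": 3, "city": "Chicago"},
]

merged = upsert(current, changed)
print(merged[1]["city"])  # Dallas (updated)
print(merged[3]["city"])  # Chicago (inserted)
```

The open table format also records each such commit as a snapshot, which is what makes the complete historical change record queryable (time travel) without any extra pipeline components.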
Question 28.
A data engineer needs to create a data pipeline to identify and store data that changes over time in a company's Amazon S3-based data lake. The data engineer plans to use the data to create dashboards and reports. The dashboards and reports will include complete historical change records for the data. Which solution will meet these requirements with the LEAST operational effort?
A
Configure a streaming pipeline that ingests the original data into Amazon DynamoDB. Create DynamoDB streams and ingest the streamed data into Amazon Kinesis Data Streams. Use an Amazon Kinesis Data Firehose delivery stream to capture the data from the Kinesis data stream and to write the data to an Amazon S3 bucket. Create an AWS Glue extract, transform, and load (ETL) job to process the changed data with the original file in the data lake to maintain historical changes.
B
Use an AWS Glue extract, transform, and load (ETL) job to compare the original data with the complete file that is in the Amazon S3 data lake. Perform update actions or insert actions based on the changes.
C
Configure a data pipeline that uses an AWS Glue extract, transform, and load (ETL) job. Use an open data lake format to process the data. Use condition checks in the AWS Glue ETL job to perform update actions or insert actions based on the existing row or new row checks.
D
Compare the original data with the complete file that is in the Amazon S3 data lake by using an AWS Lambda function to identify changes. Use a second Lambda function to process the changes and to perform update actions or insert actions based on the existing row or new row checks in the Amazon S3 data lake.