
Answer-first summary for fast verification
Answer: Leverage Amazon Kinesis Data Streams to capture and process the real-time streaming data, and use Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) to load the data into the data lake and update the data catalog.
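To ingest streaming data into an AWS data lake in real time and keep the metadata in the AWS Glue Data Catalog current, use Amazon Kinesis Data Streams to capture and process the records, and Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) to load them into the data lake, typically Amazon S3. Firehose buffers incoming records briefly and then writes them out, so delivery is near real time, and it integrates with AWS Glue: it can read a table schema from the Data Catalog to convert records to Parquet or ORC before writing. The other options fit this scenario less well. AWS Glue crawlers (A) scan data that is already at rest, on a schedule or on demand, rather than ingesting a stream. A Lambda function that only calls the AWS Glue API (B) updates metadata but does not by itself land the data in the lake. Glue jobs (C) are most often used for batch ETL and may not be the best fit for a real-time streaming source.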
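As a rough illustration of the Streams-to-Firehose path described above, the sketch below builds the request for the Firehose `CreateDeliveryStream` API with a Kinesis data stream as the source, S3 as the destination, and a Glue Data Catalog table supplying the schema for JSON-to-Parquet conversion. It is one possible configuration under stated assumptions, not a definitive setup: the function name `build_delivery_stream_request`, the `events/dt=...` prefix layout, the buffering values, and all ARNs and names passed in are hypothetical placeholders.

```python
def build_delivery_stream_request(
    stream_name,
    kinesis_stream_arn,
    bucket_arn,
    role_arn,
    glue_database,
    glue_table,
    region,
):
    """Build keyword arguments for the Firehose CreateDeliveryStream API.

    All argument values are placeholders supplied by the caller; the S3
    prefix layout and buffering hints below are illustrative assumptions.
    """
    return {
        "DeliveryStreamName": stream_name,
        # Read from an existing Kinesis data stream instead of direct PUTs.
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": kinesis_stream_arn,
            "RoleARN": role_arn,
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            # Date-based prefix (assumed layout) using Firehose prefix expressions.
            "Prefix": "events/dt=!{timestamp:yyyy-MM-dd}/",
            "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
            # Firehose flushes when either threshold is reached first.
            "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 60},
            "DataFormatConversionConfiguration": {
                "Enabled": True,
                # The schema is read from this Glue Data Catalog table.
                "SchemaConfiguration": {
                    "RoleARN": role_arn,
                    "DatabaseName": glue_database,
                    "TableName": glue_table,
                    "Region": region,
                },
                "InputFormatConfiguration": {
                    "Deserializer": {"OpenXJsonSerDe": {}}
                },
                "OutputFormatConfiguration": {
                    "Serializer": {"ParquetSerDe": {}}
                },
            },
        },
    }


# Example with placeholder identifiers (hypothetical account and names).
request = build_delivery_stream_request(
    stream_name="clickstream-to-lake",
    kinesis_stream_arn="arn:aws:kinesis:us-east-1:111122223333:stream/clickstream",
    bucket_arn="arn:aws:s3:::example-data-lake",
    role_arn="arn:aws:iam::111122223333:role/example-firehose-role",
    glue_database="lake_db",
    glue_table="clickstream_events",
    region="us-east-1",
)
```

With credentials and the referenced resources in place, the dictionary could be passed to boto3 as `boto3.client("firehose").create_delivery_stream(**request)`. In a setup like this, Firehose reads the table schema from the Data Catalog; new `dt=` partitions would typically still need to be registered separately, for example by a crawler or the Glue API.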
Author: LeetQuiz Editorial Team
You are responsible for designing a data pipeline that ingests data from multiple sources into an AWS data lake. One of the sources is a real-time streaming data source. How can you ensure that the data is ingested into the data lake in real time and that the metadata is updated in the AWS Glue Data Catalog?
A
Use AWS Glue crawlers to periodically scan the streaming data source and update the data catalog.
B
Implement an AWS Lambda function that triggers on new data in the streaming data source and updates the data catalog using the AWS Glue API.
C
Create an AWS Glue job that continuously monitors the streaming data source and ingests data into the data lake, updating the data catalog as needed.
D
Leverage Amazon Kinesis Data Streams to capture and process the real-time streaming data, and use Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) to load the data into the data lake and update the data catalog.