
Answer-first summary for fast verification
Answer: Use Apache Spark Streaming to create a real-time ETL pipeline, with appropriate data sources, transformations, and sinks to handle the data efficiently.
Option B is the correct answer. Apache Spark Streaming (and its successor, Structured Streaming) can build a real-time ETL pipeline that handles the high velocity and volume of IoT device data. The pipeline should ingest from a streaming source such as Kafka or Kinesis, apply transformations to process and analyze the data, and write to sinks that store or visualize the results. Key design considerations are fault tolerance (e.g., checkpointing), scalability, and low latency. Batch processing, a traditional database, or processing only a subset of the data would not meet the real-time processing requirements.
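The source, transform, and sink stages described above can be sketched with PySpark's Structured Streaming API. This is a minimal illustration, not a production pipeline: the broker address, topic name, JSON schema, alert threshold, and output paths are all assumptions made for the example. The alerting rule is kept as a pure function so it can be tested without a Spark installation.

```python
# Sketch of a streaming ETL job for IoT readings, assuming a Kafka topic
# "iot-readings" carrying JSON payloads like {"device_id": ..., "temp_c": ...}.
# All names and thresholds below are illustrative assumptions.

ALERT_THRESHOLD_C = 85.0  # hypothetical over-temperature threshold


def is_alert(temp_c: float, threshold: float = ALERT_THRESHOLD_C) -> bool:
    """Pure alerting rule, separated out so it is unit-testable."""
    return temp_c >= threshold


def main() -> None:
    # PySpark imports live inside main() so the module can be imported
    # (and is_alert tested) on machines without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import (DoubleType, StringType, StructField,
                                   StructType, TimestampType)

    spark = SparkSession.builder.appName("iot-streaming-etl").getOrCreate()

    schema = StructType([
        StructField("device_id", StringType()),
        StructField("temp_c", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Source: ingest from Kafka (broker/topic are assumed names).
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "iot-readings")
           .load())

    # Transform: parse the JSON payload and flag over-threshold readings.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("r"))
              .select("r.*")
              .withColumn("alert", F.col("temp_c") >= F.lit(ALERT_THRESHOLD_C)))

    # Sink: append to Parquet; the checkpoint location is what gives the
    # query fault tolerance (exactly-once sink semantics on restart).
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/data/iot/alerts")
             .option("checkpointLocation", "/data/iot/_checkpoints")
             .outputMode("append")
             .start())
    query.awaitTermination()


if __name__ == "__main__":
    main()
```

Note the `checkpointLocation` option: it addresses the fault-tolerance consideration by letting the query resume from its last committed offsets after a failure, while scaling out the Kafka topic's partitions addresses throughput.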
Author: LeetQuiz Editorial Team
Your company has a large dataset of IoT device data that needs to be processed in real-time for monitoring and alerting purposes. Describe how you would use Apache Spark to create a streaming ETL pipeline that can handle the high velocity and volume of data, and explain the considerations involved in designing such a pipeline.
A
Use Apache Spark's batch processing capabilities to process the data at regular intervals, as real-time processing is not required.
B
Use Apache Spark Streaming to create a real-time ETL pipeline, with appropriate data sources, transformations, and sinks to handle the data efficiently.
C
Use a traditional database system to store and process the data, as it can handle high velocity and volume more effectively than Apache Spark.
D
Only process a subset of the data to reduce the volume and velocity, as real-time processing of the entire dataset is not feasible.