
Your company has a large dataset of IoT device data that needs to be processed in real time for monitoring and alerting purposes. Describe how you would use Apache Spark to create a streaming ETL pipeline that can handle the high velocity and volume of data, and explain the considerations involved in designing such a pipeline.
A. Use Apache Spark's batch processing capabilities to process the data at regular intervals, as real-time processing is not required.
B. Use Apache Spark Streaming to create a real-time ETL pipeline, with appropriate data sources, transformations, and sinks to handle the data efficiently (see the sketch after the options).
C. Use a traditional database system to store and process the data, as it can handle high velocity and volume more effectively than Apache Spark.
D. Only process a subset of the data to reduce the volume and velocity, as real-time processing of the entire dataset is not feasible.
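
Option B describes a Spark streaming approach. Below is a minimal PySpark sketch of what such a pipeline could look like using Structured Streaming, assuming a Kafka source; the broker address, topic name, message schema, window size, and alert threshold are all hypothetical, and the Kafka connector package is assumed to be on the classpath. It is an illustration of the pattern (source, transformations, sink), not a production implementation.

# Minimal Structured Streaming ETL sketch for IoT readings (assumptions noted above).
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window, avg
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("iot-streaming-etl").getOrCreate()

# Hypothetical schema for incoming IoT readings.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Source: read raw events from a Kafka topic (broker and topic names are assumptions).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "iot-events")
       .load())

# Transform: parse JSON payloads, then compute per-device windowed averages,
# using a watermark to bound state when events arrive late.
parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))
alerts = (parsed
          .withWatermark("event_time", "1 minute")
          .groupBy(window(col("event_time"), "30 seconds"), col("device_id"))
          .agg(avg("temperature").alias("avg_temp"))
          .where(col("avg_temp") > 80.0))  # hypothetical alert threshold

# Sink: a console sink keeps the sketch self-contained; a real pipeline would
# typically write to Kafka, a database, or an alerting system instead.
query = (alerts.writeStream
         .outputMode("update")
         .format("console")
         .option("truncate", "false")
         .start())

query.awaitTermination()

The design considerations the question asks about show up directly in this structure: choosing a source that can absorb the ingest rate, handling late or out-of-order events with watermarks, bounding aggregation state, enabling checkpointing for fault tolerance, and selecting an output mode and sink that match the downstream monitoring and alerting requirements.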