
You are working on a data processing project that analyzes real-time streaming data from IoT devices. The data is a time series with high velocity and variability. Describe how you would use Apache Spark to build an ETL pipeline for this use case, and explain the considerations involved in handling time-series data.
A
Use Apache Spark's batch processing capabilities to process the time-series data at regular intervals, as real-time processing is not required.
B
Use Apache Spark Streaming to create a real-time ETL pipeline, with appropriate data sources, transformations, and sinks to handle the time-series data efficiently, considering time-window operations and data aggregation.
C
Use a traditional database system to store and process the time-series data, as it can handle high velocity and variability more effectively than Apache Spark.
D
Only process a subset of the time-series data to reduce the velocity and variability, as real-time processing of the entire dataset is not feasible.
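
Option B mentions time-window operations and data aggregation. As a rough illustration of what those terms mean for IoT time-series data, here is a minimal plain-Python sketch of a tumbling-window average with a lateness threshold. It is not Spark code: in Spark Structured Streaming the comparable building blocks are the `window()` function and `withWatermark()`. The function name `tumbling_window_avg`, the `(device_id, event_time_s, value)` tuple layout, and the default window and lateness values are all assumptions made for this example.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_s=60, max_lateness_s=30):
    """Average readings per device over fixed, non-overlapping time windows.

    events: iterable of (device_id, event_time_s, value) in arrival order
            (hypothetical record layout, chosen for this sketch).
    Returns a dict mapping (device_id, window_start_s) to the mean value.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (device, window) -> [total, count]
    max_event_time = float("-inf")

    for device_id, ts, value in events:
        # The watermark trails the newest event time seen so far.
        # Simplification: it is updated per event here, not per micro-batch.
        max_event_time = max(max_event_time, ts)
        watermark = max_event_time - max_lateness_s

        # Assign the event to the window that contains its event time.
        window_start = (ts // window_s) * window_s

        # Drop events whose window ended at or before the watermark.
        if window_start + window_s <= watermark:
            continue

        acc = sums[(device_id, window_start)]
        acc[0] += value
        acc[1] += 1

    return {key: total / count for key, (total, count) in sums.items()}
```

Calling `tumbling_window_avg` on a handful of readings shows the two considerations together: events are grouped by their own timestamps rather than by arrival order, and a reading that arrives long after its window has closed is discarded instead of reopening old results.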