
You are tasked with designing an ETL pipeline for a large e-commerce company that needs to process both structured and unstructured data from various sources. The data includes customer transactions, product reviews, and social media posts. Describe the steps you would take to create an ETL pipeline that can handle the volume, velocity, and variety of this data, and explain how you would use Apache Spark to process the data efficiently.
A. Use a single-stage ETL process to load all data into a data warehouse and then perform transformations and analysis.
B. Create a multi-stage ETL pipeline with intermediate data staging locations to handle the different data types and sources, and use Apache Spark to process the data in a distributed manner (see the sketch below).
C. Only process structured data and ignore unstructured data due to the complexity of handling different data types.
D. Use a traditional batch processing approach to handle the data, as it is more cost-effective than using a distributed computing framework like Apache Spark.
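
To illustrate the multi-stage approach described in option B, here is a minimal PySpark sketch with intermediate staging between the extract, transform, and load stages. The storage paths and column names (transaction_id, order_ts, review_text, post_text, product_id) are assumptions made for illustration only, not details given in the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("ecommerce-etl-sketch")
    .getOrCreate()
)

# Stage 1: Extract -- structured transactions plus semi/unstructured
# reviews and social posts. Paths are hypothetical placeholders.
transactions = spark.read.parquet("s3://bucket/raw/transactions/")
reviews = spark.read.json("s3://bucket/raw/reviews/")
social_posts = spark.read.json("s3://bucket/raw/social/")

# Stage 2: Transform -- clean and normalize each source separately, then
# write to an intermediate staging location so each data type can be
# reprocessed or validated independently of the others.
clean_tx = (
    transactions
    .dropDuplicates(["transaction_id"])            # assumed key column
    .withColumn("order_date", F.to_date("order_ts"))
)
clean_tx.write.mode("overwrite").parquet("s3://bucket/staging/transactions/")

clean_reviews = (
    reviews
    .filter(F.col("review_text").isNotNull())       # assumed column
    .withColumn("review_len", F.length("review_text"))
)
clean_reviews.write.mode("overwrite").parquet("s3://bucket/staging/reviews/")

clean_social = social_posts.filter(F.col("post_text").isNotNull())
clean_social.write.mode("overwrite").parquet("s3://bucket/staging/social/")

# Stage 3: Load -- join staged datasets and publish curated tables,
# partitioned so downstream queries only scan the partitions they need.
curated = (
    spark.read.parquet("s3://bucket/staging/transactions/")
    .join(spark.read.parquet("s3://bucket/staging/reviews/"),
          on="product_id", how="left")              # assumed join key
)
curated.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://bucket/curated/transactions_with_reviews/"
)
```

Because each stage reads from and writes to its own staging location, the structured and unstructured sources can be processed on different schedules, and Spark distributes the transformations and joins across the cluster to handle the volume and variety of the data.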