
Answer-first summary for fast verification
Answer: Create a multi-stage ETL pipeline with intermediate data staging locations to handle the different data types and sources, and perform transformations at each stage.
Option B is correct. A multi-stage ETL pipeline with intermediate data staging locations is needed to handle the mix of structured and unstructured data arriving from multiple sources at high velocity and volume. Staging locations let each stage persist its output, so data can be validated, transformed, and reprocessed incrementally, which makes the pipeline easier to manage, debug, and optimize. The alternatives fall short: a single-stage process (option A) pushes all transformation work into the data lake with no checkpoints, ignoring unstructured data (option C) discards the text needed for sentiment analysis, and traditional batch processing (option D) cannot keep pace with the data's velocity and volume.
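The multi-stage design can be illustrated with a minimal sketch. This is a simplified, hypothetical example (not a production implementation): temporary directories stand in for the intermediate staging locations, and the extract, transform, and load stages each read from and write to a staging area. The function names, file layout, and hashtag-based "analysis" are all illustrative assumptions.

```python
import json
import re
from pathlib import Path
from tempfile import TemporaryDirectory

def extract(posts, raw_dir: Path):
    """Stage 1: land raw posts (structured fields plus unstructured text)
    unchanged into the raw staging location."""
    path = raw_dir / "posts.jsonl"
    with path.open("w") as f:
        for post in posts:
            f.write(json.dumps(post) + "\n")
    return path

def transform(raw_path: Path, clean_dir: Path):
    """Stage 2: derive structured fields from the unstructured text and
    write the result to the cleaned staging location."""
    out = clean_dir / "posts_clean.jsonl"
    with raw_path.open() as src, out.open("w") as dst:
        for line in src:
            post = json.loads(line)
            text = post.get("text", "")
            post["hashtags"] = re.findall(r"#(\w+)", text)
            post["word_count"] = len(text.split())
            dst.write(json.dumps(post) + "\n")
    return out

def load(clean_path: Path):
    """Stage 3: aggregate the cleaned records for trend analysis."""
    tag_counts = {}
    with clean_path.open() as f:
        for line in f:
            for tag in json.loads(line)["hashtags"]:
                tag_counts[tag] = tag_counts.get(tag, 0) + 1
    return tag_counts

with TemporaryDirectory() as tmp:
    # Two intermediate staging locations: raw landing zone and cleaned zone.
    raw_dir = Path(tmp) / "staging" / "raw"
    clean_dir = Path(tmp) / "staging" / "clean"
    raw_dir.mkdir(parents=True)
    clean_dir.mkdir(parents=True)

    posts = [
        {"id": 1, "text": "Loving the new release! #launch #tech"},
        {"id": 2, "text": "Mixed feelings about #launch today"},
    ]
    raw = extract(posts, raw_dir)
    clean = transform(raw, clean_dir)
    counts = load(clean)
    print(counts)  # {'launch': 2, 'tech': 1}
```

Because each stage's output is persisted before the next stage runs, a failed or updated transformation can be rerun from the staging data rather than re-extracting from the source; at production scale the same structure maps onto a distributed framework such as Apache Spark with object storage as the staging layer.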
Author: LeetQuiz Editorial Team
You are working on a data processing project that involves analyzing social media posts to identify trends and sentiment. The data includes a mix of structured and unstructured data, with high velocity and volume. Describe how you would design an ETL pipeline to handle this data, and explain the role of intermediate data staging locations in the pipeline.
A
Use a single-stage ETL process to load the data directly into a data lake and perform all transformations and analysis there.
B
Create a multi-stage ETL pipeline with intermediate data staging locations to handle the different data types and sources, and perform transformations at each stage.
C
Only process structured data and ignore unstructured data due to the complexity of handling different data types.
D
Use a traditional batch processing approach to handle the data, as it is more cost-effective than using a distributed computing framework like Apache Spark.