
**Answer: D. Python using MapReduce**
## Explanation

Python using MapReduce is the correct choice for this scenario because:

- **Checkpointing and pipeline-splitting requirements**: Python offers the flexibility needed to implement complex pipeline logic, including checkpointing mechanisms and dynamic pipeline splitting.
- **Development efficiency**: Python is generally faster to write and easier to maintain than Java.
- **ETL pipeline flexibility**: Python supports sophisticated data-transformation logic and integrates well with a variety of data-processing patterns.
- **Apache Hadoop compatibility**: Python can run on Hadoop through frameworks such as Hadoop Streaming or PySpark.
- **Pig and Hive limitations**: While PigLatin and HiveQL work well for specific use cases, they are less flexible for implementing custom checkpointing and complex pipeline-splitting logic.

Python provides the balance of performance, flexibility, and development efficiency needed for ETL pipelines with checkpointing and splitting requirements on a Hadoop cluster.
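To make the Hadoop Streaming point concrete, here is a minimal sketch of the mapper/reducer style that Hadoop Streaming expects from a Python job: plain functions reading and emitting tab-separated key/value records. The word-count logic and function names (`map_lines`, `reduce_pairs`) are illustrative, not part of the question.

```python
import sys
from itertools import groupby

# Hadoop Streaming runs the mapper and reducer as separate processes,
# piping tab-separated key/value lines between them via stdin/stdout.

def map_lines(lines):
    """Mapper: emit (word, 1) pairs for a word-count-style ETL step."""
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_pairs(pairs):
    """Reducer: Hadoop delivers pairs sorted by key; sum the counts."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield (key, sum(count for _, count in group))

if __name__ == "__main__":
    # Stand-alone demo: in a real job, Hadoop performs the shuffle/sort
    # between the map and reduce phases instead of this sorted() call.
    mapped = sorted(map_lines(sys.stdin))
    for word, total in reduce_pairs(mapped):
        print(f"{word}\t{total}")
```

In an actual cluster run, the mapper and reducer would be split into two scripts and passed to the `hadoop jar hadoop-streaming.jar` command via its `-mapper` and `-reducer` options; custom checkpointing logic (e.g., writing intermediate results to HDFS between chained jobs) is layered on top in the driving Python code.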
Author: LeetQuiz.
NO.17 You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster. The pipeline will require some checkpointing and splitting pipelines. Which method should you use to write the pipelines?
A. PigLatin using Pig
B. HiveQL using Hive
C. Java using MapReduce
D. Python using MapReduce