
Answer-first summary for fast verification
Answer: PigLatin using Pig
The correct answer is A. Pig Latin, executed with Apache Pig, is the best choice for writing ETL pipelines on an Apache Hadoop cluster when checkpointing and pipeline splitting are required. Pig Latin is a data-flow scripting language designed specifically for such tasks: intermediate relations can be persisted with STORE to act as checkpoints, and the built-in SPLIT operator partitions a single data flow into logically distinct branches. The other options fall short on these requirements: Java or Python with raw MapReduce offers low-level control but demands far more code and provides no built-in abstraction for checkpointing or splitting, while HiveQL is oriented toward SQL-style querying rather than multi-stage data-flow pipelines.
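As a minimal sketch of how these two features look in practice, the Pig Latin script below checkpoints a cleaned relation and then splits the flow into two branches. The input path, field names, and output directories are hypothetical; SPLIT, FILTER, and STORE are standard Pig Latin operators.

```
-- Load raw events (hypothetical path and schema)
raw = LOAD 'input/events.csv' USING PigStorage(',')
      AS (user:chararray, action:chararray, amount:double);

-- Clean the data, then checkpoint: persisting the intermediate
-- relation lets a failed downstream stage resume from here
cleaned = FILTER raw BY action IS NOT NULL;
STORE cleaned INTO 'checkpoints/cleaned_events';

-- Split the pipeline into two logically distinct segments
SPLIT cleaned INTO purchases IF action == 'purchase',
                   views     IF action == 'view';

STORE purchases INTO 'output/purchases';
STORE views INTO 'output/views';
```

Expressing the same branching and checkpointing logic in raw MapReduce would require writing and chaining multiple jobs by hand, which illustrates why Pig is the better fit here.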
Author: LeetQuiz Editorial Team
You are tasked with developing ETL (Extract, Transform, Load) pipelines for your company's data processing needs, specifically to be executed on an Apache Hadoop cluster. These pipelines need to support both checkpointing mechanisms, which allow the system to resume from a certain point in case of failures, and the ability to split pipelines into more manageable and logically distinct segments. Given these requirements, which method would be most suitable for writing the ETL pipelines?
A
PigLatin using Pig
B
HiveQL using Hive
C
Java using MapReduce
D
Python using MapReduce