
Answer-first summary for fast verification
Answer: PigLatin using Pig
The correct answer is A. Pig Latin, executed with Apache Pig, is the best choice for writing ETL pipelines on an Apache Hadoop cluster when checkpointing and pipeline splitting are required. Pig Latin is a data-flow scripting language designed specifically for such tasks: intermediate relations can be persisted with STORE to act as checkpoints, and the built-in SPLIT operator partitions a single data flow into logically distinct branches. The other options fall short on these requirements: Java or Python with raw MapReduce offers low-level control but demands far more code and provides no built-in abstraction for checkpointing or splitting, while HiveQL is oriented toward SQL-style querying rather than multi-stage data-flow pipelines.
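As a minimal sketch of how these two features look in practice, the Pig Latin script below checkpoints a cleaned relation and then splits the flow into two branches. The input path, field names, and output directories are hypothetical; SPLIT, FILTER, and STORE are standard Pig Latin operators.

```
-- Load raw events (hypothetical path and schema)
raw = LOAD 'input/events.csv' USING PigStorage(',')
      AS (user:chararray, action:chararray, amount:double);

-- Clean the data, then checkpoint: persisting the intermediate
-- relation lets a failed downstream stage resume from here
cleaned = FILTER raw BY action IS NOT NULL;
STORE cleaned INTO 'checkpoints/cleaned_events';

-- Split the pipeline into two logically distinct segments
SPLIT cleaned INTO purchases IF action == 'purchase',
                   views     IF action == 'view';

STORE purchases INTO 'output/purchases';
STORE views INTO 'output/views';
```

Expressing the same branching and checkpointing logic in raw MapReduce would require writing and chaining multiple jobs by hand, which illustrates why Pig is the better fit here.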
Author: LeetQuiz Editorial Team
You are tasked with developing ETL (Extract, Transform, Load) pipelines for your company's data processing needs, specifically to be executed on an Apache Hadoop cluster. These pipelines need to support both checkpointing mechanisms, which allow the system to resume from a certain point in case of failures, and the ability to split pipelines into more manageable and logically distinct segments. Given these requirements, which method would be most suitable for writing the ETL pipelines?
A
PigLatin using Pig
B
HiveQL using Hive
C
Java using MapReduce
D
Python using MapReduce