Google Professional Data Engineer

You are tasked with creating a cloud-native historical data processing system with the following requirements in mind:

The data to be processed exists in CSV, Avro, and PDF formats.

This data will be accessed by various analysis tools such as Dataproc, BigQuery, and Compute Engine.

A batch pipeline will be used to move data on a daily basis.

Performance considerations are not a priority for this solution.

The design should prioritize maximum availability.

How would you structure the data storage to meet these needs?

Create a Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.

6.4%

Store the data in BigQuery. Access the data using the BigQuery Connector on Dataproc and Compute Engine.

10.6%
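
One way to reason about the BigQuery option is to check the required formats against what a BigQuery load job can ingest. The sketch below is illustrative only: the set is a partial list of load formats, and the helper name `unloadable_formats` is hypothetical, not part of any Google Cloud library.

```python
# Partial list of source formats that BigQuery load jobs accept (assumption:
# not exhaustive, but PDF is not a supported load format).
BIGQUERY_LOAD_FORMATS = {"CSV", "AVRO", "PARQUET", "ORC", "NEWLINE_DELIMITED_JSON"}


def unloadable_formats(formats):
    """Return the formats in `formats` that are not in the load-format set above."""
    return sorted(f.upper() for f in formats if f.upper() not in BIGQUERY_LOAD_FORMATS)


# Formats named in the question's requirements.
required = ["CSV", "Avro", "PDF"]
print(unloadable_formats(required))
```

Under these assumptions the check flags PDF, which suggests that a storage layer limited to tabular load formats would not cover every file type named in the requirements.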
