
Answer-first summary for fast verification
Answer: Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.
The correct answer is D. BigQuery is a fully managed, serverless data warehouse with standard SQL support, making it well suited to structured-data transformations. Given the requirements (faster pipeline runs and a serverless tool with SQL syntax), loading the data into BigQuery and performing the transformations there is the most efficient solution. Option A (Data Fusion) is GUI-driven rather than SQL-based and requires more management than BigQuery. Option B keeps the pipeline on Dataproc, which requires cluster provisioning and is not serverless. Option C routes data through Cloud SQL, which is likewise not serverless and adds an unnecessary federated-query hop compared with loading directly into BigQuery.
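As a sketch of option D's load-then-transform flow (all dataset, table, bucket, and column names here are hypothetical, not from the question):

```sql
-- Assumes the raw data was first loaded from Cloud Storage, e.g. with the bq CLI:
--   bq load --source_format=PARQUET mydataset.raw_events "gs://my-bucket/raw/*.parquet"
--
-- A PySpark transformation such as:
--   df.groupBy("user_id").agg(F.avg("purchase_amount").alias("avg_spend"))
-- can then be rewritten as BigQuery SQL that writes its result to a new table:
CREATE OR REPLACE TABLE mydataset.features AS
SELECT
  user_id,
  AVG(purchase_amount) AS avg_spend,
  COUNT(*) AS purchase_count
FROM mydataset.raw_events
GROUP BY user_id;
```

The `CREATE OR REPLACE TABLE ... AS SELECT` statement materializes the transformed data into a new table in one serverless step, replacing the PySpark job entirely.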
Author: LeetQuiz Editorial Team
You are working on rebuilding your machine learning (ML) pipeline for structured data using Google Cloud. Currently, you utilize PySpark to perform data transformations at scale. However, these pipelines are becoming inefficient, taking over 12 hours to run. To enhance both development speed and pipeline execution time, you prefer to adopt a serverless tool that supports SQL syntax. Additionally, you have already migrated your raw data into Google Cloud Storage. Considering these requirements and constraints, how should you design the new ML pipeline on Google Cloud to optimize both speed and processing efficiency?
A
Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.
B
Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
C
Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
D
Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.