
Answer-first summary for fast verification
Answer: Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.
The correct answer is D. BigQuery is a fully managed, serverless data warehouse with standard SQL support, making it well suited to structured-data transformations. Given the requirements (faster pipeline runs and a serverless tool with SQL syntax), loading the data into BigQuery and performing the transformations there is the most efficient solution. Option A (Data Fusion) is GUI-driven rather than SQL-based and requires more management than BigQuery. Option B keeps the pipeline on Dataproc, which requires cluster provisioning and is not serverless. Option C routes data through Cloud SQL, which is likewise not serverless and adds an unnecessary federated-query hop compared with loading directly into BigQuery.
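As a sketch of option D's load-then-transform flow (all dataset, table, bucket, and column names here are hypothetical, not from the question):

```sql
-- Assumes the raw data was first loaded from Cloud Storage, e.g. with the bq CLI:
--   bq load --source_format=PARQUET mydataset.raw_events "gs://my-bucket/raw/*.parquet"
--
-- A PySpark transformation such as:
--   df.groupBy("user_id").agg(F.avg("purchase_amount").alias("avg_spend"))
-- can then be rewritten as BigQuery SQL that writes its result to a new table:
CREATE OR REPLACE TABLE mydataset.features AS
SELECT
  user_id,
  AVG(purchase_amount) AS avg_spend,
  COUNT(*) AS purchase_count
FROM mydataset.raw_events
GROUP BY user_id;
```

The `CREATE OR REPLACE TABLE ... AS SELECT` statement materializes the transformed data into a new table in one serverless step, replacing the PySpark job entirely.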
Author: LeetQuiz Editorial Team
You are working on rebuilding your machine learning (ML) pipeline for structured data using Google Cloud. Currently, you utilize PySpark to perform data transformations at scale. However, these pipelines are becoming inefficient, taking over 12 hours to run. To enhance both development speed and pipeline execution time, you prefer to adopt a serverless tool that supports SQL syntax. Additionally, you have already migrated your raw data into Google Cloud Storage. Considering these requirements and constraints, how should you design the new ML pipeline on Google Cloud to optimize both speed and processing efficiency?
A
Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.
B
Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
C
Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
D
Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.