Google Professional Machine Learning Engineer

Get started today

Ultimate access to all questions.

You are a Machine Learning Engineer at a retail company looking to optimize the ML pipeline for processing structured sales data on Google Cloud. The current pipeline uses PySpark for data transformations after moving raw data into Cloud Storage, but it's taking over 12 hours to run, which is not meeting the business's need for near-real-time analytics. The company requires a serverless solution that leverages SQL syntax for ease of use and efficiency. Additionally, the solution must be cost-effective, scalable, and minimize operational overhead. Which of the following approaches would best meet these requirements? (Choose one correct option)

Real Exam

Design the transformation pipelines using Data Fusion‘s GUI for a no-code solution, then store the processed data in BigQuery for analytics.

7.1%

Use BigQuery Load to transfer your data into BigQuery, convert your PySpark commands into BigQuery SQL queries for transformations, and save the results to a new table.

Comments

Loading comments...