
Answer-first summary for fast verification
Answer: Enable caching for the pipeline job, and disable caching for the model training step.
The correct answer is D. Enabling caching for the pipeline job lets subsequent executions reuse the cached output of any step whose inputs and code have not changed, so the hour-long preprocessing of the 10 TB dataset runs only once; later runs read the cached result from Cloud Storage, which significantly reduces both execution time and cost. Disabling caching for the model training step ensures that training always reruns, picking up the updated code for each algorithm you test. This approach requires no structural changes to the existing pipeline while optimizing both execution time and cost.
Author: LeetQuiz Editorial Team
You have created a Vertex AI pipeline comprising two steps: preprocessing and model training. The first step preprocesses 10 TB of data in about 1 hour and saves the result in a Cloud Storage bucket. The second step uses this processed data to train a machine learning model. Now, you need to update the model’s code to test different machine learning algorithms. Your goal is to reduce the pipeline’s execution time and cost, while minimizing changes to the existing pipeline. What should you do?
A. Add a pipeline parameter and an additional pipeline step. Depending on the parameter value, the pipeline step conducts or skips data preprocessing, and starts model training.
B. Create another pipeline without the preprocessing step, and hardcode the preprocessed Cloud Storage file location for model training.
C. Configure a machine with more CPU and RAM from the compute-optimized machine family for the data preprocessing step.
D. Enable caching for the pipeline job, and disable caching for the model training step.
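The accepted answer (D) can be sketched with the Kubeflow Pipelines SDK and the Vertex AI SDK. This is an illustrative outline, not the question's actual pipeline: the component bodies, pipeline name, and `gs://my-bucket/...` path are assumptions. The two relevant calls are `set_caching_options(False)` on the training task and `enable_caching=True` on the `PipelineJob`.

```python
from kfp import dsl, compiler
from google.cloud import aiplatform


@dsl.component
def preprocess(output_path: str):
    # Placeholder: preprocesses ~10 TB of data (~1 hour) and writes the
    # result to the given Cloud Storage path.
    pass


@dsl.component
def train(input_path: str, algorithm: str):
    # Placeholder: trains a model on the preprocessed data using the
    # chosen algorithm.
    pass


@dsl.pipeline(name="preprocess-and-train")
def pipeline(algorithm: str = "xgboost"):
    # Hypothetical bucket path; replace with your own.
    prep_task = preprocess(output_path="gs://my-bucket/processed/")
    train_task = train(
        input_path="gs://my-bucket/processed/",
        algorithm=algorithm,
    )
    train_task.after(prep_task)
    # Disable caching for the training step only, so it always reruns
    # with the updated model code.
    train_task.set_caching_options(False)


compiler.Compiler().compile(pipeline, "pipeline.json")

# Enable caching at the job level: the unchanged preprocessing step is
# served from cache instead of rerunning for an hour over 10 TB.
job = aiplatform.PipelineJob(
    display_name="preprocess-and-train",
    template_path="pipeline.json",
    enable_caching=True,
)
job.submit()
```

Because caching keys on each step's inputs and component spec, only the edited training step is invalidated on each run; the preprocessing step's cached Cloud Storage output is reused automatically, which is why option D needs no pipeline restructuring.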