
Answer-first summary for fast verification
Answer: Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
The correct answer is B. Replacing the single NVIDIA P100 GPU with a v3-32 TPU (a Cloud TPU v3 Pod slice with 32 cores) can dramatically shorten training. TPUs are accelerators designed by Google specifically for machine learning workloads, and a v3-32 slice delivers far more compute throughput than one P100, so it handles the large dataset and heavy computation much more efficiently. This change reduces training time without sacrificing model performance. Option A (more memory and a larger batch size) does not address the core compute bottleneck, which is the single GPU. Option C (early stopping) helps prevent overfitting and may end training sooner, but it does not speed up the computation itself. Option D (the tf.distribute.Strategy API for distributed training) adds complexity and is better suited to setups that already have multiple GPUs, which is not the case here.
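As an illustration of what option B looks like in practice, here is a minimal sketch of a Vertex AI custom training job's worker pool spec requesting a v3-32 TPU slice. Field names follow the Vertex AI CustomJob API; the project and image URI are hypothetical placeholders.

```python
# Hypothetical sketch: requesting a TPU v3-32 slice for a Vertex AI
# custom training job. Field names follow the Vertex AI CustomJob API;
# the image URI below is a placeholder, not a real artifact.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "cloud-tpu",   # TPU host machine type
            "accelerator_type": "TPU_V3",  # Cloud TPU v3
            "accelerator_count": 32,       # v3-32 slice: 32 TPU cores
        },
        "replica_count": 1,
        "container_spec": {
            # Hypothetical training container image
            "image_uri": "gcr.io/my-project/xray-trainer:latest",
        },
    }
]
```

This spec would be passed to a CustomJob in place of the original GPU machine spec (e.g. an `n1` machine with `NVIDIA_TESLA_P100`), swapping the single-GPU bottleneck for a 32-core TPU slice.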
Author: LeetQuiz Editorial Team
You are working on a machine learning project that involves training an object detection model on a large dataset comprising three million X-ray images, each approximately 2 GB in size. To train the model, you are utilizing Google Cloud's Vertex AI Training service with a custom training application running on a Compute Engine instance. The instance is equipped with 32 CPU cores, 128 GB of RAM, and a single NVIDIA P100 GPU. Despite this setup, the model training process is taking an excessively long time. Your goal is to reduce the training duration without compromising the model's performance. What action should you take?
A
Increase the instance memory to 512 GB and increase the batch size.
B
Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
C
Enable early stopping in your Vertex AI Training job.
D
Use the tf.distribute.Strategy API and run a distributed training job.