
Answer-first summary for fast verification
Answer: Configure the machines of the first two worker pools to have GPUs and to use a container image where your training code runs. Configure the third worker pool to use the reductionserver container image without accelerators, and choose a machine type that prioritizes bandwidth.
Option B is the correct answer. When using the Reduction Server strategy on Vertex AI for distributed training, the first two worker pools should be configured with GPUs so the training code gets hardware acceleration. The third worker pool, which runs the Reduction Server replicas, should use the reductionserver container image without accelerators, and it is important to choose a machine type that prioritizes network bandwidth. The Reduction Server only aggregates gradients and relays them between workers; that work is communication-bound, so it gains nothing from GPUs or TPUs but benefits directly from high network bandwidth.
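A minimal sketch of what this three-pool layout looks like as `worker_pool_specs` for a Vertex AI CustomJob. The project/image paths, machine types, accelerator counts, and replica counts below are illustrative assumptions, not values from the question; consult the current Vertex AI documentation for the supported Reduction Server image URI and sizing guidance.

```python
# Illustrative worker_pool_specs for a Vertex AI CustomJob with Reduction Server.
# TRAINING_IMAGE is a hypothetical placeholder for your own training container.
TRAINING_IMAGE = "us-central1-docker.pkg.dev/my-project/my-repo/trainer:latest"
# Reduction Server image published by Google (verify the current URI in the docs).
REDUCTION_SERVER_IMAGE = (
    "us-docker.pkg.dev/vertex-ai-restricted/training/reductionserver:latest"
)

worker_pool_specs = [
    # Pool 0 (chief worker): GPUs + your training code image.
    {
        "machine_spec": {
            "machine_type": "n1-standard-16",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 2,
        },
        "replica_count": 1,
        "container_spec": {"image_uri": TRAINING_IMAGE},
    },
    # Pool 1 (remaining workers): same GPU setup and training image.
    {
        "machine_spec": {
            "machine_type": "n1-standard-16",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 2,
        },
        "replica_count": 3,
        "container_spec": {"image_uri": TRAINING_IMAGE},
    },
    # Pool 2 (Reduction Server): NO accelerators; a high-bandwidth,
    # CPU-only machine type, since this pool only aggregates gradients
    # over the network.
    {
        "machine_spec": {"machine_type": "n1-highcpu-16"},
        "replica_count": 4,
        "container_spec": {"image_uri": REDUCTION_SERVER_IMAGE},
    },
]
```

The spec would then be passed to something like `aiplatform.CustomJob(display_name=..., worker_pool_specs=worker_pool_specs)`. The key point mirrored from the answer: only the first two pools carry `accelerator_type`/`accelerator_count`, while the third pool's `machine_spec` is accelerator-free.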
Author: LeetQuiz Editorial Team
You are tasked with training a custom language model for your company using a large dataset. To handle the computational load effectively, you decide to employ the Reduction Server strategy on Google's Vertex AI, which helps optimize bandwidth and latency for multi-node distributed training. You need to configure the worker pools for this distributed training job on Vertex AI. What configuration should you choose for the worker pools to ensure optimal performance?
A
Configure the machines of the first two worker pools to have GPUs, and to use a container image where your training code runs. Configure the third worker pool to have GPUs, and use the reductionserver container image.
B
Configure the machines of the first two worker pools to have GPUs and to use a container image where your training code runs. Configure the third worker pool to use the reductionserver container image without accelerators, and choose a machine type that prioritizes bandwidth.
C
Configure the machines of the first two worker pools to have TPUs and to use a container image where your training code runs. Configure the third worker pool to use the reductionserver container image without accelerators, and choose a machine type that prioritizes bandwidth.
D
Configure the machines of the first two worker pools to have TPUs, and to use a container image where your training code runs. Configure the third worker pool to have TPUs, and use the reductionserver container image.