
You are tasked with pre-training a large language model on Google Cloud. The model includes custom TensorFlow operations within the training loop, and the training process will use a large batch size. This pre-training process is anticipated to take several weeks, so it is crucial to configure a training architecture that minimizes both the training time and compute costs. Considering the available options for distributed strategies and hardware, what approach should you take?
A. Implement 8 workers of a2-megagpu-16g machines by using tf.distribute.MultiWorkerMirroredStrategy.
B. Implement a TPU Pod slice with --accelerator-type=v4-128 by using tf.distribute.TPUStrategy.
C. Implement 16 workers of c2d-highcpu-32 machines by using tf.distribute.MirroredStrategy.
D. Implement 16 workers of a2-highgpu-8g machines by using tf.distribute.MultiWorkerMirroredStrategy.
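For reference, a minimal sketch of how the distribution strategies named in the options are typically wired into a TensorFlow training setup. The model, batch size, and TPU name are placeholders, not part of the question; the TPUStrategy lines are shown commented out as the alternative path for option B.

```python
import tensorflow as tf

def build_model():
    # Placeholder model; a real pre-training model would be far larger
    # and could include the custom TensorFlow ops mentioned in the question.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# Options A and D: synchronous data-parallel training across multiple GPU
# workers. Each worker reads its role from the TF_CONFIG environment variable.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

# Option B: training on a TPU Pod slice (TPU name here is hypothetical).
# resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
# tf.config.experimental_connect_to_cluster(resolver)
# tf.tpu.experimental.initialize_tpu_system(resolver)
# strategy = tf.distribute.TPUStrategy(resolver)

# Option C: tf.distribute.MirroredStrategy() replicates only across the
# accelerators of a single machine, not across 16 separate workers.

with strategy.scope():
    model = build_model()
    model.compile(optimizer="adam", loss="mse")

# Scale the global batch size with the number of replicas in the strategy.
global_batch_size = 64 * strategy.num_replicas_in_sync
```

The key trade-off the question probes is between strategy/hardware cost-performance and compatibility: custom TensorFlow operations in the training loop constrain which accelerators can run them, while the choice of strategy determines how the large batch is split across replicas.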