Google Professional Machine Learning Engineer


As a Data Scientist at a rapidly growing startup, you are tasked with optimizing the training performance of deep learning models developed in Python using TensorFlow. After implementing caching and prefetching techniques, you decide to leverage GPUs on a single machine to further enhance performance, keeping in mind the constraints of cost efficiency and the need for a flexible environment for experimentation. The startup's projects vary in scale and complexity, requiring a solution that can efficiently utilize multiple GPUs without the overhead of distributed training across multiple machines. Which of the following strategies should you choose to best meet these requirements? (Choose one correct option)
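
A. tf.distribute.OneDeviceStrategy
B. tf.distribute.TPUStrategy
C. tf.distribute.MirroredStrategy
D. tf.distribute.MultiWorkerMirroredStrategy

Correct Answer: C. tf.distribute.MirroredStrategy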




Explanation:

The correct strategy is tf.distribute.MirroredStrategy, which is designed specifically for utilizing multiple GPUs on a single machine. It creates a copy of the model (a replica) on each GPU and keeps the replicas in sync by aggregating gradients with an all-reduce step after every batch, which makes it a good fit for the startup's cost-efficiency and experimentation requirements without any multi-machine overhead (see the sketch after the option comparisons below). This approach contrasts with:

  • Option A (tf.distribute.OneDeviceStrategy), which is limited to training on a single device and does not leverage multiple GPUs.
  • Option B (tf.distribute.TPUStrategy), which is optimized for Tensor Processing Units (TPUs) rather than GPUs.
  • Option D (tf.distribute.MultiWorkerMirroredStrategy), which is intended for distributed training across multiple machines, adding unnecessary complexity and overhead for your single-machine scenario.
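
Below is a minimal sketch of how tf.distribute.MirroredStrategy is typically combined with the Keras API. The layer sizes, synthetic dataset, batch size, and epoch count are illustrative assumptions rather than part of the question; only the strategy usage pattern itself is the point.

```python
import tensorflow as tf

# MirroredStrategy discovers all visible GPUs on the machine by default and
# creates one model replica per GPU; gradients are aggregated with all-reduce.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Variables (the model and its optimizer) must be created inside the strategy scope.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Synthetic in-memory data stands in for the real pipeline; cache() and
# prefetch() mirror the input-pipeline optimizations already applied in the scenario.
features = tf.random.normal((1024, 20))
labels = tf.random.normal((1024, 1))
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .batch(64)
    .cache()
    .prefetch(tf.data.AUTOTUNE)
)

# Keras splits each batch across the replicas and keeps them synchronized.
model.fit(dataset, epochs=2)
```

Because the strategy replicates the model onto every visible GPU, the same script runs unchanged whether the machine has one GPU or several, which keeps experimentation flexible without introducing multi-worker configuration.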

For further reading, consult the TensorFlow documentation on distributed training strategies to understand their applications and limitations in different contexts.