You recently developed a deep learning model using the Keras API. Training on a single GPU proved too slow, so you used TensorFlow's tf.distribute.MirroredStrategy to distribute training across 4 GPUs, making no other changes. Despite this, you did not observe any decrease in training time. What should you do next to effectively reduce the training time? (See the sketch below for context on the standard MirroredStrategy pattern.)
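
For context, here is a minimal sketch of the usual MirroredStrategy setup. The relevant detail is that MirroredStrategy splits each global batch across the replicas, so reusing the single-GPU batch size leaves each of the 4 GPUs processing only a quarter of the original work per step, and synchronization overhead can erase any speedup; the standard remedy is to scale the batch size by strategy.num_replicas_in_sync. The model architecture, per-replica batch size, and the x_train / y_train arrays below are placeholders, not part of the original scenario.

    import tensorflow as tf

    # MirroredStrategy replicates the model on every visible GPU and
    # performs synchronous all-reduce of the gradients.
    strategy = tf.distribute.MirroredStrategy()
    print("Replicas in sync:", strategy.num_replicas_in_sync)  # e.g. 4

    # Scale the global batch size with the replica count so each GPU
    # still receives a full per-replica batch.
    per_replica_batch_size = 64
    global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

    # Model creation and compilation must happen inside strategy.scope()
    # so the variables are mirrored across devices.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=["accuracy"],
        )

    # x_train and y_train are hypothetical training arrays.
    model.fit(x_train, y_train, batch_size=global_batch_size, epochs=5)

Note that scaling the batch size alone may also warrant adjusting the learning rate, and that an input pipeline bottleneck (e.g. slow tf.data preprocessing) can likewise keep multiple GPUs starved regardless of batch size.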