You are training an object detection model on a Cloud TPU v2, but training is taking longer than expected. A simplified trace from the Cloud TPU profile shows that compute time is low relative to HostToDevice and DeviceToHost time, which suggests a data transfer bottleneck. Based on this information, what should you do to decrease training time in a cost-efficient way?