You have developed a Transformer model in TensorFlow for text translation. Your training data comprises millions of documents stored in a Cloud Storage bucket. To reduce training time, you aim to use distributed training. Additionally, you want to minimize the effort required for modifying the existing code and managing the cluster's configuration. Given these requirements and the need to effectively handle large-scale data, which approach should you choose?