
Your team of data scientists submits training jobs to a cloud-based backend system that has become increasingly complex to administer because of the variety of frameworks in use, including Keras, PyTorch, Theano, scikit-learn, and custom libraries. You are considering a managed service to streamline this process. The solution must support all of these frameworks, minimize administrative overhead, and scale efficiently as training jobs grow in complexity and size. It should also integrate well with other machine learning services to support an end-to-end workflow. Which of the following options would be the most effective solution? (Choose one correct option)
A
Set up a Slurm workload manager to schedule and run jobs on your cloud infrastructure, requiring manual setup and management of resources.
B
Deploy Kubeflow on Google Kubernetes Engine and use TFJob for submitting training jobs, which involves managing Kubernetes clusters and may not support all frameworks natively.
C
Use Vertex AI Training to submit training jobs, which is compatible with any framework, offers managed infrastructure, and integrates seamlessly with other Vertex AI services.
D
Create a centralized repository of VM images on Compute Engine for your team to use, leading to potential scalability issues and increased management overhead.
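Option C's claim that one managed training service can run "any framework" usually works through custom containers. Each framework is packaged in its own image, and the job definition only points at that image. The sketch below builds worker pool specs in the dictionary shape that Vertex AI custom jobs accept. The image URIs, the project name `my-project`, the bucket, the script arguments, and the default machine type are all hypothetical placeholders, not values from the question.

```python
# Minimal sketch: one Vertex AI custom-job worker pool spec per framework image.
# Assumptions (hypothetical): image URIs, "my-project", "gs://my-bucket", args, machine type.
from typing import Dict, List, Optional


def make_worker_pool_spec(
    image_uri: str,
    args: List[str],
    machine_type: str = "n1-standard-4",
    replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
) -> Dict:
    """Return one worker pool spec that runs a custom training container."""
    machine_spec: Dict = {"machine_type": machine_type}
    if accelerator_type:
        # Only attach accelerator fields when a GPU type is requested.
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return {
        "machine_spec": machine_spec,
        "replica_count": replica_count,
        "container_spec": {"image_uri": image_uri, "args": args},
    }


# Hypothetical custom container images, one per framework the team uses.
FRAMEWORK_IMAGES = {
    "keras": "us-docker.pkg.dev/my-project/training/keras:latest",
    "pytorch": "us-docker.pkg.dev/my-project/training/pytorch:latest",
    "theano": "us-docker.pkg.dev/my-project/training/theano:latest",
    "scikit-learn": "us-docker.pkg.dev/my-project/training/sklearn:latest",
}


def specs_for(framework: str, args: List[str]) -> List[Dict]:
    """Build a single-pool job definition for the given framework's image."""
    return [make_worker_pool_spec(FRAMEWORK_IMAGES[framework], args)]


if __name__ == "__main__":
    specs = specs_for("theano", ["--epochs=10"])
    print(specs[0]["container_spec"]["image_uri"])
    # With google-cloud-aiplatform installed and credentials configured,
    # such specs could be submitted roughly as:
    #   from google.cloud import aiplatform
    #   aiplatform.init(project="my-project", location="us-central1",
    #                   staging_bucket="gs://my-bucket")
    #   aiplatform.CustomJob(display_name="theano-train",
    #                        worker_pool_specs=specs).run()
```

Supporting a new framework or custom library under this approach means building one more image and adding an entry to the mapping. No cluster or scheduler has to change, which reflects the low administrative overhead the question asks for.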