
You are an ML researcher at an investment bank experimenting with the Gemma large language model (LLM) for an internal use case. You require full control over the model's underlying infrastructure and need to minimize the model's inference time. Which serving configuration should you use?
A
Deploy the model on a Vertex AI endpoint manually by creating a custom inference container.
B
Deploy the model on a Google Kubernetes Engine (GKE) cluster by using the deployment options in Model Garden.
C
Deploy the model on a Vertex AI endpoint by using one-click deployment in Model Garden.
D
Deploy the model on a Google Kubernetes Engine (GKE) cluster manually by creating a custom YAML manifest.
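
For context on what the manual Vertex AI custom-container flow in option A involves in practice, here is a minimal sketch using the Vertex AI Python SDK (google-cloud-aiplatform). The project ID, container image URI, routes, port, and machine/accelerator choices are illustrative assumptions, not values given in the question, and this is not an endorsement of any particular answer option.

```python
# Minimal sketch: upload a custom serving container and deploy it to a
# Vertex AI endpoint. All names and values below are assumptions.
from google.cloud import aiplatform

# Assumed project and region for illustration only.
aiplatform.init(project="my-project", location="us-central1")

# Register the model with a custom inference container image
# (e.g. one that serves Gemma); image URI and routes are hypothetical.
model = aiplatform.Model.upload(
    display_name="gemma-custom-serving",
    serving_container_image_uri="us-docker.pkg.dev/my-project/serving/gemma-server:latest",
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    serving_container_ports=[8080],
)

# Deploy the model to a GPU-backed endpoint; machine type and
# accelerator are example choices, not requirements from the question.
endpoint = model.deploy(
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
```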