A social media company plans to use a large language model (LLM) from Amazon SageMaker JumpStart to summarize messages and needs to compare the toxicity of the outputs from several candidate LLMs. Which approach minimizes operational overhead for this evaluation? | AWS Certified AI Practitioner Quiz - LeetQuiz