
A social media company plans to use a large language model (LLM) from Amazon SageMaker JumpStart to summarize messages and needs to compare the toxicity of the outputs from several candidate LLMs. Which approach minimizes operational overhead for this evaluation?
A. Crowd-sourced evaluation
B. Automatic model evaluation
C. Model evaluation with human workers
D. Reinforcement learning from human feedback (RLHF)