
A company uses generative AI with large language models (LLMs) to translate training manuals from English into other languages. They need to assess the accuracy of the generated translated text. Which model evaluation approach satisfies this need?
A. Bilingual Evaluation Understudy (BLEU)
B. Root mean squared error (RMSE)
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
D. F1 score
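For reference, BLEU scores a machine-generated translation against a human reference translation. It takes the modified (clipped) n-gram precision for n = 1 to 4, combines the precisions with a geometric mean, and multiplies the result by a brevity penalty that punishes outputs shorter than the reference. The sketch below is a minimal illustration only. It assumes whitespace tokenization, a single reference, uniform weights, and no smoothing, and the function names `ngrams` and `bleu` are hypothetical. Production toolkits typically add standardized tokenization, multiple references, and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: one reference, whitespace tokens, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by how often it appears in the reference
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        # Without smoothing, any zero precision makes the geometric mean zero
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c)
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the n-gram precisions with uniform weights 1/max_n
    return bp * math.exp(sum(log_precisions) / max_n)
```

An identical translation scores 1.0, a translation sharing no words with the reference scores 0.0, and a near-miss such as "the quick brown fox jumps over the lazy dog" against "the quick brown fox leaps over the lazy dog" lands strictly between the two.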