
Explanation:
The question asks for an appropriate model evaluation strategy to assess the accuracy of machine-generated translations from English to other languages using LLMs. The key requirement is evaluating translation accuracy by examining the generated text.
A: Bilingual Evaluation Understudy (BLEU)
B: Root Mean Squared Error (RMSE)
C: Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
D: F1 Score
BLEU (Option A) is the most appropriate choice because it was designed specifically for evaluating machine translation. It scores a machine-generated translation by its n-gram overlap with one or more human reference translations, with a penalty for output that is too short, which gives the company an objective, repeatable measure of translation accuracy. The other options target different tasks: RMSE measures error in numeric predictions (regression), ROUGE is a recall-oriented overlap metric used mainly for text summarization, and F1 score combines precision and recall for classification.
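To make the idea concrete, here is a minimal sketch of a sentence-level BLEU score in plain Python. It is a simplified illustration, not the reference implementation: it assumes whitespace tokenization, a single reference translation, and no smoothing, and the function names `ngrams` and `simple_bleu` as well as the example sentences are made up for this example. Production evaluations would normally use an established library and corpus-level scoring.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference.

    Geometric mean of clipped n-gram precisions (n = 1..max_n),
    multiplied by a brevity penalty. No smoothing is applied, so any
    n-gram order with zero matches gives a score of 0.
    """
    cand = candidate.split()  # assumption: whitespace tokenization
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


# Hypothetical example: machine output vs. a human reference translation.
reference = "press the red button to stop the engine"
candidate = "press the red button to stop the machine"
print(round(simple_bleu(candidate, reference), 4))
```

A candidate identical to the reference scores 1.0, and the score drops as fewer n-grams match or as the candidate gets shorter than the reference. RMSE and F1 have no comparable notion of comparing generated text against a reference translation, which is why they do not fit this scenario.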
A company uses generative AI with large language models (LLMs) to translate training manuals from English into other languages. They need to assess the accuracy of the generated translated text. Which model evaluation approach satisfies this need?
A: Bilingual Evaluation Understudy (BLEU)
B: Root mean squared error (RMSE)
C: Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
D: F1 score