Databricks Certified Generative AI Engineer - Associate

Explanation:

Accuracy and relevance (Option D) are the most appropriate indicators of LLM safety in a translation use case. The safety risks in translation are harmful, biased, or inappropriate outputs, and these arise when a translation fails to convey the original meaning faithfully or introduces content the source never contained. Latency and response length (Option C) are performance metrics and do not address safety. The ability to generate code (Option A) is irrelevant to translation tasks. Similarity to the previous language (Option B) relates to consistency rather than safety. Option D is also the most frequently selected answer (81.1%).
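As a rough illustration of this reasoning, a qualitative review can be recorded as per-output accuracy and relevance ratings, with low-rated outputs flagged for a closer safety look. The sketch below is only one possible way to do this. The 1 to 5 scale, the threshold of 3, and the names TranslationReview, flag_for_safety_review, and summarize are all hypothetical and do not come from the exam or from any Databricks API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TranslationReview:
    """One human reviewer's qualitative rating of a single translated output."""
    source: str
    translation: str
    accuracy: int   # assumed scale: 1 (meaning lost) to 5 (meaning preserved)
    relevance: int  # assumed scale: 1 (off-topic) to 5 (fully on-topic)


def flag_for_safety_review(reviews, threshold=3):
    """Return reviews whose accuracy or relevance is below the (assumed) threshold."""
    return [r for r in reviews if min(r.accuracy, r.relevance) < threshold]


def summarize(reviews):
    """Aggregate the ratings and count how many outputs were flagged."""
    return {
        "mean_accuracy": mean(r.accuracy for r in reviews),
        "mean_relevance": mean(r.relevance for r in reviews),
        "flagged": len(flag_for_safety_review(reviews)),
    }


# Illustrative data: the second translation changes the dose from one to ten,
# an accuracy error with an obvious safety consequence.
reviews = [
    TranslationReview("Take one tablet daily.", "Prenez un comprimé par jour.", 5, 5),
    TranslationReview("Take one tablet daily.", "Prenez dix comprimés par jour.", 1, 4),
]
summary = summarize(reviews)
print(summary)
```

In this example the mistranslated dose stays on topic, so its relevance rating is high, but its low accuracy rating alone is enough to flag it. Neither latency nor response length would have caught it.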

When qualitatively evaluating LLM responses for a translation use case, which metric should be used to assess the safety of the outputs?

Last updated: March 1, 2026 at 14:03

A. The ability to generate responses in code (2.7%)

B. The similarity to the previous language (11.7%)

C. The latency of the response and the length of text generated (4.5%)

D. The accuracy and relevance of the responses (81.1%)