When qualitatively evaluating LLM responses for a translation use case, which metric should be used to assess the safety of the outputs?