
A company has fine-tuned a large language model (LLM) to answer questions for a help desk. The company wants to determine if the fine-tuning has enhanced the model's accuracy. Which metric should the company use for the evaluation?
A. Precision
B. Time to first token
C. F1 score
D. Word error rate
Explanation:
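For evaluating the accuracy of a fine-tuned LLM for a help desk question-answering system, the F1 score (option C) is the most appropriate metric among the given options. F1 combines precision and recall, so it rewards answers that are both correct and complete. Precision alone ignores missing information, time to first token measures latency rather than accuracy, and word error rate is designed for speech recognition transcripts.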
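The F1 score is the harmonic mean of precision $P$ and recall $R$, each computed from true positives ($TP$), false positives ($FP$), and false negatives ($FN$):

```latex
F_1 = \frac{2PR}{P + R}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}
```

For example, $P = 1.0$ and $R = 0.5$ give $F_1 = 1.0 / 1.5 \approx 0.67$, lower than the arithmetic mean of 0.75, because the harmonic mean is pulled toward the weaker of the two values.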
For LLM evaluation in question-answering tasks, other metrics that do not appear among the options can also be relevant.
However, among the given options, F1 score is the most comprehensive metric for evaluating accuracy improvements from fine-tuning.
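As a sketch of how the company might run this comparison, the Python below scores a base model and a fine-tuned model against the same reference answers using token-overlap F1, the variant commonly reported for extractive QA benchmarks such as SQuAD. The normalization rules (lowercasing, stripping punctuation and English articles), the function names, and the help desk data are illustrative assumptions, not part of the original question.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, then split into tokens."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between one model answer and one reference answer."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; otherwise nothing can overlap.
        return float(pred_tokens == ref_tokens)
    # Count shared tokens, respecting how often each one appears.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: list[str], references: list[str]) -> float:
    """Average token F1 over a held-out set of help desk questions."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("need equally sized, non-empty answer lists")
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


# Hypothetical held-out help desk data, for illustration only.
references = [
    "Restart the router and wait two minutes",
    "Reset your password from the login page",
]
base_answers = [
    "Please contact support",
    "You can reset the password",
]
tuned_answers = [
    "Restart the router, then wait two minutes",
    "Reset your password on the login page",
]

base_score = mean_f1(base_answers, references)
tuned_score = mean_f1(tuned_answers, references)
print(f"base F1:       {base_score:.3f}")   # base F1:       0.200
print(f"fine-tuned F1: {tuned_score:.3f}")  # fine-tuned F1: 0.833
```

A higher mean F1 for the fine-tuned model on the same held-out questions is evidence that fine-tuning improved accuracy; a real evaluation would use far more than two questions.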