
Which metric should the education company use to evaluate whether its custom LLM's responses match the creative spelling and shortened words typical of teenage language?
A. F1 score
B. BERTScore
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
D. Bilingual Evaluation Understudy (BLEU) score