
Answer-first summary for fast verification
Answer: Model C
**Model C has the highest scores across all ROUGE metrics (ROUGE-1: 0.62, ROUGE-2: 0.46, ROUGE-L: 0.55), indicating superior overall summary quality and coherence.**

- **ROUGE-1 (0.62)**: Measures unigram overlap, reflecting coverage of individual words in the summary.
- **ROUGE-2 (0.46)**: Measures bigram overlap, a proxy for fluency and local coherence.
- **ROUGE-L (0.55)**: Evaluates the longest common subsequence, capturing structural similarity and overall summary coherence.

**Why not the others:**

- **Model A**: Lowest scores of the four models in all metrics (ROUGE-1: 0.55, ROUGE-2: 0.43, ROUGE-L: 0.48).
- **Model B**: Second best overall, but scores lower than Model C in all metrics (ROUGE-1: 0.60, ROUGE-2: 0.45, ROUGE-L: 0.52).
- **Model D**: Falls between Model A and Model B, and is also outperformed by Model C in all metrics (ROUGE-1: 0.58, ROUGE-2: 0.44, ROUGE-L: 0.50).

**Model C consistently delivers the best performance across all evaluation metrics, making it the optimal choice for prioritizing summary quality and coherence.**
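
The claim that one model leads on every metric can be checked mechanically. The following is a minimal sketch, assuming the scores are copied by hand from the question's table; the `dominant_model` helper is a hypothetical name introduced here for illustration, not part of any ROUGE library.

```python
# Scores copied from the question's table.
scores = {
    "Model A": {"ROUGE-1": 0.55, "ROUGE-2": 0.43, "ROUGE-L": 0.48},
    "Model B": {"ROUGE-1": 0.60, "ROUGE-2": 0.45, "ROUGE-L": 0.52},
    "Model C": {"ROUGE-1": 0.62, "ROUGE-2": 0.46, "ROUGE-L": 0.55},
    "Model D": {"ROUGE-1": 0.58, "ROUGE-2": 0.44, "ROUGE-L": 0.50},
}


def dominant_model(scores):
    """Return the model that strictly beats every other model on every
    metric, or None if no such model exists (hypothetical helper)."""
    for name, own in scores.items():
        beats_all = all(
            own[metric] > other[metric]
            for other_name, other in scores.items()
            if other_name != name
            for metric in own
        )
        if beats_all:
            return name
    return None


print(dominant_model(scores))
```

With the table's values this prints `Model C`. If no single model led on every metric, the helper would return `None`, and the choice would depend on which metric matters most for the task.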
Author: LeetQuiz
You are working on a text summarization project and have tested several models. Below are the ROUGE-1, ROUGE-2, and ROUGE-L scores for different models:
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Model A | 0.55 | 0.43 | 0.48 |
| Model B | 0.60 | 0.45 | 0.52 |
| Model C | 0.62 | 0.46 | 0.55 |
| Model D | 0.58 | 0.44 | 0.50 |
Given that ROUGE-1 measures unigram overlap, ROUGE-2 measures bigram overlap, and ROUGE-L focuses on the longest common subsequence (LCS), which model should you select for this summarization task if your goal is to prioritize overall summary quality and coherence?
- A. Model B
- B. Model C
- C. Model D
- D. Model A
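
For intuition about what the three metrics in the question measure, here is a simplified sketch of ROUGE-N and ROUGE-L. It assumes lowercase whitespace tokenization, a single reference summary, and F1 as the reported score. Real ROUGE toolkits differ in tokenization, stemming, and aggregation, so the numbers will not necessarily match theirs. All function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(matches, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if matches == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = matches / cand_total
    recall = matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N as F1 over clipped n-gram overlap (n=1 unigrams, n=2 bigrams)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as F1 over the LCS of the two token sequences."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
```

In this toy pair, one changed word removes a single unigram match but two bigram matches, which is why ROUGE-2 is usually the lowest of the three scores, as it is for every model in the table.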