Databricks Certified Generative AI Engineer - Associate

You are working on a text summarization project and have tested several models. Below are the ROUGE-1, ROUGE-2, and ROUGE-L scores for different models:

Model	ROUGE-1	ROUGE-2	ROUGE-L
Model A	0.55	0.43	0.48
Model B	0.60	0.45	0.52
Model C	0.62	0.46	0.55
Model D	0.58	0.44	0.50

Given that ROUGE-1 measures unigram overlap, ROUGE-2 measures bigram overlap, and ROUGE-L focuses on the longest common subsequence (LCS), which model should you select for this summarization task if your goal is to prioritize overall summary quality and coherence?

Model B

Model C

Model D

Model A

Explanation:

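Model C has the highest score on every ROUGE metric (ROUGE-1: 0.62, ROUGE-2: 0.46, ROUGE-L: 0.55). Because ROUGE measures overlap with reference summaries, these scores make Model C the strongest candidate among the four for overall summary quality and coherence.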

ROUGE-1 (0.62): Measures unigram overlap, reflecting how many individual reference words the summary covers.
ROUGE-2 (0.46): Measures bigram overlap, a proxy for fluency and local coherence because matching word pairs preserve local word order.
ROUGE-L (0.55): Measures the longest common subsequence (LCS), capturing in-order matches that need not be contiguous and therefore sentence-level structural similarity.
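To make the three definitions concrete, here is a minimal sketch of how each metric could be computed from scratch. It assumes whitespace tokenization, no stemming, and the F1 variant of each score; the question does not say which variant or tooling produced the table, so the function names and example sentences below are illustrative only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Combine precision and recall into an F1 score (0.0 if there is no overlap)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N as F1 over clipped n-gram overlap (assumed variant)."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as F1 over the LCS length (assumed variant)."""
    cand, ref = candidate.split(), reference.split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical example sentences, not taken from the question.
candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n(candidate, reference, 1))  # unigram overlap: 5 of 6 tokens match
print(rouge_n(candidate, reference, 2))  # bigram overlap: 3 of 5 bigrams match
print(rouge_l(candidate, reference))     # LCS is "the cat on the mat" (length 5)
```

In this example the single substituted word costs one unigram but two bigrams, which is why ROUGE-2 tends to be lower than ROUGE-1, as it is for every model in the table.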

Why not the others:

Model A: Lower scores in all metrics compared to Model C (ROUGE-1: 0.55, ROUGE-2: 0.43, ROUGE-L: 0.48).
Model B: Performs slightly better than Model A but scores lower than Model C in all metrics (ROUGE-1: 0.60, ROUGE-2: 0.45, ROUGE-L: 0.52).
Model D: Scores between Model A and Model B, and is outperformed by Model C in all metrics (ROUGE-1: 0.58, ROUGE-2: 0.44, ROUGE-L: 0.50).

Model C consistently delivers the best performance across all evaluation metrics, making it the optimal choice for prioritizing summary quality and coherence.
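The comparison above can be checked mechanically. The following sketch uses the scores from the question's table and looks for a model that is at least as good as every other model on every metric; the dictionary keys and helper names are hypothetical, and the mean-based ranking is only one possible tie-breaker for cases where no single model dominates.

```python
# Scores copied from the table in the question.
scores = {
    "Model A": {"rouge1": 0.55, "rouge2": 0.43, "rougeL": 0.48},
    "Model B": {"rouge1": 0.60, "rouge2": 0.45, "rougeL": 0.52},
    "Model C": {"rouge1": 0.62, "rouge2": 0.46, "rougeL": 0.55},
    "Model D": {"rouge1": 0.58, "rouge2": 0.44, "rougeL": 0.50},
}


def dominant_models(scores):
    """Return models that match or beat every model on every metric."""
    return [
        name
        for name, own in scores.items()
        if all(own[m] >= other[m] for other in scores.values() for m in own)
    ]


def rank_by_mean(scores):
    """Rank models by the unweighted mean of their metrics, best first."""
    return sorted(
        scores,
        key=lambda name: sum(scores[name].values()) / len(scores[name]),
        reverse=True,
    )


print(dominant_models(scores))  # ['Model C']
print(rank_by_mean(scores))     # ['Model C', 'Model B', 'Model D', 'Model A']
```

Because Model C dominates on all three metrics, the choice does not depend on how the metrics are weighted; a weighting decision would only matter if different models led on different metrics.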
