
Answer-first summary for fast verification
Answer: C — Pick an embedding model trained on related domain knowledge
The correct answer is C. Embedding models trained on related domain knowledge are optimized to capture the semantic relationships, terminology, and nuances of that domain, so they retrieve relevant documents more accurately in a RAG application. This domain alignment directly improves retrieval quality because the embedding space reflects the application's actual context.

The other options fall short. Option D (top-ranked MTEB leaderboard models) provides strong general-purpose baselines and is a reasonable fallback, but a high leaderboard rank does not guarantee alignment with a specific domain. Option A (multilingual support) matters only if the application actually needs to handle multilingual queries. Option B (most recent and most performant open LLM) confuses two different model types: embedding models are distinct from generative LLMs, and recency does not imply domain relevance.
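A practical way to apply this strategy is to evaluate each candidate embedding model on a small labeled retrieval set drawn from the target domain and compare a metric such as recall@1. Below is a minimal sketch; the eval set, the `toy_embed` character-frequency embedder (a stand-in for real embedding-model calls), and the `recall_at_1` helper are all illustrative assumptions, not part of any specific library:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recall_at_1(embed, queries, docs, relevant):
    """Fraction of queries whose top-ranked document is the labeled relevant one."""
    doc_vecs = {doc_id: embed(text) for doc_id, text in docs.items()}
    hits = 0
    for q_id, q_text in queries.items():
        q_vec = embed(q_text)
        best = max(doc_vecs, key=lambda d: cosine(q_vec, doc_vecs[d]))
        if best == relevant[q_id]:
            hits += 1
    return hits / len(queries)

# Toy stand-in embedder: a 26-dim character-frequency vector. In a real
# experiment this would be a call to each candidate embedding model
# (e.g. a domain-tuned model vs. a general-purpose one).
def toy_embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

# Hypothetical mini eval set from a medical domain.
docs = {
    "d1": "contraindications for beta blockers in asthma patients",
    "d2": "sorting a python list in descending order",
}
queries = {"q1": "beta blocker contraindications asthma"}
relevant = {"q1": "d1"}

score = recall_at_1(toy_embed, queries, docs, relevant)
print(f"recall@1 = {score:.2f}")
```

Running this loop once per candidate model and picking the highest-scoring one makes the "domain fit" criterion measurable rather than a guess.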
Author: LeetQuiz Editorial Team
A Generative AI Engineer is developing a RAG application and wants to experiment with different embedding models to enhance its performance.
Which strategy for selecting an embedding model should they use?
A
Pick an embedding model with multilingual support to support potential multilingual user questions
B
Pick the most recent and most performant open LLM released at the time
C
Pick an embedding model trained on related domain knowledge
D
Pick the embedding model ranked highest on the Massive Text Embedding Benchmark (MTEB) leaderboard hosted by HuggingFace