
Answer-first summary for fast verification
Answer: Llama2-70B
The scenario calls for a model that can handle the generative component of a RAG system while meeting strict confidentiality requirements (no data transmitted to third parties) and prioritizing answer quality over latency and throughput. Option D (Llama2-70B) is optimal: it is a large, high-quality open-source generative model that can be self-hosted, so no data ever leaves the organization, and it can be paired with a separately hosted embedding model for retrieval. Option A (Dolly 1.5B) is too small to deliver the highest answer quality. Option B (OpenAI GPT-4) violates confidentiality because it requires sending data to a third-party API. Option C (BGE-large) is an embedding model only; it cannot generate answers, so it fails the core RAG requirement.
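The retrieval-then-generation flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the `embed` and `generate` functions here are toy placeholders standing in for a locally hosted embedding model (such as BGE-large) and a self-hosted Llama2-70B instance, so that the whole pipeline runs inside the organization's infrastructure.

```python
import math

def embed(text: str) -> list[float]:
    # Placeholder embedding: a character-frequency vector.
    # In a real deployment this would call a self-hosted
    # embedding model such as BGE-large.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query embedding
    # and return the top-k as context for generation.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    # Placeholder for a call to a self-hosted Llama2-70B endpoint.
    # Because the model runs on-premises, no confidential data
    # is transmitted to any third party.
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    return prompt  # the real model would return a generated answer
```

A usage example: `retrieve("data retention policy", internal_docs)` selects the most relevant internal documents, and `generate` combines them with the question into a prompt for the local model. Quality comes from the 70B generator; confidentiality comes from keeping both the embedding and generation steps self-hosted.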
Author: LeetQuiz Editorial Team
A Generative AI Engineer is developing a RAG application for a small internal group of experts. The primary requirements are the highest possible answer quality, with low latency and throughput being less critical. The data is highly confidential and subject to regulatory requirements that prohibit any information from being transmitted to third parties.
Which model is the most suitable for this scenario?
A. Dolly 1.5B
B. OpenAI GPT-4
C. BGE-large
D. Llama2-70B