
Answer-first summary for fast verification
Answer: Llama2-70B
The scenario calls for a model that can handle the generative component of a RAG system while meeting strict confidentiality requirements (no data transmitted to third parties) and prioritizing answer quality over latency and throughput. Option D (Llama2-70B) is optimal: it is a large, high-quality open-source generative model that can be self-hosted, so no data ever leaves the organization, and it can be paired with a separately hosted embedding model for retrieval. Option A (Dolly 1.5B) is too small to deliver the highest answer quality. Option B (OpenAI GPT-4) violates confidentiality because it requires sending data to a third-party API. Option C (BGE-large) is an embedding model only; it cannot generate answers, so it fails the core RAG requirement.
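The retrieval-then-generation flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the `embed` and `generate` functions here are toy placeholders standing in for a locally hosted embedding model (such as BGE-large) and a self-hosted Llama2-70B instance, so that the whole pipeline runs inside the organization's infrastructure.

```python
import math

def embed(text: str) -> list[float]:
    # Placeholder embedding: a character-frequency vector.
    # In a real deployment this would call a self-hosted
    # embedding model such as BGE-large.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query embedding
    # and return the top-k as context for generation.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    # Placeholder for a call to a self-hosted Llama2-70B endpoint.
    # Because the model runs on-premises, no confidential data
    # is transmitted to any third party.
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    return prompt  # the real model would return a generated answer
```

A usage example: `retrieve("data retention policy", internal_docs)` selects the most relevant internal documents, and `generate` combines them with the question into a prompt for the local model. Quality comes from the 70B generator; confidentiality comes from keeping both the embedding and generation steps self-hosted.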
Author: LeetQuiz Editorial Team
A Generative AI Engineer is developing a RAG application for a small internal group of experts. The primary requirements are the highest possible answer quality, with low latency and throughput being less critical. The data is highly confidential and subject to regulatory requirements that prohibit any information from being transmitted to third parties.
Which model is the most suitable for this scenario?
A. Dolly 1.5B
B. OpenAI GPT-4
C. BGE-large
D. Llama2-70B