Databricks Certified Generative AI Engineer - Associate

Get started today

Ultimate access to all questions.

Explanation:

Option C is the optimal choice because it directly addresses the problem of irrelevant content through the system prompt, instructing the model to only answer questions about SnoPen AI. This approach is efficient, doesn't require data preprocessing, and leverages the model's instruction-following capabilities. Option D (consolidating into a single chunk) is problematic as it could exceed context window limits and doesn't scale well. Option A (keeping all articles) contradicts the filtering goal, and Option B (claiming all information is about SnoPen AI) is misleading and could cause the model to hallucinate incorrect information about unrelated topics. The community discussion shows strong support for C with 80% of answers and the highest upvoted comment endorsing this approach.

Explanation:

Comments (0)

No comments yet.

A Generative AI Engineer is developing a RAG application to answer questions about internal documents for the company SnoPen AI. The source documents may contain a substantial amount of irrelevant content, including advertisements, sports news, entertainment news, or information about other companies. Which approach is recommended for building the RAG application to effectively filter out this irrelevant information?

Exam-Like

Last updated: May 16, 2026 at 14:02

Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.

2.5%

Include in the system prompt that any information it sees will be about SnoPenAI, even if no data filtering is performed.

7.0%

Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen AI.

85.1%

Consolidate all SnoPen AI related documents into a single chunk in the vector database.

5.5%