
Answer-first summary for fast verification
Answer: Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen AI.
Option C is the optimal choice because it directly addresses the problem of irrelevant content through the system prompt, instructing the model to only answer questions about SnoPen AI. This approach is efficient, doesn't require data preprocessing, and leverages the model's instruction-following capabilities. Option D (consolidating into a single chunk) is problematic as it could exceed context window limits and doesn't scale well. Option A (keeping all articles) contradicts the filtering goal, and Option B (claiming all information is about SnoPen AI) is misleading and could cause the model to hallucinate incorrect information about unrelated topics. The community discussion shows strong support for C with 80% of answers and the highest upvoted comment endorsing this approach.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
A Generative AI Engineer is developing a RAG application to answer questions about internal documents for the company SnoPen AI. The source documents may contain a substantial amount of irrelevant content, including advertisements, sports news, entertainment news, or information about other companies. Which approach is recommended for building the RAG application to effectively filter out this irrelevant information?
A
Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.
B
Include in the system prompt that any information it sees will be about SnoPenAI, even if no data filtering is performed.
C
Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen AI.
D
Consolidate all SnoPen AI related documents into a single chunk in the vector database.
No comments yet.