
A Generative AI Engineer has developed a RAG application to answer questions from the author's web forum about a series of fantasy novels. The text from the novels is chunked and embedded into a vector store along with metadata (page number, chapter number, book title). These chunks are retrieved based on the user's query and sent to an LLM to generate a response. The engineer initially selected the chunking strategy and its configurations based on intuition, but now wants to choose the optimal values more systematically.
Which TWO strategies should the engineer use to optimize their chunking strategy and parameters? (Select two.)
A. Change embedding models and compare performance.
B. Add a classifier for user queries that predicts which book is most likely to contain the answer, and use it to filter retrieval.
C. Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes to the chunking strategy, such as splitting chunks by paragraphs or chapters. Choose the strategy that gives the best metric value.
D. Pass known questions and their best answers to an LLM and instruct it to report the ideal token count for each. Use a summary statistic (mean, median, etc.) of these token counts to choose the chunk size.
E. Create an LLM-as-a-judge metric that evaluates how well previous questions are answered by the most appropriate chunk, and optimize the chunking parameters based on the values of that metric.
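To make the systematic approach in option C concrete, here is a minimal sketch of comparing two chunking strategies with a retrieval metric (recall@1). The corpus, evaluation questions, and the simple word-overlap retriever are all illustrative stand-ins for a real embedding model and vector store; in practice the retriever would be the application's actual vector search.

```python
# Toy corpus standing in for the chunked novel text.
TEXT = (
    "The dragon guarded the northern pass. No traveler crossed in winter.\n\n"
    "Queen Mara forged an alliance with the river clans. "
    "The treaty was signed in the third book.\n\n"
    "The wizard's tower stood on the cliffs of Eldran, "
    "visible from the harbor below."
)

def chunk_by_paragraph(text):
    # One candidate strategy: split on blank lines.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def chunk_by_fixed_words(text, size=8):
    # Another candidate strategy: fixed-size windows of 8 words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks, k=1):
    # Toy retriever: rank chunks by word overlap with the query.
    # A real system would rank by embedding similarity instead.
    q = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))
    return scored[:k]

# Labeled evaluation set: each question is paired with a phrase the
# correct chunk must contain.
EVAL_SET = [
    ("Who guarded the northern pass?", "dragon"),
    ("Who did Queen Mara ally with?", "river clans"),
    ("Where did the wizard's tower stand?", "Eldran"),
]

def recall_at_k(chunks, k=1):
    # Fraction of questions whose top-k retrieved chunks contain
    # the expected answer phrase.
    hits = sum(
        any(answer in c for c in retrieve(question, chunks, k))
        for question, answer in EVAL_SET
    )
    return hits / len(EVAL_SET)

for name, strategy in [("paragraph", chunk_by_paragraph),
                       ("fixed-8-words", chunk_by_fixed_words)]:
    print(f"{name}: recall@1 = {recall_at_k(strategy(TEXT)):.2f}")
```

The strategy with the higher metric value wins; the same loop extends naturally to other parameters (chunk size, overlap) and to the LLM-as-a-judge metric described in option E.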