
## Answer
**A.** Chunk the document into sections, further splitting large sections into smaller chunks that respect sentence boundaries while staying within the 2048-token limit.
## Explanation

When preparing a large legal document for a generative AI model with a 2048-token limit, the most efficient chunking strategy is to split the text into smaller, manageable sections that respect natural language boundaries, such as sentences. This keeps each chunk coherent and meaningful without exceeding the model's token limit.

- **Option A** is the best strategy because it preserves the logical flow of the content while keeping each chunk within the model's constraints, avoiding fragmented or incomplete sentences.
- **Option B** may produce chapters that are too large to fit within the token limit.
- **Option C** doesn't account for section lengths, which could also exceed the limit.
- **Option D**, although it respects the token limit, could cut off text mid-sentence, leaving incomplete or less meaningful chunks for the model to process.
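The strategy in Option A can be sketched in a few lines. This is a minimal illustration, not a production implementation: it splits on end-of-sentence punctuation with a regex and estimates token counts by whitespace word count, where a real pipeline would use the model's actual tokenizer for exact counts. The function name `chunk_by_sentences` and the word-count proxy are assumptions for the example.

```python
import re

def chunk_by_sentences(text, max_tokens=2048):
    """Greedily pack whole sentences into chunks of at most max_tokens.

    Token counts are approximated by whitespace-separated words; swap in
    a real tokenizer for accurate limits. A single sentence longer than
    max_tokens is kept whole rather than cut mid-sentence.
    """
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())  # rough token estimate
        # Flush the current chunk if adding this sentence would overflow it.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Applied per section, this yields chunks that never exceed the limit (up to tokenizer accuracy) while every boundary falls between sentences, which is exactly why Option A beats the mid-sentence cutting of Option D.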
Author: LeetQuiz
Question: 3
You are preparing a large legal document to be used in a generative AI model for text summarization. The document has many chapters, and each chapter contains multiple sections with varying lengths. The model you're using has a token limit of 2048 tokens for processing. Which of the following chunking strategies would best ensure efficient processing of the document without exceeding the token limit?
A. Chunk the document into sections, further splitting large sections into smaller chunks that respect sentence boundaries while staying within the 2048-token limit.
B. Chunk the document into chapters, ensuring each chapter fits within the model's token limit.
C. Chunk the entire document into sections, where each section is treated as one chunk regardless of length.
D. Dynamically chunk the document based on token count, ensuring that each chunk contains no more than 2048 tokens, even if it cuts off in the middle of a sentence.