
Answer-first summary for fast verification
Answer: Chunk by paragraphs, overlapping the last sentence of each chunk with the next chunk.
Chunking by paragraphs with overlap helps maintain continuity and context across chunks. Overlapping the last sentence ensures that the context flows smoothly between chunks, reducing the risk of losing important information at chunk boundaries. This approach is particularly useful for summarizing long-form legal documents, where clauses and context may span across multiple paragraphs.

- **B. Arbitrary 400-token segments without overlap** can break the flow of important legal clauses, leading to incomplete or disjointed summaries.
- **C. Chunking by sentences with no overlap** may result in chunks that are too small and disconnected, losing the broader context necessary for accurately summarizing legal documents.
- **D. Chunking by logical sections without overlap** might work for very well-structured documents, but legal documents often require continuity between sections, so overlapping is crucial for maintaining context.

Thus, chunking by paragraphs with overlap ensures both coherence and context retention, making it the best approach for summarizing long legal documents effectively.
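The paragraph-with-overlap strategy from option A can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a Databricks or LLM-vendor API: the function names, the blank-line paragraph split, the punctuation-based sentence split, the word-count token estimate, and the default budget of 1000 are all hypothetical choices made for the example.

```python
import re


def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs (assumed convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def last_sentence(paragraph):
    """Return the final sentence using a naive split after '.', '!' or '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return sentences[-1]


def approx_tokens(text):
    """Crude token estimate: number of whitespace-separated words (assumption)."""
    return len(text.split())


def chunk_by_paragraphs(text, max_tokens=1000):
    """Pack whole paragraphs into chunks of roughly max_tokens.

    Each chunk after the first begins with the last sentence of the
    previous chunk, so context carries across the boundary.
    A single paragraph larger than max_tokens is kept whole, so its
    chunk can exceed the budget.
    """
    chunks = []
    current = []     # pieces of the chunk being built
    has_new = False  # True once current holds a paragraph, not only overlap
    for para in split_paragraphs(text):
        size = approx_tokens(" ".join(current + [para]))
        if has_new and size > max_tokens:
            chunks.append("\n\n".join(current))
            # Seed the next chunk with the overlap sentence.
            current = [last_sentence(current[-1])]
            has_new = False
        current.append(para)
        has_new = True
    if has_new:
        chunks.append("\n\n".join(current))
    return chunks
```

In a real pipeline, token counts would come from the model's own tokenizer, the budget would leave room under the 4096-token limit for the prompt and the generated summary, and the sentence splitter would need to cope with legal abbreviations such as "Sec." or "No." that this naive regex would split on.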
Author: LeetQuiz
Question: 21
You are tasked with developing an AI-powered application using Databricks to summarize long-form legal documents. The documents can be thousands of words long, and the large language model (LLM) has a token limit of 4096 tokens. You need to decide on the optimal chunking strategy to ensure that the summarization captures the essential legal clauses accurately without missing important context. Which chunking strategy is most appropriate to generate an accurate and coherent summary, considering the token limit and the document structure?
A. Chunk by paragraphs, overlapping the last sentence of each chunk with the next chunk.
B. Chunk by arbitrary 400-token segments without overlapping content.
C. Chunk by sentences, with no overlap.
D. Chunk based on logical sections of the document, with no overlap.