
Explanation:
Correct Answer: C
Why Option C is correct:
Semantic chunking splits the documents at natural topic boundaries rather than at arbitrary token counts. A 95% breakpoint percentile threshold creates a split only where the embedding distance between adjacent sentence groups is unusually large, and the 3-sentence buffer smooths out local noise in those comparisons. The RetrieveAndGenerate API then retrieves only the chunks most similar to each query and passes that compact, relevant context to the FM, so every request stays within the context window. This eliminates the truncated outputs and produces consistent responses, because each answer is grounded in focused, semantically coherent content instead of an arbitrary slice of a 50- to 200-page document.
Why other options are incorrect:
Option A: Linking chunks sequentially until the FM's full 200,000-token context window is consumed leaves no headroom for the prompt or the generated output, so truncated responses persist. This approach also reintroduces the very context-window pressure described in the problem instead of selecting only the content relevant to each query.
Option B: Hierarchical chunking is a reasonable strategy, but retrieving multiple 8,000-token parent chunks can still consume a large share of the context window, and fixed parent/child sizes do not follow the documents' semantic boundaries. More importantly, this approach does not select the most relevant content for each query as precisely as semantic chunking with embedding-similarity scoring does.
Option D: Splitting documents into equal segments based on token count (80% of context window) doesn't respect semantic boundaries. Technical documents often have complex structures where splitting at arbitrary token counts could separate related concepts, leading to poor model performance. Additionally, processing segments independently before aggregation may lose cross-segment context.
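To make the breakpoint-percentile idea behind the correct option concrete, here is a minimal, library-free sketch of semantic chunking. The `toy_embed`-style embedding, the `semantic_chunk` function, and all parameter names are illustrative assumptions for this sketch, not the Amazon Bedrock implementation; a real system would use a proper embedding model.

```python
import math
from typing import Callable, List

def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; larger means the texts are more dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100)."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, math.ceil(pct / 100 * len(ordered)) - 1))
    return ordered[k]

def semantic_chunk(
    sentences: List[str],
    embed: Callable[[str], List[float]],
    breakpoint_pct: float = 95.0,
    buffer_size: int = 3,
) -> List[List[str]]:
    """Split at sentence gaps whose embedding distance reaches the
    breakpoint percentile. Each sentence is embedded together with a
    buffer of neighboring sentences to smooth out local noise."""
    if len(sentences) <= 1:
        return [sentences]
    # Embed each sentence with up to `buffer_size` neighbors on each side.
    windows = [
        " ".join(sentences[max(0, i - buffer_size): i + buffer_size + 1])
        for i in range(len(sentences))
    ]
    vecs = [embed(w) for w in windows]
    gaps = [cosine_distance(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    threshold = percentile(gaps, breakpoint_pct)
    chunks: List[List[str]] = []
    current = [sentences[0]]
    for i, gap in enumerate(gaps):
        # Gaps at or above the percentile threshold are breakpoints.
        if gap >= threshold:
            chunks.append(current)
            current = []
        current.append(sentences[i + 1])
    chunks.append(current)
    return chunks
```

Because the threshold is a high percentile of the observed gaps, only the sharpest topic shifts become chunk boundaries, so related sentences stay together regardless of their token count.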
Key Concepts:
Semantic chunking: splitting documents at natural topic boundaries detected through embedding similarity, controlled by a breakpoint percentile threshold and a sentence buffer size.
RetrieveAndGenerate API: a single Amazon Bedrock call that retrieves the most relevant chunks from a knowledge base and passes them to the FM for generation.
Retrieval-augmented generation (RAG): sending only query-relevant context to the model so that inference payloads stay within the FM's context window regardless of source-document length.
An enterprise application uses an Amazon Bedrock foundation model (FM) to process and analyze 50 to 200 pages of technical documents. Users are experiencing inconsistent responses and receiving truncated outputs when processing documents that exceed the FM's context window limits.
Which solution will resolve this problem?
A
Configure fixed-size chunking at 4,000 tokens for each chunk with 20% overlap. Use application-level logic to link multiple chunks sequentially until the FM's maximum context window of 200,000 tokens is reached before making inference calls.
B
Use hierarchical chunking with parent chunks of 8,000 tokens and child chunks of 2,000 tokens. Use Amazon Bedrock Knowledge Bases built-in retrieval to automatically select relevant parent chunks based on query context. Configure overlap tokens to maintain semantic continuity.
C
Use semantic chunking with a breakpoint percentile threshold of 95% and a buffer size of 3 sentences. Use the RetrieveAndGenerate API to dynamically select the most relevant chunks based on embedding similarity scores.
D
Create a pre-processing AWS Lambda function that analyzes document token count by using the FM's tokenizer. Configure the Lambda function to split documents into equal segments that fit within 80% of the context window. Configure the Lambda function to process each segment independently before aggregating the results.