
Explanation:
Why Option B is correct:
Hierarchical chunking is specifically designed to preserve semantic context across related paragraphs and sections of documents. This approach creates parent-child relationships between chunks: small child chunks are matched during retrieval, and their larger parent chunks are returned to supply the surrounding context. Two aspects of the scenario make it the right fit:
Scientific papers have complex structure: The problem mentions queries spanning methodology, results, and discussion sections. Hierarchical chunking maintains the relationships between these sections by preserving the document hierarchy.
Scale requirements: With 25 million scientific papers, hierarchical chunking can efficiently handle large corpora while maintaining semantic relationships across the entire dataset.
Why other options are incorrect:
Option A (Fixed-size chunking): Arbitrary token-based splitting can break semantic units, especially at section boundaries. A 300-token chunk might cut through the middle of a methodology section, losing context.
Option C (Semantic chunking): While semantic chunking considers content meaning, it doesn't explicitly preserve hierarchical relationships between sections. The buffer size of 1 and 85% threshold may not adequately handle the complex structure of scientific papers.
Option D (No chunking, manual splitting): Not scalable for 25 million documents. Manual splitting is impractical and doesn't guarantee preservation of semantic context across related paragraphs.
Key takeaway: Hierarchical chunking is ideal for documents with complex structures (like scientific papers) where preserving relationships between sections (methodology→results→discussion) is crucial for accurate RAG performance.
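As a rough sketch, Option B's settings map to the chunking configuration accepted by the Bedrock CreateDataSource API. The field names below follow the bedrock-agent API's hierarchical chunking configuration, but treat them as an assumption and verify against the current AWS documentation before use:

```python
# Sketch of the chunking portion of a Bedrock knowledge base data source
# configuration for Option B (field names assumed from the bedrock-agent API).
def hierarchical_chunking_config(parent_tokens=1000, child_tokens=200,
                                 overlap_tokens=50):
    """Build a hierarchical chunking configuration dict."""
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                # First level = parent chunks, second level = child chunks.
                "levelConfigurations": [
                    {"maxTokens": parent_tokens},
                    {"maxTokens": child_tokens},
                ],
                "overlapTokens": overlap_tokens,
            },
        }
    }

config = hierarchical_chunking_config()
print(config["chunkingConfiguration"]["chunkingStrategy"])  # HIERARCHICAL
```

In practice this dict would be nested under `vectorIngestionConfiguration` when calling `create_data_source` on the boto3 `bedrock-agent` client; the surrounding call is omitted here since it requires AWS credentials and an existing knowledge base.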
A pharmaceutical company is developing a Retrieval Augmented Generation (RAG) application that uses an Amazon Bedrock knowledge base. The knowledge base uses Amazon OpenSearch Service as a data source for more than 25 million scientific papers. Users report that the application produces inconsistent answers that cite irrelevant sections of papers when queries span methodology, results, and discussion sections of the papers.
The company needs to improve the knowledge base to preserve semantic context across related paragraphs on the scale of the entire corpus of data.
Which solution will meet these requirements?
A
Configure the knowledge base to use fixed-size chunking. Set a 300-token maximum chunk size and a 10% overlap between chunks. Use an appropriate Amazon Bedrock embedding model.
B
Configure the knowledge base to use hierarchical chunking. Use parent chunks that contain 1,000 tokens and child chunks that contain 200 tokens. Set a 50-token overlap between chunks.
C
Configure the knowledge base to use semantic chunking. Use a buffer size of 1 and a breakpoint percentile threshold of 85% to determine chunk boundaries based on content meaning.
D
Configure the knowledge base not to use chunking. Manually split each document into separate files before ingestion. Apply post-processing reranking during retrieval.