
## Answer
**Correct options: A and C** (Generation of content embeddings; Creation of the search index)
## Detailed Explanation

In a Retrieval Augmented Generation (RAG) pipeline, the goal is to balance computational efficiency with a near real-time user experience. The question asks which steps should be implemented using **offline batch processing** to meet the requirements of daily content updates and near real-time user interaction.

### Analysis of Each Option

**A: Generation of content embeddings** - **CORRECT**
- This involves converting published documents into vector embeddings using a language model (e.g., Amazon Titan, BERT).
- The process is computationally intensive and doesn't need to happen in real time, since content is published daily, not continuously.
- Batch processing generates embeddings for all new content at once, optimizing resource usage and ensuring embeddings are ready before user queries arrive.

**B: Generation of embeddings for user queries** - **INCORRECT**
- This must happen in real time when a user submits a query, to enable immediate retrieval of relevant content.
- Batch processing would introduce unacceptable latency, contradicting the near real-time requirement.

**C: Creation of the search index** - **CORRECT**
- After embeddings are generated, they must be indexed in a vector database (e.g., Amazon OpenSearch Service, FAISS, Pinecone) for efficient similarity search.
- Index creation and updating can run offline in batches whenever new content embeddings are available, typically after the daily content update.
- This keeps the index current without affecting real-time query performance.

**D: Retrieval of relevant content** - **INCORRECT**
- This occurs during user interaction, when the system searches the indexed embeddings for content relevant to the query.
- It must be real-time to maintain near real-time responsiveness.

**E: Response generation for the user** - **INCORRECT**
- This involves using a large language model (LLM) to generate answers based on the retrieved content and the user query.
- It requires real-time processing to provide immediate responses to users.

### Why A and C Are Optimal

1. **Content embedding generation (A)** and **search index creation (C)** are preprocessing steps that depend only on the content itself, not on user interactions.
2. Both are resource-intensive operations that benefit from batch-processing efficiency.
3. Performing them offline ensures that when users query the system, the infrastructure already holds pre-computed embeddings and an up-to-date index, enabling fast retrieval and response.
4. This separation aligns with AWS best practices for RAG architectures, where embedding and indexing are handled as asynchronous, scheduled jobs (e.g., using AWS Lambda, Amazon SageMaker, or AWS Batch), while query embedding, retrieval, and response generation run as real-time services.

### Why the Other Options Are Less Suitable

- Options B, D, and E are inherently real-time components of the RAG pipeline. Implementing them with batch processing would violate the near real-time requirement and cause significant delays in the user experience.
- The publishing company's daily content updates make A and C ideal for batch processing: the jobs can be scheduled to run after each day's publications, keeping the system current without impacting user-facing performance.
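The offline/online split above can be sketched in a few lines of Python. This is a toy illustration, not a production implementation: the `embed` function below is a stand-in hashed bag-of-words vectorizer (a real system would call an embedding model such as Amazon Titan), and the "index" is a plain in-memory list rather than a vector database such as OpenSearch or FAISS. The point is the division of labor: `build_index` runs as a scheduled batch job after each daily publish (steps A and C), while `retrieve` runs per user request (steps B and D).

```python
import hashlib
import math

DIM = 256  # toy embedding dimensionality


def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalized.
    Stands in for a real embedding model in this sketch."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity (vectors are already unit-normalized)."""
    return sum(x * y for x, y in zip(a, b))


# --- Offline batch path (options A and C): scheduled after each daily publish ---
def build_index(documents: dict[str, str]) -> list[tuple[str, list[float]]]:
    """Embed every document and build the search index in one batch job."""
    return [(doc_id, embed(text)) for doc_id, text in documents.items()]


# --- Online real-time path (options B and D): executed per user query ---
def retrieve(index, query: str, k: int = 1) -> list[str]:
    """Embed the query in real time (B), then search the prebuilt index (D)."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


if __name__ == "__main__":
    docs = {
        "a1": "quarterly earnings report for the publishing division",
        "a2": "recipe column apple pie baking tips",
    }
    index = build_index(docs)  # offline, runs once per daily batch
    print(retrieve(index, "apple pie baking"))  # online, runs per request
```

In a real deployment, `build_index` would be triggered on a schedule (e.g., a nightly job) and the resulting index persisted, while `retrieve` (plus LLM response generation, step E) sits behind the user-facing API.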
Author: LeetQuiz Editorial Team
A publishing company has implemented a Retrieval Augmented Generation (RAG) solution to allow users to interact with published content. New content is published daily, and the company aims to deliver a near real-time user experience.
Which two steps in the RAG pipeline should be implemented using offline batch processing to meet these requirements?
A. Generation of content embeddings
B. Generation of embeddings for user queries
C. Creation of the search index
D. Retrieval of relevant content
E. Response generation for the user