
Explanation:
Option B is the correct answer because it provides the best balance of meeting all requirements while minimizing development effort.
Semantic Search Capability: By using foundation models from Amazon Bedrock to generate vector embeddings for both restaurant data and user queries, the solution enables true semantic understanding of natural language queries rather than just keyword matching.
Performance Requirements: Amazon OpenSearch Service natively supports vector indexing and k-NN (k-nearest neighbors) search at scale, which can handle the large dataset (20 million restaurants plus 200 million reviews) while achieving sub-500 ms response times.
Development Effort: This solution leverages fully managed services. Amazon Bedrock handles embedding generation without any model hosting, and Amazon OpenSearch Service handles vector indexing and search without custom search infrastructure.
Scalability: OpenSearch Service can scale cost-effectively during peak usage periods with auto-scaling capabilities.
Data Freshness: Embeddings can be regenerated and reindexed on an hourly schedule as restaurant details change, satisfying the freshness requirement.
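The semantic-matching idea behind Option B can be sketched in a few lines: both documents and queries are mapped to vectors, and the closest vectors win. The restaurant names and vector values below are invented toy data (real FM embeddings have hundreds or thousands of dimensions), purely to illustrate why a query can match a restaurant it shares no keywords with.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; values are made up for illustration.
restaurant_vectors = {
    "spicy Sichuan noodle bar": [0.9, 0.1, 0.2],
    "quiet vegan cafe":         [0.1, 0.9, 0.3],
    "late-night taco truck":    [0.4, 0.2, 0.8],
}

# A query like "hot and numbing Chinese food" would embed close to the
# Sichuan restaurant even though it shares no keywords with it.
query_vector = [0.85, 0.15, 0.25]

best_match = max(
    restaurant_vectors,
    key=lambda name: cosine_similarity(query_vector, restaurant_vectors[name]),
)
print(best_match)  # prints "spicy Sichuan noodle bar"
```

In the real solution, the same foundation model produces both the stored embeddings and the query embedding, so they live in the same vector space and this comparison is meaningful.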
Option A: Uses keyword-based search which doesn't support complex natural language queries effectively. It requires extensive custom development for analyzers and relevance tuning.
Option C: While pgvector is a valid approach, it requires significant development effort: implementing vector search in PostgreSQL, tuning approximate indexes for 220M+ records, and building the Lambda infrastructure. PostgreSQL may also struggle to meet the scale and sub-500 ms latency requirements.
Option D: Amazon Bedrock Knowledge Bases is designed for RAG (Retrieval-Augmented Generation) use cases with smaller datasets. It may not be optimized for 220M+ records and may not meet the 500 ms requirement for 95% of queries.
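For contrast, the pgvector path in Option C would hinge on a similarity query like the one sketched below. The table and column names (`restaurants`, `embedding`) are hypothetical, not from the question; `<=>` is pgvector's cosine-distance operator, and at this scale an approximate index (IVFFlat or HNSW) on the embedding column would be essential.

```python
# Hypothetical pgvector k-NN query builder for Option C. The SQL string
# is only constructed here, not executed; running it would require a
# PostgreSQL database with the pgvector extension installed.
def build_knn_sql(limit: int = 10) -> str:
    """Return a parameterized SQL statement that orders restaurants by
    cosine distance (`<=>`) from a query embedding."""
    return (
        "SELECT id, name "
        "FROM restaurants "
        "ORDER BY embedding <=> %(query_embedding)s::vector "
        f"LIMIT {limit}"
    )

sql = build_knn_sql(5)
print(sql)
```

Even with this query in place, the application still needs custom code to generate embeddings, keep them fresh, and tune the index, which is the development effort the question asks to minimize.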
This solution follows AWS best practices for semantic search applications while minimizing custom development through the use of fully managed services.
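To make Option B concrete, the sketch below builds the two OpenSearch artifacts it relies on: a k-NN-enabled index mapping and a k-NN query body. The index and field names are assumptions for illustration, and the dimension must match whatever foundation model generates the embeddings (1024 is used here as a plausible value); only the request bodies are constructed, so no cluster is contacted.

```python
# Sketch of OpenSearch k-NN index mapping and search body for Option B.
# Index/field names and the embedding dimension are illustrative assumptions.
EMBEDDING_DIM = 1024

index_body = {
    "settings": {"index": {"knn": True}},  # enable k-NN on the index
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",       # vector field for k-NN search
                "dimension": EMBEDDING_DIM,  # must match the FM's output size
            },
            "name": {"type": "text"},
        }
    },
}

def knn_query(query_embedding, k=10):
    """Build a k-NN search body for a query embedding produced by the
    same foundation model that embedded the restaurant data."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": query_embedding, "k": k}}},
    }

body = knn_query([0.0] * EMBEDDING_DIM, k=5)
```

At runtime, the application would embed the user's natural language query with the same Bedrock model, pass that vector to `knn_query`, and send the body to the index's `_search` endpoint.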
A company provides a service that helps users from around the world discover new restaurants. The service has 50 million monthly active users. The company wants to implement a semantic search solution across a database that contains 20 million restaurants and 200 million reviews. The company currently stores the data in PostgreSQL.
The solution must support complex natural language queries and return results for at least 95% of queries within 500 ms. The solution must maintain data freshness for restaurant details that update hourly. The solution must also scale cost-effectively during peak usage periods.
Which solution will meet these requirements with the LEAST development effort?
A
Migrate the restaurant data to Amazon OpenSearch Service. Implement keyword-based search rules that use custom analyzers and relevance tuning to find restaurants based on attributes such as cuisine type, features, and location. Create Amazon API Gateway HTTP API endpoints to transform user queries into structured search parameters.
B
Migrate the restaurant data to Amazon OpenSearch Service. Use a foundation model (FM) in Amazon Bedrock to generate vector embeddings from restaurant descriptions, reviews, and menu items. When users submit natural language queries, convert the queries to embeddings by using the same FM. Perform k-nearest neighbors (k-NN) searches to find semantically similar results.
C
Keep the restaurant data in PostgreSQL and implement a pgvector extension. Use a foundation model (FM) in Amazon Bedrock to generate vector embeddings from restaurant data. Store the vector embeddings directly in PostgreSQL. Create an AWS Lambda function to convert natural language queries to vector representations by using the same FM. Configure the Lambda function to perform similarity searches within the database.
D
Migrate restaurant data to an Amazon Bedrock knowledge base by using a custom ingestion pipeline. Configure the knowledge base to automatically generate embeddings from restaurant information. Use the Amazon Bedrock Retrieve API with built-in vector search capabilities to query the knowledge base directly by using natural language input.