
Explanation:
Option D is the best choice because it delivers true semantic search with the smallest operational footprint by combining a fully managed embedding service with an automatically scaling vector-capable database. The university's requirement is explicitly semantic: the metadata has no keywords, and the system must match abstracts based on similarity of meaning. This is a direct fit for an embeddings-based approach, where each abstract is converted into a vector representation and searched using vector similarity. Amazon Titan Embeddings in Amazon Bedrock provides a managed way to generate these vectors without hosting or maintaining an ML model, eliminating the operational work of model provisioning, patching, scaling, and lifecycle management.
For storage and retrieval, Amazon Aurora PostgreSQL Serverless with the pgvector extension supports vector storage and similarity search while minimizing infrastructure operations. Aurora Serverless reduces capacity planning and scaling tasks because it automatically adjusts to changes in workload, which suits a university search application with variable usage patterns. With fewer than 1 million files, a PostgreSQL-based vector store is usually simpler to operate than a dedicated search cluster, while still allowing queries that combine vector similarity with filters on metadata stored alongside the vectors.
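To make the pgvector pattern concrete, here is an illustrative schema and query; the table and column names are assumptions, and the embedding dimension must match the chosen model. pgvector's `<=>` operator computes cosine distance, shown here alongside a pure-Python equivalent for clarity:

```python
import math

# Illustrative DDL: pgvector stores each abstract's embedding next to its metadata.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE papers (
    id        bigserial PRIMARY KEY,
    title     text,
    year      int,
    embedding vector(1024)   -- dimension must match the embedding model
);
"""

# Nearest-neighbor query combining semantic similarity with a metadata filter.
QUERY = """
SELECT id, title
FROM papers
WHERE year >= %(min_year)s
ORDER BY embedding <=> %(query_embedding)s
LIMIT 10;
"""

def cosine_distance(a, b):
    """Pure-Python equivalent of pgvector's <=> (cosine distance) operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

A single SQL query therefore handles both the "similarity of meaning" ranking and any structured filtering, which is what keeps this option operationally lean.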
Option A can also enable vector search, but it combines two sources of operational work: a SageMaker AI endpoint for the sentence-transformer model and an OpenSearch Service domain, which introduces concerns such as domain sizing, shard strategy, cluster scaling, and performance tuning for k-NN workloads. Option C swaps OpenSearch for Aurora but still requires deploying and operating a sentence-transformer model endpoint in SageMaker AI, including scaling, monitoring, and model lifecycle management, all of which the managed Titan Embeddings service avoids. Option B does not reliably meet the semantic similarity requirement because topic extraction is not equivalent to embedding-based semantic matching, especially when the metadata lacks keywords and the system must compare abstracts by meaning.
Therefore, D best satisfies semantic search needs with the least operational overhead.
A university is building a search system for digitized academic papers. The system must allow researchers to find papers by semantic similarity of abstracts, not just keyword matching. The digitized files have no keywords or tags in their metadata, so the system must be able to match abstracts based on similarity of meaning. The university wants to minimize operational overhead. The system will have fewer than 1 million digitized files. Which solution will meet these requirements with the LEAST operational overhead?
A
Use Amazon SageMaker AI to deploy a sentence-transformer model endpoint. Use the endpoint to create vector embeddings for each digitized file. Store the embeddings in an Amazon OpenSearch Service domain that has the k-NN plugin enabled.
B
Use Amazon Comprehend to extract topics from the digitized files. Store the topics as metadata in an Amazon OpenSearch Service domain. Use the OpenSearch Service domain to search for papers by topic.
C
Use Amazon SageMaker AI to deploy a sentence-transformer model endpoint. Use the endpoint to create vector embeddings for each digitized file. Store the embeddings in an Amazon Aurora PostgreSQL Serverless database that has the pgvector extension.
D
Use Amazon Titan Embeddings in Amazon Bedrock to create vector representations of the digitized files. Store embeddings in an Amazon Aurora PostgreSQL Serverless database that has the pgvector extension.