
Answer-first summary for fast verification
Answer: Cosine Similarity
## Explanation

**Cosine Similarity** is the correct answer because:

1. **Vector-based similarity search**: When both queries and documents are converted into vectors (embeddings), cosine similarity is the standard way to measure the similarity between those vectors.
2. **How it works**: Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space. It ranges from -1 to 1, where:
   - 1 indicates identical direction (maximally similar vectors)
   - 0 indicates orthogonal vectors (no similarity)
   - -1 indicates opposite vectors
3. **Advantages for text similarity**:
   - **Scale-invariant**: Considers only the direction of the vectors, not their magnitude
   - **Effective for high-dimensional data**: Works well with word embeddings and document vectors
   - **Common in NLP**: Widely used in information retrieval, recommendation systems, and semantic search
4. **Why not the other options**:
   - **Decision Trees (A)**: Used for classification and regression, not similarity search
   - **Random Forest (C)**: An ensemble method combining multiple decision trees, likewise for classification/regression
   - **Token Merging (D)**: Not a standard similarity algorithm; it might loosely refer to tokenization or text-preprocessing techniques
5. **Real-world application**: This scenario describes a typical **semantic search** (vector similarity search) system, commonly implemented with cosine similarity over embeddings produced by models such as BERT, Word2Vec, or other transformer-based models.

**Key takeaway**: When working with vector representations of text (embeddings), cosine similarity is the standard metric for measuring semantic similarity and finding the closest matches in vector space.
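The retrieval step described in the scenario can be sketched in a few lines of plain Python. The paper names and vector values below are made-up toy embeddings for illustration, not the output of any real model:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "paper embeddings" (hypothetical values, not from a real model)
papers = {
    "paper_ml":  [0.9, 0.1, 0.0],
    "paper_bio": [0.1, 0.8, 0.3],
    "paper_db":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # the student's query as a toy vector

# The closest match is the paper whose vector has the highest
# cosine similarity to the query, i.e. the smallest angle to it.
best = max(papers, key=lambda name: cosine_similarity(query, papers[name]))
print(best)
```

Note that only the direction matters: scaling `query` by any positive constant leaves every similarity score, and therefore the ranking, unchanged.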
Author: Ritesh Yadav
A university uses an AI model to find research papers related to a student's query. The model converts the query and all papers into vectors, then finds the closest match. Which algorithm is used for this similarity search?
A. Decision Trees
B. Cosine Similarity
C. Random Forest
D. Token Merging