
Answer-first summary for fast verification
Answer: Cosine Similarity
## Explanation

**Cosine Similarity** is the correct answer because:

1. **Vector-based similarity search**: When both queries and documents are converted into vectors (embeddings), cosine similarity is the standard way to measure the similarity between those vectors.
2. **How it works**: Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space. It ranges from -1 to 1, where:
   - 1 indicates identical direction (maximally similar vectors)
   - 0 indicates orthogonal vectors (no similarity)
   - -1 indicates opposite vectors
3. **Advantages for text similarity**:
   - **Scale-invariant**: Considers only the direction of the vectors, not their magnitude
   - **Effective for high-dimensional data**: Works well with word embeddings and document vectors
   - **Common in NLP**: Widely used in information retrieval, recommendation systems, and semantic search
4. **Why not the other options**:
   - **Decision Trees (A)**: Used for classification and regression, not similarity search
   - **Random Forest (C)**: An ensemble method combining multiple decision trees, likewise for classification/regression
   - **Token Merging (D)**: Not a standard similarity algorithm; it might loosely refer to tokenization or text-preprocessing techniques
5. **Real-world application**: This scenario describes a typical **semantic search** (vector similarity search) system, commonly implemented with cosine similarity over embeddings produced by models such as BERT, Word2Vec, or other transformer-based models.

**Key takeaway**: When working with vector representations of text (embeddings), cosine similarity is the standard metric for measuring semantic similarity and finding the closest matches in vector space.
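The retrieval step described in the scenario can be sketched in a few lines of plain Python. The paper names and vector values below are made-up toy embeddings for illustration, not the output of any real model:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "paper embeddings" (hypothetical values, not from a real model)
papers = {
    "paper_ml":  [0.9, 0.1, 0.0],
    "paper_bio": [0.1, 0.8, 0.3],
    "paper_db":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # the student's query as a toy vector

# The closest match is the paper whose vector has the highest
# cosine similarity to the query, i.e. the smallest angle to it.
best = max(papers, key=lambda name: cosine_similarity(query, papers[name]))
print(best)
```

Note that only the direction matters: scaling `query` by any positive constant leaves every similarity score, and therefore the ranking, unchanged.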
Author: Ritesh Yadav
A university uses an AI model to find research papers related to a student's query. The model converts the query and all papers into vectors, then finds the closest match. Which algorithm is used for this similarity search?
A. Decision Trees
B. Cosine Similarity
C. Random Forest
D. Token Merging