
Answer: Cosine Similarity
## Explanation

**Cosine Similarity** is the correct answer because:

1. **Vector-based similarity search**: When both the query and the documents are converted into vectors (embeddings), cosine similarity is commonly used to measure how similar those vectors are.
2. **How it works**: Cosine similarity calculates the cosine of the angle between two vectors in a multi-dimensional space. It ranges from -1 (completely opposite) to 1 (identical), with 0 indicating orthogonality (no similarity).
3. **Why not the other options**:
   - **Decision Trees** (A): Used for classification and regression tasks, not similarity search.
   - **Random Forest** (C): An ensemble method that combines multiple decision trees, also used for classification and regression, not similarity search.
   - **Token Merging** (D): Refers to text-processing techniques such as tokenization or merging subword tokens, not to measuring similarity between vectors.
4. **Real-world application**: This scenario describes a typical **semantic search** or **information retrieval** system, in which documents and queries are embedded into a vector space and cosine similarity finds the most relevant documents by measuring semantic similarity rather than simple keyword matching.
5. **Alternative similarity measures**: Other metrics such as Euclidean distance or the dot product could also be used, but cosine similarity is preferred for text embeddings because it is insensitive to vector magnitude and focuses on direction (semantic meaning).
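The search described in the question can be sketched in a few lines. The example below uses toy 3-dimensional vectors purely for illustration (real embedding models produce vectors with hundreds or thousands of dimensions, and the paper names are hypothetical); the ranking logic is the same:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: (a . b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy "embeddings" for a query and three papers.
query = np.array([1.0, 2.0, 0.0])
papers = {
    "paper_a": np.array([2.0, 4.0, 0.0]),    # same direction as the query -> score 1
    "paper_b": np.array([-1.0, -2.0, 0.0]),  # opposite direction -> score -1
    "paper_c": np.array([0.0, 0.0, 3.0]),    # orthogonal -> score 0
}

# Score every paper against the query and return the closest match.
scores = {name: cosine_similarity(query, vec) for name, vec in papers.items()}
best = max(scores, key=scores.get)
print(best, scores[best])  # paper_a scores highest
```

Note that `paper_a` is twice the length of `query` yet still scores a perfect 1.0: cosine similarity ignores magnitude and compares only direction, which is exactly why it suits text embeddings.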
Author: Jin H
A university uses an AI model to find research papers related to a student's query. The model converts the query and all papers into vectors, then finds the closest match. Which algorithm is used for this similarity search?
A. Decision Trees
B. Cosine Similarity
C. Random Forest
D. Token Merging