
A university uses an AI model to find research papers related to a student's query. The model converts the query and all papers into vectors, then finds the closest match. Which algorithm is used for this similarity search?
A. Decision Trees
B. Cosine Similarity
C. Random Forest
D. Token Merging
Explanation:
Cosine Similarity is the correct answer because:
Vector-based similarity search: When both queries and documents are converted into vectors (embeddings), cosine similarity is commonly used to measure the similarity between these vectors.
How it works: Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space. It ranges from -1 to 1, where:
1 indicates vectors pointing in the same direction (they need not have the same magnitude)
0 indicates orthogonal vectors (no similarity)
-1 indicates vectors pointing in opposite directions
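The three boundary cases above can be checked with a minimal sketch. It assumes plain Python lists as vectors and uses only the standard library; the function name `cosine_similarity` and the sample vectors are illustrative choices, not part of the quiz.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Illustrative vectors for the three boundary cases.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])  # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction
```

Up to floating-point rounding, `same` comes out as 1, `orthogonal` as 0, and `opposite` as -1.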
Advantages for text similarity:
Scale-invariant: Only considers the direction, not the magnitude of vectors
Effective for high-dimensional data: Works well with word embeddings and document vectors
Common in NLP: Widely used in information retrieval, recommendation systems, and semantic search
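To illustrate the scale-invariance point, the sketch below compares cosine similarity with Euclidean distance when one vector is a scaled copy of another. The vectors and the names `short_doc`, `long_doc`, and `query` are hypothetical; the idea is that a longer document about the same topic may yield a vector with a larger magnitude but a similar direction.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical vectors: long_doc points the same way as short_doc
# but has 10 times the magnitude.
short_doc = [1.0, 2.0, 3.0]
long_doc = [10.0 * x for x in short_doc]
query = [2.0, 1.0, 3.0]

cos_short = cosine_similarity(query, short_doc)
cos_long = cosine_similarity(query, long_doc)

# Euclidean distance, by contrast, is sensitive to magnitude.
dist_short = math.dist(query, short_doc)
dist_long = math.dist(query, long_doc)
```

Here `cos_short` and `cos_long` agree up to rounding, while `dist_long` is much larger than `dist_short`, which is why a magnitude-sensitive measure would rank the two documents differently.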
Why not the other options:
Decision Trees (A): Used for classification and regression, not similarity search
Random Forest (C): An ensemble method combining multiple decision trees, also for classification/regression
Token Merging (D): Not a standard similarity algorithm; might refer to tokenization or text processing techniques
Real-world application: This scenario describes a typical semantic search (vector similarity search) system, which is commonly implemented using cosine similarity over vector embeddings produced by models such as Word2Vec or by transformer-based models such as BERT.
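The search step in the scenario can be sketched as follows. This is a toy example under stated assumptions: the paper titles and 3-dimensional vectors are made up for illustration (real embedding models produce vectors with hundreds of dimensions), and `rank_papers` is a hypothetical helper that scores every paper against the query rather than using an approximate nearest-neighbor index as a large system might.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank_papers(query_vec, paper_vecs):
    """Return (title, score) pairs sorted from most to least similar."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in paper_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings; in practice these come from an embedding model.
papers = {
    "Paper A": [0.9, 0.1, 0.0],
    "Paper B": [0.1, 0.8, 0.3],
    "Paper C": [0.7, 0.3, 0.1],
}
query = [0.8, 0.2, 0.0]

ranking = rank_papers(query, papers)
best_title, best_score = ranking[0]
```

With these made-up vectors, "Paper A" ranks first because its direction is closest to the query's, followed by "Paper C" and then "Paper B".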
Key takeaway: When working with vector representations of text (embeddings), cosine similarity is a standard measure of semantic similarity and a common way to find the closest matches in vector space.