
Answer: Self-Attention mechanism
## Explanation

The correct answer is **B. Self-Attention mechanism**.

### Why Self-Attention is correct:

1. **Simultaneous processing**: The self-attention mechanism lets the Transformer model process all words (tokens) in a sentence at once, unlike recurrent neural networks (RNNs), which process words one after another.
2. **Global context**: Self-attention computes an attention score between every pair of words in the sentence, so each word can draw directly on any other word. This lets the model capture long-range dependencies and relationships regardless of distance.
3. **Parallel computation**: Because no position has to wait for the previous one to be processed, training can be parallelized across the whole sequence, which makes Transformers more efficient to train than RNNs.

### Why Recurrent loops are incorrect:

- **A. Recurrent loops** are characteristic of RNNs, which process a sequence word by word rather than simultaneously.
- RNNs suffer from the vanishing gradient problem and struggle with long-range dependencies, whereas Transformers with self-attention handle these much better.

### Additional context:

The Transformer architecture, introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), transformed natural language processing by replacing recurrence with self-attention, enabling more efficient parallel processing and better capture of contextual relationships.
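To make the "every pair of words" idea concrete, here is a minimal sketch of scaled dot-product self-attention, the form used in Vaswani et al. (2017): softmax(QKᵀ / √d_k)·V. To keep it short it assumes Q = K = V = the input embeddings (a real Transformer layer applies learned projection matrices and uses multiple heads), and the token vectors below are made-up values for illustration only.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention with Q = K = V = x.

    Simplification (assumption): no learned projections, a single head.
    Returns (outputs, attention_weights).
    """
    d_k = len(x[0])
    scale = math.sqrt(d_k)
    # Score every (query i, key j) pair: one row of scores per token.
    scores = [[sum(qi * kj for qi, kj in zip(q, k)) / scale for k in x] for q in x]
    # Normalize each row so a token's weights over all tokens sum to 1.
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of ALL value vectors, so every
    # position sees the whole sequence in one step, with no recurrence.
    out = [
        [sum(w * v[c] for w, v in zip(w_row, x)) for c in range(d_k)]
        for w_row in weights
    ]
    return out, weights


# Hypothetical 2-d embeddings for a 3-token sentence.
tokens = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
out, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` is computed independently of the others, so all rows can be evaluated in parallel. An RNN, by contrast, needs the hidden state for token *i − 1* before it can process token *i*, which is the sequential bottleneck described under option A.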
Author: Jin H