
**Answer:** B. Self-Attention mechanism
## Explanation

The correct answer is **B. Self-Attention mechanism**.

### Why Self-Attention is Correct:

1. **Simultaneous Processing**: The self-attention mechanism allows the Transformer model to process all words in a sentence simultaneously, unlike recurrent neural networks (RNNs), which process words sequentially.
2. **Global Context**: Self-attention computes attention scores between every pair of words in the sequence, enabling the model to capture long-range dependencies and relationships regardless of the words' positions in the sentence.
3. **Parallelization**: This parallel processing makes Transformers more efficient to train on modern hardware than recurrent architectures.

### Why Recurrent Loops is Incorrect:

- **A. Recurrent loops** are characteristic of RNNs, LSTMs, and GRUs, which process sequences word-by-word in sequential order, not simultaneously.
- Recurrent architectures also suffer from vanishing gradients and struggle to capture long-range dependencies.

### Key Takeaway:

The self-attention mechanism is the core innovation of the Transformer architecture: it enables parallel processing and models the relationships between all words in a sequence simultaneously, making it highly effective for natural language processing tasks. A minimal sketch of the computation follows below.
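To make the "all pairs at once" idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain NumPy. It is a simplification, not a full Transformer layer: it assumes a single head and, for brevity, skips the learned query/key/value projections (the `W_q`, `W_k`, `W_v` matrices of the real architecture), using the raw embeddings directly.

```python
import numpy as np

def self_attention(X):
    """Minimal single-head scaled dot-product self-attention.

    X: (seq_len, d_model) matrix of token embeddings.
    Returns a (seq_len, d_model) matrix where every output row is a
    weighted mix of ALL input rows, computed in one shot (no recurrence).
    """
    d_model = X.shape[-1]
    # Simplification: queries, keys, and values are the embeddings
    # themselves; a real Transformer applies learned projections first.
    Q, K, V = X, X, X
    # Attention scores between every pair of positions: (seq_len, seq_len)
    scores = Q @ K.T / np.sqrt(d_model)
    # Row-wise softmax so each position's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position attends to the whole sequence simultaneously
    return weights @ V

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings
X = np.random.default_rng(0).standard_normal((4, 8))
out = self_attention(X)
print(out.shape)  # (4, 8) -- all positions computed in parallel
```

Note that the score matrix is built with a single matrix multiplication over the whole sequence, which is exactly what makes the computation parallelizable on GPUs, in contrast to an RNN's step-by-step loop over positions.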
Author: Ritesh Yadav