Explanation
The correct answer is B. Self-Attention mechanism.
Why Self-Attention is correct:
- Simultaneous processing: The self-attention mechanism allows the Transformer model to process all words in a sentence simultaneously, unlike recurrent neural networks (RNNs) which process words sequentially.
- Global context: Self-attention computes attention scores between every pair of words in the sentence, enabling the model to capture long-range dependencies and relationships regardless of distance.
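- Parallel computation: Because no position depends on the result of a previous position, all positions can be computed in parallel during training, which makes Transformers faster to train than RNNs on parallel hardware such as GPUs.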
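To make the three points above concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not the full Transformer layer: the function names `softmax` and `self_attention` are made up for this example, the token vectors are toy values, and the queries, keys, and values are taken to be the input vectors themselves, whereas a real Transformer first applies learned projection matrices.

```python
import math

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """X is a list of n token vectors, each of length d.

    Assumption for this sketch: queries, keys, and values are all X itself
    (no learned projections, single head).
    """
    d = len(X[0])
    # One score for every (query, key) pair: an n x n table.
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        for q in X
    ]
    # Each row is normalized independently of the others, so the rows
    # could all be computed at the same time.
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, X)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs

# Three toy "word" vectors (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = self_attention(tokens)
```

Row `i` of `weights` says how strongly word `i` attends to every word in the sentence, including distant ones, and no row needs any other row's result. That independence is what allows the computation to run in parallel.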
Why A. Recurrent loops is incorrect:
- Recurrent loops are characteristic of RNNs, which process a sequence one word at a time rather than simultaneously.
- RNNs suffer from vanishing gradients and struggle with long-range dependencies, whereas self-attention connects any two words directly, so Transformers handle such dependencies much better.
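For contrast, the sequential nature of a recurrent loop can be sketched with a toy scalar RNN. This is only an illustration: `rnn_forward` and the weights `w_in` and `w_rec` are hypothetical names and values chosen for the example, and a real RNN uses weight matrices and vector-valued hidden states.

```python
import math

def rnn_forward(inputs, w_in=0.5, w_rec=0.5):
    """Toy scalar RNN: returns the hidden state after each step.

    w_in and w_rec are made-up weights for illustration only.
    """
    h = 0.0
    states = []
    for x in inputs:
        # Step t needs h from step t-1, so the loop cannot be parallelized
        # across time steps.
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

# A signal at the first position followed by 19 zeros.
with_signal = rnn_forward([1.0] + [0.0] * 19)
without_signal = rnn_forward([0.0] * 20)

# How much the first input still affects the final hidden state.
influence = abs(with_signal[-1] - without_signal[-1])
```

With these weights, `influence` is tiny compared with the first hidden state: the effect of the first input shrinks at every step on its way to the end of the sequence. The same repeated shrinking applies to gradients flowing backward through the loop, which is the intuition behind the vanishing gradient problem.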
Additional context:
The Transformer architecture, introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), revolutionized natural language processing by replacing recurrence with self-attention mechanisms, enabling more efficient parallel processing and better capture of contextual relationships.