
Answer: To provide sequence order information to the model
Transformers use positional encodings to provide sequence order information to the model. Unlike recurrent neural networks (RNNs) or convolutional neural networks (CNNs), which inherently capture sequence order through their architecture, Transformers process all tokens in parallel with no built-in notion of position. Positional encodings are added to the input embeddings to give the model information about the relative or absolute position of tokens in the sequence. This allows the model to understand word order, which is crucial for many natural language processing tasks where word position affects meaning.

**Key points:**

- Positional encodings can be learned or fixed (like sinusoidal functions)
- They enable the model to distinguish between sequences like "dog bites man" vs "man bites dog"
- Without positional encodings, Transformers would treat all permutations of tokens as equivalent
- This is essential for tasks like translation, text generation, and sequence understanding
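As a concrete illustration of the fixed variant, here is a minimal NumPy sketch of the sinusoidal positional encoding from "Attention Is All You Need" (the function name and dimensions are illustrative choices, not from the original answer):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encodings.

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, np.newaxis]                  # shape (seq_len, 1)
    div_terms = np.exp(np.arange(0, d_model, 2)
                       * (-np.log(10000.0) / d_model))             # shape (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions * div_terms)  # even dimensions
    pe[:, 1::2] = np.cos(positions * div_terms)  # odd dimensions
    return pe

# The encoding is simply added to the token embeddings:
seq_len, d_model = 10, 16
embeddings = np.random.randn(seq_len, d_model)  # placeholder token embeddings
inputs = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Because each position gets a unique pattern of sines and cosines at different frequencies, two permutations of the same tokens produce different inputs, which is exactly why "dog bites man" and "man bites dog" become distinguishable.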
Author: Jin H