
Explanation:
C is the correct answer because transformer-based language models fundamentally rely on self-attention mechanisms to capture contextual relationships within input sequences. This architectural innovation allows transformers to:
Process all tokens of a sequence in parallel rather than one at a time, which makes training on parallel hardware far more efficient than with recurrent neural networks (RNNs).
Capture long-range dependencies by allowing each token in the sequence to attend to all other tokens, regardless of their positional distance. This is crucial for understanding complex linguistic structures where meaning depends on relationships between distant words.
Compute attention weights dynamically based on the relevance between tokens, enabling the model to focus on the most important parts of the input when generating outputs.
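The three points above can be illustrated with a minimal sketch of scaled dot-product self-attention. It is a simplification under stated assumptions, not a full transformer layer: the queries, keys, and values are the token vectors themselves (a real model applies learned projections first), the embeddings in `toy_tokens` are made-up numbers, and the function names are hypothetical. Production code would use a tensor library rather than plain Python lists.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for illustration: queries, keys, and values are the token
    vectors themselves; a real transformer applies learned projections first.
    """
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Score this token against every token in the sequence, near or far.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # The output is a relevance-weighted mix of all token vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, tokens))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy embeddings (made-up numbers): tokens 0 and 2 are similar, token 1 differs.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
outputs, weights = self_attention(toy_tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row holds one token's attention weights over the whole sequence. In this toy input, token 0 gives more weight to the similar token 2 than to the adjacent token 1, which shows that relevance, not distance, drives the weighting.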
A is incorrect because transformer-based models do not use convolutional layers as their primary mechanism. While some hybrid architectures exist, the core innovation of transformers is self-attention, not convolutional operations. Convolutional neural networks (CNNs) are better suited for capturing local patterns in grid-like data (e.g., images), not the global contextual relationships essential for language understanding.
B is incorrect because transformer-based models are not limited to text data. While originally designed for natural language processing tasks, transformer architectures have been successfully adapted to other modalities, including images (for example, Vision Transformers), audio, and video.
D is incorrect because transformers do not process data sequences one element at a time in cyclic iterations. This description characterizes recurrent neural networks (RNNs) and their variants (LSTMs, GRUs), which process sequences step by step with hidden states that carry information forward. In contrast, transformers process all tokens in parallel through self-attention, which makes them easier to parallelize on modern hardware and better at capturing long-range dependencies without the vanishing gradient problems that often affect RNNs.
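The contrast between the two processing styles can be sketched with scalar toy models. This is an illustration under assumptions, not either architecture as used in practice: the weights `w_in` and `w_hidden` are arbitrary hypothetical values, the inputs are made-up numbers, and the attention function omits learned projections and positional information.

```python
import math

def rnn_forward(inputs, w_in=0.5, w_hidden=0.9):
    """Toy scalar RNN: each hidden state depends on the previous one.

    w_in and w_hidden are arbitrary illustrative weights, not learned values.
    """
    hidden = 0.0
    states = []
    for x in inputs:  # step t cannot start until step t-1 has finished
        hidden = math.tanh(w_in * x + w_hidden * hidden)
        states.append(hidden)
    return states

def attention_forward(inputs):
    """Toy scalar self-attention: each output reads all inputs directly."""
    def attend(query):
        scores = [query * key for key in inputs]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return sum(e / total * value for e, value in zip(exps, inputs))
    # No output depends on another output, so these calls could run in parallel.
    return [attend(query) for query in inputs]

sequence = [1.0, -0.5, 0.25, 2.0]
print(rnn_forward(sequence))
print(attention_forward(sequence))
```

In `rnn_forward`, information from the first input reaches the last output only by passing through every intermediate hidden state. In `attention_forward`, every output reads every input in a single step, which is why the per-token computations are independent of one another.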
These characteristics make transformers particularly effective for language modeling tasks where understanding context and relationships between words is paramount.
What is a characteristic of transformer-based language models?
A
Transformer-based language models use convolutional layers to apply filters across an input to capture local patterns through filtered views.
B
Transformer-based language models can process only text data.
C
Transformer-based language models use self-attention mechanisms to capture contextual relationships.
D
Transformer-based language models process data sequences one element at a time in cyclic iterations.