AWS Certified Cloud Practitioner

What is a key advantage of Transformers over RNN-based models?

Ritesh

Last updated: December 3, 2025 at 18:26

A. They process input sequentially, maintaining word order

B. They rely on convolution filters for speed

C. They allow parallel computation and handle long-term dependencies better

D. They require fewer parameters

Explanation:

Option C is correct. Transformers have several key advantages over RNN-based models:

1. Parallel Computation

RNNs process a sequence one token at a time: the hidden state at step t depends on the hidden state at step t-1, so the time steps cannot be computed in parallel, which makes training slow on long sequences.
During training, Transformers process all tokens in a sequence at once through self-attention, so the computation can be parallelized across positions, which makes training significantly faster.
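The contrast can be sketched in plain Python. This is a minimal illustration, assuming a toy scalar RNN and single-head scaled dot-product self-attention with no learned projections (the input vectors serve directly as queries, keys and values); the function names are hypothetical. The RNN loop cannot compute step t before step t-1, while each attention output reads only the inputs, so the positions can be computed in any order or concurrently.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def rnn_states(xs, w=0.5, u=1.0):
    # Toy scalar RNN: h_t = tanh(w * h_{t-1} + u * x_t).
    # Each step needs the previous hidden state, so this loop is inherently sequential.
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * h + u * x)
        states.append(h)
    return states

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(i, seq):
    # Self-attention output for position i: a softmax-weighted average of all
    # positions, with weights from scaled dot products. It reads only the inputs,
    # never another position's output, so positions are independent of each other.
    d = len(seq[0])
    q = seq[i]
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)]

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

# The RNN must walk the sequence step by step.
states = rnn_states([v[0] for v in seq])

# Attention: computing positions one by one or all concurrently gives the same result.
in_order = [attend(i, seq) for i in range(len(seq))]
with ThreadPoolExecutor() as pool:
    concurrent = list(pool.map(lambda i: attend(i, seq), range(len(seq))))
assert in_order == concurrent
```

The thread pool only demonstrates that the positions are independent; it does not make this Python code faster. Real implementations exploit the same independence by expressing attention for all positions as batched matrix multiplications on GPUs or TPUs.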

2. Better Handling of Long-Term Dependencies

RNNs suffer from vanishing and exploding gradients on long sequences: the gradient is multiplied by the recurrent weights at every time step, so the signal from distant tokens tends to shrink toward zero or blow up, making long-range dependencies difficult to learn.
Transformers use self-attention, which directly connects any two positions in the sequence in a single step regardless of the distance between them, allowing them to capture long-term dependencies more effectively.
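A toy numerical illustration of this difference, assuming a scalar linear RNN (real RNNs use weight matrices and nonlinearities, so this captures only the repeated multiplication) and a single softmax attention step in which the first token is the one relevant to the last position's query. The recurrent weight 0.5 and the match score of 10 are arbitrary assumptions, and the function names are hypothetical.

```python
import math

def rnn_sensitivity(w: float, T: int) -> float:
    # Scalar linear RNN h_t = w * h_{t-1} + x_t with h_0 = 0.
    # h_T is linear in the inputs, so feeding x = (1, 0, ..., 0)
    # yields the exact coefficient of x_1 in h_T, i.e. dh_T/dx_1 = w ** (T - 1).
    h = 0.0
    for t in range(T):
        x_t = 1.0 if t == 0 else 0.0
        h = w * h + x_t
    return h

def attention_weight_to_first(T: int, match_score: float = 10.0) -> float:
    # Softmax attention from the last position: the first key matches the
    # query (score = match_score) and all other keys score 0.
    scores = [match_score] + [0.0] * (T - 1)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return exps[0] / sum(exps)

for T in (5, 20, 50):
    print(T, rnn_sensitivity(0.5, T), round(attention_weight_to_first(T), 4))
```

With w = 0.5 the RNN's influence from the first token decays as 0.5^(T-1), about 1.8e-15 at T = 50, whereas the attention weight on the relevant token is set by content similarity rather than distance and stays above 0.99 at T = 50.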

3. Why the Other Options Are Incorrect

Option A is incorrect: Transformers do NOT process input sequentially; that is a characteristic of RNNs. Transformers recover word order through positional encoding instead.
Option B is incorrect: Transformers do not rely on convolution filters; they use attention mechanisms.
Option D is incorrect: Transformers do not inherently require fewer parameters. In practice, Transformer models typically have MORE parameters than comparable RNNs, largely because of their attention projections and feed-forward networks.
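As a rough, hypothetical comparison, per-layer parameter counts can be worked out for assumed sizes: an LSTM layer with input and hidden size 512, and a Transformer encoder layer in the original design with model width 512 and feed-forward width 2048. Exact counts vary by implementation (for instance, some frameworks give each LSTM gate two bias vectors), so treat this as arithmetic under those assumptions rather than a general rule.

```python
def lstm_layer_params(input_size: int, hidden: int) -> int:
    # 4 gates (input, forget, cell, output), each with an input weight matrix
    # (hidden x input_size), a recurrent matrix (hidden x hidden) and one bias vector.
    return 4 * (hidden * input_size + hidden * hidden + hidden)

def transformer_layer_params(d_model: int, d_ff: int) -> int:
    # Q, K, V and output projections: four d_model x d_model matrices plus biases.
    attention = 4 * (d_model * d_model + d_model)
    # Position-wise feed-forward network: d_model -> d_ff -> d_model, with biases.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer-normalization layers, each with a gain and a bias vector.
    layer_norms = 2 * (2 * d_model)
    return attention + ffn + layer_norms

lstm = lstm_layer_params(512, 512)
transformer = transformer_layer_params(512, 2048)
print(f"LSTM layer:        {lstm:,}")         # 2,099,200
print(f"Transformer layer: {transformer:,}")  # 3,152,384
```

Under these sizes the feed-forward network alone accounts for about two-thirds of the Transformer layer's parameters. Splitting attention into multiple heads does not change the count, since the heads partition the same d_model-wide projections; a smaller d_ff or a larger RNN hidden size would shift the comparison.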

Key Transformer Features:

Self-Attention: Allows the model to weigh the importance of different words in a sequence relative to each other
Positional Encoding: Injects information about word order since Transformers don't process sequentially
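Multi-Head Attention: Runs several attention operations in parallel, each with its own learned projections, so the model can focus on different parts of the sequence (and different kinds of relationships) simultaneously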
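The Positional Encoding entry can be made concrete with the sinusoidal scheme from the original Transformer paper, where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). This is a minimal sketch assuming an even d_model and toy 4-dimensional embeddings; many models use learned or relative position embeddings instead, and the function names are hypothetical.

```python
import math

def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> list[list[float]]:
    # Assumes an even d_model so sin/cos fill the dimensions in pairs.
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = [0.0] * d_model
        for i in range(d_model // 2):
            angle = pos / (10000 ** (2 * i / d_model))
            row[2 * i] = math.sin(angle)      # even dimensions
            row[2 * i + 1] = math.cos(angle)  # odd dimensions
        table.append(row)
    return table

def add_positions(embeddings: list[list[float]]) -> list[list[float]]:
    # Word order enters the model only through this addition: the same token
    # embedding becomes a different vector at each position.
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]

same_token_twice = [[0.1, 0.2, 0.3, 0.4]] * 2
encoded = add_positions(same_token_twice)
print(encoded[0] != encoded[1])  # True: identical tokens now differ by position
```

Without this step, self-attention alone would treat a sentence as an unordered set of tokens, which is why the encoding is added before the first attention layer.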

This parallel processing capability and superior handling of long-range dependencies make Transformers particularly well-suited for large-scale language modeling tasks.

Powered by Gemini-3 Flash