AWS Certified AI Practitioner


Explanation:

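The correct answer is C: Transformers allow parallel computation and handle long-term dependencies better. These are their two key advantages over RNN-based models: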

RNNs process sequences sequentially (one token at a time), which makes them slow and difficult to parallelize.
Transformers process all tokens in a sequence simultaneously using self-attention mechanisms, enabling efficient parallel computation on modern hardware (GPUs/TPUs).
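The contrast can be sketched in plain Python. This is a minimal illustration, not a real model. The scalar RNN, its weights `w_x` and `w_h`, and the simplification Q = K = V = X (no learned projections, a single head) are all assumptions made for illustration. The RNN loop must run step by step because each hidden state needs the previous one. Each row of the attention computation depends only on the input, so all rows can be computed at once.

```python
import math

def rnn_forward(xs, w_x=0.5, w_h=0.9):
    """Hypothetical scalar vanilla RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    hs = []
    for x in xs:
        # Strictly sequential: step t cannot start until h_{t-1} is known.
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def attention_weights(X):
    """Scaled dot-product attention weights, simplified to Q = K = X."""
    d = len(X[0])
    # Row i holds token i's weights over every token, near or far.
    # No row depends on another row, so all rows can be computed in parallel.
    return [softmax([dot(q, k) / math.sqrt(d) for k in X]) for q in X]

def self_attention(X):
    """Each output is a weighted average of all value vectors (V = X here)."""
    W = attention_weights(X)
    d = len(X[0])
    return [[sum(w * v[j] for w, v in zip(row, X)) for j in range(d)] for row in W]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token embeddings
print(rnn_forward([1.0, 0.5, -0.5]))
print(self_attention(X))
```

In a real Transformer the per-row list comprehension becomes a single batched matrix multiplication. That operation is what GPUs and TPUs execute efficiently.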

RNNs suffer from vanishing/exploding gradient problems when processing long sequences, making it difficult to capture long-range dependencies.
Transformers use self-attention mechanisms that can directly connect any two positions in the sequence, regardless of distance, allowing them to capture long-term dependencies more effectively.
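A small numeric sketch of the vanishing-gradient effect follows. It uses a hypothetical scalar tanh RNN with assumed weights. The gradient of the last hidden state with respect to the first is a product of one local derivative per time step. When those factors are below 1, the gradient shrinks exponentially with the distance between the two positions. In self-attention, position T reads position 1 directly through a single attention weight, so the path between them does not lengthen as the distance grows.

```python
import math

def grad_last_wrt_first_state(xs, w_x=0.5, w_h=0.9, h0=0.0):
    """d h_T / d h_0 for the hypothetical RNN h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = h0
    grad = 1.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        # Chain rule: d h_t / d h_{t-1} = w_h * (1 - h_t**2), which is below 1 here.
        grad *= w_h * (1.0 - h * h)
    return grad

for T in (5, 20, 50):
    print(T, grad_last_wrt_first_state([1.0] * T))
```

The printed gradients fall by many orders of magnitude as T grows. That is why a plain RNN struggles to learn how early tokens should influence late ones.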

Option A: This describes RNNs, not Transformers. RNNs process input sequentially, while Transformers process all tokens in parallel and track word order with positional encodings.
Option B: Transformers don't use convolution filters; they use attention mechanisms. Convolution filters are used in CNNs.
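Option D: Transformers do not inherently require fewer parameters. At the same hidden size, a Transformer layer (attention projections plus a feed-forward block) usually has more parameters than an RNN layer, and Transformer models typically stack many such layers.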
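A rough per-layer parameter count makes the Option D comparison concrete. This is a simplified sketch. It counts only the weights inside one layer and ignores embeddings and output heads. It assumes d_model = 512 with a feed-forward width of 4 × d_model, matching the base configuration of the original Transformer. The LSTM count assumes one bias vector per gate, although some libraries, such as PyTorch, use two.

```python
def rnn_layer_params(d_in, d_h):
    # Input weights (d_in x d_h) + recurrent weights (d_h x d_h) + bias (d_h).
    return d_in * d_h + d_h * d_h + d_h

def lstm_layer_params(d_in, d_h):
    # Four gates, each with its own input weights, recurrent weights, and one bias.
    return 4 * rnn_layer_params(d_in, d_h)

def transformer_layer_params(d_model, d_ff):
    # Q, K, V, and output projections, each d_model x d_model plus a bias.
    # Splitting into multiple heads does not change this total.
    attention = 4 * (d_model * d_model + d_model)
    # Position-wise feed-forward block: two linear layers with biases.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms, each with a scale vector and a shift vector.
    norms = 2 * (2 * d_model)
    return attention + ffn + norms

d = 512
print("vanilla RNN layer:", rnn_layer_params(d, d))
print("LSTM layer:       ", lstm_layer_params(d, d))
print("Transformer layer:", transformer_layer_params(d, 4 * d))
```

The exact ratio depends on the sizes chosen. With matching hidden sizes, however, a Transformer layer is not the smaller one, so "fewer parameters" is not the advantage being tested.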

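Scalability: Because training parallelizes well, Transformers scale efficiently to larger datasets and model sizes.
Global Context: Self-attention gives every token direct access to every other token in the sequence. An RNN, by contrast, must carry information from distant tokens through many recurrent steps, and that information tends to fade along the way.
Training Efficiency: Parallel processing makes better use of GPUs/TPUs, so training is faster and more efficient.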

These architectural advantages are why Transformers have become the foundation of most state-of-the-art NLP models, such as BERT, GPT, and T5.

Community

JJin

Last updated: April 5, 2026 at 14:03

A. They process input sequentially, maintaining word order (18.2%)
B. They rely on convolution filters for speed (9.1%)
C. They allow parallel computation and handle long-term dependencies better (72.7%)
D. They require fewer parameters