
Answer-first summary for fast verification

**Answer: C) Reinforcement learning**
## Explanation

This scenario describes a classic **reinforcement learning** problem because:

1. **Reward/penalty system**: The AI receives rewards for faster routes and penalties for delays, which is the core mechanism of reinforcement learning, where an agent learns through trial-and-error interactions with an environment.
2. **Learning from feedback**: The system improves its decisions based on feedback (rewards and penalties) from previous deliveries, which aligns with the reinforcement learning paradigm of learning optimal policies through environmental feedback.
3. **Sequential decision making**: Route optimization involves making a sequence of decisions (which turns to take, which roads to use) to reach an optimal outcome, which is characteristic of reinforcement learning problems.

**Why the other options are incorrect:**

- **A) Unsupervised learning**: This involves finding patterns in data without explicit labels or feedback, not learning from rewards and penalties.
- **B) Self-supervised learning**: This uses the data itself to create supervisory signals, not external rewards and penalties.
- **D) Transfer learning**: This involves applying knowledge from one domain to another, not learning through reward-based feedback.

Reinforcement learning is particularly well suited to optimization problems where an agent must learn to make sequential decisions that maximize cumulative reward over time, exactly as described in this delivery-route optimization scenario.
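The reward/penalty mechanism described above can be sketched with tabular Q-learning, a standard reinforcement-learning algorithm. This is an illustrative toy, not part of the original question: the four-node road network, travel times, and hyperparameters are all made-up assumptions. Reward is the negative travel time, so faster routes earn higher reward and delays are penalized.

```python
import random

# Hypothetical toy road network (assumption, not from the question):
# nodes 0-3, edge weights are travel times. The agent starts at the
# depot (node 0) and must reach the customer (node 3).
GRAPH = {
    0: {1: 4, 2: 1},  # from the depot: a slow road to 1, a fast road to 2
    1: {3: 1},
    2: {3: 2},
    3: {},            # destination (terminal state)
}

ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.1, 500

# Q-table: Q[state][action] -> estimated cumulative reward
Q = {s: {a: 0.0 for a in nbrs} for s, nbrs in GRAPH.items()}

random.seed(0)
for _ in range(EPISODES):
    state = 0
    while GRAPH[state]:  # repeat until the destination is reached
        # epsilon-greedy: mostly exploit the best-known road, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(list(GRAPH[state]))
        else:
            action = max(Q[state], key=Q[state].get)
        reward = -GRAPH[state][action]  # penalty proportional to travel time
        nxt = action                    # here "action" is the next node
        best_next = max(Q[nxt].values(), default=0.0)
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
        state = nxt

# Greedy rollout of the learned policy
state, route = 0, [0]
while GRAPH[state]:
    state = max(Q[state], key=Q[state].get)
    route.append(state)
print(route)  # → [0, 2, 3]  (total travel time 3, beating 0→1→3 at 5)
```

After enough episodes the Q-values reflect cumulative travel times, so the greedy rollout recovers the fastest route without the network's costs ever being given to the agent directly, only as per-step rewards and penalties.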
Author: Jin H
A logistics company is developing a delivery-route optimization AI that improves decisions by learning from previous deliveries — rewarding faster routes and penalizing delays. Which learning approach best fits this problem?
A) Unsupervised learning
B) Self-supervised learning
C) Reinforcement learning
D) Transfer learning