
Answer-first summary for fast verification

**Answer: C) Reinforcement learning**
## Explanation

This scenario describes a classic **reinforcement learning** problem because:

1. **Reward/penalty system**: The AI receives rewards for faster routes and penalties for delays, which is the core mechanism of reinforcement learning, where an agent learns through trial-and-error interactions with an environment.
2. **Learning from feedback**: The system improves its decisions based on feedback (rewards and penalties) from previous deliveries, which aligns with the reinforcement learning paradigm of learning optimal policies through environmental feedback.
3. **Sequential decision making**: Route optimization involves making a sequence of decisions (which turns to take, which roads to use) to reach an optimal outcome, which is characteristic of reinforcement learning problems.

**Why the other options are incorrect:**

- **A) Unsupervised learning**: This involves finding patterns in data without explicit labels or feedback, not learning from rewards and penalties.
- **B) Self-supervised learning**: This uses the data itself to create supervisory signals, not external rewards and penalties.
- **D) Transfer learning**: This involves applying knowledge from one domain to another, not learning through reward-based feedback.

Reinforcement learning is particularly well suited to optimization problems where an agent must learn to make sequential decisions that maximize cumulative reward over time, exactly as described in this delivery-route optimization scenario.
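The reward/penalty mechanism described above can be sketched with tabular Q-learning, a standard reinforcement-learning algorithm. This is an illustrative toy, not part of the original question: the four-node road network, travel times, and hyperparameters are all made-up assumptions. Reward is the negative travel time, so faster routes earn higher reward and delays are penalized.

```python
import random

# Hypothetical toy road network (assumption, not from the question):
# nodes 0-3, edge weights are travel times. The agent starts at the
# depot (node 0) and must reach the customer (node 3).
GRAPH = {
    0: {1: 4, 2: 1},  # from the depot: a slow road to 1, a fast road to 2
    1: {3: 1},
    2: {3: 2},
    3: {},            # destination (terminal state)
}

ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.1, 500

# Q-table: Q[state][action] -> estimated cumulative reward
Q = {s: {a: 0.0 for a in nbrs} for s, nbrs in GRAPH.items()}

random.seed(0)
for _ in range(EPISODES):
    state = 0
    while GRAPH[state]:  # repeat until the destination is reached
        # epsilon-greedy: mostly exploit the best-known road, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(list(GRAPH[state]))
        else:
            action = max(Q[state], key=Q[state].get)
        reward = -GRAPH[state][action]  # penalty proportional to travel time
        nxt = action                    # here "action" is the next node
        best_next = max(Q[nxt].values(), default=0.0)
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
        state = nxt

# Greedy rollout of the learned policy
state, route = 0, [0]
while GRAPH[state]:
    state = max(Q[state], key=Q[state].get)
    route.append(state)
print(route)  # → [0, 2, 3]  (total travel time 3, beating 0→1→3 at 5)
```

After enough episodes the Q-values reflect cumulative travel times, so the greedy rollout recovers the fastest route without the network's costs ever being given to the agent directly, only as per-step rewards and penalties.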
Author: Jin H
A logistics company is developing a delivery-route optimization AI that improves decisions by learning from previous deliveries — rewarding faster routes and penalizing delays. Which learning approach best fits this problem?
A) Unsupervised learning
B) Self-supervised learning
C) Reinforcement learning
D) Transfer learning