
A team is training a warehouse robot using Amazon SageMaker RL. They notice that the robot reaches the destination quickly but takes risky shortcuts that lead to frequent collisions. Which adjustment should be made to improve safety?
A. Reduce the number of training episodes
B. Add a higher penalty for collisions in the reward function
C. Increase the learning rate to speed up exploration
D. Remove penalties to avoid discouraging movement
Explanation:
In reinforcement learning (RL), the reward function is crucial for guiding the agent's behavior. The robot is currently prioritizing speed (reaching the destination quickly) over safety (avoiding collisions). This indicates that the current reward function doesn't sufficiently penalize collisions.
Why option B is correct:
Adding a higher penalty for collisions in the reward function will make the robot learn to avoid collisions more effectively
The robot will learn that taking risky shortcuts with collisions results in lower overall rewards
This encourages the robot to find safer paths even if they take slightly longer
Why other options are incorrect:
A: Reduce the number of training episodes - This would likely worsen performance as the robot needs more training to learn safe behaviors
C: Increase the learning rate to speed up exploration - The learning rate controls how quickly the agent updates its policy, not how much it explores; raising it can destabilize training and does nothing to discourage collisions
D: Remove penalties to avoid discouraging movement - This would make the problem worse by encouraging even more risky behavior
In Amazon SageMaker RL, adjusting the reward function is a common technique to shape desired behaviors in reinforcement learning agents.
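To make the reward-shaping idea concrete, the sketch below shows one way a per-step reward might be written for a navigation task like this. The compute_reward function, its reached_goal/collided flags, and the specific bonus and penalty values are illustrative assumptions for this question, not part of the SageMaker RL API.

```python
# Hypothetical reward-shaping sketch: the function and flag names are
# illustrative placeholders, not SageMaker RL or Gym APIs.

GOAL_BONUS = 10.0         # reward for reaching the destination
COLLISION_PENALTY = 50.0  # large penalty so collisions outweigh any time savings
STEP_COST = 0.1           # small per-step cost that still rewards efficient paths

def compute_reward(reached_goal: bool, collided: bool) -> float:
    """Return the reward for a single environment step."""
    reward = -STEP_COST
    if reached_goal:
        reward += GOAL_BONUS
    if collided:
        reward -= COLLISION_PENALTY
    return reward

# A risky shortcut that ends in a collision now scores far worse than a
# slightly longer but collision-free path:
print(compute_reward(reached_goal=True, collided=True))   # 10.0 - 50.0 - 0.1 = -40.1
print(compute_reward(reached_goal=True, collided=False))  # 10.0 - 0.1 = 9.9
```

With the collision penalty large relative to the goal bonus, the highest cumulative reward comes from collision-free routes, which is exactly the behavior option B is meant to encourage.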