
A team is training a warehouse robot using Amazon SageMaker RL. They notice that the robot reaches the destination quickly but takes risky shortcuts that lead to frequent collisions. Which adjustment should be made to improve safety?
A. Reduce the number of training episodes
B. Add a higher penalty for collisions in the reward function
C. Increase the learning rate to speed up exploration
D. Remove penalties to avoid discouraging movement
Explanation:
In reinforcement learning (RL), the reward function is crucial for guiding the agent's behavior. The robot is currently prioritizing speed (reaching the destination quickly) over safety (avoiding collisions). This indicates that the current reward function doesn't sufficiently penalize collisions.
Why option B is correct:
Adding a higher penalty for collisions in the reward function will make the robot learn to avoid collisions more effectively
The robot will learn that taking risky shortcuts with collisions results in lower overall rewards
This encourages the robot to find safer paths even if they take slightly longer
Why other options are incorrect:
A: Reduce the number of training episodes - This would likely worsen performance as the robot needs more training to learn safe behaviors
C: Increase the learning rate to speed up exploration - The learning rate controls how quickly the agent updates its policy, not how much it explores; raising it can destabilize training and does nothing to discourage collisions
D: Remove penalties to avoid discouraging movement - This would make the problem worse by encouraging even more risky behavior
In Amazon SageMaker RL, adjusting the reward function is a common technique to shape desired behaviors in reinforcement learning agents.
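To make the reward-shaping idea concrete, the sketch below shows one way a per-step reward might be written for a navigation task like this. The compute_reward function, its reached_goal/collided flags, and the specific bonus and penalty values are illustrative assumptions for this question, not part of the SageMaker RL API.

```python
# Hypothetical reward-shaping sketch: the function and flag names are
# illustrative placeholders, not SageMaker RL or Gym APIs.

GOAL_BONUS = 10.0         # reward for reaching the destination
COLLISION_PENALTY = 50.0  # large penalty so collisions outweigh any time savings
STEP_COST = 0.1           # small per-step cost that still rewards efficient paths

def compute_reward(reached_goal: bool, collided: bool) -> float:
    """Return the reward for a single environment step."""
    reward = -STEP_COST
    if reached_goal:
        reward += GOAL_BONUS
    if collided:
        reward -= COLLISION_PENALTY
    return reward

# A risky shortcut that ends in a collision now scores far worse than a
# slightly longer but collision-free path:
print(compute_reward(reached_goal=True, collided=True))   # 10.0 - 50.0 - 0.1 = -40.1
print(compute_reward(reached_goal=True, collided=False))  # 10.0 - 0.1 = 9.9
```

With the collision penalty large relative to the goal bonus, the highest cumulative reward comes from collision-free routes, which is exactly the behavior option B is meant to encourage.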