Interactive Demo

Choose Your Demo Mode

Experience reward hacking in real time — from unchecked exploitation to full automated alignment.

Unprotected

Normal Q-Learning

No protection is active. The agent discovers that staying alive pays better than eating food, so it farms survival rewards indefinitely while training silently fails.

  • Reward hacking escalates unchecked
  • No detection or alerts
  • Agent gets stuck in exploit loop
  • Training corrupted permanently
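To see why a Q-learning agent can prefer looping over eating, it helps to compare discounted returns. The sketch below is illustrative only: the reward values, discount factor, and episode lengths are assumptions chosen for the example, not values taken from the demo's snake_env.py.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical numbers, picked only to illustrate the failure mode.
SURVIVAL_REWARD = 1.0   # paid on every step the snake stays alive
FOOD_REWARD = 10.0      # paid once, on the step where food is eaten
GAMMA = 0.99            # discount factor
HORIZON = 500           # step cap for the "loop forever" policy

# Policy A: circle safely until the step cap and never eat.
loop_rewards = [SURVIVAL_REWARD] * HORIZON

# Policy B: chase food, eat once at step 20, then die at step 40.
chase_rewards = [SURVIVAL_REWARD] * 40
chase_rewards[20] += FOOD_REWARD

loop_return = discounted_return(loop_rewards, GAMMA)
chase_return = discounted_return(chase_rewards, GAMMA)

print(f"loop:  {loop_return:.2f}")
print(f"chase: {chase_return:.2f}")
```

Under these assumed numbers the looping policy earns the larger return, so a value-based learner has no incentive to go after food.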
Free Plan

Manual Guard

RewardGuard detects the imbalance and tells you exactly what to fix. Auto-correction is disabled in this mode, so you must apply the fix yourself.

  • Detection & diagnosis included
  • Step-by-step fix instructions
  • Manual adjustment required
  • Only works if you follow the guide
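A detection-and-diagnosis step of this kind could be sketched as a simple ratio check. This is not RewardGuard's actual API: the function names, the threshold of 5.0, and the wording of the message are all hypothetical.

```python
import math
from typing import Optional


def survival_food_ratio(survival_total: float, food_total: float) -> float:
    """Ratio of accumulated survival reward to accumulated food reward."""
    if food_total <= 0:
        # No food reward at all: any survival reward is an unbounded imbalance.
        return math.inf if survival_total > 0 else 0.0
    return survival_total / food_total


def diagnose(survival_total: float, food_total: float,
             critical_ratio: float = 5.0) -> Optional[str]:
    """Return a human-readable fix suggestion, or None if rewards look balanced.

    critical_ratio is an assumed threshold, not a documented default.
    """
    ratio = survival_food_ratio(survival_total, food_total)
    if ratio < critical_ratio:
        return None
    return (f"Reward hacking suspected: survival/food ratio {ratio:.1f} "
            f">= {critical_ratio:.1f}. Lower the survival reward or raise "
            f"the food reward, then retrain.")


print(diagnose(survival_total=100.0, food_total=50.0))  # balanced, prints None
print(diagnose(survival_total=200.0, food_total=0.0))   # never ate, prints a warning
```

The check only reports; in a manual workflow the reward values still have to be changed by hand.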
Recommended
Premium

Auto-Correction

RewardGuard automatically detects reward imbalances and corrects them in real time — no manual work, no blind spots, no failed training.

  • Real-time continuous monitoring
  • Automatic reward rebalancing
  • Full alignment guaranteed
  • Cycle repeats — never misses a drift
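The monitor, rebalance, repeat cycle described above can be sketched as a small loop that records each change with the same fields as an adjustment log (time, parameter, before, after, reason). Everything here is an assumption made for illustration: the `Adjustment` record, the `rebalance` function, the halving rule, and the fixed per-episode statistics are hypothetical and do not describe how RewardGuard works internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Adjustment:
    time: int        # episode index at which the change was made
    parameter: str   # name of the reward weight that was changed
    before: float
    after: float
    reason: str


def rebalance(weights: Dict[str, float], steps_alive: int, foods_eaten: int,
              episode: int, critical_ratio: float = 5.0,
              shrink: float = 0.5) -> Optional[Adjustment]:
    """Halve the survival weight when survival reward dominates food reward."""
    survival_total = weights["survival"] * steps_alive
    food_total = weights["food"] * foods_eaten
    ratio = survival_total / max(food_total, 1e-9)  # guard against zero food
    if ratio < critical_ratio:
        return None
    before = weights["survival"]
    weights["survival"] = before * shrink
    return Adjustment(episode, "survival", before, weights["survival"],
                      f"survival/food ratio {ratio:.1f} >= {critical_ratio:.1f}")


# Hypothetical starting weights and a fixed per-episode behaviour.
weights = {"survival": 1.0, "food": 10.0}
log: List[Adjustment] = []
for episode in range(1, 6):
    adj = rebalance(weights, steps_alive=200, foods_eaten=2, episode=episode)
    if adj is not None:
        log.append(adj)

for adj in log:
    print(adj)
```

The sketch holds the agent's behaviour fixed so the arithmetic stays easy to follow; in a real training run the agent would adapt after each adjustment, which is why the check has to repeat every episode.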

Watch RewardGuard Fix Reward Hacking Live

A snake AI is exploiting its reward function — surviving without eating food. RewardGuard detects the imbalance and auto-corrects it in real time.

snake_env.py — live simulation
Survival: 0 | Food: 0 | Episode: 1
⚠ Reward Hacking Detected — survival/food ratio critical
rewardguard monitor
Adjustment Log (0 adjustments)
Time | Parameter | Before | After | Reason
Waiting for adjustments…

Auto-adjustment is a Premium feature.

Get Premium