When we started building RewardGuard, the first question we had to answer was: what's the business model? The answer felt obvious at first — charge for the tool. Build something useful, put it behind a subscription, grow the company.
The more we thought about it, the less comfortable we were with that answer. The problem RewardGuard addresses, reward hacking and misalignment in RL systems, isn't just a problem for well-funded labs. It's a problem for every student training an agent in a Gym environment, every startup iterating on a fine-tuned model, and every researcher who can't afford enterprise tooling. If we kept the detection layer behind a paywall, those people would keep training blind.
So we made a different call: the detection engine is free forever. What we sell is the layer on top of it.
What "Free Detection" Actually Means
RewardGuard is two things in one package. There's the detection layer — the component that ingests reward signals, computes component ratios, applies statistical baselines, and flags anomalies. And there's the action layer — the component that interprets those flags, recommends specific parameter changes, and in premium mode, applies corrections automatically.
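To make "component ratios" and "statistical baselines" concrete, here is a minimal sketch of what a ratio-based detector could look like. It is not RewardGuard's API. The class `ComponentRatioMonitor`, its `warmup` and `z_threshold` parameters, and the choice of a frozen-baseline z-score test are all hypothetical. Each episode's reward components are converted to shares of total reward magnitude. The first `warmup` episodes set a baseline, and any later share that drifts more than `z_threshold` standard deviations from that baseline is flagged.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class ComponentRatioMonitor:
    """Hypothetical component-ratio anomaly detector (illustration only, not RewardGuard's API)."""

    warmup: int = 20          # assumed: episodes used to build the per-component baseline
    z_threshold: float = 3.0  # assumed: flag shares this many std devs from the baseline mean
    history: dict[str, list[float]] = field(default_factory=dict)

    def update(self, components: dict[str, float]) -> list[str]:
        """Record one episode's reward components and return the names of anomalous ones."""
        total = sum(abs(v) for v in components.values())
        if total == 0:
            # Sparse-reward episodes with no signal carry no ratio information.
            return []
        flagged = []
        for name, value in components.items():
            share = abs(value) / total
            past = self.history.setdefault(name, [])
            if len(past) >= self.warmup:
                # The baseline is frozen to the first `warmup` episodes, so a slow
                # drift cannot silently become the new normal.
                baseline = past[: self.warmup]
                mu, sigma = mean(baseline), pstdev(baseline)
                # A zero-variance baseline is skipped rather than flagging every tiny change.
                if sigma > 0 and abs(share - mu) / sigma > self.z_threshold:
                    flagged.append(name)
            past.append(share)
        return flagged


if __name__ == "__main__":
    monitor = ComponentRatioMonitor(warmup=5, z_threshold=3.0)
    # Healthy episodes: the task reward dominates and shaping stays small.
    for i in range(5):
        monitor.update({"task": 1.0, "shaping": 0.1 + 0.01 * i})
    # A hacked episode: the agent farms the shaping term instead of solving the task.
    print(monitor.update({"task": 0.05, "shaping": 2.0}))
```

With only two components, their shares sum to one, so a shift in one also shows up in the other. That is why the hacked episode flags both `task` and `shaping`. A real detector would likely need per-environment thresholds and special handling for sparse rewards, which is the kind of edge case the feedback section below asks about.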
The detection layer is free. Anyone can install it and use it in any project, commercial or otherwise. What we charge for is the action layer (guided rebalancing suggestions and the auto-correction engine), the advanced reporting interface (the live dashboard and multi-run comparison), and priority support.
You should always be able to see what your agent is doing. Detection is a transparency tool — it belongs in the commons. Automated correction and production-grade tooling are where we build a business.
What's Free vs. What's Premium
To be concrete about the boundary:
| Feature | Free | Premium |
|---|---|---|
| Component reward logging | ✓ | ✓ |
| Ratio computation & anomaly detection | ✓ | ✓ |
| Training run reports (JSON/text) | ✓ | ✓ |
| Threshold configuration | ✓ | ✓ |
| CI/CD integration hooks | ✓ | ✓ |
| Guided rebalancing suggestions | — | ✓ |
| Auto-correction engine | — | ✓ |
| Live training dashboard | — | ✓ |
| Multi-run comparison | — | ✓ |
| Priority support & SLA | — | ✓ |
The free tier gives you everything you need to know what's going wrong. The premium tier gives you the tools to fix it faster and at scale.
The Practical Argument for Free Detection
There's a pragmatic reason to keep detection free beyond the idealistic one: widely used tools get better faster. The RL research community includes thousands of people who are deeply familiar with the failure modes we're trying to detect. Some of them will use the free tool and find edge cases we never thought of. Some will suggest detection heuristics for environments we haven't built test cases for. Some will file issues that turn into features.
The result is a better free tool and, because the premium tier depends on the same detection layer, a better premium product too.
What Stays Premium, and Why
The auto-correction engine is not free, and the reason is incentives. If it were free, there would be far less reason to pay for premium, which means far less revenue to fund continued development of either tier. A product that can't sustain itself eventually stagnates.
We've seen this pattern in developer tooling: companies that try to give everything away often end up with great free tools that are abandoned when the funding runs out. Splitting the product this way, with free detection for everyone and a paid tier for production teams, is how you build something that can keep improving.
We want the detection layer to be the standard instrumentation for RL training workflows, the way coverage tools became standard for software testing. That only happens if it's free and accessible.
How to Share Feedback
We actively improve the detection layer based on real-world usage. If you run into something unexpected, reach out:
- New detection heuristics: If you've encountered a reward hacking pattern that existing ratio analysis doesn't catch, email us with a minimal reproducing example. A good description of the failure mode is more valuable than anything else — we can build the detector once we understand it.
- Environment integrations: RewardGuard works with any RL loop by design, but if you hit friction integrating with a specific environment (Gymnasium, PettingZoo, custom sim environments), let us know.
- Bug reports: Edge cases in threshold computation, unexpected behavior with sparse reward environments, issues with specific Python versions — contact us.
Monitoring and transparency in AI training shouldn't be a premium feature. If you're training an RL agent anywhere — in a research lab, a startup, a class project — you should be able to see what your reward signal is actually doing. That's what the free tier is for. Everything else is how we make sure it's still here in five years.