Tutorials, case studies, and research notes on reward hacking, RL alignment, and building safer AI systems.
No spam. Deep dives on reward hacking, alignment research, and RL best practices, sent when we publish rather than on a schedule.