i post-trained a model to reliably roll a die
A Reddit user post-trained a model to reliably roll a die, so that each of the six faces comes up with roughly equal probability. The result addresses a common problem with reinforcement learning (RL): models tend to fall back on strategies they already know earn reward instead of exploring other actions. In a die-rolling task, that shows up as the model repeatedly picking the same few numbers. The experiment suggests that post-training can teach a model to produce outputs that are close to uniformly distributed, as a fair die's should be. This matters for builders whose applications need unpredictable, non-repetitive behavior.
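One way to check a claim like "roughly equal probability for each number" is to sample many rolls and compare the counts against a fair die. The sketch below is an illustration, not the author's evaluation method. It assumes you have already collected the model's answers as integers (the sampling step is omitted), and it uses two standard measures chosen here for illustration: Pearson's chi-square statistic and total variation distance from the uniform distribution. The function name `uniformity_report` and the simulated rolls are hypothetical.

```python
from collections import Counter

FACES = range(1, 7)
# Critical value of the chi-square distribution with 5 degrees of freedom
# (6 faces - 1) at a 5% significance level.
CHI2_CRIT_DF5_ALPHA05 = 11.070


def uniformity_report(rolls):
    """Compare observed die rolls with a fair six-sided die (hypothetical helper)."""
    n = len(rolls)
    if n == 0:
        raise ValueError("need at least one roll")
    counts = Counter(rolls)  # missing faces count as 0
    expected = n / 6
    # Pearson chi-square: sum of (observed - expected)^2 / expected over faces.
    # Answers outside 1..6 still count toward n, so invalid rolls hurt the score.
    chi2 = sum((counts[f] - expected) ** 2 / expected for f in FACES)
    # Total variation distance from uniform: 0 means identical;
    # a die that always shows one face scores 5/6.
    tvd = 0.5 * sum(abs(counts[f] / n - 1 / 6) for f in FACES)
    return {
        "counts": {f: counts[f] for f in FACES},
        "chi2": chi2,
        "tvd": tvd,
        "consistent_with_fair": chi2 < CHI2_CRIT_DF5_ALPHA05,
    }


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Simulated stand-ins for model outputs; real rolls would come from the model.
    fair_like = [rng.randint(1, 6) for _ in range(600)]
    collapsed = [4] * 500 + [3] * 100  # a model that mostly answers 4
    for name, rolls in [("fair-like", fair_like), ("collapsed", collapsed)]:
        r = uniformity_report(rolls)
        print(name, r["counts"], round(r["chi2"], 1),
              round(r["tvd"], 3), r["consistent_with_fair"])
```

A collapsed model produces a very large chi-square value and a TVD far from 0, while a well-calibrated one should fall under the threshold in most runs (at a 5% significance level, even a perfectly fair die fails about one run in twenty). Passing the test only shows the counts are consistent with a fair die; it does not make each individual sample "truly random".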
Key takeaways
- Model post-trained to roll a die with roughly equal probability for each number.
- Addresses a common RL issue: models relying on known strategies instead of exploring new actions.
- Suggests post-training can make a model's outputs close to uniformly distributed, like a fair die.