#randomness — 1sec.ai

I post-trained a model to reliably roll a die

A Reddit user post-trained a model to roll a die reliably, with each number coming up with roughly equal probability. The post presents this as a way past a common problem in reinforcement learning: models tend to exploit strategies they already know rather than explore new actions. The experiment suggests that post-training can push a model's outputs toward a near-uniform distribution, although roughly equal frequencies are not the same as true randomness. This matters for builders working on applications that need unpredictable behavior.
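One way to put a number on "roughly equal probability" is to collect many rolls and compare the face counts against a fair die. The sketch below is not the Reddit author's evaluation. It is a minimal illustration that assumes a six-sided die and uses Python's own random module as a stand-in for model outputs. The helper names (chi_square_uniform, entropy_bits) are hypothetical.

```python
import math
import random
from collections import Counter

FACES = (1, 2, 3, 4, 5, 6)  # assumption: a standard six-sided die

# Chi-square critical value for 5 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE = 11.07


def chi_square_uniform(rolls):
    """Pearson chi-square statistic of the rolls against a fair die."""
    counts = Counter(rolls)
    expected = len(rolls) / len(FACES)
    return sum((counts.get(f, 0) - expected) ** 2 / expected for f in FACES)


def entropy_bits(rolls):
    """Empirical Shannon entropy in bits; a fair die gives log2(6)."""
    counts = Counter(rolls)
    n = len(rolls)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for model outputs: one near-uniform sampler and one
    # that heavily favours a single face (a "collapsed" policy).
    samples = {
        "near-uniform": [rng.choice(FACES) for _ in range(6000)],
        "collapsed": rng.choices(FACES, weights=[1, 1, 1, 10, 1, 1], k=6000),
    }
    for name, rolls in samples.items():
        stat = chi_square_uniform(rolls)
        verdict = "consistent with fair" if stat < CRITICAL_VALUE else "not fair"
        print(f"{name}: chi2={stat:.2f}, entropy={entropy_bits(rolls):.3f} bits, {verdict}")
```

A statistic below the critical value only means the counts are consistent with a fair die. It says nothing about whether successive rolls are independent of each other, which would need a separate check.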

Key takeaways

Model post-trained to roll a die with roughly equal probability for each number.
Addresses a common RL problem: models exploit known strategies instead of exploring new actions.
Suggests post-training can push a model's outputs toward a near-uniform distribution.

r/LocalLLaMA #reinforcement-learning #randomness #post-training