Learning User Simulators with Turing Rewards
Researchers propose Turing-RL, a reinforcement learning approach, based on the Turing Test, for training user simulator models. The method trains large language models to simulate human users by rewarding them for convincing a human evaluator that their output came from a real person. The approach aims to make simulators more realistic and more useful for applications such as agent training and personalization evaluation.
Key takeaways
- Turing-RL uses a Turing-Test-based reward to train user simulators.
- The goal is to improve simulator realism for applications such as agent training.
- The method trains LLMs to convince human evaluators that the simulated users are real.
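
The Turing-Test-based reward in the takeaways above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the function `turing_reward`, the `Judge` type, and the two toy judges are hypothetical names introduced here, and simple rule-based functions stand in for the human evaluators the summary describes. The sketch assumes the reward is the fraction of judges who label a simulated transcript as human-written, which is one plausible reading of "fooling the evaluator".

```python
from typing import Callable, Sequence

# Hypothetical judge type: returns True if the judge believes the
# transcript was written by a real human user.
Judge = Callable[[str], bool]


def turing_reward(transcript: str, judges: Sequence[Judge]) -> float:
    """Fraction of judges fooled into labeling a simulated transcript as human."""
    if not judges:
        raise ValueError("at least one judge is required")
    fooled = sum(1 for judge in judges if judge(transcript))
    return fooled / len(judges)


# Toy stand-ins for human evaluators (assumptions for illustration only).
def length_judge(transcript: str) -> bool:
    # Treats short messages as more human-like.
    return len(transcript.split()) < 12


def typo_judge(transcript: str) -> bool:
    # Treats informal spellings as a sign of a real user.
    return "teh" in transcript or "pls" in transcript


simulated = "pls cancel teh order, wrong size"
stilted = (
    "I would like to formally request the cancellation "
    "of my most recent order, thank you."
)

judges = [length_judge, typo_judge]
print(turing_reward(simulated, judges))  # 1.0
print(turing_reward(stilted, judges))    # 0.0
```

In a reinforcement learning loop, a scalar like this would serve as the per-episode reward for the simulator policy; how Turing-RL actually collects and aggregates evaluator judgments is not specified in this summary.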