Learning User Simulators with Turing Rewards

arXiv · score 0.33

Researchers propose Turing-RL, a reinforcement learning approach for training user simulator models based on the Turing Test. This method trains large language models to simulate human users by maximizing their ability to fool a human evaluator into thinking they are real. The approach aims to improve simulator realism and usefulness across applications like agent training and personalization evaluation.
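The reward idea can be illustrated with a minimal sketch. The summary does not say how the reward is computed, so everything here is an assumption for illustration: a binary reward (1.0 when the evaluator judges the simulated user to be real, 0.0 otherwise), the hypothetical names `turing_reward`, `fooling_rate`, and `toy_evaluator`, and a trivial rule-based function standing in for the human evaluator.

```python
from typing import Callable, Sequence


def turing_reward(judged_human: bool) -> float:
    """Assumed binary reward: 1.0 if the evaluator believes the user is real."""
    return 1.0 if judged_human else 0.0


def fooling_rate(conversations: Sequence[str],
                 evaluator: Callable[[str], bool]) -> float:
    """Average assumed Turing reward over a batch of simulated conversations."""
    if not conversations:
        return 0.0
    total = sum(turing_reward(evaluator(c)) for c in conversations)
    return total / len(conversations)


def toy_evaluator(conversation: str) -> bool:
    """Hypothetical stand-in for a human judge; returns True for 'looks human'."""
    return "as an ai" not in conversation.lower()


# Two made-up simulator outputs, used only to exercise the functions.
batch = [
    "hey, my order never showed up??",
    "As an AI, I would like to request a refund.",
]
print(fooling_rate(batch, toy_evaluator))
```

In a reinforcement learning setup, a scalar like this would serve as the signal the simulator is trained to maximize; the actual training procedure is described in the original paper, not here.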

Key takeaways

Turing-RL uses a Turing-Test-based reward to train user simulators.
The goal is to improve simulator realism for applications such as agent training and personalization evaluation.
Method trains LLMs to fool human evaluators into thinking they are real users.

#reinforcement-learning #user-simulation #turing-test

Read the original

research · 14h ago
