research · 14h ago
UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning
A new model-based approach to preference-based RL actively directs exploration by jointly reasoning over uncertainty in the reward, dynamics, and value functions. It aims to improve sample efficiency and to address limitations of existing methods.
Key takeaways
- Introduces a model-based approach to preference-based RL.
- Jointly reasons over uncertainties in reward, dynamics, and value functions.
- Uses active exploration to improve sample efficiency.
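
To make the idea of jointly reasoning over several uncertainties concrete, here is a minimal sketch. It is not UBP2's algorithm, which this summary does not specify. It assumes each uncertainty is approximated by disagreement (standard deviation) across an ensemble of scalar predictions, and that the three are combined by a weighted sum. The names `disagreement`, `exploration_score`, `pick_most_uncertain`, and the `weights` parameter are hypothetical, introduced only for illustration.

```python
from statistics import pstdev


def disagreement(predictions):
    """Population standard deviation across ensemble members' scalar predictions."""
    return pstdev(predictions)


def exploration_score(reward_preds, dynamics_preds, value_preds,
                      weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three ensemble disagreements (an assumed scoring rule)."""
    w_r, w_d, w_v = weights
    return (w_r * disagreement(reward_preds)
            + w_d * disagreement(dynamics_preds)
            + w_v * disagreement(value_preds))


def pick_most_uncertain(candidates, weights=(1.0, 1.0, 1.0)):
    """Return the index of the candidate with the highest exploration score."""
    scores = [exploration_score(c["reward"], c["dynamics"], c["value"], weights)
              for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: each list holds one scalar prediction per ensemble member.
# Real dynamics predictions would be vectors; scalars keep the sketch short.
candidates = [
    {"reward": [1.0, 1.0, 1.0], "dynamics": [0.5, 0.5, 0.5], "value": [2.0, 2.0, 2.0]},
    {"reward": [0.0, 2.0, 1.0], "dynamics": [0.4, 0.6, 0.5], "value": [1.0, 3.0, 2.0]},
]
best = pick_most_uncertain(candidates)
```

In this toy setup the ensembles agree perfectly on the first candidate and disagree on the second, so the second is selected for exploration. Changing `weights` shifts how much each source of uncertainty matters, which is one simple way to read "balancing" them.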