1sec.ai
research · 14h ago

UBP2: Uncertainty-Balanced Preference Planning for Efficient Preference-based Reinforcement Learning

arXiv · score 0.33

A new model-based approach to preference-based RL actively directs exploration by jointly reasoning over uncertainties in reward, dynamics, and value functions, improving sample efficiency and addressing the limitations of existing methods.

Key takeaways

  • Introduces a model-based approach to preference-based RL.
  • Jointly reasons over uncertainties in reward, dynamics, and value functions.
  • Uses active exploration to improve sample efficiency.
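
The summary does not say how UBP2 combines its three uncertainty sources, so the following is only a generic sketch of the idea, not the paper's method. It assumes each quantity (reward, dynamics, value) is predicted by a small ensemble, uses the spread of the ensemble as the uncertainty estimate, and scores candidate plans by a weighted sum of those spreads. The names `disagreement`, `exploration_score`, `pick_candidate`, and the `weights` parameter are hypothetical, and dynamics predictions are reduced to scalars for brevity.

```python
from statistics import pstdev


def disagreement(predictions):
    """Population standard deviation across ensemble members' predictions."""
    return pstdev(predictions)


def exploration_score(reward_preds, dynamics_preds, value_preds,
                      weights=(1.0, 1.0, 1.0)):
    """Weighted sum of ensemble disagreements.

    Hypothetical scoring rule for illustration only; the source does not
    specify how the uncertainties are balanced.
    """
    w_r, w_d, w_v = weights
    return (w_r * disagreement(reward_preds)
            + w_d * disagreement(dynamics_preds)
            + w_v * disagreement(value_preds))


def pick_candidate(candidates):
    """Return the index of the candidate plan with the highest score."""
    scores = [exploration_score(**c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Two made-up candidate plans, each with three ensemble predictions per model.
candidates = [
    # All ensemble members agree: nothing left to learn here.
    {"reward_preds": [0.5, 0.5, 0.5],
     "dynamics_preds": [1.0, 1.0, 1.0],
     "value_preds": [2.0, 2.0, 2.0]},
    # Members disagree on all three quantities: worth exploring.
    {"reward_preds": [0.0, 1.0, 0.5],
     "dynamics_preds": [1.0, 1.2, 0.8],
     "value_preds": [2.0, 2.5, 1.5]},
]
best = pick_candidate(candidates)
```

Under these assumptions the planner prefers the second candidate, because its ensembles disagree while the first candidate's agree exactly. Changing `weights` shifts the balance among the three uncertainty sources.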