1sec.ai
research · 18h ago

Pareto Q-Learning with Reward Machines

arXiv · score 0.36

Researchers introduced Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm that combines Pareto Q-Learning with Q-Learning with Reward Machines. PQLRM approximates the Pareto front by maintaining sets of vector-valued Q-estimates, and it exploits the factored automaton structure of the reward signal. By using that structure, the algorithm aims to handle complex reward structures in multi-objective tasks efficiently. The full method is described in the accompanying research paper.
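To make "sets of vector-valued Q-estimates" concrete, here is a minimal Python sketch of the Pareto-dominance filter that Pareto-style methods rely on. It is an illustration, not the paper's implementation: the function names `dominates` and `pareto_front`, the two-objective vectors, and the assumption that every objective is maximized are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates."""
    unique = list(dict.fromkeys(vectors))  # drop exact duplicates, keep order
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


# Hypothetical candidate value vectors for one state-action pair.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.0, 4.0), (3.0, 1.0), (2.0, 4.0)]
front = pareto_front(candidates)
print(front)  # [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]
```

Here `(1.0, 4.0)` is dropped because `(1.0, 5.0)` is at least as good on both objectives and better on one. The three remaining vectors are mutually incomparable trade-offs, which is why a set is kept rather than a single scalar.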

Key takeaways

  • PQLRM combines Pareto Q-Learning and Q-Learning with Reward Machines.
  • It approximates the Pareto front with sets of vector-valued Q-estimates.
  • It exploits the factored automaton structure of the reward signal.
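The "automaton structure of the reward signal" can be pictured with a small reward machine: a finite automaton whose transitions fire on high-level event labels and emit rewards. The Python sketch below is a hedged illustration under stated assumptions, not the paper's code. The class name `RewardMachine`, the labels `"key"` and `"door"`, the machine states `u0` to `u2`, and the two-objective reward vectors (task progress, step cost) are all invented for this example.

```python
from typing import Dict, Tuple

Reward = Tuple[float, float]


class RewardMachine:
    """Finite automaton that maps (machine state, event label) to
    (next machine state, vector reward). Illustrative only."""

    def __init__(self, initial: str,
                 transitions: Dict[Tuple[str, str], Tuple[str, Reward]]) -> None:
        self.initial = initial
        self.transitions = transitions

    def step(self, u: str, label: str) -> Tuple[str, Reward]:
        # Unlisted (state, label) pairs leave the machine where it is
        # and emit a zero reward vector.
        return self.transitions.get((u, label), (u, (0.0, 0.0)))


# Hypothetical task: pick up a key, then open a door.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "key"): ("u1", (0.0, -1.0)),
        ("u1", "door"): ("u2", (10.0, -1.0)),
    },
)

u = rm.initial
total = (0.0, 0.0)
for label in ["none", "key", "none", "door"]:
    u, r = rm.step(u, label)
    total = tuple(t + x for t, x in zip(total, r))

print(u, total)  # u2 (10.0, -2.0)
```

A learner that uses this structure would index its value estimates by the pair (environment state, machine state), so the same location can carry different values before and after the key is collected.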