Pareto Q-Learning with Reward Machines

arXiv, score 0.36

Researchers have introduced Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm that combines two existing methods: Pareto Q-Learning and Q-Learning with Reward Machines. PQLRM approximates the Pareto front by maintaining sets of vector-valued Q-estimates, and it exploits the factored automaton structure of the reward signal. The combination targets multi-objective tasks whose rewards have complex structure. The approach is described in a research paper on arXiv.
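To make "approximating the Pareto front" concrete, here is a minimal sketch of the non-dominated filter that any set-based method of this kind relies on. It is an illustration, not the paper's implementation: the names `dominates` and `pareto_front` and the sample vectors are assumptions, and all objectives are assumed to be maximized.

```python
from typing import Iterable, List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b in every objective and
    strictly better in at least one (maximization assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(vectors: Iterable[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates."""
    unique = list(dict.fromkeys(vectors))  # drop duplicates, keep first-seen order
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


# Hypothetical two-objective value estimates for one state-action pair.
candidates = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0)]
front = pareto_front(candidates)
print(front)
```

Here `(2.0, 2.0)` is dropped because `(3.0, 3.0)` beats it in both objectives, while the remaining three vectors are mutually incomparable trade-offs and all stay in the set.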

Key takeaways

PQLRM combines Pareto Q-Learning and Q-Learning with Reward Machines.
It approximates the Pareto front with sets of vector-valued Q-estimates.
It exploits the factored automaton structure of the reward signal.
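The "automaton structure of the reward signal" can be pictured as a small finite-state machine that advances on high-level events and emits a reward vector on each transition. The sketch below assumes a hypothetical two-objective task with made-up event labels (`"key"`, `"door"`) and a made-up `RewardMachine` class; it shows only the reward machine itself, not the PQLRM update rule, which the summary does not spell out.

```python
from typing import Dict, Hashable, Tuple

Vector = Tuple[float, ...]


class RewardMachine:
    """Finite-state machine that maps (state, event label) to
    (next state, reward vector). Unlisted pairs leave the state
    unchanged and emit a zero reward vector."""

    def __init__(
        self,
        initial: int,
        transitions: Dict[Tuple[int, Hashable], Tuple[int, Vector]],
        n_objectives: int,
    ) -> None:
        self.initial = initial
        self.transitions = transitions
        self.zero: Vector = (0.0,) * n_objectives

    def step(self, u: int, label: Hashable) -> Tuple[int, Vector]:
        return self.transitions.get((u, label), (u, self.zero))


# Hypothetical task: pick up a key (small cost on objective 2),
# then open a door (large payoff on objective 1).
rm = RewardMachine(
    initial=0,
    transitions={
        (0, "key"): (1, (0.0, -1.0)),
        (1, "door"): (2, (10.0, 0.0)),
    },
    n_objectives=2,
)

u = rm.initial
total: Vector = (0.0, 0.0)
for label in ["door", "key", "door"]:
    u, r = rm.step(u, label)
    total = tuple(t + x for t, x in zip(total, r))

print(u, total)
```

The first `"door"` event does nothing because the machine is still in state 0. A learner that indexes its Q-sets by the pair (environment state, machine state) can therefore distinguish "door before key" from "door after key", even though the environment observation is the same.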

#multi-objective #reinforcement-learning #research

