#multi-objective — 1sec.ai

Pareto Q-Learning with Reward Machines

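Researchers have introduced Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm that combines Pareto Q-Learning with Q-Learning with Reward Machines. Instead of learning a single scalar value per state-action pair, PQLRM maintains sets of vector-valued Q-estimates that approximate the Pareto front, the set of outcomes that cannot be improved in one objective without worsening another. It also exploits the factored automaton structure of the reward signal: a reward machine, a finite-state automaton driven by high-level events, determines which rewards the agent receives. The combination is meant to handle complex reward structures in multi-objective tasks efficiently, including rewards that depend on the sequence of events rather than on the current state alone. The approach is described in a new research paper on arXiv.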
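The set-based estimates are easier to picture with a small example. Below is a minimal Python sketch of the backup from the original Pareto Q-Learning formulation: each Q-estimate is a set of reward vectors, built by taking the non-dominated vectors of the successor state and shifting each one by the observed average reward. The helper names (`dominates`, `non_dominated`, `q_set`), the tuple representation, and the numbers are hypothetical, and the sketch leaves out PQLRM's reward-machine component, so it should not be read as the paper's implementation.

```python
from typing import Set, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a Pareto-dominates b (maximization): >= in every objective, > in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(vectors: Set[Vector]) -> Set[Vector]:
    """Keep only the vectors that no other vector in the set dominates."""
    return {v for v in vectors if not any(dominates(w, v) for w in vectors)}


def q_set(avg_reward: Vector, next_nd: Set[Vector], gamma: float) -> Set[Vector]:
    """Set-based backup: Q_set(s, a) = {avg_reward + gamma * v for v in ND(s')}.

    An empty ND(s') (e.g. a terminal successor) is treated as a single zero vector.
    """
    if not next_nd:
        return {tuple(avg_reward)}
    return {tuple(r + gamma * x for r, x in zip(avg_reward, v)) for v in next_nd}


# Hypothetical Q-sets for the two actions available in a successor state s'.
q_next = {
    "left": {(1.0, 0.0), (0.5, 0.5)},
    "right": {(0.0, 1.0), (0.4, 0.4)},
}
# ND(s') is the non-dominated subset of the union over all actions in s'.
nd_next = non_dominated(set().union(*q_next.values()))
print(sorted(nd_next))  # [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]; (0.4, 0.4) is dominated by (0.5, 0.5)
print(sorted(q_set((0.0, 0.1), nd_next, gamma=0.9)))
```

Keeping whole sets rather than one vector per state-action pair is what lets the estimates represent several trade-offs at once: the three surviving vectors above are different compromises between the two objectives, and none is better than another in both.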

Key takeaways

PQLRM combines Pareto Q-Learning and Q-Learning with Reward Machines.
Approximates the Pareto front with sets of vector-valued Q-estimates.
Exploits the factored automaton structure of the reward signal.
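The reward-machine component can be sketched as follows. Q-Learning with Reward Machines pairs each environment state with the automaton's current state and replays every environment transition from all automaton states, producing "counterfactual" experiences. This minimal Python sketch assumes PQLRM uses that structure with reward vectors on the automaton's transitions, which the summary does not spell out. `VectorRewardMachine`, `counterfactual_experiences`, the single-event labels, and the coffee-delivery task are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple

Vector = Tuple[float, ...]
Experience = Tuple[Tuple[Hashable, str], Hashable, Vector, Tuple[Hashable, str]]


@dataclass
class VectorRewardMachine:
    """Hypothetical reward machine whose transitions emit reward vectors (one entry per objective)."""

    states: Tuple[str, ...]
    initial: str
    # delta[(u, label)] = (next RM state, reward vector); missing pairs mean "stay put, zero reward".
    delta: Dict[Tuple[str, str], Tuple[str, Vector]]
    n_objectives: int

    def step(self, u: str, label: str) -> Tuple[str, Vector]:
        return self.delta.get((u, label), (u, (0.0,) * self.n_objectives))


def counterfactual_experiences(
    rm: VectorRewardMachine, s: Hashable, a: Hashable, s_next: Hashable, label: str
) -> List[Experience]:
    """Replay one environment transition (s, a, s_next) from every RM state u.

    Each experience lives in the product space (environment state, RM state).
    """
    experiences = []
    for u in rm.states:
        u_next, reward = rm.step(u, label)
        experiences.append(((s, u), a, reward, (s_next, u_next)))
    return experiences


# Hypothetical two-objective task: deliver coffee to the office (objective 1)
# while avoiding stepping on plants (objective 2).
rm = VectorRewardMachine(
    states=("u0", "u1", "u2"),
    initial="u0",
    delta={
        ("u0", "coffee"): ("u1", (0.0, 0.0)),
        ("u1", "office"): ("u2", (1.0, 0.0)),
        ("u0", "plant"): ("u0", (0.0, -1.0)),
        ("u1", "plant"): ("u1", (0.0, -1.0)),
    },
    n_objectives=2,
)

# One environment step that reaches the office yields one experience per RM state;
# only the copy that starts in u1 (carrying coffee) receives the delivery reward.
for exp in counterfactual_experiences(rm, s=(3, 4), a="east", s_next=(4, 4), label="office"):
    print(exp)
```

Each resulting experience could then drive a set-based Pareto update at its product state `(s, u)`, so a single environment step refreshes the estimates for every automaton state at once. Whether PQLRM shares experience in exactly this way is an assumption of this sketch.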

arXiv #multi-objective #reinforcement-learning #research