Pareto Q-Learning with Reward Machines
Researchers introduced Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm that combines two existing methods: Pareto Q-Learning and Q-Learning with Reward Machines. PQLRM approximates the Pareto front by maintaining sets of vector-valued Q-estimates, and it exploits the factored automaton structure of the reward signal. Together, these are intended to make multi-objective tasks with complex reward structures more efficient to learn. The approach is described in a new research paper.
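To make "sets of vector-valued Q-estimates" concrete, here is a minimal sketch of the Pareto-dominance filter that such sets rely on. It is an illustration, not the paper's implementation: the names `dominates` and `pareto_front`, the two-objective example values, and the assumption that every objective is maximized are all hypothetical.

```python
from typing import Iterable, List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one.

    Assumes all objectives are maximized (an assumption for this sketch).
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(vectors: Iterable[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates."""
    unique = list(dict.fromkeys(vectors))  # drop exact duplicates, keep order
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


# Hypothetical value estimates for one state-action pair, two objectives each.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.4), (1.0, 0.0)]
front = pareto_front(candidates)
print(front)  # (0.4, 0.4) is dropped because (0.5, 0.5) dominates it
```

A learner that keeps a set like `front` per state-action pair, rather than a single scalar, can represent several incomparable trade-offs at once.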
Key takeaways
- PQLRM combines Pareto Q-Learning and Q-Learning with Reward Machines.
- It approximates the Pareto front with sets of vector-valued Q-estimates.
- It exploits the factored automaton structure of the reward signal.
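The automaton structure mentioned in the takeaways can be pictured as a reward machine: a finite-state machine whose transitions fire on high-level events and emit rewards. The sketch below is a toy example under stated assumptions, not the construction used in the paper. The class `RewardMachine`, the state names `u0` to `u2`, the event labels, and the two-objective reward vectors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


@dataclass
class RewardMachine:
    """Toy reward machine that emits a reward vector on each transition."""

    initial: str
    # (machine state, event label) -> (next machine state, reward vector)
    transitions: Dict[Tuple[str, str], Tuple[str, Vector]]
    zero: Vector = (0.0, 0.0)

    def step(self, u: str, label: str) -> Tuple[str, Vector]:
        # Events with no listed transition leave the state unchanged
        # and yield a zero reward vector (an assumption of this sketch).
        return self.transitions.get((u, label), (u, self.zero))


# Hypothetical task: fetch coffee (costs energy), then deliver it to the office.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "coffee"): ("u1", (0.0, -1.0)),
        ("u1", "office"): ("u2", (1.0, 0.0)),
    },
)

labels: List[str] = ["none", "coffee", "none", "office"]
u = rm.initial
total = [0.0, 0.0]
for label in labels:
    u, reward = rm.step(u, label)
    total = [t + r for t, r in zip(total, reward)]

print(u, tuple(total))
```

In Q-learning with reward machines, value estimates are typically indexed by the pair of environment state and machine state, so a learner could keep a separate set of vector estimates for each machine state such as `u0` or `u1`.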