
Wasserstein Policy Learning for Distributional Outcomes

The paper studies offline policy learning when outcomes are distribution-valued. Each potential outcome is a probability measure on ℝ, and the reward is obtained by applying a utility functional to that measure, with the Wasserstein distance used in defining the reward. The goal is to learn a policy that maximizes empirical welfare, defined as the sample mean of the resulting scalar rewards under the policy.
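The summary does not state the paper's utility functional or welfare estimator, so the following is only a minimal sketch under explicit assumptions. Each distributional outcome is represented by a sample of draws. The utility is taken to be the negative 2-Wasserstein distance to a fixed target distribution, which is a hypothetical choice. Empirical welfare is estimated by inverse propensity weighting (IPW) in a randomized experiment with known propensity 1/2, and it is maximized over threshold policies 1{x > c}. The sketch relies on the fact that on ℝ the W2 distance equals the L2 distance between quantile functions. The function names (`wasserstein2_to_target`, `utility`, `ipw_welfare`, `learn_threshold_policy`) and all simulation settings are illustrative and do not come from the paper.

```python
import numpy as np

# Midpoint grid on (0, 1) used to approximate integrals over quantile levels.
LEVELS = (np.arange(200) + 0.5) / 200


def wasserstein2_to_target(samples, target):
    """W2 distance between the empirical law of each row of `samples` and `target`.

    On the real line, W2(mu, nu)^2 = int_0^1 (F_mu^{-1}(t) - F_nu^{-1}(t))^2 dt,
    so the distance reduces to an L2 distance between quantile functions.
    """
    q_rows = np.quantile(samples, LEVELS, axis=1)  # shape (len(LEVELS), n)
    q_target = np.quantile(target, LEVELS)         # shape (len(LEVELS),)
    return np.sqrt(np.mean((q_rows - q_target[:, None]) ** 2, axis=0))


def utility(samples, target):
    # Hypothetical utility functional: outcomes closer (in W2) to the target score higher.
    return -wasserstein2_to_target(samples, target)


def ipw_welfare(policy_actions, treatments, propensities, rewards):
    # IPW estimate of mean reward under the policy:
    # (1/n) * sum_i 1{D_i == pi(X_i)} * U_i / P(D_i | X_i)
    match = (policy_actions == treatments).astype(float)
    return float(np.mean(match * rewards / propensities))


def learn_threshold_policy(x, treatments, propensities, rewards, grid):
    # Empirical welfare maximization over the class pi_c(x) = 1{x > c}.
    welfare = [ipw_welfare((x > c).astype(int), treatments, propensities, rewards)
               for c in grid]
    best = int(np.argmax(welfare))
    return float(grid[best]), welfare[best]


# Simulated randomized experiment (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, m = 2000, 50                      # units, draws per outcome distribution
x = rng.uniform(-1.0, 1.0, size=n)   # scalar covariate
d = rng.integers(0, 2, size=n)       # treatment assigned with probability 1/2
propensities = np.full(n, 0.5)
# Outcome law N(mu_i, 1): mu = 1 if untreated, mu = 1 - 2x if treated, so treating
# moves the outcome closer to the N(0, 1) target exactly when 0 < x < 1.
mu = np.where(d == 1, 1.0 - 2.0 * x, 1.0)
samples = rng.normal(mu[:, None], 1.0, size=(n, m))
target = rng.normal(0.0, 1.0, size=5000)

rewards = utility(samples, target)
grid = np.linspace(-1.0, 1.0, 41)
best_c, best_w = learn_threshold_policy(x, d, propensities, rewards, grid)
print(f"threshold c = {best_c:.2f}, estimated welfare = {best_w:.3f}")
```

In this simulation, treatment helps exactly when x > 0, so the learned threshold should fall near 0. Trying a different utility functional only requires changing `utility`. The IPW step stays the same as long as propensities are known or estimated.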

Key takeaways

Offline policy learning studied for distribution-valued outcomes.
Wasserstein distance used to define reward.
Utility functional applied to define reward.

arXiv #causal inference #offline policy learning #distribution-valued outcomes