Wasserstein Policy Learning for Distributional Outcomes

arXiv · score 0.36

This work studies offline policy learning for distribution-valued outcomes, where each potential outcome is a probability measure on R. The reward is obtained by applying a utility functional to the potential outcomes, and that functional is defined through the Wasserstein distance, so each distributional outcome is mapped to a scalar. The goal is to learn a policy that maximizes the empirical welfare, defined as the mean of the resulting scalar-valued potential outcomes.
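As a rough illustration of the setup described above, the sketch below scores each distribution-valued outcome with a Wasserstein-based utility and averages the scores into an empirical welfare. It is not the paper's estimator. Several pieces are assumptions made only for this example: each outcome is observed as a finite sample from the underlying measure, the utility is the negative p-Wasserstein distance to a hypothetical reference sample, and welfare is estimated by inverse propensity weighting with known propensities. All function and parameter names are illustrative.

```python
from typing import Callable, Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float], p: float = 2.0) -> float:
    """p-Wasserstein distance between two equal-size empirical samples on R.

    For equal-size samples on the real line, the optimal coupling pairs
    sorted values, so the distance reduces to a mean over order statistics.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return cost ** (1.0 / p)


def utility(outcome: Sequence[float], reference: Sequence[float], p: float = 2.0) -> float:
    """Assumed utility functional: closer to the reference sample is better."""
    return -wasserstein_1d(outcome, reference, p)


def empirical_welfare(
    policy: Callable[[float], int],
    contexts: Sequence[float],
    actions: Sequence[int],
    outcomes: Sequence[Sequence[float]],
    propensities: Sequence[float],
    reference: Sequence[float],
) -> float:
    """Inverse-propensity-weighted mean of scalar rewards (an assumed estimator).

    Only units whose logged action matches the policy's choice contribute,
    each reweighted by 1 / propensity of the logged action.
    """
    total = 0.0
    for x, a, y, e in zip(contexts, actions, outcomes, propensities):
        if policy(x) == a:
            total += utility(y, reference) / e
    return total / len(contexts)


# Toy logged data: two units, binary actions, propensity 0.5 each.
contexts = [0.0, 1.0]
actions = [1, 0]
outcomes = [[0.0, 1.0], [1.0, 2.0]]
propensities = [0.5, 0.5]
reference = [0.0, 1.0]

welfare_always_0 = empirical_welfare(lambda x: 0, contexts, actions, outcomes, propensities, reference)
welfare_always_1 = empirical_welfare(lambda x: 1, contexts, actions, outcomes, propensities, reference)
```

On this toy data, the policy that always picks action 1 matches the unit whose outcome coincides with the reference sample, so it receives the higher estimated welfare of the two constant policies. Learning a policy would then amount to searching a policy class for the maximizer of this quantity.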

Key takeaways

Offline policy learning studied for distribution-valued outcomes.
Wasserstein distance used to define reward.
Utility functional applied to define reward.

#causal inference #offline policy learning #distribution-valued outcomes

