#on-policy-learning — 1sec.ai

Learning from the Self-future: On-policy Self-distillation for dLLMs

Researchers introduced d-OPSD, an on-policy self-distillation (OPSD) framework designed for diffusion large language models (dLLMs). Existing OPSD methods were built for autoregressive models and do not carry over to dLLMs, whose generation process differs. The new framework adapts OPSD to dLLMs' arbitrary-order generation. This development could help improve the efficiency and performance of dLLMs.

Key takeaways

d-OPSD is the first OPSD framework for diffusion LLMs.
Existing OPSD methods are designed for autoregressive models and are not suitable for dLLMs.
d-OPSD adapts to dLLMs' arbitrary-order generation process.

arXiv #diffusion-llms #self-distillation #on-policy-learning