Tag

#self-distillation

Every item tagged self-distillation, newest first.

3 items

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

Researchers propose rubric-conditioned self-distillation, a method for post-training reasoning language models that reduces reliance on expensive and potentially noisy chain-of-thought annotations. Instead of detailed rationales, it uses evaluative feedback to improve the model. The method aims to improve accuracy and efficiency by leveraging verified rewards.

Key takeaways

Rubric-conditioned self-distillation reduces need for chain-of-thought annotations.
Method uses evaluative feedback to improve model performance.
Approach aims to enhance model accuracy and efficiency.

arXiv #reasoning-language-models #post-training #self-distillation

research · 18h

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

Researchers propose ViGOS, a visually grounded on-policy self-distillation framework for multimodal large language models (MLLMs). The method aims to prevent shortcuts in training where the model relies too heavily on text targets rather than image inputs. ViGOS guides the student model to use both visual and textual information effectively. This approach can improve the robustness of MLLMs in tasks that require multimodal understanding.

Key takeaways

ViGOS framework proposed for visually grounded on-policy self-distillation in MLLMs.
Method aims to prevent training shortcuts in which the model relies on text targets instead of image inputs.
Improves robustness in multimodal tasks requiring both visual and textual understanding.

arXiv #multimodal-learning #self-distillation #llms

research · 1d

Learning from the Self-future: On-policy Self-distillation for dLLMs

Researchers introduce d-OPSD, an on-policy self-distillation (OPSD) framework designed for diffusion large language models (dLLMs). Existing OPSD methods were built for autoregressive models and do not carry over to dLLMs, whose generation process differs. The new framework adapts OPSD to the arbitrary-order generation of dLLMs. This development could help improve the efficiency and performance of dLLMs.

Key takeaways

d-OPSD is the first OPSD framework for diffusion LLMs.
Existing OPSD methods are built for autoregressive models and are not suitable for dLLMs.
d-OPSD adapts to dLLMs' arbitrary-order generation process.

arXiv #diffusion-llms #self-distillation #on-policy-learning