#llm-reasoning — 1sec.ai

Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

Researchers propose MAST, a mechanism-guided method for selectively unlearning reasoning behavior that reinforcement learning with verifiable rewards (RLVR) induces in LLMs, with less collateral damage than full-parameter updates. In evaluations on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, MAST preserves performance on the MATH and GSM8K benchmarks. The approach gives builders a way to mitigate negative effects of RLVR on reasoning tasks.
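To make the contrast between a selective update and a full-parameter update concrete, here is a minimal toy sketch. It is not MAST itself: the summary does not describe how MAST selects parameters, so the mask, the function names, and all numeric values below are hypothetical and chosen purely for illustration.

```python
def full_update(params, grads, lr):
    """Plain gradient step applied to every parameter."""
    return [p - lr * g for p, g in zip(params, grads)]


def masked_update(params, grads, mask, lr):
    """Gradient step applied only where mask is True; other parameters stay untouched."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


# Toy values, purely illustrative (not taken from the paper).
params = [0.5, -1.0, 2.0, 0.25]
grads = [0.1, 0.4, -0.2, 0.3]
mask = [False, True, False, False]  # hypothetical: only parameter 1 is targeted

full = full_update(params, grads, lr=0.5)
selective = masked_update(params, grads, mask, lr=0.5)

# Count how many parameters each update actually changed.
changed_full = sum(a != b for a, b in zip(params, full))
changed_selective = sum(a != b for a, b in zip(params, selective))
print(changed_full, changed_selective)
```

The full update moves every parameter, while the masked update leaves everything outside the mask exactly as it was. That is the general intuition behind expecting less collateral damage from a selective method, under the assumption that the targeted parameters can be identified reliably.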

Key takeaways

MAST unlearns RLVR-induced reasoning with less collateral damage than full-parameter updates.
It preserves performance on the MATH and GSM8K benchmarks.
Evaluated on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base models.

arXiv · #llm-reasoning #selective-unlearning #mechanism-guided