other · 1d ago

Be wary of Qwen/Claude distillations - they're often worse than the base model

r/LocalLLaMA · score 0.27

A Reddit user warns that Qwen/Claude distillations such as Qwopus (Qwen models fine-tuned on Claude outputs) often perform worse than the base model they start from. The post is meant to inform users, not to criticize the creators of these models. The same issue may apply to other distillations, such as Gemma 4/Claude. Evaluate any such model against its base model before relying on it.

Key takeaways

Qwen/Claude distillations can perform worse than their base models.
The issue may also apply to other distillations, such as Gemma 4/Claude.
The user aims to inform, not to criticize, model creators.

#local-llm #distillation #model-performance

