1sec.ai

Tag

#llm-training

Every item tagged llm-training, newest first.

2 items

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Researchers analyzed token-level entropy dynamics in GRPO (Group Relative Policy Optimization), a reinforcement learning method used in LLM post-training, and found a credit assignment mismatch that causes policy entropy to collapse. They propose STARE, which uses surprisal to reweight advantages at the token level and stabilize policy entropy. This addresses a key limitation of GRPO and enables more stable training of complex reasoning in LLMs. You can apply STARE to improve GRPO's performance in your own LLM training workflows.

Key takeaways
  • GRPO suffers from policy entropy collapse due to a token-level credit assignment mismatch.
  • STARE reweights advantages to stabilize policy entropy in GRPO.
  • STARE improves the stability of complex reasoning training in LLMs.
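
To make the terms in the takeaways concrete, here is a minimal sketch of the quantities involved. The group-relative advantage and the surprisal definition (negative log-probability of the sampled token) are standard; the reweighting rule itself is a hypothetical placeholder, since this summary does not state STARE's actual formula. The function names, the `strength` parameter, and the choice to upweight high-surprisal tokens are all assumptions made for illustration.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def token_surprisals(token_logprobs):
    """Surprisal of each sampled token: -log pi(token | context)."""
    return [-lp for lp in token_logprobs]


def reweight_token_advantages(advantage, token_logprobs, strength=0.5):
    """Spread one sequence-level advantage across tokens with surprisal weights.

    HYPOTHETICAL rule (not taken from the STARE paper): each token's weight is
    1 + strength * (relative deviation of its surprisal from the sequence mean).
    The weights average to 1, so the mean token advantage stays unchanged.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1] to keep weights non-negative")
    s = token_surprisals(token_logprobs)
    mean_s = sum(s) / len(s)
    weights = [1.0 + strength * (x - mean_s) / (mean_s + 1e-8) for x in s]
    return [advantage * w for w in weights]


# Four sampled completions for one prompt, with binary correctness rewards.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# Token log-probabilities of the first completion under the current policy.
token_advantages = reweight_token_advantages(
    advantages[0], [math.log(0.5), math.log(0.25)]
)
```

In plain GRPO every token in a completion receives the same advantage; the sketch only shows where a token-level weight would enter, not how STARE chooses it.
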
other, Apr 26

Databricks ❤️ Hugging Face: up to 40% faster training and tuning of Large Language Models

Databricks and Hugging Face collaborated on optimized LLM training and fine-tuning workflows. The integration enables up to 40% faster training and tuning of large language models. You can deploy these optimized workflows on Databricks' cloud infrastructure. This partnership aims to make large-scale LLM development more efficient.

Key takeaways
  • Up to 40% faster LLM training and tuning.
  • Optimized workflows available on Databricks' cloud infrastructure.
  • Partnership targets efficient large-scale LLM development.