models · 2h ago

poolside/Laguna-M.1 · Hugging Face - 225B-A23B

r/LocalLLaMA · score 0.49

poolside has released Laguna M.1, a 225B-parameter Mixture-of-Experts (MoE) model aimed at agentic coding and long-horizon tasks. It activates 23B parameters per token and uses a 70-layer MoE transformer architecture. The router selects 16 of 256 experts for each token (top-k = 16). The model is available on Hugging Face.

Key takeaways

225B total parameters, 23B activated per token
70-layer MoE transformer architecture
256 experts with top-k=16 routing
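
The routing figures above can be illustrated with a minimal sketch of top-k expert selection. Only the counts (256 experts, k = 16, 225B total, 23B active) come from the summary; the function name `route_token`, the random logits, and the softmax-over-selected-logits weighting are assumptions for illustration and may not match how Laguna M.1's router actually works.

```python
import math
import random

NUM_EXPERTS = 256  # from the summary
TOP_K = 16         # from the summary


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only. This is an
    assumed normalization, not a confirmed detail of Laguna M.1.
    """
    # Expert indices sorted by logit, highest first, truncated to k.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = router_logits[top[0]]
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token (seeded so runs are repeatable).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

# Share of experts used per token versus share of parameters active per token.
active_expert_fraction = TOP_K / NUM_EXPERTS   # 16 / 256
active_param_fraction = 23 / 225               # 23B / 225B

print(len(selected), "experts selected")
print(f"experts active: {active_expert_fraction:.2%}")
print(f"parameters active: {active_param_fraction:.2%}")
```

The two fractions differ (about 6% of experts versus about 10% of parameters) because non-expert weights such as attention and embeddings are used for every token regardless of routing; how the remainder breaks down for this model is not stated in the summary.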

#mixture-of-experts #transformer #agentic-coding

