research · 1d ago

Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

arXiv · score 0.23

Researchers developed Ternary Mamba, a method that compresses State Space Models such as Mamba-2 to ternary weights with 16-bit activations (W1.58A16) using grouped quantization-aware training. It compresses an existing model rather than training a ternary one from scratch, which cuts the token budget by 1,000x. The compressed Mamba-2 1.3B model shrinks from 2,687 MB to 744 MB and reaches 48.1% average zero-shot accuracy across 7 tasks, which makes it a candidate for edge deployment where memory is limited. Builders can apply the method to fit models into low-memory environments.
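As a rough illustration of what "grouped" ternary quantization can look like, here is a minimal sketch in plain Python. The summary does not describe the paper's exact quantizer, so the absmean scaling rule, the group size, and the helper names `ternarize_group`, `quantize_grouped`, and `dequantize` are all assumptions made for this example, not the authors' implementation.

```python
import math

# A ternary weight takes one of 3 values, so it carries log2(3) bits,
# which is where the "1.58" in W1.58 comes from.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def ternarize_group(weights):
    """Map one group of weights to codes in {-1, 0, +1} plus one scale.

    Assumption: absmean scaling (mean of absolute values). The paper's
    actual rule may differ.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clamp into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def quantize_grouped(weights, group_size):
    """Split a flat weight list into groups and ternarize each one separately."""
    return [
        ternarize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Rebuild approximate weights as code * scale for every group."""
    restored = []
    for codes, scale in groups:
        restored.extend(code * scale for code in codes)
    return restored


# Hypothetical weights: the second group has a much larger magnitude,
# so it benefits from having its own scale.
example_weights = [0.9, -0.1, 0.05, -1.2, 10.0, -10.0, 0.0, 9.0]
example_groups = quantize_grouped(example_weights, group_size=4)
example_restored = dequantize(example_groups)
```

Giving each group its own scale keeps a few large weights in one region from flattening small weights elsewhere to zero. In quantization-aware training, a common pattern is to run the forward pass on the dequantized weights while gradients update full-precision latent weights; whether Ternary Mamba does exactly this is not stated in the summary.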

Key takeaways

  • Ternary Mamba compresses the Mamba-2 1.3B model from 2,687 MB to 744 MB.
  • The compressed model achieves 48.1% average zero-shot accuracy across 7 tasks.
  • The method cuts the training token budget by 1,000x compared to training from scratch.
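
The memory figures above can be turned into a compression ratio with a few lines of arithmetic. The sizes come from the takeaways; the 16-bit baseline weight format is an assumption (the summary only states 16-bit activations), so the average bits-per-weight figure is a rough estimate.

```python
# Sizes quoted in the takeaways above.
BASELINE_MB = 2687
TERNARY_MB = 744

compression_ratio = BASELINE_MB / TERNARY_MB   # about 3.61x smaller
memory_saved = 1 - TERNARY_MB / BASELINE_MB    # about 72% less memory

# Assumption (not stated in the summary): the baseline stores 16 bits per weight.
ASSUMED_BASELINE_BITS = 16
avg_bits_per_weight = ASSUMED_BASELINE_BITS * TERNARY_MB / BASELINE_MB  # about 4.4

print(
    f"{compression_ratio:.2f}x smaller, {memory_saved:.1%} saved, "
    f"~{avg_bits_per_weight:.2f} bits per weight on average"
)
```

Under that assumption the model averages more than 1.58 bits per weight. That would be consistent with some tensors staying in higher precision and with per-group scales adding overhead, but the summary does not break the 744 MB down.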