Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

arXiv · score 0.23

Researchers developed Ternary Mamba, a grouped quantization-aware training method that compresses State Space Models such as Mamba-2 to ternary weights (W1.58, about 1.58 bits per weight) with 16-bit activations (A16). The method shrinks the Mamba-2 1.3B model from 2,687 MB to 744 MB and needs 1,000x fewer tokens than training from scratch. The compressed model reaches 48.1% average zero-shot accuracy across 7 tasks, which makes it a candidate for edge deployment where memory is limited.
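To make "grouped ternary quantization" concrete, here is a minimal sketch in plain Python. It assumes an absmean scaling rule (one scale per group, equal to the group's mean absolute weight), as popularized by BitNet b1.58; the summary does not confirm that Ternary Mamba uses this exact rule. The function names and `group_size` value are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def ternarize_grouped(
    weights: List[float], group_size: int = 4, eps: float = 1e-8
) -> Tuple[List[int], List[float]]:
    """Map a flat list of weights to codes in {-1, 0, +1}, one scale per group."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Assumed rule: absmean scale, i.e. the mean absolute value of the group.
        scale = sum(abs(w) for w in group) / len(group) + eps
        scales.append(scale)
        for w in group:
            # Round to the nearest integer, then clip to the ternary range.
            codes.append(max(-1, min(1, round(w / scale))))
    return codes, scales


def dequantize(codes: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Reconstruct approximate weights from ternary codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


weights = [0.9, -1.1, 0.05, 1.0, 0.01, -0.02, 0.03, 0.0]
codes, scales = ternarize_grouped(weights, group_size=4)
approx = dequantize(codes, scales, group_size=4)
print(codes)
print(scales)
```

Giving each group its own scale lets groups with small weights keep nonzero codes that a single tensor-wide scale would round to zero. In quantization-aware training, a step like this typically runs in the forward pass while gradients flow to full-precision weights through a straight-through estimator; that part is omitted here.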

Key takeaways

Ternary Mamba compresses the Mamba-2 1.3B model from 2,687 MB to 744 MB.
The compressed model achieves 48.1% average zero-shot accuracy across 7 tasks.
The method reduces the training token budget by 1,000x compared to training from scratch.
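The memory figures above can be turned into a compression ratio with a few lines of arithmetic. The sketch below uses only the two sizes quoted in the takeaways; the variable names are illustrative, and the `log2(3)` line simply shows where the "1.58" in W1.58 comes from.

```python
import math

ORIGINAL_MB = 2687.0  # Mamba-2 1.3B size quoted above
TERNARY_MB = 744.0    # Ternary Mamba size quoted above

# How many times smaller the compressed model is.
compression_ratio = ORIGINAL_MB / TERNARY_MB

# Fraction of the original memory that is saved.
memory_saved_fraction = 1.0 - TERNARY_MB / ORIGINAL_MB

# Information content of one three-valued weight, in bits.
bits_per_ternary_weight = math.log2(3)

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"memory saved: {memory_saved_fraction:.1%}")
print(f"bits per ternary weight: {bits_per_ternary_weight:.3f}")
```

The overall ratio of about 3.6x is well below the roughly 10x that would follow from replacing every 16-bit weight with a 1.58-bit one. One plausible reading, not stated in the summary, is that some parameters and the per-group scales stay at higher precision.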

#state-space-models #quantization #model-compression

