Cheapest way to run GLM 5.x locally that's not a unified memory system?

The discussion explores cost-effective ways to run GLM 5.x models locally, focusing on 4-bit quantization. Users share experiences with CPU-only setups, such as a 56-core Sapphire Rapids ES with DDR5, and with multi-GPU configurations that use partial offloading. The goal is to identify viable options for running models of GLM 5.x's size outside unified memory systems.
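To make the 4-bit sizing question concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need. The function name, the `overhead` parameter, and the parameter count in the usage line are illustrative assumptions, not figures from the thread or GLM 5.x's actual size; KV cache and activations come on top of this.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float, overhead: float = 0.0) -> float:
    """Approximate memory needed for model weights, in GiB.

    n_params        -- total number of parameters (hypothetical input)
    bits_per_weight -- e.g. 4 for 4-bit quantization, 16 for fp16
    overhead        -- assumed fractional extra for scales/zero-points (0.0 = none)
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes * (1 + overhead) / 2**30


if __name__ == "__main__":
    # Placeholder parameter count, chosen only to illustrate the arithmetic.
    hypothetical_params = 100e9
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_memory_gib(hypothetical_params, bits):.1f} GiB")
```

Substituting the real parameter count of the model you plan to run shows quickly whether a given amount of DDR5 or combined VRAM can hold the weights at all.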

Key takeaways

A 56-core Sapphire Rapids ES with DDR5 is a potential CPU-only option for running GLM 5.x locally.
Multi-GPU setups with partial offloading are also being explored.
The discussion is not limited to GLM 5.x; it also applies to similarly sized models.
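Partial offloading, as mentioned in the takeaways above, means keeping as many layers as fit in GPU memory and leaving the rest in system RAM. The following is a minimal sketch of that split under simplifying assumptions: every layer is the same size, layers are placed whole, and a fixed amount of VRAM per GPU is reserved for KV cache and buffers. The function name, the `reserve_gib` default, and all numbers in the usage line are hypothetical.

```python
from typing import List, Tuple


def split_layers(
    n_layers: int,
    layer_gib: float,
    vram_gib_per_gpu: List[float],
    reserve_gib: float = 2.0,
) -> Tuple[List[int], int]:
    """Greedily assign whole layers to GPUs; the remainder stays in CPU RAM.

    Returns (layers placed on each GPU, layers left on the CPU).
    """
    per_gpu = []
    remaining = n_layers
    for vram in vram_gib_per_gpu:
        # Assumed headroom for KV cache and scratch buffers on each card.
        usable = max(vram - reserve_gib, 0.0)
        fit = min(int(usable // layer_gib), remaining)
        per_gpu.append(fit)
        remaining -= fit
    return per_gpu, remaining


if __name__ == "__main__":
    # Illustrative numbers only: 80 equal layers of 4 GiB, two 24 GiB cards.
    gpu_layers, cpu_layers = split_layers(80, 4.0, [24.0, 24.0])
    print("layers per GPU:", gpu_layers, "| layers on CPU:", cpu_layers)
```

The more layers end up on the CPU side, the more the setup behaves like the CPU-only configuration, so system memory capacity still matters in a multi-GPU build.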

#local-llm #hardware #quantization


r/LocalLLaMA · 1d ago


Read at r/LocalLLaMA