Tag

#memory-management

Every item tagged memory-management, newest first.

2 items

Lemonade v10.8: auto memory management, cloud offload, Omni improvements, and call your local models as MCP tools

Lemonade v10.8 adds automatic memory management, cloud offload, and Omni improvements. It manages VRAM dynamically, sizes context automatically, and makes switching between models easier. Twenty contributors shipped the release in 7 days. Local models can now also be called as MCP tools.

Key takeaways

20 contributors in 7 days for v10.8 release
Dynamic VRAM management auto-unloads idle models
Automatic context sizing based on available memory and model architecture
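The idle-unload behavior in the takeaways can be pictured with a small sketch. This is not Lemonade's implementation: the class name `IdleModelPool`, its fields, and the timeout policy are all hypothetical, chosen only to illustrate the idea of evicting models that have not been used recently.

```python
import time
from dataclasses import dataclass, field


@dataclass
class IdleModelPool:
    """Hypothetical bookkeeping: tracks loaded models and evicts idle ones."""

    idle_timeout_s: float
    # Model name -> timestamp of last use (assumed structure, not Lemonade's).
    last_used: dict = field(default_factory=dict)

    def touch(self, name, now=None):
        """Record that a model was just loaded or used."""
        self.last_used[name] = time.monotonic() if now is None else now

    def evict_idle(self, now=None):
        """Forget models idle longer than the timeout and return their names."""
        now = time.monotonic() if now is None else now
        idle = [n for n, t in self.last_used.items() if now - t > self.idle_timeout_s]
        for n in idle:
            del self.last_used[n]  # a real server would release VRAM here
        return idle
```

A server loop could call `touch` on every request and `evict_idle` on a timer; passing `now` explicitly makes the policy easy to test without waiting.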

r/LocalLLaMA #local-llm #memory-management #cloud-integration

other 14h

llama.cpp - how to free up even more space on your GPU

llama.cpp's RAM usage has improved: the memory leaks are gone, and models such as Qwen3.6-27B-UD-Q5_K_XL run efficiently on the GPU. The author, who runs a 3090 in an eGPU setup, asks how to reduce memory usage further so that the context size can grow. They currently launch with --n-gpu-layers 99 --no-mmap --mlock. Suggested directions are adjusting these parameters or trying other quantization options.

Key takeaways

llama.cpp has stable RAM usage with no memory leaks.
The --n-gpu-layers 99 --no-mmap --mlock configuration avoids regular RAM usage.
The author seeks tips to free up more memory for larger context sizes.
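How much context fits in the memory left over after the weights can be estimated with a rough sketch. It assumes every layer uses standard attention with a key/value cache; hybrid architectures will differ. The layer and head counts in the usage note are hypothetical, not the real values for the model named above.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2.0):
    """Bytes of KV cache one token needs (f16 cache by default)."""
    # K and V each hold n_kv_heads * head_dim values per layer per token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context(free_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem=2.0):
    """Largest token count whose KV cache fits in free_bytes."""
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return int(free_bytes // per_token)


# Approximate element sizes: f16 is 2 bytes; a q8_0 block stores
# 32 values in 34 bytes, so about 1.0625 bytes per value.
F16_BYTES = 2.0
Q8_0_BYTES = 34 / 32
```

For a hypothetical model with 64 layers, 8 KV heads, and a head dimension of 128, `max_context(4 * 1024**3, 64, 8, 128)` gives the f16 budget for 4 GiB of free VRAM, and passing `bytes_per_elem=Q8_0_BYTES` shows how much a quantized cache would extend it. In llama.cpp the cache types are chosen with --cache-type-k and --cache-type-v, and the context length with --ctx-size; whether a quantized cache is acceptable for a given model is something to test rather than assume.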

rr/LocalLLaMA#local-llm #gpu-optimization #memory-management