llama.cpp - how to free up even more space on your GPU
The author reports that llama.cpp's RAM usage has improved: memory leaks are gone, and models such as Qwen3.6-27B-UD-Q5_K_XL now run efficiently on the GPU. They want to reduce memory usage further so they can raise the context size on their eGPU setup with an RTX 3090, and currently run with --n-gpu-layers 99 --no-mmap --mlock. Options worth experimenting with include tuning these parameters or applying more quantization, for example a smaller weight quant or a quantized KV cache (--cache-type-k / --cache-type-v).
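Once the weights are fully offloaded, the KV cache is the main VRAM cost that grows with context size. The sketch below estimates it under stated assumptions: every layer uses standard (grouped-query) attention, compute and scratch buffers are ignored, and the layer/head numbers are hypothetical placeholders, not the real architecture of Qwen3.6-27B (read the actual values from the GGUF metadata or the llama.cpp load log). The bytes-per-element figures follow ggml's block layouts for f16, q8_0 and q4_0.

```python
# Rough KV cache size estimate for llama.cpp-style inference.
# Assumption: every layer uses standard (grouped-query) attention and stores
# one K and one V vector per token; hybrid or linear-attention layers would change this.

# Bytes per element for some ggml KV cache types:
# f16 = 2 bytes; q8_0 = 32 int8 values + one fp16 scale per 32-value block (34/32);
# q4_0 = 32 4-bit values (16 bytes) + one fp16 scale per 32-value block (18/32).
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}
GIB = 1024 ** 3


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    """Bytes needed for the K and V caches of n_ctx tokens."""
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # factor 2 = K and V
    return n_ctx * elems_per_token * BYTES_PER_ELEMENT[cache_type]


def max_context(free_bytes, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    """Largest context whose KV cache fits in free_bytes.

    Ignores compute/scratch buffers, so treat the result as an upper bound.
    """
    per_token = kv_cache_bytes(1, n_layers, n_kv_heads, head_dim, cache_type)
    return int(free_bytes // per_token)


# Hypothetical architecture numbers, NOT taken from the actual model:
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128

for ct in ("f16", "q8_0", "q4_0"):
    size = kv_cache_bytes(32768, N_LAYERS, N_KV_HEADS, HEAD_DIM, ct) / GIB
    print(f"{ct}: {size:.2f} GiB at 32768 tokens")
```

With these placeholder numbers the loop prints 8.00, 4.25 and 2.25 GiB for f16, q8_0 and q4_0, which illustrates why a quantized KV cache roughly doubles (q8_0) or nearly quadruples (q4_0) the context that fits in the same free VRAM.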
Key takeaways
- llama.cpp has stable RAM usage with no memory leaks.
- The --n-gpu-layers 99 --no-mmap --mlock configuration avoids using regular system RAM.
- The author is seeking tips to free up more VRAM for larger context sizes.
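The flags from the post can be combined with a larger context and a quantized KV cache. The sketch below only assembles and prints such a command line; the `llama-server` binary name, the model path and the chosen context size are assumptions, while --n-gpu-layers, --no-mmap and --mlock come from the post. Note that a quantized V cache requires flash attention to be enabled, and the flash-attention flag syntax differs between llama.cpp versions, so check `llama-server --help` for your build.

```python
import shlex

# Hypothetical model path; substitute your own GGUF file.
MODEL_PATH = "Qwen3.6-27B-UD-Q5_K_XL.gguf"


def build_args(model_path, ctx_size, cache_type="q8_0"):
    """Assemble a llama-server command line (binary name is an assumption)."""
    return [
        "llama-server",
        "--model", model_path,
        "--n-gpu-layers", "99",  # from the post: offload every layer to the GPU
        "--no-mmap",             # from the post: load weights instead of memory-mapping them
        "--mlock",               # from the post: ask the OS to keep model memory in RAM
        "--ctx-size", str(ctx_size),
        # Quantized KV cache; a quantized V cache needs flash attention,
        # whose flag syntax depends on the llama.cpp version.
        "--cache-type-k", cache_type,
        "--cache-type-v", cache_type,
    ]


args = build_args(MODEL_PATH, 32768)
print(shlex.join(args))
# To actually launch it: subprocess.run(args, check=True)
```

Building the arguments as a list keeps each flag and value separate, so the same list can be printed for inspection or passed straight to `subprocess.run` without shell quoting issues.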