other · 14h ago

llama.cpp - how to free up even more space on your GPU

r/LocalLLaMA · score 0.33

llama.cpp's RAM usage has improved: the author reports no memory leaks and efficient GPU use with models such as Qwen3.6-27B-UD-Q5_K_XL. They run a 3090 as an eGPU, currently launch with --n-gpu-layers 99 --no-mmap --mlock, and are asking how to reduce memory usage further so that a larger context fits. Options to try include adjusting these parameters or switching to a more aggressive quantization.
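As a rough illustration of "adjusting these parameters", the sketch below assembles a launch command that keeps the three flags from the post and adds a context size and a quantized KV cache. The additions are assumptions rather than the author's setup: `llama-server` as the binary, `model.gguf` as a placeholder path, and `--ctx-size`, `--cache-type-k` and `--cache-type-v` as the extra options. The script only prints the command; it does not run it.

```python
import shlex


def build_llama_args(model_path, ctx_size, cache_type="f16"):
    """Return an argv list for llama-server; nothing is executed here."""
    args = [
        "llama-server",
        "--model", model_path,  # placeholder path, not from the post
        # Flags quoted from the post.
        "--n-gpu-layers", "99",
        "--no-mmap",
        "--mlock",
        # Assumption: an explicit context size, not part of the post.
        "--ctx-size", str(ctx_size),
    ]
    if cache_type != "f16":
        # Assumption: store K and V in a smaller type than the f16 default.
        args += ["--cache-type-k", cache_type, "--cache-type-v", cache_type]
    return args


cmd = build_llama_args("model.gguf", 32768, cache_type="q8_0")
print(shlex.join(cmd))
```

A quantized V cache has typically required flash attention to be enabled, and the spelling of that option has changed between releases, so it is worth checking `llama-server --help` on your build before relying on this.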

Key takeaways

llama.cpp now shows stable RAM usage with no memory leaks, according to the author.
The author's --n-gpu-layers 99 --no-mmap --mlock configuration keeps the model out of regular system RAM.
The author is looking for tips to free up more GPU memory for larger context sizes.
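To see why context size competes with the model for GPU memory, here is a back-of-the-envelope estimate of KV cache size. It is a sketch under stated assumptions: the layer count, KV head count and head dimension are hypothetical placeholders, not the real values for Qwen3.6-27B, and the formula only covers layers that keep a standard attention KV cache. The bytes-per-element figures follow the block layouts of llama.cpp's f16, q8_0 and q4_0 types.

```python
# Bytes per cached element for common llama.cpp cache types.
# q8_0 packs 32 values into 34 bytes; q4_0 packs 32 values into 18 bytes.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    """Estimate KV cache size in bytes for a standard attention stack."""
    # One K vector and one V vector per layer, per KV head, per token.
    elements_per_token = 2 * n_layers * n_kv_heads * head_dim
    return int(n_ctx * elements_per_token * BYTES_PER_ELEMENT[cache_type])


# HYPOTHETICAL architecture values, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 8, 128

for cache_type in ("f16", "q8_0", "q4_0"):
    size = kv_cache_bytes(32768, N_LAYERS, N_KV_HEADS, HEAD_DIM, cache_type)
    print(f"{cache_type}: {size / 2**30:.2f} GiB")
```

With these placeholder numbers, a 32768-token context costs about 6 GiB at f16 and a little over 3 GiB at q8_0, which is the kind of headroom that matters on a 24 GB card. Substitute the values from your model's GGUF metadata to get a usable estimate.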

#local-llm #gpu-optimization #memory-management

Read at r/LocalLLaMA