1sec.ai
other · 14h ago

llama.cpp - how to free up even more space on your GPU

r/LocalLLaMA · score 0.33

According to the post, llama.cpp's RAM usage has improved: memory leaks are gone, and system RAM stays stable while a model such as Qwen3.6-27B-UD-Q5_K_XL runs on the GPU. The author runs an RTX 3090 as an eGPU with --n-gpu-layers 99 --no-mmap --mlock and asks how to reduce GPU memory usage further so that a larger context fits. Options to experiment with include adjusting these parameters, using a smaller weight quantization, or quantizing the KV cache.

Key takeaways

  • llama.cpp has stable RAM usage with no memory leaks.
  • The author reports that running with --n-gpu-layers 99 --no-mmap --mlock avoids regular RAM usage.
  • The author is seeking tips to free up more GPU memory for larger context sizes.
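
To get a feel for how much GPU memory a larger context costs, the KV cache size can be estimated with simple arithmetic. The sketch below is only a rough model: it assumes a standard transformer in which every layer keeps one K and one V vector per token, and the layer count, KV head count, and head dimension are hypothetical placeholders, not the real Qwen3.6-27B configuration. The bytes-per-element figures follow the ggml block layouts for f16, q8_0, and q4_0.

```python
# Bytes per cached element for common llama.cpp KV cache types.
# f16 stores 2 bytes per value; q8_0 packs 32 values into 34 bytes
# (32 int8 values plus one fp16 scale); q4_0 packs 32 values into 18 bytes.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    """Approximate KV cache size: one K and one V vector per layer per token."""
    elements_per_token = 2 * n_layers * n_kv_heads * head_dim
    return int(n_ctx * elements_per_token * BYTES_PER_ELEMENT[cache_type])


# Hypothetical architecture values (assumption, NOT the real model config).
EXAMPLE = dict(n_layers=64, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for cache_type in BYTES_PER_ELEMENT:
        size = kv_cache_bytes(32768, cache_type=cache_type, **EXAMPLE)
        print(f"{cache_type}: {size / 2**30:.2f} GiB at 32768 tokens")
```

With these placeholder values, moving the cache from f16 to q8_0 roughly halves its footprint, which is memory that could go toward a larger context instead. In llama.cpp the cache type and context length are set with options such as --cache-type-k, --cache-type-v, and --ctx-size; check --help on your build for the values it supports, and read the real layer and head counts from the model's metadata before trusting any estimate.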