Didn't work with Ollama: out of memory

#46
by AlekseyStart - opened

I have a Ryzen 9 7950X3D with 64 GB RAM and an RTX 3090 with 24 GB VRAM, and I am trying to run DeepSeek-R1-UD-IQ1_S.

llama.cpp works for me; however, the model does not use RAM and VRAM, only the cache and the swap file, so the speed is 1.42 tokens per second:
$ llama-b4872-bin-ubuntu-x64/build/bin/llama-cli --model DeepSeek-R1-UD-IQ1_S.gguf --cache-type-k q4_0 --threads 24 --prio 2 --temp 0.6 --ctx-size 8192 --seed 3407 --n-gpu-layers 7 -no-cnv --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|>"
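
For reference, the IQ1_S file is roughly 130 GB, far more than 64 GB of RAM, so llama.cpp memory-maps it and streams weights from disk; the "cache" being used is the OS page cache, and throughput ends up disk-bound. A quick way to watch where memory actually goes while the prompt is generating (plain shell, nothing llama.cpp-specific; run in a second terminal):

$ nvidia-smi    # VRAM held by the 7 offloaded layers
$ free -h       # "buff/cache" = mmapped GGUF pages resident in RAM
$ vmstat 1      # si/so columns show swap-in/swap-out traffic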

I want to run the model with Ollama instead, so I merged the split GGUF files into one and made a Modelfile:

Modelfile
FROM ./DeepSeek-R1-UD-IQ1.gguf

PARAMETER num_ctx 4096
PARAMETER temperature 0.6
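
As far as I can tell, Ollama does not read an OLLAMA_GPULAYERS environment variable; the documented way to cap offloaded layers is the num_gpu parameter in the Modelfile. A minimal sketch, reusing the file above (num_gpu 7 mirrors the --n-gpu-layers 7 that worked with llama-cli):

FROM ./DeepSeek-R1-UD-IQ1.gguf

PARAMETER num_ctx 4096
PARAMETER num_gpu 7
PARAMETER temperature 0.6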

but I get an error:
$ OLLAMA_GPULAYERS=7 OLLAMA_MMLOCK=0 ollama run DeepSeek-R1-UD-IQ1
Error: llama runner process has terminated: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 6612361216
llama_init_from_model: failed to allocate compute buffers

How do I set up Ollama correctly?
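
In case the flow itself is the issue: the model has to be created from the Modelfile before it can be run, and the cudaMalloc failure above is the ~6.6 GB CUDA compute buffer failing to fit next to the offloaded weights in 24 GB of VRAM, so lowering num_ctx or num_gpu in the Modelfile is the first thing to try. A minimal sketch using the standard ollama commands (the model name is whatever you pass to create):

$ ollama create DeepSeek-R1-UD-IQ1 -f Modelfile
$ ollama run DeepSeek-R1-UD-IQ1 "Create a Flappy Bird game in Python."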

Can't you just add virtual memory?

There is 128 GB of swap; this does not eliminate the error.

Then increase the virtual memory to 512 GB.
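
Swap will not help with this particular error: cudaMalloc allocates GPU VRAM, which cannot be backed by system memory or a page file, so no amount of swap changes the outcome. If the goal is just to get the model running at all, the Ollama docs suggest forcing CPU-only inference by giving CUDA an invalid device ID; it has to be set on the server process, since the server spawns the runner:

$ CUDA_VISIBLE_DEVICES=-1 ollama serve    # invalid ID forces CPU-only
$ ollama run DeepSeek-R1-UD-IQ1           # in a second terminal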
