OOMs on 8 GB GPU, is it normal?

#2
by tanimazsin130 - opened

It gives an OOM error even though use_fp16=True is set. Is this normal? I am running on an 8 GB RTX 3070 graphics card.

same happens to me :/

I use the corpus of BeIR/nq to generate sentences. Here are my test results (use_fp16=True, Linux, A800 GPU); a sketch of the setup follows the list:

  • model.encode(sentences, batch_size=128, max_length=512): 5.9GB / GPU
  • model.encode(sentences, batch_size=200, max_length=512): 7.6GB / GPU
  • model.encode(sentences, batch_size=256, max_length=512): 9.0GB / GPU
  • model.encode(sentences, batch_size=256, max_length=256): 5.7GB / GPU
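
For reference, here is a minimal sketch of how a test like this could be set up. It assumes the FlagEmbedding BGEM3FlagModel wrapper, the BAAI/bge-m3 checkpoint, and the BeIR/nq corpus on the Hub with a "text" field; adjust those to whatever you are actually running.

```python
# Sketch of the setup behind the numbers above. Assumptions: the FlagEmbedding
# BGEM3FlagModel API, the "BAAI/bge-m3" checkpoint, and the BeIR/nq corpus on
# the Hub exposing a "text" column.
from datasets import load_dataset
from FlagEmbedding import BGEM3FlagModel

corpus = load_dataset("BeIR/nq", "corpus", split="corpus")
sentences = corpus["text"][:20_000]  # a slice is enough for a memory test

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
embeddings = model.encode(sentences, batch_size=128, max_length=512)
```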

The default parameters are batch_size=256, max_length=512, so an OOM is expected if you run the examples directly on an 8 GB card. To solve the problem, you have two choices (see the sketch after this list):

  • set a shorter max_length, if your sentences consist mostly of short sequences
  • set a smaller batch_size
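
For example, a minimal sketch for an 8 GB card, with the same assumed wrapper and model id as in the earlier sketch; the exact values that fit depend on your sentence lengths:

```python
# Minimal sketch for an 8 GB card; the BGEM3FlagModel wrapper and model id
# are the same assumptions as above.
from FlagEmbedding import BGEM3FlagModel

sentences = ["your texts go here", "..."]  # stand-in for your corpus
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

# Either knob (or both) keeps peak VRAM well below the defaults
# (batch_size=256, max_length=512) measured above.
embeddings = model.encode(
    sentences,
    batch_size=64,    # smaller batches -> less activation memory
    max_length=256,   # shorter sequences -> less attention memory
)
```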

In my case, model.encode(sentences, batch_size=1, max_length=5000) used 10.5 GB of VRAM.
I'm testing the model for multilingual retrieve and re-rank and it works pretty well, but it demands a lot of VRAM. I don't know if quantization is possible with this model's architecture, but being able to load it in 8-bit would be great.
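
As far as I know the FlagEmbedding wrapper doesn't expose quantized loading, but here is a rough sketch of what 8-bit loading of the underlying encoder could look like through transformers and bitsandbytes. The model id, CLS pooling, and whether embedding quality survives 8-bit are all assumptions, not something confirmed for this model:

```python
# Rough sketch only: load the underlying encoder in 8-bit via transformers +
# bitsandbytes and pool embeddings by hand. This bypasses the FlagEmbedding
# wrapper entirely; model id and CLS pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_id = "BAAI/bge-m3"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

def embed(texts, max_length=512):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt").to(encoder.device)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    emb = hidden[:, 0]  # CLS pooling -- check how your checkpoint actually pools
    return torch.nn.functional.normalize(emb, dim=-1)
```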

I've been trying to evaluate it, and it took 20 GB of my GPU memory. Is there any way to prevent this?
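
The batch_size / max_length advice above should apply here as well. One way to check whether a given setting fits before launching a full evaluation is to measure peak VRAM on a small slice; a sketch assuming a CUDA device and the same encode API as above:

```python
# Quick peak-VRAM check on a small slice before a full evaluation run;
# the wrapper, model id, and placeholder sentences are assumptions.
import torch
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
sentences = ["stand-in for your evaluation texts"] * 1000

torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

_ = model.encode(sentences, batch_size=64, max_length=256)

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM during encode: {peak_gb:.1f} GB")
```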
