OOMs on 8 GB GPU, is it normal?

#2
by tanimazsin130 - opened

It gives an OOM error even though use_fp16=True is set. Is this normal? I am running on an 8 GB RTX 3070 graphics card.

same happens to me :/

I use the corpus of BeIR/nq to generate sentences. Here are my test results (use_fp16=True, Linux, A800 GPU); a sketch of the setup follows the list:

  • model.encode(sentences, batch_size=128, max_length=512): 5.9GB / GPU
  • model.encode(sentences, batch_size=200, max_length=512): 7.6GB / GPU
  • model.encode(sentences, batch_size=256, max_length=512): 9.0GB / GPU
  • model.encode(sentences, batch_size=256, max_length=256): 5.7GB / GPU
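
For reference, here is a minimal sketch of how a test like this could be set up. It assumes the FlagEmbedding BGEM3FlagModel wrapper, the BAAI/bge-m3 checkpoint, and the BeIR/nq corpus on the Hub with a "text" field; adjust those to whatever you are actually running.

```python
# Sketch of the setup behind the numbers above. Assumptions: the FlagEmbedding
# BGEM3FlagModel API, the "BAAI/bge-m3" checkpoint, and the BeIR/nq corpus on
# the Hub exposing a "text" column.
from datasets import load_dataset
from FlagEmbedding import BGEM3FlagModel

corpus = load_dataset("BeIR/nq", "corpus", split="corpus")
sentences = corpus["text"][:20_000]  # a slice is enough for a memory test

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
embeddings = model.encode(sentences, batch_size=128, max_length=512)
```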

The default parameters are batch_size=256, max_length=512, so an OOM is expected if you run the examples directly on an 8 GB card. To solve the problem, you have two choices (see the sketch after this list):

  • set a shorter max_length, if your sentences consist mostly of short sequences
  • set a smaller batch_size
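
For example, a minimal sketch for an 8 GB card, with the same assumed wrapper and model id as in the earlier sketch; the exact values that fit depend on your sentence lengths:

```python
# Minimal sketch for an 8 GB card; the BGEM3FlagModel wrapper and model id
# are the same assumptions as above.
from FlagEmbedding import BGEM3FlagModel

sentences = ["your texts go here", "..."]  # stand-in for your corpus
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

# Either knob (or both) keeps peak VRAM well below the defaults
# (batch_size=256, max_length=512) measured above.
embeddings = model.encode(
    sentences,
    batch_size=64,    # smaller batches -> less activation memory
    max_length=256,   # shorter sequences -> less attention memory
)
```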

In my case, model.encode(sentences, batch_size=1, max_length=5000) used 10.5 GB of VRAM.
I'm testing the model for multilingual retrieve and re-rank and it works pretty well, but it demands a lot of VRAM. I don't know if quantization is possible with this model's architecture, but being able to load it in 8-bit would be great.
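
As far as I know the FlagEmbedding wrapper doesn't expose quantized loading, but here is a rough sketch of what 8-bit loading of the underlying encoder could look like through transformers and bitsandbytes. The model id, CLS pooling, and whether embedding quality survives 8-bit are all assumptions, not something confirmed for this model:

```python
# Rough sketch only: load the underlying encoder in 8-bit via transformers +
# bitsandbytes and pool embeddings by hand. This bypasses the FlagEmbedding
# wrapper entirely; model id and CLS pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_id = "BAAI/bge-m3"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

def embed(texts, max_length=512):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt").to(encoder.device)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    emb = hidden[:, 0]  # CLS pooling -- check how your checkpoint actually pools
    return torch.nn.functional.normalize(emb, dim=-1)
```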

I've been trying to evaluate it, and it took 20 GB of my GPU memory. Is there any way to prevent this?
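
The batch_size / max_length advice above should apply here as well. One way to check whether a given setting fits before launching a full evaluation is to measure peak VRAM on a small slice; a sketch assuming a CUDA device and the same encode API as above:

```python
# Quick peak-VRAM check on a small slice before a full evaluation run;
# the wrapper, model id, and placeholder sentences are assumptions.
import torch
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
sentences = ["stand-in for your evaluation texts"] * 1000

torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

_ = model.encode(sentences, batch_size=64, max_length=256)

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM during encode: {peak_gb:.1f} GB")
```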
