How to use the model across multiple GPUs

#28
by aswad546 - opened

Hello,

Thank you for sharing this model. This may be a basic question, but I have two A100 GPUs with 80 GB and 40 GB of VRAM respectively, and I want to use the model mainly for inference. I know I cannot fit the full 16-bit model on my setup, but there is a version available with 8-bit quantized weights. That model's weights are about 95 GB, which should fit across both GPUs with a relatively small context window. However, I am unclear on how to split the model across the two GPUs for inference, since it cannot fit on a single GPU in any scenario.
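For reference, my current understanding is that sharding across GPUs is usually done with Transformers plus Accelerate's `device_map` and a bitsandbytes 8-bit config, along the lines of the sketch below. The model ID and per-GPU memory caps are placeholders on my part, and I am not sure whether this is the right approach for this particular checkpoint (e.g., if it is already pre-quantized, the quantization config may be unnecessary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-8bit"  # placeholder: the actual 8-bit repo ID

# Load weights in 8-bit via bitsandbytes (skip if the checkpoint is already quantized)
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # let Accelerate shard layers across both GPUs
    max_memory={0: "75GiB", 1: "35GiB"},   # leave headroom on each card; these caps are guesses
)

prompt = "Hello"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Is something like this the intended way to run it, or is there a recommended setup for this model?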

Any pointers would be appreciated.

Thank you!
