Help with clarifying something please - can I run this on a single 4090?
#3
by Restrected · opened
I have two 4090s in an EPYC 32-core server, but I see a lot of smaller models actually perform impressively on relatively modest hardware, like my M3 Max MacBook Pro. It runs 7B models beautifully.
I am trying to set up a few LLMs running concurrently, so I'm hosting them on the server with a web page for access, but I need to figure out how to run bigger models like this one on a single card. Can anyone point me in the right direction?
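For context, here's the rough back-of-the-envelope math I've been using to guess what fits on one 24 GB card (a minimal sketch that counts weights only and ignores KV cache, activations, and framework overhead, so real usage will be higher):

```python
def weights_gb(n_params_billion: float, bits_per_param: int) -> float:
    """Approximate VRAM for model weights alone, in GB."""
    return n_params_billion * bits_per_param / 8  # e.g. 7B @ 16-bit = 14 GB

# Compare common model sizes at common quantization levels
for params in (7, 13, 34, 70):
    for bits, name in ((16, "fp16"), (8, "int8"), (4, "int4")):
        gb = weights_gb(params, bits)
        verdict = "fits" if gb < 24 else "too big"
        print(f"{params}B @ {name}: ~{gb:.0f} GB -> {verdict} on a 24 GB 4090")
```

By this math, a 34B model at 4-bit (~17 GB) should squeeze onto a single 4090, while 70B at 4-bit (~35 GB) would need both cards or offloading - which is why I'm asking about quantization options.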