Was the benchmark run with vLLM? Input/output = 500/2000?

#6
by chuanyizjc - opened

[Attached image: image.png]

In my current tests, nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 reaches a throughput of only ~1k, and I do not see a 4x improvement. I would like to know why.
