Was the benchmark run with vLLM? Input/output = 500/2000?
#6, opened by chuanyizjc
I am currently testing nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 and get a throughput of about 1k, so I do not see a 4x improvement. I would like to know why.