A quick test using M1 Max (64G) and Word
#16 opened 1 day ago by gptlocalhost

Awesome model! Can we get a version with a larger context window?
#15 opened 1 day ago by seall0

Fix template when add_generation_prompt=true
#14 opened 4 days ago by matteogeniaccio

It supports the Serbo-Croatian language very well!
#13 opened 5 days ago by JLouisBiz

GPTQ or AWQ Quants
#12 opened 5 days ago by guialfaro

Great job, thanks for this model.
#11 opened 6 days ago by Dampfinchen

Recommended sampling parameters?
#10 opened 8 days ago by AaronFeng753

Can we have some more popular benchmarks?
#8 opened 9 days ago by rombodawg

The model is the best for coding.
#7 opened 12 days ago by AekDevDev

When running with a single GPU, I get an error saying the VRAM is insufficient; when using multiple GPUs on a single machine, I get many other errors. My vLLM version is 0.8.4.
#6 opened 12 days ago by hanson888

BitsAndBytes quantization inference error
#5 opened 12 days ago by chengfy

Bug when using function calling with vllm==0.8.4
#4 opened 13 days ago by waple

SimpleQA Scores Are WAY off
#3 opened 14 days ago by phil111

Need FP8 version for interface
#2 opened 15 days ago by iwaitu

RuntimeError: CUDA error: device-side assert triggered
#1 opened 15 days ago by DsnTgr