UPDATED FIXED!! Template problem?
I used llama.cpp and it threw this:
common_chat_templates_init: failed to parse chat template (defaulting to chatml): Expected value expression at row 18, column 30:
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
Not sure if it's a llama.cpp support problem or a template problem.
https://www.reddit.com/r/LocalLLaMA/comments/1kab9po/bug_in_unsloth_qwen3_gguf_chat_template/
It's a llama.cpp chat template problem - we're working on a fix and will be reuploading ALL of our models :(
The models should currently work as normal on Ollama, LM Studio etc., I think
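If anyone wants to sanity-check where the failure comes from, one quick test is to parse the same fragment from the error with Python's jinja2 (what transformers uses to render chat templates). This is a rough sketch, not anything official; the {%- endfor %} is added by hand because the error message only shows a truncated fragment:

```python
# Rough sketch: check whether the fragment from the error is valid Jinja at all.
# jinja2 is what transformers uses; llama.cpp uses its own bundled parser (minja).
import jinja2

fragment = (
    "{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n"
    "{%- for message in messages[::-1] %}\n"
    "{%- endfor %}\n"  # added here; the error output cuts the template off
)

try:
    jinja2.Environment().parse(fragment)
    print("jinja2 parses this fine -> the failure looks specific to llama.cpp's parser")
except jinja2.TemplateSyntaxError as err:
    print(f"jinja2 rejects it too: {err}")
```

If jinja2 accepts it, that points at llama.cpp's bundled template parser rather than the template being malformed.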
Be sure to fix the template for LM Studio too please! Don't want to have to upload again after this! The template in the LM Studio GGUF from their repo, which I think comes from bartowski, does work. I had Claude compare the ones from the Unsloth Qwen3 8B 128K Q5 GGUF and LM Studio's Qwen3 8B Q4 GGUF, and it said they were basically the same, though LM Studio's was a bit more verbose in how it did a couple of lines.
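If anyone wants to compare the templates directly instead of asking Claude, the embedded template lives in the tokenizer.chat_template metadata of the GGUF, and you can dump and diff it with the gguf Python package. Rough, untested sketch; the filenames are just placeholders for whichever two GGUFs you want to compare:

```python
# pip install gguf   (the reader library that ships with llama.cpp)
import difflib
from gguf import GGUFReader

def read_chat_template(path: str) -> str:
    """Return the Jinja chat template embedded in a GGUF file's metadata."""
    field = GGUFReader(path).fields["tokenizer.chat_template"]
    # string metadata is stored as raw bytes; data[0] indexes the value part
    return bytes(field.parts[field.data[0]]).decode("utf-8")

# placeholder filenames -- point these at the two GGUFs you downloaded
unsloth_tpl = read_chat_template("Qwen3-8B-128K-UD-Q5_K_XL.gguf")
lmstudio_tpl = read_chat_template("Qwen3-8B-Q4_K_M.gguf")

for line in difflib.unified_diff(
    unsloth_tpl.splitlines(), lmstudio_tpl.splitlines(),
    fromfile="unsloth", tofile="lmstudio", lineterm="",
):
    print(line)
```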
May I ask... How to launch Qwen3 without thinking mode using llama-server?
Our chat templates are the same. Ours should work as normal in LM Studio as well. It seems to be an issue isolated to llama.cpp.
You can just pass a system prompt with /no_think in it?
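Something like this against llama-server's OpenAI-compatible endpoint should do it (untested sketch; the model file, port and prompts are just placeholders):

```python
# Assumes llama-server is already running with a Qwen3 GGUF, e.g.:
#   llama-server -m Qwen3-8B-Q5_K_M.gguf --port 8080
# The /no_think tag in the system prompt is Qwen3's soft switch for
# disabling thinking mode.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant. /no_think"},
            {"role": "user", "content": "Summarise what GGUF is in one sentence."},
        ],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```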
Thanks.
So, just to throw some info in here (may or may not be related to quants): I am using LM Studio 0.3.15, CUDA 12 Runtime version 1.28.0, and I had issues with the default jinja template for both 32b and 30b using UD-Q4KXL. I had to replace the default with bartowski's jinja template and it started working.
Oh ok, thanks for letting us know! We're still gonna do an entire reupload of all the models
Are you not using Xet? Hugging Face said it would be faster for uploads and downloads.
@Noonecares52647842 @cduk @phazei @CHNtentes @1ikhan
Hey guys, we reuploaded all of them. They should work on any platform and they now ALL work! Please let us know how it goes
Working as far as I can tell. Thanks!