eos_token becomes <|im_end|> after GGUF conversion with llama.cpp, and generation never terminates

by mmnga - opened 10 days ago


I’d like to report an issue I encountered when converting a model to GGUF with llama.cpp: the eos_token is set to <|im_end|>, which prevents generation from terminating properly.
Updating tokenizer_config.json as follows resolves the problem:

  "eos_token": "<|im_end|>",

→

  "eos_token": "<｜end▁of▁sentence｜>",

After making this change, the model stops correctly at the end-of-sentence token.
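For anyone who wants to script the edit, here is a minimal sketch in Python. The function name `set_eos_token` and the file path in the usage note are hypothetical; the only facts taken from this report are the `eos_token` key and the two token strings. It also handles the case where `eos_token` is stored as an object with a `content` field, which some tokenizer configs use; that is an assumption about your file, not something observed here.

```python
import json
from pathlib import Path

# Replacement token taken from the report above.
NEW_EOS = "<｜end▁of▁sentence｜>"


def set_eos_token(config_path, new_eos=NEW_EOS):
    """Rewrite eos_token in a tokenizer_config.json and return the old value.

    Hypothetical helper: handles eos_token stored either as a plain string
    (as in this report) or as an object with a "content" field (assumed).
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    current = config.get("eos_token")
    if isinstance(current, dict):
        # Object form: only the token text is changed, other fields are kept.
        previous = current.get("content")
        current["content"] = new_eos
    else:
        # Plain string form (or key missing): set the value directly.
        previous = current
        config["eos_token"] = new_eos

    # ensure_ascii=False keeps the non-ASCII token characters readable on disk.
    path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2) + "\n",
        encoding="utf-8",
    )
    return previous
```

Calling `set_eos_token("path/to/model/tokenizer_config.json")` (path is a placeholder) should return `"<|im_end|>"` for the config described above. The change presumably needs to be made before running the GGUF conversion so that the converted file picks up the new token.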
I hope this information is helpful.
