How to turn off byte-fallback for Phi-3's tokenizer?

#10
by khoinguyenthe - opened

I have been trying out Phi-3 models and it's been a wonderful experience.

However, sometimes the tokenizer throws an exception. The line of code

```python
text = self.tokenizer.decode(output_tokens)
```

throws:

`Exception: 'utf-8' codec can't decode byte 0xf0 in position 10283: invalid continuation byte`

Most of the time this happened when the model's output was quite long (~800 words; counting the brackets, dots, etc., it's ~1.4k elements, which is still far from the max_length of 4096, imo).
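For context, byte 0xf0 opens a 4-byte UTF-8 sequence (the emoji range), so this looks like the decoded byte stream being cut off or interrupted mid-character. A minimal plain-Python sketch of that failure mode, with no tokenizer involved:

```python
# 0xf0 opens a 4-byte UTF-8 sequence (emoji range). If a stream contains the
# first byte of such a sequence but not its continuation bytes, strict
# decoding raises the same error as above.
emoji = "😀".encode("utf-8")        # b'\xf0\x9f\x98\x80'
broken = emoji[:1] + b" more text"  # sequence cut off, ordinary bytes follow

try:
    broken.decode("utf-8")
except UnicodeDecodeError as e:
    print(e)  # 'utf-8' codec can't decode byte 0xf0 in position 0: invalid continuation byte

# A lenient decode keeps the readable text and substitutes the broken byte:
print(broken.decode("utf-8", errors="replace"))  # '� more text'
```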

I have researched around and found that this can supposedly be fixed by turning off the byte-fallback of the BPE tokenizer, so that the tokenizer will ignore the non-UTF-8 tokens.
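To see what byte-fallback actually produces, here is a sketch using the Hugging Face `transformers` tokenizer for the base model (the hub id is my assumption about the matching model; the ONNX tokenizer may behave differently):

```python
# Sketch: with byte-fallback, characters missing from the vocabulary are
# encoded as individual <0xNN> byte tokens and reassembled into bytes at
# decode time. The hub id is an assumption about the matching base model.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
ids = tok.encode("😀", add_special_tokens=False)
print(tok.convert_ids_to_tokens(ids))
# expected output, something like: ['▁', '<0xF0>', '<0x9F>', '<0x98>', '<0x80>']
# If generation stops right after '<0xF0>', the decoder is left holding an
# invalid UTF-8 prefix, which is exactly the error above.
```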

I have tried tweaking the tokenizer.json file (see the sketch after this list):

  • set model/byte_fallback to false,
  • removed the item {"type": "ByteFallback"} from the decoder/decoders section,

but the error still happens.
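For reference, a minimal sketch of that edit done programmatically. It assumes your runtime actually reads this tokenizer.json; if the ONNX package ships its own tokenizer artifacts, edits here would have no effect:

```python
# Sketch of the tokenizer.json edit described above. Assumes the runtime
# loads this exact file; if it uses its own tokenizer artifacts, this edit
# silently changes nothing.
import json

with open("tokenizer.json", encoding="utf-8") as f:
    cfg = json.load(f)

cfg["model"]["byte_fallback"] = False
cfg["decoder"]["decoders"] = [
    d for d in cfg["decoder"]["decoders"] if d.get("type") != "ByteFallback"
]

with open("tokenizer.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)
```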

I am using the mini-4k-instruct onnx-cuda-int4 version, btw.

I wonder: why did my changes not work, and is there any way to fix this?
Thanks for any help and suggestions!

(Note: this is also posted as an issue on the Phi-3CookBook GitHub repo: issue #14)

Could you please share instructions on how we can reproduce this issue? What script are you using to run the model?
