If possible, can you also share the "vocab.yml" file you used in the training?
Based on my initial evaluations, the model performs very well. If possible, could you also share the "vocab.yml" file you used in training? I want to convert this model to the .safetensors format so that it can be used outside of marian-decoder. The "convert_marian_to_pytorch.py" script in the huggingface library, which does this conversion, also requires the "vocab.yml" file. The "model.tr-en.vocab" file you shared does not work for this.
I found a solution and am sharing it here for anyone who needs it. The script below builds "vocab.yml" from the SentencePiece model:
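import sentencepiece as spm
import yaml

# Load the SentencePiece model
sp = spm.SentencePieceProcessor()
sp.load('model.tr-en.spm')

# Map every piece (token) to its integer id
vocab = {}
for i in range(sp.get_piece_size()):
    token = sp.id_to_piece(i)
    vocab[token] = i

# Write the mapping as YAML; allow_unicode keeps non-ASCII tokens readable
with open('vocab.yml', 'w', encoding='utf-8') as f:
    yaml.dump(vocab, f, allow_unicode=True)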
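
Before passing the generated file to the conversion script, it may be worth a quick sanity check. The sketch below is not part of the original solution: `check_vocab` and `check_vocab_file` are hypothetical helper names, and the assumption that the vocabulary should contain `</s>` and `<unk>` reflects how Marian-style vocabularies are usually laid out, so adjust it if your model differs.

```python
def check_vocab(vocab):
    """Return a list of problems found in a token -> id mapping (empty if none)."""
    if not vocab:
        return ["vocabulary is empty"]
    problems = []
    ids = sorted(vocab.values())
    # Ids should be exactly 0..N-1, each used once.
    if ids != list(range(len(ids))):
        problems.append("ids are not a contiguous 0..N-1 range")
    # Assumption: Marian-style vocabularies normally contain these tokens.
    for special in ("</s>", "<unk>"):
        if special not in vocab:
            problems.append(f"missing special token {special!r}")
    return problems


def check_vocab_file(path="vocab.yml"):
    """Load a vocab.yml file and run check_vocab on it."""
    import yaml  # imported lazily so check_vocab itself needs no PyYAML
    with open(path, encoding="utf-8") as f:
        return check_vocab(yaml.safe_load(f))
```

Calling `check_vocab_file("vocab.yml")` after running the script above should return an empty list if the file looks consistent; any returned strings describe what to look at.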