Request to fix the inference script to control text output generation

#1
by g30rv17ys - opened

@michaelfeil .

Thank you for the model that can be loaded in 8-bit.
When I try to run inference, it keeps on generating text, whereas I just want an answer to my question.

Could you edit the inference script so that we only get precise answers rather than uncontrolled generation?
I've attached a screenshot below of the output I am getting; I hope you can suggest a fix for this.
Screenshot from 2023-05-30 23-24-31.png

There is no easy fix for this; it is a typical problem in any library.
As in https://github.com/michaelfeil/hf-hub-ctranslate2/blob/e236f006593fb00633f9874fe414c87bd9735813/hf_hub_ctranslate2/translate.py#LL309C1-L334C70
you might want to set end_token=["User", "user"] or similar.

Thanks for the reply @michaelfeil .
I noticed that you can get perfect answers in the mpt-7b-chat space: https://huggingface.co/spaces/mosaicml/mpt-7b-chat
Could you have a look at the app.py file there?

Maybe it can suggest a fix to the issue.

How would you rewrite the inference example you mentioned in the README to resolve the issue, at least partially? Could you paste the updated inference script in your next reply?

Say, for example, in the script below:

from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-mpt-7b-chat"
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["User: what is the largest mountain on mars? Bot:"],
    max_length=256
)
print(outputs)

It would be great if the output was:
Bot: The largest mountain on Mars is Olympus Mons.

It actually gives the right answer, but it then continues with some other text. I want it to stop at Olympus Mons. @michaelfeil

Also, you can check this notebook as an example @michaelfeil :
https://colab.research.google.com/drive/1s2beg55SEV8ICHufhwcflYtUNskTX8kO

It ends the sentence perfectly!
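
A minimal client-side workaround, not proposed in this thread, is to trim the returned string at the first dummy role marker that follows the prompt. The sketch below assumes model is the generator created in the script above and that model.generate returns a list of plain strings, as print(outputs) suggests; the helper name extract_answer is hypothetical.

def extract_answer(full_text, prompt, markers=("User:", "Bot:")):
    # Drop the echoed prompt if it is included in the output, then cut at the
    # first dummy role marker so only the first answer remains.
    answer = full_text[len(prompt):] if full_text.startswith(prompt) else full_text
    cut = len(answer)
    for marker in markers:
        idx = answer.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return answer[:cut].strip()

prompt = "User: what is the largest mountain on mars? Bot:"
outputs = model.generate(text=[prompt], max_length=256)
print(extract_answer(outputs[0], prompt))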

You might set end_token=["User"] as a keyword argument to model.generate.

@michaelfeil , I updated the code as follows:

from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-mpt-7b-chat"
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["User: what is the largest mountain on mars? Bot:"],
    end_token=["User"],
    max_length=256
)

print(outputs)

but I am still getting the same continuous output:

Screenshot from 2023-05-31 00-04-50.png

User and Bot are not special tokens either; they are just dummies I used for a bunch of other models.
Not sure why the stop tokens do not work in this case.

I can't support this further; if you find a good solution, feel free to share it here.

Okay, the stop tokens are ["<|im_end|>", "<|endoftext|>"],
and messages should be formatted in this style:

f"<|im_start|>user\n{item[0]}<|im_end|>",
f"<|im_start|>assistant\n{item[1]}<|im_end|>",

See the app.py in the mpt-7b-chat space.
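
Putting the pieces together, here is a minimal sketch of how the example script above could be adapted, assuming model.generate accepts the end_token keyword mentioned earlier. The ChatML formatting and stop tokens are the ones quoted from the mpt-7b-chat space's app.py; the variable names and max_length are just illustrative.

from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

model_name = "michaelfeil/ct2fast-mpt-7b-chat"
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)

# Format the question in ChatML style and leave the assistant turn open.
question = "what is the largest mountain on mars?"
prompt = (
    f"<|im_start|>user\n{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = model.generate(
    text=[prompt],
    # stop as soon as an end-of-turn or end-of-text token is emitted
    end_token=["<|im_end|>", "<|endoftext|>"],
    max_length=256,
)
print(outputs)

If the model still overruns, the returned text can additionally be trimmed at the first "<|im_end|>" marker, along the lines of the earlier client-side sketch.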

@michaelfeil could you paste the whole script here if possible?

Sorry, can’t provide further help.

michaelfeil changed discussion status to closed