How is reward calculation done during inference in this model?

#17
by arunasank - opened

This model seems to have been trained using sDPO instead of DPO. How is the reward for an assistant response to a question calculated at inference time?
