Model Overview

该模型用于对AI回复的正确性和内容可用价值的评估, 分数从0-100. 0-35可以判断为完全不可用(内容错误,与事实不符,未遵循用户的指令). 35-50(部分内容可用), 50-100(内容正确并且遵循用户的指令,可以使用). 该模型一般配合Best-of-N来使用, 对于35分以下的直接丢弃, 对于35-50分之间的,可以引入critics来修正后重新采样, 50分以上的可以直接使用. 该模型既可以用于Final Answer的评估, 也可以用于LLM调用Tool的评估(主要评估使用工具是否合理,参数是否正确.)

SCORE: 0.00-100.00

Usage

Run the Inference Code

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "weiminw/Heliumos-RM-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id,padding_side="left")
model = AutoModelForSequenceClassification.from_pretrained(
     model_id,
     torch_dtype="auto", 
     device_map="auto"
 )

messages = [
    {'role': 'user', 'content': "what is 12 * 12?"},
    {'role': 'assistant', 'content': "144"}
    
]

text_encoded = tokenizer.apply_chat_template(messages, return_dict=True, return_tensors="pt", tokenize=True).to("cuda:0")

score = model(**text_encoded)

print(score.logits[0]) # 
Downloads last month
8
Safetensors
Model size
3.09B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for weiminw/Heliumos-RM-3B

Base model

Qwen/Qwen2.5-3B
Finetuned
(480)
this model

Dataset used to train weiminw/Heliumos-RM-3B