Model Card for Qwen2.5-0.5B-Instruct

This is a 🤗 transformers model finetuned from Qwen2.5-0.5B-Instruct.

Model Details

Model Description

This model is a finetuned version of Qwen/Qwen2.5-0.5B-Instruct, a 0.5-billion-parameter language model from the Qwen2.5 family. It was finetuned with a reinforcement learning method, Group Relative Policy Optimization (GRPO).
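
GRPO, as commonly described, samples a group of completions for each prompt, scores each one with a reward, and normalizes the rewards within the group to obtain advantages. The sketch below illustrates only that normalization step. The function name, the epsilon value, the choice of population standard deviation, and the binary correctness reward are illustrative assumptions, not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation; implementations may use the
    sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions: reward 1.0 if the chosen option
# was correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.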

Developed by: Qwen (original model), finetuning by rgb2gbr
Funded by: rgb2gbr
Shared by: rgb2gbr
Model type: Causal Language Model
Language(s) (NLP): English
License: MIT
Finetuned from model: Qwen/Qwen2.5-0.5B-Instruct

Uses

The model was finetuned with GRPO on the openlifescienceai/medmcqa dataset. It identifies the correct option for questions in that dataset with 44.02% accuracy.
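
The evaluation protocol behind this figure is not documented here. As a rough sketch of how accuracy on a four-option multiple-choice task could be computed, the example below extracts an option letter from each model response and compares it with the gold option index. The helper names, the regular expression, and the sample responses are hypothetical, and the gold answer is assumed to be an index from 0 to 3.

```python
import re

LETTERS = "ABCD"


def extract_choice(text):
    """Return the first standalone capital letter A-D in the text, or None.

    Caveat: a response starting with the article "A" would be read as
    option A, so a stricter output format is advisable in practice.
    """
    match = re.search(r"\b([ABCD])\b", text)
    return match.group(1) if match else None


def accuracy(predictions, gold_indices):
    """Fraction of responses whose extracted letter matches the gold index."""
    correct = sum(
        extract_choice(pred) == LETTERS[gold]
        for pred, gold in zip(predictions, gold_indices)
    )
    return correct / len(gold_indices)


# Hypothetical model responses and gold option indices
preds = ["The answer is B because ...", "C", "I am not sure."]
gold = [1, 2, 0]
score = accuracy(preds, gold)
print(f"{score:.2%}")
```

A response from which no letter can be extracted counts as incorrect in this sketch.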

Out-of-Scope Use

This model should not be used for generating harmful, biased, or inappropriate content. It's important to be aware of the potential limitations and biases inherited from the base model and the finetuning data.

Bias, Risks, and Limitations

As a large language model, this model may exhibit biases present in the training data. The finetuning process may have amplified or mitigated certain biases. Further evaluation is needed to understand the full extent of these biases and limitations.

How to Get Started with the Model

You can load this model using the transformers library in Python:

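from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository of the finetuned checkpoint
model_id = "rgb2gbr/GRPO_BioMedmcqa_Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Replace with a question and its answer options, followed by the instruction
prompt = "Identify the right answer and elaborate it"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# The decoded text contains the prompt followed by the generated continuation
print(tokenizer.decode(outputs[0], skip_special_tokens=True))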
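
The prompt in the snippet above contains only the instruction, so the model has no question to answer. One possible way to assemble a full multiple-choice prompt is sketched below. The prompt layout, the helper name, and the sample question are illustrative assumptions; the format used during finetuning is not documented here and may differ.

```python
def build_mcq_prompt(question, options):
    """Format a question and up to four options into a single prompt string."""
    letters = "ABCD"
    lines = [f"Question: {question}"]
    lines += [f"{letter}. {option}" for letter, option in zip(letters, options)]
    lines.append("Identify the right answer and elaborate it.")
    return "\n".join(lines)


# Illustrative question, not taken from the dataset
prompt = build_mcq_prompt(
    "Which vitamin deficiency causes scurvy?",
    ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the bare instruction.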
