
🌟 BloomVN-0.5B-ppo

A fine-tuned multilingual model for the Vietnamese language

📋 Overview

This model is a small-scale experiment (0.5B parameters) that tests the reinforcement learning capabilities of the veRL framework. It applies PPO (Proximal Policy Optimization) to a limited training dataset in order to evaluate veRL's performance and training behavior.
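For readers who want to try the checkpoint, the following is a minimal sketch of loading it with Hugging Face Transformers. It assumes the repository id `BlossomsAI/BloomVN-0.5B-ppo`, plain-text prompting rather than a chat template, and default generation settings; the helper name `generate` and the value of `max_new_tokens` are illustrative choices, not settings documented by the authors.

```python
# Assumed repository id on the Hugging Face Hub.
MODEL_ID = "BlossomsAI/BloomVN-0.5B-ppo"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Return the model's continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining this function does not require
    # torch/transformers to be installed or any weights to be downloaded.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Thủ đô của Việt Nam là")` ("The capital of Vietnam is") downloads the weights on first use and returns the continuation as a string.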

🔧 Method

The experiments were run with veRL and focused on:

  • Implementing the PPO algorithm with a 0.5B-parameter model
  • Running training experiments on a small dataset
  • Testing the veRL framework's ability to handle RL tasks
  • Evaluating training efficiency and model behavior

This lightweight approach allowed us to assess veRL's performance in a controlled, small-scale environment.
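To make the PPO step concrete, here is a minimal sketch of the standard clipped surrogate loss that PPO optimizes. It is a plain-Python illustration of the textbook objective, not veRL's implementation; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions chosen for the example, and the hyperparameters used for this model are not stated here.

```python
import math


def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over a batch of sampled actions.

    new_logprobs: log-probabilities under the policy being updated
    old_logprobs: log-probabilities under the policy that collected the data
    advantages:   advantage estimates for the same actions
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice between the unclipped and clipped objective.
        terms.append(min(ratio * adv, clipped * adv))
    # The objective is maximized, so the loss is its negative mean.
    return -sum(terms) / len(terms)
```

With a positive advantage, the clipping caps how much the objective can gain from raising an action's probability; with a negative advantage, the unclipped term is kept whenever it is worse, so large harmful updates are still penalized.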

📊 VMLU Benchmark

| Evaluation date | STEM 🔬 | Social Science 🌍 | Humanities 📚 | Others 🎯 | AVG ⭐ |
|---|---|---|---|---|---|
| 07/02/2025 | 23.18 | 32.84 | 32.71 | 33.67 | 29.43 |
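When reading the AVG column, note that it is not the simple mean of the four category scores. The short check below computes that mean from the reported numbers. One possible explanation is that AVG is weighted, for example by the number of questions per category, but that is an assumption that the results above do not confirm.

```python
# Category scores as reported in the benchmark results.
scores = {
    "STEM": 23.18,
    "Social Science": 32.84,
    "Humanities": 32.71,
    "Others": 33.67,
}
reported_avg = 29.43

# Unweighted mean over the four categories.
simple_mean = sum(scores.values()) / len(scores)

print(f"simple mean:  {simple_mean:.2f}")   # prints "simple mean:  30.60"
print(f"reported AVG: {reported_avg:.2f}")  # prints "reported AVG: 29.43"
```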

🤝 Contributors

Developed with ❤️ by BlossomsAI


Star ⭐️ this repo if you find it valuable!
Model size: 494M params · Tensor type: F32 (Safetensors)

Model: BlossomsAI/BloomVN-0.5B-ppo
Base model: Qwen/Qwen2.5-0.5B