MoE reward model project

Recent Activity

shengyi-qian updated a model 9 days ago

MoeReward/rl_checkpoints

shengyi-qian published a model 23 days ago

MoeReward/rl_checkpoints

zyhang1998 updated a dataset 28 days ago

MoeReward/combined_rlhf_dataset_grpo_imdb_main

Models (6)

MoeReward/rl_checkpoints

Updated 9 days ago

MoeReward/lora_checkpoint

Updated about 1 month ago

MoeReward/reward_lora_qwen_1_5_base

Updated Mar 21 • 1

MoeReward/reward_qwen_1_5

Updated Mar 17 • 1

MoeReward/reward_lora_qwen_1_5

MoeReward/sft_full_param_qwen_1_5

Updated Mar 16 • 1

Datasets (49)

MoeReward/combined_rlhf_dataset_grpo_imdb_main

Viewer • Updated 28 days ago • 4k • 126

MoeReward/combined_rlhf_dataset_grpo_metamath_main

Viewer • Updated 28 days ago • 4k • 108

MoeReward/combined_rlhf_dataset_grpo_arc_main

Viewer • Updated 28 days ago • 4k • 105

MoeReward/combined_rlhf_dataset_grpo_nq_main

Viewer • Updated 28 days ago • 4k • 104

MoeReward/combined_rlhf_dataset_grpo_equal_dist

Viewer • Updated 28 days ago • 4k • 65

MoeReward/preference_dataset_stepmath_ood

Viewer • Updated 28 days ago • 10.8k • 63

MoeReward/combined_preference_dataset_ood

Updated 28 days ago • 28

MoeReward/combined_rlhf_dataset_alpaca

Viewer • Updated 29 days ago • 52k • 51

MoeReward/combined_rlhf_dataset_math

Viewer • Updated 29 days ago • 40k • 68

MoeReward/combined_rlhf_dataset_code

Viewer • Updated 29 days ago • 20k • 57