Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models • Paper • 2504.20157 • Published 5 days ago • 31
mEdIT Collection • Collection of the publicly available mEdIT dataset and instruction-tuned models for multilingual text revision. • 3 items • Updated May 17, 2024 • 2
Spivavtor Collection • Dataset and models from the paper "Spivavtor: An Instruction Tuned Ukrainian Text Editing Model" (accepted at the Third Ukrainian NLP Workshop). • 3 items • Updated May 3, 2024 • 1