mt5-base Reranker for RU mMARCO/v2 (50/50 Native/Transliterated Queries)

This is a variant of Unicamp's mt5-base reranker, initially finetuned on mMARCO/v2.
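Below is a minimal scoring sketch using the Hugging Face transformers library. The monoT5-style prompt template and the "yes"/"no" relevance words are assumptions carried over from the mMARCO rerankers and should be verified against the training setup; the query and passage strings are invented examples.

```python
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_name = "andreaschari/mt5-RU_MMARCO_50_MIXED"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

query = "где находится Эрмитаж"  # native Russian query (example)
passage = "The Hermitage Museum is located in Saint Petersburg, Russia."

# monoT5-style prompt; template and target words are assumptions.
text = f"Query: {query} Document: {passage} Relevant:"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    # Score by comparing the "yes" vs. "no" logits at the first decoding step.
    decoder_input_ids = torch.full(
        (1, 1), model.config.decoder_start_token_id, dtype=torch.long
    )
    logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, 0]
    yes_id = tokenizer.convert_tokens_to_ids("▁yes")
    no_id = tokenizer.convert_tokens_to_ids("▁no")
    score = torch.log_softmax(logits[[no_id, yes_id]], dim=0)[1].item()

print(f"relevance score (log-prob of 'yes'): {score:.4f}")
```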

The training queries are a 50/50 split between native Russian text and Russian text transliterated into Latin script using uroman.
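For reference, a sketch of the query-side transliteration step is below, assuming the uroman Python package (https://github.com/isi-nlp/uroman); the Uroman class and romanize_string method follow its documented interface but should be checked against your installed version, and the example queries are invented.

```python
import uroman as ur

# Romanizer from the uroman package (interface assumed from its docs).
romanizer = ur.Uroman()

queries = [
    "как приготовить борщ",
    "когда была основана Москва",
]

# Romanize each Russian query into Latin script, mirroring how the
# transliterated half of the training queries was produced.
for q in queries:
    print(q, "->", romanizer.romanize_string(q))
```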

This model was used for the SIGIR 2025 short paper "Lost in Transliteration: Bridging the Script Gap in Neural IR".

Model size: 582M parameters (F32, Safetensors).
