TextDetox 2025 Starter Kit (collection): https://pan.webis.de/clef25/pan25-web/text-detoxification.html
This is an instance of cardiffnlp/twitter-xlm-roberta-large-2022 fine-tuned on a binary toxicity classification task using our updated (2025) dataset, textdetox/multilingual_toxicity_dataset.
The model now covers 15 languages from various language families:
| Language | Code | F1 score |
|---|---|---|
| English | en | 0.9071 |
| Russian | ru | 0.9022 |
| Ukrainian | uk | 0.9075 |
| German | de | 0.6528 |
| Spanish | es | 0.7430 |
| Arabic | ar | 0.6207 |
| Amharic | am | 0.6676 |
| Hindi | hi | 0.7171 |
| Chinese | zh | 0.6483 |
| Italian | it | 0.7597 |
| French | fr | 0.9114 |
| Hinglish | hin | 0.7051 |
| Hebrew | he | 0.8911 |
| Japanese | ja | 0.8725 |
| Tatar | tt | 0.6542 |
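To compare languages at a glance, the per-language scores in the table above can be summarized with a few lines of Python. This is a minimal sketch that copies the F1 values from the table into a dictionary (the name `F1_BY_LANG` is our own) and computes an unweighted macro-average; it is an illustrative summary, not an official aggregate metric of the shared task.

```python
from statistics import mean

# Per-language F1 scores copied from the table above (language code -> F1).
F1_BY_LANG = {
    "en": 0.9071, "ru": 0.9022, "uk": 0.9075, "de": 0.6528, "es": 0.7430,
    "ar": 0.6207, "am": 0.6676, "hi": 0.7171, "zh": 0.6483, "it": 0.7597,
    "fr": 0.9114, "hin": 0.7051, "he": 0.8911, "ja": 0.8725, "tt": 0.6542,
}

# Unweighted (macro) average over all 15 languages.
macro_f1 = mean(F1_BY_LANG.values())

# Language codes sorted from lowest to highest F1.
ranked = sorted(F1_BY_LANG, key=F1_BY_LANG.get)

print(f"macro-F1 over {len(F1_BY_LANG)} languages: {macro_f1:.4f}")
print("lowest:", ranked[:3])
print("highest:", ranked[-3:])
```

Running it prints a macro-F1 of 0.7707, with Arabic, Chinese and German as the lowest-scoring languages and English, Ukrainian and French as the highest.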
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('textdetox/twitter-xlmr-toxicity-classifier')
model = AutoModelForSequenceClassification.from_pretrained('textdetox/twitter-xlmr-toxicity-classifier')
model.eval()  # disable dropout for inference

batch = tokenizer("You are amazing!", return_tensors="pt")  # input_ids + attention_mask
with torch.no_grad():
    output = model(**batch)

# idx 0 for neutral, idx 1 for toxic
probs = torch.softmax(output.logits, dim=-1)
```
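The model returns raw logits, so a label still has to be derived from them. Below is a minimal, dependency-free sketch of that step, assuming the logits for each text have already been extracted as a pair of floats (for example via `output.logits.tolist()`). It applies a softmax and uses the index convention given in the snippet's comment (0 = neutral, 1 = toxic). The helper name `logits_to_labels` and the 0.5 threshold are hypothetical, illustrative choices, not part of the model card.

```python
import math

# Index convention from the model card: 0 -> neutral, 1 -> toxic.
LABELS = ("neutral", "toxic")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_labels(batch_logits, threshold=0.5):
    """Map per-text [neutral, toxic] logits to (label, toxic_probability) pairs.

    `threshold` is a hypothetical cutoff on the toxic probability; with two
    classes, 0.5 amounts to picking the more likely class.
    """
    results = []
    for logits in batch_logits:
        p_toxic = softmax(logits)[1]
        label = LABELS[1] if p_toxic >= threshold else LABELS[0]
        results.append((label, p_toxic))
    return results


# Usage with made-up logits (not real model outputs):
print(logits_to_labels([[2.0, -1.0], [-0.5, 1.5]]))
```

With the real model, the same helper can be called as `logits_to_labels(output.logits.tolist())`. Raising the threshold trades recall on toxic texts for fewer false alarms, which may matter for languages with lower F1.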
The model was prepared for evaluation in the TextDetox 2025 Shared Task.
Citation: TBD.
Base model: cardiffnlp/twitter-xlm-roberta-large-2022