🧠 Model Card

This model is a fine-tuned version of UBC-NLP/MARBERTv2, optimized for multiclass text classification in Egyptian Arabic. It classifies input text into one of the following five categories:

  • Neutral
  • Offensive
  • Sexism
  • Racism
  • Religious Discrimination

It is particularly useful for content moderation, hate speech analysis, and Arabic NLP research in dialectal contexts.

📚 Dataset

The model was fine-tuned on a custom annotated dataset: IbrahimAmin/egyptian-arabic-hate-speech, which contains thousands of Egyptian-Arabic social media texts labeled by category.

🔧 How to Use

import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-classification"

# Use the first GPU when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned weights and the matching tokenizer from the Hub.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, device=device)

# Egyptian Arabic input, roughly "I don't like Gulf people".
result = classifier("مبحبش الخلايجه")
print(result)  # a list with one {"label": ..., "score": ...} dict
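
When the classifier is run over many texts, it can help to aggregate the predictions. The following is a minimal sketch, not part of the original project: `summarize` and `NEUTRAL_LABEL` are hypothetical names, and the label string `"Neutral"` is an assumption based on the category list in this card (the strings the model actually emits can be checked in `model.config.id2label`). The sample predictions are hand-written in the pipeline's output shape and are not real model output.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# One text-classification pipeline prediction: {"label": str, "score": float}.
Prediction = Dict[str, Union[str, float]]

# Assumption: the label emitted for benign text. Verify it against
# model.config.id2label before relying on it.
NEUTRAL_LABEL = "Neutral"


def summarize(
    predictions: List[Prediction], neutral_label: str = NEUTRAL_LABEL
) -> Tuple[Counter, float]:
    """Count predicted labels and return the share of non-neutral texts."""
    counts = Counter(p["label"] for p in predictions)
    total = sum(counts.values())
    flagged = total - counts.get(neutral_label, 0)
    share = flagged / total if total else 0.0
    return counts, share


# Hand-written predictions in the pipeline's output shape (illustrative only).
sample = [
    {"label": "Neutral", "score": 0.98},
    {"label": "Racism", "score": 0.91},
    {"label": "Neutral", "score": 0.77},
    {"label": "Offensive", "score": 0.64},
]
counts, share = summarize(sample)
print(counts, share)
```

With a real pipeline, passing a list of strings to `classifier` returns one such dict per input, so `summarize(classifier(texts))` would work on that output directly.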

⚠️ Limitations & Biases

  • Trained specifically on Egyptian Arabic; performance may degrade on Modern Standard Arabic (MSA) or other dialects.
  • Social and political content may introduce bias in predictions.
  • Borderline and sarcastic content may be misclassified.
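
One common way to reduce the impact of misclassified borderline content is to route low-confidence predictions to a human reviewer. The sketch below illustrates that idea under stated assumptions: `triage` is a hypothetical helper, the `0.8` threshold is an arbitrary illustrative value rather than one tuned for this model, and the `"Neutral"` label string is assumed from the category list in this card. The scores are softmax outputs and may not be well calibrated, so any threshold should be validated on held-out data.

```python
from typing import Dict, Union

# One text-classification pipeline prediction: {"label": str, "score": float}.
Prediction = Dict[str, Union[str, float]]


def triage(
    prediction: Prediction,
    threshold: float = 0.8,          # illustrative value, not tuned for this model
    neutral_label: str = "Neutral",  # assumed label string; check id2label
) -> str:
    """Return 'allow', 'flag', or 'human_review' for a single prediction."""
    # Low-confidence predictions go to a person, whatever the label is.
    if float(prediction["score"]) < threshold:
        return "human_review"
    # Confident predictions are allowed if neutral and flagged otherwise.
    return "allow" if prediction["label"] == neutral_label else "flag"


# Hand-written predictions (illustrative only, not real model output).
print(triage({"label": "Neutral", "score": 0.95}))
print(triage({"label": "Sexism", "score": 0.90}))
print(triage({"label": "Sexism", "score": 0.55}))
```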

⚠️ Disclaimer

This model is intended for research and content moderation. It is not meant to offend, harm, or promote discrimination against any individual or group. Use it responsibly and consider the context in which it is applied. Treat any content the model flags as offensive with caution and handle it appropriately.

👏 Acknowledgement

Model fine-tuning, data collection, annotation and pre-processing for this work were performed as part of a Graduation Project from the Faculty of Engineering, AASTMT, Computer Engineering Program.

📖 Citation

If you use this model in your work, please cite:

@INPROCEEDINGS{10009167,
  author={Ahmed, Ibrahim and Abbas, Mostafa and Hatem, Rany and Ihab, Andrew and Fahkr, Mohamed Waleed},
  booktitle={2022 20th International Conference on Language Engineering (ESOLEC)},
  title={Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification},
  year={2022},
  volume={20},
  pages={170-174},
  keywords={Social networking (online);Text categorization;Hate speech;Blogs;Transformers;Natural language processing;Task analysis;Arabic Hate Speech;Natural Language Processing;Transformers;Text Classification},
  doi={10.1109/ESOLEC54569.2022.10009167}
}

Model size: 163M params (Safetensors, tensor type F32)