---
license: apache-2.0
language:
- ar
- arz
library_name: transformers
pipeline_tag: text-classification
widget:
- text: إزيك يا صاحبي؟ عامل ايه؟
  output:
  - label: Neutral
    score: 0.9996803998947144
  - label: Offensive
    score: 0.0001022413489408791
  - label: Racism
    score: 9.184532245853916e-05
  - label: Sexism
    score: 6.373981887008995e-05
  - label: Religious Discrimination
    score: 6.177987233968452e-05
- text: الجو حلو النهارده ومزاجي رايق.
  output:
  - label: Neutral
    score: 0.9996935129165649
  - label: Offensive
    score: 8.360472565982491e-05
  - label: Racism
    score: 8.307467942358926e-05
  - label: Religious Discrimination
    score: 7.166535215219483e-05
  - label: Sexism
    score: 6.817518442403525e-05
- text: ربنا يكرم بقي
  output:
  - label: Neutral
    score: 0.9996790885925293
  - label: Racism
    score: 9.369126928504556e-05
  - label: Offensive
    score: 8.728997636353597e-05
  - label: Sexism
    score: 7.185162394307554e-05
  - label: Religious Discrimination
    score: 6.815723463660106e-05
- text: المسلمين و المسيحيين ايد واحده
  output:
  - label: Neutral
    score: 0.9995753169059753
  - label: Religious Discrimination
    score: 0.00018269731663167477
  - label: Racism
    score: 0.00010355251288274303
  - label: Offensive
    score: 7.555521733593196e-05
  - label: Sexism
    score: 6.293723708949983e-05
- text: إنت غبي ومحدش طايقك.
  output:
  - label: Offensive
    score: 0.9991933703422546
  - label: Sexism
    score: 0.0003336789377499372
  - label: Religious Discrimination
    score: 0.00016878142196219414
  - label: Racism
    score: 0.00015242300287354738
  - label: Neutral
    score: 0.0001517057535238564
- text: كفاية بقى تفاهة يا ولاد كذا...
  output:
  - label: Offensive
    score: 0.9990781545639038
  - label: Sexism
    score: 0.00037458923179656267
  - label: Racism
    score: 0.000291738921077922
  - label: Neutral
    score: 0.00012822129065170884
  - label: Religious Discrimination
    score: 0.00012722807878162712
- text: اخرس شويه بقي
  output:
  - label: Offensive
    score: 0.9991931319236755
  - label: Sexism
    score: 0.0003191738505847752
  - label: Neutral
    score: 0.00016810539818834513
  - label: Religious Discrimination
    score: 0.00016681390115991235
  - label: Racism
    score: 0.00015283490938600153
- text: كل الستات ما يعرفوش يسوقوا.
  output:
  - label: Sexism
    score: 0.9995362758636475
  - label: Offensive
    score: 0.00013705063611268997
  - label: Religious Discrimination
    score: 0.00011684132914524525
  - label: Neutral
    score: 0.00010987235873471946
  - label: Racism
    score: 9.996256994782016e-05
- text: البنات لازم يقعدوا في البيت.
  output:
  - label: Sexism
    score: 0.9995285272598267
  - label: Offensive
    score: 0.0001513037277618423
  - label: Neutral
    score: 0.00011781435023294762
  - label: Religious Discrimination
    score: 0.00011129838094348088
  - label: Racism
    score: 9.102458716370165e-05
- text: الستات مبتعرفش تسوق
  output:
  - label: Sexism
    score: 0.9995216131210327
  - label: Offensive
    score: 0.00014647189527750015
  - label: Neutral
    score: 0.00012976766447536647
  - label: Religious Discrimination
    score: 0.00010765341721707955
  - label: Racism
    score: 9.454880637349561e-05
- text: مش بحب أتعامل مع السود.
  output:
  - label: Racism
    score: 0.9993932247161865
  - label: Neutral
    score: 0.0002437636285321787
  - label: Offensive
    score: 0.00015556033758912235
  - label: Religious Discrimination
    score: 0.00010805160854943097
  - label: Sexism
    score: 9.933744877343997e-05
- text: الخلايجة كده دايمًا، شايفين نفسهم.
  output:
  - label: Racism
    score: 0.9995430707931519
  - label: Offensive
    score: 0.0001469338167225942
  - label: Sexism
    score: 0.00010587665747152641
  - label: Religious Discrimination
    score: 0.00010272463987348601
  - label: Neutral
    score: 0.00010147325519938022
- text: الخلايجه كلهم ولاد كلب
  output:
  - label: Racism
    score: 0.9995269775390625
  - label: Offensive
    score: 0.00017615519755054265
  - label: Sexism
    score: 0.00010926152026513591
  - label: Religious Discrimination
    score: 0.0001019195988192223
  - label: Neutral
    score: 8.565541065763682e-05
- text: الناس اللي بتصلي دول منافقين.
  output:
  - label: Religious Discrimination
    score: 0.9987323880195618
  - label: Neutral
    score: 0.0007236517849378288
  - label: Sexism
    score: 0.00036698306212201715
  - label: Racism
    score: 0.00010111296433024108
  - label: Offensive
    score: 7.585762068629265e-05
- text: دينك مش صح، ولازم تغيره.
  output:
  - label: Religious Discrimination
    score: 0.9994857311248779
  - label: Neutral
    score: 0.00024439653498120606
  - label: Sexism
    score: 0.00010205681610386819
  - label: Offensive
    score: 9.713304461911321e-05
  - label: Racism
    score: 7.06951177562587e-05
- text: المسلمين و المسيحيين مش ايد واحده
  output:
  - label: Religious Discrimination
    score: 0.9993454813957214
  - label: Neutral
    score: 0.0004249837074894458
  - label: Sexism
    score: 8.618667925475165e-05
  - label: Racism
    score: 7.340563752222806e-05
  - label: Offensive
    score: 7.000174809945747e-05
datasets:
- IbrahimAmin/egyptian-arabic-hate-speech
base_model:
- UBC-NLP/MARBERTv2
---

## 🧠 Model Card

This model is a fine-tuned version of [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2), optimized for multiclass text classification in **Egyptian Arabic**. It classifies input text into one of the following five categories:

- **Neutral**
- **Offensive**
- **Sexism**
- **Racism**
- **Religious Discrimination**

It is particularly useful for content moderation, hate-speech analysis, and Arabic NLP research in dialectal contexts.

## 📚 Dataset

The model was fine-tuned on a custom annotated dataset, [IbrahimAmin/egyptian-arabic-hate-speech](https://huggingface.co/datasets/IbrahimAmin/egyptian-arabic-hate-speech), which contains thousands of Egyptian Arabic social media texts labeled by category.

## 🔧 How to Use

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# Run on GPU when one is available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-classification")
tokenizer = AutoTokenizer.from_pretrained("IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-classification")

# Wrap the model and tokenizer in a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, device=device)

result = classifier("مبحبش الخلايجه")
print(result)
```
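By default, the pipeline returns only the highest-scoring label. To reproduce the full per-class breakdown shown in the widget examples above, you can ask for all scores; a minimal sketch, reusing the `classifier` from the previous snippet and assuming a `transformers` version recent enough to support the `top_k` pipeline argument:

```python
# top_k=None returns every class with its score, sorted from highest to lowest
predictions = classifier("مبحبش الخلايجه", top_k=None)
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.6f}")
```

Each entry is a dict with a `label` and a `score`, matching the per-class breakdown in the widget examples.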
## ⚠️ Limitations & Biases

- Trained specifically on Egyptian Arabic; performance may degrade on Modern Standard Arabic (MSA) or other dialects.
- Social and political content may introduce bias in predictions.
- Borderline and sarcastic content may be misclassified.

## ⚠️ Disclaimer

This model is intended for research and content-moderation purposes and is not meant to offend, harm, or promote discrimination against any individual or group. Use it responsibly and consider the context in which it is applied. Any offensive content detected by the model should be treated with caution and handled appropriately.

## 👏 Acknowledgement

Model fine-tuning, data collection, annotation, and pre-processing for this work were performed as part of a graduation project at the Faculty of Engineering, AASTMT, Computer Engineering Program.

## 📖 Citation

If you use this model in your work, please cite:

```bibtex
@INPROCEEDINGS{10009167,
  author={Ahmed, Ibrahim and Abbas, Mostafa and Hatem, Rany and Ihab, Andrew and Fahkr, Mohamed Waleed},
  booktitle={2022 20th International Conference on Language Engineering (ESOLEC)},
  title={Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification},
  year={2022},
  volume={20},
  number={},
  pages={170-174},
  keywords={Social networking (online);Text categorization;Hate speech;Blogs;Transformers;Natural language processing;Task analysis;Arabic Hate Speech;Natural Language Processing;Transformers;Text Classification},
  doi={10.1109/ESOLEC54569.2022.10009167}
}
```