---
license: apache-2.0
language:
- ar
- arz
library_name: transformers
pipeline_tag: text-classification
widget:
- text: إزيك يا صاحبي؟ عامل ايه؟
  output:
  - label: Neutral
    score: 0.9996803998947144
  - label: Offensive
    score: 0.0001022413489408791
  - label: Racism
    score: 9.184532245853916e-05
  - label: Sexism
    score: 6.373981887008995e-05
  - label: Religious Discrimination
    score: 6.177987233968452e-05
- text: الجو حلو النهارده ومزاجي رايق.
  output:
  - label: Neutral
    score: 0.9996935129165649
  - label: Offensive
    score: 8.360472565982491e-05
  - label: Racism
    score: 8.307467942358926e-05
  - label: Religious Discrimination
    score: 7.166535215219483e-05
  - label: Sexism
    score: 6.817518442403525e-05
- text: ربنا يكرم بقي
  output:
  - label: Neutral
    score: 0.9996790885925293
  - label: Racism
    score: 9.369126928504556e-05
  - label: Offensive
    score: 8.728997636353597e-05
  - label: Sexism
    score: 7.185162394307554e-05
  - label: Religious Discrimination
    score: 6.815723463660106e-05
- text: المسلمين و المسيحيين ايد واحده
  output:
  - label: Neutral
    score: 0.9995753169059753
  - label: Religious Discrimination
    score: 0.00018269731663167477
  - label: Racism
    score: 0.00010355251288274303
  - label: Offensive
    score: 7.555521733593196e-05
  - label: Sexism
    score: 6.293723708949983e-05
- text: إنت غبي ومحدش طايقك.
  output:
  - label: Offensive
    score: 0.9991933703422546
  - label: Sexism
    score: 0.0003336789377499372
  - label: Religious Discrimination
    score: 0.00016878142196219414
  - label: Racism
    score: 0.00015242300287354738
  - label: Neutral
    score: 0.0001517057535238564
- text: كفاية بقى تفاهة يا ولاد كذا...
  output:
  - label: Offensive
    score: 0.9990781545639038
  - label: Sexism
    score: 0.00037458923179656267
  - label: Racism
    score: 0.000291738921077922
  - label: Neutral
    score: 0.00012822129065170884
  - label: Religious Discrimination
    score: 0.00012722807878162712
- text: اخرس شويه بقي
  output:
  - label: Offensive
    score: 0.9991931319236755
  - label: Sexism
    score: 0.0003191738505847752
  - label: Neutral
    score: 0.00016810539818834513
  - label: Religious Discrimination
    score: 0.00016681390115991235
  - label: Racism
    score: 0.00015283490938600153
- text: كل الستات ما يعرفوش يسوقوا.
  output:
  - label: Sexism
    score: 0.9995362758636475
  - label: Offensive
    score: 0.00013705063611268997
  - label: Religious Discrimination
    score: 0.00011684132914524525
  - label: Neutral
    score: 0.00010987235873471946
  - label: Racism
    score: 9.996256994782016e-05
- text: البنات لازم يقعدوا في البيت.
  output:
  - label: Sexism
    score: 0.9995285272598267
  - label: Offensive
    score: 0.0001513037277618423
  - label: Neutral
    score: 0.00011781435023294762
  - label: Religious Discrimination
    score: 0.00011129838094348088
  - label: Racism
    score: 9.102458716370165e-05
- text: الستات مبتعرفش تسوق
  output:
  - label: Sexism
    score: 0.9995216131210327
  - label: Offensive
    score: 0.00014647189527750015
  - label: Neutral
    score: 0.00012976766447536647
  - label: Religious Discrimination
    score: 0.00010765341721707955
  - label: Racism
    score: 9.454880637349561e-05
- text: مش بحب أتعامل مع السود.
  output:
  - label: Racism
    score: 0.9993932247161865
  - label: Neutral
    score: 0.0002437636285321787
  - label: Offensive
    score: 0.00015556033758912235
  - label: Religious Discrimination
    score: 0.00010805160854943097
  - label: Sexism
    score: 9.933744877343997e-05
- text: الخلايجة كده دايمًا، شايفين نفسهم.
  output:
  - label: Racism
    score: 0.9995430707931519
  - label: Offensive
    score: 0.0001469338167225942
  - label: Sexism
    score: 0.00010587665747152641
  - label: Religious Discrimination
    score: 0.00010272463987348601
  - label: Neutral
    score: 0.00010147325519938022
- text: الخلايجه كلهم ولاد كلب
  output:
  - label: Racism
    score: 0.9995269775390625
  - label: Offensive
    score: 0.00017615519755054265
  - label: Sexism
    score: 0.00010926152026513591
  - label: Religious Discrimination
    score: 0.0001019195988192223
  - label: Neutral
    score: 8.565541065763682e-05
- text: الناس اللي بتصلي دول منافقين.
  output:
  - label: Religious Discrimination
    score: 0.9987323880195618
  - label: Neutral
    score: 0.0007236517849378288
  - label: Sexism
    score: 0.00036698306212201715
  - label: Racism
    score: 0.00010111296433024108
  - label: Offensive
    score: 7.585762068629265e-05
- text: دينك مش صح، ولازم تغيره.
  output:
  - label: Religious Discrimination
    score: 0.9994857311248779
  - label: Neutral
    score: 0.00024439653498120606
  - label: Sexism
    score: 0.00010205681610386819
  - label: Offensive
    score: 9.713304461911321e-05
  - label: Racism
    score: 7.06951177562587e-05
- text: المسلمين و المسيحيين مش ايد واحده
  output:
  - label: Religious Discrimination
    score: 0.9993454813957214
  - label: Neutral
    score: 0.0004249837074894458
  - label: Sexism
    score: 8.618667925475165e-05
  - label: Racism
    score: 7.340563752222806e-05
  - label: Offensive
    score: 7.000174809945747e-05
datasets:
- IbrahimAmin/egyptian-arabic-hate-speech
base_model:
- UBC-NLP/MARBERTv2
---

## 🧠 Model Card

This model is a fine-tuned version of [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2), optimized for multiclass text classification in **Egyptian Arabic**. It classifies input text into one of the following five categories:

- **Neutral**
- **Offensive**
- **Sexism**
- **Racism**
- **Religious Discrimination**

It is particularly useful for content moderation, hate-speech analysis, and Arabic NLP research in dialectal contexts.

## 📚 Dataset

The model was fine-tuned on a custom annotated dataset, [IbrahimAmin/egyptian-arabic-hate-speech](https://huggingface.co/datasets/IbrahimAmin/egyptian-arabic-hate-speech), which contains thousands of Egyptian Arabic social media texts labeled by category.

## 🔧 How to Use

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# Run on GPU when one is available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-classification")
tokenizer = AutoTokenizer.from_pretrained("IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-classification")

# Wrap the model and tokenizer in a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, device=device)

result = classifier("مبحبش الخلايجه")
print(result)
```
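By default, the pipeline returns only the highest-scoring label. To reproduce the full per-class breakdown shown in the widget examples above, you can ask for all scores; a minimal sketch, reusing the `classifier` from the previous snippet and assuming a `transformers` version recent enough to support the `top_k` pipeline argument:

```python
# top_k=None returns every class with its score, sorted from highest to lowest
predictions = classifier("مبحبش الخلايجه", top_k=None)
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.6f}")
```

Each entry is a dict with a `label` and a `score`, matching the per-class breakdown in the widget examples.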
## ⚠️ Limitations & Biases

- Trained specifically on Egyptian Arabic; performance may degrade on Modern Standard Arabic (MSA) or other dialects.
- Social and political content may introduce bias in predictions.
- Borderline and sarcastic content may be misclassified.

## ⚠️ Disclaimer

This model is intended for research and content-moderation purposes and is not meant to offend, harm, or promote discrimination against any individual or group. Use it responsibly and consider the context in which it is applied. Any offensive content detected by the model should be treated with caution and handled appropriately.

## 👏 Acknowledgement

Model fine-tuning, data collection, annotation, and pre-processing for this work were performed as part of a graduation project at the Faculty of Engineering, AASTMT, Computer Engineering Program.

## 📖 Citation

If you use this model in your work, please cite:

```bibtex
@INPROCEEDINGS{10009167,
  author={Ahmed, Ibrahim and Abbas, Mostafa and Hatem, Rany and Ihab, Andrew and Fahkr, Mohamed Waleed},
  booktitle={2022 20th International Conference on Language Engineering (ESOLEC)},
  title={Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification},
  year={2022},
  volume={20},
  number={},
  pages={170-174},
  keywords={Social networking (online);Text categorization;Hate speech;Blogs;Transformers;Natural language processing;Task analysis;Arabic Hate Speech;Natural Language Processing;Transformers;Text Classification},
  doi={10.1109/ESOLEC54569.2022.10009167}
}
```