🛑 LexiGuard: Misogyny, Misandry & Toxicity Detection in English and Slovak

LexiGuard is a multilingual multitask model designed to detect and classify offensive language, with a focus on misogyny, misandry, and toxicity levels in English. The model also supports Slovak, making it suitable for multilingual analysis of social media content.

It performs dual classification:

  1. Category: Misogyny, Misandry, or Neutral
  2. Toxicity level: Low, Medium, or High

The model is based on xlm-roberta-base and was fine-tuned on a custom dataset primarily in English, with additional annotated samples in Slovak.


🧠 Model Overview

  • Base model: xlm-roberta-base
  • Tasks: Multitask classification (2 output heads)
  • Primary language: English
  • Secondary language: Slovak
  • Use case: Detecting offensive, sexist, or toxic comments in multilingual social media

🛠️ Usage

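import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Megyy/lexiguard")
model = AutoModelForSequenceClassification.from_pretrained("Megyy/lexiguard")
model.eval()  # inference mode: disables dropout

text = "Women are useless in politics."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits contains predictions for both tasks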

Note: The model has two output heads:

  • Head 1: Category (misogyny/misandry/neutral)
  • Head 2: Toxicity (low/medium/high)

📊 Label Definitions

Task 1 – Category Classification

  • 0: Neutral
  • 1: Misogyny
  • 2: Misandry

Task 2 – Toxicity Prediction

  • 0: Low
  • 1: Medium
  • 2: High
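
The label ids above can be turned into readable predictions with a small helper. The following is a minimal sketch, not the author's code: it assumes you already have one list of three raw logits per head, and the names `decode_head` and `decode_prediction` are hypothetical. How the two heads are exposed by the loaded checkpoint is not stated in this card, so separating the logits per head is left to the reader.

```python
import math

# Label maps copied from the "Label Definitions" section of this card.
CATEGORY_LABELS = {0: "Neutral", 1: "Misogyny", 2: "Misandry"}
TOXICITY_LABELS = {0: "Low", 1: "Medium", 2: "High"}


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_head(logits, labels):
    """Return (label, probability) for the highest-scoring class of one head."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def decode_prediction(category_logits, toxicity_logits):
    """Hypothetical helper: decode both heads for a single input text."""
    category, category_prob = decode_head(category_logits, CATEGORY_LABELS)
    toxicity, toxicity_prob = decode_head(toxicity_logits, TOXICITY_LABELS)
    return {
        "category": category,
        "category_prob": category_prob,
        "toxicity": toxicity,
        "toxicity_prob": toxicity_prob,
    }


# Made-up logits for illustration only, not real model output.
result = decode_prediction([0.2, 2.5, -1.0], [-0.5, 0.3, 1.8])
print(result["category"], result["toxicity"])  # prints: Misogyny High
```

Reporting the softmax probability next to each label makes it easier to apply a confidence threshold before acting on a prediction.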

🧪 Training Data

  • Over 5,000 manually annotated comments
  • Domain: Online discussions, social media, and forums
  • Language distribution:
    • ~80% English
    • ~20% Slovak

📁 Model Files

  • pytorch_model.bin / model.safetensors: model weights
  • config.json: model configuration
  • tokenizer.json, vocab.txt, etc.: tokenizer files
  • README.md: model card

📚 Citation

If you use this model in your work, please cite:

@bachelorsthesis{majercakova2025lexiguard,
  title={LexiGuard: Offensive Language Detection in English and Slovak Social Media},
  author={Magdalena Majercakova},
  year={2025},
  note={Bachelor's thesis, TUKE},
}

👨‍💻 Author

Developed by Magdaléna Majerčáková as part of a Bachelor's Thesis
Supervised by Ing. Zuzana Sokolová, PhD
Faculty of Electrical Engineering and Informatics, TUKE (2025)

