Machine-generated text detection prevents language model collapse

This model accompanies the paper "Machine-generated text detection prevents language model collapse", which proposes preventing model collapse through importance sampling from a machine-generated text detector. The official implementation and training scripts are available in the GitHub repository GeorgeDrayson/model_collapse.
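
To make the importance-sampling idea concrete, here is a minimal sketch of resampling a text pool using detector scores. It is an illustration under stated assumptions, not the paper's method: the weighting rule (1 - p_machine), the function name resample_by_human_likeness, and the example probabilities are all hypothetical. Refer to the official repository for the actual procedure.

```python
import random


def resample_by_human_likeness(texts, p_machine, k, seed=0):
    """Draw k texts with replacement, weighting each by 1 - p_machine.

    ASSUMPTION: this weighting rule is illustrative only and is not
    taken from the paper or its repository.
    """
    if len(texts) != len(p_machine):
        raise ValueError("texts and p_machine must have the same length")
    # Texts the detector considers more human-like get larger weights
    weights = [1.0 - p for p in p_machine]
    total = sum(weights)
    if total <= 0:
        raise ValueError("all weights are zero; nothing to sample from")
    normalised = [w / total for w in weights]
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.choices(texts, weights=normalised, k=k), normalised


texts = ["human-like sample", "ambiguous sample", "machine-like sample"]
p_machine = [0.1, 0.5, 1.0]  # hypothetical detector outputs
sample, weights = resample_by_human_likeness(texts, p_machine, k=5)
print(weights)
print(sample)
```

In this toy run, a text scored as certainly machine-generated receives zero weight and is never drawn, while the remaining texts are drawn in proportion to how human-like the detector judges them.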

Usage

To use the model for detecting machine-generated text:

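from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the detector and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("GeorgeDrayson/modernbert-mage")
model = AutoModelForSequenceClassification.from_pretrained("GeorgeDrayson/modernbert-mage")
model.eval()  # make sure dropout is disabled for inference

text = "Your input text here."
# Truncate inputs that exceed the tokenizer's maximum sequence length
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
# Softmax over the class dimension; index 1 is read as the machine-generated class
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(f"Probability of machine-generated text: {probabilities[0][1].item():.4f}")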
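
The final two lines of the snippet above turn a pair of logits into a probability. The sketch below reproduces that arithmetic in plain Python and adds a decision threshold. It assumes, following the snippet, that index 1 is the machine-generated class; the helper names and the 0.5 threshold are hypothetical and should be tuned for your own data.

```python
import math


def machine_probability(logits):
    """Numerically stable softmax over a two-class logit pair.

    ASSUMPTION: index 1 is the machine-generated class, as in the usage snippet.
    """
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def is_machine_generated(logits, threshold=0.5):
    """Flag a text when its machine probability reaches the threshold (hypothetical default)."""
    return machine_probability(logits) >= threshold


example_logits = [-1.0, 2.0]  # made-up values, not real model output
print(f"{machine_probability(example_logits):.4f}")
print(is_machine_generated(example_logits))
```

With real model output, pass outputs.logits[0].tolist() in place of example_logits.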

Citation

If you use this model or find the research helpful, please cite:

@article{drayson2025machine,
  title={Machine-generated text detection prevents language model collapse},
  author={Drayson, George and Yilmaz, Emine and Lampos, Vasileios},
  journal={arXiv preprint arXiv:2502.15654},
  year={2025}
}

Model size: 150M params (Safetensors, tensor type F32)