metadata
license: apache-2.0
datasets:
- yaful/MAGE
language:
- en
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
Machine-generated text detection prevents language model collapse
This model accompanies the paper "Machine-generated text detection prevents language model collapse", which proposes preventing model collapse by importance sampling guided by a machine-generated text detector. The official implementation and training scripts are available in the GitHub repository GeorgeDrayson/model_collapse.
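The importance-sampling idea described above can be illustrated with a small, standard-library sketch. It assumes each text already has a detector score `p_machine` (the probability of being machine-generated) and resamples the corpus with weights proportional to `1 - p_machine`. The function name `resample_by_detector` and this particular weighting are assumptions made for illustration; the weighting actually used in the paper is defined in the official repository and may differ.

```python
import random
from typing import List, Sequence


def resample_by_detector(
    texts: Sequence[str],
    p_machine: Sequence[float],
    k: int,
    seed: int = 0,
) -> List[str]:
    """Draw k texts with replacement, favouring likely human-written ones.

    Hypothetical helper: the weight of each text is 1 - p_machine, i.e. the
    detector's probability that the text is human-written.
    """
    if len(texts) != len(p_machine):
        raise ValueError("texts and p_machine must have the same length")
    weights = [1.0 - p for p in p_machine]
    if sum(weights) <= 0:
        raise ValueError("all weights are zero; nothing to sample")
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.choices(list(texts), weights=weights, k=k)


# Made-up scores for illustration only; not real detector output.
corpus = ["human-like text", "borderline text", "machine-like text"]
scores = [0.05, 0.50, 0.95]
sample = resample_by_detector(corpus, scores, k=10)
print(sample)
```

Texts the detector considers machine-generated are drawn less often, so the resampled corpus leans towards human-written data without discarding any example outright.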
Usage
To use the model for detecting machine-generated text:
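import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load the tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("GeorgeDrayson/modernbert-mage")
model = AutoModelForSequenceClassification.from_pretrained("GeorgeDrayson/modernbert-mage")
text = "Your input text here."
# Truncate inputs longer than the model's maximum sequence length.
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
# Softmax over the class dimension; index 1 is the machine-generated class.
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(f"Probability of machine-generated text: {probabilities[0][1].item():.4f}")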
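To make the softmax step in the usage snippet concrete, the sketch below reproduces it in plain Python for a two-class output. The logit values are made up for illustration and are not real model output. For two classes, the softmax probability of class 1 equals the sigmoid of the difference between the two logits, which the example checks numerically.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for (class 0, class 1); not real model output.
logits = [-1.2, 2.3]
p_machine = softmax(logits)[1]

# Equivalent closed form for two classes: sigmoid of the logit difference.
p_sigmoid = 1.0 / (1.0 + math.exp(-(logits[1] - logits[0])))

print(f"softmax: {p_machine:.4f}  sigmoid: {p_sigmoid:.4f}")
```

Because only the logit difference matters, a decision threshold on the probability corresponds directly to a threshold on that difference.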
Citation
If you use this model or find the research helpful, please cite:
@article{drayson2025machine,
  title={Machine-generated text detection prevents language model collapse},
  author={Drayson, George and Yilmaz, Emine and Lampos, Vasileios},
  journal={arXiv preprint arXiv:2502.15654},
  year={2025}
}