---
license: apache-2.0
datasets:
- yaful/MAGE
language:
- en
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
---

# Machine-generated text detection prevents language model collapse

This model is part of the research presented in the paper [Machine-generated text detection prevents language model collapse](https://arxiv.org/abs/2502.15654), which proposes an approach to preventing model collapse based on importance sampling from a machine-generated text detector. The official implementation and training scripts are available in the GitHub repository: [GeorgeDrayson/model_collapse](https://github.com/GeorgeDrayson/model_collapse).
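
The importance-sampling idea can be illustrated with a short sketch. This is not the authors' exact pipeline (see the GitHub repository for that); it only shows how the detector's scores could weight the resampling of a mixed human/machine corpus so that likely machine-generated texts are down-weighted. The `corpus` list, the label-index assignment, and the resampling step are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("GeorgeDrayson/modernbert-mage")
model = AutoModelForSequenceClassification.from_pretrained("GeorgeDrayson/modernbert-mage")
model.eval()

# Hypothetical mixed corpus of candidate training texts
corpus = [
    "A paragraph sampled from human-written web text.",
    "A paragraph that may have been produced by a language model.",
]

with torch.no_grad():
    inputs = tokenizer(corpus, return_tensors="pt", padding=True, truncation=True)
    probs = torch.softmax(model(**inputs).logits, dim=-1)

# Assumes index 0 is the human-written class (index 1 is machine-generated,
# as in the usage example below); weight each text by P(human-written)
weights = probs[:, 0]
resampled_idx = torch.multinomial(weights, num_samples=len(corpus), replacement=True)
resampled_corpus = [corpus[i] for i in resampled_idx]
```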

## Usage

To use the model for detecting machine-generated text:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("GeorgeDrayson/modernbert-mage")
model = AutoModelForSequenceClassification.from_pretrained("GeorgeDrayson/modernbert-mage")
model.eval()  # disable dropout for deterministic inference

text = "Your input text here."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)

# Index 1 corresponds to the machine-generated class
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(f"Probability of machine-generated text: {probabilities[0][1].item():.4f}")
```
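
For quick checks, the same checkpoint can also be loaded through the `pipeline` API, which handles tokenization and softmax internally (the label names shown depend on the `id2label` mapping in the model config):

```python
from transformers import pipeline

detector = pipeline("text-classification", model="GeorgeDrayson/modernbert-mage")
print(detector("Your input text here."))
# prints a list like [{'label': ..., 'score': ...}]; label names come from the model config
```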

## Citation

If you use this model or find the research helpful, please cite:

```bibtex
@article{drayson2025machine,
  title={Machine-generated text detection prevents language model collapse},
  author={Drayson, George and Yilmaz, Emine and Lampos, Vasileios},
  journal={arXiv preprint arXiv:2502.15654},
  year={2025}
}
```