Model Card for kangelamw/RoBERTa-political-bias-classifier-softmax
Model Details
IN PROGRESS
A text classifier that analyzes news articles to reveal their political slant. Fine-tuned from RoBERTa, it outputs probability scores for three classes across the political spectrum: liberal, center, and conservative.
Model Description
- Developed by: kangelamw
- Funded by: Personal/Private
- Shared by: Personal/Private
- Model type: Classification
- License: MIT
Model Sources
- Repository: Github
Uses
- Equips readers to recognize hidden ideological influences by revealing the subtle currents in news coverage that can unconsciously shape perception.
- Transforms bias assessment from subjective guesswork to data-driven analysis, providing a quantitative approach to understanding media political leanings.
- Shows potential biases in news reporting to promote balance and objectivity in journalism.
Direct Use
[More Information Needed]
Out-of-Scope Use
Not Suitable For:
- High-stakes decision-making environments where fairness and accountability are crucial.
- Automated moderation or policy enforcement without human oversight.
- Use in contexts where misclassification could cause harm or reinforce negative stereotypes.
Users should avoid using this model for political campaigning, propaganda, or any application that might promote bias or misinformation.
Bias, Risks, and Limitations
Data Bias:
- The model is trained on datasets that might incorporate historical or cultural biases related to political parties and opinions. This may impact fairness across different groups.
Interpretability:
- The softmax outputs indicate probabilistic estimates rather than absolute truths. Interpret results with caution.
Overgeneralization:
- Relying solely on this model for assessing political bias can lead to oversimplification of complex sociopolitical views.
Risk Mitigation:
- Complement model outputs with human judgment.
- Perform additional validation against a diverse test set to uncover potential bias.
- Regularly update and audit the model to account for shifts in political discourse.
Limitations:
- The model may not generalize well to texts that have a context or structure significantly different from the training data. Additionally, subtle nuances in language might not be captured accurately, leading to potential misclassification.
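One way to act on the advice to complement model outputs with human judgment is to send low-confidence predictions to a reviewer. The sketch below is only an illustration and is not part of the released model: the function name `needs_human_review`, both threshold values, and the margin rule are assumptions chosen for the example and would need tuning on held-out data.

```python
def needs_human_review(probs, min_confidence=0.70, min_margin=0.20):
    """Flag a prediction for review when the model is not clearly decided.

    probs: class probabilities that sum to ~1 (for example, softmax output).
    min_confidence, min_margin: hypothetical thresholds, not from the model card.
    """
    ranked = sorted(probs, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    # Review if the best class is weak, or if two classes are nearly tied.
    return top < min_confidence or (top - runner_up) < min_margin


# Confident prediction: can pass through automatically.
print(needs_human_review([0.05, 0.05, 0.90]))  # False
# Near tie between two classes: route to a human.
print(needs_human_review([0.45, 0.40, 0.15]))  # True
```

Checking the margin as well as the top score catches articles where two classes are close, which is where a single label is most likely to oversimplify.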
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
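import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "kangelamw/RoBERTa-political-bias-classifier-softmax"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
# Example input text
text = "Your sample text goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
# Get predictions (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
# Convert logits to class probabilities
probs = torch.softmax(outputs.logits, dim=-1)
print(probs)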
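The model call above returns one raw score (logit) per class, which is hard to read on its own. The following is a minimal sketch of turning those scores into named probabilities using only the standard library. The label order in `ASSUMED_LABELS` is an assumption taken from the order the classes are listed in this card; check `model.config.id2label` for the actual mapping. The helper names `softmax` and `label_scores` and the example logits are hypothetical.

```python
import math

# Assumed label order; confirm against model.config.id2label.
ASSUMED_LABELS = ("liberal", "center", "conservative")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(logits, labels=ASSUMED_LABELS):
    """Pair each label with its probability and also return the top label."""
    probs = softmax(logits)
    scores = dict(zip(labels, probs))
    top_label = max(scores, key=scores.get)
    return scores, top_label


# Hypothetical logits for one article, e.g. outputs.logits[0].tolist()
scores, top_label = label_scores([2.0, 0.5, -1.0])
print(scores)
print(top_label)
```

Reporting the full score dictionary rather than only the top label keeps the probabilistic nature of the output visible to the reader.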
Training Details
Training Data
[More Information Needed]
Training Procedure
Model Initialization:
Start from a pre-trained RoBERTa model that has general language understanding capabilities.
Fine-Tuning Approach:
The model was adapted to political bias classification by appending a softmax classification layer and training on the specialized dataset. Key training parameters such as learning rate, batch size, and the number of epochs were optimized during the fine-tuning process.
Validation and Optimization:
A portion of the dataset was set aside for validation to monitor performance and avoid overfitting.
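Monitoring a validation set requires a score to track, and the training arguments in this card select `"f1"` as the metric for choosing the best checkpoint. The card does not say how F1 is averaged across the three classes, so the sketch below assumes macro averaging (an unweighted mean of per-class F1). The function names and the toy labels are hypothetical and serve only to show the calculation.

```python
def macro_f1(y_true, y_pred, num_classes=3):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    f1_per_class = []
    for cls in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class that never appears in labels or predictions contributes 0.
        f1_per_class.append(2 * tp / denom if denom else 0.0)
    return sum(f1_per_class) / num_classes


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy validation labels (hypothetical), using class ids 0, 1, 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Macro F1 weights each class equally, which makes it more sensitive than accuracy to a class the model handles poorly.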
Training Hyperparameters
- Training regime: fp32
from transformers import TrainingArguments

# model_path is the checkpoint output directory, defined elsewhere
training_args = TrainingArguments(
    output_dir=model_path,
    do_train=True,
    do_eval=True,
    do_predict=True,
    eval_strategy="steps",
    eval_steps=150,
    eval_accumulation_steps=4,
    logging_strategy="steps",
    logging_steps=300,
    save_strategy="steps",
    save_steps=300,
    num_train_epochs=5,
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    greater_is_better=True,
    report_to="tensorboard",
    resume_from_checkpoint=True,
    per_device_eval_batch_size=8,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
)
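Several of these settings interact: the per-device batch size and gradient accumulation together determine how many examples contribute to each optimizer update, and `warmup_ratio` is applied to the total number of updates. The arithmetic below is a rough sketch of that relationship. The single-GPU setting and the training-set size are assumptions (the card does not state the dataset size), and the Trainer's own rounding may differ by a few steps.

```python
import math

per_device_train_batch_size = 8   # from the training arguments
gradient_accumulation_steps = 4   # from the training arguments
num_train_epochs = 5              # from the training arguments
warmup_ratio = 0.1                # from the training arguments

num_devices = 1                   # assumption: single local GPU
num_train_examples = 10_000       # hypothetical; the dataset size is not stated

# Examples seen per optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Approximate optimizer updates per epoch and overall.
steps_per_epoch = math.ceil(num_train_examples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
# Updates spent ramping the learning rate up before linear decay begins.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch_size, steps_per_epoch, total_steps, warmup_steps)
```

With these settings each update reflects 32 examples on one GPU, while gradient checkpointing and the small per-device batch keep memory use within reach of a consumer card.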
Evaluation
Results
Metric | Value |
---|---|
eval_accuracy | 0.9204 |
eval_f1 | 0.9206 |
eval_cross_entropy | 0.2789 |
eval_kl_divergence | 0.2789 |
epoch | 4.9875 |
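The identical values for eval_cross_entropy and eval_kl_divergence are what one would expect if the evaluation targets are one-hot labels. That is an assumption here, since the card does not say how the targets are encoded. For a target distribution $p$ and a predicted distribution $q$ over the three classes, with the convention $0 \log 0 = 0$:

```latex
\begin{aligned}
D_{\mathrm{KL}}(p \,\|\, q)
  &= \sum_{c} p_c \log \frac{p_c}{q_c}
   = \underbrace{-\sum_{c} p_c \log q_c}_{H(p,\,q)}
     \;-\; \underbrace{\Bigl(-\sum_{c} p_c \log p_c\Bigr)}_{H(p)} \\[4pt]
H(p) &= 0 \quad \text{when } p \text{ is one-hot}
  \;\Longrightarrow\;
  D_{\mathrm{KL}}(p \,\|\, q) = H(p, q)
\end{aligned}
```

Under that assumption the two rows report the same quantity, so the KL divergence adds no information beyond the cross-entropy.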
Summary
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 12GB Nvidia RTX 3060TI
- Hours used: Approximately 8-16 hours every day for 2-3 weeks, covering fine-tuning and inference
- Cloud Provider: None - personal workstation/local machine
- Compute Region: North America
- Carbon Emitted: Not estimated; this GPU is not in the calculator's hardware list
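When a GPU is missing from the calculator, a rough figure can still be worked out by hand as energy used times the carbon intensity of the local grid. The sketch below shows only the shape of that calculation. The hour range follows from the usage stated above, but the power draw and grid intensity are placeholder assumptions, not measurements, so the printed range is not a reported result for this model.

```python
def estimate_co2_kg(power_watts, hours, kg_co2_per_kwh):
    """Energy in kWh multiplied by grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh * kg_co2_per_kwh


# Hour range derived from the card: 8-16 hours a day for 2-3 weeks.
min_hours = 8 * 7 * 2    # 112
max_hours = 16 * 7 * 3   # 336

ASSUMED_POWER_WATTS = 200       # placeholder GPU draw, not a measurement
ASSUMED_KG_CO2_PER_KWH = 0.4    # placeholder grid intensity, not a measurement

low = estimate_co2_kg(ASSUMED_POWER_WATTS, min_hours, ASSUMED_KG_CO2_PER_KWH)
high = estimate_co2_kg(ASSUMED_POWER_WATTS, max_hours, ASSUMED_KG_CO2_PER_KWH)
print(f"{low:.1f} to {high:.1f} kg CO2eq under the placeholder inputs")
```

Replacing the two placeholders with a measured average power draw and the published intensity for the local grid would turn this into a usable estimate.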
Citation
@article{DBLP:journals/corr/abs-1907-11692,
  author        = {Yinhan Liu and
                   Myle Ott and
                   Naman Goyal and
                   Jingfei Du and
                   Mandar Joshi and
                   Danqi Chen and
                   Omer Levy and
                   Mike Lewis and
                   Luke Zettlemoyer and
                   Veselin Stoyanov},
  title         = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
  journal       = {CoRR},
  volume        = {abs/1907.11692},
  year          = {2019},
  url           = {http://arxiv.org/abs/1907.11692},
  archivePrefix = {arXiv},
  eprint        = {1907.11692},
  timestamp     = {Thu, 01 Aug 2019 08:59:33 +0200},
  biburl        = {https://dblp.org/rec/journals/corr/abs-1907-11692.bib},
  bibsource     = {dblp computer science bibliography, https://dblp.org}
}
Model Card Contact
Model tree for kangelamw/RoBERTa-political-bias-classifier-softmax
Base model
FacebookAI/roberta-base