Model Card for kangelamw/RoBERTa-political-bias-classifier-softmax

Model Details

IN PROGRESS

A classifier for analyzing news articles to reveal their political slant. Built on a fine-tuned RoBERTa model, it outputs probabilistic scores across three points on the political spectrum: liberal, center, and conservative.
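To show what "probabilistic scores" means in practice, here is a minimal sketch of turning a classifier's raw logits into per-class probabilities with softmax. The label order and the example logits are hypothetical. The real index-to-label mapping is stored in the model's configuration (`id2label`).

```python
import math

# Hypothetical label order; the actual mapping lives in the model config's id2label.
LABELS = ["liberal", "center", "conservative"]

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score_article(logits):
    """Map raw classifier logits to a {label: probability} dict."""
    return dict(zip(LABELS, softmax(logits)))

scores = score_article([0.3, 2.1, -1.0])   # hypothetical logits
print(max(scores, key=scores.get))         # center
print(round(sum(scores.values()), 6))      # 1.0
```

The probabilities always sum to 1, so each score is the class's share of the model's belief rather than an independent rating.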

Model Description

Developed by: kangelamw
Funded by: Personal/Private
Shared by: Personal/Private
Model type: Text classification (fine-tuned RoBERTa)
License: MIT

Model Sources

Repository: GitHub

Uses

Equips readers with a critical lens for recognizing ideological framing in news coverage that can shape perception without their noticing.
Replaces subjective guesswork with data-driven analysis, giving a quantitative estimate of an article's political leaning.
Surfaces potential bias in news reporting to support balance and objectivity in journalism.

Direct Use

[More Information Needed]

Out-of-Scope Use

Not Suitable For:

High-stakes decision-making environments where fairness and accountability are crucial.
Automated moderation or policy enforcement without human oversight.
Use in contexts where misclassification could cause harm or reinforce negative stereotypes.

Users should avoid using this model for political campaigning, propaganda, or any application that might promote bias or misinformation.

Bias, Risks, and Limitations

Data Bias:

The model is trained on datasets that may encode historical or cultural biases related to political parties and opinions, which can affect fairness across different groups.

Interpretability:

The softmax outputs are probabilistic estimates, not absolute truths. Interpret results with caution.

Overgeneralization:

Relying solely on this model to assess political bias can oversimplify complex sociopolitical views.

Risk Mitigation:

Complement model outputs with human judgment.
Perform additional validation against a diverse test set to uncover potential bias.
Regularly update and audit the model to account for shifts in political discourse.

Limitations:

The model may not generalize well to texts whose context or structure differs significantly from the training data. It may also miss subtle nuances in language, leading to misclassification.
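One way to act on the recommendation to validate against a diverse test set is to break accuracy down by group and compare the groups. The sketch below assumes a labeled test set in which each example carries a group key, such as news outlet or topic. The record format, the grouping, and the outlet names are all hypothetical.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy separately for each group in a labeled test set.

    Each record is a (group, true_label, predicted_label) tuple. The grouping
    key (e.g., news outlet or topic) is a hypothetical choice for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}

def largest_gap(per_group):
    """Difference between the best- and worst-performing groups."""
    return max(per_group.values()) - min(per_group.values())

records = [
    ("outlet_a", "liberal", "liberal"),
    ("outlet_a", "center", "center"),
    ("outlet_b", "conservative", "center"),
    ("outlet_b", "center", "center"),
]
scores = accuracy_by_group(records)
print(scores)               # {'outlet_a': 1.0, 'outlet_b': 0.5}
print(largest_gap(scores))  # 0.5
```

A large gap between groups is a signal to inspect that slice more closely. It does not by itself prove the model is biased, especially when groups are small.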

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

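import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "kangelamw/RoBERTa-political-bias-classifier-softmax"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example input text
text = "Your sample text goes here."
# RoBERTa accepts at most 512 tokens, so longer articles are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get predictions without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities and print them with their label names
probs = torch.softmax(logits, dim=-1)[0]
for idx, p in enumerate(probs.tolist()):
    print(model.config.id2label[idx], round(p, 4))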

Training Details

Training Data

[More Information Needed]

Training Procedure

Model Initialization:
The model was initialized from a pre-trained RoBERTa checkpoint with general language-understanding capabilities.
Fine-Tuning Approach:
The model was adapted to political bias classification by adding a classification head on top of RoBERTa, whose outputs are converted to class probabilities with softmax, and then training on the specialized dataset. Key training parameters such as learning rate, batch size, and number of epochs were tuned during fine-tuning.
Validation and Optimization:
A portion of the dataset was held out for validation to monitor performance and guard against overfitting.
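The card does not say how the validation portion was selected. As an illustration only, here is one common approach: a stratified split that keeps the liberal/center/conservative proportions equal in the training and validation sets. The record format, `val_fraction`, and `seed` are hypothetical, not values from the original training run.

```python
import random
from collections import defaultdict

def stratified_split(examples, label_key="label", val_fraction=0.1, seed=42):
    """Split examples into train/validation sets, preserving label proportions.

    `examples` is a list of dicts; `label_key`, `val_fraction`, and `seed` are
    hypothetical defaults, not values from the original training run.
    """
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)                      # shuffle within each class
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val

# Toy usage with hypothetical data: 10 examples per class
data = [{"text": f"article {i}", "label": lab}
        for i, lab in enumerate(["liberal", "center", "conservative"] * 10)]
train, val = stratified_split(data, val_fraction=0.2)
print(len(train), len(val))  # 24 6
```

Stratifying matters most when classes are imbalanced. A purely random split could otherwise leave the validation set with too few examples of the minority class to measure F1 reliably.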

Training Hyperparameters

Training regime: fp32

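from transformers import TrainingArguments

# model_path: output directory for checkpoints and logs (defined elsewhere)
training_args = TrainingArguments(
  output_dir=model_path,
  do_train=True,
  do_eval=True,
  do_predict=True,

  # Evaluate every 150 optimizer steps (older transformers releases name this `evaluation_strategy`)
  eval_strategy="steps",
  eval_steps=150,
  eval_accumulation_steps=4,

  logging_strategy="steps",
  logging_steps=300,

  # Checkpoint every 300 steps; load_best_model_at_end requires a multiple of eval_steps
  save_strategy="steps",
  save_steps=300,
  num_train_epochs=5,

  # Linear learning-rate decay after warming up over the first 10% of training steps
  learning_rate=2e-5,
  lr_scheduler_type="linear",
  warmup_ratio=0.1,
  weight_decay=0.01,

  # Keep the checkpoint with the highest eval F1 (compute_metrics must return "f1")
  load_best_model_at_end=True,
  metric_for_best_model="f1",
  greater_is_better=True,

  report_to="tensorboard",
  # Not read by Trainer itself; resume via trainer.train(resume_from_checkpoint=...)
  resume_from_checkpoint=True,

  # Effective train batch size per update on one GPU: 8 x 4 = 32
  per_device_eval_batch_size=8,
  per_device_train_batch_size=8,
  gradient_accumulation_steps=4,
  # Recompute activations during the backward pass to save GPU memory
  gradient_checkpointing=True
)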
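The interaction between batch size, gradient accumulation, epochs, and warmup can be worked out with a little arithmetic. The sketch below approximates how these settings translate into optimizer steps on a single GPU. The training-set size is not reported in this card, so the 10,000 examples in the usage line are hypothetical. Treat the result as approximate, because different transformers releases handle a leftover partial accumulation window slightly differently.

```python
import math

def schedule_summary(num_train_examples, per_device_batch=8, grad_accum=4,
                     num_epochs=5, warmup_ratio=0.1, num_devices=1):
    """Approximate optimizer-step arithmetic for the TrainingArguments above."""
    # Samples that contribute to each optimizer update
    effective_batch = per_device_batch * grad_accum * num_devices
    # Micro-batches (forward/backward passes) per epoch, keeping the last partial batch
    batches_per_epoch = math.ceil(num_train_examples / (per_device_batch * num_devices))
    # Optimizer updates per epoch (at least one)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_updates = math.ceil(num_epochs * updates_per_epoch)
    # Warmup length implied by warmup_ratio
    warmup_updates = math.ceil(total_updates * warmup_ratio)
    return effective_batch, updates_per_epoch, total_updates, warmup_updates

print(schedule_summary(10_000))  # (32, 312, 1560, 156)
```

Under that hypothetical dataset size, `eval_steps=150` would trigger about ten evaluations over the run, and `save_steps=300` would write a checkpoint at every second evaluation.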

Evaluation

Results

Metric	Value
eval_accuracy	0.9204
eval_f1	0.9206
eval_cross_entropy	0.2789
eval_kl_divergence	0.2789
epoch	4.9875
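The card does not say how eval_cross_entropy and eval_kl_divergence were computed, but the two reported values are identical (0.2789). That is what one would expect if the evaluation targets are one-hot labels. Since KL(p‖q) = H(p, q) − H(p), and a one-hot target has entropy H(p) = 0, the two metrics coincide. The sketch below checks this numerically. The target and prediction vectors are hypothetical.

```python
import math

def cross_entropy(target, pred):
    """H(p, q) = -sum_i p_i * log(q_i); terms with p_i = 0 contribute 0."""
    return -sum(p * math.log(q) for p, q in zip(target, pred) if p > 0)

def kl_divergence(target, pred):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(p * math.log(p / q) for p, q in zip(target, pred) if p > 0)

one_hot = [0.0, 1.0, 0.0]   # hypothetical hard label
soft = [0.2, 0.7, 0.1]      # hypothetical soft label
pred = [0.1, 0.8, 0.1]      # hypothetical softmax output

# One-hot target: both equal -log(0.8), up to floating-point rounding
print(cross_entropy(one_hot, pred), kl_divergence(one_hot, pred))
# Soft target: the difference is the target's entropy H(p), which is > 0
print(cross_entropy(soft, pred) - kl_divergence(soft, pred))
```

If the two reported metrics ever diverged, that would suggest the evaluation targets were soft distributions rather than hard labels.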

Summary

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: 12GB Nvidia RTX 3060TI
Hours used: Approximately 8-16 hours per day over 2-3 weeks, covering fine-tuning and inference
Cloud Provider: None - personal workstation/local machine
Compute Region: North America
Carbon Emitted: Not estimated; this GPU is not among the hardware options listed in the calculator
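Although no estimate is reported here, the calculation behind tools like this one can be approximated as energy consumed multiplied by the local grid's carbon intensity. The function below is a simplified sketch of that approach, not the calculator's exact method. The GPU power draw, the carbon intensity, and the overhead factor (`pue`) are all user-supplied assumptions. The numbers in the example are placeholders, not measurements for this model.

```python
def estimate_emissions_kg(hours, gpu_power_watts, grid_kg_co2_per_kwh, pue=1.0):
    """Back-of-the-envelope CO2 estimate: energy (kWh) x grid carbon intensity.

    All inputs are assumptions supplied by the user. `pue` scales energy for
    facility overhead and defaults to 1.0 for a local workstation.
    """
    energy_kwh = hours * gpu_power_watts / 1000 * pue
    return energy_kwh * grid_kg_co2_per_kwh

# Placeholder values only: 100 hours at 200 W on a grid emitting 0.4 kg CO2/kWh
print(estimate_emissions_kg(100, 200, 0.4))  # 8.0
```

Counting only the GPU's power draw understates total usage, since the CPU, memory, and cooling also consume energy. A wall-power reading over a representative session would give a better input.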

Citation

This model builds on RoBERTa, which can be cited as:

@article{DBLP:journals/corr/abs-1907-11692,
  author    = {Yinhan Liu and
               Myle Ott and
               Naman Goyal and
               Jingfei Du and
               Mandar Joshi and
               Danqi Chen and
               Omer Levy and
               Mike Lewis and
               Luke Zettlemoyer and
               Veselin Stoyanov},
  title     = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
  journal   = {CoRR},
  volume    = {abs/1907.11692},
  year      = {2019},
  url       = {http://arxiv.org/abs/1907.11692},
  archivePrefix = {arXiv},
  eprint    = {1907.11692},
  timestamp = {Thu, 01 Aug 2019 08:59:33 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1907-11692.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Model Card Contact

You can find me on GitHub or LinkedIn.
