Model Card for kangelamw/RoBERTa-political-bias-classifier-softmax

Model Details

IN PROGRESS

A classifier that analyzes news articles to reveal their political slant. It outputs a probability score for each of three classes across the political spectrum: liberal, center, and conservative.

Model Description

  • Developed by: kangelamw
  • Funded by [optional]: Personal/Private
  • Shared by [optional]: Personal/Private
  • Model type: Classification
  • License: MIT

Model Sources [optional]

Uses

  1. Equips readers with a critical lens for recognizing hidden ideological influences and media manipulation by revealing the subtle currents that can unconsciously shape perception.
  2. Transforms bias assessment from subjective guesswork to data-driven analysis, providing a quantitative approach to understanding media political leanings.
  3. Shows potential biases in news reporting to promote balance and objectivity in journalism.

Direct Use

[More Information Needed]

Out-of-Scope Use

Not Suitable For:

  • High-stakes decision-making environments where fairness and accountability are crucial.
  • Automated moderation or policy enforcement without human oversight.
  • Use in contexts where misclassification could cause harm or reinforce negative stereotypes.

Users should avoid using this model for political campaigning, propaganda, or any application that might promote bias or misinformation.

Bias, Risks, and Limitations

Data Bias:

  • The model is trained on datasets that might incorporate historical or cultural biases related to political parties and opinions. This may impact fairness across different groups.

Interpretability:

  • The softmax outputs indicate probabilistic estimates rather than absolute truths. Interpret results with caution.

Overgeneralization:

  • Relying solely on this model for assessing political bias can lead to oversimplification of complex sociopolitical views.

Risk Mitigation:

  • Complement model outputs with human judgment.
  • Perform additional validation against a diverse test set to uncover potential bias.
  • Regularly update and audit the model to account for shifts in political discourse.

Limitations:

  • The model may not generalize well to texts that have a context or structure significantly different from the training data.
  • Subtle nuances in language might not be captured accurately, leading to potential misclassification.
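The advice to treat softmax scores as estimates and to pair them with human judgment can be sketched as a simple review gate. This is a minimal illustration, not part of the released model: the `needs_human_review` helper, its `min_top` and `min_margin` thresholds, the example logits, and the class order are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def needs_human_review(probs, min_top=0.7, min_margin=0.2):
    """Flag a prediction when the top class is weak or nearly tied.

    min_top and min_margin are hypothetical thresholds, not tuned values.
    """
    ranked = sorted(probs, reverse=True)
    return ranked[0] < min_top or (ranked[0] - ranked[1]) < min_margin

# Hypothetical logits for (liberal, center, conservative); the class order is assumed.
confident = softmax([4.0, 0.5, -1.0])
ambiguous = softmax([1.1, 1.0, 0.2])

print(needs_human_review(confident))  # False: one class clearly dominates
print(needs_human_review(ambiguous))  # True: the top two classes are nearly tied
```

Any real thresholds would need to be chosen against a labeled validation set rather than picked by hand.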

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

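import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "kangelamw/RoBERTa-political-bias-classifier-softmax"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Example input text; truncate to the model's maximum input length
text = "Your sample text goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Get predictions: run without gradients and convert logits to probabilities
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)[0]

# Label names come from the model config
for idx, p in enumerate(probs.tolist()):
    print(model.config.id2label[idx], round(p, 4))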

Training Details

Training Data

[More Information Needed]

Training Procedure

  • Model Initialization:
    Training started from a pre-trained RoBERTa model with general language understanding capabilities.

  • Fine-Tuning Approach:
    The model was adapted to political bias classification by appending a softmax classification layer and training on the specialized dataset. Key training parameters such as learning rate, batch size, and the number of epochs were optimized during the fine-tuning process.

  • Validation and Optimization:
    A portion of the dataset was set aside for validation to monitor performance and avoid overfitting.
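The card does not say how the validation portion was chosen. As one common approach, the sketch below holds out a stratified share of each class so that the label balance is preserved. The `stratified_split` helper, the `val_fraction` and `seed` values, and the toy data are assumptions for illustration only.

```python
import random
from collections import defaultdict

def stratified_split(examples, val_fraction=0.1, seed=42):
    """Split (text, label) pairs so each label keeps a similar share in both sets."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Toy data: 10 placeholder articles per class
data = [
    (f"article {i}", label)
    for label in ("liberal", "center", "conservative")
    for i in range(10)
]
train_set, val_set = stratified_split(data, val_fraction=0.2)
print(len(train_set), len(val_set))  # 24 6
```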

Training Hyperparameters

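  • Training regime: fp32

from transformers import TrainingArguments

# model_path is the output directory for checkpoints (defined elsewhere)
training_args = TrainingArguments(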
  output_dir=model_path,
  do_train=True,
  do_eval=True,
  do_predict=True,
  
  eval_strategy="steps",
  eval_steps=150,
  eval_accumulation_steps=4,
  
  logging_strategy="steps",
  logging_steps=300,
  
  save_strategy="steps",
  save_steps=300,
  num_train_epochs=5,
  
  learning_rate=2e-5,
  lr_scheduler_type="linear",
  warmup_ratio=0.1,
  weight_decay=0.01,
  
  load_best_model_at_end=True,
  metric_for_best_model="f1",
  greater_is_better=True,
  
  report_to="tensorboard",
  resume_from_checkpoint=True,
  
  per_device_eval_batch_size=8,
  per_device_train_batch_size=8,
  gradient_accumulation_steps=4,
  gradient_checkpointing=True
)
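To make the arithmetic behind these settings concrete, the sketch below derives the effective batch size and an approximate learning-rate schedule from the arguments above. The training set size is not stated in this card, so `num_train_examples` is a hypothetical placeholder, and a single GPU is assumed. The `linear_lr` function is a hand-written approximation of a linear warmup and decay schedule, not the library's own scheduler.

```python
import math

# Values taken from the TrainingArguments above
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
learning_rate = 2e-5
warmup_ratio = 0.1
num_train_epochs = 5

# Hypothetical: the training set size is not stated in this card
num_train_examples = 16_000

# Samples seen per optimizer update (single GPU assumed)
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_train_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_lr(step):
    """Linear warmup to the peak rate, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0.0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)

print(effective_batch, total_steps, warmup_steps)
print(linear_lr(0), linear_lr(warmup_steps), linear_lr(total_steps))
```

With these settings each optimizer update sees 8 × 4 = 32 examples, and the learning rate peaks at 2e-5 once the first 10% of steps have elapsed.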

Evaluation

Results

Metric              Value
eval_accuracy       0.9204
eval_f1             0.9206
eval_cross_entropy  0.2789
eval_kl_divergence  0.2789
epoch               4.9875
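The cross-entropy and KL divergence values are identical. One possible explanation, which the card does not confirm, is that the evaluation targets were one-hot labels: cross-entropy equals KL divergence plus the entropy of the target, and a one-hot target has zero entropy. The sketch below checks that identity on hypothetical distributions.

```python
import math

def cross_entropy(target, predicted):
    """H(p, q) = -sum_i p_i * log(q_i)"""
    return -sum(p * math.log(q) for p, q in zip(target, predicted) if p > 0)

def entropy(target):
    """H(p) = -sum_i p_i * log(p_i)"""
    return -sum(p * math.log(p) for p in target if p > 0)

def kl_divergence(target, predicted):
    """KL(p || q) = sum_i p_i * log(p_i / q_i)"""
    return sum(p * math.log(p / q) for p, q in zip(target, predicted) if p > 0)

predicted = [0.80, 0.15, 0.05]  # hypothetical softmax output
one_hot = [1.0, 0.0, 0.0]       # hard label: entropy is zero
soft = [0.7, 0.2, 0.1]          # soft label: entropy is positive

# With a one-hot target the two metrics coincide
print(math.isclose(cross_entropy(one_hot, predicted), kl_divergence(one_hot, predicted)))

# With a soft target they differ by exactly the target entropy
gap = cross_entropy(soft, predicted) - kl_divergence(soft, predicted)
print(math.isclose(gap, entropy(soft)))
```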

Summary

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: 12GB Nvidia RTX 3060TI
  • Hours used: Approximately 8-16 hours every day for 2-3 weeks, covering fine-tuning and inference
  • Cloud Provider: None - personal workstation/local machine
  • Compute Region: North America
  • Carbon Emitted: Not calculated (this GPU is not in the calculator's hardware list)
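When the hardware is missing from the calculator, a rough estimate can still be made by multiplying power draw, runtime, and grid carbon intensity. The sketch below derives the hour range from the figures above. The GPU power draw and carbon intensity are placeholders, not measurements, and should be replaced with values for the actual card and local grid.

```python
def training_hours(hours_per_day, weeks):
    """Total hours for a daily usage level sustained over some weeks."""
    return hours_per_day * 7 * weeks

def estimate_emissions_kg(hours, gpu_power_watts, carbon_intensity_kg_per_kwh):
    """Energy (kWh) times grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = gpu_power_watts / 1000 * hours
    return energy_kwh * carbon_intensity_kg_per_kwh

# Range implied by "8-16 hours every day for 2-3 weeks"
low_hours = training_hours(8, 2)    # 112 hours
high_hours = training_hours(16, 3)  # 336 hours

# Placeholders, NOT measurements: substitute real values before quoting a result
GPU_POWER_WATTS = 200    # assumed average draw under load
CARBON_INTENSITY = 0.4   # assumed kg CO2eq per kWh for the local grid

low_kg = estimate_emissions_kg(low_hours, GPU_POWER_WATTS, CARBON_INTENSITY)
high_kg = estimate_emissions_kg(high_hours, GPU_POWER_WATTS, CARBON_INTENSITY)
print(f"{low_hours}-{high_hours} hours, roughly {low_kg:.1f}-{high_kg:.1f} kg CO2eq")
```

The estimate ignores CPU, memory, and cooling overhead, so it is better read as a lower bound.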

Citation [optional]

@article{DBLP:journals/corr/abs-1907-11692,
  author    = {Yinhan Liu and
               Myle Ott and
               Naman Goyal and
               Jingfei Du and
               Mandar Joshi and
               Danqi Chen and
               Omer Levy and
               Mike Lewis and
               Luke Zettlemoyer and
               Veselin Stoyanov},
  title     = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
  journal   = {CoRR},
  volume    = {abs/1907.11692},
  year      = {2019},
  url       = {http://arxiv.org/abs/1907.11692},
  archivePrefix = {arXiv},
  eprint    = {1907.11692},
  timestamp = {Thu, 01 Aug 2019 08:59:33 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1907-11692.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Model Card Contact

You can find me on GitHub or LinkedIn.
