Sentiment Analysis Model (Vibescribe)

Vibescribe is a sentiment analysis model built with Hugging Face Transformers and fine-tuned on IMDB reviews.

Setup

Clone the repository:

git clone https://github.com/your-username/sentiment-analysis
cd sentiment-analysis

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Log in to the Hugging Face Hub (needed if you push the trained model to the Hub):

huggingface-cli login

Project Structure

sentiment-analysis/
├── requirements.txt
├── train.py
├── inference.py
├── utils.py
└── README.md

Files to Create

requirements.txt

transformers==4.37.2
datasets==2.16.1
torch==2.1.2
scikit-learn==1.4.0
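Because the versions are pinned exactly, it can help to confirm that the active environment matches them before training. The following is a minimal sketch using only the standard library; the helper names parse_pins and check_pins are hypothetical and not part of this repository, and the pins are copied inline here rather than read from requirements.txt.

```python
from importlib import metadata

# Copy of the pins listed above; in practice they would be read from requirements.txt.
REQUIREMENTS = """\
transformers==4.37.2
datasets==2.16.1
torch==2.1.2
scikit-learn==1.4.0
"""


def parse_pins(text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {package: (expected, installed)} for missing or mismatched packages."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Ignore local build suffixes such as "+cu121" on torch wheels.
            installed = metadata.version(name).split("+", 1)[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(parse_pins(REQUIREMENTS)).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result from check_pins means every pinned package is installed at the expected version.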

utils.py

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

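def compute_metrics(pred):
    """Compute accuracy, F1, precision, and recall for the Trainer's evaluation loop."""
    labels = pred.label_ids
    # Logits have shape (n_examples, n_classes); argmax picks the predicted class.
    preds = pred.predictions.argmax(-1)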
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
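To make the four numbers returned by compute_metrics concrete, here is a small pure-Python sketch of the same quantities for 0/1 labels, treating 1 (positive) as the class of interest as average='binary' does by default. The function name binary_metrics and the toy labels are illustrative only; utils.py itself relies on scikit-learn.

```python
def binary_metrics(labels, preds, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels, computed by hand."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)  # true positives
    fp = sum(1 for y, p in pairs if y != positive and p == positive)  # false positives
    fn = sum(1 for y, p in pairs if y == positive and p != positive)  # false negatives
    correct = sum(1 for y, p in pairs if y == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Toy example: 6 reviews, 1 = positive, 0 = negative.
labels = [1, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 1, 1, 0]
print(binary_metrics(labels, preds))
```

In this toy case there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all come out to 2/3, and accuracy is 4/6.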

inference.py

from transformers import pipeline

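def load_model(model_path):
    """Build a sentiment-analysis pipeline from a local path or a Hub model ID."""
    return pipeline("sentiment-analysis", model=model_path)

def predict(classifier, text):
    """Classify a single string or a list of strings."""
    return classifier(text)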

if __name__ == "__main__":
    model_path = "your-username/sentiment-analysis-model"
    classifier = load_model(model_path)
    
    # Example prediction
    text = "This movie was really great!"
    result = predict(classifier, text)
    print(f"Text: {text}\nSentiment: {result}")
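The value printed by inference.py is the raw pipeline result: a list with one dictionary per input text, each holding a 'label' and a 'score'. The label strings depend on the model's configuration; a fine-tuned model that does not define readable names typically reports LABEL_0 and LABEL_1. The sketch below shows one way to turn that result into a readable pair. The summarize helper and the LABEL_NAMES mapping are assumptions for illustration (IMDB uses 0 for negative and 1 for positive), and the example input is hand-written rather than real model output.

```python
# Assumed mapping for a model whose config does not define readable label names.
# IMDB uses 0 for negative and 1 for positive reviews.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def summarize(result):
    """Turn a pipeline result for one input text into (sentiment, score)."""
    top = result[0]  # the pipeline returns one dict per input text
    sentiment = LABEL_NAMES.get(top["label"], top["label"].lower())
    return sentiment, round(top["score"], 3)


# Hand-written stand-in for classifier("This movie was really great!")
example = [{"label": "LABEL_1", "score": 0.9876}]
print(summarize(example))
```

If the model already reports names such as POSITIVE and NEGATIVE, the fallback simply lowercases them.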

Training

Update the model configuration in train.py, replacing your-username with your Hugging Face username:

training_args = TrainingArguments(
    output_dir="sentiment-analysis-model",
    hub_model_id="your-username/sentiment-analysis-model",  # Change this
    ...
)

Start training:

python train.py
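The contents of train.py are not listed in this README. As a rough guide, a script of this kind might look like the sketch below. Everything in it beyond output_dir and hub_model_id is an assumption: the distilbert-base-uncased checkpoint, the label names, the hyperparameters, and the training_kwargs and main helpers are illustrative choices, not the project's actual settings. The argument names follow the pinned transformers 4.37.2 release.

```python
# train.py (hypothetical sketch; checkpoint and hyperparameters are assumptions)


def training_kwargs(hub_model_id):
    """Keyword arguments for TrainingArguments; the values are illustrative."""
    return dict(
        output_dir="sentiment-analysis-model",
        hub_model_id=hub_model_id,
        push_to_hub=True,
        num_train_epochs=2,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=32,
        learning_rate=2e-5,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
    )


def main(hub_model_id="your-username/sentiment-analysis-model"):
    # Heavy imports live here so the module can be imported without loading them.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )
    from utils import compute_metrics

    checkpoint = "distilbert-base-uncased"  # assumed DistilBERT checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=2,
        id2label={0: "NEGATIVE", 1: "POSITIVE"},
        label2id={"NEGATIVE": 0, "POSITIVE": 1},
    )

    # IMDB provides "train" and "test" splits with "text" and "label" columns.
    dataset = load_dataset("imdb")
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True), batched=True
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(**training_kwargs(hub_model_id)),
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        tokenizer=tokenizer,  # enables dynamic padding of each batch
        compute_metrics=compute_metrics,
    )
    trainer.train()
    trainer.push_to_hub()


# To run this as a script, end the file with a call to main().
```

Keeping the arguments in a plain dictionary makes it easy to change hub_model_id in one place, which is the only edit the step above requires.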

Making Predictions

from inference import load_model, predict

# Replace your-username with the account the model was pushed to.
classifier = load_model("your-username/sentiment-analysis-model")
result = predict(classifier, "Your text here")
print(result)

Model Details

Base model: DistilBERT
Dataset: IMDB Reviews
Task: Binary sentiment classification (positive/negative)
Training time: ~2-3 hours on GPU
Model size: ~260MB
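The model size is consistent with a back-of-the-envelope estimate, sketched below under two assumptions: the weights are stored as 32-bit floats, and the model has roughly 66 million parameters (the figure commonly cited for DistilBERT-base; the classification head adds comparatively little).

```python
PARAMS = 66_000_000   # approximate DistilBERT-base parameter count (assumption)
BYTES_PER_PARAM = 4   # float32 weights (assumption)

size_mb = PARAMS * BYTES_PER_PARAM / 1_000_000
print(f"~{size_mb:.0f} MB")
```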

Performance Metrics

Accuracy: ~91-93%
F1 Score: ~91-92%
Precision: ~90-91%
Recall: ~91-92%

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Open a pull request

License

MIT License