Model Card for 11128093-11066053-nli

A binary Natural Language Inference (NLI) classifier built on the Mamba state space model and fine-tuned on the COMP34812 dataset.

Model Details

Model Description

This model extends the state-spaces/mamba-130m architecture for binary NLI tasks (entailment vs. non-entailment). It adds a custom classification head on top of the Mamba backbone and was fine-tuned on the COMP34812 NLI dataset; a sketch of this setup follows the list below.

  - Developed by: Patrick Mermelstein Lyons and Dev Soneji
  - Language(s): English
  - Model type: Supervised
  - Model architecture: Non-Transformers (Selective State Spaces)
  - Finetuned from model: state-spaces/mamba-130m
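
A minimal sketch of this setup, assuming mean pooling over the final hidden states and a single linear head; the card does not specify the head design, so MambaClassifier, the pooling choice, and num_labels are illustrative assumptions:

```python
import torch.nn as nn
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

class MambaClassifier(nn.Module):
    """Hypothetical wrapper: pretrained Mamba backbone + binary classification head."""

    def __init__(self, pretrained="state-spaces/mamba-130m", num_labels=2):
        super().__init__()
        # Reuse the pretrained backbone, discarding the language-modeling head.
        self.backbone = MambaLMHeadModel.from_pretrained(pretrained).backbone
        hidden = self.backbone.embedding.embedding_dim  # 768 for mamba-130m
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids):
        # mamba-ssm's fused selective-scan kernels require a CUDA GPU.
        hidden_states = self.backbone(input_ids)   # (batch, seq_len, hidden)
        pooled = hidden_states.mean(dim=1)         # mean-pool over tokens (assumption)
        return self.head(pooled)                   # (batch, num_labels) logits
```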

Model Resources

  - Repository: https://huggingface.co/patrickmlml/mamba_nli_ensemble

Training Details

Training Data

The COMP34812 NLI training set (a closed-source, task-specific dataset): 24.4K premise-hypothesis pairs, each with a binary entailment label.
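
For illustration, a record in this format might look as follows; the field names are assumptions, since the exact column layout of the closed-source files is not documented here:

```python
from datasets import Dataset

# Hypothetical example record; column names are illustrative only.
train = Dataset.from_dict({
    "premise":    ["A man is playing a guitar on stage."],
    "hypothesis": ["Someone is performing music."],
    "label":      [1],  # 1 = entailment, 0 = non-entailment
})
print(train[0])
```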

Training Procedure

Training Hyperparameters

  - learning_rate: 5e-5
  - train_batch_size: 4
  - eval_batch_size: 16
  - num_train_epochs: 5
  - lr_scheduler_type: cosine
  - warmup_ratio: 0.1
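
If training used the Hugging Face Trainer (the card lists transformers and accelerate but does not confirm the training loop), these values would map onto TrainingArguments roughly as follows; this is a sketch, not the authors' script:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mamba-nli",          # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```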

Speeds, Sizes, Times

  - total training time: 1 hour 17 minutes
  - number of epochs: 5
  - model size: ~500MB

Evaluation

Testing Data & Metrics

Testing Data

The COMP34812 NLI dev set (a closed-source, task-specific dataset): 6.7K premise-hypothesis pairs, each with a binary entailment label.

Metrics

  - Accuracy
  - Matthews Correlation Coefficient (MCC)
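
Both metrics can be computed with the evaluate library listed under Software; a minimal sketch with placeholder predictions:

```python
import evaluate

accuracy = evaluate.load("accuracy")
mcc = evaluate.load("matthews_correlation")

preds = [1, 0, 1, 1]    # placeholder model predictions
labels = [1, 0, 0, 1]   # placeholder gold labels

print(accuracy.compute(predictions=preds, references=labels))  # {'accuracy': 0.75}
print(mcc.compute(predictions=preds, references=labels))       # {'matthews_correlation': 0.577...}
```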

Results

On the dev set, the model achieved an accuracy of 82.4% and an MCC of 0.649.

Technical Specifications

Hardware

  - GPU: NVIDIA T4 (Google Colab)
  - VRAM: 15.0 GB
  - RAM: 12.7 GB
  - Disk: 2 GB for model and data

Software

  - Python 3.10+
  - PyTorch
  - HuggingFace Transformers
  - mamba-ssm
  - datasets, evaluate, accelerate

Bias, Risks, and Limitations

The model is limited to binary entailment detection and was trained exclusively on the COMP34812 dataset; generalization beyond this dataset is untested. Sentence pairs longer than 128 tokens are truncated.

Additional Information

Model checkpoints and the tokenizer are available at https://huggingface.co/patrickmlml/mamba_nli_ensemble. Hyperparameters were chosen by closely following the referenced literature.
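
A hedged inference sketch using the checkpoint above. It assumes the hosted tokenizer loads via AutoTokenizer and that premise and hypothesis are packed into a single sequence, neither of which this card confirms; note the 128-token truncation mentioned under Limitations:

```python
from transformers import AutoTokenizer

# Assumption: the tokenizer at this repo loads with AutoTokenizer.
tokenizer = AutoTokenizer.from_pretrained("patrickmlml/mamba_nli_ensemble")

premise = "A man is playing a guitar on stage."
hypothesis = "Someone is performing music."

enc = tokenizer(
    premise,
    hypothesis,       # pair-packing format is illustrative, not confirmed
    truncation=True,
    max_length=128,   # longer pairs are truncated (see Limitations)
    return_tensors="pt",
)
# enc["input_ids"] can then be passed to the classifier's forward pass.
```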
