Uploaded finetuned model

  • Developed by: themex1380
  • License: apache-2.0
  • Finetuned from model: CohereLabs/aya-expanse-8b

This Cohere model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Training itself took around 2 hours on an A40 48GB. Total billed GPU time was about 3 h at $0.40/h (≈ $1.20), plus around $0.50 spent experimenting and getting a feel for the parameters, so the total came to ~$2.00.

Parameters:

  • LoRA Rank: 24
  • LoRA Alpha: 48
  • Per Device Train Batch Size: 2
  • Grad Accum Steps: 2
  • Warmup Ratio: 0.05
  • Num Train Epochs: 1
  • Learning Rate: 8e-5
  • Optimizer: adamw_8bit
  • Weight Decay: 0.01
  • Max Grad Norm: 3.0
  • LR Scheduler Type: linear
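Assembled as code, the hyperparameters above map onto the standard TRL/peft argument names. The actual training script is not published, so the argument names below are an assumption based on TRL's `SFTConfig` and peft's `LoraConfig`; this is a reconstruction, not the author's script:

```python
# Hyperparameters from the card, keyed by the standard transformers/TRL
# argument names (an assumption -- the original script is not public).
training_kwargs = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
    "learning_rate": 8e-5,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "max_grad_norm": 3.0,
    "lr_scheduler_type": "linear",
}

# LoRA settings, keyed by peft LoraConfig names ("Rank" -> r,
# "Rank Alpha" -> lora_alpha).
lora_kwargs = {"r": 24, "lora_alpha": 48}

# Effective batch size per optimizer step on the single A40:
# 2 samples per device x 2 gradient-accumulation steps.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch)  # 4
```

In a real run these dicts would be unpacked into `trl.SFTConfig(**training_kwargs, ...)` and `peft.LoraConfig(**lora_kwargs, ...)` (or Unsloth's `FastLanguageModel.get_peft_model`), with the remaining arguments left at their defaults.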

RunPod Machine:

  • GPU: 1 x A40 48GB PCIE
  • Container Disk: 100GB
  • Volume Disk: 20GB
  • Docker Image: runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
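The environment above can be approximated locally by launching the same Docker image. This is a sketch under stated assumptions: it assumes an NVIDIA GPU with the NVIDIA Container Toolkit installed, and the volume mount path is illustrative; the exact RunPod launch settings are not part of the card.

```shell
# Launch the training image listed above (sketch; requires the
# NVIDIA Container Toolkit for --gpus).
docker run --gpus all -it \
  -v "$PWD/workspace:/workspace" \
  runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
```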
Model: themex1380/Aya-Expanse-8B-Storywriter-Exp-5