Uploaded finetuned model

  • Developed by: themex1380
  • License: apache-2.0
  • Finetuned from model: CohereLabs/aya-expanse-8b

This Cohere model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Training itself took around 2 hours on an A40 48GB. Total billed GPU time was about 3 h at $0.40/h (≈ $1.20), plus around $0.50 spent experimenting and getting a feel for the parameters, so the total came to ~$2.00.

Parameters:

  • LoRA Rank: 24
  • LoRA Alpha: 48
  • Per Device Train Batch Size: 2
  • Grad Accum Steps: 2
  • Warmup Ratio: 0.05
  • Num Train Epochs: 1
  • Learning Rate: 8e-5
  • Optimizer: adamw_8bit
  • Weight Decay: 0.01
  • Max Grad Norm: 3.0
  • LR Scheduler Type: linear
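Assembled as code, the hyperparameters above map onto the standard TRL/peft argument names. The actual training script is not published, so the argument names below are an assumption based on TRL's `SFTConfig` and peft's `LoraConfig`; this is a reconstruction, not the author's script:

```python
# Hyperparameters from the card, keyed by the standard transformers/TRL
# argument names (an assumption -- the original script is not public).
training_kwargs = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
    "learning_rate": 8e-5,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "max_grad_norm": 3.0,
    "lr_scheduler_type": "linear",
}

# LoRA settings, keyed by peft LoraConfig names ("Rank" -> r,
# "Rank Alpha" -> lora_alpha).
lora_kwargs = {"r": 24, "lora_alpha": 48}

# Effective batch size per optimizer step on the single A40:
# 2 samples per device x 2 gradient-accumulation steps.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch)  # 4
```

In a real run these dicts would be unpacked into `trl.SFTConfig(**training_kwargs, ...)` and `peft.LoraConfig(**lora_kwargs, ...)` (or Unsloth's `FastLanguageModel.get_peft_model`), with the remaining arguments left at their defaults.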

RunPod Machine:

  • GPU: 1 x A40 48GB PCIE
  • Container Disk: 100GB
  • Volume Disk: 20GB
  • Docker Image: runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
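The environment above can be approximated locally by launching the same Docker image. This is a sketch under stated assumptions: it assumes an NVIDIA GPU with the NVIDIA Container Toolkit installed, and the volume mount path is illustrative; the exact RunPod launch settings are not part of the card.

```shell
# Launch the training image listed above (sketch; requires the
# NVIDIA Container Toolkit for --gpus).
docker run --gpus all -it \
  -v "$PWD/workspace:/workspace" \
  runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
```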
Model: themex1380/Aya-Expanse-8B-Storywriter-Exp-5