---
datasets:
  - chargoddard/Open-Platypus-Chat
language:
  - en
tags:
  - llama
---

Experimental ReLoRA-trained model using the OpenPlatypus dataset. Trained for one epoch, with three LoRA restarts.

Not recommended for use yet. Mostly tossing this up for testing.

Base model was llama2-22b-blocktriangular.

Relevant training parameters:

```yaml
adapter: qlora
load_in_4bit: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.001
lora_target_linear: true
relora_steps: 150
relora_warmup_steps: 10
gradient_accumulation_steps: 2
micro_batch_size: 3
```
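For reference, here is a minimal sketch of loading a model in 4-bit with `transformers` and `bitsandbytes`, mirroring the `load_in_4bit: true` setting above. The repository id is a placeholder (this card does not state the final repo id), and the compute dtype is an assumption.

```python
# Minimal 4-bit loading sketch, matching load_in_4bit: true in the config above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "path/to/this-model"  # placeholder; the card does not give the repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```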

Uses the same prompt format as Ypotryll-22b. Prefix messages with " ***System:", " ***Query:", or " ***Response:", paying attention to whitespace.
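As an illustration, a minimal sketch of assembling a prompt in this format. Only the prefixes (including their leading space) come from this card; how turns are joined and what the system message says are assumptions.

```python
def build_prompt(system: str, query: str) -> str:
    # Prefixes " ***System:", " ***Query:", " ***Response:" are from the card.
    # Concatenation without extra separators is an assumption.
    return f" ***System:{system} ***Query:{query} ***Response:"

# Hypothetical usage:
prompt = build_prompt(
    "You are a helpful assistant.",
    "Summarize what a LoRA restart does.",
)
```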

Built with Axolotl

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric               | Value |
|----------------------|-------|
| Avg.                 | 52.11 |
| ARC (25-shot)        | 57.51 |
| HellaSwag (10-shot)  | 82.36 |
| MMLU (5-shot)        | 54.94 |
| TruthfulQA (0-shot)  | 43.62 |
| Winogrande (5-shot)  | 77.11 |
| GSM8K (5-shot)       | 6.29  |
| DROP (3-shot)        | 42.9  |