lemonilia committed 3ad4eb6 (verified · parent 0db43f4): Update README.md

Files changed (1): README.md (+5 −4)
README.md CHANGED
@@ -9,15 +9,16 @@ datasets:
 - lemonilia/Herrsimian
 ---
 
-# Llama-3.1-Herrsimian-8B (v0.4)
+# Llama-3.1-Herrsimian-8B (v0.5)
 ![Son Biten - crop - edited](https://files.catbox.moe/a0o04c.png)
 
 **Herrsimian** is a limited-scope NSFW roleplaying model intended to replicate the writing style of a certain prolific roleplayer who actively participated on a few different roleplay forums until the end of 2022. It's also a test bench for studying how much mileage can be obtained from a very limited amount of high-quality training samples of consistent style—mainly by training well into the overfitting regime.
 
 The model, finetuned on _Llama-3.1-8B-base_, was trained with a multi-user, multi-character paradigm, where user or model turns don't necessarily alternate as in most other instruction prompting formats. At present, however, the most commonly used LLM frontends cannot readily take advantage of this feature.
 
-**Note:** this version is not final. There is still some work to be done on the dataset to increase general usability, but prose quality should be representitative of the final model.
+**Note:** There is still some work to be done on the dataset to increase general usability, but prose quality should be representative of the final model.
 
+- **2024-09-05 (v0.5)**: increased RP data to 131 samples. This is probably the final content update, as I don't have any more data from the same source worth including.
 - **2024-09-04 (v0.4)**: increased RP data to 100 samples, some bugfixes. Slightly decreased the learning rate.
 - **2024-09-02 (v0.3)**: ① Added character descriptions and scenarios to the RP data using a variety of formats, which should make the model more flexible. ② Changed training hyperparameters. ③ Updated the SillyTavern context preset. ④ Removed the 8-bit GGUF quantization, as upon testing it appeared to have noticeably lower quality than the BF16 weights, for some inexplicable reason.
 - **2024-09-01 (v0.2)**: increased RP data to 94 samples / 710k tokens. Maybe now more hallucination-prone?
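To make the multi-user, multi-character paradigm mentioned above concrete, here is a purely hypothetical transcript shape (invented names; not the model's actual prompt template, which the README documents elsewhere), showing that turns need not alternate:

```
Melissa (user):  "So, where do we even start?"
Lucas (user):    Lucas shrugged. "Your call, really."
Lucas (user):    He glanced toward the door.        <- consecutive user turns
Sonja (model):   Sonja slipped in, smiling.
Sonja (model):   "Did I miss anything important?"   <- consecutive model turns
Melissa (user):  "Only Lucas being indecisive again."
```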
@@ -113,7 +114,7 @@ Avoid using Repetition/Frequency/Presence Penalty, as they will penalize punctua
 ## Training details
 [Unsloth](https://github.com/unslothai/unsloth) was used on a single RTX3090 24GB GPU with QLoRA finetuning.
 
-As of writing, 94 manually curated, mostly human-sourced samples have been used, for a total of about 720k unique tokens at 8k tokens context length (1300k tokens would have been possible with 32k+ tokens context). The preformatted data was finetuned as raw text (no masking of user turns or labels) for 5 epochs in total using a WSD scheduler without warmup, with constant learning rate for 1 epochs, followed by immediate decay to 10% of the initial learning rate.
+As of writing, 131 manually curated, mostly human-sourced samples have been used, for a total of about 1.02M unique tokens at 8k tokens context length (2.45M tokens would have been possible with 52k+ tokens context). The preformatted data was finetuned as raw text (no masking of user turns or labels) for 5 epochs in total using a WSD scheduler without warmup, with a constant learning rate for 1 epoch, followed by an immediate decay to 10% of the initial learning rate.
 
 The learning rate was initially chosen so that the lowest eval loss on a representative long sample would occur within one epoch. Overfitting (eval loss increasing beyond the minimum point) happened after that.
 
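As a point of reference, here is a minimal sketch of what an Unsloth QLoRA setup along these lines could look like. The 8k sequence length follows the training details above, while the model id, LoRA rank, alpha, and target modules are placeholder assumptions rather than the actual configuration:

```python
# Hedged sketch only: hyperparameters below are illustrative assumptions,
# not the actual Herrsimian training configuration.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B",  # base model named in the README
    max_seq_length=8192,                   # 8k tokens context, per the README
    load_in_4bit=True,                     # QLoRA: 4-bit quantized base weights
)

# Attach LoRA adapters; rank, alpha, and target modules are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,       # helps fit 8k context in 24 GB
)
```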
@@ -139,7 +140,7 @@ lr_scheduler_kwargs = {
 ```
 
 ### Eval / Train loss graph
-![Train/loss graph](https://files.catbox.moe/y82ote.png)
+![Train/loss graph](https://files.catbox.moe/28ysxb.png)
 
 ## Questions and Answers
 **Q. Isn't the model overfitted?**
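The README's `lr_scheduler_kwargs` block is truncated in this diff, so as a self-contained illustration, here is one way to realize the schedule described in the training details (no warmup, constant LR for 1 of 5 epochs, then decay to 10% of the initial LR) in plain PyTorch. The step counts, placeholder learning rate, and linear decay shape are assumptions:

```python
# Hedged sketch of the WSD-style schedule described in the training details.
import torch
from torch.optim.lr_scheduler import LambdaLR

# All step counts below are illustrative assumptions, not the actual run's values.
steps_per_epoch = 100                 # depends on dataset size and batch size
total_steps = 5 * steps_per_epoch     # README: 5 epochs in total
stable_steps = 1 * steps_per_epoch    # README: constant LR for 1 epoch, no warmup
min_lr_ratio = 0.10                   # README: decay to 10% of the initial LR

def wsd_lambda(step: int) -> float:
    """Multiplier on the base LR: stable phase, then decay to min_lr_ratio."""
    if step < stable_steps:
        return 1.0
    # Linear decay is an assumption; the README does not state the decay shape.
    progress = (step - stable_steps) / max(1, total_steps - stable_steps)
    return max(min_lr_ratio, 1.0 - (1.0 - min_lr_ratio) * progress)

params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for model parameters
optimizer = torch.optim.AdamW(params, lr=5e-5)  # placeholder learning rate
scheduler = LambdaLR(optimizer, lr_lambda=wsd_lambda)

for step in range(total_steps):
    optimizer.step()
    scheduler.step()
```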