Update README.md
datasets:
- lemonilia/Herrsimian
---

# Llama-3.1-Herrsimian-8B (v0.5)

![Herrsimian](Herrsimian.png)

**Herrsimian** is a limited-scope NSFW roleplaying model intended to replicate the writing style of a certain prolific roleplayer who used to actively participate on a few different roleplay forums until the end of 2022. It's also a test bench for studying how much mileage can be obtained with a very limited amount of high-quality training samples of consistent style, mainly by training well into the overfitting regime.

The model, finetuned on _Llama-3.1-8B-base_, was trained with a multi-user, multi-character paradigm where user and model turns don't necessarily alternate as they do in most other instruction-prompting formats. At present, however, the most commonly used LLM frontends cannot readily take advantage of this feature.

**Note:** There is still some work to be done on the dataset to increase general usability, but prose quality should be representative of the final model.

- **2024-09-05 (v0.5)**: increased RP data to 131 samples. This is probably the final content update, as I don't have any more data from the same source worth including.
- **2024-09-04 (v0.4)**: increased RP data to 100 samples, some bugfixes. Slightly decreased learning rate.
- **2024-09-02 (v0.3)**: ① Added character descriptions and scenarios to the RP data using a variety of formats, which should make the model more flexible. ② Changed training hyperparameters. ③ Updated SillyTavern context preset. ④ Removed the 8-bit GGUF quantization, as upon testing it appeared to have noticeably lower quality than the BF16 weights, for some inexplicable reason.
- **2024-09-01 (v0.2)**: increased RP data to 94 samples / 710k tokens. Maybe now more hallucination-prone?

## Training details
[Unsloth](https://github.com/unslothai/unsloth) was used on a single RTX3090 24GB GPU with QLoRA finetuning.
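
For illustration, here is a minimal sketch of the kind of Unsloth QLoRA setup described above: the base model is loaded in 4-bit and LoRA adapters are attached before training. The repo id, LoRA rank/alpha, and target modules below are placeholder assumptions, not the values actually used to train Herrsimian.

```python
# Minimal Unsloth QLoRA setup sketch; illustrative only. The repo id, rank,
# alpha and target modules are placeholders, not Herrsimian's real config.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B",  # base model (exact repo id assumed)
    max_seq_length=8192,                   # matches the 8k-token training context
    load_in_4bit=True,                     # QLoRA: 4-bit quantized base weights
)

# Attach LoRA adapters to the quantized base model.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                  # placeholder LoRA rank
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
```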

As of writing, 131 manually curated, mostly human-sourced samples have been used, for a total of about 1.02M unique tokens at 8k tokens context length (2.45M tokens would have been possible with 52k+ tokens context). The preformatted data was finetuned as raw text (no masking of user turns or labels) for 5 epochs in total, using a WSD scheduler without warmup: a constant learning rate for 1 epoch, followed by immediate decay to 10% of the initial learning rate.
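
The shape of that schedule can be sketched with a plain PyTorch `LambdaLR`, purely for illustration; the step counts and learning rate below are placeholders, and "immediate decay" is read here as the decay phase starting right after the stable epoch rather than an instant drop.

```python
# Sketch of the WSD-style schedule described above: no warmup, constant LR
# for the first (stable) epoch, then decay to 10% of the initial LR.
# All concrete numbers here are illustrative placeholders.
import torch

steps_per_epoch = 100                      # placeholder; depends on data/batch size
stable_steps = 1 * steps_per_epoch         # constant-LR phase: 1 epoch
decay_steps = 4 * steps_per_epoch          # assumption: decay spans the remaining epochs
min_lr_ratio = 0.10                        # decay target: 10% of the initial LR

def wsd_lambda(step: int) -> float:
    """Return the LR multiplier for a given optimizer step."""
    if step < stable_steps:
        return 1.0                         # stable phase
    frac = min((step - stable_steps) / decay_steps, 1.0)
    return 1.0 - frac * (1.0 - min_lr_ratio)   # linear decay down to 10%

dummy_param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([dummy_param], lr=2e-5)   # placeholder LR
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=wsd_lambda)
```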
The learning rate was initially chosen so that the lowest eval loss on a representative long sample would occur within one epoch. Overfitting (eval loss increasing beyond the minimum point) happened after that.
### Eval / Train loss graph
![Train / eval loss](train-eval-loss-v0.5.png)
## Questions and Answers
**Q. Isn't the model overfitted?**