Update README.md
README.md CHANGED
@@ -34,8 +34,10 @@ As mentioned above, this was done in "phases", each with a separate dataset. Mos

Once all LoRAs were trained, I merged each of them into the base model separately, then used [mergekit](https://github.com/arcee-ai/mergekit) [(config)](https://huggingface.co/rAIfle/WAIDWML-Phi4-8x14B-bf16/blob/main/mergekit_moe_config.yml) to "merge" them into a MoE. I chose to initialize the router randomly, as I was going to train that part later. After that, I trained the routing layers for 8 epochs with `lr = 1e-6` and `grimulkan/LimaRP-augmented` as the dataset. It took roughly 8.5 hours on a 6xA40 instance on RunPod.
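The per-LoRA merge step can be sketched with `peft`'s `merge_and_unload()`; this is a minimal illustration (the adapter and output paths are placeholders, and the base repo id is assumed), not necessarily the exact script used here:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the Phi-4 base model in bf16 (assumed base repo id).
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    torch_dtype=torch.bfloat16,
)

# Attach one phase's LoRA and fold its weights into the base model;
# each merged checkpoint then becomes one expert in the mergekit MoE config linked above.
merged = PeftModel.from_pretrained(base, "path/to/phase-lora").merge_and_unload()
merged.save_pretrained("experts/phase-1")
```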
## Recommended Settings
-Phi-4 format.
+Phi-4 format. What I used for my tests:
|
38 |
+
- Temp 1
+- minP 0.05
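In `transformers` terms (assuming a version recent enough to support `min_p`, and that the repo's tokenizer ships the Phi-4 chat template), those settings translate roughly to:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "rAIfle/WAIDWML-Phi4-8x14B-bf16"  # bf16 repo linked in the mergekit config above
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

# Phi-4 prompt format via the chat template, then Temp 1 / minP 0.05 for sampling.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, do_sample=True, temperature=1.0, min_p=0.05, max_new_tokens=256)
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```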
## FAQ
```