Update README.md
README.md (changed)
@@ -4,40 +4,103 @@ library_name: transformers

The previous card was the stock mergekit output for the final step only: a bare heading, a short Merge Method note, a model list of `output/pre` and `output/donor`, and a `merge_method: slerp` YAML configuration in `float32` with `layer_range: [0, 32]` slices over `output/pre` and `output/donor`. This commit replaces it with the full card:
tags:
- mergekit
- merge
- llama
- conversational
license: llama3
---

# L3-Tyche-8B-v1.0

## About:

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

**Recommended Samplers:**

```
Temperature - 1.3
TFS - 0.96
Smoothing Factor - 0.3
Smoothing Curve - 1.1
Repetition Penalty - 1.08
```
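For reference, a minimal generation sketch with transformers; the repository id below is a placeholder, not taken from the card. Temperature and Repetition Penalty map directly onto `generate()` arguments, while TFS, Smoothing Factor, and Smoothing Curve are sampler options exposed by frontends such as text-generation-webui or SillyTavern and have no direct transformers equivalent.

```python
# Minimal sketch, not from the card: the repo id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/L3-Tyche-8B-v1.0"  # placeholder repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Introduce yourself in one short paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Temperature and Repetition Penalty come from the recommended settings above;
# TFS and the smoothing parameters are left to backends that support them.
output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.3,
    repetition_penalty=1.08,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```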
### Merge Method

This model was produced through a series of model stock and LoRA merges, followed by ExPO and an attention swap. It uses a mix of smart and roleplay-centered models to improve performance.
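For intuition, the ExPO stage in the configuration below is plain task arithmetic with `weight: 1.3` and `normalize: false`: the difference between the model-stock merge (`hq_rp`) and the base is scaled by 1.3 and added back onto the base, extrapolating past the merge rather than interpolating toward it. A rough per-tensor sketch of what such a step computes (stand-in tensors, not mergekit's actual code):

```python
# Rough per-tensor sketch of the ExPO / task-arithmetic step (weight 1.3, normalize: false).
import torch

def expo_extrapolate(base: torch.Tensor, merged: torch.Tensor, weight: float = 1.3) -> torch.Tensor:
    # Task vector = merged - base; scaling it by > 1 pushes the result
    # past the merged model instead of blending toward it.
    return base + weight * (merged - base)

base_w = torch.randn(4096, 4096)   # stands in for a tensor from the abliterated base
hq_rp_w = torch.randn(4096, 4096)  # stands in for the same tensor from the model_stock merge
pre_w = expo_extrapolate(base_w, hq_rp_w)
```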
### Configuration

The following YAML configuration was used to produce this model:

```yaml
---
models:
  - model: Nitral-AI/Hathor_Tahsin-L3-8B-v0.85
  - model: Nitral-AI/Hathor_Respawn-L3-8B-v0.8
  - model: ChaoticNeutrals/Hathor_RP-v.01-L3-8B
  - model: Sao10K/L3-8B-Stheno-v3.2
  - model: yodayo-ai/nephra_v1.0
  - model: HiroseKoichi/L3-8B-Lunar-Stheno
  - model: Jellywibble/lora_120k_pref_data_ep2
  - model: Jellywibble/qlora_120k_pref_data_ep1
  - model: Jellywibble/meseca-20062024-c1
  - model: Hastagaras/Jamet-8B-L3-MK.V-Blackroot
  - model: Cas-Warehouse/Llama-3-SOVL-MopeyMule-Blackroot-8B
  - model: ResplendentAI/Nymph_8B+Azazelle/RP_Format_QuoteAsterisk_Llama3
  - model: R136a1/Bungo-L3-8B
  - model: maldv/badger-mu-llama-3-8b
  - model: TheDrummer/Llama-3SOME-8B-v2
  - model: Magpie-Align/Llama-3-8B-Magpie-Align-v0.1+Azazelle/Llama3_RP_ORPO_LoRA
  - model: grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge+Azazelle/Llama-3-8B-Abomination-LORA
  - model: NousResearch/Hermes-2-Pro-Llama-3-8B+mpasila/Llama-3-Instruct-LiPPA-LoRA-8B
  - model: MaziyarPanahi/Llama-3-8B-Instruct-v0.8+Azazelle/Llama-3-Sunfall-8b-lora
  - model: openchat/openchat-3.6-8b-20240522+Azazelle/BlueMoon_Llama3
  - model: collaiborateorg/Collaiborator-MEDLLM-Llama-3-8B-v2+Azazelle/llama3-8b-hikikomori-v0.4
  - model: grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
merge_method: model_stock
base_model: failspy/Meta-Llama-3-8B-Instruct-abliterated-v3
dtype: float32
vocab_type: bpe
name: hq_rp

---
# ExPO
models:
  - model: hq_rp
    parameters:
      weight: 1.3
merge_method: task_arithmetic
base_model: failspy/Meta-Llama-3-8B-Instruct-abliterated-v3
parameters:
  normalize: false
dtype: float32
vocab_type: bpe
name: pre

---
# Attention Donor
models:
  - model: Nitral-AI/Hathor_Tahsin-L3-8B-v0.85
  - model: Sao10K/L3-8B-Stheno-v3.2
merge_method: model_stock
base_model: failspy/Meta-Llama-3-8B-Instruct-abliterated-v3
dtype: float32
vocab_type: bpe
name: donor

---
# Attention swap?
models:
  - model: pre
merge_method: slerp
base_model: donor
parameters:
  t:
    - filter: mlp
      value: 0
    - value: 1
dtype: float32
vocab_type: bpe
```
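Note that this is a multi-document config: each stage writes an intermediate named by `name:` (`hq_rp`, `pre`, `donor`) that later stages reference, which matches the multi-stage format of mergekit's `mergekit-mega` entry point rather than single-document `mergekit-yaml` (an assumption based on the `name:` fields; the card does not say how it was run). As a quick sanity check, a small sketch (assuming PyYAML and the config saved locally as `tyche.yaml`, both hypothetical) that lists the stages:

```python
# Walk the multi-document mergekit config and print each stage.
# Assumes PyYAML is installed and the YAML above is saved locally as "tyche.yaml".
import yaml

with open("tyche.yaml") as f:
    stages = [doc for doc in yaml.safe_load_all(f) if doc]

for i, stage in enumerate(stages, start=1):
    name = stage.get("name", "<final>")
    method = stage.get("merge_method")
    n_models = len(stage.get("models", []))
    print(f"stage {i}: {name!r} -> {method} over {n_models} model(s), base={stage.get('base_model')}")
```

On this config it would report four stages: the `hq_rp` model stock merge, the ExPO `task_arithmetic` step (`pre`), the attention-donor model stock merge (`donor`), and the final `slerp`.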