root committed
Commit 954388d · 1 Parent(s): c1e4a1b

fixed README

Files changed (1):
  README.md (+117 -10)
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
-license: apache-2.0
+license: other
+license_name: llama3
+license_link: LICENSE
 tags:
 - moe
 - frankenmoe
@@ -11,18 +13,11 @@ tags:
 base_model:
 - llama-3-sqrt-crocodile-v0.0A/sqrt-talker
 - llama-3-sqrt-crocodile-v0.0A/the-operator
-license: other
-license_name: llama3
-license_link: LICENSE
 ---

 # llama-3-sqrt-crocodile-v0.0A

-llama-3-sqrt-crocodile-v0.0A is a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
-* [llama-3-sqrt-crocodile-v0.0A/sqrt-talker](https://huggingface.co/llama-3-sqrt-crocodile-v0.0A/sqrt-talker)
-* [llama-3-sqrt-crocodile-v0.0A/the-operator](https://huggingface.co/llama-3-sqrt-crocodile-v0.0A/the-operator)
-
-## 🧩 Configuration
+## 🧩 Configuration-moe

 ```yaml
 base_model: llama-3-sqrt-crocodile-v0.0A/Uninstruct-Uncensored
@@ -38,7 +33,119 @@ experts:
 - "Good at structured tasks"
 - "Programmatic instruction following"
 ```
-
+## 🧩 Configuration-mega
+```yaml
+models:
+  - model: Orenguteng/Lexi-Llama-3-8B-Uncensored
+    parameters:
+      weight: [0.2, 0.3, 0.4, 0.6]
+      layer_range: [0, 32]
+  - model: NousResearch/Meta-Llama-3-8B
+    parameters:
+      weight: [0.6, 0.2, 0.2, 0.1]
+      layer_range: [0, 32]
+  - model: NousResearch/Meta-Llama-3-8B-Instruct
+    parameters:
+      weight: [0.2, 0.3, 0.85, 0.3]
+      layer_range: [0, 32]
+merge_method: dare_linear
+base_model: NousResearch/Meta-Llama-3-8B-Instruct
+dtype: bfloat16
+name: Uninstruct-Uncensored
+---
+models:
+  - model: cognitivecomputations/dolphin-2.9-llama3-8b
+    parameters:
+      weight: [0.25, 0.4, 0.35, 0.35]
+      density: [0.3, 0.45, 0.2, 0.6]
+      layer_range: [0, 32]
+  - model: NousResearch/Meta-Llama-3-8B
+    parameters:
+      weight: [0.15, 0.25, 0.05, 0]
+      density: [0.2, 0.3, 0.4, 0.1]
+  - model: Undi95/Llama-3-Unholy-8B
+    parameters:
+      weight: [0.4, 0.25, 0.45, 0.35]
+      density: [0.2, 0.15, 1.5, 0.1]
+      layer_range: [0, 32]
+  - model: Uninstruct-Uncensored
+    parameters:
+      weight: [0.3, 0.1, 0.25, 0.3]
+      density: [0.3, 0.15, 2.5, 0.2]
+      layer_range: [0, 32]
+merge_method: dare_ties
+base_model: Uninstruct-Uncensored
+dtype: bfloat16
+name: augmented-dolphin-hap
+---
+models:
+  - model: vicgalle/Configurable-Llama-3-8B-v0.3
+    parameters:
+      weight: [0.5, 0.3, 0.1]
+  - model: hiieu/Meta-Llama-3-8B-Instruct-function-calling-json-mode
+    parameters:
+      weight: 0.5
+  - model: Trelis/Meta-Llama-3-8B-Instruct-function-calling
+    parameters:
+      weight: 0.3
+      layer_range: [0, 32]
+  - model: Rookie/Llama-3-8B-Instruct-Chinese
+    parameters:
+      weight: 0.2
+      layer_range: [0, 32]
+  - model: Uninstruct-Uncensored
+    parameters:
+      weight: [0.7, 0.4, 0.25, 0.1]
+      layer_range: [0, 32]
+merge_method: model_stock
+base_model: Uninstruct-Uncensored
+dtype: bfloat16
+name: the-operator
+---
+models:
+  - model: vicgalle/Configurable-Llama-3-8B-v0.3
+    parameters:
+      weight: 0.7
+  - model: hiieu/Meta-Llama-3-8B-Instruct-function-calling-json-mode
+    parameters:
+      weight: 0.1
+  - model: Trelis/Meta-Llama-3-8B-Instruct-function-calling
+    parameters:
+      weight: 0.03
+      layer_range: [0, 32]
+  - model: Rookie/Llama-3-8B-Instruct-Chinese
+    parameters:
+      weight: 0.07
+      layer_range: [0, 32]
+  - model: Uninstruct-Uncensored
+    parameters:
+      weight: 0.1
+      layer_range: [0, 32]
+merge_method: model_stock
+base_model: Uninstruct-Uncensored
+dtype: bfloat16
+name: her-calculator
+---
+models:
+  - model: her-calculator
+    parameters:
+      density: 0.7  # density gradient
+      weight: [0.7, 0.5, 0.1, 0.8]
+  - model: augmented-dolphin-hap
+    parameters:
+      weight: 0.7
+merge_method: slerp
+base_model: her-calculator
+parameters:
+  t:
+    - filter: self_attn
+      value: [0, 0.5, 0.3, 0.7, 1]
+    - filter: mlp
+      value: [1, 0.5, 0.7, 0.3, 0]
+    - value: 0.5  # fallback for rest of tensors
+dtype: float16
+name: sqrt-talker
+```
 ## 💻 Usage

 ```python
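The diff view cuts off at the opening ```python fence of the Usage section, so the original snippet's body is not recoverable from this commit. For reference, cards produced with LazyMergekit typically close with a transformers text-generation example along these lines; the repo id, dtype, and sampling settings below are assumptions, not text recovered from the diff:

```python
import torch
import transformers
from transformers import AutoTokenizer

# Placeholder repo id; substitute the model's actual Hugging Face path.
model_id = "llama-3-sqrt-crocodile-v0.0A"

tokenizer = AutoTokenizer.from_pretrained(model_id)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device_map="auto",
)

# Build a chat-formatted prompt and sample a completion.
messages = [{"role": "user", "content": "What is a Mixture of Experts?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipeline(
    prompt, max_new_tokens=256, do_sample=True,
    temperature=0.7, top_k=50, top_p=0.95,
)
print(outputs[0]["generated_text"])
```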
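A note on the "Configuration-mega" block added by this commit: it is a multi-document mergekit file. Each YAML document is one merge stage with its own `merge_method`, its `name:` field labels the output (Uninstruct-Uncensored, augmented-dolphin-hap, the-operator, her-calculator, sqrt-talker), and later stages list earlier names as input models; list-valued `weight`/`density` entries are gradients interpolated across layer groups. (The companion "Configuration-moe" block is the kind of file LazyMergekit feeds to mergekit-moe to assemble the frankenmoe.) The sketch below is a hypothetical Python driver for such a file, assuming mergekit's documented `MergeConfiguration`/`run_merge` API with placeholder paths; the `mergekit-mega` script automates this same loop, including resolving `name:` references between stages.

```python
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Each YAML document in the mega config is one merge stage.
with open("mega-config.yml", "r", encoding="utf-8") as fp:
    stages = list(yaml.safe_load_all(fp))

for stage in stages:
    # `name:` is mega-config metadata, not part of a single-merge schema.
    out_path = f"./merges/{stage.pop('name')}"
    # NOTE: model entries that reference an earlier stage by name (e.g.
    # "Uninstruct-Uncensored") would need rewriting to that stage's output
    # directory before validation; omitted here for brevity.
    config = MergeConfiguration.model_validate(stage)
    run_merge(
        config,
        out_path,
        options=MergeOptions(
            copy_tokenizer=True, lazy_unpickle=False, low_cpu_memory=False
        ),
    )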