Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ base_model:
 pipeline_tag: text-generation
 ---
 
-# rwkv7-
+# rwkv7-421M-pile
 
 <!-- Provide a quick summary of what the model is/does. -->
 
@@ -29,7 +29,7 @@ This is RWKV-7 model under flash-linear attention format.
 - **Model type:** RWKV7
 - **Language(s) (NLP):** English
 - **License:** Apache-2.0
-- **Parameter count:**
+- **Parameter count:** 421M
 - **Tokenizer:** GPT-NeoX 20B tokenizer
 
 ### Model Sources
@@ -38,7 +38,6 @@ This is RWKV-7 model under flash-linear attention format.
 
 - **Repository:** https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
 - **Paper:** Work in progress
-- **Weights:** Converted from https://modelscope.cn/models/RWKV/rwkv-7-pile/file/view/master?fileName=RWKV-x070-Pile-168M-20241120-ctx4096.pth
 
 ## Uses
 
@@ -56,8 +55,8 @@ pip install 'transformers>=4.48.0'
 You can use this model just as any other HuggingFace model:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-
-tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-
+model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-421M-pile', trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-421M-pile', trust_remote_code=True)
 ```
 
 ## Training Details
@@ -74,9 +73,9 @@ This model is trained on the Pile with a total of 332 billion tokens.
 
 #### Metrics
 
-`lambada_openai`: ppl
+`lambada_openai`: ppl 7.21, acc 57.9%
 
-`piqa`: acc
+`piqa`: acc 69.2%
 
 ## FAQ
 Q: safetensors metadata is none.
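The usage snippet added by this commit only loads the model and tokenizer. Below is a minimal generation sketch built on it; the prompt and the sampling settings are illustrative choices, not part of the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint exactly as the updated card shows.
model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-421M-pile', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-421M-pile', trust_remote_code=True)

# Any English prompt works; the card lists the GPT-NeoX 20B tokenizer.
inputs = tokenizer("The Pile is a large, diverse corpus that", return_tensors="pt")

# max_new_tokens and the sampling parameters are arbitrary illustrative values.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                         temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```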