ZhangRC committed · verified
Commit c39901a · Parent(s): 2259670

Update README.md

Files changed (1): README.md (+6 −7)
README.md CHANGED
````diff
@@ -11,7 +11,7 @@ base_model:
 pipeline_tag: text-generation
 ---
 
-# rwkv7-168M-pile
+# rwkv7-421M-pile
 
 <!-- Provide a quick summary of what the model is/does. -->
 
@@ -29,7 +29,7 @@ This is RWKV-7 model under flash-linear attention format.
 - **Model type:** RWKV7
 - **Language(s) (NLP):** English
 - **License:** Apache-2.0
-- **Parameter count:** 168M
+- **Parameter count:** 421M
 - **Tokenizer:** GPT-NeoX 20B tokenizer
 
 ### Model Sources
@@ -38,7 +38,6 @@ This is RWKV-7 model under flash-linear attention format.
 
 - **Repository:** https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
 - **Paper:** Work in progress
-- **Weights:** Converted from https://modelscope.cn/models/RWKV/rwkv-7-pile/file/view/master?fileName=RWKV-x070-Pile-168M-20241120-ctx4096.pth
 
 ## Uses
 
@@ -56,8 +55,8 @@ pip install 'transformers>=4.48.0'
 You can use this model just as any other HuggingFace models:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
-tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-421M-pile', trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-421M-pile', trust_remote_code=True)
 ```
 
 ## Training Details
@@ -74,9 +73,9 @@ This model is trained on the Pile with a total of 332 billion tokens.
 
 #### Metrics
 
-`lambada_openai`: ppl 14.2 acc 45.6%
+`lambada_openai`: ppl 7.21 acc 57.9%
 
-`piqa`: acc 69.2%
+`piqa`: acc 69.2%
 
 ## FAQ
 Q: safetensors metadata is none.
```` 