NeoChen1024 committed
Commit ffc9510 · verified · Parent: ac7a9ee

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,5 +1,58 @@
- ---
- license: other
- license_name: wtfpl
- license_link: LICENSE
- ---
+ ---
+ license: wtfpl
+ language:
+ - en
+ - zh
+ - ja
+ - de
+ datasets:
+ - JosephusCheung/GuanacoDataset
+ - meta-math/MetaMathQA
+ - jondurbin/airoboros-3.1
+ - WizardLM/WizardLM_evol_instruct_V2_196k
+ - RyokoAI/ShareGPT52K
+ - RyokoAI/Fandom23K
+ - milashkaarshif/MoeGirlPedia_wikitext_raw_archive
+ - wikipedia
+ - wiki_lingua
+ - garage-bAInd/Open-Platypus
+ - LDJnr/Puffin
+ - BAAI/COIG
+ - TigerResearch/tigerbot-zhihu-zh-10k
+ - liwu/MNBVC
+ - teknium/openhermes
+ - CausalLM/Refined-Anime-Text
+ - microsoft/orca-math-word-problems-200k
+ - m-a-p/CodeFeedback-Filtered-Instruction
+ ---
+
+ **Sorry, it's no longer available on Hugging Face. Please reach out to those who have already downloaded it. If you have a copy, please refrain from re-uploading it to Hugging Face.**
+
+ **Due to repeated conflicts with HF and what we perceive as their repeated misuse of the "Contributor Covenant Code of Conduct," we have lost confidence in the platform and decided to temporarily suspend all new download access requests. It appears to us that HF's original intention has been abandoned in pursuit of commercialization, and they no longer prioritize the well-being of the community.**
+
+ Demo: [![](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/JosephusCheung/CausalLM-35B-long-Q6K-GGUF)
+
+ # 35b-beta-long
+
+ This release, CausalLM/35b-beta-long, represents the culmination of our experience and accumulated training data in fine-tuning large language models. We are open-sourcing these weights to foster development within the open-source community.
+
+ We chose Cohere's multilingual, long-context, 35B-parameter MHA model [CohereForAI/c4ai-command-r-v01] as our base. In our evaluation, it proved to be the most responsive to the quality of training data throughout the Supervised Fine-Tuning process, outperforming other open-source LLMs. Although its initial SFT/RL focuses on specific tasks and it comes with a non-commercial license, we believe it is currently the best foundation for personal and internal use cases.
+
+ Utilizing extensive factual content from web crawls, we synthesized over 30 million multi-turn dialogue entries, grounded in multiple web pages or documents. This process involved substantial human oversight and a data pipeline designed to ensure high quality. The model was then trained on this data at the full 128K context length using BF16 precision. We also incorporated widely used open-source dialogue datasets to enhance general conversational fluency.
+
+ Our data synthesis approach addressed crucial limitations in typical LLM training corpora. LLMs often struggle to extract thematic summaries or key information, or to perform comparisons, at the paragraph or document level. Therefore, we focused on generating fact-based data using multiple documents within a long-context setting. This involved leveraging existing SOTA LLMs with human guidance to synthesize information through thematic summarization, information extraction, and comparison of source materials.
+
+ This approach yielded significant improvements in model performance during fine-tuning. We observed reductions in hallucinations, enhanced long-context capabilities, and improvements in general abilities such as math, coding, and knowledge recall. The training process incorporated both the original source material and the synthesized outputs, further reinforcing the model's ability to recall and utilize abstract concepts embedded within the pre-training data. Our analysis revealed that this combination of original and synthesized data was crucial for achieving a more balanced performance profile. Intermediate checkpoints and models trained solely on synthesized data are also released for research purposes.
+
+ Compared to the original task-specific model, our further fine-tuned model demonstrates more robust recall in long-context scenarios without requiring specific document formatting or prompt engineering. This fine-tuned model also exhibits performance comparable to models twice its size in quantifiable benchmarks.
+
+ As this model has only undergone SFT, it may still exhibit biases or generate undesirable content. We implemented basic safety measures using open-source refusal datasets to mitigate outputs related to illegal activities, NSFW content, and violence. However, further Reinforcement Learning is necessary for robust alignment with human values.
+
+ ## Please note
+
+ The tokenizer is different from Cohere's, and the chat template is **ChatML**.
+
+ Pressure testing from: https://github.com/LeonEricsson/llmcontext
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63468a143ea42ee2cb49ddd1/2XbONpyTeMH1qWCtE9ziH.png)
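
Since the card specifies a ChatML chat template, a minimal usage sketch may help; it assumes a local copy of this repo loads via `transformers` (the path is a placeholder). Note the template, as shown later in `tokenizer_config.json`, does not append a generation prompt, so the assistant header is added manually:

```python
# A minimal sketch, assuming a local copy of this repo; the path is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/models/35b-beta-long")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize these two documents and compare their claims."},
]

# Renders <|im_start|>role\ncontent<|im_end|> per turn (ChatML).
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
prompt += "<|im_start|>assistant\n"  # cue the model to answer
print(prompt)
```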
config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "_name_or_path": "35b",
+   "architectures": [
+     "CohereForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 5,
+   "eos_token_id": 6,
+   "hidden_act": "silu",
+   "hidden_size": 8192,
+   "initializer_range": 0.02,
+   "intermediate_size": 22528,
+   "layer_norm_eps": 1e-05,
+   "logit_scale": 0.0625,
+   "max_position_embeddings": 8192,
+   "model_max_length": 131072,
+   "model_type": "cohere",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 40,
+   "num_key_value_heads": 64,
+   "pad_token_id": 0,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 8000000.0,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.38.2",
+   "use_cache": true,
+   "vocab_size": 256000,
+   "quantization_config": {
+     "quant_method": "exl2",
+     "version": "0.2.2",
+     "bits": 4.0,
+     "head_bits": 8,
+     "calibration": {
+       "rows": 115,
+       "length": 2048,
+       "dataset": "(default)"
+     }
+   }
+ }
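
The `quantization_config` block marks this upload as an EXL2 quant at 4.0 bits per weight (8-bit head). A loading sketch with exllamav2, assuming the library (~0.2.x) is installed and the quant is downloaded locally (the path is a placeholder):

```python
# Sketch only: exllamav2 API as of ~0.2.x; the model path is a placeholder.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/35b-beta-long-exl2-4.0bpw"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate cache while layers load
model.load_autosplit(cache)               # split weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

prompt = "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"
print(generator.generate_simple(prompt, settings, 200))
```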
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 5,
+   "eos_token_id": 6,
+   "pad_token_id": 0,
+   "transformers_version": "4.38.2"
+ }
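
For reference, these defaults can be read back with transformers' `GenerationConfig` (a sketch, assuming a local copy of the repo; the path is a placeholder):

```python
# Sketch: read the generation defaults back; the path is a placeholder.
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("/models/35b-beta-long")
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id, gen_cfg.pad_token_id)  # 5 6 0
```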
huggingface-metadata.txt ADDED
@@ -0,0 +1,20 @@
+ url: https://huggingface.co/CausalLM/35b-beta-long
+ branch: main
+ download date: 2024-09-15 22:30:39
+ sha256sum:
+     5428fa31fd03765d5c0eb14d3680ba058ee1e0eca4b25140092bb9d669914bbf model-00001-of-00015.safetensors
+     c8d70c9ce69e42faf9616e1bda1448c2766f00fcaa20800d3beda8302cbb8e5c model-00002-of-00015.safetensors
+     10f1162a4f10ebf07324293635b6b9ee3509a835dc47a409abd87b92203f4d26 model-00003-of-00015.safetensors
+     6d3d16b8c67947bbfbd37c3b235f50337aa4e0b8450a5c1f21d216bb75456e59 model-00004-of-00015.safetensors
+     120a4429056e6efd04b1d2756b3e625bd829c4b458203d1cbf1e2e8a7b678489 model-00005-of-00015.safetensors
+     478c89965e4390aa458d52bbef525f95cb69eb277db1f8454ad3b0dbd8b52b7c model-00006-of-00015.safetensors
+     5d71536b1c2a5c33f27330b19010f7493c1599898207dc57aa1e7e38767a4c2b model-00007-of-00015.safetensors
+     57a64b41fcf22f9fd4855f542dac9d99aae242c9ad1245d34d2b71c428fe32aa model-00008-of-00015.safetensors
+     7ad83531189bb6d9456710a903396ec02987be03f6539048b85f1ac25a01dd10 model-00009-of-00015.safetensors
+     b18986af87bed9d98c7b9deff616540b7721c379113668191bf8f848e5a050fc model-00010-of-00015.safetensors
+     5ac15fdc4368f7a3532b7e114938aa5e8e50db07f01962bc3801240b9939d9c1 model-00011-of-00015.safetensors
+     b9ae9ccf809835bfd3c3466c80b1377da957896b34a7090614a508220bd7c1df model-00012-of-00015.safetensors
+     b058ba038b230322ef83091c8a91731d384eb6ca11058a9f5df38a7c3da3df83 model-00013-of-00015.safetensors
+     7f6db7c3e17ce948ac5202197613ad977a2dd6a8e474e30076f5146571a4a0a4 model-00014-of-00015.safetensors
+     cd1fabad5e9533b25b07d107ba37c5f580bc7a8c1871b794baefdae3fa976b76 model-00015-of-00015.safetensors
+     3ec24d1fe80ac960489b2004b7399ea561799de2fae774bd5a9234c13e6a0726 tokenizer.json
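
The recorded sha256 sums make a download easy to verify. A stdlib-only sketch, assuming it runs from the repo folder and parses the `<hash> <filename>` rows above:

```python
# Sketch: verify local files against the sha256 sums in huggingface-metadata.txt.
import hashlib
from pathlib import Path

def sha256sum(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

for line in Path("huggingface-metadata.txt").read_text().splitlines():
    parts = line.split()
    if len(parts) == 2 and len(parts[0]) == 64:  # "<hash> <filename>" rows
        name, digest = parts[1], parts[0]
        ok = Path(name).exists() and sha256sum(Path(name)) == digest
        print("OK " if ok else "BAD", name)
```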
measurement.json ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors.index.json ADDED
@@ -0,0 +1,329 @@
+ {
+   "metadata": {
+     "total_size": 69961662464
+   },
+   "weight_map": {
+     "model.embed_tokens.weight": "model-00001-of-00015.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00002-of-00015.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00002-of-00015.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00005-of-00015.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00005-of-00015.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00006-of-00015.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00006-of-00015.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00006-of-00015.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00007-of-00015.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00007-of-00015.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00007-of-00015.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00008-of-00015.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00008-of-00015.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00002-of-00015.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00008-of-00015.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00009-of-00015.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00009-of-00015.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00009-of-00015.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00010-of-00015.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00010-of-00015.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00010-of-00015.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.26.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00011-of-00015.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.27.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.27.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.27.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.27.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.27.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.27.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+     "model.layers.28.input_layernorm.weight": "model-00011-of-00015.safetensors",
+     "model.layers.28.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.28.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.28.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.28.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.28.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.28.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.28.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.29.input_layernorm.weight": "model-00011-of-00015.safetensors",
+     "model.layers.29.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.29.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.29.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.29.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.29.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.29.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.29.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.3.input_layernorm.weight": "model-00003-of-00015.safetensors",
+     "model.layers.3.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.3.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.3.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+     "model.layers.30.input_layernorm.weight": "model-00012-of-00015.safetensors",
+     "model.layers.30.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.30.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.30.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.30.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.30.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.30.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.30.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+     "model.layers.31.input_layernorm.weight": "model-00012-of-00015.safetensors",
+     "model.layers.31.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.31.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.31.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.31.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.31.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.31.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.31.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.32.input_layernorm.weight": "model-00012-of-00015.safetensors",
+     "model.layers.32.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.32.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.32.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.32.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.32.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.32.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.32.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.33.input_layernorm.weight": "model-00013-of-00015.safetensors",
+     "model.layers.33.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.33.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.33.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.33.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.33.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.33.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.33.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+     "model.layers.34.input_layernorm.weight": "model-00013-of-00015.safetensors",
+     "model.layers.34.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.34.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.34.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.34.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.34.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.34.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.34.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.35.input_layernorm.weight": "model-00013-of-00015.safetensors",
+     "model.layers.35.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.35.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.35.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.35.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.35.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.35.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.35.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.36.input_layernorm.weight": "model-00014-of-00015.safetensors",
+     "model.layers.36.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.36.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.36.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.36.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.36.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.36.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.36.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+     "model.layers.37.input_layernorm.weight": "model-00014-of-00015.safetensors",
+     "model.layers.37.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.37.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.37.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.37.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.37.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.37.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.37.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.38.input_layernorm.weight": "model-00014-of-00015.safetensors",
+     "model.layers.38.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.38.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.38.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.38.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.38.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.38.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.38.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.39.input_layernorm.weight": "model-00015-of-00015.safetensors",
+     "model.layers.39.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
+     "model.layers.39.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
+     "model.layers.39.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
+     "model.layers.39.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.39.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.39.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.39.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+     "model.layers.4.input_layernorm.weight": "model-00003-of-00015.safetensors",
+     "model.layers.4.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.4.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.4.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.4.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.4.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.4.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.4.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.5.input_layernorm.weight": "model-00003-of-00015.safetensors",
+     "model.layers.5.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.5.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.5.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.5.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.5.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.5.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.5.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.6.input_layernorm.weight": "model-00004-of-00015.safetensors",
+     "model.layers.6.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.6.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.6.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.6.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.6.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.6.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.6.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+     "model.layers.7.input_layernorm.weight": "model-00004-of-00015.safetensors",
+     "model.layers.7.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.7.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.7.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.7.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.7.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.7.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.7.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.8.input_layernorm.weight": "model-00004-of-00015.safetensors",
+     "model.layers.8.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.8.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.8.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.8.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.8.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.8.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.8.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.9.input_layernorm.weight": "model-00005-of-00015.safetensors",
+     "model.layers.9.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.9.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.9.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+     "model.layers.9.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.9.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.9.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+     "model.layers.9.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+     "model.norm.weight": "model-00015-of-00015.safetensors"
+   }
+ }
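
The shard index above is plain JSON; a quick stdlib sketch to check the total size and how tensors are distributed across the 15 shards (run inside the repo folder):

```python
# Sketch: inspect the shard index (stdlib only).
import json
from collections import Counter

with open("model.safetensors.index.json") as f:
    index = json.load(f)

print(f"{index['metadata']['total_size'] / 2**30:.1f} GiB total")  # ~65.2 GiB

for shard, n in sorted(Counter(index["weight_map"].values()).items()):
    print(shard, n, "tensors")
```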
output-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1aacfd58261f09e1a78ad82f31c92c3d09b820053e01b6162072dbfb663edf33
+ size 8495211742
output-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a0c5fb2f23ad266985f961b37ad553a1118dade0e76a38d65868340bf0dfd9d
+ size 8558121072
output-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e6adf9995a88b0b8723a0e71ca7169a4c497edafc556331e1222ac4a0b834e0
+ size 6737196570
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<PAD>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ec24d1fe80ac960489b2004b7399ea561799de2fae774bd5a9234c13e6a0726
+ size 12777306
tokenizer_config.json ADDED
@@ -0,0 +1,314 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<PAD>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<UNK>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<CLS>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<SEP>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "<MASK_TOKEN>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "5": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "6": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "7": {
+       "content": "<EOP_TOKEN>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "255000": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255001": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255002": {
+       "content": "<|YES_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255003": {
+       "content": "<|NO_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255004": {
+       "content": "<|GOOD_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255005": {
+       "content": "<|BAD_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255006": {
+       "content": "<|USER_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255007": {
+       "content": "<|CHATBOT_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255008": {
+       "content": "<|SYSTEM_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255009": {
+       "content": "<|USER_0_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255010": {
+       "content": "<|USER_1_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255011": {
+       "content": "<|USER_2_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255012": {
+       "content": "<|USER_3_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255013": {
+       "content": "<|USER_4_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255014": {
+       "content": "<|USER_5_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255015": {
+       "content": "<|USER_6_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255016": {
+       "content": "<|USER_7_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255017": {
+       "content": "<|USER_8_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255018": {
+       "content": "<|USER_9_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255019": {
+       "content": "<|EXTRA_0_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255020": {
+       "content": "<|EXTRA_1_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255021": {
+       "content": "<|EXTRA_2_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255022": {
+       "content": "<|EXTRA_3_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255023": {
+       "content": "<|EXTRA_4_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255024": {
+       "content": "<|EXTRA_5_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255025": {
+       "content": "<|EXTRA_6_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255026": {
+       "content": "<|EXTRA_7_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255027": {
+       "content": "<|EXTRA_8_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "255028": {
+       "content": "<|EXTRA_9_TOKEN|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}",
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<PAD>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": null,
+   "use_default_system_prompt": false
+ }
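
The `chat_template` string above is plain Jinja; rendering it directly shows the exact ChatML layout the tokenizer produces. A sketch using `jinja2` (the same template string, re-quoted for Python):

```python
# Sketch: render the chat_template from tokenizer_config.json with jinja2.
from jinja2 import Template

chat_template = (
    "{% for message in messages %}"
    "{{'<|im_start|>' + message['role'] + '\n' + message['content'] "
    "+ '<|im_end|>' + '\n'}}"
    "{% endfor %}"
)

messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]

print(Template(chat_template).render(messages=messages))
# <|im_start|>user
# Hi<|im_end|>
# <|im_start|>assistant
# Hello!<|im_end|>
```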