---
license: mit
pipeline_tag: text-generation
library_name: transformers
language: [
'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
]
datasets:
# core - base
- ontocord/fineweb-permissive-multilingual-2m
- distily/c4_multilingual_1M
- data-silence/sumnews
- xu-song/cc100-samples
- badrex/llm-emoji-dataset
- fblgit/simple-math
- Gusarich/math-expressions-1m
- neuralwork/arxiver
- christopher/rosetta-code
- nampdn-ai/tiny-codes
- JeanKaddour/minipile
# core - instruct
- NousResearch/hermes-function-calling-v1
- simplescaling/s1K-1.1
# base - instruct
- mlabonne/open-perfectblend
- allenai/tulu-3-sft-mixture
- rombodawg/Everything_Instruct_Multilingual
# base - reason
- open-r1/OpenR1-Math-220k
- open-thoughts/OpenThoughts-114k
- cognitivecomputations/dolphin-r1
- simplescaling/s1K-1.1
tags:
- chat
- core
- base
- instruct
- reason
---
# tangled-alpha-0.14-core

```bash
time python -B prepare_base_datasets.py
```
```
i=0, min_len=0, max_len=1073741824, block_size=8193, chunk_size=16386000, len(dataset)=1496631, len(dataset) * block_size=12261897783
Total number of tokens in the optimized dataset '../base-data-0-0-1073741824-8193-2000' is 12261897783
i=1, min_len=8193, max_len=16385, block_size=16385, chunk_size=16385000, len(dataset)=78802, len(dataset) * block_size=1291170770
Total number of tokens in the optimized dataset '../base-data-1-8193-16385-16385-1000' is 1291170770
i=2, min_len=16385, max_len=32769, block_size=32769, chunk_size=16384500, len(dataset)=23511, len(dataset) * block_size=770431959
Total number of tokens in the optimized dataset '../base-data-2-16385-32769-32769-500' is 770431959
i=3, min_len=32769, max_len=65537, block_size=65537, chunk_size=16384250, len(dataset)=5128, len(dataset) * block_size=336073736
Total number of tokens in the optimized dataset '../base-data-3-32769-65537-65537-250' is 336073736
i=4, min_len=65537, max_len=131073, block_size=131073, chunk_size=16384125, len(dataset)=1169, len(dataset) * block_size=153224337
Total number of tokens in the optimized dataset '../base-data-4-65537-131073-131073-125' is 153224337
46G ../base-data-0-0-1073741824-8193-2000
4.9G ../base-data-1-8193-16385-16385-1000
2.9G ../base-data-2-16385-32769-32769-500
1.3G ../base-data-3-32769-65537-65537-250
589M ../base-data-4-65537-131073-131073-125
```
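
The figures in the log above can be cross-checked with a few lines of Python. This is a minimal sketch, not part of `prepare_base_datasets.py`: the numbers are copied from the log, and reading the last component of each directory name (2000, 1000, 500, 250, 125) as the number of blocks per chunk is an assumption inferred from the arithmetic.

```python
# Values copied from the log above: (block_size, len(dataset), directory suffix).
# Treating the suffix as "blocks per chunk" is an assumption, not confirmed by the source.
buckets = [
    (8193, 1496631, 2000),
    (16385, 78802, 1000),
    (32769, 23511, 500),
    (65537, 5128, 250),
    (131073, 1169, 125),
]

total_tokens = 0
for block_size, num_blocks, blocks_per_chunk in buckets:
    tokens = num_blocks * block_size            # "len(dataset) * block_size" in the log
    chunk_size = block_size * blocks_per_chunk  # reproduces the logged chunk_size
    total_tokens += tokens
    print(f"block_size={block_size:>6}  chunk_size={chunk_size}  tokens={tokens:,}")

print(f"total tokens across all buckets: {total_tokens:,}")
```

Each bucket doubles the context window (plus one token) while halving the blocks per chunk, so every chunk stays at roughly 16.4M tokens.
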
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain_base_model_0.yaml
```
Back up the `wandb` directory:
```bash
mv wandb wandb-pretrain-base-0
```
Copy config:
```bash
cp ../config-0.json ../out/pretrain-base-0/final/config.json
```
Chat with the model:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-base-0/final
```
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-base-0/leaderboard/' --batch_size '4' --dtype 'bfloat16' '../out/pretrain-base-0/final'
```
```
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard | N/A| | | | | | | |
| - leaderboard_bbh | N/A| | | | | | | |
| - leaderboard_bbh_boolean_expressions | 1|none | 3|acc_norm |↑ |0.4560|± |0.0316|
| - leaderboard_bbh_causal_judgement | 1|none | 3|acc_norm |↑ |0.5187|± |0.0366|
| - leaderboard_bbh_date_understanding | 1|none | 3|acc_norm |↑ |0.2000|± |0.0253|
| - leaderboard_bbh_disambiguation_qa | 1|none | 3|acc_norm |↑ |0.3400|± |0.0300|
| - leaderboard_bbh_formal_fallacies | 1|none | 3|acc_norm |↑ |0.4680|± |0.0316|
| - leaderboard_bbh_geometric_shapes | 1|none | 3|acc_norm |↑ |0.0880|± |0.0180|
| - leaderboard_bbh_hyperbaton | 1|none | 3|acc_norm |↑ |0.5160|± |0.0317|
| - leaderboard_bbh_logical_deduction_five_objects | 1|none | 3|acc_norm |↑ |0.1880|± |0.0248|
| - leaderboard_bbh_logical_deduction_seven_objects | 1|none | 3|acc_norm |↑ |0.1440|± |0.0222|
| - leaderboard_bbh_logical_deduction_three_objects | 1|none | 3|acc_norm |↑ |0.3360|± |0.0299|
| - leaderboard_bbh_movie_recommendation | 1|none | 3|acc_norm |↑ |0.2680|± |0.0281|
| - leaderboard_bbh_navigate | 1|none | 3|acc_norm |↑ |0.5800|± |0.0313|
| - leaderboard_bbh_object_counting | 1|none | 3|acc_norm |↑ |0.0560|± |0.0146|
| - leaderboard_bbh_penguins_in_a_table | 1|none | 3|acc_norm |↑ |0.2055|± |0.0336|
| - leaderboard_bbh_reasoning_about_colored_objects | 1|none | 3|acc_norm |↑ |0.1400|± |0.0220|
| - leaderboard_bbh_ruin_names | 1|none | 3|acc_norm |↑ |0.2160|± |0.0261|
| - leaderboard_bbh_salient_translation_error_detection | 1|none | 3|acc_norm |↑ |0.1120|± |0.0200|
| - leaderboard_bbh_snarks | 1|none | 3|acc_norm |↑ |0.5056|± |0.0376|
| - leaderboard_bbh_sports_understanding | 1|none | 3|acc_norm |↑ |0.4800|± |0.0317|
| - leaderboard_bbh_temporal_sequences | 1|none | 3|acc_norm |↑ |0.2840|± |0.0286|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1|none | 3|acc_norm |↑ |0.2400|± |0.0271|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects| 1|none | 3|acc_norm |↑ |0.1520|± |0.0228|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects| 1|none | 3|acc_norm |↑ |0.3320|± |0.0298|
| - leaderboard_bbh_web_of_lies | 1|none | 3|acc_norm |↑ |0.4880|± |0.0317|
| - leaderboard_gpqa | N/A| | | | | | | |
| - leaderboard_gpqa_diamond | 1|none | 0|acc_norm |↑ |0.2071|± |0.0289|
| - leaderboard_gpqa_extended | 1|none | 0|acc_norm |↑ |0.2637|± |0.0189|
| - leaderboard_gpqa_main | 1|none | 0|acc_norm |↑ |0.2612|± |0.0208|
| - leaderboard_ifeval | 3|none | 0|inst_level_loose_acc |↑ |0.2590|± | N/A|
| | |none | 0|inst_level_strict_acc |↑ |0.2494|± | N/A|
| | |none | 0|prompt_level_loose_acc |↑ |0.1497|± |0.0154|
| | |none | 0|prompt_level_strict_acc|↑ |0.1405|± |0.0150|
| - leaderboard_math_hard | N/A| | | | | | | |
| - leaderboard_math_algebra_hard | 2|none | 4|exact_match |↑ |0.0008|± |0.0008|
| - leaderboard_math_counting_and_prob_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_geometry_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_intermediate_algebra_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_num_theory_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_prealgebra_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_precalculus_hard | 2|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_mmlu_pro | 0.1|none | 5|acc |↑ |0.1112|± |0.0029|
| - leaderboard_musr | N/A| | | | | | | |
| - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm |↑ |0.5240|± |0.0316|
| - leaderboard_musr_object_placements | 1|none | 0|acc_norm |↑ |0.2578|± |0.0274|
| - leaderboard_musr_team_allocation | 1|none | 0|acc_norm |↑ |0.3960|± |0.0310|
```
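
To compare runs, the results table above can be loaded into plain Python structures. The following is a rough sketch that assumes the nine-column layout shown above; `parse_results` and `sample` are hypothetical names used for illustration only, and the sample rows are copied from the table.

```python
def parse_results(table: str) -> list[dict]:
    """Parse a results table in the nine-column layout above into row dicts."""
    rows = []
    last_task = None
    for line in table.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [c.strip() for c in line[1:-1].split("|")]
        # Skip the header row, the separator row, and anything malformed.
        if len(cells) != 9 or cells[0] == "Tasks" or cells[0].startswith("---"):
            continue
        # Rows with an empty task cell continue the previous task (e.g. ifeval).
        task = cells[0].lstrip("- ") or last_task
        last_task = task
        metric, value = cells[4], cells[6]
        if not value:
            continue  # group header such as "leaderboard_bbh" carries no score
        rows.append({"task": task, "metric": metric, "value": float(value)})
    return rows


sample = """
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------|-------|------|-----:|--------|---|-----:|---|------|
| - leaderboard_bbh | N/A| | | | | | | |
| - leaderboard_bbh_navigate | 1|none | 3|acc_norm |↑ |0.5800|± |0.0313|
| - leaderboard_ifeval | 3|none | 0|inst_level_loose_acc |↑ |0.2590|± | N/A|
| | |none | 0|inst_level_strict_acc |↑ |0.2494|± | N/A|
"""

rows = parse_results(sample)
for row in rows:
    print(row["task"], row["metric"], row["value"])
```

Passing the full table instead of `sample` gives one entry per scored metric, which can then be filtered by prefix (for example `leaderboard_bbh_`) and averaged.
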
```bash
litgpt convert_pretrained_checkpoint ../out/pretrain-base-0/final ../out/pretrain-base-0/checkpoint
```
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain_base_model_1.yaml
```
```bash
litgpt convert_pretrained_checkpoint ../out/pretrain-base-1/final ../out/pretrain-base-1/checkpoint
```
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain_base_model_2.yaml
```
```bash
litgpt convert_pretrained_checkpoint ../out/pretrain-base-2/final ../out/pretrain-base-2/checkpoint
```
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain_base_model_3.yaml
```
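
The staged commands above follow a regular pattern: pretrain stage `i`, then convert its final checkpoint before starting stage `i + 1`. A small sketch that prints the same sequence is given here; `stage_commands` is a hypothetical helper, and the idea that each stage's YAML config loads the previous stage's converted checkpoint is an assumption, since the configs are not shown.

```python
# Environment prefix copied from the commands above.
ENV = (
    "CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 "
    "PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True"
)


def stage_commands(num_stages: int = 4) -> list[str]:
    """Build the pretrain/convert command sequence for stages 0..num_stages-1."""
    commands = []
    for i in range(num_stages):
        commands.append(f"{ENV} litgpt pretrain --config pretrain_base_model_{i}.yaml")
        # The last stage is not converted in the commands above, so skip it here too.
        if i < num_stages - 1:
            commands.append(
                "litgpt convert_pretrained_checkpoint "
                f"../out/pretrain-base-{i}/final ../out/pretrain-base-{i}/checkpoint"
            )
    return commands


for cmd in stage_commands():
    print(cmd)
```
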
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-base-3/leaderboard/' --batch_size '4' --dtype 'bfloat16' '../out/pretrain-base-3/final'
```