jonny9f committed on
Commit 92b6ccb · verified · 1 Parent(s): 5941003

Upload folder using huggingface_hub
checkpoint-1692/config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "distilbert-base-uncased",
+   "activation": "gelu",
+   "architectures": [
+     "DistilBertForSequenceClassification"
+   ],
+   "attention_dropout": 0.1,
+   "dim": 768,
+   "dropout": 0.1,
+   "hidden_dim": 3072,
+   "initializer_range": 0.02,
+   "max_position_embeddings": 512,
+   "model_type": "distilbert",
+   "n_heads": 12,
+   "n_layers": 6,
+   "pad_token_id": 0,
+   "problem_type": "single_label_classification",
+   "qa_dropout": 0.1,
+   "seq_classif_dropout": 0.2,
+   "sinusoidal_pos_embds": false,
+   "tie_weights_": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.0",
+   "vocab_size": 30522
+ }
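The config above fully determines the model's size. As a sanity check, a short sketch can reproduce the ~267 MB float32 `model.safetensors` file from these values alone; the per-layer breakdown below is the standard DistilBERT structure (not stated in this diff), and `num_labels = 2` is an assumption, since the config does not list labels and 2 is the library default.

```python
# Estimate the parameter count implied by the checkpoint's config.json values.
dim, hidden_dim, n_layers = 768, 3072, 6
vocab_size, max_pos = 30522, 512
num_labels = 2  # assumption: not present in the config shown; 2 is the default

# Embeddings: token + position embeddings, followed by one LayerNorm (weight + bias).
embeddings = vocab_size * dim + max_pos * dim + 2 * dim

# Each transformer block: Q/K/V/output projections, FFN, and two LayerNorms.
attention = 4 * (dim * dim + dim)
ffn = (dim * hidden_dim + hidden_dim) + (hidden_dim * dim + dim)
layer = attention + ffn + 2 * (2 * dim)

# Sequence-classification head: pre_classifier and classifier linear layers.
head = (dim * dim + dim) + (dim * num_labels + num_labels)

total = embeddings + n_layers * layer + head
print(total)      # ~67M parameters
print(total * 4)  # float32 bytes; within ~0.01% of the 267,832,560-byte safetensors file
```

The small remaining gap between the computed byte count and the file size is consistent with the safetensors header metadata.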
checkpoint-1692/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b232355fadb9bbfb707eed2b6980060355c8d8c7e4b5dc79d821f4dd37af8d23
+ size 267832560
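The three lines above are not the weights themselves but a Git LFS v1 pointer: the real blob lives in LFS storage, keyed by the SHA-256 of its contents. A minimal sketch of how such a pointer is formed (the helper name is illustrative, not part of the git-lfs tooling):

```python
import hashlib

def lfs_pointer(data: bytes) -> str:
    """Build a Git LFS v1 pointer file for the given blob contents."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )

# The pointer records only the hash and byte count of the blob.
print(lfs_pointer(b"hello"))
```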
checkpoint-1692/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ca9edab0ecf2b9d2edb966bbbee452b11e0868b7714b2ef944adbeb30b31472
+ size 535727290
checkpoint-1692/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7dfc3241a5237ebe6101d857d6ea2aef20fc76c54f21d994c1489138f86373cd
+ size 14244
checkpoint-1692/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2146e450e271a8b076696e032564edc6a54af58183cc1595f4414249e96aa7c5
+ size 1064
checkpoint-1692/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
checkpoint-1692/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1692/tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "DistilBertTokenizer",
+   "unk_token": "[UNK]"
+ }
checkpoint-1692/trainer_state.json ADDED
@@ -0,0 +1,1240 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 3.0,
+   "eval_steps": 500,
+   "global_step": 1692,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.01773049645390071,
+       "grad_norm": 1.0585654973983765,
+       "learning_rate": 4.970449172576833e-05,
+       "loss": 0.4344,
+       "step": 10
+     },
+     {
+       "epoch": 0.03546099290780142,
+       "grad_norm": 0.5365344285964966,
+       "learning_rate": 4.940898345153664e-05,
+       "loss": 0.357,
+       "step": 20
+     },
+     {
+       "epoch": 0.05319148936170213,
+       "grad_norm": 1.0602149963378906,
+       "learning_rate": 4.911347517730497e-05,
+       "loss": 0.3397,
+       "step": 30
+     },
+     {
+       "epoch": 0.07092198581560284,
+       "grad_norm": 2.197633743286133,
+       "learning_rate": 4.8817966903073283e-05,
+       "loss": 0.3472,
+       "step": 40
+     },
+     {
+       "epoch": 0.08865248226950355,
+       "grad_norm": 1.7486677169799805,
+       "learning_rate": 4.852245862884161e-05,
+       "loss": 0.3175,
+       "step": 50
+     },
+     {
+       "epoch": 0.10638297872340426,
+       "grad_norm": 2.9779016971588135,
+       "learning_rate": 4.822695035460993e-05,
+       "loss": 0.3174,
+       "step": 60
+     },
+     {
+       "epoch": 0.12411347517730496,
+       "grad_norm": 1.5208463668823242,
+       "learning_rate": 4.793144208037825e-05,
+       "loss": 0.3061,
+       "step": 70
+     },
+     {
+       "epoch": 0.14184397163120568,
+       "grad_norm": 2.6333279609680176,
+       "learning_rate": 4.763593380614658e-05,
+       "loss": 0.3196,
+       "step": 80
+     },
+     {
+       "epoch": 0.1595744680851064,
+       "grad_norm": 2.9724740982055664,
+       "learning_rate": 4.734042553191489e-05,
+       "loss": 0.2976,
+       "step": 90
+     },
+     {
+       "epoch": 0.1773049645390071,
+       "grad_norm": 1.83737313747406,
+       "learning_rate": 4.704491725768322e-05,
+       "loss": 0.2861,
+       "step": 100
+     },
+     {
+       "epoch": 0.1950354609929078,
+       "grad_norm": 2.455568313598633,
+       "learning_rate": 4.674940898345154e-05,
+       "loss": 0.3028,
+       "step": 110
+     },
+     {
+       "epoch": 0.2127659574468085,
+       "grad_norm": 1.068047046661377,
+       "learning_rate": 4.645390070921986e-05,
+       "loss": 0.29,
+       "step": 120
+     },
+     {
+       "epoch": 0.23049645390070922,
+       "grad_norm": 1.9365094900131226,
+       "learning_rate": 4.615839243498818e-05,
+       "loss": 0.2338,
+       "step": 130
+     },
+     {
+       "epoch": 0.24822695035460993,
+       "grad_norm": 4.382534027099609,
+       "learning_rate": 4.58628841607565e-05,
+       "loss": 0.2383,
+       "step": 140
+     },
+     {
+       "epoch": 0.26595744680851063,
+       "grad_norm": 1.8602826595306396,
+       "learning_rate": 4.556737588652483e-05,
+       "loss": 0.2698,
+       "step": 150
+     },
+     {
+       "epoch": 0.28368794326241137,
+       "grad_norm": 4.105195999145508,
+       "learning_rate": 4.527186761229315e-05,
+       "loss": 0.2895,
+       "step": 160
+     },
+     {
+       "epoch": 0.30141843971631205,
+       "grad_norm": 2.267707347869873,
+       "learning_rate": 4.497635933806147e-05,
+       "loss": 0.3412,
+       "step": 170
+     },
+     {
+       "epoch": 0.3191489361702128,
+       "grad_norm": 1.7341336011886597,
+       "learning_rate": 4.468085106382979e-05,
+       "loss": 0.2304,
+       "step": 180
+     },
+     {
+       "epoch": 0.33687943262411346,
+       "grad_norm": 4.169299602508545,
+       "learning_rate": 4.438534278959811e-05,
+       "loss": 0.2063,
+       "step": 190
+     },
+     {
+       "epoch": 0.3546099290780142,
+       "grad_norm": 4.8010663986206055,
+       "learning_rate": 4.4089834515366435e-05,
+       "loss": 0.2919,
+       "step": 200
+     },
+     {
+       "epoch": 0.3723404255319149,
+       "grad_norm": 1.2835743427276611,
+       "learning_rate": 4.3794326241134755e-05,
+       "loss": 0.2172,
+       "step": 210
+     },
+     {
+       "epoch": 0.3900709219858156,
+       "grad_norm": 2.9821650981903076,
+       "learning_rate": 4.3498817966903076e-05,
+       "loss": 0.2404,
+       "step": 220
+     },
+     {
+       "epoch": 0.4078014184397163,
+       "grad_norm": 4.525284767150879,
+       "learning_rate": 4.3203309692671396e-05,
+       "loss": 0.2233,
+       "step": 230
+     },
+     {
+       "epoch": 0.425531914893617,
+       "grad_norm": 2.9262726306915283,
+       "learning_rate": 4.2907801418439716e-05,
+       "loss": 0.2247,
+       "step": 240
+     },
+     {
+       "epoch": 0.4432624113475177,
+       "grad_norm": 1.120538592338562,
+       "learning_rate": 4.2612293144208036e-05,
+       "loss": 0.2,
+       "step": 250
+     },
+     {
+       "epoch": 0.46099290780141844,
+       "grad_norm": 3.67739200592041,
+       "learning_rate": 4.231678486997636e-05,
+       "loss": 0.2774,
+       "step": 260
+     },
+     {
+       "epoch": 0.4787234042553192,
+       "grad_norm": 2.233776330947876,
+       "learning_rate": 4.2021276595744684e-05,
+       "loss": 0.2339,
+       "step": 270
+     },
+     {
+       "epoch": 0.49645390070921985,
+       "grad_norm": 3.464088201522827,
+       "learning_rate": 4.1725768321513004e-05,
+       "loss": 0.2443,
+       "step": 280
+     },
+     {
+       "epoch": 0.5141843971631206,
+       "grad_norm": 3.5438666343688965,
+       "learning_rate": 4.1430260047281324e-05,
+       "loss": 0.2949,
+       "step": 290
+     },
+     {
+       "epoch": 0.5319148936170213,
+       "grad_norm": 2.237935781478882,
+       "learning_rate": 4.1134751773049644e-05,
+       "loss": 0.212,
+       "step": 300
+     },
+     {
+       "epoch": 0.549645390070922,
+       "grad_norm": 2.065207004547119,
+       "learning_rate": 4.083924349881797e-05,
+       "loss": 0.1957,
+       "step": 310
+     },
+     {
+       "epoch": 0.5673758865248227,
+       "grad_norm": 1.892553448677063,
+       "learning_rate": 4.0543735224586285e-05,
+       "loss": 0.2182,
+       "step": 320
+     },
+     {
+       "epoch": 0.5851063829787234,
+       "grad_norm": 1.1844576597213745,
+       "learning_rate": 4.024822695035461e-05,
+       "loss": 0.258,
+       "step": 330
+     },
+     {
+       "epoch": 0.6028368794326241,
+       "grad_norm": 1.786897897720337,
+       "learning_rate": 3.995271867612293e-05,
+       "loss": 0.2151,
+       "step": 340
+     },
+     {
+       "epoch": 0.6205673758865248,
+       "grad_norm": 4.924149036407471,
+       "learning_rate": 3.965721040189125e-05,
+       "loss": 0.2095,
+       "step": 350
+     },
+     {
+       "epoch": 0.6382978723404256,
+       "grad_norm": 3.0387415885925293,
+       "learning_rate": 3.936170212765958e-05,
+       "loss": 0.2339,
+       "step": 360
+     },
+     {
+       "epoch": 0.6560283687943262,
+       "grad_norm": 0.9237979054450989,
+       "learning_rate": 3.906619385342789e-05,
+       "loss": 0.2731,
+       "step": 370
+     },
+     {
+       "epoch": 0.6737588652482269,
+       "grad_norm": 1.2017602920532227,
+       "learning_rate": 3.877068557919622e-05,
+       "loss": 0.2323,
+       "step": 380
+     },
+     {
+       "epoch": 0.6914893617021277,
+       "grad_norm": 4.876067161560059,
+       "learning_rate": 3.847517730496454e-05,
+       "loss": 0.2312,
+       "step": 390
+     },
+     {
+       "epoch": 0.7092198581560284,
+       "grad_norm": 2.4507224559783936,
+       "learning_rate": 3.817966903073286e-05,
+       "loss": 0.2548,
+       "step": 400
+     },
+     {
+       "epoch": 0.7269503546099291,
+       "grad_norm": 3.6143534183502197,
+       "learning_rate": 3.788416075650119e-05,
+       "loss": 0.2564,
+       "step": 410
+     },
+     {
+       "epoch": 0.7446808510638298,
+       "grad_norm": 3.5805842876434326,
+       "learning_rate": 3.75886524822695e-05,
+       "loss": 0.2275,
+       "step": 420
+     },
+     {
+       "epoch": 0.7624113475177305,
+       "grad_norm": 3.5016636848449707,
+       "learning_rate": 3.729314420803783e-05,
+       "loss": 0.2724,
+       "step": 430
+     },
+     {
+       "epoch": 0.7801418439716312,
+       "grad_norm": 4.354279518127441,
+       "learning_rate": 3.699763593380615e-05,
+       "loss": 0.1945,
+       "step": 440
+     },
+     {
+       "epoch": 0.7978723404255319,
+       "grad_norm": 1.473893642425537,
+       "learning_rate": 3.670212765957447e-05,
+       "loss": 0.2098,
+       "step": 450
+     },
+     {
+       "epoch": 0.8156028368794326,
+       "grad_norm": 3.5749127864837646,
+       "learning_rate": 3.6406619385342796e-05,
+       "loss": 0.2062,
+       "step": 460
+     },
+     {
+       "epoch": 0.8333333333333334,
+       "grad_norm": 1.5818848609924316,
+       "learning_rate": 3.611111111111111e-05,
+       "loss": 0.226,
+       "step": 470
+     },
+     {
+       "epoch": 0.851063829787234,
+       "grad_norm": 0.9678372144699097,
+       "learning_rate": 3.5815602836879437e-05,
+       "loss": 0.2087,
+       "step": 480
+     },
+     {
+       "epoch": 0.8687943262411347,
+       "grad_norm": 3.465823173522949,
+       "learning_rate": 3.552009456264776e-05,
+       "loss": 0.2473,
+       "step": 490
+     },
+     {
+       "epoch": 0.8865248226950354,
+       "grad_norm": 2.4084010124206543,
+       "learning_rate": 3.522458628841608e-05,
+       "loss": 0.2732,
+       "step": 500
+     },
+     {
+       "epoch": 0.9042553191489362,
+       "grad_norm": 2.5898969173431396,
+       "learning_rate": 3.49290780141844e-05,
+       "loss": 0.1784,
+       "step": 510
+     },
+     {
+       "epoch": 0.9219858156028369,
+       "grad_norm": 2.0226316452026367,
+       "learning_rate": 3.463356973995272e-05,
+       "loss": 0.1835,
+       "step": 520
+     },
+     {
+       "epoch": 0.9397163120567376,
+       "grad_norm": 2.84747576713562,
+       "learning_rate": 3.4338061465721045e-05,
+       "loss": 0.2101,
+       "step": 530
+     },
+     {
+       "epoch": 0.9574468085106383,
+       "grad_norm": 1.7862353324890137,
+       "learning_rate": 3.4042553191489365e-05,
+       "loss": 0.1742,
+       "step": 540
+     },
+     {
+       "epoch": 0.975177304964539,
+       "grad_norm": 5.6382155418396,
+       "learning_rate": 3.3747044917257685e-05,
+       "loss": 0.1596,
+       "step": 550
+     },
+     {
+       "epoch": 0.9929078014184397,
+       "grad_norm": 2.4179556369781494,
+       "learning_rate": 3.3451536643026005e-05,
+       "loss": 0.2326,
+       "step": 560
+     },
+     {
+       "epoch": 1.0,
+       "eval_accuracy": 0.9245283018867925,
+       "eval_f1": 0.4970414201183432,
+       "eval_loss": 0.1933222860097885,
+       "eval_precision": 0.7924528301886793,
+       "eval_recall": 0.3620689655172414,
+       "eval_runtime": 18.2335,
+       "eval_samples_per_second": 247.073,
+       "eval_steps_per_second": 30.932,
+       "step": 564
+     },
+     {
+       "epoch": 1.0106382978723405,
+       "grad_norm": 1.8270294666290283,
+       "learning_rate": 3.3156028368794326e-05,
+       "loss": 0.1533,
+       "step": 570
+     },
+     {
+       "epoch": 1.0283687943262412,
+       "grad_norm": 2.191605567932129,
+       "learning_rate": 3.2860520094562646e-05,
+       "loss": 0.1668,
+       "step": 580
+     },
+     {
+       "epoch": 1.0460992907801419,
+       "grad_norm": 5.722381591796875,
+       "learning_rate": 3.256501182033097e-05,
+       "loss": 0.1999,
+       "step": 590
+     },
+     {
+       "epoch": 1.0638297872340425,
+       "grad_norm": 5.328127384185791,
+       "learning_rate": 3.226950354609929e-05,
+       "loss": 0.1799,
+       "step": 600
+     },
+     {
+       "epoch": 1.0815602836879432,
+       "grad_norm": 3.9988811016082764,
+       "learning_rate": 3.1973995271867614e-05,
+       "loss": 0.1787,
+       "step": 610
+     },
+     {
+       "epoch": 1.099290780141844,
+       "grad_norm": 2.0374648571014404,
+       "learning_rate": 3.1678486997635934e-05,
+       "loss": 0.1728,
+       "step": 620
+     },
+     {
+       "epoch": 1.1170212765957448,
+       "grad_norm": 1.21236252784729,
+       "learning_rate": 3.1382978723404254e-05,
+       "loss": 0.1884,
+       "step": 630
+     },
+     {
+       "epoch": 1.1347517730496455,
+       "grad_norm": 1.166555404663086,
+       "learning_rate": 3.108747044917258e-05,
+       "loss": 0.1946,
+       "step": 640
+     },
+     {
+       "epoch": 1.1524822695035462,
+       "grad_norm": 1.6630871295928955,
+       "learning_rate": 3.0791962174940895e-05,
+       "loss": 0.1791,
+       "step": 650
+     },
+     {
+       "epoch": 1.1702127659574468,
+       "grad_norm": 1.3428844213485718,
+       "learning_rate": 3.0496453900709222e-05,
+       "loss": 0.1607,
+       "step": 660
+     },
+     {
+       "epoch": 1.1879432624113475,
+       "grad_norm": 2.1707041263580322,
+       "learning_rate": 3.0200945626477545e-05,
+       "loss": 0.1888,
+       "step": 670
+     },
+     {
+       "epoch": 1.2056737588652482,
+       "grad_norm": 2.2633979320526123,
+       "learning_rate": 2.9905437352245862e-05,
+       "loss": 0.159,
+       "step": 680
+     },
+     {
+       "epoch": 1.2234042553191489,
+       "grad_norm": 2.3307528495788574,
+       "learning_rate": 2.9609929078014186e-05,
+       "loss": 0.145,
+       "step": 690
+     },
+     {
+       "epoch": 1.2411347517730495,
+       "grad_norm": 5.760026931762695,
+       "learning_rate": 2.9314420803782506e-05,
+       "loss": 0.1352,
+       "step": 700
+     },
+     {
+       "epoch": 1.2588652482269502,
+       "grad_norm": 2.038200855255127,
+       "learning_rate": 2.901891252955083e-05,
+       "loss": 0.1637,
+       "step": 710
+     },
+     {
+       "epoch": 1.2765957446808511,
+       "grad_norm": 3.5735347270965576,
+       "learning_rate": 2.8723404255319154e-05,
+       "loss": 0.125,
+       "step": 720
+     },
+     {
+       "epoch": 1.2943262411347518,
+       "grad_norm": 1.2548505067825317,
+       "learning_rate": 2.842789598108747e-05,
+       "loss": 0.1571,
+       "step": 730
+     },
+     {
+       "epoch": 1.3120567375886525,
+       "grad_norm": 3.3048200607299805,
+       "learning_rate": 2.8132387706855794e-05,
+       "loss": 0.1298,
+       "step": 740
+     },
+     {
+       "epoch": 1.3297872340425532,
+       "grad_norm": 1.5077462196350098,
+       "learning_rate": 2.7836879432624114e-05,
+       "loss": 0.1593,
+       "step": 750
+     },
+     {
+       "epoch": 1.3475177304964538,
+       "grad_norm": 2.6117773056030273,
+       "learning_rate": 2.7541371158392438e-05,
+       "loss": 0.2071,
+       "step": 760
+     },
+     {
+       "epoch": 1.3652482269503547,
+       "grad_norm": 3.2539641857147217,
+       "learning_rate": 2.7245862884160755e-05,
+       "loss": 0.1661,
+       "step": 770
+     },
+     {
+       "epoch": 1.3829787234042552,
+       "grad_norm": 2.994198799133301,
+       "learning_rate": 2.695035460992908e-05,
+       "loss": 0.1904,
+       "step": 780
+     },
+     {
+       "epoch": 1.400709219858156,
+       "grad_norm": 2.030109405517578,
+       "learning_rate": 2.6654846335697402e-05,
+       "loss": 0.1583,
+       "step": 790
+     },
+     {
+       "epoch": 1.4184397163120568,
+       "grad_norm": 1.1124039888381958,
+       "learning_rate": 2.6359338061465723e-05,
+       "loss": 0.2241,
+       "step": 800
+     },
+     {
+       "epoch": 1.4361702127659575,
+       "grad_norm": 1.3373945951461792,
+       "learning_rate": 2.6063829787234046e-05,
+       "loss": 0.1659,
+       "step": 810
+     },
+     {
+       "epoch": 1.4539007092198581,
+       "grad_norm": 4.298996448516846,
+       "learning_rate": 2.5768321513002363e-05,
+       "loss": 0.1597,
+       "step": 820
+     },
+     {
+       "epoch": 1.4716312056737588,
+       "grad_norm": 2.2879459857940674,
+       "learning_rate": 2.5472813238770687e-05,
+       "loss": 0.1806,
+       "step": 830
+     },
+     {
+       "epoch": 1.4893617021276595,
+       "grad_norm": 5.475185394287109,
+       "learning_rate": 2.5177304964539007e-05,
+       "loss": 0.2264,
+       "step": 840
+     },
+     {
+       "epoch": 1.5070921985815602,
+       "grad_norm": 3.100987672805786,
+       "learning_rate": 2.488179669030733e-05,
+       "loss": 0.1517,
+       "step": 850
+     },
+     {
+       "epoch": 1.524822695035461,
+       "grad_norm": 1.2688792943954468,
+       "learning_rate": 2.458628841607565e-05,
+       "loss": 0.1257,
+       "step": 860
+     },
+     {
+       "epoch": 1.5425531914893615,
+       "grad_norm": 2.6867027282714844,
+       "learning_rate": 2.429078014184397e-05,
+       "loss": 0.1489,
+       "step": 870
+     },
+     {
+       "epoch": 1.5602836879432624,
+       "grad_norm": 2.5701425075531006,
+       "learning_rate": 2.3995271867612295e-05,
+       "loss": 0.2134,
+       "step": 880
+     },
+     {
+       "epoch": 1.5780141843971631,
+       "grad_norm": 5.73723840713501,
+       "learning_rate": 2.3699763593380615e-05,
+       "loss": 0.191,
+       "step": 890
+     },
+     {
+       "epoch": 1.5957446808510638,
+       "grad_norm": 3.5702483654022217,
+       "learning_rate": 2.340425531914894e-05,
+       "loss": 0.17,
+       "step": 900
+     },
+     {
+       "epoch": 1.6134751773049647,
+       "grad_norm": 4.70973014831543,
+       "learning_rate": 2.310874704491726e-05,
+       "loss": 0.1357,
+       "step": 910
+     },
+     {
+       "epoch": 1.6312056737588652,
+       "grad_norm": 1.4206900596618652,
+       "learning_rate": 2.281323877068558e-05,
+       "loss": 0.1264,
+       "step": 920
+     },
+     {
+       "epoch": 1.648936170212766,
+       "grad_norm": 3.2121095657348633,
+       "learning_rate": 2.25177304964539e-05,
+       "loss": 0.1409,
+       "step": 930
+     },
+     {
+       "epoch": 1.6666666666666665,
+       "grad_norm": 5.003275394439697,
+       "learning_rate": 2.2222222222222223e-05,
+       "loss": 0.1993,
+       "step": 940
+     },
+     {
+       "epoch": 1.6843971631205674,
+       "grad_norm": 8.793657302856445,
+       "learning_rate": 2.1926713947990547e-05,
+       "loss": 0.097,
+       "step": 950
+     },
+     {
+       "epoch": 1.702127659574468,
+       "grad_norm": 2.719095230102539,
+       "learning_rate": 2.1631205673758867e-05,
+       "loss": 0.1739,
+       "step": 960
+     },
+     {
+       "epoch": 1.7198581560283688,
+       "grad_norm": 4.192145347595215,
+       "learning_rate": 2.1335697399527187e-05,
+       "loss": 0.1819,
+       "step": 970
+     },
+     {
+       "epoch": 1.7375886524822695,
+       "grad_norm": 3.239504098892212,
+       "learning_rate": 2.1040189125295508e-05,
+       "loss": 0.1775,
+       "step": 980
+     },
+     {
+       "epoch": 1.7553191489361701,
+       "grad_norm": 1.6205350160598755,
+       "learning_rate": 2.074468085106383e-05,
+       "loss": 0.1455,
+       "step": 990
+     },
+     {
+       "epoch": 1.773049645390071,
+       "grad_norm": 2.7471115589141846,
+       "learning_rate": 2.0449172576832152e-05,
+       "loss": 0.2198,
+       "step": 1000
+     },
+     {
+       "epoch": 1.7907801418439715,
+       "grad_norm": 2.1342594623565674,
+       "learning_rate": 2.0153664302600475e-05,
+       "loss": 0.1577,
+       "step": 1010
+     },
+     {
+       "epoch": 1.8085106382978724,
+       "grad_norm": 2.6111040115356445,
+       "learning_rate": 1.9858156028368796e-05,
+       "loss": 0.1907,
+       "step": 1020
+     },
+     {
+       "epoch": 1.826241134751773,
+       "grad_norm": 1.9335544109344482,
+       "learning_rate": 1.9562647754137116e-05,
+       "loss": 0.1812,
+       "step": 1030
+     },
+     {
+       "epoch": 1.8439716312056738,
+       "grad_norm": 3.2039554119110107,
+       "learning_rate": 1.926713947990544e-05,
+       "loss": 0.1604,
+       "step": 1040
+     },
+     {
+       "epoch": 1.8617021276595744,
+       "grad_norm": 4.588708400726318,
+       "learning_rate": 1.897163120567376e-05,
+       "loss": 0.1576,
+       "step": 1050
+     },
+     {
+       "epoch": 1.8794326241134751,
+       "grad_norm": 2.468317985534668,
+       "learning_rate": 1.867612293144208e-05,
+       "loss": 0.2102,
+       "step": 1060
+     },
+     {
+       "epoch": 1.897163120567376,
+       "grad_norm": 2.883124828338623,
+       "learning_rate": 1.83806146572104e-05,
+       "loss": 0.1883,
+       "step": 1070
+     },
+     {
+       "epoch": 1.9148936170212765,
+       "grad_norm": 1.7469326257705688,
+       "learning_rate": 1.8085106382978724e-05,
+       "loss": 0.1598,
+       "step": 1080
+     },
+     {
+       "epoch": 1.9326241134751774,
+       "grad_norm": 3.9554150104522705,
+       "learning_rate": 1.7789598108747048e-05,
+       "loss": 0.171,
+       "step": 1090
+     },
+     {
+       "epoch": 1.950354609929078,
+       "grad_norm": 7.007796287536621,
+       "learning_rate": 1.7494089834515368e-05,
+       "loss": 0.1908,
+       "step": 1100
+     },
+     {
+       "epoch": 1.9680851063829787,
+       "grad_norm": 2.2939341068267822,
+       "learning_rate": 1.7198581560283688e-05,
+       "loss": 0.2282,
+       "step": 1110
+     },
+     {
+       "epoch": 1.9858156028368794,
+       "grad_norm": 1.1014021635055542,
+       "learning_rate": 1.690307328605201e-05,
+       "loss": 0.1599,
+       "step": 1120
+     },
+     {
+       "epoch": 2.0,
+       "eval_accuracy": 0.93007769145394,
+       "eval_f1": 0.5827814569536424,
+       "eval_loss": 0.18422859907150269,
+       "eval_precision": 0.7560137457044673,
+       "eval_recall": 0.47413793103448276,
+       "eval_runtime": 18.3061,
+       "eval_samples_per_second": 246.093,
+       "eval_steps_per_second": 30.809,
+       "step": 1128
+     },
+     {
+       "epoch": 2.00354609929078,
+       "grad_norm": 1.147599458694458,
+       "learning_rate": 1.6607565011820332e-05,
+       "loss": 0.1652,
+       "step": 1130
+     },
+     {
+       "epoch": 2.021276595744681,
+       "grad_norm": 1.893280029296875,
+       "learning_rate": 1.6312056737588656e-05,
+       "loss": 0.1345,
+       "step": 1140
+     },
+     {
+       "epoch": 2.0390070921985815,
+       "grad_norm": 3.6475846767425537,
+       "learning_rate": 1.6016548463356976e-05,
+       "loss": 0.1214,
+       "step": 1150
+     },
+     {
+       "epoch": 2.0567375886524824,
+       "grad_norm": 6.175610542297363,
+       "learning_rate": 1.5721040189125296e-05,
+       "loss": 0.131,
+       "step": 1160
+     },
+     {
+       "epoch": 2.074468085106383,
+       "grad_norm": 2.870246171951294,
+       "learning_rate": 1.5425531914893617e-05,
+       "loss": 0.1379,
+       "step": 1170
+     },
+     {
+       "epoch": 2.0921985815602837,
+       "grad_norm": 1.1721495389938354,
+       "learning_rate": 1.5130023640661939e-05,
+       "loss": 0.1239,
+       "step": 1180
+     },
+     {
+       "epoch": 2.1099290780141846,
+       "grad_norm": 5.15436315536499,
+       "learning_rate": 1.483451536643026e-05,
+       "loss": 0.097,
+       "step": 1190
+     },
+     {
+       "epoch": 2.127659574468085,
+       "grad_norm": 0.1995478719472885,
+       "learning_rate": 1.4539007092198581e-05,
+       "loss": 0.1416,
+       "step": 1200
+     },
+     {
+       "epoch": 2.145390070921986,
+       "grad_norm": 4.829070091247559,
+       "learning_rate": 1.4243498817966905e-05,
+       "loss": 0.1075,
+       "step": 1210
+     },
+     {
+       "epoch": 2.1631205673758864,
+       "grad_norm": 4.2572479248046875,
+       "learning_rate": 1.3947990543735227e-05,
+       "loss": 0.1233,
+       "step": 1220
+     },
+     {
+       "epoch": 2.1808510638297873,
+       "grad_norm": 3.297318458557129,
+       "learning_rate": 1.3652482269503547e-05,
+       "loss": 0.1171,
+       "step": 1230
+     },
+     {
+       "epoch": 2.198581560283688,
+       "grad_norm": 2.2237844467163086,
+       "learning_rate": 1.3356973995271869e-05,
+       "loss": 0.0868,
+       "step": 1240
+     },
+     {
+       "epoch": 2.2163120567375887,
+       "grad_norm": 4.605975151062012,
+       "learning_rate": 1.3061465721040189e-05,
+       "loss": 0.1458,
+       "step": 1250
+     },
+     {
+       "epoch": 2.2340425531914896,
+       "grad_norm": 6.8933610916137695,
+       "learning_rate": 1.2765957446808511e-05,
+       "loss": 0.1242,
+       "step": 1260
+     },
+     {
+       "epoch": 2.25177304964539,
+       "grad_norm": 2.5259997844696045,
+       "learning_rate": 1.2470449172576833e-05,
+       "loss": 0.1131,
+       "step": 1270
+     },
+     {
+       "epoch": 2.269503546099291,
+       "grad_norm": 5.3347296714782715,
+       "learning_rate": 1.2174940898345153e-05,
+       "loss": 0.1334,
+       "step": 1280
+     },
+     {
+       "epoch": 2.2872340425531914,
+       "grad_norm": 3.3900346755981445,
+       "learning_rate": 1.1879432624113477e-05,
+       "loss": 0.1162,
+       "step": 1290
+     },
+     {
+       "epoch": 2.3049645390070923,
+       "grad_norm": 3.4547274112701416,
+       "learning_rate": 1.1583924349881797e-05,
+       "loss": 0.1586,
+       "step": 1300
+     },
+     {
+       "epoch": 2.3226950354609928,
+       "grad_norm": 3.0310921669006348,
+       "learning_rate": 1.1288416075650119e-05,
+       "loss": 0.1088,
+       "step": 1310
+     },
+     {
+       "epoch": 2.3404255319148937,
+       "grad_norm": 4.557296276092529,
+       "learning_rate": 1.0992907801418441e-05,
+       "loss": 0.0988,
+       "step": 1320
+     },
+     {
+       "epoch": 2.3581560283687946,
+       "grad_norm": 18.163665771484375,
+       "learning_rate": 1.0697399527186761e-05,
+       "loss": 0.1055,
+       "step": 1330
+     },
+     {
+       "epoch": 2.375886524822695,
+       "grad_norm": 3.0445291996002197,
+       "learning_rate": 1.0401891252955083e-05,
+       "loss": 0.1547,
+       "step": 1340
+     },
+     {
+       "epoch": 2.393617021276596,
+       "grad_norm": 2.7467446327209473,
+       "learning_rate": 1.0106382978723404e-05,
+       "loss": 0.1248,
+       "step": 1350
+     },
+     {
+       "epoch": 2.4113475177304964,
+       "grad_norm": 0.8503823280334473,
+       "learning_rate": 9.810874704491727e-06,
+       "loss": 0.0915,
+       "step": 1360
+     },
+     {
+       "epoch": 2.4290780141843973,
+       "grad_norm": 1.9473341703414917,
+       "learning_rate": 9.515366430260048e-06,
+       "loss": 0.1267,
+       "step": 1370
+     },
+     {
+       "epoch": 2.4468085106382977,
+       "grad_norm": 2.457197427749634,
+       "learning_rate": 9.219858156028368e-06,
+       "loss": 0.1228,
+       "step": 1380
+     },
+     {
+       "epoch": 2.4645390070921986,
+       "grad_norm": 2.5377461910247803,
+       "learning_rate": 8.924349881796691e-06,
+       "loss": 0.085,
+       "step": 1390
+     },
+     {
+       "epoch": 2.482269503546099,
+       "grad_norm": 4.247258186340332,
+       "learning_rate": 8.628841607565012e-06,
+       "loss": 0.1345,
+       "step": 1400
+     },
+     {
+       "epoch": 2.5,
+       "grad_norm": 4.234439373016357,
+       "learning_rate": 8.333333333333334e-06,
+       "loss": 0.1396,
+       "step": 1410
+     },
+     {
+       "epoch": 2.5177304964539005,
+       "grad_norm": 2.6999399662017822,
+       "learning_rate": 8.037825059101656e-06,
+       "loss": 0.0934,
+       "step": 1420
+     },
+     {
+       "epoch": 2.5354609929078014,
+       "grad_norm": 3.0950117111206055,
+       "learning_rate": 7.742316784869976e-06,
+       "loss": 0.088,
+       "step": 1430
+     },
+     {
+       "epoch": 2.5531914893617023,
+       "grad_norm": 6.523775577545166,
+       "learning_rate": 7.446808510638298e-06,
+       "loss": 0.1234,
+       "step": 1440
+     },
+     {
+       "epoch": 2.5709219858156027,
+       "grad_norm": 0.6427033543586731,
+       "learning_rate": 7.151300236406621e-06,
+       "loss": 0.0849,
+       "step": 1450
+     },
+     {
+       "epoch": 2.5886524822695036,
+       "grad_norm": 4.271224498748779,
+       "learning_rate": 6.855791962174941e-06,
+       "loss": 0.158,
+       "step": 1460
+     },
+     {
+       "epoch": 2.6063829787234045,
+       "grad_norm": 1.9938910007476807,
+       "learning_rate": 6.560283687943262e-06,
+       "loss": 0.1214,
+       "step": 1470
+     },
+     {
+       "epoch": 2.624113475177305,
+       "grad_norm": 1.0301127433776855,
+       "learning_rate": 6.264775413711583e-06,
+       "loss": 0.1217,
+       "step": 1480
+     },
+     {
+       "epoch": 2.6418439716312054,
+       "grad_norm": 1.3055609464645386,
+       "learning_rate": 5.969267139479906e-06,
+       "loss": 0.0761,
+       "step": 1490
+     },
+     {
+       "epoch": 2.6595744680851063,
+       "grad_norm": 4.245100021362305,
+       "learning_rate": 5.673758865248227e-06,
+       "loss": 0.0884,
+       "step": 1500
+     },
+     {
+       "epoch": 2.6773049645390072,
+       "grad_norm": 0.3163594901561737,
+       "learning_rate": 5.378250591016549e-06,
+       "loss": 0.081,
+       "step": 1510
+     },
+     {
+       "epoch": 2.6950354609929077,
+       "grad_norm": 2.392202615737915,
+       "learning_rate": 5.08274231678487e-06,
+       "loss": 0.1288,
+       "step": 1520
+     },
+     {
+       "epoch": 2.7127659574468086,
+       "grad_norm": 4.6757493019104,
+       "learning_rate": 4.787234042553191e-06,
1103
+ "loss": 0.1134,
1104
+ "step": 1530
1105
+ },
1106
+ {
1107
+ "epoch": 2.7304964539007095,
1108
+ "grad_norm": 14.765192985534668,
1109
+ "learning_rate": 4.491725768321513e-06,
1110
+ "loss": 0.1706,
1111
+ "step": 1540
1112
+ },
1113
+ {
1114
+ "epoch": 2.74822695035461,
1115
+ "grad_norm": 5.195065021514893,
1116
+ "learning_rate": 4.1962174940898345e-06,
1117
+ "loss": 0.1669,
1118
+ "step": 1550
1119
+ },
1120
+ {
1121
+ "epoch": 2.7659574468085104,
1122
+ "grad_norm": 2.5050220489501953,
1123
+ "learning_rate": 3.9007092198581565e-06,
1124
+ "loss": 0.0757,
1125
+ "step": 1560
1126
+ },
1127
+ {
1128
+ "epoch": 2.7836879432624113,
1129
+ "grad_norm": 2.086787223815918,
1130
+ "learning_rate": 3.605200945626478e-06,
1131
+ "loss": 0.1797,
1132
+ "step": 1570
1133
+ },
1134
+ {
1135
+ "epoch": 2.801418439716312,
1136
+ "grad_norm": 5.924837589263916,
1137
+ "learning_rate": 3.309692671394799e-06,
1138
+ "loss": 0.1602,
1139
+ "step": 1580
1140
+ },
1141
+ {
1142
+ "epoch": 2.8191489361702127,
1143
+ "grad_norm": 4.369917392730713,
1144
+ "learning_rate": 3.0141843971631207e-06,
1145
+ "loss": 0.08,
1146
+ "step": 1590
1147
+ },
1148
+ {
1149
+ "epoch": 2.8368794326241136,
1150
+ "grad_norm": 1.1417466402053833,
1151
+ "learning_rate": 2.7186761229314422e-06,
1152
+ "loss": 0.1123,
1153
+ "step": 1600
1154
+ },
1155
+ {
1156
+ "epoch": 2.854609929078014,
1157
+ "grad_norm": 0.7693955302238464,
1158
+ "learning_rate": 2.4231678486997638e-06,
1159
+ "loss": 0.1303,
1160
+ "step": 1610
1161
+ },
1162
+ {
1163
+ "epoch": 2.872340425531915,
1164
+ "grad_norm": 3.25712251663208,
1165
+ "learning_rate": 2.1276595744680853e-06,
1166
+ "loss": 0.0748,
1167
+ "step": 1620
1168
+ },
1169
+ {
1170
+ "epoch": 2.8900709219858154,
1171
+ "grad_norm": 3.0831634998321533,
1172
+ "learning_rate": 1.8321513002364066e-06,
1173
+ "loss": 0.0703,
1174
+ "step": 1630
1175
+ },
1176
+ {
1177
+ "epoch": 2.9078014184397163,
1178
+ "grad_norm": 2.351789712905884,
1179
+ "learning_rate": 1.5366430260047282e-06,
1180
+ "loss": 0.0501,
1181
+ "step": 1640
1182
+ },
1183
+ {
1184
+ "epoch": 2.925531914893617,
1185
+ "grad_norm": 2.154916286468506,
1186
+ "learning_rate": 1.2411347517730497e-06,
1187
+ "loss": 0.0907,
1188
+ "step": 1650
1189
+ },
1190
+ {
1191
+ "epoch": 2.9432624113475176,
1192
+ "grad_norm": 6.570488452911377,
1193
+ "learning_rate": 9.456264775413712e-07,
1194
+ "loss": 0.1337,
1195
+ "step": 1660
1196
+ },
1197
+ {
1198
+ "epoch": 2.9609929078014185,
1199
+ "grad_norm": 4.879380702972412,
1200
+ "learning_rate": 6.501182033096927e-07,
1201
+ "loss": 0.1201,
1202
+ "step": 1670
1203
+ },
1204
+ {
1205
+ "epoch": 2.978723404255319,
1206
+ "grad_norm": 4.488287448883057,
1207
+ "learning_rate": 3.546099290780142e-07,
1208
+ "loss": 0.1064,
1209
+ "step": 1680
1210
+ },
1211
+ {
1212
+ "epoch": 2.99645390070922,
1213
+ "grad_norm": 3.126821279525757,
1214
+ "learning_rate": 5.91016548463357e-08,
1215
+ "loss": 0.1375,
1216
+ "step": 1690
1217
+ }
1218
+ ],
1219
+ "logging_steps": 10,
1220
+ "max_steps": 1692,
1221
+ "num_input_tokens_seen": 0,
1222
+ "num_train_epochs": 3,
1223
+ "save_steps": 500,
1224
+ "stateful_callbacks": {
1225
+ "TrainerControl": {
1226
+ "args": {
1227
+ "should_epoch_stop": false,
1228
+ "should_evaluate": false,
1229
+ "should_log": false,
1230
+ "should_save": true,
1231
+ "should_training_stop": true
1232
+ },
1233
+ "attributes": {}
1234
+ }
1235
+ },
1236
+ "total_flos": 7160790169147392.0,
1237
+ "train_batch_size": 32,
1238
+ "trial_name": null,
1239
+ "trial_params": null
1240
+ }
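The learning_rate values logged above fall by a constant amount every 10 steps and reach zero at max_steps (1692), which is consistent with Trainer's default linear decay schedule. Working backward from the logged values suggests an initial learning rate of 5e-5; that figure is inferred from the log, not stated anywhere in this file. A minimal sketch reproducing the schedule under that assumption:

```python
# Reconstruct the linear learning-rate decay implied by the log above.
# Assumption: INITIAL_LR = 5e-5 is inferred from the logged values,
# not stated in trainer_state.json itself.
MAX_STEPS = 1692
INITIAL_LR = 5e-5  # inferred, see lead-in

def linear_lr(step: int) -> float:
    """Linear decay from INITIAL_LR at step 0 down to 0 at MAX_STEPS."""
    return INITIAL_LR * (MAX_STEPS - step) / MAX_STEPS

# Cross-check against logged entries, e.g. step 1320 and step 1690.
print(linear_lr(1320))  # ~1.09929e-05, matching the logged value
print(linear_lr(1690))  # ~5.91017e-08, matching the logged value
```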
checkpoint-1692/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:589953756b2200bee9b4dd81fcad8270e00bca90c5da9b4f312663d0f0c1fc8c
+ size 5368
checkpoint-1692/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "_name_or_path": "distilbert-base-uncased",
+ "activation": "gelu",
+ "architectures": [
+ "DistilBertForSequenceClassification"
+ ],
+ "attention_dropout": 0.1,
+ "dim": 768,
+ "dropout": 0.1,
+ "hidden_dim": 3072,
+ "initializer_range": 0.02,
+ "max_position_embeddings": 512,
+ "model_type": "distilbert",
+ "n_heads": 12,
+ "n_layers": 6,
+ "pad_token_id": 0,
+ "problem_type": "single_label_classification",
+ "qa_dropout": 0.1,
+ "seq_classif_dropout": 0.2,
+ "sinusoidal_pos_embds": false,
+ "tie_weights_": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.48.0",
+ "vocab_size": 30522
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b232355fadb9bbfb707eed2b6980060355c8d8c7e4b5dc79d821f4dd37af8d23
+ size 267832560
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "DistilBertTokenizer",
+ "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:589953756b2200bee9b4dd81fcad8270e00bca90c5da9b4f312663d0f0c1fc8c
+ size 5368
vocab.txt ADDED
The diff for this file is too large to render. See raw diff
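The model.safetensors, optimizer.pt, and training_args.bin entries in this commit are Git LFS pointer files (a version/oid/size triplet) rather than the binaries themselves. A small sketch of parsing such a pointer; the parse_lfs_pointer helper is illustrative, not part of any library:

```python
# Parse a Git LFS pointer file into its three fields.
# parse_lfs_pointer is a hypothetical helper written for illustration.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return {
        "version": fields["version"],
        "oid": fields["oid"].split(":", 1)[1],  # strip the "sha256:" prefix
        "size": int(fields["size"]),
    }

# The training_args.bin pointer from this commit:
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:589953756b2200bee9b4dd81fcad8270e00bca90c5da9b4f312663d0f0c1fc8c
size 5368"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 5368
```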