Commit b18f21f (verified) by nozomuteruyo14 · Parent(s): 5b16bbc

Upload experiment_log_GLUE.txt

Files changed (1): examples/experiment_log_GLUE.txt (added, +972 lines)

==============================
Task: mnli | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1194243 / 110678790 (1.08%)
Starting training...
{'loss': 0.7543, 'grad_norm': 7.192684173583984, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.607556939125061, 'eval_accuracy': 0.7472236372898624, 'eval_runtime': 13.7881, 'eval_samples_per_second': 711.847, 'eval_steps_per_second': 22.266, 'epoch': 1.0}
{'loss': 0.6227, 'grad_norm': 4.976467132568359, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.5613736510276794, 'eval_accuracy': 0.7701477330616403, 'eval_runtime': 13.8073, 'eval_samples_per_second': 710.857, 'eval_steps_per_second': 22.235, 'epoch': 2.0}
{'loss': 0.5874, 'grad_norm': 6.665809154510498, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.5499060153961182, 'eval_accuracy': 0.7754457463066735, 'eval_runtime': 13.7671, 'eval_samples_per_second': 712.934, 'eval_steps_per_second': 22.3, 'epoch': 3.0}
{'train_runtime': 3924.5014, 'train_samples_per_second': 300.193, 'train_steps_per_second': 9.381, 'train_loss': 0.6405815247192117, 'epoch': 3.0}
Training completed in 3924.87 seconds.
{'eval_loss': 0.5499060153961182, 'eval_accuracy': 0.7754457463066735, 'eval_runtime': 13.7518, 'eval_samples_per_second': 713.725, 'eval_steps_per_second': 22.324, 'epoch': 3.0}
{'eval_loss': 0.5282740592956543, 'eval_accuracy': 0.78966639544345, 'eval_runtime': 14.1188, 'eval_samples_per_second': 696.375, 'eval_steps_per_second': 21.815, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | lora ===
Metric: 0.7754/0.7897
Training Time: 3924.87 seconds

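Note: the "Injected standard LoRA adapters via PEFT" lines in this log can be reproduced with the peft library. A minimal sketch follows; the rank, alpha, and dropout are assumptions, since the log does not record the LoRA hyperparameters (PEFT's default target modules for BERT are the attention query/value projections).

    # Minimal sketch of standard LoRA injection via PEFT.
    # r / lora_alpha / lora_dropout are assumptions, not values from this log.
    from transformers import AutoModelForSequenceClassification
    from peft import LoraConfig, TaskType, get_peft_model

    base = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3  # MNLI has 3 labels
    )
    config = LoraConfig(
        task_type=TaskType.SEQ_CLS,  # keeps the classification head trainable
        r=8,
        lora_alpha=16,
        lora_dropout=0.1,
    )
    model = get_peft_model(base, config)
    # Prints counts in the same spirit as the log line
    # "Trainable params: 1194243 / 110678790 (1.08%)".
    model.print_trainable_parameters()
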
==============================
Task: sst2 | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.258949875831604, 'eval_accuracy': 0.8956422018348624, 'eval_runtime': 0.7467, 'eval_samples_per_second': 1167.75, 'eval_steps_per_second': 37.497, 'epoch': 1.0}
{'eval_loss': 0.25339475274086, 'eval_accuracy': 0.9013761467889908, 'eval_runtime': 0.7453, 'eval_samples_per_second': 1169.944, 'eval_steps_per_second': 37.567, 'epoch': 2.0}
{'eval_loss': 0.24459940195083618, 'eval_accuracy': 0.9013761467889908, 'eval_runtime': 0.7415, 'eval_samples_per_second': 1176.06, 'eval_steps_per_second': 37.763, 'epoch': 3.0}
{'train_runtime': 390.9411, 'train_samples_per_second': 516.822, 'train_steps_per_second': 16.153, 'train_loss': 0.2664547108509006, 'epoch': 3.0}
Training completed in 391.31 seconds.
{'eval_loss': 0.24459940195083618, 'eval_accuracy': 0.9013761467889908, 'eval_runtime': 0.7466, 'eval_samples_per_second': 1168.019, 'eval_steps_per_second': 37.505, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | lora ===
Metric: 0.9014
Training Time: 391.31 seconds


==============================
Task: cola | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.5983226895332336, 'eval_matthews_correlation': 0.018148342420931135, 'eval_runtime': 0.5265, 'eval_samples_per_second': 1980.947, 'eval_steps_per_second': 62.676, 'epoch': 1.0}
{'eval_loss': 0.5542184114456177, 'eval_matthews_correlation': 0.17454042413408488, 'eval_runtime': 0.5276, 'eval_samples_per_second': 1976.709, 'eval_steps_per_second': 62.542, 'epoch': 2.0}
{'eval_loss': 0.5396776795387268, 'eval_matthews_correlation': 0.27356428891843526, 'eval_runtime': 0.5345, 'eval_samples_per_second': 1951.52, 'eval_steps_per_second': 61.745, 'epoch': 3.0}
{'train_runtime': 47.7869, 'train_samples_per_second': 536.821, 'train_steps_per_second': 16.825, 'train_loss': 0.5679058245758513, 'epoch': 3.0}
Training completed in 48.14 seconds.
{'eval_loss': 0.5396776795387268, 'eval_matthews_correlation': 0.27356428891843526, 'eval_runtime': 0.5305, 'eval_samples_per_second': 1966.035, 'eval_steps_per_second': 62.204, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | lora ===
Metric: 0.2736
Training Time: 48.14 seconds


==============================
Task: qqp | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'loss': 0.4109, 'grad_norm': 3.7838921546936035, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.8416027702201335, 'eval_f1': 0.8019054689433308, 'eval_loss': 0.34393781423568726, 'eval_runtime': 52.8326, 'eval_samples_per_second': 765.247, 'eval_steps_per_second': 23.925, 'epoch': 1.0}
{'loss': 0.3442, 'grad_norm': 6.105409622192383, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.8546623794212219, 'eval_f1': 0.8169014084507042, 'eval_loss': 0.32436296343803406, 'eval_runtime': 52.7316, 'eval_samples_per_second': 766.713, 'eval_steps_per_second': 23.97, 'epoch': 2.0}
{'loss': 0.3261, 'grad_norm': 5.963987350463867, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.8592134553549344, 'eval_f1': 0.8201238781443559, 'eval_loss': 0.3147530257701874, 'eval_runtime': 52.7125, 'eval_samples_per_second': 766.991, 'eval_steps_per_second': 23.979, 'epoch': 3.0}
{'train_runtime': 3482.13, 'train_samples_per_second': 313.468, 'train_steps_per_second': 9.797, 'train_loss': 0.3554784081295529, 'epoch': 3.0}
Training completed in 3482.43 seconds.
{'eval_accuracy': 0.8592134553549344, 'eval_f1': 0.8201238781443559, 'eval_loss': 0.3147530257701874, 'eval_runtime': 52.7041, 'eval_samples_per_second': 767.113, 'eval_steps_per_second': 23.983, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | lora ===
Metric: 0.8592/0.8201
Training Time: 3482.43 seconds


==============================
Task: qnli | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.3872588574886322, 'eval_accuracy': 0.8266520226981512, 'eval_runtime': 10.2202, 'eval_samples_per_second': 534.531, 'eval_steps_per_second': 16.732, 'epoch': 1.0}
{'eval_loss': 0.35759007930755615, 'eval_accuracy': 0.8359875526267618, 'eval_runtime': 10.4448, 'eval_samples_per_second': 523.038, 'eval_steps_per_second': 16.372, 'epoch': 2.0}
{'eval_loss': 0.3487180471420288, 'eval_accuracy': 0.842394288852279, 'eval_runtime': 10.396, 'eval_samples_per_second': 525.488, 'eval_steps_per_second': 16.449, 'epoch': 3.0}
{'train_runtime': 1341.9095, 'train_samples_per_second': 234.166, 'train_steps_per_second': 7.319, 'train_loss': 0.4522236532942629, 'epoch': 3.0}
Training completed in 1342.28 seconds.
{'eval_loss': 0.3487180471420288, 'eval_accuracy': 0.842394288852279, 'eval_runtime': 10.3933, 'eval_samples_per_second': 525.626, 'eval_steps_per_second': 16.453, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | lora ===
Metric: 0.8424
Training Time: 1342.28 seconds


==============================
Task: rte | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.6956374049186707, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.873, 'eval_samples_per_second': 317.305, 'eval_steps_per_second': 10.31, 'epoch': 1.0}
{'eval_loss': 0.6953310966491699, 'eval_accuracy': 0.48375451263537905, 'eval_runtime': 0.8871, 'eval_samples_per_second': 312.258, 'eval_steps_per_second': 10.146, 'epoch': 2.0}
{'eval_loss': 0.6961230039596558, 'eval_accuracy': 0.47653429602888087, 'eval_runtime': 0.8538, 'eval_samples_per_second': 324.414, 'eval_steps_per_second': 10.541, 'epoch': 3.0}
{'train_runtime': 56.6362, 'train_samples_per_second': 131.894, 'train_steps_per_second': 4.132, 'train_loss': 0.6990308843107305, 'epoch': 3.0}
Training completed in 57.00 seconds.
{'eval_loss': 0.6953310966491699, 'eval_accuracy': 0.48375451263537905, 'eval_runtime': 0.8862, 'eval_samples_per_second': 312.561, 'eval_steps_per_second': 10.155, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | lora ===
Metric: 0.4838
Training Time: 57.00 seconds


==============================
Task: mrpc | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.6082455515861511, 'eval_accuracy': 0.6862745098039216, 'eval_f1': 0.8134110787172012, 'eval_runtime': 0.6508, 'eval_samples_per_second': 626.892, 'eval_steps_per_second': 19.975, 'epoch': 1.0}
{'eval_loss': 0.5976766347885132, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.65, 'eval_samples_per_second': 627.713, 'eval_steps_per_second': 20.001, 'epoch': 2.0}
{'eval_loss': 0.5930073857307434, 'eval_accuracy': 0.6862745098039216, 'eval_f1': 0.8134110787172012, 'eval_runtime': 0.6529, 'eval_samples_per_second': 624.911, 'eval_steps_per_second': 19.911, 'epoch': 3.0}
{'train_runtime': 43.6258, 'train_samples_per_second': 252.236, 'train_steps_per_second': 7.908, 'train_loss': 0.6178461931753849, 'epoch': 3.0}
Training completed in 43.98 seconds.
{'eval_loss': 0.5930073857307434, 'eval_accuracy': 0.6862745098039216, 'eval_f1': 0.8134110787172012, 'eval_runtime': 0.6498, 'eval_samples_per_second': 627.85, 'eval_steps_per_second': 20.005, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | lora ===
Metric: 0.6863
Training Time: 43.98 seconds


==============================
Task: stsb | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1192705 / 110675714 (1.08%)
Starting training...
{'eval_loss': 2.186974048614502, 'eval_pearson': 0.4989167054020985, 'eval_spearmanr': 0.5444894161992337, 'eval_combined_score': 0.5217030608006661, 'eval_runtime': 1.5493, 'eval_samples_per_second': 968.18, 'eval_steps_per_second': 30.336, 'epoch': 1.0}
{'eval_loss': 1.4939329624176025, 'eval_pearson': 0.6610799440013643, 'eval_spearmanr': 0.6714651442830666, 'eval_combined_score': 0.6662725441422155, 'eval_runtime': 1.5413, 'eval_samples_per_second': 973.222, 'eval_steps_per_second': 30.494, 'epoch': 2.0}
{'eval_loss': 1.2076021432876587, 'eval_pearson': 0.7196282697199895, 'eval_spearmanr': 0.7309657778188898, 'eval_combined_score': 0.7252970237694396, 'eval_runtime': 1.5503, 'eval_samples_per_second': 967.559, 'eval_steps_per_second': 30.317, 'epoch': 3.0}
{'train_runtime': 62.1314, 'train_samples_per_second': 277.589, 'train_steps_per_second': 8.691, 'train_loss': 2.3513389304832177, 'epoch': 3.0}
Training completed in 62.46 seconds.
{'eval_loss': 1.2076021432876587, 'eval_pearson': 0.7196282697199895, 'eval_spearmanr': 0.7309657778188898, 'eval_combined_score': 0.7252970237694396, 'eval_runtime': 1.5501, 'eval_samples_per_second': 967.653, 'eval_steps_per_second': 30.32, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | lora ===
Metric: 0.7253
Training Time: 62.46 seconds


==============================
Task: mnli | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793984 (2.13%)
Starting training...
{'loss': 0.6947, 'grad_norm': 15.653863906860352, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.5447636246681213, 'eval_accuracy': 0.7818644931227713, 'eval_runtime': 16.5856, 'eval_samples_per_second': 591.779, 'eval_steps_per_second': 18.51, 'epoch': 1.0}
{'loss': 0.5561, 'grad_norm': 11.298938751220703, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.514032244682312, 'eval_accuracy': 0.800509424350484, 'eval_runtime': 16.5827, 'eval_samples_per_second': 591.882, 'eval_steps_per_second': 18.513, 'epoch': 2.0}
{'loss': 0.5145, 'grad_norm': 18.323087692260742, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.5049977898597717, 'eval_accuracy': 0.8046867040244524, 'eval_runtime': 16.6301, 'eval_samples_per_second': 590.194, 'eval_steps_per_second': 18.46, 'epoch': 3.0}
{'train_runtime': 4639.8539, 'train_samples_per_second': 253.91, 'train_steps_per_second': 7.935, 'train_loss': 0.5715459817805947, 'epoch': 3.0}
Training completed in 4640.21 seconds.
{'eval_loss': 0.5049977898597717, 'eval_accuracy': 0.8046867040244524, 'eval_runtime': 16.5868, 'eval_samples_per_second': 591.735, 'eval_steps_per_second': 18.509, 'epoch': 3.0}
{'eval_loss': 0.48250600695610046, 'eval_accuracy': 0.8116354759967453, 'eval_runtime': 17.0429, 'eval_samples_per_second': 576.898, 'eval_steps_per_second': 18.072, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | diff_lora ===
Metric: 0.8047/0.8116
Training Time: 4640.21 seconds

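Note: diff_lora is not a stock PEFT method, and this log does not show its implementation. The trainable-parameter count (2383933) is roughly double that of standard LoRA (1194243), which would be consistent with two low-rank branches per layer. Purely as a hypothetical illustration of a "difference of two LoRA branches" update with a mixing ratio, not the authors' actual code:

    # Hypothetical DiffLoRA-style layer: y = Wx + ratio * (B1 A1 - B2 A2) x.
    # A guess at the structure implied by "rank 8 (ratio=1.0)"; NOT the real implementation.
    import torch
    import torch.nn as nn

    class DiffLoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, r: int = 8, ratio: float = 1.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weight
            in_f, out_f = base.in_features, base.out_features
            self.A1 = nn.Parameter(torch.randn(r, in_f) * 0.01)
            self.B1 = nn.Parameter(torch.zeros(out_f, r))  # zero init => delta W = 0 at start
            self.A2 = nn.Parameter(torch.randn(r, in_f) * 0.01)
            self.B2 = nn.Parameter(torch.zeros(out_f, r))
            self.ratio = ratio

        def forward(self, x):
            delta = (x @ self.A1.T) @ self.B1.T - (x @ self.A2.T) @ self.B2.T
            return self.base(x) + self.ratio * delta

    layer = DiffLoRALinear(nn.Linear(768, 768))
    print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
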
==============================
Task: sst2 | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.23795804381370544, 'eval_accuracy': 0.911697247706422, 'eval_runtime': 0.8555, 'eval_samples_per_second': 1019.265, 'eval_steps_per_second': 32.729, 'epoch': 1.0}
{'eval_loss': 0.2600213289260864, 'eval_accuracy': 0.9139908256880734, 'eval_runtime': 0.8524, 'eval_samples_per_second': 1022.961, 'eval_steps_per_second': 32.847, 'epoch': 2.0}
{'eval_loss': 0.2648456394672394, 'eval_accuracy': 0.9139908256880734, 'eval_runtime': 0.88, 'eval_samples_per_second': 990.875, 'eval_steps_per_second': 31.817, 'epoch': 3.0}
{'train_runtime': 427.5234, 'train_samples_per_second': 472.599, 'train_steps_per_second': 14.771, 'train_loss': 0.2100713710226148, 'epoch': 3.0}
Training completed in 427.89 seconds.
{'eval_loss': 0.23795804381370544, 'eval_accuracy': 0.911697247706422, 'eval_runtime': 0.8749, 'eval_samples_per_second': 996.654, 'eval_steps_per_second': 32.003, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | diff_lora ===
Metric: 0.9117
Training Time: 427.89 seconds


==============================
Task: cola | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.5782420039176941, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.6416, 'eval_samples_per_second': 1625.645, 'eval_steps_per_second': 51.435, 'epoch': 1.0}
{'eval_loss': 0.5364716649055481, 'eval_matthews_correlation': 0.3429695650358, 'eval_runtime': 0.6568, 'eval_samples_per_second': 1587.909, 'eval_steps_per_second': 50.241, 'epoch': 2.0}
{'eval_loss': 0.5494520664215088, 'eval_matthews_correlation': 0.3558006877385648, 'eval_runtime': 0.624, 'eval_samples_per_second': 1671.599, 'eval_steps_per_second': 52.889, 'epoch': 3.0}
{'train_runtime': 52.7721, 'train_samples_per_second': 486.109, 'train_steps_per_second': 15.235, 'train_loss': 0.536839651231149, 'epoch': 3.0}
Training completed in 53.13 seconds.
{'eval_loss': 0.5364716649055481, 'eval_matthews_correlation': 0.3429695650358, 'eval_runtime': 0.6229, 'eval_samples_per_second': 1674.314, 'eval_steps_per_second': 52.974, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | diff_lora ===
Metric: 0.3430
Training Time: 53.13 seconds


==============================
Task: qqp | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'loss': 0.3753, 'grad_norm': 9.991658210754395, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.8646549591887213, 'eval_f1': 0.8212699242226287, 'eval_loss': 0.3013371527194977, 'eval_runtime': 51.0312, 'eval_samples_per_second': 792.26, 'eval_steps_per_second': 24.769, 'epoch': 1.0}
{'loss': 0.3016, 'grad_norm': 9.778715133666992, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.8736829087311403, 'eval_f1': 0.8335343394504384, 'eval_loss': 0.28591614961624146, 'eval_runtime': 51.0348, 'eval_samples_per_second': 792.204, 'eval_steps_per_second': 24.767, 'epoch': 2.0}
{'loss': 0.2761, 'grad_norm': 9.68469524383545, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.8792233489982686, 'eval_f1': 0.8404821796086375, 'eval_loss': 0.2785017192363739, 'eval_runtime': 51.8869, 'eval_samples_per_second': 779.195, 'eval_steps_per_second': 24.361, 'epoch': 3.0}
{'train_runtime': 3478.2186, 'train_samples_per_second': 313.821, 'train_steps_per_second': 9.808, 'train_loss': 0.31152516229379196, 'epoch': 3.0}
Training completed in 3478.58 seconds.
{'eval_accuracy': 0.8792233489982686, 'eval_f1': 0.8404821796086375, 'eval_loss': 0.2785017192363739, 'eval_runtime': 51.969, 'eval_samples_per_second': 777.963, 'eval_steps_per_second': 24.322, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | diff_lora ===
Metric: 0.8792/0.8405
Training Time: 3478.58 seconds


==============================
Task: qnli | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.3488791286945343, 'eval_accuracy': 0.8451400329489291, 'eval_runtime': 10.1741, 'eval_samples_per_second': 536.954, 'eval_steps_per_second': 16.807, 'epoch': 1.0}
{'eval_loss': 0.3232109546661377, 'eval_accuracy': 0.8617975471352737, 'eval_runtime': 10.278, 'eval_samples_per_second': 531.526, 'eval_steps_per_second': 16.638, 'epoch': 2.0}
{'eval_loss': 0.3128054440021515, 'eval_accuracy': 0.8652754896576972, 'eval_runtime': 10.3058, 'eval_samples_per_second': 530.091, 'eval_steps_per_second': 16.593, 'epoch': 3.0}
{'train_runtime': 1331.802, 'train_samples_per_second': 235.943, 'train_steps_per_second': 7.375, 'train_loss': 0.40391058000375435, 'epoch': 3.0}
Training completed in 1332.17 seconds.
{'eval_loss': 0.3128054440021515, 'eval_accuracy': 0.8652754896576972, 'eval_runtime': 10.3119, 'eval_samples_per_second': 529.777, 'eval_steps_per_second': 16.583, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | diff_lora ===
Metric: 0.8653
Training Time: 1332.17 seconds


==============================
Task: rte | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.6908643245697021, 'eval_accuracy': 0.5126353790613718, 'eval_runtime': 0.8397, 'eval_samples_per_second': 329.895, 'eval_steps_per_second': 10.719, 'epoch': 1.0}
{'eval_loss': 0.6881587505340576, 'eval_accuracy': 0.5379061371841155, 'eval_runtime': 0.8412, 'eval_samples_per_second': 329.302, 'eval_steps_per_second': 10.699, 'epoch': 2.0}
{'eval_loss': 0.6893179416656494, 'eval_accuracy': 0.5451263537906137, 'eval_runtime': 0.8748, 'eval_samples_per_second': 316.636, 'eval_steps_per_second': 10.288, 'epoch': 3.0}
{'train_runtime': 57.4855, 'train_samples_per_second': 129.946, 'train_steps_per_second': 4.071, 'train_loss': 0.6872217553293604, 'epoch': 3.0}
Training completed in 57.84 seconds.
{'eval_loss': 0.6881587505340576, 'eval_accuracy': 0.5379061371841155, 'eval_runtime': 0.8573, 'eval_samples_per_second': 323.112, 'eval_steps_per_second': 10.498, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | diff_lora ===
Metric: 0.5379
Training Time: 57.84 seconds


==============================
Task: mrpc | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.6013302803039551, 'eval_accuracy': 0.7009803921568627, 'eval_f1': 0.8189910979228486, 'eval_runtime': 0.6298, 'eval_samples_per_second': 647.787, 'eval_steps_per_second': 20.64, 'epoch': 1.0}
{'eval_loss': 0.5768405795097351, 'eval_accuracy': 0.7107843137254902, 'eval_f1': 0.8233532934131736, 'eval_runtime': 0.6327, 'eval_samples_per_second': 644.858, 'eval_steps_per_second': 20.547, 'epoch': 2.0}
{'eval_loss': 0.5735046863555908, 'eval_accuracy': 0.7009803921568627, 'eval_f1': 0.8157099697885196, 'eval_runtime': 0.6039, 'eval_samples_per_second': 675.597, 'eval_steps_per_second': 21.526, 'epoch': 3.0}
{'train_runtime': 43.087, 'train_samples_per_second': 255.391, 'train_steps_per_second': 8.007, 'train_loss': 0.5828715448794157, 'epoch': 3.0}
Training completed in 43.44 seconds.
{'eval_loss': 0.5735046863555908, 'eval_accuracy': 0.7009803921568627, 'eval_f1': 0.8157099697885196, 'eval_runtime': 0.636, 'eval_samples_per_second': 641.462, 'eval_steps_per_second': 20.439, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | diff_lora ===
Metric: 0.7010
Training Time: 43.44 seconds


==============================
Task: stsb | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111792446 (2.13%)
Starting training...
{'eval_loss': 2.023221015930176, 'eval_pearson': 0.43274608571597944, 'eval_spearmanr': 0.38884918971151733, 'eval_combined_score': 0.4107976377137484, 'eval_runtime': 1.5082, 'eval_samples_per_second': 994.574, 'eval_steps_per_second': 31.163, 'epoch': 1.0}
{'eval_loss': 1.104512333869934, 'eval_pearson': 0.7544793686109483, 'eval_spearmanr': 0.7705548315901775, 'eval_combined_score': 0.7625171001005628, 'eval_runtime': 1.5033, 'eval_samples_per_second': 997.79, 'eval_steps_per_second': 31.264, 'epoch': 2.0}
{'eval_loss': 0.9440999627113342, 'eval_pearson': 0.7774580882446339, 'eval_spearmanr': 0.7833591644633611, 'eval_combined_score': 0.7804086263539975, 'eval_runtime': 1.4677, 'eval_samples_per_second': 1021.974, 'eval_steps_per_second': 32.022, 'epoch': 3.0}
{'train_runtime': 60.8312, 'train_samples_per_second': 283.522, 'train_steps_per_second': 8.877, 'train_loss': 1.575874498155382, 'epoch': 3.0}
Training completed in 61.19 seconds.
{'eval_loss': 0.9440999627113342, 'eval_pearson': 0.7774580882446339, 'eval_spearmanr': 0.7833591644633611, 'eval_combined_score': 0.7804086263539975, 'eval_runtime': 1.5024, 'eval_samples_per_second': 998.372, 'eval_steps_per_second': 31.282, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | diff_lora ===
Metric: 0.7804
Training Time: 61.19 seconds


==============================
Task: mnli | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790943 / 111275551 (1.61%)
Starting training...
{'loss': 1.1374, 'grad_norm': 3.505448579788208, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 1.0381470918655396, 'eval_accuracy': 0.47009679062659193, 'eval_runtime': 19.6674, 'eval_samples_per_second': 499.05, 'eval_steps_per_second': 15.61, 'epoch': 1.0}
{'loss': 0.9977, 'grad_norm': 1.374243140220642, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.8795942664146423, 'eval_accuracy': 0.5992868059093225, 'eval_runtime': 19.9383, 'eval_samples_per_second': 492.269, 'eval_steps_per_second': 15.398, 'epoch': 2.0}
{'loss': 0.8936, 'grad_norm': 2.785215377807617, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.8512938618659973, 'eval_accuracy': 0.6146714212939378, 'eval_runtime': 19.8935, 'eval_samples_per_second': 493.377, 'eval_steps_per_second': 15.432, 'epoch': 3.0}
{'train_runtime': 5589.1918, 'train_samples_per_second': 210.783, 'train_steps_per_second': 6.587, 'train_loss': 0.9832181039451258, 'epoch': 3.0}
Training completed in 5589.56 seconds.
{'eval_loss': 0.8512938618659973, 'eval_accuracy': 0.6146714212939378, 'eval_runtime': 19.9172, 'eval_samples_per_second': 492.789, 'eval_steps_per_second': 15.414, 'epoch': 3.0}
{'eval_loss': 0.8136247396469116, 'eval_accuracy': 0.6382221318144833, 'eval_runtime': 20.3656, 'eval_samples_per_second': 482.775, 'eval_steps_per_second': 15.124, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | adalora ===
Metric: 0.6147/0.6382
Training Time: 5589.56 seconds

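Note: AdaLoRA reallocates rank across modules during training, which is why its trainable-parameter count (1790943, 1.61%) sits above standard LoRA's. A minimal injection sketch with peft's AdaLoraConfig follows; the initial/target ranks and budget schedule below are assumptions, as the log does not record them.

    # Minimal sketch of AdaLoRA injection via PEFT (schedule values are assumptions).
    from transformers import AutoModelForSequenceClassification
    from peft import AdaLoraConfig, TaskType, get_peft_model

    num_training_steps = 10_000  # placeholder; set to the run's real optimizer-step count

    base = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3
    )
    config = AdaLoraConfig(
        task_type=TaskType.SEQ_CLS,
        init_r=12,                      # assumed initial per-module rank
        target_r=8,                     # assumed average rank after pruning
        tinit=200,                      # assumed budget-schedule warmup steps
        tfinal=1000,                    # assumed final-phase steps
        total_step=num_training_steps,  # must match the training schedule
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()
    # During training, PEFT's AdaLoRA additionally expects
    # model.base_model.update_and_allocate(global_step) after each optimizer step.
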
==============================
Task: sst2 | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 0.6694198846817017, 'eval_accuracy': 0.5481651376146789, 'eval_runtime': 1.1894, 'eval_samples_per_second': 733.137, 'eval_steps_per_second': 23.541, 'epoch': 1.0}
{'eval_loss': 0.6387322545051575, 'eval_accuracy': 0.6307339449541285, 'eval_runtime': 1.1581, 'eval_samples_per_second': 752.97, 'eval_steps_per_second': 24.178, 'epoch': 2.0}
{'eval_loss': 0.6214661002159119, 'eval_accuracy': 0.6376146788990825, 'eval_runtime': 1.1568, 'eval_samples_per_second': 753.8, 'eval_steps_per_second': 24.205, 'epoch': 3.0}
{'train_runtime': 590.0367, 'train_samples_per_second': 342.431, 'train_steps_per_second': 10.703, 'train_loss': 0.7303827692003168, 'epoch': 3.0}
Training completed in 590.41 seconds.
{'eval_loss': 0.6214661002159119, 'eval_accuracy': 0.6376146788990825, 'eval_runtime': 1.1533, 'eval_samples_per_second': 756.093, 'eval_steps_per_second': 24.278, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | adalora ===
Metric: 0.6376
Training Time: 590.41 seconds


==============================
Task: cola | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 1.3985247611999512, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.9098, 'eval_samples_per_second': 1146.385, 'eval_steps_per_second': 36.271, 'epoch': 1.0}
{'eval_loss': 1.262640357017517, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.9196, 'eval_samples_per_second': 1134.13, 'eval_steps_per_second': 35.883, 'epoch': 2.0}
{'eval_loss': 1.2143875360488892, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 1.0264, 'eval_samples_per_second': 1016.217, 'eval_steps_per_second': 32.153, 'epoch': 3.0}
{'train_runtime': 72.2854, 'train_samples_per_second': 354.885, 'train_steps_per_second': 11.123, 'train_loss': 1.3467446702036692, 'epoch': 3.0}
Training completed in 72.58 seconds.
{'eval_loss': 1.2143875360488892, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.9954, 'eval_samples_per_second': 1047.853, 'eval_steps_per_second': 33.154, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | adalora ===
Metric: -0.0207
Training Time: 72.58 seconds


==============================
Task: qqp | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'loss': 0.6296, 'grad_norm': 2.261782646179199, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.7579520158298293, 'eval_f1': 0.7092691622103386, 'eval_loss': 0.47327563166618347, 'eval_runtime': 65.3962, 'eval_samples_per_second': 618.232, 'eval_steps_per_second': 19.328, 'epoch': 1.0}
{'loss': 0.4677, 'grad_norm': 1.627873182296753, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.7813999505317833, 'eval_f1': 0.7402574501851525, 'eval_loss': 0.4382329285144806, 'eval_runtime': 65.3685, 'eval_samples_per_second': 618.493, 'eval_steps_per_second': 19.337, 'epoch': 2.0}
{'loss': 0.445, 'grad_norm': 1.8313876390457153, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.7876576799406382, 'eval_f1': 0.745002524727478, 'eval_loss': 0.4278358221054077, 'eval_runtime': 65.368, 'eval_samples_per_second': 618.498, 'eval_steps_per_second': 19.337, 'epoch': 3.0}
{'train_runtime': 4354.7417, 'train_samples_per_second': 250.655, 'train_steps_per_second': 7.834, 'train_loss': 0.5051169407328218, 'epoch': 3.0}
Training completed in 4355.09 seconds.
{'eval_accuracy': 0.7876576799406382, 'eval_f1': 0.745002524727478, 'eval_loss': 0.4278358221054077, 'eval_runtime': 65.3902, 'eval_samples_per_second': 618.289, 'eval_steps_per_second': 19.33, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | adalora ===
Metric: 0.7877/0.7450
Training Time: 4355.09 seconds


==============================
Task: qnli | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 0.6812258958816528, 'eval_accuracy': 0.5740435658063335, 'eval_runtime': 12.2629, 'eval_samples_per_second': 445.492, 'eval_steps_per_second': 13.945, 'epoch': 1.0}
{'eval_loss': 0.6761545538902283, 'eval_accuracy': 0.5824638477027274, 'eval_runtime': 12.015, 'eval_samples_per_second': 454.683, 'eval_steps_per_second': 14.232, 'epoch': 2.0}
{'eval_loss': 0.6743924021720886, 'eval_accuracy': 0.5850265421929343, 'eval_runtime': 12.1905, 'eval_samples_per_second': 448.137, 'eval_steps_per_second': 14.027, 'epoch': 3.0}
{'train_runtime': 1599.385, 'train_samples_per_second': 196.469, 'train_steps_per_second': 6.141, 'train_loss': 0.7361293226462279, 'epoch': 3.0}
Training completed in 1599.73 seconds.
{'eval_loss': 0.6743924021720886, 'eval_accuracy': 0.5850265421929343, 'eval_runtime': 12.2374, 'eval_samples_per_second': 446.417, 'eval_steps_per_second': 13.974, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | adalora ===
Metric: 0.5850
Training Time: 1599.73 seconds


==============================
Task: rte | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 1.662428379058838, 'eval_accuracy': 0.4584837545126354, 'eval_runtime': 0.9804, 'eval_samples_per_second': 282.533, 'eval_steps_per_second': 9.18, 'epoch': 1.0}
{'eval_loss': 1.6126227378845215, 'eval_accuracy': 0.47653429602888087, 'eval_runtime': 0.9767, 'eval_samples_per_second': 283.616, 'eval_steps_per_second': 9.215, 'epoch': 2.0}
{'eval_loss': 1.5964787006378174, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.964, 'eval_samples_per_second': 287.349, 'eval_steps_per_second': 9.336, 'epoch': 3.0}
{'train_runtime': 64.4909, 'train_samples_per_second': 115.83, 'train_steps_per_second': 3.628, 'train_loss': 1.6494870960202992, 'epoch': 3.0}
Training completed in 64.83 seconds.
{'eval_loss': 1.5964787006378174, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.9452, 'eval_samples_per_second': 293.057, 'eval_steps_per_second': 9.522, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | adalora ===
Metric: 0.4693
Training Time: 64.83 seconds


==============================
Task: mrpc | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 1.5392217636108398, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7702, 'eval_samples_per_second': 529.751, 'eval_steps_per_second': 16.879, 'epoch': 1.0}
{'eval_loss': 1.4676430225372314, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7687, 'eval_samples_per_second': 530.759, 'eval_steps_per_second': 16.911, 'epoch': 2.0}
{'eval_loss': 1.4506442546844482, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7486, 'eval_samples_per_second': 545.023, 'eval_steps_per_second': 17.366, 'epoch': 3.0}
{'train_runtime': 51.2265, 'train_samples_per_second': 214.811, 'train_steps_per_second': 6.735, 'train_loss': 1.5334315203238225, 'epoch': 3.0}
Training completed in 51.59 seconds.
{'eval_loss': 1.4506442546844482, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7848, 'eval_samples_per_second': 519.858, 'eval_steps_per_second': 16.564, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | adalora ===
Metric: 0.6887
Training Time: 51.59 seconds


==============================
Task: stsb | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1789405 / 111272475 (1.61%)
Starting training...
{'eval_loss': 3.5649571418762207, 'eval_pearson': 0.018438010337713206, 'eval_spearmanr': 0.01081373344132055, 'eval_combined_score': 0.014625871889516879, 'eval_runtime': 1.9768, 'eval_samples_per_second': 758.783, 'eval_steps_per_second': 23.775, 'epoch': 1.0}
{'eval_loss': 3.0467772483825684, 'eval_pearson': 0.030652826982401318, 'eval_spearmanr': 0.027105310496517078, 'eval_combined_score': 0.028879068739459196, 'eval_runtime': 2.0047, 'eval_samples_per_second': 748.233, 'eval_steps_per_second': 23.445, 'epoch': 2.0}
{'eval_loss': 3.0490925312042236, 'eval_pearson': 0.033701994792119494, 'eval_spearmanr': 0.031193733243172723, 'eval_combined_score': 0.03244786401764611, 'eval_runtime': 1.9735, 'eval_samples_per_second': 760.073, 'eval_steps_per_second': 23.816, 'epoch': 3.0}
{'train_runtime': 75.4372, 'train_samples_per_second': 228.627, 'train_steps_per_second': 7.158, 'train_loss': 4.300191243489583, 'epoch': 3.0}
Training completed in 75.77 seconds.
{'eval_loss': 3.0467772483825684, 'eval_pearson': 0.030652826982401318, 'eval_spearmanr': 0.027105310496517078, 'eval_combined_score': 0.028879068739459196, 'eval_runtime': 1.9734, 'eval_samples_per_second': 760.094, 'eval_steps_per_second': 23.816, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | adalora ===
Metric: 0.0289
Training Time: 75.77 seconds


==============================
Task: mnli | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259779 / 110744326 (1.14%)
Starting training...
{'loss': 0.9732, 'grad_norm': 2.977407932281494, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.8344303369522095, 'eval_accuracy': 0.6200713194090678, 'eval_runtime': 17.8208, 'eval_samples_per_second': 550.76, 'eval_steps_per_second': 17.227, 'epoch': 1.0}
{'loss': 0.8339, 'grad_norm': 1.741886854171753, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.7724255919456482, 'eval_accuracy': 0.6549159449821701, 'eval_runtime': 18.1125, 'eval_samples_per_second': 541.892, 'eval_steps_per_second': 16.95, 'epoch': 2.0}
{'loss': 0.7929, 'grad_norm': 2.1190361976623535, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.751194953918457, 'eval_accuracy': 0.6682628629648497, 'eval_runtime': 18.1553, 'eval_samples_per_second': 540.613, 'eval_steps_per_second': 16.91, 'epoch': 3.0}
{'train_runtime': 5663.3001, 'train_samples_per_second': 208.025, 'train_steps_per_second': 6.501, 'train_loss': 0.850493589000876, 'epoch': 3.0}
Training completed in 5663.65 seconds.
{'eval_loss': 0.751194953918457, 'eval_accuracy': 0.6682628629648497, 'eval_runtime': 18.1791, 'eval_samples_per_second': 539.907, 'eval_steps_per_second': 16.888, 'epoch': 3.0}
{'eval_loss': 0.7144766449928284, 'eval_accuracy': 0.6841944670463792, 'eval_runtime': 18.5603, 'eval_samples_per_second': 529.734, 'eval_steps_per_second': 16.595, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | vb_lora ===
Metric: 0.6683/0.6842
Training Time: 5663.65 seconds

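Note: VB-LoRA composes all adapter matrices from a small shared bank of vectors. A minimal injection sketch, assuming a peft version that ships VBLoRAConfig; the bank size, vector length, and target modules below are assumptions, not values from this log.

    # Minimal sketch of VB-LoRA injection via PEFT (all values are assumptions).
    from transformers import AutoModelForSequenceClassification
    from peft import VBLoRAConfig, TaskType, get_peft_model

    base = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3
    )
    config = VBLoRAConfig(
        task_type=TaskType.SEQ_CLS,
        target_modules=["query", "value"],  # assumed
        r=4,                # assumed rank
        num_vectors=60,     # assumed shared vector-bank size
        vector_length=256,  # assumed length of each bank vector
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()
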
==============================
Task: sst2 | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.30775463581085205, 'eval_accuracy': 0.8646788990825688, 'eval_runtime': 1.0436, 'eval_samples_per_second': 835.556, 'eval_steps_per_second': 26.83, 'epoch': 1.0}
{'eval_loss': 0.2980089485645294, 'eval_accuracy': 0.8715596330275229, 'eval_runtime': 1.046, 'eval_samples_per_second': 833.647, 'eval_steps_per_second': 26.768, 'epoch': 2.0}
{'eval_loss': 0.29754653573036194, 'eval_accuracy': 0.8704128440366973, 'eval_runtime': 1.0707, 'eval_samples_per_second': 814.396, 'eval_steps_per_second': 26.15, 'epoch': 3.0}
{'train_runtime': 683.4236, 'train_samples_per_second': 295.639, 'train_steps_per_second': 9.24, 'train_loss': 0.3783851847040776, 'epoch': 3.0}
Training completed in 683.79 seconds.
{'eval_loss': 0.29754653573036194, 'eval_accuracy': 0.8704128440366973, 'eval_runtime': 1.0569, 'eval_samples_per_second': 825.042, 'eval_steps_per_second': 26.492, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | vb_lora ===
Metric: 0.8704
Training Time: 683.79 seconds


==============================
Task: cola | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.6138903498649597, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.869, 'eval_samples_per_second': 1200.187, 'eval_steps_per_second': 37.973, 'epoch': 1.0}
{'eval_loss': 0.6121873259544373, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.8514, 'eval_samples_per_second': 1225.11, 'eval_steps_per_second': 38.762, 'epoch': 2.0}
{'eval_loss': 0.6119409203529358, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.8459, 'eval_samples_per_second': 1233.046, 'eval_steps_per_second': 39.013, 'epoch': 3.0}
{'train_runtime': 86.4303, 'train_samples_per_second': 296.806, 'train_steps_per_second': 9.302, 'train_loss': 0.6045411143136855, 'epoch': 3.0}
Training completed in 86.72 seconds.
{'eval_loss': 0.6119409203529358, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.8439, 'eval_samples_per_second': 1235.924, 'eval_steps_per_second': 39.104, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | vb_lora ===
Metric: -0.0207
Training Time: 86.72 seconds


==============================
Task: qqp | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'loss': 0.5054, 'grad_norm': 2.0415947437286377, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.7799159040316597, 'eval_f1': 0.7357448325017819, 'eval_loss': 0.43818068504333496, 'eval_runtime': 59.0279, 'eval_samples_per_second': 684.93, 'eval_steps_per_second': 21.414, 'epoch': 1.0}
{'loss': 0.4411, 'grad_norm': 1.7730364799499512, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.7931981202077665, 'eval_f1': 0.7508863927539254, 'eval_loss': 0.4178532361984253, 'eval_runtime': 53.8204, 'eval_samples_per_second': 751.202, 'eval_steps_per_second': 23.486, 'epoch': 2.0}
{'loss': 0.4285, 'grad_norm': 1.8256773948669434, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.7980212713331685, 'eval_f1': 0.7538730484055699, 'eval_loss': 0.41063377261161804, 'eval_runtime': 58.2527, 'eval_samples_per_second': 694.046, 'eval_steps_per_second': 21.699, 'epoch': 3.0}
{'train_runtime': 4593.3769, 'train_samples_per_second': 237.633, 'train_steps_per_second': 7.427, 'train_loss': 0.45428847002975403, 'epoch': 3.0}
Training completed in 4593.73 seconds.
{'eval_accuracy': 0.7980212713331685, 'eval_f1': 0.7538730484055699, 'eval_loss': 0.41063377261161804, 'eval_runtime': 58.3001, 'eval_samples_per_second': 693.481, 'eval_steps_per_second': 21.681, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | vb_lora ===
Metric: 0.7980/0.7539
Training Time: 4593.73 seconds


==============================
Task: qnli | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.5664609670639038, 'eval_accuracy': 0.7063884312648728, 'eval_runtime': 11.136, 'eval_samples_per_second': 490.571, 'eval_steps_per_second': 15.356, 'epoch': 1.0}
{'eval_loss': 0.5128809809684753, 'eval_accuracy': 0.7481237415339557, 'eval_runtime': 11.1218, 'eval_samples_per_second': 491.196, 'eval_steps_per_second': 15.375, 'epoch': 2.0}
{'eval_loss': 0.503043532371521, 'eval_accuracy': 0.7565440234303497, 'eval_runtime': 11.1587, 'eval_samples_per_second': 489.575, 'eval_steps_per_second': 15.324, 'epoch': 3.0}
{'train_runtime': 1579.3084, 'train_samples_per_second': 198.966, 'train_steps_per_second': 6.219, 'train_loss': 0.5892526125184535, 'epoch': 3.0}
Training completed in 1579.68 seconds.
{'eval_loss': 0.503043532371521, 'eval_accuracy': 0.7565440234303497, 'eval_runtime': 11.1592, 'eval_samples_per_second': 489.551, 'eval_steps_per_second': 15.324, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | vb_lora ===
Metric: 0.7565
Training Time: 1579.68 seconds


==============================
Task: rte | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.6961030960083008, 'eval_accuracy': 0.4620938628158845, 'eval_runtime': 0.9002, 'eval_samples_per_second': 307.711, 'eval_steps_per_second': 9.998, 'epoch': 1.0}
{'eval_loss': 0.6959496140480042, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.9133, 'eval_samples_per_second': 303.289, 'eval_steps_per_second': 9.854, 'epoch': 2.0}
{'eval_loss': 0.6967261433601379, 'eval_accuracy': 0.4657039711191336, 'eval_runtime': 0.89, 'eval_samples_per_second': 311.227, 'eval_steps_per_second': 10.112, 'epoch': 3.0}
{'train_runtime': 60.3133, 'train_samples_per_second': 123.853, 'train_steps_per_second': 3.88, 'train_loss': 0.699567615476429, 'epoch': 3.0}
Training completed in 60.65 seconds.
{'eval_loss': 0.6959496140480042, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.872, 'eval_samples_per_second': 317.676, 'eval_steps_per_second': 10.322, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | vb_lora ===
Metric: 0.4693
Training Time: 60.65 seconds


==============================
Task: mrpc | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.6160275936126709, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7, 'eval_samples_per_second': 582.878, 'eval_steps_per_second': 18.572, 'epoch': 1.0}
{'eval_loss': 0.61436927318573, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7001, 'eval_samples_per_second': 582.808, 'eval_steps_per_second': 18.57, 'epoch': 2.0}
{'eval_loss': 0.6137855052947998, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.6954, 'eval_samples_per_second': 586.711, 'eval_steps_per_second': 18.694, 'epoch': 3.0}
{'train_runtime': 52.5746, 'train_samples_per_second': 209.303, 'train_steps_per_second': 6.562, 'train_loss': 0.6311185975005661, 'epoch': 3.0}
Training completed in 52.88 seconds.
{'eval_loss': 0.6137855052947998, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.6713, 'eval_samples_per_second': 607.74, 'eval_steps_per_second': 19.364, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | vb_lora ===
Metric: 0.6887
Training Time: 52.88 seconds


==============================
Task: stsb | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1258241 / 110741250 (1.14%)
Starting training...
{'eval_loss': 2.4936869144439697, 'eval_pearson': 0.05272379775188579, 'eval_spearmanr': 0.05256041700870254, 'eval_combined_score': 0.052642107380294165, 'eval_runtime': 1.7738, 'eval_samples_per_second': 845.665, 'eval_steps_per_second': 26.497, 'epoch': 1.0}
{'eval_loss': 2.325751543045044, 'eval_pearson': 0.13511265081366103, 'eval_spearmanr': 0.16174774952420415, 'eval_combined_score': 0.1484302001689326, 'eval_runtime': 1.7752, 'eval_samples_per_second': 844.99, 'eval_steps_per_second': 26.476, 'epoch': 2.0}
{'eval_loss': 2.3533577919006348, 'eval_pearson': 0.16271341646560994, 'eval_spearmanr': 0.19961028632970146, 'eval_combined_score': 0.1811618513976557, 'eval_runtime': 1.8103, 'eval_samples_per_second': 828.594, 'eval_steps_per_second': 25.963, 'epoch': 3.0}
{'train_runtime': 78.4724, 'train_samples_per_second': 219.784, 'train_steps_per_second': 6.881, 'train_loss': 3.2359501591435187, 'epoch': 3.0}
Training completed in 78.82 seconds.
{'eval_loss': 2.325751543045044, 'eval_pearson': 0.13511265081366103, 'eval_spearmanr': 0.16174774952420415, 'eval_combined_score': 0.1484302001689326, 'eval_runtime': 1.7443, 'eval_samples_per_second': 859.947, 'eval_steps_per_second': 26.945, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | vb_lora ===
Metric: 0.1484
Training Time: 78.82 seconds


==============================
Task: mnli | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1194243 / 110678790 (1.08%)
Starting training...
{'loss': 0.7301, 'grad_norm': 27.89801788330078, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.5824580192565918, 'eval_accuracy': 0.763525216505349, 'eval_runtime': 16.7405, 'eval_samples_per_second': 586.301, 'eval_steps_per_second': 18.339, 'epoch': 1.0}
{'loss': 0.6163, 'grad_norm': 17.541271209716797, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.560714602470398, 'eval_accuracy': 0.7812531839021906, 'eval_runtime': 16.7432, 'eval_samples_per_second': 586.207, 'eval_steps_per_second': 18.336, 'epoch': 2.0}
{'loss': 0.5811, 'grad_norm': 22.611732482910156, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.544284999370575, 'eval_accuracy': 0.7876719307182883, 'eval_runtime': 16.7288, 'eval_samples_per_second': 586.713, 'eval_steps_per_second': 18.352, 'epoch': 3.0}
{'train_runtime': 4657.1368, 'train_samples_per_second': 252.968, 'train_steps_per_second': 7.905, 'train_loss': 0.6288769857513548, 'epoch': 3.0}
Training completed in 4657.52 seconds.
{'eval_loss': 0.544284999370575, 'eval_accuracy': 0.7876719307182883, 'eval_runtime': 16.6966, 'eval_samples_per_second': 587.845, 'eval_steps_per_second': 18.387, 'epoch': 3.0}
{'eval_loss': 0.5244448184967041, 'eval_accuracy': 0.7910903173311635, 'eval_runtime': 17.1344, 'eval_samples_per_second': 573.815, 'eval_steps_per_second': 17.976, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | olora ===
Metric: 0.7877/0.7911
Training Time: 4657.52 seconds

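Note: in PEFT, OLoRA is standard LoRA with a QR-based orthonormal initialization, selected through LoraConfig rather than a separate config class; that matches the identical parameter counts of the lora and olora runs above (1194243 trainable for MNLI in both). The rank below is an assumption.

    # Minimal sketch of OLoRA injection via PEFT: LoRA + "olora" initialization.
    from transformers import AutoModelForSequenceClassification
    from peft import LoraConfig, TaskType, get_peft_model

    base = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3
    )
    config = LoraConfig(
        task_type=TaskType.SEQ_CLS,
        r=8,                        # assumed rank
        init_lora_weights="olora",  # changes only the initialization, not the architecture
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()
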
661
+ ==============================
662
+ Task: sst2 | Model: bert-base-uncased | Method: olora
663
+ ==============================
664
+
665
+ Injected OLoRA adapters via PEFT.
666
+ Trainable params: 1193474 / 110677252 (1.08%)
667
+ Starting training...
668
+ {'eval_loss': 0.25105705857276917, 'eval_accuracy': 0.8990825688073395, 'eval_runtime': 0.9059, 'eval_samples_per_second': 962.591, 'eval_steps_per_second': 30.909, 'epoch': 1.0}
669
+ {'eval_loss': 0.2556295692920685, 'eval_accuracy': 0.8979357798165137, 'eval_runtime': 0.8961, 'eval_samples_per_second': 973.058, 'eval_steps_per_second': 31.245, 'epoch': 2.0}
670
+ {'eval_loss': 0.25559449195861816, 'eval_accuracy': 0.9059633027522935, 'eval_runtime': 0.9124, 'eval_samples_per_second': 955.713, 'eval_steps_per_second': 30.688, 'epoch': 3.0}
671
+ {'train_runtime': 446.4985, 'train_samples_per_second': 452.514, 'train_steps_per_second': 14.143, 'train_loss': 0.23866029620076207, 'epoch': 3.0}
672
+ Training completed in 446.89 seconds.
673
+ {'eval_loss': 0.25105705857276917, 'eval_accuracy': 0.8990825688073395, 'eval_runtime': 0.9447, 'eval_samples_per_second': 923.001, 'eval_steps_per_second': 29.638, 'epoch': 3.0}
674
+
675
+ === FINAL RESULTS for sst2 | bert-base-uncased | olora ===
676
+ Metric: 0.8991
677
+ Training Time: 446.89 seconds
678
+
679
+
680
+ ==============================
681
+ Task: cola | Model: bert-base-uncased | Method: olora
682
+ ==============================
683
+
684
+ Injected OLoRA adapters via PEFT.
685
+ Trainable params: 1193474 / 110677252 (1.08%)
686
+ Starting training...
687
+ {'eval_loss': 0.5550746321678162, 'eval_matthews_correlation': 0.11382192951310593, 'eval_runtime': 0.6376, 'eval_samples_per_second': 1635.944, 'eval_steps_per_second': 51.76, 'epoch': 1.0}
688
+ {'eval_loss': 0.5441713333129883, 'eval_matthews_correlation': 0.38281296016649696, 'eval_runtime': 0.61, 'eval_samples_per_second': 1709.713, 'eval_steps_per_second': 54.094, 'epoch': 2.0}
689
+ {'eval_loss': 0.5497397184371948, 'eval_matthews_correlation': 0.39302533664823136, 'eval_runtime': 0.6395, 'eval_samples_per_second': 1630.927, 'eval_steps_per_second': 51.602, 'epoch': 3.0}
690
+ {'train_runtime': 53.1826, 'train_samples_per_second': 482.357, 'train_steps_per_second': 15.118, 'train_loss': 0.5172855889619287, 'epoch': 3.0}
691
+ Training completed in 53.52 seconds.
692
+ {'eval_loss': 0.5441713333129883, 'eval_matthews_correlation': 0.38281296016649696, 'eval_runtime': 0.6376, 'eval_samples_per_second': 1635.842, 'eval_steps_per_second': 51.757, 'epoch': 3.0}
693
+
694
+ === FINAL RESULTS for cola | bert-base-uncased | olora ===
695
+ Metric: 0.3828
696
+ Training Time: 53.52 seconds
697
+

==============================
Task: qqp | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'loss': 0.3952, 'grad_norm': 15.992859840393066, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.8515211476626268, 'eval_f1': 0.8073553480311928, 'eval_loss': 0.3308490812778473, 'eval_runtime': 52.6496, 'eval_samples_per_second': 767.906, 'eval_steps_per_second': 24.008, 'epoch': 1.0}
{'loss': 0.3326, 'grad_norm': 17.12047004699707, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.8558496166213208, 'eval_f1': 0.815452818239392, 'eval_loss': 0.31894829869270325, 'eval_runtime': 52.635, 'eval_samples_per_second': 768.12, 'eval_steps_per_second': 24.014, 'epoch': 2.0}
{'loss': 0.3139, 'grad_norm': 19.835132598876953, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.8632203809052683, 'eval_f1': 0.8219231016938237, 'eval_loss': 0.308892160654068, 'eval_runtime': 52.6582, 'eval_samples_per_second': 767.782, 'eval_steps_per_second': 24.004, 'epoch': 3.0}
{'train_runtime': 3570.0614, 'train_samples_per_second': 305.748, 'train_steps_per_second': 9.555, 'train_loss': 0.3423057106390892, 'epoch': 3.0}
Training completed in 3570.45 seconds.
{'eval_accuracy': 0.8632203809052683, 'eval_f1': 0.8219231016938237, 'eval_loss': 0.308892160654068, 'eval_runtime': 52.682, 'eval_samples_per_second': 767.435, 'eval_steps_per_second': 23.993, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | olora ===
Metric: 0.8632/0.8219
Training Time: 3570.45 seconds

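Note: QQP reports an accuracy/F1 pair because the label distribution is skewed toward non-duplicates, making F1 on the duplicate class the more informative number. A sketch of a Trainer-style compute_metrics hook that would produce the eval_accuracy/eval_f1 fields above (the run's actual implementation is not shown in this log):

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred          # Trainer passes (predictions, label_ids)
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # F1 of the positive ('duplicate') class
    }
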

==============================
Task: qnli | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.3933061361312866, 'eval_accuracy': 0.8213435841112942, 'eval_runtime': 10.4228, 'eval_samples_per_second': 524.137, 'eval_steps_per_second': 16.406, 'epoch': 1.0}
{'eval_loss': 0.35298967361450195, 'eval_accuracy': 0.8411129416071755, 'eval_runtime': 10.2135, 'eval_samples_per_second': 534.882, 'eval_steps_per_second': 16.743, 'epoch': 2.0}
{'eval_loss': 0.338245153427124, 'eval_accuracy': 0.8513637195680029, 'eval_runtime': 10.3823, 'eval_samples_per_second': 526.183, 'eval_steps_per_second': 16.47, 'epoch': 3.0}
{'train_runtime': 1330.4657, 'train_samples_per_second': 236.18, 'train_steps_per_second': 7.382, 'train_loss': 0.42918042722968847, 'epoch': 3.0}
Training completed in 1330.87 seconds.
{'eval_loss': 0.338245153427124, 'eval_accuracy': 0.8513637195680029, 'eval_runtime': 10.3996, 'eval_samples_per_second': 525.311, 'eval_steps_per_second': 16.443, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | olora ===
Metric: 0.8514
Training Time: 1330.87 seconds


==============================
Task: rte | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.6978908777236938, 'eval_accuracy': 0.4981949458483754, 'eval_runtime': 0.8742, 'eval_samples_per_second': 316.85, 'eval_steps_per_second': 10.295, 'epoch': 1.0}
{'eval_loss': 0.6917179226875305, 'eval_accuracy': 0.5126353790613718, 'eval_runtime': 0.8406, 'eval_samples_per_second': 329.513, 'eval_steps_per_second': 10.706, 'epoch': 2.0}
{'eval_loss': 0.6925662755966187, 'eval_accuracy': 0.5306859205776173, 'eval_runtime': 0.8629, 'eval_samples_per_second': 321.014, 'eval_steps_per_second': 10.43, 'epoch': 3.0}
{'train_runtime': 56.4059, 'train_samples_per_second': 132.433, 'train_steps_per_second': 4.149, 'train_loss': 0.6980368459326589, 'epoch': 3.0}
Training completed in 56.78 seconds.
{'eval_loss': 0.6917179226875305, 'eval_accuracy': 0.5126353790613718, 'eval_runtime': 0.841, 'eval_samples_per_second': 329.385, 'eval_steps_per_second': 10.702, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | olora ===
Metric: 0.5126
Training Time: 56.78 seconds

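Note: the final evaluation after 'Training completed' repeats the epoch-2 numbers (0.5126) even though epoch 3 scored 0.5307; the sst2 and cola runs above show the same pattern. That is consistent with load_best_model_at_end=True selecting the checkpoint with the lowest eval_loss (0.6917 at epoch 2 here). A sketch of TrainingArguments that would behave this way, assumed rather than taken from the run script:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out/rte-olora",        # hypothetical path
    eval_strategy="epoch",             # 'evaluation_strategy' in older transformers
    save_strategy="epoch",
    num_train_epochs=3,
    load_best_model_at_end=True,       # reload the best checkpoint before the final eval
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower loss wins
)
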

==============================
Task: mrpc | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.5835548639297485, 'eval_accuracy': 0.696078431372549, 'eval_f1': 0.8165680473372781, 'eval_runtime': 0.6216, 'eval_samples_per_second': 656.401, 'eval_steps_per_second': 20.915, 'epoch': 1.0}
{'eval_loss': 0.5289849638938904, 'eval_accuracy': 0.7107843137254902, 'eval_f1': 0.8190184049079755, 'eval_runtime': 0.6517, 'eval_samples_per_second': 626.042, 'eval_steps_per_second': 19.947, 'epoch': 2.0}
{'eval_loss': 0.529857337474823, 'eval_accuracy': 0.7107843137254902, 'eval_f1': 0.8190184049079755, 'eval_runtime': 0.6546, 'eval_samples_per_second': 623.294, 'eval_steps_per_second': 19.86, 'epoch': 3.0}
{'train_runtime': 43.5167, 'train_samples_per_second': 252.869, 'train_steps_per_second': 7.928, 'train_loss': 0.570729440882586, 'epoch': 3.0}
Training completed in 43.86 seconds.
{'eval_loss': 0.5289849638938904, 'eval_accuracy': 0.7107843137254902, 'eval_f1': 0.8190184049079755, 'eval_runtime': 0.6218, 'eval_samples_per_second': 656.177, 'eval_steps_per_second': 20.908, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | olora ===
Metric: 0.7108
Training Time: 43.86 seconds


==============================
Task: stsb | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1192705 / 110675714 (1.08%)
Starting training...
{'eval_loss': 1.0667306184768677, 'eval_pearson': 0.7780507120926891, 'eval_spearmanr': 0.7911748317717975, 'eval_combined_score': 0.7846127719322433, 'eval_runtime': 1.524, 'eval_samples_per_second': 984.239, 'eval_steps_per_second': 30.839, 'epoch': 1.0}
{'eval_loss': 0.8928676843643188, 'eval_pearson': 0.8168255680257844, 'eval_spearmanr': 0.8212092492693253, 'eval_combined_score': 0.8190174086475548, 'eval_runtime': 1.5491, 'eval_samples_per_second': 968.315, 'eval_steps_per_second': 30.341, 'epoch': 2.0}
{'eval_loss': 0.8309628367424011, 'eval_pearson': 0.8232657429745108, 'eval_spearmanr': 0.8258763378952371, 'eval_combined_score': 0.824571040434874, 'eval_runtime': 1.544, 'eval_samples_per_second': 971.493, 'eval_steps_per_second': 30.44, 'epoch': 3.0}
{'train_runtime': 62.4587, 'train_samples_per_second': 276.134, 'train_steps_per_second': 8.646, 'train_loss': 1.3374586317274306, 'epoch': 3.0}
Training completed in 62.84 seconds.
{'eval_loss': 0.8309628367424011, 'eval_pearson': 0.8232657429745108, 'eval_spearmanr': 0.8258763378952371, 'eval_combined_score': 0.824571040434874, 'eval_runtime': 1.5328, 'eval_samples_per_second': 978.63, 'eval_steps_per_second': 30.664, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | olora ===
Metric: 0.8246
Training Time: 62.84 seconds

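Note: STS-B is a regression task, and eval_combined_score is simply the arithmetic mean of the Pearson and Spearman correlations: at epoch 3, (0.8232657 + 0.8258763) / 2 = 0.8245710, which rounds to the 0.8246 reported above. A minimal example of the computation (scipy assumed; toy data):

from scipy.stats import pearsonr, spearmanr

preds  = [2.5, 0.1, 4.8, 1.9, 3.3]   # toy similarity predictions on the 0-5 scale
labels = [3.0, 0.0, 5.0, 1.5, 3.5]
pearson  = pearsonr(preds, labels)[0]
spearman = spearmanr(preds, labels)[0]
print((pearson + spearman) / 2)       # the 'combined_score' style of average
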

==============================
Task: mnli | Model: bert-base-uncased | Method: full_finetuning
==============================

Performing full fine-tuning: All parameters are trainable.
Proceeding with full fine-tuning (no adapter injection).
Trainable params: 109484547 / 109484547 (100.00%)
Starting training...
{'loss': 0.5597, 'grad_norm': 5.752374649047852, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.4514749050140381, 'eval_accuracy': 0.8239429444727457, 'eval_runtime': 14.9081, 'eval_samples_per_second': 658.366, 'eval_steps_per_second': 20.593, 'epoch': 1.0}
{'loss': 0.3919, 'grad_norm': 5.102553844451904, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.46204888820648193, 'eval_accuracy': 0.8308711156393276, 'eval_runtime': 14.6024, 'eval_samples_per_second': 672.152, 'eval_steps_per_second': 21.024, 'epoch': 2.0}
{'loss': 0.3015, 'grad_norm': 4.843125820159912, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.5197769403457642, 'eval_accuracy': 0.8348446255731024, 'eval_runtime': 14.5879, 'eval_samples_per_second': 672.819, 'eval_steps_per_second': 21.045, 'epoch': 3.0}
{'train_runtime': 5323.0035, 'train_samples_per_second': 221.324, 'train_steps_per_second': 6.916, 'train_loss': 0.3868698789885851, 'epoch': 3.0}
Training completed in 5323.34 seconds.
{'eval_loss': 0.4514749050140381, 'eval_accuracy': 0.8239429444727457, 'eval_runtime': 14.5681, 'eval_samples_per_second': 673.734, 'eval_steps_per_second': 21.073, 'epoch': 3.0}
{'eval_loss': 0.42962974309921265, 'eval_accuracy': 0.8319772172497966, 'eval_runtime': 14.9481, 'eval_samples_per_second': 657.744, 'eval_steps_per_second': 20.605, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | full_finetuning ===
Metric: 0.8239/0.8320
Training Time: 5323.34 seconds

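Note: MNLI is the only task in this log with two final evaluation lines: the first is the matched (in-genre) validation split and the second the mismatched (out-of-genre) split, which is where the paired 0.8239/0.8320 metric comes from. The eval runtimes are consistent with the split sizes of 9815 and 9832 examples (e.g. 14.5681 s * 673.734 samples/s = ~9815). A quick check of those sizes (datasets library assumed):

from datasets import load_dataset

mnli = load_dataset("glue", "mnli")
print(mnli["validation_matched"].num_rows)     # 9815
print(mnli["validation_mismatched"].num_rows)  # 9832
# With a Trainer, each split is then scored separately, e.g.:
#   trainer.evaluate(eval_dataset=tokenized_validation_matched)
#   trainer.evaluate(eval_dataset=tokenized_validation_mismatched)
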

==============================
Task: sst2 | Model: bert-base-uncased | Method: full_finetuning
==============================

Performing full fine-tuning: All parameters are trainable.
Proceeding with full fine-tuning (no adapter injection).
Trainable params: 109483778 / 109483778 (100.00%)
Starting training...
{'eval_loss': 0.2134767472743988, 'eval_accuracy': 0.9334862385321101, 'eval_runtime': 0.7923, 'eval_samples_per_second': 1100.568, 'eval_steps_per_second': 35.339, 'epoch': 1.0}
{'eval_loss': 0.2665484845638275, 'eval_accuracy': 0.9220183486238532, 'eval_runtime': 0.7939, 'eval_samples_per_second': 1098.343, 'eval_steps_per_second': 35.268, 'epoch': 2.0}
{'eval_loss': 0.2949509024620056, 'eval_accuracy': 0.926605504587156, 'eval_runtime': 0.8068, 'eval_samples_per_second': 1080.866, 'eval_steps_per_second': 34.707, 'epoch': 3.0}
{'train_runtime': 479.0704, 'train_samples_per_second': 421.748, 'train_steps_per_second': 13.182, 'train_loss': 0.12836022939553643, 'epoch': 3.0}
Training completed in 479.43 seconds.
{'eval_loss': 0.2134767472743988, 'eval_accuracy': 0.9334862385321101, 'eval_runtime': 0.8041, 'eval_samples_per_second': 1084.426, 'eval_steps_per_second': 34.821, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | full_finetuning ===
Metric: 0.9335
Training Time: 479.43 seconds

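Note: for the full_finetuning runs no adapters are injected, so all 109483778 parameters (bert-base-uncased with a 2-label head) receive gradient updates, about 92x more trainable weights than the ~1.19M used by the adapter runs above. A minimal sketch of the setup and the parameter count:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # sst2 is binary
# No PEFT wrapping: nothing is frozen, so trainable == total.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 109483778, i.e. 100.00% of the model
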

==============================
Task: cola | Model: bert-base-uncased | Method: full_finetuning
==============================

Performing full fine-tuning: All parameters are trainable.
Proceeding with full fine-tuning (no adapter injection).
Trainable params: 109483778 / 109483778 (100.00%)
Starting training...
{'eval_loss': 0.412758469581604, 'eval_matthews_correlation': 0.5526896422396544, 'eval_runtime': 0.5472, 'eval_samples_per_second': 1906.193, 'eval_steps_per_second': 60.311, 'epoch': 1.0}
{'eval_loss': 0.46548306941986084, 'eval_matthews_correlation': 0.5677348492150284, 'eval_runtime': 0.5415, 'eval_samples_per_second': 1926.018, 'eval_steps_per_second': 60.938, 'epoch': 2.0}
{'eval_loss': 0.5247135162353516, 'eval_matthews_correlation': 0.5679361809424823, 'eval_runtime': 0.5406, 'eval_samples_per_second': 1929.352, 'eval_steps_per_second': 61.044, 'epoch': 3.0}
{'train_runtime': 52.3626, 'train_samples_per_second': 489.911, 'train_steps_per_second': 15.354, 'train_loss': 0.32915398137486396, 'epoch': 3.0}
Training completed in 52.71 seconds.
{'eval_loss': 0.412758469581604, 'eval_matthews_correlation': 0.5526896422396544, 'eval_runtime': 0.4981, 'eval_samples_per_second': 2093.982, 'eval_steps_per_second': 66.253, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | full_finetuning ===
Metric: 0.5527
Training Time: 52.71 seconds


==============================
Task: qqp | Model: bert-base-uncased | Method: full_finetuning
==============================

Performing full fine-tuning: All parameters are trainable.
Proceeding with full fine-tuning (no adapter injection).
Trainable params: 109483778 / 109483778 (100.00%)
Starting training...
{'loss': 0.3062, 'grad_norm': 6.693933010101318, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.8985901558248826, 'eval_f1': 0.8664059954382535, 'eval_loss': 0.242730051279068, 'eval_runtime': 45.8719, 'eval_samples_per_second': 881.367, 'eval_steps_per_second': 27.555, 'epoch': 1.0}
{'loss': 0.2005, 'grad_norm': 3.212670087814331, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.9056393767004699, 'eval_f1': 0.875485492346356, 'eval_loss': 0.240932896733284, 'eval_runtime': 46.6544, 'eval_samples_per_second': 866.585, 'eval_steps_per_second': 27.093, 'epoch': 2.0}
{'loss': 0.1439, 'grad_norm': 6.470639705657959, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.9091021518674252, 'eval_f1': 0.8781619865398004, 'eval_loss': 0.27697858214378357, 'eval_runtime': 46.7501, 'eval_samples_per_second': 864.81, 'eval_steps_per_second': 27.037, 'epoch': 3.0}
{'train_runtime': 3914.5254, 'train_samples_per_second': 278.843, 'train_steps_per_second': 8.714, 'train_loss': 0.20568252834988449, 'epoch': 3.0}
Training completed in 3914.84 seconds.
{'eval_accuracy': 0.9056393767004699, 'eval_f1': 0.875485492346356, 'eval_loss': 0.240932896733284, 'eval_runtime': 46.8238, 'eval_samples_per_second': 863.45, 'eval_steps_per_second': 26.995, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | full_finetuning ===
Metric: 0.9056/0.8755
Training Time: 3914.84 seconds


==============================
Task: qnli | Model: bert-base-uncased | Method: full_finetuning
==============================

Performing full fine-tuning: All parameters are trainable.
Proceeding with full fine-tuning (no adapter injection).
Trainable params: 109483778 / 109483778 (100.00%)
Starting training...
{'eval_loss': 0.2799956798553467, 'eval_accuracy': 0.8892549881017756, 'eval_runtime': 9.229, 'eval_samples_per_second': 591.941, 'eval_steps_per_second': 18.529, 'epoch': 1.0}
{'eval_loss': 0.27452367544174194, 'eval_accuracy': 0.8945634266886326, 'eval_runtime': 9.2156, 'eval_samples_per_second': 592.797, 'eval_steps_per_second': 18.555, 'epoch': 2.0}
{'eval_loss': 0.3037053644657135, 'eval_accuracy': 0.900054914881933, 'eval_runtime': 9.0708, 'eval_samples_per_second': 602.26, 'eval_steps_per_second': 18.852, 'epoch': 3.0}
{'train_runtime': 1543.5827, 'train_samples_per_second': 203.571, 'train_steps_per_second': 6.363, 'train_loss': 0.2646887299000331, 'epoch': 3.0}
Training completed in 1543.95 seconds.
{'eval_loss': 0.27452367544174194, 'eval_accuracy': 0.8945634266886326, 'eval_runtime': 9.0652, 'eval_samples_per_second': 602.631, 'eval_steps_per_second': 18.863, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | full_finetuning ===
Metric: 0.8946
Training Time: 1543.95 seconds


==============================
Task: rte | Model: bert-base-uncased | Method: full_finetuning
==============================

Performing full fine-tuning: All parameters are trainable.
Proceeding with full fine-tuning (no adapter injection).
Trainable params: 109483778 / 109483778 (100.00%)
Starting training...
{'eval_loss': 0.6838569641113281, 'eval_accuracy': 0.5379061371841155, 'eval_runtime': 0.7626, 'eval_samples_per_second': 363.225, 'eval_steps_per_second': 11.802, 'epoch': 1.0}
{'eval_loss': 0.6644460558891296, 'eval_accuracy': 0.6137184115523465, 'eval_runtime': 0.7536, 'eval_samples_per_second': 367.558, 'eval_steps_per_second': 11.942, 'epoch': 2.0}
{'eval_loss': 0.6593529582023621, 'eval_accuracy': 0.6173285198555957, 'eval_runtime': 0.755, 'eval_samples_per_second': 366.902, 'eval_steps_per_second': 11.921, 'epoch': 3.0}
{'train_runtime': 68.9582, 'train_samples_per_second': 108.326, 'train_steps_per_second': 3.393, 'train_loss': 0.6521141508705596, 'epoch': 3.0}
Training completed in 69.31 seconds.
{'eval_loss': 0.6593529582023621, 'eval_accuracy': 0.6173285198555957, 'eval_runtime': 0.7661, 'eval_samples_per_second': 361.559, 'eval_steps_per_second': 11.747, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | full_finetuning ===
Metric: 0.6173
Training Time: 69.31 seconds


==============================
Task: mrpc | Model: bert-base-uncased | Method: full_finetuning
==============================

Performing full fine-tuning: All parameters are trainable.
Proceeding with full fine-tuning (no adapter injection).
Trainable params: 109483778 / 109483778 (100.00%)
Starting training...
{'eval_loss': 0.4329614043235779, 'eval_accuracy': 0.7916666666666666, 'eval_f1': 0.8393194706994329, 'eval_runtime': 0.5617, 'eval_samples_per_second': 726.329, 'eval_steps_per_second': 23.143, 'epoch': 1.0}
{'eval_loss': 0.37167686223983765, 'eval_accuracy': 0.8382352941176471, 'eval_f1': 0.8838028169014085, 'eval_runtime': 0.5608, 'eval_samples_per_second': 727.595, 'eval_steps_per_second': 23.183, 'epoch': 2.0}
{'eval_loss': 0.373555064201355, 'eval_accuracy': 0.8455882352941176, 'eval_f1': 0.8911917098445595, 'eval_runtime': 0.5397, 'eval_samples_per_second': 755.993, 'eval_steps_per_second': 24.088, 'epoch': 3.0}
{'train_runtime': 51.1271, 'train_samples_per_second': 215.228, 'train_steps_per_second': 6.748, 'train_loss': 0.40035665760869565, 'epoch': 3.0}
Training completed in 51.43 seconds.
{'eval_loss': 0.37167686223983765, 'eval_accuracy': 0.8382352941176471, 'eval_f1': 0.8838028169014085, 'eval_runtime': 0.543, 'eval_samples_per_second': 751.353, 'eval_steps_per_second': 23.94, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | full_finetuning ===
Metric: 0.8382
Training Time: 51.43 seconds


==============================
Task: stsb | Model: bert-base-uncased | Method: full_finetuning
==============================

Performing full fine-tuning: All parameters are trainable.
Proceeding with full fine-tuning (no adapter injection).
Trainable params: 109483009 / 109483009 (100.00%)
Starting training...
{'eval_loss': 0.6429872512817383, 'eval_pearson': 0.8484146526961172, 'eval_spearmanr': 0.8490013360398954, 'eval_combined_score': 0.8487079943680063, 'eval_runtime': 1.3435, 'eval_samples_per_second': 1116.498, 'eval_steps_per_second': 34.984, 'epoch': 1.0}
{'eval_loss': 0.6205856204032898, 'eval_pearson': 0.8590625126747368, 'eval_spearmanr': 0.8583028815966305, 'eval_combined_score': 0.8586826971356836, 'eval_runtime': 1.3332, 'eval_samples_per_second': 1125.142, 'eval_steps_per_second': 35.254, 'epoch': 2.0}
{'eval_loss': 0.5940393209457397, 'eval_pearson': 0.8632712038056458, 'eval_spearmanr': 0.8619679001368761, 'eval_combined_score': 0.862619551971261, 'eval_runtime': 1.3473, 'eval_samples_per_second': 1113.324, 'eval_steps_per_second': 34.884, 'epoch': 3.0}
{'train_runtime': 69.6028, 'train_samples_per_second': 247.792, 'train_steps_per_second': 7.758, 'train_loss': 0.745247960973669, 'epoch': 3.0}
Training completed in 69.90 seconds.
{'eval_loss': 0.5940393209457397, 'eval_pearson': 0.8632712038056458, 'eval_spearmanr': 0.8619679001368761, 'eval_combined_score': 0.862619551971261, 'eval_runtime': 1.3184, 'eval_samples_per_second': 1137.781, 'eval_steps_per_second': 35.65, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | full_finetuning ===
Metric: 0.8626
Training Time: 69.90 seconds

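Note: the stsb totals (109483009 here) sit exactly 769 parameters below the classification tasks' 109483778 because STS-B uses a single-output regression head: 768 weights + 1 bias = 769 parameters, versus 768*2 + 2 = 1538 for a 2-label head.
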

===== Summary of GLUE Results =====
Model | Method || mnli (m/mm) | sst2 (Acc) | cola (Mcc) | qqp (Acc/F1) | qnli (Acc) | rte (Acc) | mrpc (Acc) | stsb (Corr) || Average
-------------------------------------------------------------------------------------------------------------------------------------
bert-base-uncased | lora || 0.7754/0.7897 | 0.9014 | 0.2736 | 0.8592/0.8201 | 0.8424 | 0.4838 | 0.6863 | 0.7253 || 0.6919
bert-base-uncased | diff_lora || 0.8047/0.8116 | 0.9117 | 0.3430 | 0.8792/0.8405 | 0.8653 | 0.5379 | 0.7010 | 0.7804 || 0.7259
bert-base-uncased | adalora || 0.6147/0.6382 | 0.6376 | -0.0207 | 0.7877/0.7450 | 0.5850 | 0.4693 | 0.6887 | 0.0289 || 0.4727
bert-base-uncased | vb_lora || 0.6683/0.6842 | 0.8704 | -0.0207 | 0.7980/0.7539 | 0.7565 | 0.4693 | 0.6887 | 0.1484 || 0.5456
bert-base-uncased | olora || 0.7877/0.7911 | 0.8991 | 0.3828 | 0.8632/0.8219 | 0.8514 | 0.5126 | 0.7108 | 0.8246 || 0.7267
bert-base-uncased | full_finetuning || 0.8239/0.8320 | 0.9335 | 0.5527 | 0.9056/0.8755 | 0.8946 | 0.6173 | 0.8382 | 0.8626 || 0.8022
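
Note: the Average column is the unweighted mean over the eight tasks, with the paired metrics (mnli m/mm and qqp Acc/F1) averaged within their task first; recomputing the olora and full_finetuning rows this way reproduces the tabulated 0.7267 and 0.8022. A short recomputation for the olora row:

scores = {
    "mnli": (0.7877 + 0.7911) / 2,  # m/mm averaged first
    "sst2": 0.8991,
    "cola": 0.3828,
    "qqp":  (0.8632 + 0.8219) / 2,  # Acc/F1 averaged first
    "qnli": 0.8514,
    "rte":  0.5126,
    "mrpc": 0.7108,
    "stsb": 0.8246,
}
print(round(sum(scores.values()) / len(scores), 4))  # 0.7267, as tabulated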