2025-04-12 13:03:40,371 - INFO - Starting GLiNER fine-tuning at 20250412_130340
2025-04-12 13:03:40,372 - INFO - Output directory: gliner_finetuned_20250412_130340
2025-04-12 13:03:40,372 - INFO - Loading data from ./gliner_dataset_fixed.json...
2025-04-12 13:03:44,939 - INFO - Successfully loaded 32439 samples from ./gliner_dataset_fixed.json
2025-04-12 13:03:44,976 - INFO - Sequence Length Analysis:
2025-04-12 13:03:44,976 - INFO - Minimum length: 246
2025-04-12 13:03:44,976 - INFO - Maximum length: 1286
2025-04-12 13:03:44,976 - INFO - Mean length: 450.5
2025-04-12 13:03:44,976 - INFO - Median length: 447
2025-04-12 13:03:44,976 - INFO - 90th percentile: 540
2025-04-12 13:03:44,976 - INFO - 95th percentile: 567
2025-04-12 13:03:44,976 - INFO - 99th percentile: 636
2025-04-12 13:03:44,976 - INFO - Sequences exceeding 1024 tokens: 19
2025-04-12 13:03:44,976 - INFO - Sequences exceeding 2048 tokens: 0
2025-04-12 13:03:44,976 - INFO - Recommended max_len setting: 768
2025-04-12 13:03:44,976 - INFO - Using maximum sequence length: 768
2025-04-12 13:03:45,128 - INFO - Extracted entity types: ['AGE', 'AGE_INFO', 'ANIMAL_INFO', 'BEHAVIORAL_PATTERN', 'CONTEXT_SENSITIVE', 'CRIMINAL_RECORD', 'DATE_TIME', 'ECONOMIC_STATUS', 'EMAIL_ADDRESS', 'EMPLOYMENT_INFO', 'FAMILY_RELATION', 'FINANCIAL_INFO', 'GOV_ID', 'HEALTH_INFO', 'IDENTIFIABLE_IMAGE', 'NO_ADDRESS', 'NO_PHONE_NUMBER', 'PERSON', 'POLITICAL_CASE', 'POSTAL_CODE', 'SEXUAL_ORIENTATION']
2025-04-12 13:03:45,304 - INFO - Dataset Statistics:
2025-04-12 13:03:45,304 - INFO - Total samples: 32439
2025-04-12 13:03:45,304 - INFO - Samples with entities: 32439 (100.0%)
2025-04-12 13:03:45,304 - INFO - Total entities: 583306
2025-04-12 13:03:45,304 - INFO - Average entities per sample: 17.98
2025-04-12 13:03:45,304 - INFO - Entity type distribution:
2025-04-12 13:03:45,304 - INFO - PERSON: 80730 (13.8%)
2025-04-12 13:03:45,304 - INFO - DATE_TIME: 76795 (13.2%)
2025-04-12 13:03:45,304 - INFO - HEALTH_INFO: 52559 (9.0%)
2025-04-12 13:03:45,304 - INFO - GOV_ID: 47823 (8.2%)
2025-04-12 13:03:45,304 - INFO - NO_ADDRESS: 45076 (7.7%)
2025-04-12 13:03:45,304 - INFO - CRIMINAL_RECORD: 37081 (6.4%)
2025-04-12 13:03:45,304 - INFO - NO_PHONE_NUMBER: 32198 (5.5%)
2025-04-12 13:03:45,304 - INFO - EMAIL_ADDRESS: 30921 (5.3%)
2025-04-12 13:03:45,304 - INFO - FAMILY_RELATION: 29615 (5.1%)
2025-04-12 13:03:45,304 - INFO - CONTEXT_SENSITIVE: 23812 (4.1%)
2025-04-12 13:03:45,304 - INFO - EMPLOYMENT_INFO: 22452 (3.8%)
2025-04-12 13:03:45,304 - INFO - FINANCIAL_INFO: 20784 (3.6%)
2025-04-12 13:03:45,304 - INFO - POLITICAL_CASE: 18895 (3.2%)
2025-04-12 13:03:45,304 - INFO - BEHAVIORAL_PATTERN: 15996 (2.7%)
2025-04-12 13:03:45,304 - INFO - ECONOMIC_STATUS: 14990 (2.6%)
2025-04-12 13:03:45,304 - INFO - IDENTIFIABLE_IMAGE: 14825 (2.5%)
2025-04-12 13:03:45,304 - INFO - SEXUAL_ORIENTATION: 11758 (2.0%)
2025-04-12 13:03:45,304 - INFO - POSTAL_CODE: 6985 (1.2%)
2025-04-12 13:03:45,304 - INFO - ANIMAL_INFO: 6 (0.0%)
2025-04-12 13:03:45,304 - INFO - AGE: 3 (0.0%)
2025-04-12 13:03:45,304 - INFO - AGE_INFO: 2 (0.0%)
2025-04-12 13:03:45,320 - INFO - Dataset split: 29195 training samples, 3244 validation samples
2025-04-12 13:03:46,412 - INFO - Successfully imported GLiNER components
2025-04-12 13:03:46,463 - INFO - Using device: cuda
2025-04-12 13:03:46,463 - INFO - Estimated GPU memory requirement: 1.0 GB with batch size 3
2025-04-12 13:03:46,463 - INFO - Loading base model: urchade/gliner_multi_pii-v1 with max_length=768
2025-04-12 13:03:53,498 - INFO - Updating model configuration with max_len=768
2025-04-12 13:03:53,890 - INFO - Starting training for 1 epochs...
2025-04-12 13:03:55,247 - INFO - Epoch: 0.00 | loss: 140.323600 | grad_norm: 3202.168945 | learning_rate: 0.000000
2025-04-12 13:03:59,822 - INFO - Epoch: 0.00 | loss: 122.931800 | grad_norm: 2228.249268 | learning_rate: 0.000000
2025-04-12 13:04:04,342 - INFO - Epoch: 0.00 | loss: 136.630000 | grad_norm: 3372.845703 | learning_rate: 0.000000
2025-04-12 13:04:09,320 - INFO - Epoch: 0.00 | loss: 135.774000 | grad_norm: 3298.110596 | learning_rate: 0.000000
2025-04-12 13:04:15,500 - INFO - Epoch: 0.00 | loss: 140.438300 | grad_norm: 5675.559570 | learning_rate: 0.000000
2025-04-12 13:04:20,727 - INFO - Epoch: 0.01 | loss: 122.378000 | grad_norm: 4736.537109 | learning_rate: 0.000001
2025-04-12 13:04:25,399 - INFO - Epoch: 0.01 | loss: 85.118100 | grad_norm: 1768.929321 | learning_rate: 0.000001
2025-04-12 13:04:30,546 - INFO - Epoch: 0.01 | loss: 81.233900 | grad_norm: 2576.842285 | learning_rate: 0.000001
2025-04-12 13:04:35,275 - INFO - Epoch: 0.01 | loss: 61.796600 | grad_norm: 1398.179077 | learning_rate: 0.000001
2025-04-12 13:04:40,228 - INFO - Epoch: 0.01 | loss: 75.628900 | grad_norm: 1661.466187 | learning_rate: 0.000001
2025-04-12 13:04:45,190 - INFO - Epoch: 0.01 | loss: 64.279600 | grad_norm: 750.858337 | learning_rate: 0.000001
2025-04-12 13:04:50,023 - INFO - Epoch: 0.01 | loss: 53.126000 | grad_norm: 952.372009 | learning_rate: 0.000001
2025-04-12 13:04:55,124 - INFO - Epoch: 0.01 | loss: 45.007000 | grad_norm: 907.657349 | learning_rate: 0.000001
2025-04-12 13:05:00,075 - INFO - Epoch: 0.01 | loss: 50.020200 | grad_norm: 743.049988 | learning_rate: 0.000001
2025-04-12 13:05:05,014 - INFO - Epoch: 0.01 | loss: 49.268200 | grad_norm: 1706.012329 | learning_rate: 0.000001
2025-04-12 13:05:10,765 - INFO - Epoch: 0.02 | loss: 34.830100 | grad_norm: 1231.194214 | learning_rate: 0.000002
2025-04-12 13:05:16,286 - INFO - Epoch: 0.02 | loss: 34.937400 | grad_norm: 630.159424 | learning_rate: 0.000002
2025-04-12 13:05:21,064 - INFO - Epoch: 0.02 | loss: 27.923300 | grad_norm: 296.572327 | learning_rate: 0.000002
2025-04-12 13:05:25,980 - INFO - Epoch: 0.02 | loss: 34.720300 | grad_norm: 460.758911 | learning_rate: 0.000002
2025-04-12 13:05:31,049 - INFO - Epoch: 0.02 | loss: 30.638000 | grad_norm: 316.702271 | learning_rate: 0.000002
2025-04-12 13:05:36,070 - INFO - Epoch: 0.02 | loss: 29.622900 | grad_norm: 783.264099 | learning_rate: 0.000002
2025-04-12 13:05:41,026 - INFO - Epoch: 0.02 | loss: 24.152500 | grad_norm: 765.701111 | learning_rate: 0.000002
2025-04-12 13:05:45,953 - INFO - Epoch: 0.02 | loss: 25.327200 | grad_norm: 476.966766 | learning_rate: 0.000002
2025-04-12 13:05:50,608 - INFO - Epoch: 0.02 | loss: 25.635400 | grad_norm: 670.216675 | learning_rate: 0.000002
2025-04-12 13:05:55,174 - INFO - Epoch: 0.02 | loss: 18.274500 | grad_norm: 342.680725 | learning_rate: 0.000002
2025-04-12 13:05:59,737 - INFO - Epoch: 0.03 | loss: 21.405900 | grad_norm: 677.592224 | learning_rate: 0.000003
2025-04-12 13:06:04,563 - INFO - Epoch: 0.03 | loss: 22.647400 | grad_norm: 911.266418 | learning_rate: 0.000003
2025-04-12 13:06:09,758 - INFO - Epoch: 0.03 | loss: 21.935700 | grad_norm: 193.076965 | learning_rate: 0.000003
2025-04-12 13:06:14,863 - INFO - Epoch: 0.03 | loss: 24.583100 | grad_norm: 370.018524 | learning_rate: 0.000003
2025-04-12 13:06:19,529 - INFO - Epoch: 0.03 | loss: 20.682700 | grad_norm: 259.588165 | learning_rate: 0.000003
2025-04-12 13:06:24,568 - INFO - Epoch: 0.03 | loss: 21.886300 | grad_norm: 359.787537 | learning_rate: 0.000003
2025-04-12 13:06:28,844 - INFO - Epoch: 0.03 | loss: 18.259400 | grad_norm: 242.610794 | learning_rate: 0.000003
2025-04-12 13:06:33,635 - INFO - Epoch: 0.03 | loss: 23.930000 | grad_norm: 180.025772 | learning_rate: 0.000003
2025-04-12 13:06:38,887 - INFO - Epoch: 0.03 | loss: 23.324100 | grad_norm: 589.947571 | learning_rate: 0.000003
2025-04-12 13:06:43,399 - INFO - Epoch: 0.03 | loss: 18.778900 | grad_norm: 368.380524 | learning_rate: 0.000003
2025-04-12 13:06:47,966 - INFO - Epoch: 0.04 | loss: 17.444500 | grad_norm: 243.626511 | learning_rate: 0.000004
2025-04-12 13:06:52,757 - INFO - Epoch: 0.04 | loss: 19.247900 | grad_norm: 196.236847 | learning_rate: 0.000004
2025-04-12 13:06:57,724 - INFO - Epoch: 0.04 | loss: 19.234600 | grad_norm: 147.374191 | learning_rate: 0.000004
2025-04-12 13:07:02,525 - INFO - Epoch: 0.04 | loss: 18.150900 | grad_norm: 403.455231 | learning_rate: 0.000004
2025-04-12 13:07:07,402 - INFO - Epoch: 0.04 | loss: 18.727900 | grad_norm: 395.928955 | learning_rate: 0.000004
2025-04-12 13:07:12,693 - INFO - Epoch: 0.04 | loss: 12.666500 | grad_norm: 229.890106 | learning_rate: 0.000004
2025-04-12 13:07:17,500 - INFO - Epoch: 0.04 | loss: 17.782100 | grad_norm: 544.844543 | learning_rate: 0.000004
2025-04-12 13:07:22,182 - INFO - Epoch: 0.04 | loss: 15.039500 | grad_norm: 85.487595 | learning_rate: 0.000004
2025-04-12 13:07:26,989 - INFO - Epoch: 0.04 | loss: 13.855000 | grad_norm: 524.025269 | learning_rate: 0.000004
2025-04-12 13:07:31,592 - INFO - Epoch: 0.05 | loss: 13.171100 | grad_norm: 225.053772 | learning_rate: 0.000005
2025-04-12 13:07:36,790 - INFO - Epoch: 0.05 | loss: 16.996100 | grad_norm: 307.995178 | learning_rate: 0.000005
2025-04-12 13:07:41,716 - INFO - Epoch: 0.05 | loss: 17.788600 | grad_norm: 279.316956 | learning_rate: 0.000005
2025-04-12 13:07:46,791 - INFO - Epoch: 0.05 | loss: 17.880900 | grad_norm: 337.658447 | learning_rate: 0.000005
2025-04-12 13:07:51,791 - INFO - Epoch: 0.05 | loss: 18.573600 | grad_norm: 867.915283 | learning_rate: 0.000005
2025-04-12 13:07:56,962 - INFO - Epoch: 0.05 | loss: 14.010600 | grad_norm: 286.679047 | learning_rate: 0.000005
2025-04-12 13:08:01,976 - INFO - Epoch: 0.05 | loss: 16.094600 | grad_norm: 516.637878 | learning_rate: 0.000005
2025-04-12 13:08:07,094 - INFO - Epoch: 0.05 | loss: 16.508000 | grad_norm: 352.346252 | learning_rate: 0.000005
2025-04-12 13:08:11,847 - INFO - Epoch: 0.05 | loss: 14.087300 | grad_norm: 229.355164 | learning_rate: 0.000005
2025-04-12 13:08:16,545 - INFO - Epoch: 0.05 | loss: 12.506700 | grad_norm: 1041.391235 | learning_rate: 0.000005
2025-04-12 13:08:21,287 - INFO - Epoch: 0.06 | loss: 15.508500 | grad_norm: 205.268646 | learning_rate: 0.000006
2025-04-12 13:08:26,530 - INFO - Epoch: 0.06 | loss: 15.205600 | grad_norm: 194.818970 | learning_rate: 0.000006
2025-04-12 13:08:31,537 - INFO - Epoch: 0.06 | loss: 15.721500 | grad_norm: 347.851044 | learning_rate: 0.000006
2025-04-12 13:08:36,544 - INFO - Epoch: 0.06 | loss: 14.087700 | grad_norm: 212.622299 | learning_rate: 0.000006
2025-04-12 13:08:41,630 - INFO - Epoch: 0.06 | loss: 15.985800 | grad_norm: 304.040955 | learning_rate: 0.000006
2025-04-12 13:08:46,340 - INFO - Epoch: 0.06 | loss: 12.333700 | grad_norm: 140.206787 | learning_rate: 0.000006
2025-04-12 13:08:51,104 - INFO - Epoch: 0.06 | loss: 12.214800 | grad_norm: 353.609955 | learning_rate: 0.000006
2025-04-12 13:08:56,070 - INFO - Epoch: 0.06 | loss: 14.773400 | grad_norm: 247.484085 | learning_rate: 0.000006
2025-04-12 13:09:00,694 - INFO - Epoch: 0.06 | loss: 17.139800 | grad_norm: 347.786896 | learning_rate: 0.000006
2025-04-12 13:09:05,535 - INFO - Epoch: 0.06 | loss: 14.044100 | grad_norm: 203.295685 | learning_rate: 0.000006
2025-04-12 13:09:10,581 - INFO - Epoch: 0.07 | loss: 16.125900 | grad_norm: 360.842987 | learning_rate: 0.000007
2025-04-12 13:09:15,301 - INFO - Epoch: 0.07 | loss: 11.543000 | grad_norm: 289.093597 | learning_rate: 0.000007
2025-04-12 13:09:20,314 - INFO - Epoch: 0.07 | loss: 17.411800 | grad_norm: 369.311737 | learning_rate: 0.000007
2025-04-12 13:09:25,283 - INFO - Epoch: 0.07 | loss: 14.195800 | grad_norm: 148.265686 | learning_rate: 0.000007
2025-04-12 13:09:30,312 - INFO - Epoch: 0.07 | loss: 14.200000 | grad_norm: 5914.278320 | learning_rate: 0.000007
2025-04-12 13:09:35,466 - INFO - Epoch: 0.07 | loss: 11.618100 | grad_norm: 183.046478 | learning_rate: 0.000007
2025-04-12 13:09:40,041 - INFO - Epoch: 0.07 | loss: 15.876000 | grad_norm: 329.735809 | learning_rate: 0.000007
2025-04-12 13:09:45,089 - INFO - Epoch: 0.07 | loss: 15.193700 | grad_norm: 222.757660 | learning_rate: 0.000007
2025-04-12 13:09:49,945 - INFO - Epoch: 0.07 | loss: 11.622400 | grad_norm: 143.095200 | learning_rate: 0.000007
2025-04-12 13:09:54,720 - INFO - Epoch: 0.08 | loss: 12.488700 | grad_norm: 229.603699 | learning_rate: 0.000007
2025-04-12 13:09:59,426 - INFO - Epoch: 0.08 | loss: 13.804100 | grad_norm: 227.989349 | learning_rate: 0.000008
2025-04-12 13:10:04,301 - INFO - Epoch: 0.08 | loss: 12.408300 | grad_norm: 386.193726 | learning_rate: 0.000008
2025-04-12 13:10:09,020 - INFO - Epoch: 0.08 | loss: 12.478000 | grad_norm: 260.824127 | learning_rate: 0.000008
2025-04-12 13:10:13,686 - INFO - Epoch: 0.08 | loss: 10.913700 | grad_norm: 412.287415 | learning_rate: 0.000008
2025-04-12 13:10:18,134 - INFO - Epoch: 0.08 | loss: 13.846700 | grad_norm: 209.536148 | learning_rate: 0.000008
2025-04-12 13:10:23,349 - INFO - Epoch: 0.08 | loss: 14.299400 | grad_norm: 306.600891 | learning_rate: 0.000008
2025-04-12 13:10:28,371 - INFO - Epoch: 0.08 | loss: 13.481700 | grad_norm: 242.817383 | learning_rate: 0.000008
2025-04-12 13:10:33,381 - INFO - Epoch: 0.08 | loss: 18.169100 | grad_norm: 172.849060 | learning_rate: 0.000008
2025-04-12 13:10:38,087 - INFO - Epoch: 0.08 | loss: 14.438300 | grad_norm: 146.484772 | learning_rate: 0.000008
2025-04-12 13:10:43,094 - INFO - Epoch: 0.09 | loss: 12.364200 | grad_norm: 278.056274 | learning_rate: 0.000009
2025-04-12 13:10:47,970 - INFO - Epoch: 0.09 | loss: 15.034000 | grad_norm: 372.200623 | learning_rate: 0.000009
2025-04-12 13:10:52,898 - INFO - Epoch: 0.09 | loss: 12.979900 | grad_norm: 228.185608 | learning_rate: 0.000009
2025-04-12 13:10:57,652 - INFO - Epoch: 0.09 | loss: 10.969300 | grad_norm: 398.439026 | learning_rate: 0.000009
2025-04-12 13:11:02,798 - INFO - Epoch: 0.09 | loss: 13.686000 | grad_norm: 200.278397 | learning_rate: 0.000009
2025-04-12 13:11:07,638 - INFO - Epoch: 0.09 | loss: 10.847800 | grad_norm: 257.384338 | learning_rate: 0.000009
2025-04-12 13:11:12,281 - INFO - Epoch: 0.09 | loss: 11.342200 | grad_norm: 201.933990 | learning_rate: 0.000009
2025-04-12 13:11:17,642 - INFO - Epoch: 0.09 | loss: 11.669800 | grad_norm: 648.375488 | learning_rate: 0.000009
2025-04-12 13:11:22,607 - INFO - Epoch: 0.09 | loss: 9.725600 | grad_norm: 221.250824 | learning_rate: 0.000009
2025-04-12 13:11:27,603 - INFO - Epoch: 0.09 | loss: 13.729000 | grad_norm: 318.458221 | learning_rate: 0.000009
2025-04-12 13:11:32,917 - INFO - Epoch: 0.10 | loss: 14.026800 | grad_norm: 399.364288 | learning_rate: 0.000010
2025-04-12 13:11:37,630 - INFO - Epoch: 0.10 | loss: 8.859800 | grad_norm: 242.018692 | learning_rate: 0.000010
2025-04-12 13:11:42,337 - INFO - Epoch: 0.10 | loss: 10.120700 | grad_norm: 256.770264 | learning_rate: 0.000010
2025-04-12 13:11:46,893 - INFO - Epoch: 0.10 | loss: 10.423500 | grad_norm: 146.113419 | learning_rate: 0.000010
2025-04-12 13:11:51,896 - INFO - Epoch: 0.10 | loss: 14.408400 | grad_norm: 110.297394 | learning_rate: 0.000010
2025-04-12 13:11:56,793 - INFO - Epoch: 0.10 | loss: 13.246900 | grad_norm: 360.215668 | learning_rate: 0.000010
2025-04-12 13:12:01,321 - INFO - Epoch: 0.10 | loss: 12.821900 | grad_norm: 303.729004 | learning_rate: 0.000010
2025-04-12 13:12:06,479 - INFO - Epoch: 0.10 | loss: 10.633800 | grad_norm: 258.658783 | learning_rate: 0.000010
2025-04-12 13:12:11,163 - INFO - Epoch: 0.10 | loss: 9.851800 | grad_norm: 224.755554 | learning_rate: 0.000010
2025-04-12 13:12:16,410 - INFO - Epoch: 0.10 | loss: 11.058000 | grad_norm: 142.910080 | learning_rate: 0.000010
2025-04-12 13:12:21,922 - INFO - Epoch: 0.11 | loss: 12.503400 | grad_norm: 327.789062 | learning_rate: 0.000010
2025-04-12 13:12:26,895 - INFO - Epoch: 0.11 | loss: 12.233800 | grad_norm: 241.726212 | learning_rate: 0.000010
2025-04-12 13:12:31,957 - INFO - Epoch: 0.11 | loss: 10.690800 | grad_norm: 309.933502 | learning_rate: 0.000010
2025-04-12 13:12:36,890 - INFO - Epoch: 0.11 | loss: 11.568500 | grad_norm: 173.722412 | learning_rate: 0.000010
2025-04-12 13:12:41,902 - INFO - Epoch: 0.11 | loss: 10.906800 | grad_norm: 215.633133 | learning_rate: 0.000010
2025-04-12 13:12:46,436 - INFO - Epoch: 0.11 | loss: 8.508900 | grad_norm: 109.257668 | learning_rate: 0.000010
2025-04-12 13:12:51,363 - INFO - Epoch: 0.11 | loss: 11.365800 | grad_norm: 302.849915 | learning_rate: 0.000010
2025-04-12 13:12:55,991 - INFO - Epoch: 0.11 | loss: 10.441400 | grad_norm: 203.624191 | learning_rate: 0.000010
2025-04-12 13:13:00,500 - INFO - Epoch: 0.11 | loss: 12.760100 | grad_norm: 233.451630 | learning_rate: 0.000010
2025-04-12 13:13:05,166 - INFO - Epoch: 0.12 | loss: 9.600200 | grad_norm: 174.898682 | learning_rate: 0.000010
2025-04-12 13:13:09,864 - INFO - Epoch: 0.12 | loss: 9.026700 | grad_norm: 160.688339 | learning_rate: 0.000010
2025-04-12 13:13:14,671 - INFO - Epoch: 0.12 | loss: 13.254200 | grad_norm: 220.910706 | learning_rate: 0.000010
2025-04-12 13:13:19,441 - INFO - Epoch: 0.12 | loss: 11.850900 | grad_norm: 378.505585 | learning_rate: 0.000010
2025-04-12 13:13:24,500 - INFO - Epoch: 0.12 | loss: 9.920900 | grad_norm: 119.590233 | learning_rate: 0.000010
2025-04-12 13:13:29,406 - INFO - Epoch: 0.12 | loss: 9.538100 | grad_norm: 123.160431 | learning_rate: 0.000010
2025-04-12 13:13:34,243 - INFO - Epoch: 0.12 | loss: 8.840100 | grad_norm: 239.793457 | learning_rate: 0.000010
2025-04-12 13:13:39,383 - INFO - Epoch: 0.12 | loss: 11.353500 | grad_norm: 177.385834 | learning_rate: 0.000010
2025-04-12 13:13:44,359 - INFO - Epoch: 0.12 | loss: 11.843200 | grad_norm: 222.862488 | learning_rate: 0.000010
2025-04-12 13:13:49,330 - INFO - Epoch: 0.12 | loss: 12.599700 | grad_norm: 262.372314 | learning_rate: 0.000010
2025-04-12 13:13:54,453 - INFO - Epoch: 0.13 | loss: 10.381500 | grad_norm: 157.198944 | learning_rate: 0.000010
2025-04-12 13:13:59,525 - INFO - Epoch: 0.13 | loss: 8.941200 | grad_norm: 204.814377 | learning_rate: 0.000010
2025-04-12 13:14:04,451 - INFO - Epoch: 0.13 | loss: 12.429500 | grad_norm: 429.325104 | learning_rate: 0.000010
2025-04-12 13:14:09,308 - INFO - Epoch: 0.13 | loss: 9.601000 | grad_norm: 170.943161 | learning_rate: 0.000010
2025-04-12 13:14:14,010 - INFO - Epoch: 0.13 | loss: 9.504600 | grad_norm: 166.994324 | learning_rate: 0.000010
2025-04-12 13:14:18,706 - INFO - Epoch: 0.13 | loss: 7.949500 | grad_norm: 177.706299 | learning_rate: 0.000010
2025-04-12 13:14:23,704 - INFO - Epoch: 0.13 | loss: 10.319200 | grad_norm: 37.959476 | learning_rate: 0.000010
2025-04-12 13:14:28,357 - INFO - Epoch: 0.13 | loss: 11.337800 | grad_norm: 268.395691 | learning_rate: 0.000010
2025-04-12 13:14:32,850 - INFO - Epoch: 0.13 | loss: 10.926000 | grad_norm: 132.685593 | learning_rate: 0.000010
2025-04-12 13:14:37,612 - INFO - Epoch: 0.13 | loss: 10.744500 | grad_norm: 211.276596 | learning_rate: 0.000010
2025-04-12 13:14:42,888 - INFO - Epoch: 0.14 | loss: 9.679200 | grad_norm: 237.727524 | learning_rate: 0.000010
2025-04-12 13:14:47,798 - INFO - Epoch: 0.14 | loss: 9.869500 | grad_norm: 114.600510 | learning_rate: 0.000010
2025-04-12 13:14:52,563 - INFO - Epoch: 0.14 | loss: 12.670200 | grad_norm: 356.117615 | learning_rate: 0.000010
2025-04-12 13:14:57,011 - INFO - Epoch: 0.14 | loss: 9.825200 | grad_norm: 190.207169 | learning_rate: 0.000010
2025-04-12 13:15:01,885 - INFO - Epoch: 0.14 | loss: 8.672400 | grad_norm: 43.438644 | learning_rate: 0.000010
2025-04-12 13:15:06,752 - INFO - Epoch: 0.14 | loss: 13.132600 | grad_norm: 198.819275 | learning_rate: 0.000010
2025-04-12 13:15:11,655 - INFO - Epoch: 0.14 | loss: 10.446000 | grad_norm: 160.790024 | learning_rate: 0.000010
2025-04-12 13:15:16,646 - INFO - Epoch: 0.14 | loss: 10.615300 | grad_norm: 261.243927 | learning_rate: 0.000010
2025-04-12 13:15:21,473 - INFO - Epoch: 0.14 | loss: 13.122000 | grad_norm: 217.565292 | learning_rate: 0.000010
2025-04-12 13:15:26,173 - INFO - Epoch: 0.14 | loss: 11.326900 | grad_norm: 361.550934 | learning_rate: 0.000010
2025-04-12 13:15:31,179 - INFO - Epoch: 0.15 | loss: 11.022500 | grad_norm: 313.673859 | learning_rate: 0.000009
2025-04-12 13:15:35,918 - INFO - Epoch: 0.15 | loss: 9.279100 | grad_norm: 141.020462 | learning_rate: 0.000009
2025-04-12 13:15:41,556 - INFO - Epoch: 0.15 | loss: 8.138300 | grad_norm: 176.172974 | learning_rate: 0.000009
2025-04-12 13:15:46,891 - INFO - Epoch: 0.15 | loss: 9.171700 | grad_norm: 179.364594 | learning_rate: 0.000009
2025-04-12 13:15:52,039 - INFO - Epoch: 0.15 | loss: 9.265600 | grad_norm: 116.765434 | learning_rate: 0.000009
2025-04-12 13:15:57,061 - INFO - Epoch: 0.15 | loss: 8.886200 | grad_norm: 168.787003 | learning_rate: 0.000009
2025-04-12 13:16:01,951 - INFO - Epoch: 0.15 | loss: 9.663300 | grad_norm: 122.484428 | learning_rate: 0.000009
2025-04-12 13:16:06,787 - INFO - Epoch: 0.15 | loss: 8.221800 | grad_norm: 121.013901 | learning_rate: 0.000009
2025-04-12 13:16:11,560 - INFO - Epoch: 0.15 | loss: 10.473300 | grad_norm: 143.095215 | learning_rate: 0.000009
2025-04-12 13:16:16,325 - INFO - Epoch: 0.16 | loss: 11.047000 | grad_norm: 170.376892 | learning_rate: 0.000009
2025-04-12 13:16:21,176 - INFO - Epoch: 0.16 | loss: 7.486800 | grad_norm: 187.857925 | learning_rate: 0.000009
2025-04-12 13:16:26,195 - INFO - Epoch: 0.16 | loss: 11.668000 | grad_norm: 122.206276 | learning_rate: 0.000009
2025-04-12 13:16:30,935 - INFO - Epoch: 0.16 | loss: 9.785100 | grad_norm: 203.279541 | learning_rate: 0.000009
2025-04-12 13:16:35,552 - INFO - Epoch: 0.16 | loss: 10.121500 | grad_norm: 111.596519 | learning_rate: 0.000009
2025-04-12 13:16:40,302 - INFO - Epoch: 0.16 | loss: 9.139100 | grad_norm: 171.863174 | learning_rate: 0.000009
2025-04-12 13:16:45,421 - INFO - Epoch: 0.16 | loss: 12.535300 | grad_norm: 478.663269 | learning_rate: 0.000009
2025-04-12 13:16:50,139 - INFO - Epoch: 0.16 | loss: 9.026900 | grad_norm: 136.858490 | learning_rate: 0.000009
2025-04-12 13:16:55,067 - INFO - Epoch: 0.16 | loss: 8.993200 | grad_norm: 228.719360 | learning_rate: 0.000009
2025-04-12 13:16:59,880 - INFO - Epoch: 0.16 | loss: 9.624900 | grad_norm: 196.997406 | learning_rate: 0.000009
2025-04-12 13:17:04,556 - INFO - Epoch: 0.17 | loss: 9.369600 | grad_norm: 157.081589 | learning_rate: 0.000009
2025-04-12 13:17:09,207 - INFO - Epoch: 0.17 | loss: 7.470200 | grad_norm: 384.407654 | learning_rate: 0.000009
2025-04-12 13:17:13,981 - INFO - Epoch: 0.17 | loss: 11.250200 | grad_norm: 98.221336 | learning_rate: 0.000009
2025-04-12 13:17:18,704 - INFO - Epoch: 0.17 | loss: 7.889400 | grad_norm: 403.473511 | learning_rate: 0.000009
2025-04-12 13:17:23,464 - INFO - Epoch: 0.17 | loss: 7.950600 | grad_norm: 175.380264 | learning_rate: 0.000009
2025-04-12 13:17:28,450 - INFO - Epoch: 0.17 | loss: 8.299900 | grad_norm: 138.773834 | learning_rate: 0.000009
2025-04-12 13:17:33,176 - INFO - Epoch: 0.17 | loss: 11.407800 | grad_norm: 126.975990 | learning_rate: 0.000009
2025-04-12 13:17:38,267 - INFO - Epoch: 0.17 | loss: 10.452800 | grad_norm: 199.321426 | learning_rate: 0.000009
2025-04-12 13:17:42,950 - INFO - Epoch: 0.17 | loss: 13.838900 | grad_norm: 125.865280 | learning_rate: 0.000009
2025-04-12 13:17:48,096 - INFO - Epoch: 0.17 | loss: 10.910900 | grad_norm: 119.468239 | learning_rate: 0.000009
2025-04-12 13:17:53,393 - INFO - Epoch: 0.18 | loss: 10.052200 | grad_norm: 127.475319 | learning_rate: 0.000009
2025-04-12 13:17:58,483 - INFO - Epoch: 0.18 | loss: 8.309900 | grad_norm: 85.123940 | learning_rate: 0.000009
2025-04-12 13:18:03,374 - INFO - Epoch: 0.18 | loss: 12.824100 | grad_norm: 70.342445 | learning_rate: 0.000009
2025-04-12 13:18:08,399 - INFO - Epoch: 0.18 | loss: 8.963400 | grad_norm: 239.226059 | learning_rate: 0.000009
2025-04-12 13:18:13,461 - INFO - Epoch: 0.18 | loss: 10.422200 | grad_norm: 160.722229 | learning_rate: 0.000009
2025-04-12 13:18:18,594 - INFO - Epoch: 0.18 | loss: 9.861600 | grad_norm: 239.182755 | learning_rate: 0.000009
2025-04-12 13:18:23,712 - INFO - Epoch: 0.18 | loss: 9.840100 | grad_norm: 155.232559 | learning_rate: 0.000009
2025-04-12 13:18:28,658 - INFO - Epoch: 0.18 | loss: 8.459300 | grad_norm: 118.197296 | learning_rate: 0.000009
2025-04-12 13:18:33,797 - INFO - Epoch: 0.18 | loss: 11.507100 | grad_norm: 493.543488 | learning_rate: 0.000009
2025-04-12 13:18:38,324 - INFO - Epoch: 0.18 | loss: 9.893900 | grad_norm: 192.767975 | learning_rate: 0.000009
2025-04-12 13:18:43,871 - INFO - Epoch: 0.19 | loss: 11.987100 | grad_norm: 127.520386 | learning_rate: 0.000009
2025-04-12 13:18:48,541 - INFO - Epoch: 0.19 | loss: 8.927000 | grad_norm: 130.875107 | learning_rate: 0.000009
2025-04-12 13:18:53,339 - INFO - Epoch: 0.19 | loss: 11.326400 | grad_norm: 372.239288 | learning_rate: 0.000009
2025-04-12 13:18:58,231 - INFO - Epoch: 0.19 | loss: 8.505300 | grad_norm: 158.539932 | learning_rate: 0.000009
2025-04-12 13:19:03,174 - INFO - Epoch: 0.19 | loss: 11.918800 | grad_norm: 143.806641 | learning_rate: 0.000009
2025-04-12 13:19:08,072 - INFO - Epoch: 0.19 | loss: 10.185900 | grad_norm: 181.976791 | learning_rate: 0.000009
2025-04-12 13:19:13,169 - INFO - Epoch: 0.19 | loss: 9.336900 | grad_norm: 235.125412 | learning_rate: 0.000009
2025-04-12 13:19:18,134 - INFO - Epoch: 0.19 | loss: 10.596800 | grad_norm: 153.295563 | learning_rate: 0.000009
2025-04-12 13:19:23,291 - INFO - Epoch: 0.19 | loss: 11.532400 | grad_norm: 180.517410 | learning_rate: 0.000009
2025-04-12 13:19:28,087 - INFO - Epoch: 0.20 | loss: 7.950800 | grad_norm: 89.394417 | learning_rate: 0.000009
2025-04-12 13:19:33,038 - INFO - Epoch: 0.20 | loss: 11.513200 | grad_norm: 220.705627 | learning_rate: 0.000009
2025-04-12 13:19:37,835 - INFO - Epoch: 0.20 | loss: 9.515300 | grad_norm: 158.275833 | learning_rate: 0.000009
2025-04-12 13:19:42,642 - INFO - Epoch: 0.20 | loss: 7.920000 | grad_norm: 192.083725 | learning_rate: 0.000009
2025-04-12 13:19:47,685 - INFO - Epoch: 0.20 | loss: 10.522000 | grad_norm: 176.627640 | learning_rate: 0.000009
2025-04-12 13:19:52,612 - INFO - Epoch: 0.20 | loss: 8.986400 | grad_norm: 69.967316 | learning_rate: 0.000009
2025-04-12 13:19:57,338 - INFO - Epoch: 0.20 | loss: 7.393300 | grad_norm: 139.512222 | learning_rate: 0.000009
2025-04-12 13:20:02,631 - INFO - Epoch: 0.20 | loss: 11.286400 | grad_norm: 139.708160 | learning_rate: 0.000009
2025-04-12 13:20:07,536 - INFO - Epoch: 0.20 | loss: 9.165600 | grad_norm: 129.656143 | learning_rate: 0.000009
2025-04-12 13:20:12,144 - INFO - Epoch: 0.20 | loss: 8.093200 | grad_norm: 185.253296 | learning_rate: 0.000009
2025-04-12 13:20:17,360 - INFO - Epoch: 0.21 | loss: 10.281400 | grad_norm: 123.848625 | learning_rate: 0.000009
2025-04-12 13:20:22,546 - INFO - Epoch: 0.21 | loss: 9.330700 | grad_norm: 96.689911 | learning_rate: 0.000009
2025-04-12 13:20:27,460 - INFO - Epoch: 0.21 | loss: 9.685800 | grad_norm: 246.061371 | learning_rate: 0.000009
2025-04-12 13:20:32,654 - INFO - Epoch: 0.21 | loss: 9.645400 | grad_norm: 118.117348 | learning_rate: 0.000009
2025-04-12 13:20:37,420 - INFO - Epoch: 0.21 | loss: 11.327300 | grad_norm: 735.560608 | learning_rate: 0.000009
2025-04-12 13:20:42,195 - INFO - Epoch: 0.21 | loss: 8.779200 | grad_norm: 162.624527 | learning_rate: 0.000009
2025-04-12 13:20:47,100 - INFO - Epoch: 0.21 | loss: 8.040800 | grad_norm: 186.949463 | learning_rate: 0.000009
2025-04-12 13:20:52,182 - INFO - Epoch: 0.21 | loss: 11.939700 | grad_norm: 238.575089 | learning_rate: 0.000009
2025-04-12 13:20:57,466 - INFO - Epoch: 0.21 | loss: 8.386900 | grad_norm: 206.860916 | learning_rate: 0.000009
2025-04-12 13:21:02,273 - INFO - Epoch: 0.21 | loss: 10.324900 | grad_norm: 220.601776 | learning_rate: 0.000009
2025-04-12 13:21:07,134 - INFO - Epoch: 0.22 | loss: 12.337700 | grad_norm: 258.849548 | learning_rate: 0.000009
2025-04-12 13:21:11,716 - INFO - Epoch: 0.22 | loss: 8.690800 | grad_norm: 118.897179 | learning_rate: 0.000009
2025-04-12 13:21:16,157 - INFO - Epoch: 0.22 | loss: 8.456300 | grad_norm: 113.506714 | learning_rate: 0.000009
2025-04-12 13:21:20,823 - INFO - Epoch: 0.22 | loss: 10.193700 | grad_norm: 230.971069 | learning_rate: 0.000009
2025-04-12 13:21:25,942 - INFO - Epoch: 0.22 | loss: 10.914600 | grad_norm: 359.075287 | learning_rate: 0.000009
2025-04-12 13:21:30,914 - INFO - Epoch: 0.22 | loss: 9.846200 | grad_norm: 95.352348 | learning_rate: 0.000009
2025-04-12 13:21:36,184 - INFO - Epoch: 0.22 | loss: 9.183500 | grad_norm: 108.590225 | learning_rate: 0.000009
2025-04-12 13:21:41,173 - INFO - Epoch: 0.22 | loss: 8.491400 | grad_norm: 236.502258 | learning_rate: 0.000009
2025-04-12 13:21:46,268 - INFO - Epoch: 0.22 | loss: 8.148300 | grad_norm: 93.073677 | learning_rate: 0.000009
2025-04-12 13:21:51,279 - INFO - Epoch: 0.23 | loss: 8.646700 | grad_norm: 215.540909 | learning_rate: 0.000009
2025-04-12 13:21:56,243 - INFO - Epoch: 0.23 | loss: 10.064000 | grad_norm: 119.493904 | learning_rate: 0.000009
2025-04-12 13:22:01,138 - INFO - Epoch: 0.23 | loss: 8.380300 | grad_norm: 151.765625 | learning_rate: 0.000009
2025-04-12 13:22:06,243 - INFO - Epoch: 0.23 | loss: 8.200500 | grad_norm: 192.291504 | learning_rate: 0.000009
2025-04-12 13:22:11,509 - INFO - Epoch: 0.23 | loss: 12.022300 | grad_norm: 77.321487 | learning_rate: 0.000009
2025-04-12 13:22:16,526 - INFO - Epoch: 0.23 | loss: 7.255800 | grad_norm: 136.719971 | learning_rate: 0.000009
2025-04-12 13:22:21,685 - INFO - Epoch: 0.23 | loss: 11.603400 | grad_norm: 200.584198 | learning_rate: 0.000009
2025-04-12 13:22:26,330 - INFO - Epoch: 0.23 | loss: 8.317700 | grad_norm: 254.862640 | learning_rate: 0.000009
2025-04-12 13:22:31,150 - INFO - Epoch: 0.23 | loss: 8.072700 | grad_norm: 129.506866 | learning_rate: 0.000009
2025-04-12 13:22:35,740 - INFO - Epoch: 0.23 | loss: 9.700600 | grad_norm: 126.428780 | learning_rate: 0.000009
2025-04-12 13:22:40,952 - INFO - Epoch: 0.24 | loss: 10.537700 | grad_norm: 170.902603 | learning_rate: 0.000008
2025-04-12 13:22:45,861 - INFO - Epoch: 0.24 | loss: 10.995200 | grad_norm: 186.118927 | learning_rate: 0.000008
2025-04-12 13:22:50,802 - INFO - Epoch: 0.24 | loss: 8.657600 | grad_norm: 149.398468 | learning_rate: 0.000008
2025-04-12 13:22:55,723 - INFO - Epoch: 0.24 | loss: 9.032900 | grad_norm: 250.721375 | learning_rate: 0.000008
2025-04-12 13:23:00,333 - INFO - Epoch: 0.24 | loss: 9.792100 | grad_norm: 261.121857 | learning_rate: 0.000008
2025-04-12 13:23:05,312 - INFO - Epoch: 0.24 | loss: 9.730500 | grad_norm: 203.776016 | learning_rate: 0.000008
2025-04-12 13:23:10,137 - INFO - Epoch: 0.24 | loss: 9.312600 | grad_norm: 63.727493 | learning_rate: 0.000008
2025-04-12 13:23:15,009 - INFO - Epoch: 0.24 | loss: 9.760600 | grad_norm: 128.871857 | learning_rate: 0.000008
2025-04-12 13:23:19,778 - INFO - Epoch: 0.24 | loss: 8.218200 | grad_norm: 109.720314 | learning_rate: 0.000008
2025-04-12 13:23:24,597 - INFO - Epoch: 0.24 | loss: 9.186700 | grad_norm: 126.330658 | learning_rate: 0.000008
2025-04-12 13:23:29,785 - INFO - Epoch: 0.25 | loss: 11.490500 | grad_norm: 332.805328 | learning_rate: 0.000008
2025-04-12 13:23:34,616 - INFO - Epoch: 0.25 | loss: 8.006900 | grad_norm: 65.856461 | learning_rate: 0.000008
2025-04-12 13:23:39,894 - INFO - Epoch: 0.25 | loss: 9.076200 | grad_norm: 80.843346 | learning_rate: 0.000008
2025-04-12 13:23:44,735 - INFO - Epoch: 0.25 | loss: 8.252900 | grad_norm: 140.454056 | learning_rate: 0.000008
2025-04-12 13:23:49,647 - INFO - Epoch: 0.25 | loss: 9.478700 | grad_norm: 211.163757 | learning_rate: 0.000008
2025-04-12 13:23:54,971 - INFO - Epoch: 0.25 | loss: 10.933100 | grad_norm: 108.509666 | learning_rate: 0.000008
2025-04-12 13:23:59,753 - INFO - Epoch: 0.25 | loss: 8.078800 | grad_norm: 304.605865 | learning_rate: 0.000008
2025-04-12 13:24:04,914 - INFO - Epoch: 0.25 | loss: 11.836200 | grad_norm: 92.139931 | learning_rate: 0.000008
2025-04-12 13:24:09,634 - INFO - Epoch: 0.25 | loss: 9.477000 | grad_norm: 213.685043 | learning_rate: 0.000008
2025-04-12 13:24:14,745 - INFO - Epoch: 0.25 | loss: 10.333000 | grad_norm: 104.636925 | learning_rate: 0.000008
2025-04-12 13:24:19,576 - INFO - Epoch: 0.26 | loss: 8.542600 | grad_norm: 189.688599 | learning_rate: 0.000008
2025-04-12 13:24:24,135 - INFO - Epoch: 0.26 | loss: 6.630400 | grad_norm: 103.268814 | learning_rate: 0.000008
2025-04-12 13:24:28,915 - INFO - Epoch: 0.26 | loss: 11.098800 | grad_norm: 389.465973 | learning_rate: 0.000008
2025-04-12 13:24:34,030 - INFO - Epoch: 0.26 | loss: 9.243500 | grad_norm: 193.817886 | learning_rate: 0.000008
2025-04-12 13:24:38,702 - INFO - Epoch: 0.26 | loss: 9.214200 | grad_norm: 221.601730 | learning_rate: 0.000008
2025-04-12 13:24:43,701 - INFO - Epoch: 0.26 | loss: 11.502000 | grad_norm: 146.895599 | learning_rate: 0.000008
2025-04-12 13:24:48,474 - INFO - Epoch: 0.26 | loss: 9.123200 | grad_norm: 160.744232 | learning_rate: 0.000008
2025-04-12 13:24:52,982 - INFO - Epoch: 0.26 | loss: 7.053700 | grad_norm: 103.487732 | learning_rate: 0.000008
2025-04-12 13:24:57,610 - INFO - Epoch: 0.26 | loss: 11.391600 | grad_norm: 195.566742 | learning_rate: 0.000008
2025-04-12 13:25:02,467 - INFO - Epoch: 0.27 | loss: 7.873100 | grad_norm: 184.171875 | learning_rate: 0.000008
2025-04-12 13:25:07,470 - INFO - Epoch: 0.27 | loss: 11.293600 | grad_norm: 133.982697 | learning_rate: 0.000008
2025-04-12 13:25:12,257 - INFO - Epoch: 0.27 | loss: 8.463800 | grad_norm: 220.445770 | learning_rate: 0.000008
2025-04-12 13:25:17,269 - INFO - Epoch: 0.27 | loss: 9.302400 | grad_norm: 294.414337 | learning_rate: 0.000008
2025-04-12 13:25:22,051 - INFO - Epoch: 0.27 | loss: 9.946100 | grad_norm: 149.248917 | learning_rate: 0.000008
2025-04-12 13:25:26,938 - INFO - Epoch: 0.27 | loss: 7.441700 | grad_norm: 100.579521 | learning_rate: 0.000008
2025-04-12 13:25:31,719 - INFO - Epoch: 0.27 | loss: 9.369900 | grad_norm: 94.863937 | learning_rate: 0.000008
2025-04-12 13:25:36,419 - INFO - Epoch: 0.27 | loss: 7.556200 | grad_norm: 27.794888 | learning_rate: 0.000008
2025-04-12 13:25:41,450 - INFO - Epoch: 0.27 | loss: 9.516300 | grad_norm: 725.003418 | learning_rate: 0.000008
2025-04-12 13:25:45,917 - INFO - Epoch: 0.27 | loss: 8.697600 | grad_norm: 47.667351 | learning_rate: 0.000008
2025-04-12 13:25:50,642 - INFO - Epoch: 0.28 | loss: 7.805500 | grad_norm: 96.199341 | learning_rate: 0.000008
2025-04-12 13:25:55,884 - INFO - Epoch: 0.28 | loss: 10.508600 | grad_norm: 161.221939 | learning_rate: 0.000008
2025-04-12 13:26:00,952 - INFO - Epoch: 0.28 | loss: 10.779800 | grad_norm: 143.742081 | learning_rate: 0.000008
2025-04-12 13:26:06,039 - INFO - Epoch: 0.28 | loss: 10.033800 | grad_norm: 451.849701 | learning_rate: 0.000008
2025-04-12 13:26:11,425 - INFO - Epoch: 0.28 | loss: 10.052900 | grad_norm: 202.455109 | learning_rate: 0.000008
2025-04-12 13:26:16,284 - INFO - Epoch: 0.28 | loss: 10.048600 | grad_norm: 90.038460 | learning_rate: 0.000008
2025-04-12 13:26:21,138 - INFO - Epoch: 0.28 | loss: 8.175400 | grad_norm: 319.227448 | learning_rate: 0.000008
2025-04-12 13:26:26,026 - INFO - Epoch: 0.28 | loss: 7.857400 | grad_norm: 244.422607 | learning_rate: 0.000008
2025-04-12 13:26:30,796 - INFO - Epoch: 0.28 | loss: 8.614600 | grad_norm: 159.747742 | learning_rate: 0.000008
2025-04-12 13:26:35,539 - INFO - Epoch: 0.28 | loss: 7.217000 | grad_norm: 100.862137 | learning_rate: 0.000008
2025-04-12 13:26:40,717 - INFO - Epoch: 0.29 | loss: 12.090500 | grad_norm: 76.987381 | learning_rate: 0.000008
2025-04-12 13:26:46,390 - INFO - Epoch: 0.29 | loss: 11.955300 | grad_norm: 212.438522 | learning_rate: 0.000008
2025-04-12 13:26:51,842 - INFO - Epoch: 0.29 | loss: 11.392700 | grad_norm: 110.264908 | learning_rate: 0.000008
2025-04-12 13:26:57,064 - INFO - Epoch: 0.29 | loss: 10.763500 | grad_norm: 162.125595 | learning_rate: 0.000008
2025-04-12 13:27:02,124 - INFO - Epoch: 0.29 | loss: 8.373400 | grad_norm: 126.838699 | learning_rate: 0.000008
2025-04-12 13:27:07,225 - INFO - Epoch: 0.29 | loss: 8.174600 | grad_norm: 113.727081 | learning_rate: 0.000008
2025-04-12 13:27:12,201 - INFO - Epoch: 0.29 | loss: 9.299900 | grad_norm: 147.281601 | learning_rate: 0.000008
2025-04-12 13:27:17,512 - INFO - Epoch: 0.29 | loss: 10.503900 | grad_norm: 401.852814 | learning_rate: 0.000008
2025-04-12 13:27:22,792 - INFO - Epoch: 0.29 | loss: 8.194800 | grad_norm: 154.444260 | learning_rate: 0.000008
2025-04-12 13:27:27,446 - INFO - Epoch: 0.29 | loss: 9.188200 | grad_norm: 175.884293 | learning_rate: 0.000008
2025-04-12 13:27:32,261 - INFO - Epoch: 0.30 | loss: 9.333600 | grad_norm: 66.942772 | learning_rate: 0.000008
2025-04-12 13:27:36,997 - INFO - Epoch: 0.30 | loss: 9.196500 | grad_norm: 97.349648 | learning_rate: 0.000008
2025-04-12 13:27:41,744 - INFO - Epoch: 0.30 | loss: 8.980600 | grad_norm: 155.899521 | learning_rate: 0.000008
2025-04-12 13:27:46,992 - INFO - Epoch: 0.30 | loss: 10.261700 | grad_norm: 311.519012 | learning_rate: 0.000008
2025-04-12 13:27:51,764 - INFO - Epoch: 0.30 | loss: 8.733800 | grad_norm: 231.642136 | learning_rate: 0.000008
2025-04-12 13:27:56,773 - INFO - Epoch: 0.30 | loss: 6.817700 | grad_norm: 97.979324 | learning_rate: 0.000008
2025-04-12 13:28:01,575 - INFO - Epoch: 0.30 | loss: 10.759800 | grad_norm: 133.407913 | learning_rate: 0.000008
2025-04-12 13:28:06,152 - INFO - Epoch: 0.30 | loss: 7.480200 | grad_norm: 84.582497 | learning_rate: 0.000008
2025-04-12 13:28:10,770 - INFO - Epoch: 0.30 | loss: 11.834400 | grad_norm: 149.337982 | learning_rate: 0.000008
2025-04-12 13:28:15,763 - INFO - Epoch: 0.31 | loss: 8.521600 | grad_norm: 95.241623 | learning_rate: 0.000008
2025-04-12 13:28:20,714 - INFO - Epoch: 0.31 | loss: 9.920000 | grad_norm: 113.446312 | learning_rate: 0.000008
2025-04-12 13:28:25,746 - INFO - Epoch: 0.31 | loss: 7.381100 | grad_norm: 87.384842 | learning_rate: 0.000008
2025-04-12 13:28:30,447 - INFO - Epoch: 0.31 | loss: 9.909100 | grad_norm: 199.202820 | learning_rate: 0.000008
2025-04-12 13:28:35,531 - INFO - 
Epoch: 0.31 | loss: 8.045000 | grad_norm: 118.463348 | learning_rate: 0.000008 2025-04-12 13:28:40,459 - INFO - Epoch: 0.31 | loss: 7.967100 | grad_norm: 259.852692 | learning_rate: 0.000008 2025-04-12 13:28:45,398 - INFO - Epoch: 0.31 | loss: 9.713800 | grad_norm: 164.521454 | learning_rate: 0.000008 2025-04-12 13:28:50,228 - INFO - Epoch: 0.31 | loss: 6.237700 | grad_norm: 136.256454 | learning_rate: 0.000008 2025-04-12 13:28:55,021 - INFO - Epoch: 0.31 | loss: 7.804700 | grad_norm: 155.859436 | learning_rate: 0.000008 2025-04-12 13:28:59,943 - INFO - Epoch: 0.31 | loss: 9.355900 | grad_norm: 157.318329 | learning_rate: 0.000008 2025-04-12 13:29:04,746 - INFO - Epoch: 0.32 | loss: 8.048700 | grad_norm: 199.498703 | learning_rate: 0.000008 2025-04-12 13:29:09,813 - INFO - Epoch: 0.32 | loss: 8.833600 | grad_norm: 114.733986 | learning_rate: 0.000008 2025-04-12 13:29:14,966 - INFO - Epoch: 0.32 | loss: 7.493800 | grad_norm: 236.342178 | learning_rate: 0.000008 2025-04-12 13:29:19,613 - INFO - Epoch: 0.32 | loss: 6.883200 | grad_norm: 145.669434 | learning_rate: 0.000008 2025-04-12 13:29:24,547 - INFO - Epoch: 0.32 | loss: 7.942200 | grad_norm: 160.875854 | learning_rate: 0.000008 2025-04-12 13:29:29,720 - INFO - Epoch: 0.32 | loss: 10.404800 | grad_norm: 155.525879 | learning_rate: 0.000008 2025-04-12 13:29:34,975 - INFO - Epoch: 0.32 | loss: 9.753500 | grad_norm: 165.553604 | learning_rate: 0.000008 2025-04-12 13:29:40,257 - INFO - Epoch: 0.32 | loss: 9.564400 | grad_norm: 122.907555 | learning_rate: 0.000008 2025-04-12 13:29:45,371 - INFO - Epoch: 0.32 | loss: 7.738400 | grad_norm: 204.448212 | learning_rate: 0.000008 2025-04-12 13:29:50,249 - INFO - Epoch: 0.32 | loss: 8.447300 | grad_norm: 81.392868 | learning_rate: 0.000008 2025-04-12 13:29:55,236 - INFO - Epoch: 0.33 | loss: 7.997400 | grad_norm: 140.305038 | learning_rate: 0.000007 2025-04-12 13:30:00,099 - INFO - Epoch: 0.33 | loss: 6.283800 | grad_norm: 275.862244 | learning_rate: 0.000007 2025-04-12 
13:30:05,307 - INFO - Epoch: 0.33 | loss: 8.655200 | grad_norm: 130.025330 | learning_rate: 0.000007 2025-04-12 13:30:10,044 - INFO - Epoch: 0.33 | loss: 7.753900 | grad_norm: 132.457733 | learning_rate: 0.000007 2025-04-12 13:30:15,342 - INFO - Epoch: 0.33 | loss: 8.460300 | grad_norm: 266.920258 | learning_rate: 0.000007 2025-04-12 13:30:20,367 - INFO - Epoch: 0.33 | loss: 9.049000 | grad_norm: 114.969643 | learning_rate: 0.000007 2025-04-12 13:30:25,659 - INFO - Epoch: 0.33 | loss: 10.254100 | grad_norm: 106.408005 | learning_rate: 0.000007 2025-04-12 13:30:30,376 - INFO - Epoch: 0.33 | loss: 8.346000 | grad_norm: 108.206200 | learning_rate: 0.000007 2025-04-12 13:30:35,069 - INFO - Epoch: 0.33 | loss: 8.231300 | grad_norm: 131.103180 | learning_rate: 0.000007 2025-04-12 13:30:39,783 - INFO - Epoch: 0.33 | loss: 10.749100 | grad_norm: 113.831909 | learning_rate: 0.000007 2025-04-12 13:30:44,568 - INFO - Epoch: 0.34 | loss: 7.503100 | grad_norm: 103.340950 | learning_rate: 0.000007 2025-04-12 13:30:49,365 - INFO - Epoch: 0.34 | loss: 8.801300 | grad_norm: 223.409943 | learning_rate: 0.000007 2025-04-12 13:30:54,157 - INFO - Epoch: 0.34 | loss: 6.810100 | grad_norm: 91.375427 | learning_rate: 0.000007 2025-04-12 13:30:59,603 - INFO - Epoch: 0.34 | loss: 9.922600 | grad_norm: 170.596329 | learning_rate: 0.000007 2025-04-12 13:31:04,104 - INFO - Epoch: 0.34 | loss: 8.165400 | grad_norm: 108.533684 | learning_rate: 0.000007 2025-04-12 13:31:09,153 - INFO - Epoch: 0.34 | loss: 7.652800 | grad_norm: 255.924637 | learning_rate: 0.000007 2025-04-12 13:31:13,760 - INFO - Epoch: 0.34 | loss: 7.386300 | grad_norm: 138.623032 | learning_rate: 0.000007 2025-04-12 13:31:18,458 - INFO - Epoch: 0.34 | loss: 8.419400 | grad_norm: 100.161903 | learning_rate: 0.000007 2025-04-12 13:31:23,325 - INFO - Epoch: 0.34 | loss: 8.570500 | grad_norm: 155.046677 | learning_rate: 0.000007 2025-04-12 13:31:28,052 - INFO - Epoch: 0.35 | loss: 7.805900 | grad_norm: 65.614334 | learning_rate: 
0.000007 2025-04-12 13:31:32,813 - INFO - Epoch: 0.35 | loss: 9.009300 | grad_norm: 219.380600 | learning_rate: 0.000007 2025-04-12 13:31:37,538 - INFO - Epoch: 0.35 | loss: 10.551100 | grad_norm: 197.129715 | learning_rate: 0.000007 2025-04-12 13:31:42,356 - INFO - Epoch: 0.35 | loss: 7.803100 | grad_norm: 112.920868 | learning_rate: 0.000007 2025-04-12 13:31:47,706 - INFO - Epoch: 0.35 | loss: 8.262100 | grad_norm: 207.525146 | learning_rate: 0.000007 2025-04-12 13:31:52,601 - INFO - Epoch: 0.35 | loss: 9.259800 | grad_norm: 60.077244 | learning_rate: 0.000007 2025-04-12 13:31:57,789 - INFO - Epoch: 0.35 | loss: 8.917400 | grad_norm: 160.910568 | learning_rate: 0.000007 2025-04-12 13:32:02,523 - INFO - Epoch: 0.35 | loss: 7.165200 | grad_norm: 150.195282 | learning_rate: 0.000007 2025-04-12 13:32:07,489 - INFO - Epoch: 0.35 | loss: 7.498200 | grad_norm: 124.041786 | learning_rate: 0.000007 2025-04-12 13:32:12,407 - INFO - Epoch: 0.35 | loss: 7.830900 | grad_norm: 122.095314 | learning_rate: 0.000007 2025-04-12 13:32:17,330 - INFO - Epoch: 0.36 | loss: 8.191100 | grad_norm: 297.491241 | learning_rate: 0.000007 2025-04-12 13:32:22,607 - INFO - Epoch: 0.36 | loss: 10.346700 | grad_norm: 227.337860 | learning_rate: 0.000007 2025-04-12 13:32:27,250 - INFO - Epoch: 0.36 | loss: 8.832500 | grad_norm: 125.353210 | learning_rate: 0.000007 2025-04-12 13:32:32,324 - INFO - Epoch: 0.36 | loss: 8.928400 | grad_norm: 139.710922 | learning_rate: 0.000007 2025-04-12 13:32:37,578 - INFO - Epoch: 0.36 | loss: 10.714600 | grad_norm: 214.902267 | learning_rate: 0.000007 2025-04-12 13:32:42,592 - INFO - Epoch: 0.36 | loss: 8.664500 | grad_norm: 110.046028 | learning_rate: 0.000007 2025-04-12 13:32:47,717 - INFO - Epoch: 0.36 | loss: 7.402300 | grad_norm: 137.443176 | learning_rate: 0.000007 2025-04-12 13:32:52,566 - INFO - Epoch: 0.36 | loss: 6.671600 | grad_norm: 158.042404 | learning_rate: 0.000007 2025-04-12 13:32:57,255 - INFO - Epoch: 0.36 | loss: 6.724300 | grad_norm: 95.921463 
| learning_rate: 0.000007 2025-04-12 13:33:02,164 - INFO - Epoch: 0.36 | loss: 8.346500 | grad_norm: 75.325623 | learning_rate: 0.000007 2025-04-12 13:33:06,931 - INFO - Epoch: 0.37 | loss: 9.524400 | grad_norm: 281.148804 | learning_rate: 0.000007 2025-04-12 13:33:11,658 - INFO - Epoch: 0.37 | loss: 7.323800 | grad_norm: 81.329933 | learning_rate: 0.000007 2025-04-12 13:33:16,434 - INFO - Epoch: 0.37 | loss: 8.780200 | grad_norm: 190.293427 | learning_rate: 0.000007 2025-04-12 13:33:21,047 - INFO - Epoch: 0.37 | loss: 7.609700 | grad_norm: 152.426468 | learning_rate: 0.000007 2025-04-12 13:33:25,990 - INFO - Epoch: 0.37 | loss: 7.494200 | grad_norm: 116.296112 | learning_rate: 0.000007 2025-04-12 13:33:31,272 - INFO - Epoch: 0.37 | loss: 8.357700 | grad_norm: 157.266235 | learning_rate: 0.000007 2025-04-12 13:33:36,821 - INFO - Epoch: 0.37 | loss: 8.131100 | grad_norm: 132.316711 | learning_rate: 0.000007 2025-04-12 13:33:41,872 - INFO - Epoch: 0.37 | loss: 10.969300 | grad_norm: 181.673172 | learning_rate: 0.000007 2025-04-12 13:33:46,594 - INFO - Epoch: 0.37 | loss: 6.281400 | grad_norm: 84.058815 | learning_rate: 0.000007 2025-04-12 13:33:51,453 - INFO - Epoch: 0.38 | loss: 6.039000 | grad_norm: 58.173431 | learning_rate: 0.000007 2025-04-12 13:33:56,278 - INFO - Epoch: 0.38 | loss: 6.002100 | grad_norm: 83.736008 | learning_rate: 0.000007 2025-04-12 13:34:01,309 - INFO - Epoch: 0.38 | loss: 9.000600 | grad_norm: 455.049744 | learning_rate: 0.000007 2025-04-12 13:34:06,172 - INFO - Epoch: 0.38 | loss: 8.971500 | grad_norm: 158.415649 | learning_rate: 0.000007 2025-04-12 13:34:10,895 - INFO - Epoch: 0.38 | loss: 7.897000 | grad_norm: 161.563049 | learning_rate: 0.000007 2025-04-12 13:34:15,536 - INFO - Epoch: 0.38 | loss: 8.316500 | grad_norm: 314.269592 | learning_rate: 0.000007 2025-04-12 13:34:20,379 - INFO - Epoch: 0.38 | loss: 6.779400 | grad_norm: 141.975174 | learning_rate: 0.000007 2025-04-12 13:34:25,353 - INFO - Epoch: 0.38 | loss: 9.912900 | 
grad_norm: 130.441040 | learning_rate: 0.000007 2025-04-12 13:34:30,070 - INFO - Epoch: 0.38 | loss: 12.186500 | grad_norm: 572.474670 | learning_rate: 0.000007 2025-04-12 13:34:35,169 - INFO - Epoch: 0.38 | loss: 9.265200 | grad_norm: 161.524994 | learning_rate: 0.000007 2025-04-12 13:34:39,903 - INFO - Epoch: 0.39 | loss: 7.861200 | grad_norm: 121.118675 | learning_rate: 0.000007 2025-04-12 13:34:44,658 - INFO - Epoch: 0.39 | loss: 6.921000 | grad_norm: 169.739273 | learning_rate: 0.000007 2025-04-12 13:34:49,531 - INFO - Epoch: 0.39 | loss: 7.666700 | grad_norm: 112.471092 | learning_rate: 0.000007 2025-04-12 13:34:54,514 - INFO - Epoch: 0.39 | loss: 6.859700 | grad_norm: 180.014114 | learning_rate: 0.000007 2025-04-12 13:34:59,301 - INFO - Epoch: 0.39 | loss: 7.164100 | grad_norm: 150.080276 | learning_rate: 0.000007 2025-04-12 13:35:04,650 - INFO - Epoch: 0.39 | loss: 8.703800 | grad_norm: 120.537666 | learning_rate: 0.000007 2025-04-12 13:35:09,387 - INFO - Epoch: 0.39 | loss: 8.521700 | grad_norm: 103.473137 | learning_rate: 0.000007 2025-04-12 13:35:14,676 - INFO - Epoch: 0.39 | loss: 8.940700 | grad_norm: 193.226318 | learning_rate: 0.000007 2025-04-12 13:35:20,275 - INFO - Epoch: 0.39 | loss: 9.262300 | grad_norm: 309.417877 | learning_rate: 0.000007 2025-04-12 13:35:25,330 - INFO - Epoch: 0.39 | loss: 6.159400 | grad_norm: 114.360336 | learning_rate: 0.000007 2025-04-12 13:35:29,843 - INFO - Epoch: 0.40 | loss: 7.958400 | grad_norm: 116.887321 | learning_rate: 0.000007 2025-04-12 13:35:35,030 - INFO - Epoch: 0.40 | loss: 9.106200 | grad_norm: 143.438049 | learning_rate: 0.000007 2025-04-12 13:35:39,837 - INFO - Epoch: 0.40 | loss: 6.751300 | grad_norm: 97.496552 | learning_rate: 0.000007 2025-04-12 13:35:45,435 - INFO - Epoch: 0.40 | loss: 9.032500 | grad_norm: 213.936447 | learning_rate: 0.000007 2025-04-12 13:35:50,455 - INFO - Epoch: 0.40 | loss: 7.928800 | grad_norm: 165.937943 | learning_rate: 0.000007 2025-04-12 13:35:55,332 - INFO - Epoch: 0.40 | 
loss: 7.658800 | grad_norm: 126.284576 | learning_rate: 0.000007 2025-04-12 13:36:00,223 - INFO - Epoch: 0.40 | loss: 8.216900 | grad_norm: 518.530762 | learning_rate: 0.000007 2025-04-12 13:36:04,951 - INFO - Epoch: 0.40 | loss: 8.991100 | grad_norm: 294.959625 | learning_rate: 0.000007 2025-04-12 13:36:09,861 - INFO - Epoch: 0.40 | loss: 7.395800 | grad_norm: 49.526184 | learning_rate: 0.000007 2025-04-12 13:36:15,008 - INFO - Epoch: 0.40 | loss: 9.841900 | grad_norm: 330.640564 | learning_rate: 0.000007 2025-04-12 13:36:19,784 - INFO - Epoch: 0.41 | loss: 8.932300 | grad_norm: 77.139168 | learning_rate: 0.000007 2025-04-12 13:36:24,781 - INFO - Epoch: 0.41 | loss: 7.097000 | grad_norm: 121.446899 | learning_rate: 0.000007 2025-04-12 13:36:29,816 - INFO - Epoch: 0.41 | loss: 10.706300 | grad_norm: 211.232193 | learning_rate: 0.000007 2025-04-12 13:36:34,974 - INFO - Epoch: 0.41 | loss: 8.237900 | grad_norm: 106.728851 | learning_rate: 0.000007 2025-04-12 13:36:40,025 - INFO - Epoch: 0.41 | loss: 7.225600 | grad_norm: 57.294689 | learning_rate: 0.000007 2025-04-12 13:36:45,056 - INFO - Epoch: 0.41 | loss: 11.631100 | grad_norm: 146.799942 | learning_rate: 0.000007 2025-04-12 13:36:49,710 - INFO - Epoch: 0.41 | loss: 8.077700 | grad_norm: 212.525330 | learning_rate: 0.000007 2025-04-12 13:36:54,555 - INFO - Epoch: 0.41 | loss: 10.555600 | grad_norm: 163.351532 | learning_rate: 0.000007 2025-04-12 13:36:59,260 - INFO - Epoch: 0.41 | loss: 8.826600 | grad_norm: 344.795288 | learning_rate: 0.000007 2025-04-12 13:37:03,937 - INFO - Epoch: 0.42 | loss: 6.949200 | grad_norm: 164.750809 | learning_rate: 0.000007 2025-04-12 13:37:09,211 - INFO - Epoch: 0.42 | loss: 8.337300 | grad_norm: 180.926941 | learning_rate: 0.000006 2025-04-12 13:37:14,368 - INFO - Epoch: 0.42 | loss: 8.008900 | grad_norm: 77.195847 | learning_rate: 0.000006 2025-04-12 13:37:19,929 - INFO - Epoch: 0.42 | loss: 9.749600 | grad_norm: 142.585220 | learning_rate: 0.000006 2025-04-12 13:37:24,594 - INFO 
- Epoch: 0.42 | loss: 9.650800 | grad_norm: 99.062943 | learning_rate: 0.000006 2025-04-12 13:37:29,422 - INFO - Epoch: 0.42 | loss: 8.458100 | grad_norm: 271.533234 | learning_rate: 0.000006 2025-04-12 13:37:35,145 - INFO - Epoch: 0.42 | loss: 9.072000 | grad_norm: 90.749290 | learning_rate: 0.000006 2025-04-12 13:37:40,475 - INFO - Epoch: 0.42 | loss: 7.423200 | grad_norm: 151.150238 | learning_rate: 0.000006 2025-04-12 13:37:46,854 - INFO - Epoch: 0.42 | loss: 5.673100 | grad_norm: 207.876770 | learning_rate: 0.000006 2025-04-12 13:37:51,222 - INFO - Epoch: 0.42 | loss: 7.593000 | grad_norm: 97.659393 | learning_rate: 0.000006 2025-04-12 13:37:56,129 - INFO - Epoch: 0.43 | loss: 7.284700 | grad_norm: 258.810394 | learning_rate: 0.000006 2025-04-12 13:38:01,492 - INFO - Epoch: 0.43 | loss: 9.367300 | grad_norm: 138.195587 | learning_rate: 0.000006 2025-04-12 13:38:06,479 - INFO - Epoch: 0.43 | loss: 5.863500 | grad_norm: 124.952423 | learning_rate: 0.000006 2025-04-12 13:38:11,115 - INFO - Epoch: 0.43 | loss: 10.339700 | grad_norm: 192.030106 | learning_rate: 0.000006 2025-04-12 13:38:15,862 - INFO - Epoch: 0.43 | loss: 7.182000 | grad_norm: 104.064697 | learning_rate: 0.000006 2025-04-12 13:38:20,672 - INFO - Epoch: 0.43 | loss: 7.436300 | grad_norm: 93.847656 | learning_rate: 0.000006 2025-04-12 13:38:25,707 - INFO - Epoch: 0.43 | loss: 8.499200 | grad_norm: 100.826294 | learning_rate: 0.000006 2025-04-12 13:38:30,611 - INFO - Epoch: 0.43 | loss: 9.393300 | grad_norm: 84.619492 | learning_rate: 0.000006 2025-04-12 13:38:35,303 - INFO - Epoch: 0.43 | loss: 7.860700 | grad_norm: 106.457756 | learning_rate: 0.000006 2025-04-12 13:38:40,446 - INFO - Epoch: 0.43 | loss: 7.516900 | grad_norm: 143.487488 | learning_rate: 0.000006 2025-04-12 13:38:44,859 - INFO - Epoch: 0.44 | loss: 8.483600 | grad_norm: 124.585121 | learning_rate: 0.000006 2025-04-12 13:38:50,196 - INFO - Epoch: 0.44 | loss: 6.883400 | grad_norm: 72.187675 | learning_rate: 0.000006 2025-04-12 
13:38:55,385 - INFO - Epoch: 0.44 | loss: 7.592600 | grad_norm: 234.674515 | learning_rate: 0.000006 2025-04-12 13:39:00,511 - INFO - Epoch: 0.44 | loss: 9.263800 | grad_norm: 294.429535 | learning_rate: 0.000006 2025-04-12 13:39:05,987 - INFO - Epoch: 0.44 | loss: 8.529500 | grad_norm: 143.076782 | learning_rate: 0.000006 2025-04-12 13:39:11,388 - INFO - Epoch: 0.44 | loss: 10.623300 | grad_norm: 144.280960 | learning_rate: 0.000006 2025-04-12 13:39:16,605 - INFO - Epoch: 0.44 | loss: 6.293600 | grad_norm: 88.807236 | learning_rate: 0.000006 2025-04-12 13:39:21,530 - INFO - Epoch: 0.44 | loss: 7.601400 | grad_norm: 84.619766 | learning_rate: 0.000006 2025-04-12 13:39:26,408 - INFO - Epoch: 0.44 | loss: 7.190500 | grad_norm: 92.842819 | learning_rate: 0.000006 2025-04-12 13:39:31,568 - INFO - Epoch: 0.44 | loss: 8.565100 | grad_norm: 200.899185 | learning_rate: 0.000006 2025-04-12 13:39:36,173 - INFO - Epoch: 0.45 | loss: 7.659600 | grad_norm: 157.994843 | learning_rate: 0.000006 2025-04-12 13:39:41,433 - INFO - Epoch: 0.45 | loss: 7.576600 | grad_norm: 173.751617 | learning_rate: 0.000006 2025-04-12 13:39:46,753 - INFO - Epoch: 0.45 | loss: 8.039900 | grad_norm: 131.868942 | learning_rate: 0.000006 2025-04-12 13:39:51,848 - INFO - Epoch: 0.45 | loss: 8.383800 | grad_norm: 171.757034 | learning_rate: 0.000006 2025-04-12 13:39:56,535 - INFO - Epoch: 0.45 | loss: 8.941300 | grad_norm: 192.129898 | learning_rate: 0.000006 2025-04-12 13:40:01,146 - INFO - Epoch: 0.45 | loss: 7.850700 | grad_norm: 51.708397 | learning_rate: 0.000006 2025-04-12 13:40:06,170 - INFO - Epoch: 0.45 | loss: 7.010700 | grad_norm: 64.016922 | learning_rate: 0.000006 2025-04-12 13:40:11,127 - INFO - Epoch: 0.45 | loss: 9.452700 | grad_norm: 166.865936 | learning_rate: 0.000006 2025-04-12 13:40:16,257 - INFO - Epoch: 0.45 | loss: 8.865500 | grad_norm: 132.596375 | learning_rate: 0.000006 2025-04-12 13:40:20,626 - INFO - Epoch: 0.46 | loss: 7.701300 | grad_norm: 132.177017 | learning_rate: 
0.000006 2025-04-12 13:40:25,807 - INFO - Epoch: 0.46 | loss: 7.938700 | grad_norm: 70.278664 | learning_rate: 0.000006 2025-04-12 13:40:30,770 - INFO - Epoch: 0.46 | loss: 7.413200 | grad_norm: 103.444145 | learning_rate: 0.000006 2025-04-12 13:40:35,575 - INFO - Epoch: 0.46 | loss: 7.693300 | grad_norm: 407.049957 | learning_rate: 0.000006 2025-04-12 13:40:40,409 - INFO - Epoch: 0.46 | loss: 9.162500 | grad_norm: 127.403603 | learning_rate: 0.000006 2025-04-12 13:40:45,019 - INFO - Epoch: 0.46 | loss: 6.434100 | grad_norm: 126.068512 | learning_rate: 0.000006 2025-04-12 13:40:50,121 - INFO - Epoch: 0.46 | loss: 8.506900 | grad_norm: 223.444290 | learning_rate: 0.000006 2025-04-12 13:40:55,204 - INFO - Epoch: 0.46 | loss: 6.919600 | grad_norm: 153.865784 | learning_rate: 0.000006 2025-04-12 13:41:00,270 - INFO - Epoch: 0.46 | loss: 7.163600 | grad_norm: 89.109810 | learning_rate: 0.000006 2025-04-12 13:41:04,706 - INFO - Epoch: 0.46 | loss: 7.259200 | grad_norm: 129.827301 | learning_rate: 0.000006 2025-04-12 13:41:09,621 - INFO - Epoch: 0.47 | loss: 7.312400 | grad_norm: 97.635315 | learning_rate: 0.000006 2025-04-12 13:41:14,727 - INFO - Epoch: 0.47 | loss: 7.282400 | grad_norm: 138.000473 | learning_rate: 0.000006 2025-04-12 13:41:19,890 - INFO - Epoch: 0.47 | loss: 9.784100 | grad_norm: 98.835609 | learning_rate: 0.000006 2025-04-12 13:41:25,097 - INFO - Epoch: 0.47 | loss: 7.973500 | grad_norm: 56.816101 | learning_rate: 0.000006 2025-04-12 13:41:30,156 - INFO - Epoch: 0.47 | loss: 9.304800 | grad_norm: 152.235275 | learning_rate: 0.000006 2025-04-12 13:41:34,840 - INFO - Epoch: 0.47 | loss: 6.754000 | grad_norm: 41.702042 | learning_rate: 0.000006 2025-04-12 13:41:39,785 - INFO - Epoch: 0.47 | loss: 7.646600 | grad_norm: 225.541794 | learning_rate: 0.000006 2025-04-12 13:41:45,118 - INFO - Epoch: 0.47 | loss: 8.970100 | grad_norm: 209.542953 | learning_rate: 0.000006 2025-04-12 13:41:50,102 - INFO - Epoch: 0.47 | loss: 7.803900 | grad_norm: 48.810986 | 
learning_rate: 0.000006 2025-04-12 13:41:54,890 - INFO - Epoch: 0.47 | loss: 7.857800 | grad_norm: 46.498169 | learning_rate: 0.000006 2025-04-12 13:41:59,661 - INFO - Epoch: 0.48 | loss: 7.448800 | grad_norm: 159.853485 | learning_rate: 0.000006 2025-04-12 13:42:04,543 - INFO - Epoch: 0.48 | loss: 7.102300 | grad_norm: 95.813751 | learning_rate: 0.000006 2025-04-12 13:42:09,496 - INFO - Epoch: 0.48 | loss: 7.899300 | grad_norm: 41.483932 | learning_rate: 0.000006 2025-04-12 13:42:14,676 - INFO - Epoch: 0.48 | loss: 9.229100 | grad_norm: 116.116730 | learning_rate: 0.000006 2025-04-12 13:42:19,225 - INFO - Epoch: 0.48 | loss: 6.719100 | grad_norm: 182.991989 | learning_rate: 0.000006 2025-04-12 13:42:24,193 - INFO - Epoch: 0.48 | loss: 7.317800 | grad_norm: 92.644432 | learning_rate: 0.000006 2025-04-12 13:42:29,015 - INFO - Epoch: 0.48 | loss: 7.665800 | grad_norm: 351.289124 | learning_rate: 0.000006 2025-04-12 13:42:33,926 - INFO - Epoch: 0.48 | loss: 7.104700 | grad_norm: 167.842392 | learning_rate: 0.000006 2025-04-12 13:42:39,066 - INFO - Epoch: 0.48 | loss: 10.203000 | grad_norm: 160.379807 | learning_rate: 0.000006 2025-04-12 13:42:43,970 - INFO - Epoch: 0.48 | loss: 8.322500 | grad_norm: 155.469528 | learning_rate: 0.000006 2025-04-12 13:42:48,690 - INFO - Epoch: 0.49 | loss: 7.892300 | grad_norm: 139.079544 | learning_rate: 0.000006 2025-04-12 13:42:53,716 - INFO - Epoch: 0.49 | loss: 8.819400 | grad_norm: 69.909485 | learning_rate: 0.000006 2025-04-12 13:42:58,653 - INFO - Epoch: 0.49 | loss: 7.196100 | grad_norm: 115.654968 | learning_rate: 0.000006 2025-04-12 13:43:03,956 - INFO - Epoch: 0.49 | loss: 8.866300 | grad_norm: 112.508148 | learning_rate: 0.000006 2025-04-12 13:43:09,536 - INFO - Epoch: 0.49 | loss: 9.324000 | grad_norm: 302.271729 | learning_rate: 0.000006 2025-04-12 13:43:14,775 - INFO - Epoch: 0.49 | loss: 7.636800 | grad_norm: 220.137283 | learning_rate: 0.000006 2025-04-12 13:43:19,739 - INFO - Epoch: 0.49 | loss: 9.692200 | grad_norm: 
173.310822 | learning_rate: 0.000006 2025-04-12 13:43:24,759 - INFO - Epoch: 0.49 | loss: 8.536800 | grad_norm: 181.791168 | learning_rate: 0.000006 2025-04-12 13:43:29,708 - INFO - Epoch: 0.49 | loss: 8.338500 | grad_norm: 62.224159 | learning_rate: 0.000006 2025-04-12 13:43:34,697 - INFO - Epoch: 0.50 | loss: 8.836300 | grad_norm: 225.584930 | learning_rate: 0.000006 2025-04-12 13:43:39,764 - INFO - Epoch: 0.50 | loss: 7.542000 | grad_norm: 277.506714 | learning_rate: 0.000006 2025-04-12 13:43:44,972 - INFO - Epoch: 0.50 | loss: 7.396600 | grad_norm: 174.159515 | learning_rate: 0.000006 2025-04-12 13:43:49,569 - INFO - Epoch: 0.50 | loss: 6.533100 | grad_norm: 118.376633 | learning_rate: 0.000006 2025-04-12 13:43:54,472 - INFO - Epoch: 0.50 | loss: 8.371200 | grad_norm: 198.957001 | learning_rate: 0.000006 2025-04-12 13:43:59,303 - INFO - Epoch: 0.50 | loss: 9.251400 | grad_norm: 254.582474 | learning_rate: 0.000006 2025-04-12 13:44:04,127 - INFO - Epoch: 0.50 | loss: 9.400300 | grad_norm: 125.777313 | learning_rate: 0.000006 2025-04-12 13:44:08,750 - INFO - Epoch: 0.50 | loss: 8.938900 | grad_norm: 103.183762 | learning_rate: 0.000006 2025-04-12 13:44:13,552 - INFO - Epoch: 0.50 | loss: 7.430500 | grad_norm: 148.381851 | learning_rate: 0.000006 2025-04-12 13:44:18,338 - INFO - Epoch: 0.50 | loss: 8.725100 | grad_norm: 158.095673 | learning_rate: 0.000006 2025-04-12 13:44:23,752 - INFO - Epoch: 0.51 | loss: 9.614200 | grad_norm: 177.594986 | learning_rate: 0.000005 2025-04-12 13:44:28,808 - INFO - Epoch: 0.51 | loss: 7.116000 | grad_norm: 112.111923 | learning_rate: 0.000005 2025-04-12 13:44:33,628 - INFO - Epoch: 0.51 | loss: 6.482400 | grad_norm: 177.981125 | learning_rate: 0.000005 2025-04-12 13:44:38,504 - INFO - Epoch: 0.51 | loss: 6.111700 | grad_norm: 262.447205 | learning_rate: 0.000005 2025-04-12 13:44:43,281 - INFO - Epoch: 0.51 | loss: 8.997200 | grad_norm: 134.729691 | learning_rate: 0.000005 2025-04-12 13:44:47,718 - INFO - Epoch: 0.51 | loss: 
7.945200 | grad_norm: 163.222702 | learning_rate: 0.000005 2025-04-12 13:44:52,701 - INFO - Epoch: 0.51 | loss: 8.464700 | grad_norm: 230.417709 | learning_rate: 0.000005 2025-04-12 13:44:57,831 - INFO - Epoch: 0.51 | loss: 9.652000 | grad_norm: 89.003738 | learning_rate: 0.000005 2025-04-12 13:45:02,763 - INFO - Epoch: 0.51 | loss: 8.549600 | grad_norm: 280.576172 | learning_rate: 0.000005 2025-04-12 13:45:07,775 - INFO - Epoch: 0.51 | loss: 7.892400 | grad_norm: 180.792831 | learning_rate: 0.000005 2025-04-12 13:45:12,790 - INFO - Epoch: 0.52 | loss: 7.503600 | grad_norm: 68.137115 | learning_rate: 0.000005 2025-04-12 13:45:17,705 - INFO - Epoch: 0.52 | loss: 6.796300 | grad_norm: 116.107613 | learning_rate: 0.000005 2025-04-12 13:45:22,686 - INFO - Epoch: 0.52 | loss: 7.161800 | grad_norm: 107.911064 | learning_rate: 0.000005 2025-04-12 13:45:27,362 - INFO - Epoch: 0.52 | loss: 8.856800 | grad_norm: 76.528519 | learning_rate: 0.000005 2025-04-12 13:45:32,156 - INFO - Epoch: 0.52 | loss: 7.461400 | grad_norm: 205.211060 | learning_rate: 0.000005 2025-04-12 13:45:37,190 - INFO - Epoch: 0.52 | loss: 8.462300 | grad_norm: 325.559143 | learning_rate: 0.000005 2025-04-12 13:45:41,957 - INFO - Epoch: 0.52 | loss: 7.675800 | grad_norm: 83.635986 | learning_rate: 0.000005 2025-04-12 13:45:47,395 - INFO - Epoch: 0.52 | loss: 8.209400 | grad_norm: 126.361099 | learning_rate: 0.000005 2025-04-12 13:45:52,384 - INFO - Epoch: 0.52 | loss: 9.400100 | grad_norm: 317.282074 | learning_rate: 0.000005 2025-04-12 13:45:57,021 - INFO - Epoch: 0.53 | loss: 6.387200 | grad_norm: 283.372864 | learning_rate: 0.000005 2025-04-12 13:46:02,368 - INFO - Epoch: 0.53 | loss: 9.287400 | grad_norm: 128.985123 | learning_rate: 0.000005 2025-04-12 13:46:07,319 - INFO - Epoch: 0.53 | loss: 7.091400 | grad_norm: 123.476173 | learning_rate: 0.000005 2025-04-12 13:46:12,183 - INFO - Epoch: 0.53 | loss: 7.604600 | grad_norm: 85.736046 | learning_rate: 0.000005 2025-04-12 13:46:17,026 - INFO - Epoch: 
0.53 | loss: 8.701000 | grad_norm: 162.271790 | learning_rate: 0.000005 2025-04-12 13:46:21,520 - INFO - Epoch: 0.53 | loss: 5.812700 | grad_norm: 53.534756 | learning_rate: 0.000005 2025-04-12 13:46:26,475 - INFO - Epoch: 0.53 | loss: 8.412400 | grad_norm: 166.046600 | learning_rate: 0.000005 2025-04-12 13:46:31,335 - INFO - Epoch: 0.53 | loss: 6.697500 | grad_norm: 55.455887 | learning_rate: 0.000005 2025-04-12 13:46:36,260 - INFO - Epoch: 0.53 | loss: 6.096100 | grad_norm: 95.616577 | learning_rate: 0.000005 2025-04-12 13:46:40,999 - INFO - Epoch: 0.53 | loss: 8.062000 | grad_norm: 140.035583 | learning_rate: 0.000005 2025-04-12 13:46:45,821 - INFO - Epoch: 0.54 | loss: 8.104100 | grad_norm: 257.818085 | learning_rate: 0.000005 2025-04-12 13:46:51,058 - INFO - Epoch: 0.54 | loss: 9.152500 | grad_norm: 167.507217 | learning_rate: 0.000005 2025-04-12 13:46:55,491 - INFO - Epoch: 0.54 | loss: 6.877100 | grad_norm: 78.398041 | learning_rate: 0.000005 2025-04-12 13:47:00,393 - INFO - Epoch: 0.54 | loss: 7.695300 | grad_norm: 97.415504 | learning_rate: 0.000005 2025-04-12 13:47:05,309 - INFO - Epoch: 0.54 | loss: 6.922300 | grad_norm: 66.278328 | learning_rate: 0.000005 2025-04-12 13:47:10,602 - INFO - Epoch: 0.54 | loss: 7.515800 | grad_norm: 86.509857 | learning_rate: 0.000005 2025-04-12 13:47:15,208 - INFO - Epoch: 0.54 | loss: 8.074100 | grad_norm: 182.236893 | learning_rate: 0.000005 2025-04-12 13:47:20,166 - INFO - Epoch: 0.54 | loss: 10.366300 | grad_norm: 120.404694 | learning_rate: 0.000005 2025-04-12 13:47:25,605 - INFO - Epoch: 0.54 | loss: 6.853700 | grad_norm: 144.487305 | learning_rate: 0.000005 2025-04-12 13:47:30,968 - INFO - Epoch: 0.54 | loss: 7.902300 | grad_norm: 206.099014 | learning_rate: 0.000005 2025-04-12 13:47:36,318 - INFO - Epoch: 0.55 | loss: 8.576800 | grad_norm: 119.464142 | learning_rate: 0.000005 2025-04-12 13:47:41,069 - INFO - Epoch: 0.55 | loss: 8.254900 | grad_norm: 152.263580 | learning_rate: 0.000005 2025-04-12 13:47:46,053 - 
INFO - Epoch: 0.55 | loss: 6.657300 | grad_norm: 157.291306 | learning_rate: 0.000005 2025-04-12 13:47:50,684 - INFO - Epoch: 0.55 | loss: 7.214500 | grad_norm: 174.099487 | learning_rate: 0.000005 2025-04-12 13:47:55,738 - INFO - Epoch: 0.55 | loss: 8.686400 | grad_norm: 339.972778 | learning_rate: 0.000005 2025-04-12 13:48:00,886 - INFO - Epoch: 0.55 | loss: 11.519800 | grad_norm: 161.255432 | learning_rate: 0.000005 2025-04-12 13:48:05,639 - INFO - Epoch: 0.55 | loss: 7.564500 | grad_norm: 129.013535 | learning_rate: 0.000005 2025-04-12 13:48:10,510 - INFO - Epoch: 0.55 | loss: 10.447200 | grad_norm: 102.767021 | learning_rate: 0.000005 2025-04-12 13:48:15,075 - INFO - Epoch: 0.55 | loss: 9.497600 | grad_norm: 219.315094 | learning_rate: 0.000005 2025-04-12 13:48:19,992 - INFO - Epoch: 0.55 | loss: 9.180600 | grad_norm: 172.283951 | learning_rate: 0.000005 2025-04-12 13:48:24,988 - INFO - Epoch: 0.56 | loss: 7.294500 | grad_norm: 121.391937 | learning_rate: 0.000005 2025-04-12 13:48:29,861 - INFO - Epoch: 0.56 | loss: 8.393700 | grad_norm: 248.229355 | learning_rate: 0.000005 2025-04-12 13:48:34,743 - INFO - Epoch: 0.56 | loss: 6.814000 | grad_norm: 147.798309 | learning_rate: 0.000005 2025-04-12 13:48:39,390 - INFO - Epoch: 0.56 | loss: 7.623700 | grad_norm: 68.236938 | learning_rate: 0.000005 2025-04-12 13:48:44,915 - INFO - Epoch: 0.56 | loss: 8.613900 | grad_norm: 89.190147 | learning_rate: 0.000005 2025-04-12 13:48:49,879 - INFO - Epoch: 0.56 | loss: 7.258800 | grad_norm: 167.422394 | learning_rate: 0.000005 2025-04-12 13:48:54,767 - INFO - Epoch: 0.56 | loss: 8.901300 | grad_norm: 231.189270 | learning_rate: 0.000005 2025-04-12 13:48:59,782 - INFO - Epoch: 0.56 | loss: 6.640800 | grad_norm: 192.616287 | learning_rate: 0.000005 2025-04-12 13:49:04,762 - INFO - Epoch: 0.56 | loss: 7.421100 | grad_norm: 111.127136 | learning_rate: 0.000005 2025-04-12 13:49:10,025 - INFO - Epoch: 0.57 | loss: 7.262200 | grad_norm: 104.028534 | learning_rate: 0.000005 
2025-04-12 13:49:14,988 - INFO - Epoch: 0.57 | loss: 7.057800 | grad_norm: 49.147598 | learning_rate: 0.000005 2025-04-12 13:49:20,284 - INFO - Epoch: 0.57 | loss: 9.221300 | grad_norm: 141.987534 | learning_rate: 0.000005 2025-04-12 13:49:25,147 - INFO - Epoch: 0.57 | loss: 7.690000 | grad_norm: 103.728256 | learning_rate: 0.000005 2025-04-12 13:49:30,431 - INFO - Epoch: 0.57 | loss: 7.101000 | grad_norm: 106.187141 | learning_rate: 0.000005 2025-04-12 13:49:35,470 - INFO - Epoch: 0.57 | loss: 8.769000 | grad_norm: 174.615128 | learning_rate: 0.000005 2025-04-12 13:49:40,842 - INFO - Epoch: 0.57 | loss: 9.432200 | grad_norm: 207.362457 | learning_rate: 0.000005 2025-04-12 13:49:45,743 - INFO - Epoch: 0.57 | loss: 7.621800 | grad_norm: 86.343628 | learning_rate: 0.000005 2025-04-12 13:49:50,523 - INFO - Epoch: 0.57 | loss: 6.481800 | grad_norm: 107.234932 | learning_rate: 0.000005 2025-04-12 13:49:55,812 - INFO - Epoch: 0.57 | loss: 7.694400 | grad_norm: 116.890327 | learning_rate: 0.000005 2025-04-12 13:50:00,384 - INFO - Epoch: 0.58 | loss: 5.895100 | grad_norm: 65.710052 | learning_rate: 0.000005 2025-04-12 13:50:05,624 - INFO - Epoch: 0.58 | loss: 12.460800 | grad_norm: 89.822243 | learning_rate: 0.000005 2025-04-12 13:50:10,721 - INFO - Epoch: 0.58 | loss: 7.199900 | grad_norm: 111.166489 | learning_rate: 0.000005 2025-04-12 13:50:15,258 - INFO - Epoch: 0.58 | loss: 7.588900 | grad_norm: 162.689880 | learning_rate: 0.000005 2025-04-12 13:50:19,983 - INFO - Epoch: 0.58 | loss: 5.610400 | grad_norm: 105.708046 | learning_rate: 0.000005 2025-04-12 13:50:24,821 - INFO - Epoch: 0.58 | loss: 7.454100 | grad_norm: 100.356483 | learning_rate: 0.000005 2025-04-12 13:50:29,336 - INFO - Epoch: 0.58 | loss: 6.454200 | grad_norm: 91.124001 | learning_rate: 0.000005 2025-04-12 13:50:34,245 - INFO - Epoch: 0.58 | loss: 7.640900 | grad_norm: 137.535431 | learning_rate: 0.000005 2025-04-12 13:50:39,073 - INFO - Epoch: 0.58 | loss: 7.497800 | grad_norm: 190.791855 | 
learning_rate: 0.000005 2025-04-12 13:50:43,947 - INFO - Epoch: 0.58 | loss: 8.307000 | grad_norm: 98.720535 | learning_rate: 0.000005 2025-04-12 13:50:48,911 - INFO - Epoch: 0.59 | loss: 8.845200 | grad_norm: 139.203247 | learning_rate: 0.000005 2025-04-12 13:50:53,665 - INFO - Epoch: 0.59 | loss: 6.645500 | grad_norm: 93.454002 | learning_rate: 0.000005 2025-04-12 13:50:58,862 - INFO - Epoch: 0.59 | loss: 7.659700 | grad_norm: 191.378647 | learning_rate: 0.000005 2025-04-12 13:51:03,728 - INFO - Epoch: 0.59 | loss: 7.052600 | grad_norm: 185.916916 | learning_rate: 0.000005 2025-04-12 13:51:08,993 - INFO - Epoch: 0.59 | loss: 9.518100 | grad_norm: 56.658413 | learning_rate: 0.000005 2025-04-12 13:51:13,934 - INFO - Epoch: 0.59 | loss: 7.328700 | grad_norm: 106.576500 | learning_rate: 0.000005 2025-04-12 13:51:18,950 - INFO - Epoch: 0.59 | loss: 6.284300 | grad_norm: 170.159500 | learning_rate: 0.000005 2025-04-12 13:51:24,137 - INFO - Epoch: 0.59 | loss: 9.447300 | grad_norm: 88.487091 | learning_rate: 0.000005 2025-04-12 13:51:29,339 - INFO - Epoch: 0.59 | loss: 7.289500 | grad_norm: 79.191521 | learning_rate: 0.000005 2025-04-12 13:51:34,457 - INFO - Epoch: 0.59 | loss: 8.350300 | grad_norm: 195.030273 | learning_rate: 0.000005 2025-04-12 13:51:39,459 - INFO - Epoch: 0.60 | loss: 6.318500 | grad_norm: 69.272301 | learning_rate: 0.000004 2025-04-12 13:51:44,571 - INFO - Epoch: 0.60 | loss: 7.532400 | grad_norm: 85.979355 | learning_rate: 0.000004 2025-04-12 13:51:49,692 - INFO - Epoch: 0.60 | loss: 7.103500 | grad_norm: 199.659927 | learning_rate: 0.000004 2025-04-12 13:51:55,065 - INFO - Epoch: 0.60 | loss: 8.247900 | grad_norm: 100.831863 | learning_rate: 0.000004 2025-04-12 13:51:59,846 - INFO - Epoch: 0.60 | loss: 7.110200 | grad_norm: 299.240417 | learning_rate: 0.000004 2025-04-12 13:52:04,766 - INFO - Epoch: 0.60 | loss: 8.140600 | grad_norm: 134.745178 | learning_rate: 0.000004 2025-04-12 13:52:09,876 - INFO - Epoch: 0.60 | loss: 7.083100 | grad_norm: 
98.126663 | learning_rate: 0.000004 2025-04-12 13:52:15,016 - INFO - Epoch: 0.60 | loss: 7.965800 | grad_norm: 171.845245 | learning_rate: 0.000004 2025-04-12 13:52:19,572 - INFO - Epoch: 0.60 | loss: 7.714400 | grad_norm: 150.890762 | learning_rate: 0.000004 2025-04-12 13:52:24,603 - INFO - Epoch: 0.61 | loss: 6.913400 | grad_norm: 99.013939 | learning_rate: 0.000004 2025-04-12 13:52:29,893 - INFO - Epoch: 0.61 | loss: 9.692500 | grad_norm: 73.970314 | learning_rate: 0.000004 2025-04-12 13:52:34,862 - INFO - Epoch: 0.61 | loss: 7.067900 | grad_norm: 271.184814 | learning_rate: 0.000004 2025-04-12 13:52:39,638 - INFO - Epoch: 0.61 | loss: 6.328300 | grad_norm: 131.369415 | learning_rate: 0.000004 2025-04-12 13:52:44,418 - INFO - Epoch: 0.61 | loss: 7.488300 | grad_norm: 125.505302 | learning_rate: 0.000004 2025-04-12 13:52:49,446 - INFO - Epoch: 0.61 | loss: 8.593600 | grad_norm: 188.890579 | learning_rate: 0.000004 2025-04-12 13:52:54,322 - INFO - Epoch: 0.61 | loss: 8.090900 | grad_norm: 67.619270 | learning_rate: 0.000004 2025-04-12 13:52:59,309 - INFO - Epoch: 0.61 | loss: 7.117700 | grad_norm: 169.958572 | learning_rate: 0.000004 2025-04-12 13:53:04,450 - INFO - Epoch: 0.61 | loss: 8.188500 | grad_norm: 83.204338 | learning_rate: 0.000004 2025-04-12 13:53:09,159 - INFO - Epoch: 0.61 | loss: 7.559600 | grad_norm: 173.614502 | learning_rate: 0.000004 2025-04-12 13:53:13,932 - INFO - Epoch: 0.62 | loss: 7.176200 | grad_norm: 115.113403 | learning_rate: 0.000004 2025-04-12 13:53:18,500 - INFO - Epoch: 0.62 | loss: 8.868700 | grad_norm: 175.524750 | learning_rate: 0.000004 2025-04-12 13:53:23,944 - INFO - Epoch: 0.62 | loss: 7.076000 | grad_norm: 144.940735 | learning_rate: 0.000004 2025-04-12 13:53:29,286 - INFO - Epoch: 0.62 | loss: 7.713200 | grad_norm: 191.035721 | learning_rate: 0.000004 2025-04-12 13:53:34,196 - INFO - Epoch: 0.62 | loss: 6.550100 | grad_norm: 255.066956 | learning_rate: 0.000004 2025-04-12 13:53:39,038 - INFO - Epoch: 0.62 | loss: 7.799100 | 
grad_norm: 372.429413 | learning_rate: 0.000004 2025-04-12 13:53:44,305 - INFO - Epoch: 0.62 | loss: 7.891100 | grad_norm: 171.134338 | learning_rate: 0.000004 2025-04-12 13:53:49,624 - INFO - Epoch: 0.62 | loss: 7.142000 | grad_norm: 154.658081 | learning_rate: 0.000004 2025-04-12 13:53:54,785 - INFO - Epoch: 0.62 | loss: 7.597000 | grad_norm: 150.300079 | learning_rate: 0.000004 2025-04-12 13:53:59,713 - INFO - Epoch: 0.62 | loss: 6.745500 | grad_norm: 131.186523 | learning_rate: 0.000004 2025-04-12 13:54:04,696 - INFO - Epoch: 0.63 | loss: 8.316300 | grad_norm: 199.131134 | learning_rate: 0.000004 2025-04-12 13:54:09,378 - INFO - Epoch: 0.63 | loss: 7.962700 | grad_norm: 139.291748 | learning_rate: 0.000004 2025-04-12 13:54:14,196 - INFO - Epoch: 0.63 | loss: 6.602600 | grad_norm: 94.647148 | learning_rate: 0.000004 2025-04-12 13:54:19,177 - INFO - Epoch: 0.63 | loss: 9.106500 | grad_norm: 300.620667 | learning_rate: 0.000004 2025-04-12 13:54:24,330 - INFO - Epoch: 0.63 | loss: 7.023100 | grad_norm: 158.447479 | learning_rate: 0.000004 2025-04-12 13:54:29,256 - INFO - Epoch: 0.63 | loss: 9.044000 | grad_norm: 162.776459 | learning_rate: 0.000004 2025-04-12 13:54:34,279 - INFO - Epoch: 0.63 | loss: 7.376200 | grad_norm: 153.929398 | learning_rate: 0.000004 2025-04-12 13:54:39,116 - INFO - Epoch: 0.63 | loss: 8.086100 | grad_norm: 119.955917 | learning_rate: 0.000004 2025-04-12 13:54:43,619 - INFO - Epoch: 0.63 | loss: 5.918200 | grad_norm: 169.407745 | learning_rate: 0.000004 2025-04-12 13:54:48,363 - INFO - Epoch: 0.64 | loss: 6.807700 | grad_norm: 79.403503 | learning_rate: 0.000004 2025-04-12 13:54:52,960 - INFO - Epoch: 0.64 | loss: 7.260100 | grad_norm: 202.307281 | learning_rate: 0.000004 2025-04-12 13:54:57,552 - INFO - Epoch: 0.64 | loss: 9.117000 | grad_norm: 216.418518 | learning_rate: 0.000004 2025-04-12 13:55:02,309 - INFO - Epoch: 0.64 | loss: 7.769500 | grad_norm: 65.350967 | learning_rate: 0.000004 2025-04-12 13:55:07,116 - INFO - Epoch: 0.64 | 
loss: 5.564600 | grad_norm: 273.628357 | learning_rate: 0.000004 2025-04-12 13:55:12,329 - INFO - Epoch: 0.64 | loss: 8.805900 | grad_norm: 214.079742 | learning_rate: 0.000004 2025-04-12 13:55:17,156 - INFO - Epoch: 0.64 | loss: 8.846300 | grad_norm: 71.150284 | learning_rate: 0.000004 2025-04-12 13:55:22,356 - INFO - Epoch: 0.64 | loss: 9.147400 | grad_norm: 40.188351 | learning_rate: 0.000004 2025-04-12 13:55:27,312 - INFO - Epoch: 0.64 | loss: 8.672700 | grad_norm: 125.115440 | learning_rate: 0.000004 2025-04-12 13:55:32,214 - INFO - Epoch: 0.64 | loss: 7.954100 | grad_norm: 151.150497 | learning_rate: 0.000004 2025-04-12 13:55:37,406 - INFO - Epoch: 0.65 | loss: 8.382700 | grad_norm: 124.565010 | learning_rate: 0.000004 2025-04-12 13:55:42,333 - INFO - Epoch: 0.65 | loss: 10.153000 | grad_norm: 253.066269 | learning_rate: 0.000004 2025-04-12 13:55:47,552 - INFO - Epoch: 0.65 | loss: 8.353200 | grad_norm: 137.007660 | learning_rate: 0.000004 2025-04-12 13:55:52,673 - INFO - Epoch: 0.65 | loss: 7.596000 | grad_norm: 169.519501 | learning_rate: 0.000004 2025-04-12 13:55:58,130 - INFO - Epoch: 0.65 | loss: 7.353700 | grad_norm: 80.141090 | learning_rate: 0.000004 2025-04-12 13:56:03,182 - INFO - Epoch: 0.65 | loss: 8.797700 | grad_norm: 70.983963 | learning_rate: 0.000004 2025-04-12 13:56:07,890 - INFO - Epoch: 0.65 | loss: 5.736700 | grad_norm: 110.609261 | learning_rate: 0.000004 2025-04-12 13:56:12,674 - INFO - Epoch: 0.65 | loss: 9.234600 | grad_norm: 135.534134 | learning_rate: 0.000004 2025-04-12 13:56:17,772 - INFO - Epoch: 0.65 | loss: 5.724800 | grad_norm: 110.458725 | learning_rate: 0.000004 2025-04-12 13:56:22,674 - INFO - Epoch: 0.65 | loss: 8.123800 | grad_norm: 93.662010 | learning_rate: 0.000004 2025-04-12 13:56:27,694 - INFO - Epoch: 0.66 | loss: 5.467200 | grad_norm: 95.964813 | learning_rate: 0.000004 2025-04-12 13:56:32,771 - INFO - Epoch: 0.66 | loss: 8.671000 | grad_norm: 75.907928 | learning_rate: 0.000004 2025-04-12 13:56:38,024 - INFO - 
Epoch: 0.66 | loss: 8.381800 | grad_norm: 76.477997 | learning_rate: 0.000004 2025-04-12 13:56:42,863 - INFO - Epoch: 0.66 | loss: 6.473800 | grad_norm: 162.046997 | learning_rate: 0.000004 2025-04-12 13:56:47,782 - INFO - Epoch: 0.66 | loss: 8.480800 | grad_norm: 285.138123 | learning_rate: 0.000004 2025-04-12 13:56:52,767 - INFO - Epoch: 0.66 | loss: 10.065800 | grad_norm: 377.587769 | learning_rate: 0.000004 2025-04-12 13:56:58,058 - INFO - Epoch: 0.66 | loss: 10.325100 | grad_norm: 368.296082 | learning_rate: 0.000004 2025-04-12 13:57:03,344 - INFO - Epoch: 0.66 | loss: 10.458400 | grad_norm: 119.952385 | learning_rate: 0.000004 2025-04-12 13:57:08,358 - INFO - Epoch: 0.66 | loss: 7.018500 | grad_norm: 167.514557 | learning_rate: 0.000004 2025-04-12 13:57:13,199 - INFO - Epoch: 0.66 | loss: 6.167200 | grad_norm: 92.898941 | learning_rate: 0.000004 2025-04-12 13:57:18,387 - INFO - Epoch: 0.67 | loss: 5.364100 | grad_norm: 98.195099 | learning_rate: 0.000004 2025-04-12 13:57:23,897 - INFO - Epoch: 0.67 | loss: 8.535400 | grad_norm: 188.282837 | learning_rate: 0.000004 2025-04-12 13:57:29,488 - INFO - Epoch: 0.67 | loss: 9.198800 | grad_norm: 148.826218 | learning_rate: 0.000004 2025-04-12 13:57:34,435 - INFO - Epoch: 0.67 | loss: 6.340100 | grad_norm: 83.497635 | learning_rate: 0.000004 2025-04-12 13:57:39,642 - INFO - Epoch: 0.67 | loss: 7.501500 | grad_norm: 89.900673 | learning_rate: 0.000004 2025-04-12 13:57:44,373 - INFO - Epoch: 0.67 | loss: 6.807500 | grad_norm: 145.813599 | learning_rate: 0.000004 2025-04-12 13:57:49,481 - INFO - Epoch: 0.67 | loss: 6.723400 | grad_norm: 67.145668 | learning_rate: 0.000004 2025-04-12 13:57:54,345 - INFO - Epoch: 0.67 | loss: 7.685000 | grad_norm: 117.547287 | learning_rate: 0.000004 2025-04-12 13:57:59,230 - INFO - Epoch: 0.67 | loss: 5.782500 | grad_norm: 70.291855 | learning_rate: 0.000004 2025-04-12 13:58:03,988 - INFO - Epoch: 0.68 | loss: 6.755900 | grad_norm: 77.952080 | learning_rate: 0.000004 2025-04-12 
13:58:09,216 - INFO - Epoch: 0.68 | loss: 8.609300 | grad_norm: 188.440933 | learning_rate: 0.000004 2025-04-12 13:58:14,022 - INFO - Epoch: 0.68 | loss: 7.466000 | grad_norm: 139.307861 | learning_rate: 0.000004 2025-04-12 13:58:18,923 - INFO - Epoch: 0.68 | loss: 8.877500 | grad_norm: 119.573578 | learning_rate: 0.000004 2025-04-12 13:58:23,874 - INFO - Epoch: 0.68 | loss: 8.836600 | grad_norm: 130.536102 | learning_rate: 0.000004 2025-04-12 13:58:28,704 - INFO - Epoch: 0.68 | loss: 6.418800 | grad_norm: 118.047829 | learning_rate: 0.000004 2025-04-12 13:58:33,845 - INFO - Epoch: 0.68 | loss: 7.435200 | grad_norm: 246.974762 | learning_rate: 0.000004 2025-04-12 13:58:38,564 - INFO - Epoch: 0.68 | loss: 6.458600 | grad_norm: 73.707436 | learning_rate: 0.000004 2025-04-12 13:58:43,154 - INFO - Epoch: 0.68 | loss: 9.360700 | grad_norm: 113.690887 | learning_rate: 0.000004 2025-04-12 13:58:48,041 - INFO - Epoch: 0.68 | loss: 7.698800 | grad_norm: 112.543243 | learning_rate: 0.000004 2025-04-12 13:58:53,105 - INFO - Epoch: 0.69 | loss: 4.795900 | grad_norm: 55.064560 | learning_rate: 0.000003 2025-04-12 13:58:58,252 - INFO - Epoch: 0.69 | loss: 8.496500 | grad_norm: 167.327408 | learning_rate: 0.000003 2025-04-12 13:59:03,238 - INFO - Epoch: 0.69 | loss: 6.872300 | grad_norm: 91.952423 | learning_rate: 0.000003 2025-04-12 13:59:08,544 - INFO - Epoch: 0.69 | loss: 8.664800 | grad_norm: 136.926147 | learning_rate: 0.000003 2025-04-12 13:59:13,615 - INFO - Epoch: 0.69 | loss: 7.854800 | grad_norm: 335.734680 | learning_rate: 0.000003 2025-04-12 13:59:17,987 - INFO - Epoch: 0.69 | loss: 8.431800 | grad_norm: 71.942070 | learning_rate: 0.000003 2025-04-12 13:59:22,874 - INFO - Epoch: 0.69 | loss: 6.291300 | grad_norm: 149.225555 | learning_rate: 0.000003 2025-04-12 13:59:27,772 - INFO - Epoch: 0.69 | loss: 7.941500 | grad_norm: 121.908020 | learning_rate: 0.000003 2025-04-12 13:59:32,819 - INFO - Epoch: 0.69 | loss: 9.898600 | grad_norm: 181.084763 | learning_rate: 
0.000003 2025-04-12 13:59:37,760 - INFO - Epoch: 0.69 | loss: 7.295300 | grad_norm: 76.176987 | learning_rate: 0.000003 2025-04-12 13:59:43,017 - INFO - Epoch: 0.70 | loss: 7.333500 | grad_norm: 109.812561 | learning_rate: 0.000003 2025-04-12 13:59:48,294 - INFO - Epoch: 0.70 | loss: 8.314500 | grad_norm: 130.659637 | learning_rate: 0.000003 2025-04-12 13:59:53,206 - INFO - Epoch: 0.70 | loss: 6.391900 | grad_norm: 102.499451 | learning_rate: 0.000003 2025-04-12 13:59:58,181 - INFO - Epoch: 0.70 | loss: 7.331100 | grad_norm: 72.675209 | learning_rate: 0.000003 2025-04-12 14:00:03,085 - INFO - Epoch: 0.70 | loss: 8.933500 | grad_norm: 272.660370 | learning_rate: 0.000003 2025-04-12 14:00:08,098 - INFO - Epoch: 0.70 | loss: 6.232800 | grad_norm: 149.270126 | learning_rate: 0.000003 2025-04-12 14:00:13,111 - INFO - Epoch: 0.70 | loss: 8.914400 | grad_norm: 244.773422 | learning_rate: 0.000003 2025-04-12 14:00:18,048 - INFO - Epoch: 0.70 | loss: 8.411600 | grad_norm: 225.314163 | learning_rate: 0.000003 2025-04-12 14:00:23,102 - INFO - Epoch: 0.70 | loss: 7.706800 | grad_norm: 121.381821 | learning_rate: 0.000003 2025-04-12 14:00:27,732 - INFO - Epoch: 0.70 | loss: 8.224500 | grad_norm: 392.064972 | learning_rate: 0.000003 2025-04-12 14:00:33,112 - INFO - Epoch: 0.71 | loss: 11.838600 | grad_norm: 229.110519 | learning_rate: 0.000003 2025-04-12 14:00:38,071 - INFO - Epoch: 0.71 | loss: 7.867000 | grad_norm: 100.288620 | learning_rate: 0.000003 2025-04-12 14:00:42,723 - INFO - Epoch: 0.71 | loss: 7.276900 | grad_norm: 212.299072 | learning_rate: 0.000003 2025-04-12 14:00:47,740 - INFO - Epoch: 0.71 | loss: 8.938900 | grad_norm: 309.502258 | learning_rate: 0.000003 2025-04-12 14:00:52,492 - INFO - Epoch: 0.71 | loss: 6.233400 | grad_norm: 120.191353 | learning_rate: 0.000003 2025-04-12 14:00:57,561 - INFO - Epoch: 0.71 | loss: 6.598600 | grad_norm: 134.774612 | learning_rate: 0.000003 2025-04-12 14:01:02,344 - INFO - Epoch: 0.71 | loss: 7.368800 | grad_norm: 102.987831 | 
learning_rate: 0.000003 2025-04-12 14:01:07,207 - INFO - Epoch: 0.71 | loss: 8.022100 | grad_norm: 150.819550 | learning_rate: 0.000003 2025-04-12 14:01:11,756 - INFO - Epoch: 0.71 | loss: 7.003400 | grad_norm: 224.555710 | learning_rate: 0.000003 2025-04-12 14:01:16,523 - INFO - Epoch: 0.72 | loss: 7.445900 | grad_norm: 130.177155 | learning_rate: 0.000003 2025-04-12 14:01:21,708 - INFO - Epoch: 0.72 | loss: 6.470400 | grad_norm: 166.080811 | learning_rate: 0.000003 2025-04-12 14:01:26,920 - INFO - Epoch: 0.72 | loss: 4.975500 | grad_norm: 97.151100 | learning_rate: 0.000003 2025-04-12 14:01:32,113 - INFO - Epoch: 0.72 | loss: 6.781700 | grad_norm: 148.280624 | learning_rate: 0.000003 2025-04-12 14:01:36,894 - INFO - Epoch: 0.72 | loss: 9.557800 | grad_norm: 68.143478 | learning_rate: 0.000003 2025-04-12 14:01:42,051 - INFO - Epoch: 0.72 | loss: 7.723800 | grad_norm: 101.595612 | learning_rate: 0.000003 2025-04-12 14:01:46,929 - INFO - Epoch: 0.72 | loss: 5.554000 | grad_norm: 78.067413 | learning_rate: 0.000003 2025-04-12 14:01:51,473 - INFO - Epoch: 0.72 | loss: 6.302000 | grad_norm: 89.662949 | learning_rate: 0.000003 2025-04-12 14:01:56,506 - INFO - Epoch: 0.72 | loss: 9.122700 | grad_norm: 59.744831 | learning_rate: 0.000003 2025-04-12 14:02:01,115 - INFO - Epoch: 0.72 | loss: 8.250300 | grad_norm: 131.886444 | learning_rate: 0.000003 2025-04-12 14:02:06,191 - INFO - Epoch: 0.73 | loss: 7.929100 | grad_norm: 221.158081 | learning_rate: 0.000003 2025-04-12 14:02:11,101 - INFO - Epoch: 0.73 | loss: 6.631100 | grad_norm: 176.069214 | learning_rate: 0.000003 2025-04-12 14:02:15,775 - INFO - Epoch: 0.73 | loss: 7.001700 | grad_norm: 131.988495 | learning_rate: 0.000003 2025-04-12 14:02:20,870 - INFO - Epoch: 0.73 | loss: 6.316900 | grad_norm: 270.232544 | learning_rate: 0.000003 2025-04-12 14:02:25,904 - INFO - Epoch: 0.73 | loss: 8.904900 | grad_norm: 190.625473 | learning_rate: 0.000003 2025-04-12 14:02:30,996 - INFO - Epoch: 0.73 | loss: 8.623800 | grad_norm: 
237.264313 | learning_rate: 0.000003 2025-04-12 14:02:36,025 - INFO - Epoch: 0.73 | loss: 8.496700 | grad_norm: 181.914551 | learning_rate: 0.000003 2025-04-12 14:02:41,273 - INFO - Epoch: 0.73 | loss: 8.995500 | grad_norm: 145.049683 | learning_rate: 0.000003 2025-04-12 14:02:46,064 - INFO - Epoch: 0.73 | loss: 8.976600 | grad_norm: 113.477196 | learning_rate: 0.000003 2025-04-12 14:02:51,135 - INFO - Epoch: 0.73 | loss: 7.067200 | grad_norm: 101.541641 | learning_rate: 0.000003 2025-04-12 14:02:55,803 - INFO - Epoch: 0.74 | loss: 7.788800 | grad_norm: 146.414139 | learning_rate: 0.000003 2025-04-12 14:03:00,849 - INFO - Epoch: 0.74 | loss: 7.201500 | grad_norm: 125.441330 | learning_rate: 0.000003 2025-04-12 14:03:06,001 - INFO - Epoch: 0.74 | loss: 7.689100 | grad_norm: 139.845703 | learning_rate: 0.000003 2025-04-12 14:03:10,955 - INFO - Epoch: 0.74 | loss: 8.468700 | grad_norm: 124.138710 | learning_rate: 0.000003 2025-04-12 14:03:16,155 - INFO - Epoch: 0.74 | loss: 7.286300 | grad_norm: 113.318024 | learning_rate: 0.000003 2025-04-12 14:03:20,962 - INFO - Epoch: 0.74 | loss: 8.161000 | grad_norm: 135.181168 | learning_rate: 0.000003 2025-04-12 14:03:25,749 - INFO - Epoch: 0.74 | loss: 6.773900 | grad_norm: 103.459816 | learning_rate: 0.000003 2025-04-12 14:03:30,902 - INFO - Epoch: 0.74 | loss: 9.416100 | grad_norm: 85.067009 | learning_rate: 0.000003 2025-04-12 14:03:35,432 - INFO - Epoch: 0.74 | loss: 5.708200 | grad_norm: 118.097137 | learning_rate: 0.000003 2025-04-12 14:03:40,591 - INFO - Epoch: 0.74 | loss: 11.202100 | grad_norm: 243.371567 | learning_rate: 0.000003 2025-04-12 14:03:45,948 - INFO - Epoch: 0.75 | loss: 6.392300 | grad_norm: 254.714340 | learning_rate: 0.000003 2025-04-12 14:03:50,759 - INFO - Epoch: 0.75 | loss: 8.638000 | grad_norm: 134.944489 | learning_rate: 0.000003 2025-04-12 14:03:55,756 - INFO - Epoch: 0.75 | loss: 7.154500 | grad_norm: 160.465073 | learning_rate: 0.000003 2025-04-12 14:04:00,906 - INFO - Epoch: 0.75 | loss: 
7.090000 | grad_norm: 117.719414 | learning_rate: 0.000003 2025-04-12 14:04:05,667 - INFO - Epoch: 0.75 | loss: 6.501500 | grad_norm: 189.210403 | learning_rate: 0.000003 2025-04-12 14:04:10,455 - INFO - Epoch: 0.75 | loss: 8.604600 | grad_norm: 232.840683 | learning_rate: 0.000003 2025-04-12 14:04:15,354 - INFO - Epoch: 0.75 | loss: 6.412400 | grad_norm: 145.116653 | learning_rate: 0.000003 2025-04-12 14:04:20,120 - INFO - Epoch: 0.75 | loss: 8.935400 | grad_norm: 118.923851 | learning_rate: 0.000003 2025-04-12 14:04:25,298 - INFO - Epoch: 0.75 | loss: 7.540000 | grad_norm: 103.823540 | learning_rate: 0.000003 2025-04-12 14:04:30,157 - INFO - Epoch: 0.76 | loss: 8.066900 | grad_norm: 202.183502 | learning_rate: 0.000003 2025-04-12 14:04:35,127 - INFO - Epoch: 0.76 | loss: 6.689300 | grad_norm: 51.027462 | learning_rate: 0.000003 2025-04-12 14:04:40,244 - INFO - Epoch: 0.76 | loss: 9.013800 | grad_norm: 81.933998 | learning_rate: 0.000003 2025-04-12 14:04:45,048 - INFO - Epoch: 0.76 | loss: 6.599200 | grad_norm: 111.475296 | learning_rate: 0.000003 2025-04-12 14:04:49,794 - INFO - Epoch: 0.76 | loss: 6.567500 | grad_norm: 136.200928 | learning_rate: 0.000003 2025-04-12 14:04:54,894 - INFO - Epoch: 0.76 | loss: 8.980500 | grad_norm: 115.272453 | learning_rate: 0.000003 2025-04-12 14:04:59,605 - INFO - Epoch: 0.76 | loss: 6.510600 | grad_norm: 185.724716 | learning_rate: 0.000003 2025-04-12 14:05:04,885 - INFO - Epoch: 0.76 | loss: 8.921100 | grad_norm: 211.465485 | learning_rate: 0.000003 2025-04-12 14:05:09,463 - INFO - Epoch: 0.76 | loss: 5.282600 | grad_norm: 103.227631 | learning_rate: 0.000003 2025-04-12 14:05:13,734 - INFO - Epoch: 0.76 | loss: 6.060100 | grad_norm: 120.200493 | learning_rate: 0.000003 2025-04-12 14:05:18,639 - INFO - Epoch: 0.77 | loss: 6.540900 | grad_norm: 147.612442 | learning_rate: 0.000003 2025-04-12 14:05:23,686 - INFO - Epoch: 0.77 | loss: 9.541200 | grad_norm: 131.342072 | learning_rate: 0.000003 2025-04-12 14:05:28,677 - INFO - 
Epoch: 0.77 | loss: 7.695700 | grad_norm: 156.285049 | learning_rate: 0.000003 2025-04-12 14:05:33,636 - INFO - Epoch: 0.77 | loss: 8.577800 | grad_norm: 181.561981 | learning_rate: 0.000003 2025-04-12 14:05:38,439 - INFO - Epoch: 0.77 | loss: 7.394100 | grad_norm: 123.193932 | learning_rate: 0.000003 2025-04-12 14:05:43,391 - INFO - Epoch: 0.77 | loss: 6.535800 | grad_norm: 194.914459 | learning_rate: 0.000003 2025-04-12 14:05:48,283 - INFO - Epoch: 0.77 | loss: 7.849800 | grad_norm: 86.957001 | learning_rate: 0.000003 2025-04-12 14:05:53,413 - INFO - Epoch: 0.77 | loss: 8.458300 | grad_norm: 118.541458 | learning_rate: 0.000003 2025-04-12 14:05:58,350 - INFO - Epoch: 0.77 | loss: 7.441500 | grad_norm: 81.181862 | learning_rate: 0.000003 2025-04-12 14:06:03,401 - INFO - Epoch: 0.77 | loss: 7.681900 | grad_norm: 186.354126 | learning_rate: 0.000003 2025-04-12 14:06:08,048 - INFO - Epoch: 0.78 | loss: 9.392300 | grad_norm: 125.466873 | learning_rate: 0.000002 2025-04-12 14:06:12,891 - INFO - Epoch: 0.78 | loss: 6.499900 | grad_norm: 152.605789 | learning_rate: 0.000002 2025-04-12 14:06:17,667 - INFO - Epoch: 0.78 | loss: 8.015200 | grad_norm: 150.515320 | learning_rate: 0.000002 2025-04-12 14:06:22,770 - INFO - Epoch: 0.78 | loss: 8.027600 | grad_norm: 168.982178 | learning_rate: 0.000002 2025-04-12 14:06:28,007 - INFO - Epoch: 0.78 | loss: 6.313200 | grad_norm: 88.978615 | learning_rate: 0.000002 2025-04-12 14:06:32,809 - INFO - Epoch: 0.78 | loss: 6.659100 | grad_norm: 176.383194 | learning_rate: 0.000002 2025-04-12 14:06:37,650 - INFO - Epoch: 0.78 | loss: 7.417100 | grad_norm: 112.971359 | learning_rate: 0.000002 2025-04-12 14:06:42,264 - INFO - Epoch: 0.78 | loss: 7.224800 | grad_norm: 237.537109 | learning_rate: 0.000002 2025-04-12 14:06:47,107 - INFO - Epoch: 0.78 | loss: 7.343900 | grad_norm: 137.827408 | learning_rate: 0.000002 2025-04-12 14:06:52,049 - INFO - Epoch: 0.79 | loss: 7.072100 | grad_norm: 115.822975 | learning_rate: 0.000002 2025-04-12 
14:06:57,096 - INFO - Epoch: 0.79 | loss: 7.997900 | grad_norm: 233.292999 | learning_rate: 0.000002 2025-04-12 14:07:02,460 - INFO - Epoch: 0.79 | loss: 8.491000 | grad_norm: 94.840858 | learning_rate: 0.000002 2025-04-12 14:07:07,399 - INFO - Epoch: 0.79 | loss: 12.739400 | grad_norm: 140.953705 | learning_rate: 0.000002 2025-04-12 14:07:12,063 - INFO - Epoch: 0.79 | loss: 6.312200 | grad_norm: 178.747635 | learning_rate: 0.000002 2025-04-12 14:07:16,892 - INFO - Epoch: 0.79 | loss: 7.613600 | grad_norm: 342.395111 | learning_rate: 0.000002 2025-04-12 14:07:22,029 - INFO - Epoch: 0.79 | loss: 11.532700 | grad_norm: 100.797119 | learning_rate: 0.000002 2025-04-12 14:07:27,117 - INFO - Epoch: 0.79 | loss: 7.585500 | grad_norm: 162.377777 | learning_rate: 0.000002 2025-04-12 14:07:32,323 - INFO - Epoch: 0.79 | loss: 7.856700 | grad_norm: 140.142654 | learning_rate: 0.000002 2025-04-12 14:07:37,203 - INFO - Epoch: 0.79 | loss: 7.022600 | grad_norm: 128.033722 | learning_rate: 0.000002 2025-04-12 14:07:42,174 - INFO - Epoch: 0.80 | loss: 8.771800 | grad_norm: 172.847687 | learning_rate: 0.000002 2025-04-12 14:07:47,307 - INFO - Epoch: 0.80 | loss: 6.919700 | grad_norm: 99.812912 | learning_rate: 0.000002 2025-04-12 14:07:52,674 - INFO - Epoch: 0.80 | loss: 7.063100 | grad_norm: 128.549606 | learning_rate: 0.000002 2025-04-12 14:07:57,584 - INFO - Epoch: 0.80 | loss: 6.314500 | grad_norm: 89.717468 | learning_rate: 0.000002 2025-04-12 14:08:02,601 - INFO - Epoch: 0.80 | loss: 10.469700 | grad_norm: 67.369118 | learning_rate: 0.000002 2025-04-12 14:08:07,648 - INFO - Epoch: 0.80 | loss: 7.154400 | grad_norm: 96.530357 | learning_rate: 0.000002 2025-04-12 14:08:12,903 - INFO - Epoch: 0.80 | loss: 8.426500 | grad_norm: 156.670013 | learning_rate: 0.000002 2025-04-12 14:08:17,588 - INFO - Epoch: 0.80 | loss: 6.263500 | grad_norm: 43.592186 | learning_rate: 0.000002 2025-04-12 14:08:22,392 - INFO - Epoch: 0.80 | loss: 6.233100 | grad_norm: 79.210907 | learning_rate: 
0.000002 2025-04-12 14:08:27,417 - INFO - Epoch: 0.80 | loss: 11.064100 | grad_norm: 92.353996 | learning_rate: 0.000002 2025-04-12 14:08:32,341 - INFO - Epoch: 0.81 | loss: 7.230400 | grad_norm: 182.695419 | learning_rate: 0.000002 2025-04-12 14:08:37,490 - INFO - Epoch: 0.81 | loss: 8.651100 | grad_norm: 86.137650 | learning_rate: 0.000002 2025-04-12 14:08:42,263 - INFO - Epoch: 0.81 | loss: 7.666700 | grad_norm: 61.470020 | learning_rate: 0.000002 2025-04-12 14:08:47,411 - INFO - Epoch: 0.81 | loss: 9.184100 | grad_norm: 110.811119 | learning_rate: 0.000002 2025-04-12 14:08:52,008 - INFO - Epoch: 0.81 | loss: 7.609900 | grad_norm: 112.561028 | learning_rate: 0.000002 2025-04-12 14:08:56,911 - INFO - Epoch: 0.81 | loss: 7.264100 | grad_norm: 67.532188 | learning_rate: 0.000002 2025-04-12 14:09:01,806 - INFO - Epoch: 0.81 | loss: 7.348600 | grad_norm: 238.657593 | learning_rate: 0.000002 2025-04-12 14:09:06,542 - INFO - Epoch: 0.81 | loss: 7.445000 | grad_norm: 87.996468 | learning_rate: 0.000002 2025-04-12 14:09:11,115 - INFO - Epoch: 0.81 | loss: 6.400000 | grad_norm: 76.780884 | learning_rate: 0.000002 2025-04-12 14:09:15,852 - INFO - Epoch: 0.81 | loss: 6.322500 | grad_norm: 154.790314 | learning_rate: 0.000002 2025-04-12 14:09:20,975 - INFO - Epoch: 0.82 | loss: 5.691700 | grad_norm: 50.766376 | learning_rate: 0.000002 2025-04-12 14:09:26,228 - INFO - Epoch: 0.82 | loss: 5.618900 | grad_norm: 117.284195 | learning_rate: 0.000002 2025-04-12 14:09:31,221 - INFO - Epoch: 0.82 | loss: 7.909100 | grad_norm: 88.224770 | learning_rate: 0.000002 2025-04-12 14:09:36,031 - INFO - Epoch: 0.82 | loss: 9.290400 | grad_norm: 163.871719 | learning_rate: 0.000002 2025-04-12 14:09:40,919 - INFO - Epoch: 0.82 | loss: 7.088100 | grad_norm: 193.667023 | learning_rate: 0.000002 2025-04-12 14:09:46,126 - INFO - Epoch: 0.82 | loss: 6.591000 | grad_norm: 75.780632 | learning_rate: 0.000002 2025-04-12 14:09:50,837 - INFO - Epoch: 0.82 | loss: 7.034800 | grad_norm: 166.217896 | 
learning_rate: 0.000002 2025-04-12 14:09:55,732 - INFO - Epoch: 0.82 | loss: 7.969100 | grad_norm: 137.195038 | learning_rate: 0.000002 2025-04-12 14:10:00,916 - INFO - Epoch: 0.82 | loss: 6.635300 | grad_norm: 185.890411 | learning_rate: 0.000002 2025-04-12 14:10:06,295 - INFO - Epoch: 0.83 | loss: 7.945000 | grad_norm: 254.228668 | learning_rate: 0.000002 2025-04-12 14:10:11,411 - INFO - Epoch: 0.83 | loss: 8.847400 | grad_norm: 186.758987 | learning_rate: 0.000002 2025-04-12 14:10:16,568 - INFO - Epoch: 0.83 | loss: 8.281500 | grad_norm: 163.093643 | learning_rate: 0.000002 2025-04-12 14:10:21,790 - INFO - Epoch: 0.83 | loss: 9.589200 | grad_norm: 194.278244 | learning_rate: 0.000002 2025-04-12 14:10:26,742 - INFO - Epoch: 0.83 | loss: 8.039100 | grad_norm: 142.720108 | learning_rate: 0.000002 2025-04-12 14:10:31,480 - INFO - Epoch: 0.83 | loss: 7.406000 | grad_norm: 64.783257 | learning_rate: 0.000002 2025-04-12 14:10:36,554 - INFO - Epoch: 0.83 | loss: 8.576600 | grad_norm: 134.337982 | learning_rate: 0.000002 2025-04-12 14:10:41,641 - INFO - Epoch: 0.83 | loss: 7.294800 | grad_norm: 82.427437 | learning_rate: 0.000002 2025-04-12 14:10:46,764 - INFO - Epoch: 0.83 | loss: 6.198600 | grad_norm: 103.759193 | learning_rate: 0.000002 2025-04-12 14:10:51,766 - INFO - Epoch: 0.83 | loss: 7.913500 | grad_norm: 90.015190 | learning_rate: 0.000002 2025-04-12 14:10:56,266 - INFO - Epoch: 0.84 | loss: 5.294600 | grad_norm: 72.985252 | learning_rate: 0.000002 2025-04-12 14:11:01,098 - INFO - Epoch: 0.84 | loss: 6.813600 | grad_norm: 149.073593 | learning_rate: 0.000002 2025-04-12 14:11:06,051 - INFO - Epoch: 0.84 | loss: 7.182200 | grad_norm: 327.777527 | learning_rate: 0.000002 2025-04-12 14:11:10,632 - INFO - Epoch: 0.84 | loss: 5.270800 | grad_norm: 38.795666 | learning_rate: 0.000002 2025-04-12 14:11:15,588 - INFO - Epoch: 0.84 | loss: 9.924000 | grad_norm: 292.370575 | learning_rate: 0.000002 2025-04-12 14:11:21,130 - INFO - Epoch: 0.84 | loss: 8.349700 | grad_norm: 
208.550095 | learning_rate: 0.000002 2025-04-12 14:11:26,150 - INFO - Epoch: 0.84 | loss: 8.059200 | grad_norm: 104.341232 | learning_rate: 0.000002 2025-04-12 14:11:31,274 - INFO - Epoch: 0.84 | loss: 7.035600 | grad_norm: 320.120544 | learning_rate: 0.000002 2025-04-12 14:11:36,254 - INFO - Epoch: 0.84 | loss: 6.236400 | grad_norm: 102.520874 | learning_rate: 0.000002 2025-04-12 14:11:41,226 - INFO - Epoch: 0.84 | loss: 6.894900 | grad_norm: 86.475342 | learning_rate: 0.000002 2025-04-12 14:11:46,264 - INFO - Epoch: 0.85 | loss: 6.279200 | grad_norm: 80.860062 | learning_rate: 0.000002 2025-04-12 14:11:51,269 - INFO - Epoch: 0.85 | loss: 7.691100 | grad_norm: 219.457001 | learning_rate: 0.000002 2025-04-12 14:11:56,236 - INFO - Epoch: 0.85 | loss: 6.671800 | grad_norm: 56.539070 | learning_rate: 0.000002 2025-04-12 14:12:01,775 - INFO - Epoch: 0.85 | loss: 7.320500 | grad_norm: 96.195862 | learning_rate: 0.000002 2025-04-12 14:12:07,207 - INFO - Epoch: 0.85 | loss: 7.206800 | grad_norm: 72.119049 | learning_rate: 0.000002 2025-04-12 14:12:12,359 - INFO - Epoch: 0.85 | loss: 7.788700 | grad_norm: 59.315784 | learning_rate: 0.000002 2025-04-12 14:12:17,303 - INFO - Epoch: 0.85 | loss: 9.209900 | grad_norm: 189.671631 | learning_rate: 0.000002 2025-04-12 14:12:22,131 - INFO - Epoch: 0.85 | loss: 6.623000 | grad_norm: 136.599976 | learning_rate: 0.000002 2025-04-12 14:12:27,218 - INFO - Epoch: 0.85 | loss: 6.540100 | grad_norm: 225.006256 | learning_rate: 0.000002 2025-04-12 14:12:31,931 - INFO - Epoch: 0.85 | loss: 8.296600 | grad_norm: 100.141251 | learning_rate: 0.000002 2025-04-12 14:12:36,758 - INFO - Epoch: 0.86 | loss: 8.347200 | grad_norm: 162.260391 | learning_rate: 0.000002 2025-04-12 14:12:41,802 - INFO - Epoch: 0.86 | loss: 7.139000 | grad_norm: 101.761322 | learning_rate: 0.000002 2025-04-12 14:12:46,702 - INFO - Epoch: 0.86 | loss: 6.599400 | grad_norm: 180.299973 | learning_rate: 0.000002 2025-04-12 14:12:51,546 - INFO - Epoch: 0.86 | loss: 5.166400 | 
grad_norm: 132.216522 | learning_rate: 0.000002 2025-04-12 14:12:56,590 - INFO - Epoch: 0.86 | loss: 9.662400 | grad_norm: 103.120476 | learning_rate: 0.000002 2025-04-12 14:13:01,507 - INFO - Epoch: 0.86 | loss: 6.340200 | grad_norm: 70.375702 | learning_rate: 0.000002 2025-04-12 14:13:06,146 - INFO - Epoch: 0.86 | loss: 7.705200 | grad_norm: 156.951340 | learning_rate: 0.000002 2025-04-12 14:13:11,132 - INFO - Epoch: 0.86 | loss: 8.510000 | grad_norm: 271.931244 | learning_rate: 0.000002 2025-04-12 14:13:16,030 - INFO - Epoch: 0.86 | loss: 8.300100 | grad_norm: 143.532684 | learning_rate: 0.000002 2025-04-12 14:13:20,911 - INFO - Epoch: 0.87 | loss: 7.081900 | grad_norm: 80.196617 | learning_rate: 0.000001 2025-04-12 14:13:25,726 - INFO - Epoch: 0.87 | loss: 6.219000 | grad_norm: 86.241035 | learning_rate: 0.000001 2025-04-12 14:13:30,636 - INFO - Epoch: 0.87 | loss: 7.662300 | grad_norm: 127.918747 | learning_rate: 0.000001 2025-04-12 14:13:35,972 - INFO - Epoch: 0.87 | loss: 8.634900 | grad_norm: 136.974686 | learning_rate: 0.000001 2025-04-12 14:13:41,071 - INFO - Epoch: 0.87 | loss: 5.689800 | grad_norm: 92.235580 | learning_rate: 0.000001 2025-04-12 14:13:46,197 - INFO - Epoch: 0.87 | loss: 6.371800 | grad_norm: 76.876991 | learning_rate: 0.000001 2025-04-12 14:13:51,257 - INFO - Epoch: 0.87 | loss: 6.199300 | grad_norm: 81.613945 | learning_rate: 0.000001 2025-04-12 14:13:56,243 - INFO - Epoch: 0.87 | loss: 7.644200 | grad_norm: 453.788574 | learning_rate: 0.000001 2025-04-12 14:14:01,300 - INFO - Epoch: 0.87 | loss: 7.590100 | grad_norm: 156.585007 | learning_rate: 0.000001 2025-04-12 14:14:06,604 - INFO - Epoch: 0.87 | loss: 9.368400 | grad_norm: 118.761108 | learning_rate: 0.000001 2025-04-12 14:14:11,965 - INFO - Epoch: 0.88 | loss: 6.876800 | grad_norm: 234.269623 | learning_rate: 0.000001 2025-04-12 14:14:16,203 - INFO - Epoch: 0.88 | loss: 4.830100 | grad_norm: 115.167404 | learning_rate: 0.000001 2025-04-12 14:14:21,089 - INFO - Epoch: 0.88 | loss: 
7.800500 | grad_norm: 94.778023 | learning_rate: 0.000001 2025-04-12 14:14:26,011 - INFO - Epoch: 0.88 | loss: 8.408100 | grad_norm: 166.248032 | learning_rate: 0.000001 2025-04-12 14:14:30,596 - INFO - Epoch: 0.88 | loss: 7.165000 | grad_norm: 163.921524 | learning_rate: 0.000001 2025-04-12 14:14:35,620 - INFO - Epoch: 0.88 | loss: 6.319200 | grad_norm: 59.240334 | learning_rate: 0.000001 2025-04-12 14:14:40,620 - INFO - Epoch: 0.88 | loss: 6.213500 | grad_norm: 78.463806 | learning_rate: 0.000001 2025-04-12 14:14:45,725 - INFO - Epoch: 0.88 | loss: 6.148000 | grad_norm: 57.611980 | learning_rate: 0.000001 2025-04-12 14:14:50,167 - INFO - Epoch: 0.88 | loss: 5.861700 | grad_norm: 60.235207 | learning_rate: 0.000001 2025-04-12 14:14:55,456 - INFO - Epoch: 0.88 | loss: 6.103600 | grad_norm: 88.040115 | learning_rate: 0.000001 2025-04-12 14:15:00,469 - INFO - Epoch: 0.89 | loss: 8.123900 | grad_norm: 88.255806 | learning_rate: 0.000001 2025-04-12 14:15:05,510 - INFO - Epoch: 0.89 | loss: 8.183100 | grad_norm: 30.198666 | learning_rate: 0.000001 2025-04-12 14:15:10,510 - INFO - Epoch: 0.89 | loss: 7.357700 | grad_norm: 283.721680 | learning_rate: 0.000001 2025-04-12 14:15:15,843 - INFO - Epoch: 0.89 | loss: 8.059100 | grad_norm: 127.693954 | learning_rate: 0.000001 2025-04-12 14:15:21,115 - INFO - Epoch: 0.89 | loss: 8.266400 | grad_norm: 85.423714 | learning_rate: 0.000001 2025-04-12 14:15:26,205 - INFO - Epoch: 0.89 | loss: 5.902600 | grad_norm: 100.344803 | learning_rate: 0.000001 2025-04-12 14:15:31,385 - INFO - Epoch: 0.89 | loss: 7.061900 | grad_norm: 118.888756 | learning_rate: 0.000001 2025-04-12 14:15:36,381 - INFO - Epoch: 0.89 | loss: 8.238600 | grad_norm: 343.642151 | learning_rate: 0.000001 2025-04-12 14:15:41,557 - INFO - Epoch: 0.89 | loss: 6.972400 | grad_norm: 185.050583 | learning_rate: 0.000001 2025-04-12 14:15:46,937 - INFO - Epoch: 0.89 | loss: 5.971500 | grad_norm: 131.428940 | learning_rate: 0.000001 2025-04-12 14:15:51,951 - INFO - Epoch: 0.90 
| loss: 6.633500 | grad_norm: 102.667976 | learning_rate: 0.000001 2025-04-12 14:15:56,857 - INFO - Epoch: 0.90 | loss: 7.141500 | grad_norm: 108.247452 | learning_rate: 0.000001 2025-04-12 14:16:02,725 - INFO - Epoch: 0.90 | loss: 9.564700 | grad_norm: 218.256927 | learning_rate: 0.000001 2025-04-12 14:16:07,536 - INFO - Epoch: 0.90 | loss: 7.103100 | grad_norm: 121.163490 | learning_rate: 0.000001 2025-04-12 14:16:12,382 - INFO - Epoch: 0.90 | loss: 5.919300 | grad_norm: 144.336929 | learning_rate: 0.000001 2025-04-12 14:16:17,224 - INFO - Epoch: 0.90 | loss: 6.823700 | grad_norm: 196.078522 | learning_rate: 0.000001 2025-04-12 14:16:22,173 - INFO - Epoch: 0.90 | loss: 7.136200 | grad_norm: 137.613251 | learning_rate: 0.000001 2025-04-12 14:16:26,853 - INFO - Epoch: 0.90 | loss: 6.578200 | grad_norm: 43.702595 | learning_rate: 0.000001 2025-04-12 14:16:31,893 - INFO - Epoch: 0.90 | loss: 6.409900 | grad_norm: 106.773338 | learning_rate: 0.000001 2025-04-12 14:16:37,029 - INFO - Epoch: 0.91 | loss: 6.556700 | grad_norm: 98.665054 | learning_rate: 0.000001 2025-04-12 14:16:41,880 - INFO - Epoch: 0.91 | loss: 6.781900 | grad_norm: 111.641678 | learning_rate: 0.000001 2025-04-12 14:16:46,748 - INFO - Epoch: 0.91 | loss: 7.220900 | grad_norm: 134.963989 | learning_rate: 0.000001 2025-04-12 14:16:51,661 - INFO - Epoch: 0.91 | loss: 6.310700 | grad_norm: 114.159554 | learning_rate: 0.000001 2025-04-12 14:16:56,999 - INFO - Epoch: 0.91 | loss: 8.758700 | grad_norm: 64.293564 | learning_rate: 0.000001 2025-04-12 14:17:01,622 - INFO - Epoch: 0.91 | loss: 5.634300 | grad_norm: 123.391418 | learning_rate: 0.000001 2025-04-12 14:17:06,969 - INFO - Epoch: 0.91 | loss: 7.678200 | grad_norm: 102.888718 | learning_rate: 0.000001 2025-04-12 14:17:11,897 - INFO - Epoch: 0.91 | loss: 7.913300 | grad_norm: 83.833511 | learning_rate: 0.000001 2025-04-12 14:17:16,692 - INFO - Epoch: 0.91 | loss: 7.183400 | grad_norm: 145.717041 | learning_rate: 0.000001 2025-04-12 14:17:21,372 - INFO - 
Epoch: 0.91 | loss: 7.100500 | grad_norm: 67.733002 | learning_rate: 0.000001 2025-04-12 14:17:26,678 - INFO - Epoch: 0.92 | loss: 8.013000 | grad_norm: 239.246414 | learning_rate: 0.000001 2025-04-12 14:17:31,553 - INFO - Epoch: 0.92 | loss: 7.590400 | grad_norm: 77.307480 | learning_rate: 0.000001 2025-04-12 14:17:36,318 - INFO - Epoch: 0.92 | loss: 7.943000 | grad_norm: 101.957756 | learning_rate: 0.000001 2025-04-12 14:17:41,033 - INFO - Epoch: 0.92 | loss: 8.142500 | grad_norm: 83.682991 | learning_rate: 0.000001 2025-04-12 14:17:46,074 - INFO - Epoch: 0.92 | loss: 10.627100 | grad_norm: 68.626114 | learning_rate: 0.000001 2025-04-12 14:17:51,024 - INFO - Epoch: 0.92 | loss: 9.507000 | grad_norm: 78.969299 | learning_rate: 0.000001 2025-04-12 14:17:55,693 - INFO - Epoch: 0.92 | loss: 6.867700 | grad_norm: 183.337860 | learning_rate: 0.000001 2025-04-12 14:18:00,374 - INFO - Epoch: 0.92 | loss: 7.686400 | grad_norm: 141.386108 | learning_rate: 0.000001 2025-04-12 14:18:05,406 - INFO - Epoch: 0.92 | loss: 7.746900 | grad_norm: 137.175385 | learning_rate: 0.000001 2025-04-12 14:18:10,255 - INFO - Epoch: 0.92 | loss: 5.986500 | grad_norm: 106.162781 | learning_rate: 0.000001 2025-04-12 14:18:14,973 - INFO - Epoch: 0.93 | loss: 7.336900 | grad_norm: 67.373207 | learning_rate: 0.000001 2025-04-12 14:18:19,918 - INFO - Epoch: 0.93 | loss: 7.813200 | grad_norm: 168.120605 | learning_rate: 0.000001 2025-04-12 14:18:25,066 - INFO - Epoch: 0.93 | loss: 8.140600 | grad_norm: 118.096306 | learning_rate: 0.000001 2025-04-12 14:18:29,876 - INFO - Epoch: 0.93 | loss: 7.402500 | grad_norm: 196.262985 | learning_rate: 0.000001 2025-04-12 14:18:35,102 - INFO - Epoch: 0.93 | loss: 8.216500 | grad_norm: 198.446335 | learning_rate: 0.000001 2025-04-12 14:18:40,018 - INFO - Epoch: 0.93 | loss: 6.490400 | grad_norm: 83.006378 | learning_rate: 0.000001 2025-04-12 14:18:44,967 - INFO - Epoch: 0.93 | loss: 5.733500 | grad_norm: 43.176987 | learning_rate: 0.000001 2025-04-12 14:18:49,634 
- INFO - Epoch: 0.93 | loss: 6.314700 | grad_norm: 193.796158 | learning_rate: 0.000001 2025-04-12 14:18:54,904 - INFO - Epoch: 0.93 | loss: 7.178300 | grad_norm: 239.509750 | learning_rate: 0.000001 2025-04-12 14:19:00,262 - INFO - Epoch: 0.94 | loss: 8.183200 | grad_norm: 51.142193 | learning_rate: 0.000001 2025-04-12 14:19:05,722 - INFO - Epoch: 0.94 | loss: 6.534500 | grad_norm: 107.894081 | learning_rate: 0.000001 2025-04-12 14:19:11,105 - INFO - Epoch: 0.94 | loss: 7.978900 | grad_norm: 194.803391 | learning_rate: 0.000001 2025-04-12 14:19:15,935 - INFO - Epoch: 0.94 | loss: 7.314600 | grad_norm: 150.738052 | learning_rate: 0.000001 2025-04-12 14:19:21,010 - INFO - Epoch: 0.94 | loss: 7.619700 | grad_norm: 91.530937 | learning_rate: 0.000001 2025-04-12 14:19:25,963 - INFO - Epoch: 0.94 | loss: 7.612700 | grad_norm: 123.909988 | learning_rate: 0.000001 2025-04-12 14:19:30,842 - INFO - Epoch: 0.94 | loss: 7.588000 | grad_norm: 124.602531 | learning_rate: 0.000001 2025-04-12 14:19:35,469 - INFO - Epoch: 0.94 | loss: 7.383000 | grad_norm: 194.454987 | learning_rate: 0.000001 2025-04-12 14:19:40,096 - INFO - Epoch: 0.94 | loss: 8.792300 | grad_norm: 58.659931 | learning_rate: 0.000001 2025-04-12 14:19:45,003 - INFO - Epoch: 0.94 | loss: 8.920100 | grad_norm: 521.525085 | learning_rate: 0.000001 2025-04-12 14:19:49,707 - INFO - Epoch: 0.95 | loss: 7.501600 | grad_norm: 230.712646 | learning_rate: 0.000001 2025-04-12 14:19:54,896 - INFO - Epoch: 0.95 | loss: 6.190800 | grad_norm: 88.973846 | learning_rate: 0.000001 2025-04-12 14:19:59,924 - INFO - Epoch: 0.95 | loss: 8.467100 | grad_norm: 244.627960 | learning_rate: 0.000001 2025-04-12 14:20:04,758 - INFO - Epoch: 0.95 | loss: 9.058400 | grad_norm: 251.407394 | learning_rate: 0.000001 2025-04-12 14:20:09,910 - INFO - Epoch: 0.95 | loss: 7.327500 | grad_norm: 126.511734 | learning_rate: 0.000001 2025-04-12 14:20:14,860 - INFO - Epoch: 0.95 | loss: 7.926700 | grad_norm: 214.059464 | learning_rate: 0.000001 2025-04-12 
14:20:20,007 - INFO - Epoch: 0.95 | loss: 6.433600 | grad_norm: 271.874756 | learning_rate: 0.000001 2025-04-12 14:20:25,067 - INFO - Epoch: 0.95 | loss: 5.129800 | grad_norm: 87.835434 | learning_rate: 0.000001 2025-04-12 14:20:30,044 - INFO - Epoch: 0.95 | loss: 8.453600 | grad_norm: 373.031860 | learning_rate: 0.000001 2025-04-12 14:20:34,868 - INFO - Epoch: 0.95 | loss: 6.622100 | grad_norm: 119.187691 | learning_rate: 0.000001 2025-04-12 14:20:39,867 - INFO - Epoch: 0.96 | loss: 6.336500 | grad_norm: 183.568207 | learning_rate: 0.000000 2025-04-12 14:20:45,056 - INFO - Epoch: 0.96 | loss: 9.619300 | grad_norm: 103.802536 | learning_rate: 0.000000 2025-04-12 14:20:49,879 - INFO - Epoch: 0.96 | loss: 7.241900 | grad_norm: 243.293457 | learning_rate: 0.000000 2025-04-12 14:20:54,703 - INFO - Epoch: 0.96 | loss: 8.985600 | grad_norm: 209.808212 | learning_rate: 0.000000 2025-04-12 14:20:59,300 - INFO - Epoch: 0.96 | loss: 6.505100 | grad_norm: 94.562004 | learning_rate: 0.000000 2025-04-12 14:21:04,275 - INFO - Epoch: 0.96 | loss: 7.103500 | grad_norm: 202.864319 | learning_rate: 0.000000 2025-04-12 14:21:09,071 - INFO - Epoch: 0.96 | loss: 6.935200 | grad_norm: 47.786163 | learning_rate: 0.000000 2025-04-12 14:21:14,106 - INFO - Epoch: 0.96 | loss: 7.916500 | grad_norm: 208.502777 | learning_rate: 0.000000 2025-04-12 14:21:18,598 - INFO - Epoch: 0.96 | loss: 6.134800 | grad_norm: 89.781540 | learning_rate: 0.000000 2025-04-12 14:21:23,208 - INFO - Epoch: 0.96 | loss: 8.848000 | grad_norm: 202.714996 | learning_rate: 0.000000 2025-04-12 14:21:28,266 - INFO - Epoch: 0.97 | loss: 9.311300 | grad_norm: 127.235268 | learning_rate: 0.000000 2025-04-12 14:21:33,590 - INFO - Epoch: 0.97 | loss: 7.311700 | grad_norm: 72.903061 | learning_rate: 0.000000 2025-04-12 14:21:38,397 - INFO - Epoch: 0.97 | loss: 7.798900 | grad_norm: 107.339798 | learning_rate: 0.000000 2025-04-12 14:21:43,109 - INFO - Epoch: 0.97 | loss: 7.758400 | grad_norm: 129.622620 | learning_rate: 0.000000 
2025-04-12 14:21:47,880 - INFO - Epoch: 0.97 | loss: 5.503200 | grad_norm: 106.414665 | learning_rate: 0.000000 2025-04-12 14:21:52,674 - INFO - Epoch: 0.97 | loss: 6.194600 | grad_norm: 127.979538 | learning_rate: 0.000000 2025-04-12 14:21:57,660 - INFO - Epoch: 0.97 | loss: 5.379300 | grad_norm: 198.786392 | learning_rate: 0.000000 2025-04-12 14:22:03,293 - INFO - Epoch: 0.97 | loss: 8.289700 | grad_norm: 59.276703 | learning_rate: 0.000000 2025-04-12 14:22:08,375 - INFO - Epoch: 0.97 | loss: 6.626300 | grad_norm: 139.981506 | learning_rate: 0.000000 2025-04-12 14:22:13,232 - INFO - Epoch: 0.98 | loss: 8.858000 | grad_norm: 104.756424 | learning_rate: 0.000000 2025-04-12 14:22:18,418 - INFO - Epoch: 0.98 | loss: 6.849200 | grad_norm: 137.016266 | learning_rate: 0.000000 2025-04-12 14:22:23,487 - INFO - Epoch: 0.98 | loss: 5.274200 | grad_norm: 174.497940 | learning_rate: 0.000000 2025-04-12 14:22:28,694 - INFO - Epoch: 0.98 | loss: 9.262600 | grad_norm: 178.315582 | learning_rate: 0.000000 2025-04-12 14:22:33,510 - INFO - Epoch: 0.98 | loss: 8.400300 | grad_norm: 220.358093 | learning_rate: 0.000000 2025-04-12 14:22:38,606 - INFO - Epoch: 0.98 | loss: 10.565900 | grad_norm: 254.064590 | learning_rate: 0.000000 2025-04-12 14:22:43,493 - INFO - Epoch: 0.98 | loss: 8.107400 | grad_norm: 138.445312 | learning_rate: 0.000000 2025-04-12 14:22:48,367 - INFO - Epoch: 0.98 | loss: 6.041900 | grad_norm: 84.856750 | learning_rate: 0.000000 2025-04-12 14:22:52,821 - INFO - Epoch: 0.98 | loss: 6.063000 | grad_norm: 69.788055 | learning_rate: 0.000000 2025-04-12 14:22:58,450 - INFO - Epoch: 0.98 | loss: 8.799900 | grad_norm: 193.120270 | learning_rate: 0.000000 2025-04-12 14:23:03,334 - INFO - Epoch: 0.99 | loss: 6.983900 | grad_norm: 116.685722 | learning_rate: 0.000000 2025-04-12 14:23:08,308 - INFO - Epoch: 0.99 | loss: 7.198200 | grad_norm: 100.424492 | learning_rate: 0.000000 2025-04-12 14:23:13,148 - INFO - Epoch: 0.99 | loss: 6.251300 | grad_norm: 147.272476 | 
learning_rate: 0.000000 2025-04-12 14:23:18,117 - INFO - Epoch: 0.99 | loss: 6.519300 | grad_norm: 92.330826 | learning_rate: 0.000000 2025-04-12 14:23:22,680 - INFO - Epoch: 0.99 | loss: 6.164100 | grad_norm: 217.873428 | learning_rate: 0.000000 2025-04-12 14:23:27,665 - INFO - Epoch: 0.99 | loss: 6.228900 | grad_norm: 63.586498 | learning_rate: 0.000000 2025-04-12 14:23:32,380 - INFO - Epoch: 0.99 | loss: 7.699100 | grad_norm: 102.415344 | learning_rate: 0.000000 2025-04-12 14:23:37,230 - INFO - Epoch: 0.99 | loss: 7.476600 | grad_norm: 105.151733 | learning_rate: 0.000000 2025-04-12 14:23:41,759 - INFO - Epoch: 0.99 | loss: 7.364900 | grad_norm: 86.492447 | learning_rate: 0.000000 2025-04-12 14:23:46,671 - INFO - Epoch: 0.99 | loss: 6.775000 | grad_norm: 170.540863 | learning_rate: 0.000000 2025-04-12 14:23:52,201 - INFO - Epoch: 1.00 | loss: 7.498300 | grad_norm: 91.406471 | learning_rate: 0.000000 2025-04-12 14:23:56,964 - INFO - Epoch: 1.00 | loss: 7.756500 | grad_norm: 220.808762 | learning_rate: 0.000000 2025-04-12 14:24:01,957 - INFO - Epoch: 1.00 | loss: 7.139200 | grad_norm: 103.799271 | learning_rate: 0.000000 2025-04-12 14:24:07,216 - INFO - Epoch: 1.00 | loss: 6.632100 | grad_norm: 238.658905 | learning_rate: 0.000000 2025-04-12 14:24:12,291 - INFO - Epoch: 1.00 | loss: 6.822700 | grad_norm: 134.274002 | learning_rate: 0.000000 2025-04-12 14:24:13,014 - INFO - Epoch: 1.00 | train_runtime: 4818.696700 | train_samples_per_second: 6.059000 | train_steps_per_second: 2.020000 | total_flos: 0.000000 | train_loss: 10.142912 2025-04-12 14:24:13,015 - INFO - Training complete. Attempting to save final model... 2025-04-12 14:24:13,015 - INFO - Attempting standard model save... 2025-04-12 14:24:14,063 - INFO - Model successfully saved with standard method to gliner_finetuned_20250412_130340 2025-04-12 14:24:14,064 - INFO - Model successfully saved to gliner_finetuned_20250412_130340 2025-04-12 14:24:14,064 - INFO - Testing the saved model... 
2025-04-12 14:24:17,600 - INFO - Model loaded successfully with standard method 2025-04-12 14:24:17,601 - INFO - Running prediction on test text... 2025-04-12 14:24:18,745 - INFO - Predicted entities: 2025-04-12 14:24:18,745 - INFO - Michael Johnson => PERSON 2025-04-12 14:24:18,745 - INFO - 15.04.2025 => DATE_TIME 2025-04-12 14:24:18,745 - INFO - Sarah Williams => PERSON 2025-04-12 14:24:18,745 - INFO - 123 Pine Street, Oslo => NO_ADDRESS 2025-04-12 14:24:18,745 - INFO - +47 98765432 => NO_PHONE_NUMBER 2025-04-12 14:24:18,745 - INFO - sarah.w@example.com => EMAIL_ADDRESS 2025-04-12 14:24:18,745 - INFO - seasonal allergies => HEALTH_INFO 2025-04-12 14:24:18,745 - INFO - Model testing completed successfully! 2025-04-12 14:24:18,946 - INFO - Model testing completed successfully! 2025-04-12 14:24:18,946 - INFO - --------------------------------------- 2025-04-12 14:24:18,946 - INFO - Final Model Evaluation Test 2025-04-12 14:24:18,946 - INFO - --------------------------------------- 2025-04-12 14:24:22,612 - INFO - Running comprehensive entity detection test...
2025-04-12 14:24:24,733 - INFO - Detected entities in comprehensive test: 2025-04-12 14:24:24,733 - INFO - Kristian Hansen => PERSON (confidence: 0.993) 2025-04-12 14:24:24,733 - INFO - 15.03.2024 10:45 => DATE_TIME (confidence: 0.979) 2025-04-12 14:24:24,733 - INFO - Maria Olsen => PERSON (confidence: 0.995) 2025-04-12 14:24:24,733 - INFO - Skogveien 8, 5020 Bergen => NO_ADDRESS (confidence: 0.986) 2025-04-12 14:24:24,733 - INFO - 98765432 => NO_PHONE_NUMBER (confidence: 0.981) 2025-04-12 14:24:24,733 - INFO - Veterinær => EMPLOYMENT_INFO (confidence: 0.547) 2025-04-12 14:24:24,733 - INFO - Jonas Nilsen => PERSON (confidence: 0.908) 2025-04-12 14:24:24,733 - INFO - mulig dyrevernsak => CONTEXT_SENSITIVE (confidence: 0.457) 2025-04-12 14:24:24,733 - INFO - 20.03.2024 => DATE_TIME (confidence: 0.888) 2025-04-12 14:24:24,734 - INFO - 10.03.2024 => DATE_TIME (confidence: 0.878) 2025-04-12 14:24:24,734 - INFO - Sofie Andersen => PERSON (confidence: 0.983) 2025-04-12 14:24:24,734 - INFO - sofie.a@example.org => EMAIL_ADDRESS (confidence: 0.999) 2025-04-12 14:24:24,734 - INFO - Training and evaluation completed. Model is ready for use. 2025-04-12 14:24:24,734 - INFO - =============================================== 2025-04-12 14:24:24,734 - INFO - GLiNER training process complete. Output directory: gliner_finetuned_20250412_130340 2025-04-12 14:24:24,734 - INFO - ===============================================