2025-04-12 14:31:41,929 - INFO - Starting GLiNER fine-tuning at 20250412_143141
2025-04-12 14:31:41,930 - INFO - Output directory: gliner_finetuned_20250412_143141
2025-04-12 14:31:41,930 - INFO - Loading data from ./gliner_dataset_fixed.json...
2025-04-12 14:31:46,675 - INFO - Successfully loaded 32439 samples from ./gliner_dataset_fixed.json
2025-04-12 14:31:46,695 - INFO - Sequence Length Analysis:
2025-04-12 14:31:46,695 - INFO - Minimum length: 246
2025-04-12 14:31:46,695 - INFO - Maximum length: 1286
2025-04-12 14:31:46,695 - INFO - Mean length: 450.5
2025-04-12 14:31:46,695 - INFO - Median length: 447
2025-04-12 14:31:46,695 - INFO - 90th percentile: 540
2025-04-12 14:31:46,695 - INFO - 95th percentile: 567
2025-04-12 14:31:46,695 - INFO - 99th percentile: 636
2025-04-12 14:31:46,695 - INFO - Sequences exceeding 1024 tokens: 19
2025-04-12 14:31:46,695 - INFO - Sequences exceeding 2048 tokens: 0
2025-04-12 14:31:46,695 - INFO - Recommended max_len setting: 640
2025-04-12 14:31:46,695 - INFO - Using maximum sequence length: 640
2025-04-12 14:31:46,759 - INFO - Extracted entity types: ['AGE', 'AGE_INFO', 'ANIMAL_INFO', 'BEHAVIORAL_PATTERN', 'CONTEXT_SENSITIVE', 'CRIMINAL_RECORD', 'DATE_TIME', 'ECONOMIC_STATUS', 'EMAIL_ADDRESS', 'EMPLOYMENT_INFO', 'FAMILY_RELATION', 'FINANCIAL_INFO', 'GOV_ID', 'HEALTH_INFO', 'IDENTIFIABLE_IMAGE', 'NO_ADDRESS', 'NO_PHONE_NUMBER', 'PERSON', 'POLITICAL_CASE', 'POSTAL_CODE', 'SEXUAL_ORIENTATION']
2025-04-12 14:31:46,868 - INFO - Dataset Statistics:
2025-04-12 14:31:46,869 - INFO - Total samples: 32439
2025-04-12 14:31:46,869 - INFO - Samples with entities: 32439 (100.0%)
2025-04-12 14:31:46,869 - INFO - Total entities: 583306
2025-04-12 14:31:46,869 - INFO - Average entities per sample: 17.98
2025-04-12 14:31:46,869 - INFO - Entity type distribution:
2025-04-12 14:31:46,869 - INFO - PERSON: 80730 (13.8%)
2025-04-12 14:31:46,869 - INFO - DATE_TIME: 76795 (13.2%)
2025-04-12 14:31:46,869 - INFO - HEALTH_INFO: 52559 (9.0%)
2025-04-12 14:31:46,869 - INFO - GOV_ID: 47823 (8.2%)
2025-04-12 14:31:46,869 - INFO - NO_ADDRESS: 45076 (7.7%)
2025-04-12 14:31:46,869 - INFO - CRIMINAL_RECORD: 37081 (6.4%)
2025-04-12 14:31:46,869 - INFO - NO_PHONE_NUMBER: 32198 (5.5%)
2025-04-12 14:31:46,869 - INFO - EMAIL_ADDRESS: 30921 (5.3%)
2025-04-12 14:31:46,869 - INFO - FAMILY_RELATION: 29615 (5.1%)
2025-04-12 14:31:46,869 - INFO - CONTEXT_SENSITIVE: 23812 (4.1%)
2025-04-12 14:31:46,869 - INFO - EMPLOYMENT_INFO: 22452 (3.8%)
2025-04-12 14:31:46,869 - INFO - FINANCIAL_INFO: 20784 (3.6%)
2025-04-12 14:31:46,869 - INFO - POLITICAL_CASE: 18895 (3.2%)
2025-04-12 14:31:46,869 - INFO - BEHAVIORAL_PATTERN: 15996 (2.7%)
2025-04-12 14:31:46,869 - INFO - ECONOMIC_STATUS: 14990 (2.6%)
2025-04-12 14:31:46,869 - INFO - IDENTIFIABLE_IMAGE: 14825 (2.5%)
2025-04-12 14:31:46,869 - INFO - SEXUAL_ORIENTATION: 11758 (2.0%)
2025-04-12 14:31:46,869 - INFO - POSTAL_CODE: 6985 (1.2%)
2025-04-12 14:31:46,869 - INFO - ANIMAL_INFO: 6 (0.0%)
2025-04-12 14:31:46,869 - INFO - AGE: 3 (0.0%)
2025-04-12 14:31:46,869 - INFO - AGE_INFO: 2 (0.0%)
2025-04-12 14:31:46,885 - INFO - Dataset split: 29195 training samples, 3244 validation samples
2025-04-12 14:31:48,013 - INFO - Successfully imported GLiNER components
2025-04-12 14:31:48,069 - INFO - Using device: cuda
2025-04-12 14:31:48,070 - INFO - Estimated GPU memory requirement: 1.1 GB with batch size 4
2025-04-12 14:31:48,070 - INFO - Loading base model: urchade/gliner_multi_pii-v1 with max_length=640
2025-04-12 14:31:54,988 - INFO - Updating model configuration with max_len=640
2025-04-12 14:31:55,341 - INFO - Starting optimized training for 5 epochs...
2025-04-12 14:31:55,341 - INFO - Training with batch size 4 × 2 gradient accumulation steps = effective batch size of 8
2025-04-12 14:31:55,341 - INFO - Learning rate: 1e-05, Label smoothing: 0.05
2025-04-12 14:31:55,341 - INFO - Mixed precision training: Enabled
2025-04-12 14:32:03,381 - INFO - Epoch: 0.00 | loss: 0.000400 | grad_norm: 0.008976 | learning_rate: 0.000000
2025-04-12 14:32:09,447 - INFO - Epoch: 0.01 | loss: 0.000400 | grad_norm: 0.007134 | learning_rate: 0.000000
2025-04-12 14:32:15,386 - INFO - Epoch: 0.01 | loss: 0.000300 | grad_norm: 0.006254 | learning_rate: 0.000000
2025-04-12 14:32:21,817 - INFO - Epoch: 0.01 | loss: 0.000300 | grad_norm: 0.009069 | learning_rate: 0.000000
2025-04-12 14:32:28,159 - INFO - Epoch: 0.01 | loss: 0.000300 | grad_norm: 0.008129 | learning_rate: 0.000001
2025-04-12 14:32:33,997 - INFO - Epoch: 0.02 | loss: 0.000300 | grad_norm: 0.006472 | learning_rate: 0.000001
2025-04-12 14:32:39,901 - INFO - Epoch: 0.02 | loss: 0.000300 | grad_norm: 0.005810 | learning_rate: 0.000001
2025-04-12 14:32:46,178 - INFO - Epoch: 0.02 | loss: 0.000300 | grad_norm: 0.004463 | learning_rate: 0.000001
2025-04-12 14:32:52,421 - INFO - Epoch: 0.02 | loss: 0.000200 | grad_norm: 0.005331 | learning_rate: 0.000001
2025-04-12 14:32:58,572 - INFO - Epoch: 0.03 | loss: 0.000200 | grad_norm: 0.004509 | learning_rate: 0.000001
2025-04-12 14:33:04,756 - INFO - Epoch: 0.03 | loss: 0.000200 | grad_norm: 0.003401 | learning_rate: 0.000001
2025-04-12 14:33:10,803 - INFO - Epoch: 0.03 | loss: 0.000200 | grad_norm: 0.003334 | learning_rate: 0.000001
2025-04-12 14:33:16,764 - INFO - Epoch: 0.04 | loss: 0.000200 | grad_norm: 0.003013 | learning_rate: 0.000001
2025-04-12 14:33:22,726 - INFO - Epoch: 0.04 | loss: 0.000200 | grad_norm: 0.002217 | learning_rate: 0.000002
2025-04-12 14:33:28,606 - INFO - Epoch: 0.04 | loss: 0.000200 | grad_norm: 0.002430 | learning_rate: 0.000002
2025-04-12 14:33:34,527 - INFO - Epoch: 0.04 | loss: 0.000200 | grad_norm: 0.001940 | 
learning_rate: 0.000002 2025-04-12 14:33:40,758 - INFO - Epoch: 0.05 | loss: 0.000200 | grad_norm: 0.002029 | learning_rate: 0.000002 2025-04-12 14:33:46,605 - INFO - Epoch: 0.05 | loss: 0.000200 | grad_norm: 0.002110 | learning_rate: 0.000002 2025-04-12 14:33:52,921 - INFO - Epoch: 0.05 | loss: 0.000100 | grad_norm: 0.001823 | learning_rate: 0.000002 2025-04-12 14:33:59,123 - INFO - Epoch: 0.05 | loss: 0.000200 | grad_norm: 0.002083 | learning_rate: 0.000002 2025-04-12 14:34:05,061 - INFO - Epoch: 0.06 | loss: 0.000100 | grad_norm: 0.001646 | learning_rate: 0.000002 2025-04-12 14:34:10,665 - INFO - Epoch: 0.06 | loss: 0.000100 | grad_norm: 0.001561 | learning_rate: 0.000002 2025-04-12 14:34:16,905 - INFO - Epoch: 0.06 | loss: 0.000100 | grad_norm: 0.001643 | learning_rate: 0.000003 2025-04-12 14:34:22,238 - INFO - Epoch: 0.07 | loss: 0.000100 | grad_norm: 0.001952 | learning_rate: 0.000003 2025-04-12 14:34:28,066 - INFO - Epoch: 0.07 | loss: 0.000100 | grad_norm: 0.001721 | learning_rate: 0.000003 2025-04-12 14:34:34,281 - INFO - Epoch: 0.07 | loss: 0.000100 | grad_norm: 0.002044 | learning_rate: 0.000003 2025-04-12 14:34:40,482 - INFO - Epoch: 0.07 | loss: 0.000100 | grad_norm: 0.002142 | learning_rate: 0.000003 2025-04-12 14:34:46,491 - INFO - Epoch: 0.08 | loss: 0.000100 | grad_norm: 0.001618 | learning_rate: 0.000003 2025-04-12 14:34:52,586 - INFO - Epoch: 0.08 | loss: 0.000100 | grad_norm: 0.001914 | learning_rate: 0.000003 2025-04-12 14:34:58,601 - INFO - Epoch: 0.08 | loss: 0.000100 | grad_norm: 0.001440 | learning_rate: 0.000003 2025-04-12 14:35:04,726 - INFO - Epoch: 0.08 | loss: 0.000100 | grad_norm: 0.002663 | learning_rate: 0.000003 2025-04-12 14:35:10,893 - INFO - Epoch: 0.09 | loss: 0.000100 | grad_norm: 0.001744 | learning_rate: 0.000003 2025-04-12 14:35:16,323 - INFO - Epoch: 0.09 | loss: 0.000100 | grad_norm: 0.002244 | learning_rate: 0.000004 2025-04-12 14:35:22,657 - INFO - Epoch: 0.09 | loss: 0.000100 | grad_norm: 0.001388 | learning_rate: 
0.000004 2025-04-12 14:35:28,894 - INFO - Epoch: 0.10 | loss: 0.000100 | grad_norm: 0.001788 | learning_rate: 0.000004 2025-04-12 14:35:34,940 - INFO - Epoch: 0.10 | loss: 0.000100 | grad_norm: 0.001770 | learning_rate: 0.000004 2025-04-12 14:35:40,801 - INFO - Epoch: 0.10 | loss: 0.000100 | grad_norm: 0.001854 | learning_rate: 0.000004 2025-04-12 14:35:46,836 - INFO - Epoch: 0.10 | loss: 0.000100 | grad_norm: 0.001835 | learning_rate: 0.000004 2025-04-12 14:35:53,294 - INFO - Epoch: 0.11 | loss: 0.000100 | grad_norm: 0.001691 | learning_rate: 0.000004 2025-04-12 14:35:59,123 - INFO - Epoch: 0.11 | loss: 0.000100 | grad_norm: 0.001461 | learning_rate: 0.000004 2025-04-12 14:36:05,353 - INFO - Epoch: 0.11 | loss: 0.000100 | grad_norm: 0.002840 | learning_rate: 0.000004 2025-04-12 14:36:11,644 - INFO - Epoch: 0.12 | loss: 0.000100 | grad_norm: 0.001457 | learning_rate: 0.000005 2025-04-12 14:36:17,999 - INFO - Epoch: 0.12 | loss: 0.000100 | grad_norm: 0.002060 | learning_rate: 0.000005 2025-04-12 14:36:24,345 - INFO - Epoch: 0.12 | loss: 0.000100 | grad_norm: 0.002975 | learning_rate: 0.000005 2025-04-12 14:36:30,444 - INFO - Epoch: 0.12 | loss: 0.000100 | grad_norm: 0.001741 | learning_rate: 0.000005 2025-04-12 14:36:36,112 - INFO - Epoch: 0.13 | loss: 0.000100 | grad_norm: 0.001985 | learning_rate: 0.000005 2025-04-12 14:36:42,332 - INFO - Epoch: 0.13 | loss: 0.000100 | grad_norm: 0.002438 | learning_rate: 0.000005 2025-04-12 14:36:48,325 - INFO - Epoch: 0.13 | loss: 0.000100 | grad_norm: 0.001881 | learning_rate: 0.000005 2025-04-12 14:36:54,505 - INFO - Epoch: 0.13 | loss: 0.000100 | grad_norm: 0.001800 | learning_rate: 0.000005 2025-04-12 14:37:00,894 - INFO - Epoch: 0.14 | loss: 0.000100 | grad_norm: 0.001818 | learning_rate: 0.000005 2025-04-12 14:37:07,094 - INFO - Epoch: 0.14 | loss: 0.000100 | grad_norm: 0.001949 | learning_rate: 0.000006 2025-04-12 14:37:13,168 - INFO - Epoch: 0.14 | loss: 0.000100 | grad_norm: 0.001848 | learning_rate: 0.000006 2025-04-12 
14:37:19,084 - INFO - Epoch: 0.15 | loss: 0.000100 | grad_norm: 0.002152 | learning_rate: 0.000006 2025-04-12 14:37:25,642 - INFO - Epoch: 0.15 | loss: 0.000100 | grad_norm: 0.002295 | learning_rate: 0.000006 2025-04-12 14:37:31,895 - INFO - Epoch: 0.15 | loss: 0.000100 | grad_norm: 0.001442 | learning_rate: 0.000006 2025-04-12 14:37:37,925 - INFO - Epoch: 0.15 | loss: 0.000100 | grad_norm: 0.001857 | learning_rate: 0.000006 2025-04-12 14:37:44,367 - INFO - Epoch: 0.16 | loss: 0.000100 | grad_norm: 0.001360 | learning_rate: 0.000006 2025-04-12 14:37:50,484 - INFO - Epoch: 0.16 | loss: 0.000100 | grad_norm: 0.002232 | learning_rate: 0.000006 2025-04-12 14:37:55,972 - INFO - Epoch: 0.16 | loss: 0.000100 | grad_norm: 0.001800 | learning_rate: 0.000006 2025-04-12 14:38:02,202 - INFO - Epoch: 0.16 | loss: 0.000100 | grad_norm: 0.001355 | learning_rate: 0.000007 2025-04-12 14:38:08,352 - INFO - Epoch: 0.17 | loss: 0.000100 | grad_norm: 0.004660 | learning_rate: 0.000007 2025-04-12 14:38:14,515 - INFO - Epoch: 0.17 | loss: 0.000100 | grad_norm: 0.001644 | learning_rate: 0.000007 2025-04-12 14:38:20,561 - INFO - Epoch: 0.17 | loss: 0.000100 | grad_norm: 0.001328 | learning_rate: 0.000007 2025-04-12 14:38:26,969 - INFO - Epoch: 0.18 | loss: 0.000100 | grad_norm: 0.001895 | learning_rate: 0.000007 2025-04-12 14:38:33,242 - INFO - Epoch: 0.18 | loss: 0.000100 | grad_norm: 0.001222 | learning_rate: 0.000007 2025-04-12 14:38:39,200 - INFO - Epoch: 0.18 | loss: 0.000100 | grad_norm: 0.001480 | learning_rate: 0.000007 2025-04-12 14:38:45,466 - INFO - Epoch: 0.18 | loss: 0.000100 | grad_norm: 0.002608 | learning_rate: 0.000007 2025-04-12 14:38:51,797 - INFO - Epoch: 0.19 | loss: 0.000100 | grad_norm: 0.001474 | learning_rate: 0.000007 2025-04-12 14:38:57,893 - INFO - Epoch: 0.19 | loss: 0.000100 | grad_norm: 0.001641 | learning_rate: 0.000008 2025-04-12 14:39:03,922 - INFO - Epoch: 0.19 | loss: 0.000100 | grad_norm: 0.001833 | learning_rate: 0.000008 2025-04-12 14:39:10,203 - INFO 
- Epoch: 0.19 | loss: 0.000100 | grad_norm: 0.001808 | learning_rate: 0.000008 2025-04-12 14:39:16,364 - INFO - Epoch: 0.20 | loss: 0.000100 | grad_norm: 0.003410 | learning_rate: 0.000008 2025-04-12 14:39:22,369 - INFO - Epoch: 0.20 | loss: 0.000100 | grad_norm: 0.001262 | learning_rate: 0.000008 2025-04-12 14:39:28,525 - INFO - Epoch: 0.20 | loss: 0.000100 | grad_norm: 0.001467 | learning_rate: 0.000008 2025-04-12 14:39:35,117 - INFO - Epoch: 0.21 | loss: 0.000100 | grad_norm: 0.001122 | learning_rate: 0.000008 2025-04-12 14:39:40,742 - INFO - Epoch: 0.21 | loss: 0.000100 | grad_norm: 0.001331 | learning_rate: 0.000008 2025-04-12 14:39:46,898 - INFO - Epoch: 0.21 | loss: 0.000100 | grad_norm: 0.001877 | learning_rate: 0.000008 2025-04-12 14:39:52,798 - INFO - Epoch: 0.21 | loss: 0.000100 | grad_norm: 0.001567 | learning_rate: 0.000009 2025-04-12 14:39:58,711 - INFO - Epoch: 0.22 | loss: 0.000100 | grad_norm: 0.001318 | learning_rate: 0.000009 2025-04-12 14:40:05,071 - INFO - Epoch: 0.22 | loss: 0.000100 | grad_norm: 0.001132 | learning_rate: 0.000009 2025-04-12 14:40:11,380 - INFO - Epoch: 0.22 | loss: 0.000100 | grad_norm: 0.001585 | learning_rate: 0.000009 2025-04-12 14:40:17,357 - INFO - Epoch: 0.22 | loss: 0.000100 | grad_norm: 0.001620 | learning_rate: 0.000009 2025-04-12 14:40:23,548 - INFO - Epoch: 0.23 | loss: 0.000100 | grad_norm: 0.001960 | learning_rate: 0.000009 2025-04-12 14:40:29,804 - INFO - Epoch: 0.23 | loss: 0.000100 | grad_norm: 0.001303 | learning_rate: 0.000009 2025-04-12 14:40:35,664 - INFO - Epoch: 0.23 | loss: 0.000100 | grad_norm: 0.001238 | learning_rate: 0.000009 2025-04-12 14:40:41,968 - INFO - Epoch: 0.24 | loss: 0.000100 | grad_norm: 0.001544 | learning_rate: 0.000009 2025-04-12 14:40:48,206 - INFO - Epoch: 0.24 | loss: 0.000100 | grad_norm: 0.001317 | learning_rate: 0.000010 2025-04-12 14:40:54,448 - INFO - Epoch: 0.24 | loss: 0.000100 | grad_norm: 0.001783 | learning_rate: 0.000010 2025-04-12 14:40:59,976 - INFO - Epoch: 0.24 | 
loss: 0.000100 | grad_norm: 0.001593 | learning_rate: 0.000010 2025-04-12 14:41:05,881 - INFO - Epoch: 0.25 | loss: 0.000100 | grad_norm: 0.001159 | learning_rate: 0.000010 2025-04-12 14:41:11,922 - INFO - Epoch: 0.25 | loss: 0.000100 | grad_norm: 0.001280 | learning_rate: 0.000010 2025-04-12 14:41:17,907 - INFO - Epoch: 0.25 | loss: 0.000100 | grad_norm: 0.001046 | learning_rate: 0.000010 2025-04-12 14:41:24,173 - INFO - Epoch: 0.25 | loss: 0.000100 | grad_norm: 0.000999 | learning_rate: 0.000010 2025-04-12 14:41:30,059 - INFO - Epoch: 0.26 | loss: 0.000100 | grad_norm: 0.001076 | learning_rate: 0.000010 2025-04-12 14:41:35,986 - INFO - Epoch: 0.26 | loss: 0.000100 | grad_norm: 0.001942 | learning_rate: 0.000010 2025-04-12 14:41:42,058 - INFO - Epoch: 0.26 | loss: 0.000100 | grad_norm: 0.001158 | learning_rate: 0.000011 2025-04-12 14:41:48,410 - INFO - Epoch: 0.27 | loss: 0.000100 | grad_norm: 0.001727 | learning_rate: 0.000011 2025-04-12 14:41:54,098 - INFO - Epoch: 0.27 | loss: 0.000100 | grad_norm: 0.002012 | learning_rate: 0.000011 2025-04-12 14:42:00,065 - INFO - Epoch: 0.27 | loss: 0.000100 | grad_norm: 0.001077 | learning_rate: 0.000011 2025-04-12 14:42:06,506 - INFO - Epoch: 0.27 | loss: 0.000100 | grad_norm: 0.001535 | learning_rate: 0.000011 2025-04-12 14:42:12,353 - INFO - Epoch: 0.28 | loss: 0.000100 | grad_norm: 0.001791 | learning_rate: 0.000011 2025-04-12 14:42:17,953 - INFO - Epoch: 0.28 | loss: 0.000100 | grad_norm: 0.001150 | learning_rate: 0.000011 2025-04-12 14:42:24,147 - INFO - Epoch: 0.28 | loss: 0.000100 | grad_norm: 0.000983 | learning_rate: 0.000011 2025-04-12 14:42:30,353 - INFO - Epoch: 0.28 | loss: 0.000100 | grad_norm: 0.001245 | learning_rate: 0.000011 2025-04-12 14:42:36,678 - INFO - Epoch: 0.29 | loss: 0.000100 | grad_norm: 0.000903 | learning_rate: 0.000011 2025-04-12 14:42:42,576 - INFO - Epoch: 0.29 | loss: 0.000100 | grad_norm: 0.001210 | learning_rate: 0.000012 2025-04-12 14:42:48,675 - INFO - Epoch: 0.29 | loss: 0.000100 | 
grad_norm: 0.001497 | learning_rate: 0.000012 2025-04-12 14:42:54,590 - INFO - Epoch: 0.30 | loss: 0.000100 | grad_norm: 0.001691 | learning_rate: 0.000012 2025-04-12 14:43:00,914 - INFO - Epoch: 0.30 | loss: 0.000100 | grad_norm: 0.001297 | learning_rate: 0.000012 2025-04-12 14:43:07,336 - INFO - Epoch: 0.30 | loss: 0.000100 | grad_norm: 0.000775 | learning_rate: 0.000012 2025-04-12 14:43:13,450 - INFO - Epoch: 0.30 | loss: 0.000100 | grad_norm: 0.000532 | learning_rate: 0.000012 2025-04-12 14:43:19,614 - INFO - Epoch: 0.31 | loss: 0.000100 | grad_norm: 0.001167 | learning_rate: 0.000012 2025-04-12 14:43:25,552 - INFO - Epoch: 0.31 | loss: 0.000100 | grad_norm: 0.001218 | learning_rate: 0.000012 2025-04-12 14:43:31,502 - INFO - Epoch: 0.31 | loss: 0.000100 | grad_norm: 0.001191 | learning_rate: 0.000012 2025-04-12 14:43:37,714 - INFO - Epoch: 0.32 | loss: 0.000100 | grad_norm: 0.000898 | learning_rate: 0.000013 2025-04-12 14:43:43,738 - INFO - Epoch: 0.32 | loss: 0.000100 | grad_norm: 0.000880 | learning_rate: 0.000013 2025-04-12 14:43:49,857 - INFO - Epoch: 0.32 | loss: 0.000100 | grad_norm: 0.000769 | learning_rate: 0.000013 2025-04-12 14:43:55,966 - INFO - Epoch: 0.32 | loss: 0.000100 | grad_norm: 0.000982 | learning_rate: 0.000013 2025-04-12 14:44:02,276 - INFO - Epoch: 0.33 | loss: 0.000100 | grad_norm: 0.001227 | learning_rate: 0.000013 2025-04-12 14:44:08,995 - INFO - Epoch: 0.33 | loss: 0.000100 | grad_norm: 0.001221 | learning_rate: 0.000013 2025-04-12 14:44:15,386 - INFO - Epoch: 0.33 | loss: 0.000100 | grad_norm: 0.001244 | learning_rate: 0.000013 2025-04-12 14:44:21,493 - INFO - Epoch: 0.33 | loss: 0.000100 | grad_norm: 0.000863 | learning_rate: 0.000013 2025-04-12 14:44:27,386 - INFO - Epoch: 0.34 | loss: 0.000100 | grad_norm: 0.001124 | learning_rate: 0.000013 2025-04-12 14:44:33,754 - INFO - Epoch: 0.34 | loss: 0.000100 | grad_norm: 0.001080 | learning_rate: 0.000014 2025-04-12 14:44:40,027 - INFO - Epoch: 0.34 | loss: 0.000100 | grad_norm: 0.001340 
| learning_rate: 0.000014 2025-04-12 14:44:45,951 - INFO - Epoch: 0.35 | loss: 0.000100 | grad_norm: 0.001143 | learning_rate: 0.000014 2025-04-12 14:44:52,223 - INFO - Epoch: 0.35 | loss: 0.000100 | grad_norm: 0.000837 | learning_rate: 0.000014 2025-04-12 14:44:58,340 - INFO - Epoch: 0.35 | loss: 0.000100 | grad_norm: 0.000914 | learning_rate: 0.000014 2025-04-12 14:45:04,352 - INFO - Epoch: 0.35 | loss: 0.000100 | grad_norm: 0.002157 | learning_rate: 0.000014 2025-04-12 14:45:10,419 - INFO - Epoch: 0.36 | loss: 0.000100 | grad_norm: 0.001375 | learning_rate: 0.000014 2025-04-12 14:45:16,559 - INFO - Epoch: 0.36 | loss: 0.000100 | grad_norm: 0.001066 | learning_rate: 0.000014 2025-04-12 14:45:22,654 - INFO - Epoch: 0.36 | loss: 0.000100 | grad_norm: 0.000823 | learning_rate: 0.000014 2025-04-12 14:45:28,678 - INFO - Epoch: 0.36 | loss: 0.000100 | grad_norm: 0.000836 | learning_rate: 0.000015 2025-04-12 14:45:34,820 - INFO - Epoch: 0.37 | loss: 0.000100 | grad_norm: 0.001046 | learning_rate: 0.000015 2025-04-12 14:45:41,044 - INFO - Epoch: 0.37 | loss: 0.000100 | grad_norm: 0.001884 | learning_rate: 0.000015 2025-04-12 14:45:47,517 - INFO - Epoch: 0.37 | loss: 0.000100 | grad_norm: 0.000651 | learning_rate: 0.000015 2025-04-12 14:45:53,562 - INFO - Epoch: 0.38 | loss: 0.000100 | grad_norm: 0.000990 | learning_rate: 0.000015 2025-04-12 14:45:59,556 - INFO - Epoch: 0.38 | loss: 0.000100 | grad_norm: 0.000727 | learning_rate: 0.000015 2025-04-12 14:46:05,670 - INFO - Epoch: 0.38 | loss: 0.000100 | grad_norm: 0.001181 | learning_rate: 0.000015 2025-04-12 14:46:12,105 - INFO - Epoch: 0.38 | loss: 0.000100 | grad_norm: 0.001005 | learning_rate: 0.000015 2025-04-12 14:46:17,883 - INFO - Epoch: 0.39 | loss: 0.000100 | grad_norm: 0.000802 | learning_rate: 0.000015 2025-04-12 14:46:23,858 - INFO - Epoch: 0.39 | loss: 0.000100 | grad_norm: 0.001625 | learning_rate: 0.000016 2025-04-12 14:46:30,063 - INFO - Epoch: 0.39 | loss: 0.000100 | grad_norm: 0.001142 | learning_rate: 
0.000016 2025-04-12 14:46:36,270 - INFO - Epoch: 0.39 | loss: 0.000100 | grad_norm: 0.001161 | learning_rate: 0.000016 2025-04-12 14:46:42,556 - INFO - Epoch: 0.40 | loss: 0.000100 | grad_norm: 0.001298 | learning_rate: 0.000016 2025-04-12 14:46:48,180 - INFO - Epoch: 0.40 | loss: 0.000100 | grad_norm: 0.000748 | learning_rate: 0.000016 2025-04-12 14:46:54,236 - INFO - Epoch: 0.40 | loss: 0.000100 | grad_norm: 0.000847 | learning_rate: 0.000016 2025-04-12 14:47:00,440 - INFO - Epoch: 0.41 | loss: 0.000100 | grad_norm: 0.001093 | learning_rate: 0.000016 2025-04-12 14:47:06,374 - INFO - Epoch: 0.41 | loss: 0.000100 | grad_norm: 0.001557 | learning_rate: 0.000016 2025-04-12 14:47:12,471 - INFO - Epoch: 0.41 | loss: 0.000100 | grad_norm: 0.000710 | learning_rate: 0.000016 2025-04-12 14:47:18,561 - INFO - Epoch: 0.41 | loss: 0.000100 | grad_norm: 0.000650 | learning_rate: 0.000017 2025-04-12 14:47:24,690 - INFO - Epoch: 0.42 | loss: 0.000100 | grad_norm: 0.001104 | learning_rate: 0.000017 2025-04-12 14:47:30,917 - INFO - Epoch: 0.42 | loss: 0.000100 | grad_norm: 0.000890 | learning_rate: 0.000017 2025-04-12 14:47:37,150 - INFO - Epoch: 0.42 | loss: 0.000100 | grad_norm: 0.001545 | learning_rate: 0.000017 2025-04-12 14:47:43,277 - INFO - Epoch: 0.42 | loss: 0.000100 | grad_norm: 0.001055 | learning_rate: 0.000017 2025-04-12 14:47:49,026 - INFO - Epoch: 0.43 | loss: 0.000100 | grad_norm: 0.000968 | learning_rate: 0.000017 2025-04-12 14:47:55,132 - INFO - Epoch: 0.43 | loss: 0.000100 | grad_norm: 0.000891 | learning_rate: 0.000017 2025-04-12 14:48:01,294 - INFO - Epoch: 0.43 | loss: 0.000100 | grad_norm: 0.000955 | learning_rate: 0.000017 2025-04-12 14:48:07,312 - INFO - Epoch: 0.44 | loss: 0.000100 | grad_norm: 0.000948 | learning_rate: 0.000017 2025-04-12 14:48:13,215 - INFO - Epoch: 0.44 | loss: 0.000100 | grad_norm: 0.000905 | learning_rate: 0.000018 2025-04-12 14:48:19,738 - INFO - Epoch: 0.44 | loss: 0.000100 | grad_norm: 0.000733 | learning_rate: 0.000018 2025-04-12 
14:48:25,779 - INFO - Epoch: 0.44 | loss: 0.000100 | grad_norm: 0.001414 | learning_rate: 0.000018 2025-04-12 14:48:31,929 - INFO - Epoch: 0.45 | loss: 0.000100 | grad_norm: 0.001021 | learning_rate: 0.000018 2025-04-12 14:48:37,806 - INFO - Epoch: 0.45 | loss: 0.000100 | grad_norm: 0.001576 | learning_rate: 0.000018 2025-04-12 14:48:43,876 - INFO - Epoch: 0.45 | loss: 0.000100 | grad_norm: 0.001107 | learning_rate: 0.000018 2025-04-12 14:48:50,031 - INFO - Epoch: 0.45 | loss: 0.000100 | grad_norm: 0.001350 | learning_rate: 0.000018 2025-04-12 14:48:56,568 - INFO - Epoch: 0.46 | loss: 0.000100 | grad_norm: 0.001147 | learning_rate: 0.000018 2025-04-12 14:49:02,208 - INFO - Epoch: 0.46 | loss: 0.000100 | grad_norm: 0.001424 | learning_rate: 0.000018 2025-04-12 14:49:08,716 - INFO - Epoch: 0.46 | loss: 0.000100 | grad_norm: 0.001392 | learning_rate: 0.000019 2025-04-12 14:49:15,197 - INFO - Epoch: 0.47 | loss: 0.000100 | grad_norm: 0.001084 | learning_rate: 0.000019 2025-04-12 14:49:21,364 - INFO - Epoch: 0.47 | loss: 0.000100 | grad_norm: 0.000964 | learning_rate: 0.000019 2025-04-12 14:49:27,267 - INFO - Epoch: 0.47 | loss: 0.000100 | grad_norm: 0.000891 | learning_rate: 0.000019 2025-04-12 14:49:33,131 - INFO - Epoch: 0.47 | loss: 0.000100 | grad_norm: 0.001394 | learning_rate: 0.000019 2025-04-12 14:49:39,221 - INFO - Epoch: 0.48 | loss: 0.000100 | grad_norm: 0.000692 | learning_rate: 0.000019 2025-04-12 14:49:45,156 - INFO - Epoch: 0.48 | loss: 0.000100 | grad_norm: 0.000784 | learning_rate: 0.000019 2025-04-12 14:49:51,266 - INFO - Epoch: 0.48 | loss: 0.000100 | grad_norm: 0.000791 | learning_rate: 0.000019 2025-04-12 14:49:57,246 - INFO - Epoch: 0.48 | loss: 0.000100 | grad_norm: 0.000922 | learning_rate: 0.000019 2025-04-12 14:50:03,209 - INFO - Epoch: 0.49 | loss: 0.000100 | grad_norm: 0.001069 | learning_rate: 0.000019 2025-04-12 14:50:09,115 - INFO - Epoch: 0.49 | loss: 0.000100 | grad_norm: 0.000859 | learning_rate: 0.000020 2025-04-12 14:50:15,431 - INFO 
- Epoch: 0.49 | loss: 0.000100 | grad_norm: 0.001105 | learning_rate: 0.000020 2025-04-12 14:50:21,711 - INFO - Epoch: 0.50 | loss: 0.000100 | grad_norm: 0.001087 | learning_rate: 0.000020 2025-04-12 14:50:28,128 - INFO - Epoch: 0.50 | loss: 0.000100 | grad_norm: 0.000760 | learning_rate: 0.000020 2025-04-12 14:50:34,172 - INFO - Epoch: 0.50 | loss: 0.000100 | grad_norm: 0.000949 | learning_rate: 0.000020 2025-04-12 14:50:40,278 - INFO - Epoch: 0.50 | loss: 0.000100 | grad_norm: 0.000896 | learning_rate: 0.000020 2025-04-12 14:50:46,801 - INFO - Epoch: 0.51 | loss: 0.000100 | grad_norm: 0.000734 | learning_rate: 0.000020 2025-04-12 14:50:52,903 - INFO - Epoch: 0.51 | loss: 0.000100 | grad_norm: 0.001196 | learning_rate: 0.000020 2025-04-12 14:50:59,054 - INFO - Epoch: 0.51 | loss: 0.000100 | grad_norm: 0.000744 | learning_rate: 0.000020 2025-04-12 14:51:05,156 - INFO - Epoch: 0.52 | loss: 0.000100 | grad_norm: 0.000976 | learning_rate: 0.000020 2025-04-12 14:51:11,004 - INFO - Epoch: 0.52 | loss: 0.000100 | grad_norm: 0.000728 | learning_rate: 0.000020 2025-04-12 14:51:16,566 - INFO - Epoch: 0.52 | loss: 0.000100 | grad_norm: 0.001559 | learning_rate: 0.000020 2025-04-12 14:51:22,835 - INFO - Epoch: 0.52 | loss: 0.000100 | grad_norm: 0.000768 | learning_rate: 0.000020 2025-04-12 14:51:29,065 - INFO - Epoch: 0.53 | loss: 0.000100 | grad_norm: 0.000620 | learning_rate: 0.000020 2025-04-12 14:51:34,923 - INFO - Epoch: 0.53 | loss: 0.000100 | grad_norm: 0.000769 | learning_rate: 0.000020 2025-04-12 14:51:40,825 - INFO - Epoch: 0.53 | loss: 0.000100 | grad_norm: 0.001308 | learning_rate: 0.000020 2025-04-12 14:51:47,181 - INFO - Epoch: 0.53 | loss: 0.000100 | grad_norm: 0.000614 | learning_rate: 0.000020 2025-04-12 14:51:53,205 - INFO - Epoch: 0.54 | loss: 0.000100 | grad_norm: 0.000885 | learning_rate: 0.000020 2025-04-12 14:51:59,210 - INFO - Epoch: 0.54 | loss: 0.000100 | grad_norm: 0.001015 | learning_rate: 0.000020 2025-04-12 14:52:05,329 - INFO - Epoch: 0.54 | 
loss: 0.000100 | grad_norm: 0.000994 | learning_rate: 0.000020 2025-04-12 14:52:11,808 - INFO - Epoch: 0.55 | loss: 0.000100 | grad_norm: 0.000793 | learning_rate: 0.000020 2025-04-12 14:52:17,705 - INFO - Epoch: 0.55 | loss: 0.000100 | grad_norm: 0.000988 | learning_rate: 0.000020 2025-04-12 14:52:23,942 - INFO - Epoch: 0.55 | loss: 0.000100 | grad_norm: 0.000670 | learning_rate: 0.000020 2025-04-12 14:52:29,955 - INFO - Epoch: 0.55 | loss: 0.000100 | grad_norm: 0.001039 | learning_rate: 0.000020 2025-04-12 14:52:35,988 - INFO - Epoch: 0.56 | loss: 0.000100 | grad_norm: 0.000571 | learning_rate: 0.000020 2025-04-12 14:52:42,191 - INFO - Epoch: 0.56 | loss: 0.000100 | grad_norm: 0.000826 | learning_rate: 0.000020 2025-04-12 14:52:48,126 - INFO - Epoch: 0.56 | loss: 0.000100 | grad_norm: 0.001197 | learning_rate: 0.000020 2025-04-12 14:52:54,064 - INFO - Epoch: 0.56 | loss: 0.000100 | grad_norm: 0.000644 | learning_rate: 0.000020 2025-04-12 14:53:00,248 - INFO - Epoch: 0.57 | loss: 0.000100 | grad_norm: 0.000958 | learning_rate: 0.000020 2025-04-12 14:53:06,597 - INFO - Epoch: 0.57 | loss: 0.000100 | grad_norm: 0.002878 | learning_rate: 0.000020 2025-04-12 14:53:12,902 - INFO - Epoch: 0.57 | loss: 0.000100 | grad_norm: 0.000664 | learning_rate: 0.000020 2025-04-12 14:53:19,017 - INFO - Epoch: 0.58 | loss: 0.000100 | grad_norm: 0.000625 | learning_rate: 0.000020 2025-04-12 14:53:25,538 - INFO - Epoch: 0.58 | loss: 0.000100 | grad_norm: 0.000723 | learning_rate: 0.000020 2025-04-12 14:53:32,107 - INFO - Epoch: 0.58 | loss: 0.000100 | grad_norm: 0.001000 | learning_rate: 0.000020 2025-04-12 14:53:38,201 - INFO - Epoch: 0.58 | loss: 0.000100 | grad_norm: 0.001172 | learning_rate: 0.000020 2025-04-12 14:53:44,342 - INFO - Epoch: 0.59 | loss: 0.000100 | grad_norm: 0.000712 | learning_rate: 0.000020 2025-04-12 14:53:50,352 - INFO - Epoch: 0.59 | loss: 0.000100 | grad_norm: 0.001030 | learning_rate: 0.000020 2025-04-12 14:53:56,438 - INFO - Epoch: 0.59 | loss: 0.000100 | 
grad_norm: 0.001662 | learning_rate: 0.000020 2025-04-12 14:54:02,524 - INFO - Epoch: 0.59 | loss: 0.000100 | grad_norm: 0.000882 | learning_rate: 0.000020 2025-04-12 14:54:08,748 - INFO - Epoch: 0.60 | loss: 0.000100 | grad_norm: 0.000944 | learning_rate: 0.000020 2025-04-12 14:54:14,865 - INFO - Epoch: 0.60 | loss: 0.000100 | grad_norm: 0.000988 | learning_rate: 0.000020 2025-04-12 14:54:20,917 - INFO - Epoch: 0.60 | loss: 0.000100 | grad_norm: 0.000622 | learning_rate: 0.000020 2025-04-12 14:54:27,183 - INFO - Epoch: 0.61 | loss: 0.000100 | grad_norm: 0.000556 | learning_rate: 0.000020 2025-04-12 14:54:33,356 - INFO - Epoch: 0.61 | loss: 0.000100 | grad_norm: 0.001014 | learning_rate: 0.000020 2025-04-12 14:54:39,421 - INFO - Epoch: 0.61 | loss: 0.000100 | grad_norm: 0.000875 | learning_rate: 0.000020 2025-04-12 14:54:45,760 - INFO - Epoch: 0.61 | loss: 0.000100 | grad_norm: 0.000994 | learning_rate: 0.000020 2025-04-12 14:54:51,990 - INFO - Epoch: 0.62 | loss: 0.000100 | grad_norm: 0.000840 | learning_rate: 0.000020 2025-04-12 14:54:58,364 - INFO - Epoch: 0.62 | loss: 0.000100 | grad_norm: 0.000776 | learning_rate: 0.000020 2025-04-12 14:55:04,519 - INFO - Epoch: 0.62 | loss: 0.000100 | grad_norm: 0.000628 | learning_rate: 0.000020 2025-04-12 14:55:10,776 - INFO - Epoch: 0.62 | loss: 0.000100 | grad_norm: 0.000956 | learning_rate: 0.000020 2025-04-12 14:55:16,671 - INFO - Epoch: 0.63 | loss: 0.000100 | grad_norm: 0.000503 | learning_rate: 0.000020 2025-04-12 14:55:22,593 - INFO - Epoch: 0.63 | loss: 0.000100 | grad_norm: 0.000771 | learning_rate: 0.000020 2025-04-12 14:55:28,600 - INFO - Epoch: 0.63 | loss: 0.000100 | grad_norm: 0.000979 | learning_rate: 0.000020 2025-04-12 14:55:34,679 - INFO - Epoch: 0.64 | loss: 0.000100 | grad_norm: 0.000744 | learning_rate: 0.000020 2025-04-12 14:55:40,660 - INFO - Epoch: 0.64 | loss: 0.000100 | grad_norm: 0.000678 | learning_rate: 0.000020 2025-04-12 14:55:46,957 - INFO - Epoch: 0.64 | loss: 0.000100 | grad_norm: 0.000448 
| learning_rate: 0.000020 2025-04-12 14:55:53,047 - INFO - Epoch: 0.64 | loss: 0.000100 | grad_norm: 0.000957 | learning_rate: 0.000020 2025-04-12 14:55:58,856 - INFO - Epoch: 0.65 | loss: 0.000100 | grad_norm: 0.000707 | learning_rate: 0.000020 2025-04-12 14:56:05,716 - INFO - Epoch: 0.65 | loss: 0.000100 | grad_norm: 0.000776 | learning_rate: 0.000020 2025-04-12 14:56:11,662 - INFO - Epoch: 0.65 | loss: 0.000100 | grad_norm: 0.000813 | learning_rate: 0.000020 2025-04-12 14:56:17,924 - INFO - Epoch: 0.65 | loss: 0.000100 | grad_norm: 0.000697 | learning_rate: 0.000020 2025-04-12 14:56:24,381 - INFO - Epoch: 0.66 | loss: 0.000100 | grad_norm: 0.000797 | learning_rate: 0.000020 2025-04-12 14:56:30,271 - INFO - Epoch: 0.66 | loss: 0.000100 | grad_norm: 0.001566 | learning_rate: 0.000020 2025-04-12 14:56:36,688 - INFO - Epoch: 0.66 | loss: 0.000100 | grad_norm: 0.000930 | learning_rate: 0.000020 2025-04-12 14:56:42,828 - INFO - Epoch: 0.67 | loss: 0.000100 | grad_norm: 0.000813 | learning_rate: 0.000020 2025-04-12 14:56:48,816 - INFO - Epoch: 0.67 | loss: 0.000100 | grad_norm: 0.000680 | learning_rate: 0.000020 2025-04-12 14:56:55,273 - INFO - Epoch: 0.67 | loss: 0.000100 | grad_norm: 0.001227 | learning_rate: 0.000020 2025-04-12 14:57:01,319 - INFO - Epoch: 0.67 | loss: 0.000100 | grad_norm: 0.000745 | learning_rate: 0.000020 2025-04-12 14:57:07,415 - INFO - Epoch: 0.68 | loss: 0.000100 | grad_norm: 0.000652 | learning_rate: 0.000020 2025-04-12 14:57:13,702 - INFO - Epoch: 0.68 | loss: 0.000100 | grad_norm: 0.000854 | learning_rate: 0.000020 2025-04-12 14:57:19,600 - INFO - Epoch: 0.68 | loss: 0.000100 | grad_norm: 0.000751 | learning_rate: 0.000020 2025-04-12 14:57:25,846 - INFO - Epoch: 0.69 | loss: 0.000100 | grad_norm: 0.000759 | learning_rate: 0.000020 2025-04-12 14:57:31,579 - INFO - Epoch: 0.69 | loss: 0.000100 | grad_norm: 0.000789 | learning_rate: 0.000020 2025-04-12 14:57:37,862 - INFO - Epoch: 0.69 | loss: 0.000100 | grad_norm: 0.000447 | learning_rate: 
0.000020 2025-04-12 14:57:43,807 - INFO - Epoch: 0.69 | loss: 0.000100 | grad_norm: 0.000676 | learning_rate: 0.000020 2025-04-12 14:57:50,178 - INFO - Epoch: 0.70 | loss: 0.000100 | grad_norm: 0.000661 | learning_rate: 0.000020 2025-04-12 14:57:56,208 - INFO - Epoch: 0.70 | loss: 0.000100 | grad_norm: 0.000832 | learning_rate: 0.000020 2025-04-12 14:58:02,715 - INFO - Epoch: 0.70 | loss: 0.000100 | grad_norm: 0.000549 | learning_rate: 0.000020 2025-04-12 14:58:08,687 - INFO - Epoch: 0.70 | loss: 0.000100 | grad_norm: 0.000750 | learning_rate: 0.000020 2025-04-12 14:58:14,682 - INFO - Epoch: 0.71 | loss: 0.000100 | grad_norm: 0.000785 | learning_rate: 0.000020 2025-04-12 14:58:20,852 - INFO - Epoch: 0.71 | loss: 0.000100 | grad_norm: 0.000585 | learning_rate: 0.000020 2025-04-12 14:58:27,099 - INFO - Epoch: 0.71 | loss: 0.000100 | grad_norm: 0.000539 | learning_rate: 0.000020 2025-04-12 14:58:33,067 - INFO - Epoch: 0.72 | loss: 0.000100 | grad_norm: 0.000770 | learning_rate: 0.000020 2025-04-12 14:58:39,196 - INFO - Epoch: 0.72 | loss: 0.000100 | grad_norm: 0.000515 | learning_rate: 0.000020 2025-04-12 14:58:45,074 - INFO - Epoch: 0.72 | loss: 0.000100 | grad_norm: 0.000684 | learning_rate: 0.000020 2025-04-12 14:58:51,214 - INFO - Epoch: 0.72 | loss: 0.000100 | grad_norm: 0.000731 | learning_rate: 0.000020 2025-04-12 14:58:57,670 - INFO - Epoch: 0.73 | loss: 0.000100 | grad_norm: 0.000654 | learning_rate: 0.000020 2025-04-12 14:59:03,931 - INFO - Epoch: 0.73 | loss: 0.000100 | grad_norm: 0.000604 | learning_rate: 0.000020 2025-04-12 14:59:09,659 - INFO - Epoch: 0.73 | loss: 0.000100 | grad_norm: 0.000988 | learning_rate: 0.000020 2025-04-12 14:59:15,729 - INFO - Epoch: 0.73 | loss: 0.000100 | grad_norm: 0.000793 | learning_rate: 0.000020 2025-04-12 14:59:21,642 - INFO - Epoch: 0.74 | loss: 0.000100 | grad_norm: 0.000651 | learning_rate: 0.000020 2025-04-12 14:59:27,790 - INFO - Epoch: 0.74 | loss: 0.000100 | grad_norm: 0.000777 | learning_rate: 0.000020 2025-04-12 
14:59:33,691 - INFO - Epoch: 0.74 | loss: 0.000100 | grad_norm: 0.001186 | learning_rate: 0.000020 2025-04-12 14:59:39,586 - INFO - Epoch: 0.75 | loss: 0.000100 | grad_norm: 0.000693 | learning_rate: 0.000020 2025-04-12 14:59:45,853 - INFO - Epoch: 0.75 | loss: 0.000100 | grad_norm: 0.000530 | learning_rate: 0.000020 2025-04-12 14:59:51,710 - INFO - Epoch: 0.75 | loss: 0.000100 | grad_norm: 0.000650 | learning_rate: 0.000020 2025-04-12 14:59:57,801 - INFO - Epoch: 0.75 | loss: 0.000100 | grad_norm: 0.000713 | learning_rate: 0.000020 2025-04-12 15:00:03,889 - INFO - Epoch: 0.76 | loss: 0.000100 | grad_norm: 0.000695 | learning_rate: 0.000020 2025-04-12 15:00:09,949 - INFO - Epoch: 0.76 | loss: 0.000100 | grad_norm: 0.000995 | learning_rate: 0.000020 2025-04-12 15:00:16,335 - INFO - Epoch: 0.76 | loss: 0.000100 | grad_norm: 0.000617 | learning_rate: 0.000020 2025-04-12 15:00:22,406 - INFO - Epoch: 0.76 | loss: 0.000100 | grad_norm: 0.000693 | learning_rate: 0.000020 2025-04-12 15:00:28,851 - INFO - Epoch: 0.77 | loss: 0.000100 | grad_norm: 0.000640 | learning_rate: 0.000020 2025-04-12 15:00:34,800 - INFO - Epoch: 0.77 | loss: 0.000100 | grad_norm: 0.000928 | learning_rate: 0.000020 2025-04-12 15:00:40,902 - INFO - Epoch: 0.77 | loss: 0.000100 | grad_norm: 0.000846 | learning_rate: 0.000020 2025-04-12 15:00:47,116 - INFO - Epoch: 0.78 | loss: 0.000100 | grad_norm: 0.000775 | learning_rate: 0.000020 2025-04-12 15:00:53,229 - INFO - Epoch: 0.78 | loss: 0.000100 | grad_norm: 0.000729 | learning_rate: 0.000020 2025-04-12 15:00:59,162 - INFO - Epoch: 0.78 | loss: 0.000100 | grad_norm: 0.000523 | learning_rate: 0.000020 2025-04-12 15:01:05,376 - INFO - Epoch: 0.78 | loss: 0.000100 | grad_norm: 0.000574 | learning_rate: 0.000020 2025-04-12 15:01:11,627 - INFO - Epoch: 0.79 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000020 2025-04-12 15:01:17,800 - INFO - Epoch: 0.79 | loss: 0.000100 | grad_norm: 0.000704 | learning_rate: 0.000020 2025-04-12 15:01:24,078 - INFO 
- Epoch: 0.79 | loss: 0.000100 | grad_norm: 0.000741 | learning_rate: 0.000020 2025-04-12 15:01:30,329 - INFO - Epoch: 0.79 | loss: 0.000100 | grad_norm: 0.001053 | learning_rate: 0.000020 2025-04-12 15:01:36,415 - INFO - Epoch: 0.80 | loss: 0.000100 | grad_norm: 0.000690 | learning_rate: 0.000020 2025-04-12 15:01:42,219 - INFO - Epoch: 0.80 | loss: 0.000100 | grad_norm: 0.000707 | learning_rate: 0.000020 2025-04-12 15:01:48,419 - INFO - Epoch: 0.80 | loss: 0.000100 | grad_norm: 0.000806 | learning_rate: 0.000020 2025-04-12 15:01:54,961 - INFO - Epoch: 0.81 | loss: 0.000100 | grad_norm: 0.000785 | learning_rate: 0.000020 2025-04-12 15:02:00,725 - INFO - Epoch: 0.81 | loss: 0.000100 | grad_norm: 0.000820 | learning_rate: 0.000020 2025-04-12 15:02:06,925 - INFO - Epoch: 0.81 | loss: 0.000100 | grad_norm: 0.000901 | learning_rate: 0.000020 2025-04-12 15:02:13,471 - INFO - Epoch: 0.81 | loss: 0.000100 | grad_norm: 0.000798 | learning_rate: 0.000020 2025-04-12 15:02:19,459 - INFO - Epoch: 0.82 | loss: 0.000100 | grad_norm: 0.000728 | learning_rate: 0.000020 2025-04-12 15:02:25,555 - INFO - Epoch: 0.82 | loss: 0.000100 | grad_norm: 0.000825 | learning_rate: 0.000020 2025-04-12 15:02:32,050 - INFO - Epoch: 0.82 | loss: 0.000100 | grad_norm: 0.000655 | learning_rate: 0.000020 2025-04-12 15:02:38,301 - INFO - Epoch: 0.82 | loss: 0.000100 | grad_norm: 0.000589 | learning_rate: 0.000020 2025-04-12 15:02:43,993 - INFO - Epoch: 0.83 | loss: 0.000100 | grad_norm: 0.000772 | learning_rate: 0.000020 2025-04-12 15:02:50,300 - INFO - Epoch: 0.83 | loss: 0.000100 | grad_norm: 0.000875 | learning_rate: 0.000020 2025-04-12 15:02:56,500 - INFO - Epoch: 0.83 | loss: 0.000100 | grad_norm: 0.000499 | learning_rate: 0.000020 2025-04-12 15:03:02,737 - INFO - Epoch: 0.84 | loss: 0.000100 | grad_norm: 0.000698 | learning_rate: 0.000020 2025-04-12 15:03:08,841 - INFO - Epoch: 0.84 | loss: 0.000100 | grad_norm: 0.000611 | learning_rate: 0.000020 2025-04-12 15:03:15,314 - INFO - Epoch: 0.84 | 
loss: 0.000100 | grad_norm: 0.000728 | learning_rate: 0.000020 2025-04-12 15:03:21,971 - INFO - Epoch: 0.84 | loss: 0.000100 | grad_norm: 0.000602 | learning_rate: 0.000020 2025-04-12 15:03:28,025 - INFO - Epoch: 0.85 | loss: 0.000100 | grad_norm: 0.000695 | learning_rate: 0.000020 2025-04-12 15:03:34,154 - INFO - Epoch: 0.85 | loss: 0.000100 | grad_norm: 0.000581 | learning_rate: 0.000020 2025-04-12 15:03:40,330 - INFO - Epoch: 0.85 | loss: 0.000100 | grad_norm: 0.000532 | learning_rate: 0.000020 2025-04-12 15:03:46,364 - INFO - Epoch: 0.85 | loss: 0.000100 | grad_norm: 0.000501 | learning_rate: 0.000020 2025-04-12 15:03:52,387 - INFO - Epoch: 0.86 | loss: 0.000100 | grad_norm: 0.000449 | learning_rate: 0.000020 2025-04-12 15:03:58,395 - INFO - Epoch: 0.86 | loss: 0.000100 | grad_norm: 0.000384 | learning_rate: 0.000020 2025-04-12 15:04:04,410 - INFO - Epoch: 0.86 | loss: 0.000100 | grad_norm: 0.000564 | learning_rate: 0.000020 2025-04-12 15:04:10,787 - INFO - Epoch: 0.87 | loss: 0.000100 | grad_norm: 0.000857 | learning_rate: 0.000020 2025-04-12 15:04:16,929 - INFO - Epoch: 0.87 | loss: 0.000100 | grad_norm: 0.000481 | learning_rate: 0.000020 2025-04-12 15:04:22,835 - INFO - Epoch: 0.87 | loss: 0.000100 | grad_norm: 0.000700 | learning_rate: 0.000020 2025-04-12 15:04:28,910 - INFO - Epoch: 0.87 | loss: 0.000100 | grad_norm: 0.000499 | learning_rate: 0.000020 2025-04-12 15:04:35,194 - INFO - Epoch: 0.88 | loss: 0.000100 | grad_norm: 0.000731 | learning_rate: 0.000020 2025-04-12 15:04:40,986 - INFO - Epoch: 0.88 | loss: 0.000100 | grad_norm: 0.000532 | learning_rate: 0.000020 2025-04-12 15:04:47,312 - INFO - Epoch: 0.88 | loss: 0.000100 | grad_norm: 0.000953 | learning_rate: 0.000020 2025-04-12 15:04:53,821 - INFO - Epoch: 0.89 | loss: 0.000100 | grad_norm: 0.000720 | learning_rate: 0.000020 2025-04-12 15:04:59,966 - INFO - Epoch: 0.89 | loss: 0.000100 | grad_norm: 0.000522 | learning_rate: 0.000020 2025-04-12 15:05:06,445 - INFO - Epoch: 0.89 | loss: 0.000100 | 
grad_norm: 0.000572 | learning_rate: 0.000020 2025-04-12 15:05:12,691 - INFO - Epoch: 0.89 | loss: 0.000100 | grad_norm: 0.001108 | learning_rate: 0.000020 2025-04-12 15:05:19,068 - INFO - Epoch: 0.90 | loss: 0.000100 | grad_norm: 0.000662 | learning_rate: 0.000020 2025-04-12 15:05:24,904 - INFO - Epoch: 0.90 | loss: 0.000100 | grad_norm: 0.001252 | learning_rate: 0.000020 2025-04-12 15:05:30,846 - INFO - Epoch: 0.90 | loss: 0.000100 | grad_norm: 0.000852 | learning_rate: 0.000020 2025-04-12 15:05:37,120 - INFO - Epoch: 0.90 | loss: 0.000100 | grad_norm: 0.000562 | learning_rate: 0.000020 2025-04-12 15:05:42,581 - INFO - Epoch: 0.91 | loss: 0.000100 | grad_norm: 0.000600 | learning_rate: 0.000020 2025-04-12 15:05:48,561 - INFO - Epoch: 0.91 | loss: 0.000100 | grad_norm: 0.000593 | learning_rate: 0.000020 2025-04-12 15:05:54,540 - INFO - Epoch: 0.91 | loss: 0.000100 | grad_norm: 0.000697 | learning_rate: 0.000020 2025-04-12 15:06:00,741 - INFO - Epoch: 0.92 | loss: 0.000100 | grad_norm: 0.000603 | learning_rate: 0.000020 2025-04-12 15:06:07,224 - INFO - Epoch: 0.92 | loss: 0.000100 | grad_norm: 0.000615 | learning_rate: 0.000020 2025-04-12 15:06:13,441 - INFO - Epoch: 0.92 | loss: 0.000100 | grad_norm: 0.000479 | learning_rate: 0.000020 2025-04-12 15:06:19,457 - INFO - Epoch: 0.92 | loss: 0.000100 | grad_norm: 0.000550 | learning_rate: 0.000020 2025-04-12 15:06:25,345 - INFO - Epoch: 0.93 | loss: 0.000100 | grad_norm: 0.000976 | learning_rate: 0.000020 2025-04-12 15:06:31,463 - INFO - Epoch: 0.93 | loss: 0.000100 | grad_norm: 0.000866 | learning_rate: 0.000020 2025-04-12 15:06:37,405 - INFO - Epoch: 0.93 | loss: 0.000100 | grad_norm: 0.000623 | learning_rate: 0.000020 2025-04-12 15:06:43,123 - INFO - Epoch: 0.93 | loss: 0.000100 | grad_norm: 0.000613 | learning_rate: 0.000020 2025-04-12 15:06:49,200 - INFO - Epoch: 0.94 | loss: 0.000100 | grad_norm: 0.000569 | learning_rate: 0.000020 2025-04-12 15:06:55,615 - INFO - Epoch: 0.94 | loss: 0.000100 | grad_norm: 0.000460 
| learning_rate: 0.000020 2025-04-12 15:07:01,152 - INFO - Epoch: 0.94 | loss: 0.000100 | grad_norm: 0.000997 | learning_rate: 0.000020 2025-04-12 15:07:07,329 - INFO - Epoch: 0.95 | loss: 0.000100 | grad_norm: 0.000803 | learning_rate: 0.000020 2025-04-12 15:07:13,694 - INFO - Epoch: 0.95 | loss: 0.000100 | grad_norm: 0.000515 | learning_rate: 0.000020 2025-04-12 15:07:19,760 - INFO - Epoch: 0.95 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000020 2025-04-12 15:07:25,623 - INFO - Epoch: 0.95 | loss: 0.000100 | grad_norm: 0.000803 | learning_rate: 0.000020 2025-04-12 15:07:31,981 - INFO - Epoch: 0.96 | loss: 0.000100 | grad_norm: 0.000799 | learning_rate: 0.000019 2025-04-12 15:07:38,400 - INFO - Epoch: 0.96 | loss: 0.000100 | grad_norm: 0.000421 | learning_rate: 0.000019 2025-04-12 15:07:44,413 - INFO - Epoch: 0.96 | loss: 0.000100 | grad_norm: 0.000588 | learning_rate: 0.000019 2025-04-12 15:07:50,026 - INFO - Epoch: 0.96 | loss: 0.000100 | grad_norm: 0.000757 | learning_rate: 0.000019 2025-04-12 15:07:56,117 - INFO - Epoch: 0.97 | loss: 0.000100 | grad_norm: 0.000653 | learning_rate: 0.000019 2025-04-12 15:08:02,442 - INFO - Epoch: 0.97 | loss: 0.000100 | grad_norm: 0.000546 | learning_rate: 0.000019 2025-04-12 15:08:08,666 - INFO - Epoch: 0.97 | loss: 0.000100 | grad_norm: 0.000478 | learning_rate: 0.000019 2025-04-12 15:08:14,914 - INFO - Epoch: 0.98 | loss: 0.000100 | grad_norm: 0.000651 | learning_rate: 0.000019 2025-04-12 15:08:21,085 - INFO - Epoch: 0.98 | loss: 0.000100 | grad_norm: 0.000675 | learning_rate: 0.000019 2025-04-12 15:08:27,119 - INFO - Epoch: 0.98 | loss: 0.000100 | grad_norm: 0.000502 | learning_rate: 0.000019 2025-04-12 15:08:33,375 - INFO - Epoch: 0.98 | loss: 0.000100 | grad_norm: 0.000582 | learning_rate: 0.000019 2025-04-12 15:08:39,005 - INFO - Epoch: 0.99 | loss: 0.000100 | grad_norm: 0.000557 | learning_rate: 0.000019 2025-04-12 15:08:44,749 - INFO - Epoch: 0.99 | loss: 0.000100 | grad_norm: 0.001082 | learning_rate: 
0.000019 2025-04-12 15:08:50,891 - INFO - Epoch: 0.99 | loss: 0.000100 | grad_norm: 0.000703 | learning_rate: 0.000019 2025-04-12 15:08:57,117 - INFO - Epoch: 0.99 | loss: 0.000100 | grad_norm: 0.000629 | learning_rate: 0.000019 2025-04-12 15:09:03,373 - INFO - Epoch: 1.00 | loss: 0.000100 | grad_norm: 0.000447 | learning_rate: 0.000019 2025-04-12 15:09:09,328 - INFO - Epoch: 1.00 | loss: 0.000100 | grad_norm: 0.000868 | learning_rate: 0.000019 2025-04-12 15:09:18,753 - INFO - Epoch: 1.00 | loss: 0.000100 | grad_norm: 0.000662 | learning_rate: 0.000019 2025-04-12 15:09:24,670 - INFO - Epoch: 1.01 | loss: 0.000100 | grad_norm: 0.000612 | learning_rate: 0.000019 2025-04-12 15:09:30,692 - INFO - Epoch: 1.01 | loss: 0.000100 | grad_norm: 0.000647 | learning_rate: 0.000019 2025-04-12 15:09:36,896 - INFO - Epoch: 1.01 | loss: 0.000100 | grad_norm: 0.000497 | learning_rate: 0.000019 2025-04-12 15:09:43,242 - INFO - Epoch: 1.01 | loss: 0.000100 | grad_norm: 0.000879 | learning_rate: 0.000019 2025-04-12 15:09:49,333 - INFO - Epoch: 1.02 | loss: 0.000100 | grad_norm: 0.000601 | learning_rate: 0.000019 2025-04-12 15:09:55,404 - INFO - Epoch: 1.02 | loss: 0.000100 | grad_norm: 0.000659 | learning_rate: 0.000019 2025-04-12 15:10:01,436 - INFO - Epoch: 1.02 | loss: 0.000100 | grad_norm: 0.000705 | learning_rate: 0.000019 2025-04-12 15:10:07,321 - INFO - Epoch: 1.02 | loss: 0.000100 | grad_norm: 0.000609 | learning_rate: 0.000019 2025-04-12 15:10:13,475 - INFO - Epoch: 1.03 | loss: 0.000100 | grad_norm: 0.000745 | learning_rate: 0.000019 2025-04-12 15:10:19,500 - INFO - Epoch: 1.03 | loss: 0.000100 | grad_norm: 0.000463 | learning_rate: 0.000019 2025-04-12 15:10:26,060 - INFO - Epoch: 1.03 | loss: 0.000100 | grad_norm: 0.000554 | learning_rate: 0.000019 2025-04-12 15:10:32,488 - INFO - Epoch: 1.04 | loss: 0.000100 | grad_norm: 0.000630 | learning_rate: 0.000019 2025-04-12 15:10:38,505 - INFO - Epoch: 1.04 | loss: 0.000100 | grad_norm: 0.000800 | learning_rate: 0.000019 2025-04-12 
15:10:44,265 - INFO - Epoch: 1.04 | loss: 0.000100 | grad_norm: 0.000545 | learning_rate: 0.000019 2025-04-12 15:10:50,049 - INFO - Epoch: 1.04 | loss: 0.000100 | grad_norm: 0.000816 | learning_rate: 0.000019 2025-04-12 15:10:56,211 - INFO - Epoch: 1.05 | loss: 0.000100 | grad_norm: 0.000583 | learning_rate: 0.000019 2025-04-12 15:11:02,493 - INFO - Epoch: 1.05 | loss: 0.000100 | grad_norm: 0.000855 | learning_rate: 0.000019 2025-04-12 15:11:08,617 - INFO - Epoch: 1.05 | loss: 0.000100 | grad_norm: 0.000589 | learning_rate: 0.000019 2025-04-12 15:11:14,649 - INFO - Epoch: 1.05 | loss: 0.000100 | grad_norm: 0.000679 | learning_rate: 0.000019 2025-04-12 15:11:20,565 - INFO - Epoch: 1.06 | loss: 0.000100 | grad_norm: 0.000651 | learning_rate: 0.000019 2025-04-12 15:11:26,804 - INFO - Epoch: 1.06 | loss: 0.000100 | grad_norm: 0.000550 | learning_rate: 0.000019 2025-04-12 15:11:32,834 - INFO - Epoch: 1.06 | loss: 0.000100 | grad_norm: 0.000539 | learning_rate: 0.000019 2025-04-12 15:11:38,816 - INFO - Epoch: 1.07 | loss: 0.000100 | grad_norm: 0.000607 | learning_rate: 0.000019 2025-04-12 15:11:44,771 - INFO - Epoch: 1.07 | loss: 0.000100 | grad_norm: 0.000546 | learning_rate: 0.000019 2025-04-12 15:11:51,004 - INFO - Epoch: 1.07 | loss: 0.000100 | grad_norm: 0.000541 | learning_rate: 0.000019 2025-04-12 15:11:57,258 - INFO - Epoch: 1.07 | loss: 0.000100 | grad_norm: 0.000485 | learning_rate: 0.000019 2025-04-12 15:12:03,254 - INFO - Epoch: 1.08 | loss: 0.000100 | grad_norm: 0.001182 | learning_rate: 0.000019 2025-04-12 15:12:09,012 - INFO - Epoch: 1.08 | loss: 0.000100 | grad_norm: 0.000636 | learning_rate: 0.000019 2025-04-12 15:12:15,264 - INFO - Epoch: 1.08 | loss: 0.000100 | grad_norm: 0.000512 | learning_rate: 0.000019 2025-04-12 15:12:21,369 - INFO - Epoch: 1.08 | loss: 0.000100 | grad_norm: 0.000631 | learning_rate: 0.000019 2025-04-12 15:12:27,641 - INFO - Epoch: 1.09 | loss: 0.000100 | grad_norm: 0.000475 | learning_rate: 0.000019 2025-04-12 15:12:33,906 - INFO 
- Epoch: 1.09 | loss: 0.000100 | grad_norm: 0.000520 | learning_rate: 0.000019 2025-04-12 15:12:39,864 - INFO - Epoch: 1.09 | loss: 0.000100 | grad_norm: 0.000651 | learning_rate: 0.000019 2025-04-12 15:12:46,272 - INFO - Epoch: 1.10 | loss: 0.000100 | grad_norm: 0.000599 | learning_rate: 0.000019 2025-04-12 15:12:52,367 - INFO - Epoch: 1.10 | loss: 0.000100 | grad_norm: 0.000709 | learning_rate: 0.000019 2025-04-12 15:12:58,883 - INFO - Epoch: 1.10 | loss: 0.000100 | grad_norm: 0.000410 | learning_rate: 0.000019 2025-04-12 15:13:04,635 - INFO - Epoch: 1.10 | loss: 0.000100 | grad_norm: 0.000356 | learning_rate: 0.000019 2025-04-12 15:13:10,613 - INFO - Epoch: 1.11 | loss: 0.000100 | grad_norm: 0.000812 | learning_rate: 0.000019 2025-04-12 15:13:16,879 - INFO - Epoch: 1.11 | loss: 0.000100 | grad_norm: 0.000485 | learning_rate: 0.000019 2025-04-12 15:13:23,111 - INFO - Epoch: 1.11 | loss: 0.000100 | grad_norm: 0.000502 | learning_rate: 0.000019 2025-04-12 15:13:29,157 - INFO - Epoch: 1.12 | loss: 0.000100 | grad_norm: 0.000560 | learning_rate: 0.000019 2025-04-12 15:13:35,665 - INFO - Epoch: 1.12 | loss: 0.000100 | grad_norm: 0.000637 | learning_rate: 0.000019 2025-04-12 15:13:41,986 - INFO - Epoch: 1.12 | loss: 0.000100 | grad_norm: 0.000393 | learning_rate: 0.000019 2025-04-12 15:13:48,079 - INFO - Epoch: 1.12 | loss: 0.000100 | grad_norm: 0.000498 | learning_rate: 0.000019 2025-04-12 15:13:54,321 - INFO - Epoch: 1.13 | loss: 0.000100 | grad_norm: 0.000683 | learning_rate: 0.000019 2025-04-12 15:14:00,570 - INFO - Epoch: 1.13 | loss: 0.000100 | grad_norm: 0.000805 | learning_rate: 0.000019 2025-04-12 15:14:06,572 - INFO - Epoch: 1.13 | loss: 0.000100 | grad_norm: 0.000547 | learning_rate: 0.000019 2025-04-12 15:14:12,372 - INFO - Epoch: 1.13 | loss: 0.000100 | grad_norm: 0.000428 | learning_rate: 0.000019 2025-04-12 15:14:18,652 - INFO - Epoch: 1.14 | loss: 0.000100 | grad_norm: 0.000811 | learning_rate: 0.000019 2025-04-12 15:14:24,745 - INFO - Epoch: 1.14 | 
loss: 0.000100 | grad_norm: 0.000509 | learning_rate: 0.000019 2025-04-12 15:14:30,870 - INFO - Epoch: 1.14 | loss: 0.000100 | grad_norm: 0.000416 | learning_rate: 0.000019 2025-04-12 15:14:36,876 - INFO - Epoch: 1.15 | loss: 0.000100 | grad_norm: 0.000749 | learning_rate: 0.000019 2025-04-12 15:14:42,818 - INFO - Epoch: 1.15 | loss: 0.000100 | grad_norm: 0.000722 | learning_rate: 0.000019 2025-04-12 15:14:49,032 - INFO - Epoch: 1.15 | loss: 0.000100 | grad_norm: 0.000574 | learning_rate: 0.000019 2025-04-12 15:14:55,572 - INFO - Epoch: 1.15 | loss: 0.000100 | grad_norm: 0.000477 | learning_rate: 0.000019 2025-04-12 15:15:01,541 - INFO - Epoch: 1.16 | loss: 0.000100 | grad_norm: 0.000629 | learning_rate: 0.000019 2025-04-12 15:15:08,182 - INFO - Epoch: 1.16 | loss: 0.000100 | grad_norm: 0.000689 | learning_rate: 0.000019 2025-04-12 15:15:14,406 - INFO - Epoch: 1.16 | loss: 0.000100 | grad_norm: 0.000716 | learning_rate: 0.000019 2025-04-12 15:15:20,351 - INFO - Epoch: 1.16 | loss: 0.000100 | grad_norm: 0.000536 | learning_rate: 0.000019 2025-04-12 15:15:26,209 - INFO - Epoch: 1.17 | loss: 0.000100 | grad_norm: 0.000473 | learning_rate: 0.000019 2025-04-12 15:15:31,877 - INFO - Epoch: 1.17 | loss: 0.000100 | grad_norm: 0.000567 | learning_rate: 0.000019 2025-04-12 15:15:38,096 - INFO - Epoch: 1.17 | loss: 0.000100 | grad_norm: 0.001044 | learning_rate: 0.000019 2025-04-12 15:15:43,983 - INFO - Epoch: 1.18 | loss: 0.000100 | grad_norm: 0.000394 | learning_rate: 0.000019 2025-04-12 15:15:49,516 - INFO - Epoch: 1.18 | loss: 0.000100 | grad_norm: 0.000736 | learning_rate: 0.000019 2025-04-12 15:15:55,570 - INFO - Epoch: 1.18 | loss: 0.000100 | grad_norm: 0.000993 | learning_rate: 0.000019 2025-04-12 15:16:01,827 - INFO - Epoch: 1.18 | loss: 0.000100 | grad_norm: 0.000430 | learning_rate: 0.000019 2025-04-12 15:16:08,181 - INFO - Epoch: 1.19 | loss: 0.000100 | grad_norm: 0.000553 | learning_rate: 0.000019 2025-04-12 15:16:14,173 - INFO - Epoch: 1.19 | loss: 0.000100 | 
grad_norm: 0.000452 | learning_rate: 0.000019 2025-04-12 15:16:20,642 - INFO - Epoch: 1.19 | loss: 0.000100 | grad_norm: 0.000505 | learning_rate: 0.000019 2025-04-12 15:16:26,702 - INFO - Epoch: 1.19 | loss: 0.000100 | grad_norm: 0.000670 | learning_rate: 0.000019 2025-04-12 15:16:32,825 - INFO - Epoch: 1.20 | loss: 0.000100 | grad_norm: 0.001287 | learning_rate: 0.000019 2025-04-12 15:16:39,014 - INFO - Epoch: 1.20 | loss: 0.000100 | grad_norm: 0.001027 | learning_rate: 0.000019 2025-04-12 15:16:44,972 - INFO - Epoch: 1.20 | loss: 0.000100 | grad_norm: 0.000821 | learning_rate: 0.000019 2025-04-12 15:16:51,234 - INFO - Epoch: 1.21 | loss: 0.000100 | grad_norm: 0.000546 | learning_rate: 0.000019 2025-04-12 15:16:56,983 - INFO - Epoch: 1.21 | loss: 0.000100 | grad_norm: 0.000368 | learning_rate: 0.000019 2025-04-12 15:17:03,308 - INFO - Epoch: 1.21 | loss: 0.000100 | grad_norm: 0.000671 | learning_rate: 0.000019 2025-04-12 15:17:09,838 - INFO - Epoch: 1.21 | loss: 0.000100 | grad_norm: 0.000821 | learning_rate: 0.000019 2025-04-12 15:17:16,202 - INFO - Epoch: 1.22 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000019 2025-04-12 15:17:22,274 - INFO - Epoch: 1.22 | loss: 0.000100 | grad_norm: 0.000764 | learning_rate: 0.000019 2025-04-12 15:17:28,368 - INFO - Epoch: 1.22 | loss: 0.000100 | grad_norm: 0.000423 | learning_rate: 0.000019 2025-04-12 15:17:34,379 - INFO - Epoch: 1.22 | loss: 0.000100 | grad_norm: 0.000477 | learning_rate: 0.000019 2025-04-12 15:17:40,565 - INFO - Epoch: 1.23 | loss: 0.000100 | grad_norm: 0.000523 | learning_rate: 0.000019 2025-04-12 15:17:46,608 - INFO - Epoch: 1.23 | loss: 0.000100 | grad_norm: 0.000490 | learning_rate: 0.000019 2025-04-12 15:17:52,394 - INFO - Epoch: 1.23 | loss: 0.000100 | grad_norm: 0.000549 | learning_rate: 0.000019 2025-04-12 15:17:58,185 - INFO - Epoch: 1.24 | loss: 0.000100 | grad_norm: 0.000596 | learning_rate: 0.000019 2025-04-12 15:18:04,325 - INFO - Epoch: 1.24 | loss: 0.000100 | grad_norm: 0.000596 
| learning_rate: 0.000019 2025-04-12 15:18:10,770 - INFO - Epoch: 1.24 | loss: 0.000100 | grad_norm: 0.000822 | learning_rate: 0.000019 2025-04-12 15:18:17,062 - INFO - Epoch: 1.24 | loss: 0.000100 | grad_norm: 0.000679 | learning_rate: 0.000019 2025-04-12 15:18:23,283 - INFO - Epoch: 1.25 | loss: 0.000100 | grad_norm: 0.000547 | learning_rate: 0.000019 2025-04-12 15:18:29,307 - INFO - Epoch: 1.25 | loss: 0.000100 | grad_norm: 0.000670 | learning_rate: 0.000019 2025-04-12 15:18:35,392 - INFO - Epoch: 1.25 | loss: 0.000100 | grad_norm: 0.000637 | learning_rate: 0.000019 2025-04-12 15:18:41,666 - INFO - Epoch: 1.25 | loss: 0.000100 | grad_norm: 0.000395 | learning_rate: 0.000019 2025-04-12 15:18:48,035 - INFO - Epoch: 1.26 | loss: 0.000100 | grad_norm: 0.000531 | learning_rate: 0.000019 2025-04-12 15:18:54,353 - INFO - Epoch: 1.26 | loss: 0.000100 | grad_norm: 0.000497 | learning_rate: 0.000019 2025-04-12 15:19:00,142 - INFO - Epoch: 1.26 | loss: 0.000100 | grad_norm: 0.000493 | learning_rate: 0.000019 2025-04-12 15:19:06,646 - INFO - Epoch: 1.27 | loss: 0.000100 | grad_norm: 0.000338 | learning_rate: 0.000019 2025-04-12 15:19:12,325 - INFO - Epoch: 1.27 | loss: 0.000100 | grad_norm: 0.000495 | learning_rate: 0.000019 2025-04-12 15:19:18,412 - INFO - Epoch: 1.27 | loss: 0.000100 | grad_norm: 0.000559 | learning_rate: 0.000019 2025-04-12 15:19:24,321 - INFO - Epoch: 1.27 | loss: 0.000100 | grad_norm: 0.000642 | learning_rate: 0.000019 2025-04-12 15:19:30,418 - INFO - Epoch: 1.28 | loss: 0.000100 | grad_norm: 0.000644 | learning_rate: 0.000019 2025-04-12 15:19:36,066 - INFO - Epoch: 1.28 | loss: 0.000100 | grad_norm: 0.000696 | learning_rate: 0.000019 2025-04-12 15:19:41,868 - INFO - Epoch: 1.28 | loss: 0.000100 | grad_norm: 0.000515 | learning_rate: 0.000019 2025-04-12 15:19:47,935 - INFO - Epoch: 1.28 | loss: 0.000100 | grad_norm: 0.000652 | learning_rate: 0.000019 2025-04-12 15:19:54,047 - INFO - Epoch: 1.29 | loss: 0.000100 | grad_norm: 0.000469 | learning_rate: 
0.000019 2025-04-12 15:20:00,107 - INFO - Epoch: 1.29 | loss: 0.000100 | grad_norm: 0.000468 | learning_rate: 0.000019 2025-04-12 15:20:05,936 - INFO - Epoch: 1.29 | loss: 0.000100 | grad_norm: 0.000322 | learning_rate: 0.000019 2025-04-12 15:20:12,025 - INFO - Epoch: 1.30 | loss: 0.000100 | grad_norm: 0.000546 | learning_rate: 0.000018 2025-04-12 15:20:17,999 - INFO - Epoch: 1.30 | loss: 0.000100 | grad_norm: 0.000473 | learning_rate: 0.000018 2025-04-12 15:20:24,093 - INFO - Epoch: 1.30 | loss: 0.000100 | grad_norm: 0.000734 | learning_rate: 0.000018 2025-04-12 15:20:30,426 - INFO - Epoch: 1.30 | loss: 0.000100 | grad_norm: 0.000804 | learning_rate: 0.000018 2025-04-12 15:20:36,650 - INFO - Epoch: 1.31 | loss: 0.000100 | grad_norm: 0.000545 | learning_rate: 0.000018 2025-04-12 15:20:42,643 - INFO - Epoch: 1.31 | loss: 0.000100 | grad_norm: 0.000670 | learning_rate: 0.000018 2025-04-12 15:20:48,879 - INFO - Epoch: 1.31 | loss: 0.000100 | grad_norm: 0.000568 | learning_rate: 0.000018 2025-04-12 15:20:55,235 - INFO - Epoch: 1.32 | loss: 0.000100 | grad_norm: 0.000658 | learning_rate: 0.000018 2025-04-12 15:21:01,362 - INFO - Epoch: 1.32 | loss: 0.000100 | grad_norm: 0.000584 | learning_rate: 0.000018 2025-04-12 15:21:07,353 - INFO - Epoch: 1.32 | loss: 0.000100 | grad_norm: 0.000502 | learning_rate: 0.000018 2025-04-12 15:21:13,475 - INFO - Epoch: 1.32 | loss: 0.000100 | grad_norm: 0.000680 | learning_rate: 0.000018 2025-04-12 15:21:19,486 - INFO - Epoch: 1.33 | loss: 0.000100 | grad_norm: 0.000507 | learning_rate: 0.000018 2025-04-12 15:21:25,765 - INFO - Epoch: 1.33 | loss: 0.000100 | grad_norm: 0.000604 | learning_rate: 0.000018 2025-04-12 15:21:31,130 - INFO - Epoch: 1.33 | loss: 0.000100 | grad_norm: 0.000512 | learning_rate: 0.000018 2025-04-12 15:21:37,024 - INFO - Epoch: 1.33 | loss: 0.000100 | grad_norm: 0.000728 | learning_rate: 0.000018 2025-04-12 15:21:43,046 - INFO - Epoch: 1.34 | loss: 0.000100 | grad_norm: 0.000670 | learning_rate: 0.000018 2025-04-12 
15:21:49,102 - INFO - Epoch: 1.34 | loss: 0.000100 | grad_norm: 0.000752 | learning_rate: 0.000018 2025-04-12 15:21:55,228 - INFO - Epoch: 1.34 | loss: 0.000100 | grad_norm: 0.000742 | learning_rate: 0.000018 2025-04-12 15:22:01,637 - INFO - Epoch: 1.35 | loss: 0.000100 | grad_norm: 0.000588 | learning_rate: 0.000018 2025-04-12 15:22:07,907 - INFO - Epoch: 1.35 | loss: 0.000100 | grad_norm: 0.000463 | learning_rate: 0.000018 2025-04-12 15:22:14,242 - INFO - Epoch: 1.35 | loss: 0.000100 | grad_norm: 0.000446 | learning_rate: 0.000018 2025-04-12 15:22:20,361 - INFO - Epoch: 1.35 | loss: 0.000100 | grad_norm: 0.000395 | learning_rate: 0.000018 2025-04-12 15:22:26,339 - INFO - Epoch: 1.36 | loss: 0.000100 | grad_norm: 0.000543 | learning_rate: 0.000018 2025-04-12 15:22:32,593 - INFO - Epoch: 1.36 | loss: 0.000100 | grad_norm: 0.001004 | learning_rate: 0.000018 2025-04-12 15:22:38,685 - INFO - Epoch: 1.36 | loss: 0.000100 | grad_norm: 0.000654 | learning_rate: 0.000018 2025-04-12 15:22:44,949 - INFO - Epoch: 1.36 | loss: 0.000100 | grad_norm: 0.000635 | learning_rate: 0.000018 2025-04-12 15:22:51,142 - INFO - Epoch: 1.37 | loss: 0.000100 | grad_norm: 0.000762 | learning_rate: 0.000018 2025-04-12 15:22:57,505 - INFO - Epoch: 1.37 | loss: 0.000100 | grad_norm: 0.000492 | learning_rate: 0.000018 2025-04-12 15:23:03,757 - INFO - Epoch: 1.37 | loss: 0.000100 | grad_norm: 0.000603 | learning_rate: 0.000018 2025-04-12 15:23:09,729 - INFO - Epoch: 1.38 | loss: 0.000100 | grad_norm: 0.000683 | learning_rate: 0.000018 2025-04-12 15:23:16,065 - INFO - Epoch: 1.38 | loss: 0.000100 | grad_norm: 0.000537 | learning_rate: 0.000018 2025-04-12 15:23:21,886 - INFO - Epoch: 1.38 | loss: 0.000100 | grad_norm: 0.000442 | learning_rate: 0.000018 2025-04-12 15:23:27,822 - INFO - Epoch: 1.38 | loss: 0.000100 | grad_norm: 0.000462 | learning_rate: 0.000018 2025-04-12 15:23:33,501 - INFO - Epoch: 1.39 | loss: 0.000100 | grad_norm: 0.000630 | learning_rate: 0.000018 2025-04-12 15:23:39,516 - INFO 
- Epoch: 1.39 | loss: 0.000100 | grad_norm: 0.000420 | learning_rate: 0.000018 2025-04-12 15:23:45,452 - INFO - Epoch: 1.39 | loss: 0.000100 | grad_norm: 0.000435 | learning_rate: 0.000018 2025-04-12 15:23:51,477 - INFO - Epoch: 1.39 | loss: 0.000100 | grad_norm: 0.000769 | learning_rate: 0.000018 2025-04-12 15:23:57,632 - INFO - Epoch: 1.40 | loss: 0.000100 | grad_norm: 0.000567 | learning_rate: 0.000018 2025-04-12 15:24:03,450 - INFO - Epoch: 1.40 | loss: 0.000100 | grad_norm: 0.000569 | learning_rate: 0.000018 2025-04-12 15:24:09,471 - INFO - Epoch: 1.40 | loss: 0.000100 | grad_norm: 0.000523 | learning_rate: 0.000018 2025-04-12 15:24:15,823 - INFO - Epoch: 1.41 | loss: 0.000100 | grad_norm: 0.000437 | learning_rate: 0.000018 2025-04-12 15:24:21,888 - INFO - Epoch: 1.41 | loss: 0.000100 | grad_norm: 0.000483 | learning_rate: 0.000018 2025-04-12 15:24:27,950 - INFO - Epoch: 1.41 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000018 2025-04-12 15:24:34,184 - INFO - Epoch: 1.41 | loss: 0.000100 | grad_norm: 0.000497 | learning_rate: 0.000018 2025-04-12 15:24:39,912 - INFO - Epoch: 1.42 | loss: 0.000100 | grad_norm: 0.000848 | learning_rate: 0.000018 2025-04-12 15:24:46,223 - INFO - Epoch: 1.42 | loss: 0.000100 | grad_norm: 0.000539 | learning_rate: 0.000018 2025-04-12 15:24:52,536 - INFO - Epoch: 1.42 | loss: 0.000100 | grad_norm: 0.000450 | learning_rate: 0.000018 2025-04-12 15:24:58,434 - INFO - Epoch: 1.42 | loss: 0.000100 | grad_norm: 0.000473 | learning_rate: 0.000018 2025-04-12 15:25:04,548 - INFO - Epoch: 1.43 | loss: 0.000100 | grad_norm: 0.000363 | learning_rate: 0.000018 2025-04-12 15:25:11,064 - INFO - Epoch: 1.43 | loss: 0.000100 | grad_norm: 0.000657 | learning_rate: 0.000018 2025-04-12 15:25:17,172 - INFO - Epoch: 1.43 | loss: 0.000100 | grad_norm: 0.000504 | learning_rate: 0.000018 2025-04-12 15:25:23,410 - INFO - Epoch: 1.44 | loss: 0.000100 | grad_norm: 0.000681 | learning_rate: 0.000018 2025-04-12 15:25:29,689 - INFO - Epoch: 1.44 | 
loss: 0.000100 | grad_norm: 0.000520 | learning_rate: 0.000018 2025-04-12 15:25:35,562 - INFO - Epoch: 1.44 | loss: 0.000100 | grad_norm: 0.000434 | learning_rate: 0.000018 2025-04-12 15:25:41,759 - INFO - Epoch: 1.44 | loss: 0.000100 | grad_norm: 0.000478 | learning_rate: 0.000018 2025-04-12 15:25:47,818 - INFO - Epoch: 1.45 | loss: 0.000100 | grad_norm: 0.000438 | learning_rate: 0.000018 2025-04-12 15:25:54,197 - INFO - Epoch: 1.45 | loss: 0.000100 | grad_norm: 0.000598 | learning_rate: 0.000018 2025-04-12 15:26:00,164 - INFO - Epoch: 1.45 | loss: 0.000100 | grad_norm: 0.001415 | learning_rate: 0.000018 2025-04-12 15:26:06,006 - INFO - Epoch: 1.45 | loss: 0.000100 | grad_norm: 0.000529 | learning_rate: 0.000018 2025-04-12 15:26:11,838 - INFO - Epoch: 1.46 | loss: 0.000100 | grad_norm: 0.000534 | learning_rate: 0.000018 2025-04-12 15:26:18,066 - INFO - Epoch: 1.46 | loss: 0.000100 | grad_norm: 0.000428 | learning_rate: 0.000018 2025-04-12 15:26:23,943 - INFO - Epoch: 1.46 | loss: 0.000100 | grad_norm: 0.000580 | learning_rate: 0.000018 2025-04-12 15:26:30,215 - INFO - Epoch: 1.47 | loss: 0.000100 | grad_norm: 0.000406 | learning_rate: 0.000018 2025-04-12 15:26:36,520 - INFO - Epoch: 1.47 | loss: 0.000100 | grad_norm: 0.000444 | learning_rate: 0.000018 2025-04-12 15:26:42,885 - INFO - Epoch: 1.47 | loss: 0.000100 | grad_norm: 0.000683 | learning_rate: 0.000018 2025-04-12 15:26:48,892 - INFO - Epoch: 1.47 | loss: 0.000100 | grad_norm: 0.000689 | learning_rate: 0.000018 2025-04-12 15:26:55,120 - INFO - Epoch: 1.48 | loss: 0.000100 | grad_norm: 0.000424 | learning_rate: 0.000018 2025-04-12 15:27:01,477 - INFO - Epoch: 1.48 | loss: 0.000100 | grad_norm: 0.000613 | learning_rate: 0.000018 2025-04-12 15:27:07,362 - INFO - Epoch: 1.48 | loss: 0.000100 | grad_norm: 0.000526 | learning_rate: 0.000018 2025-04-12 15:27:13,368 - INFO - Epoch: 1.48 | loss: 0.000100 | grad_norm: 0.000474 | learning_rate: 0.000018 2025-04-12 15:27:19,393 - INFO - Epoch: 1.49 | loss: 0.000100 | 
grad_norm: 0.000442 | learning_rate: 0.000018 2025-04-12 15:27:25,587 - INFO - Epoch: 1.49 | loss: 0.000100 | grad_norm: 0.000546 | learning_rate: 0.000018 2025-04-12 15:27:31,795 - INFO - Epoch: 1.49 | loss: 0.000100 | grad_norm: 0.000453 | learning_rate: 0.000018 2025-04-12 15:27:38,174 - INFO - Epoch: 1.50 | loss: 0.000100 | grad_norm: 0.000445 | learning_rate: 0.000018 2025-04-12 15:27:44,494 - INFO - Epoch: 1.50 | loss: 0.000100 | grad_norm: 0.000377 | learning_rate: 0.000018 2025-04-12 15:27:50,532 - INFO - Epoch: 1.50 | loss: 0.000100 | grad_norm: 0.000478 | learning_rate: 0.000018 2025-04-12 15:27:56,616 - INFO - Epoch: 1.50 | loss: 0.000100 | grad_norm: 0.000436 | learning_rate: 0.000018 2025-04-12 15:28:02,649 - INFO - Epoch: 1.51 | loss: 0.000100 | grad_norm: 0.000443 | learning_rate: 0.000018 2025-04-12 15:28:09,008 - INFO - Epoch: 1.51 | loss: 0.000100 | grad_norm: 0.000565 | learning_rate: 0.000018 2025-04-12 15:28:15,053 - INFO - Epoch: 1.51 | loss: 0.000100 | grad_norm: 0.000608 | learning_rate: 0.000018 2025-04-12 15:28:20,929 - INFO - Epoch: 1.52 | loss: 0.000100 | grad_norm: 0.000756 | learning_rate: 0.000018 2025-04-12 15:28:27,122 - INFO - Epoch: 1.52 | loss: 0.000100 | grad_norm: 0.000494 | learning_rate: 0.000018 2025-04-12 15:28:33,086 - INFO - Epoch: 1.52 | loss: 0.000100 | grad_norm: 0.000592 | learning_rate: 0.000018 2025-04-12 15:28:39,006 - INFO - Epoch: 1.52 | loss: 0.000100 | grad_norm: 0.000487 | learning_rate: 0.000018 2025-04-12 15:28:45,456 - INFO - Epoch: 1.53 | loss: 0.000100 | grad_norm: 0.000603 | learning_rate: 0.000018 2025-04-12 15:28:51,540 - INFO - Epoch: 1.53 | loss: 0.000100 | grad_norm: 0.000669 | learning_rate: 0.000018 2025-04-12 15:28:58,116 - INFO - Epoch: 1.53 | loss: 0.000100 | grad_norm: 0.000495 | learning_rate: 0.000018 2025-04-12 15:29:04,317 - INFO - Epoch: 1.53 | loss: 0.000100 | grad_norm: 0.000626 | learning_rate: 0.000018 2025-04-12 15:29:10,546 - INFO - Epoch: 1.54 | loss: 0.000100 | grad_norm: 0.000376 
| learning_rate: 0.000017 2025-04-12 15:29:17,076 - INFO - Epoch: 1.54 | loss: 0.000100 | grad_norm: 0.000861 | learning_rate: 0.000017 2025-04-12 15:29:23,178 - INFO - Epoch: 1.54 | loss: 0.000100 | grad_norm: 0.000716 | learning_rate: 0.000017 2025-04-12 15:29:29,388 - INFO - Epoch: 1.55 | loss: 0.000100 | grad_norm: 0.000412 | learning_rate: 0.000017 2025-04-12 15:29:35,035 - INFO - Epoch: 1.55 | loss: 0.000100 | grad_norm: 0.000567 | learning_rate: 0.000017 2025-04-12 15:29:41,573 - INFO - Epoch: 1.55 | loss: 0.000100 | grad_norm: 0.000550 | learning_rate: 0.000017 2025-04-12 15:29:47,344 - INFO - Epoch: 1.55 | loss: 0.000100 | grad_norm: 0.000340 | learning_rate: 0.000017 2025-04-12 15:29:53,665 - INFO - Epoch: 1.56 | loss: 0.000100 | grad_norm: 0.000789 | learning_rate: 0.000017 2025-04-12 15:29:59,513 - INFO - Epoch: 1.56 | loss: 0.000100 | grad_norm: 0.000594 | learning_rate: 0.000017 2025-04-12 15:30:05,572 - INFO - Epoch: 1.56 | loss: 0.000100 | grad_norm: 0.000892 | learning_rate: 0.000017 2025-04-12 15:30:11,789 - INFO - Epoch: 1.56 | loss: 0.000100 | grad_norm: 0.000456 | learning_rate: 0.000017 2025-04-12 15:30:17,918 - INFO - Epoch: 1.57 | loss: 0.000100 | grad_norm: 0.000557 | learning_rate: 0.000017 2025-04-12 15:30:23,955 - INFO - Epoch: 1.57 | loss: 0.000100 | grad_norm: 0.000429 | learning_rate: 0.000017 2025-04-12 15:30:30,043 - INFO - Epoch: 1.57 | loss: 0.000100 | grad_norm: 0.000475 | learning_rate: 0.000017 2025-04-12 15:30:36,480 - INFO - Epoch: 1.58 | loss: 0.000100 | grad_norm: 0.000796 | learning_rate: 0.000017 2025-04-12 15:30:42,599 - INFO - Epoch: 1.58 | loss: 0.000100 | grad_norm: 0.000607 | learning_rate: 0.000017 2025-04-12 15:30:48,390 - INFO - Epoch: 1.58 | loss: 0.000100 | grad_norm: 0.000275 | learning_rate: 0.000017 2025-04-12 15:30:54,067 - INFO - Epoch: 1.58 | loss: 0.000100 | grad_norm: 0.000512 | learning_rate: 0.000017 2025-04-12 15:31:00,023 - INFO - Epoch: 1.59 | loss: 0.000100 | grad_norm: 0.000479 | learning_rate: 
0.000017 2025-04-12 15:31:06,193 - INFO - Epoch: 1.59 | loss: 0.000100 | grad_norm: 0.000573 | learning_rate: 0.000017 2025-04-12 15:31:11,949 - INFO - Epoch: 1.59 | loss: 0.000100 | grad_norm: 0.000550 | learning_rate: 0.000017 2025-04-12 15:31:17,683 - INFO - Epoch: 1.59 | loss: 0.000100 | grad_norm: 0.000508 | learning_rate: 0.000017 2025-04-12 15:31:23,693 - INFO - Epoch: 1.60 | loss: 0.000100 | grad_norm: 0.000744 | learning_rate: 0.000017 2025-04-12 15:31:29,915 - INFO - Epoch: 1.60 | loss: 0.000100 | grad_norm: 0.000505 | learning_rate: 0.000017 2025-04-12 15:31:36,080 - INFO - Epoch: 1.60 | loss: 0.000100 | grad_norm: 0.000448 | learning_rate: 0.000017 2025-04-12 15:31:41,884 - INFO - Epoch: 1.61 | loss: 0.000100 | grad_norm: 0.000840 | learning_rate: 0.000017 2025-04-12 15:31:48,073 - INFO - Epoch: 1.61 | loss: 0.000100 | grad_norm: 0.000631 | learning_rate: 0.000017 2025-04-12 15:31:53,959 - INFO - Epoch: 1.61 | loss: 0.000100 | grad_norm: 0.000738 | learning_rate: 0.000017 2025-04-12 15:32:00,205 - INFO - Epoch: 1.61 | loss: 0.000100 | grad_norm: 0.000353 | learning_rate: 0.000017 2025-04-12 15:32:05,937 - INFO - Epoch: 1.62 | loss: 0.000100 | grad_norm: 0.000458 | learning_rate: 0.000017 2025-04-12 15:32:11,956 - INFO - Epoch: 1.62 | loss: 0.000100 | grad_norm: 0.000441 | learning_rate: 0.000017 2025-04-12 15:32:17,968 - INFO - Epoch: 1.62 | loss: 0.000100 | grad_norm: 0.000421 | learning_rate: 0.000017 2025-04-12 15:32:24,088 - INFO - Epoch: 1.62 | loss: 0.000100 | grad_norm: 0.000527 | learning_rate: 0.000017 2025-04-12 15:32:30,462 - INFO - Epoch: 1.63 | loss: 0.000100 | grad_norm: 0.000618 | learning_rate: 0.000017 2025-04-12 15:32:36,509 - INFO - Epoch: 1.63 | loss: 0.000100 | grad_norm: 0.000389 | learning_rate: 0.000017 2025-04-12 15:32:42,804 - INFO - Epoch: 1.63 | loss: 0.000100 | grad_norm: 0.000785 | learning_rate: 0.000017 2025-04-12 15:32:49,139 - INFO - Epoch: 1.64 | loss: 0.000100 | grad_norm: 0.000409 | learning_rate: 0.000017 2025-04-12 
15:32:55,281 - INFO - Epoch: 1.64 | loss: 0.000100 | grad_norm: 0.000604 | learning_rate: 0.000017 2025-04-12 15:33:01,709 - INFO - Epoch: 1.64 | loss: 0.000100 | grad_norm: 0.000599 | learning_rate: 0.000017 2025-04-12 15:33:08,005 - INFO - Epoch: 1.64 | loss: 0.000100 | grad_norm: 0.000517 | learning_rate: 0.000017 2025-04-12 15:33:14,193 - INFO - Epoch: 1.65 | loss: 0.000100 | grad_norm: 0.000457 | learning_rate: 0.000017 2025-04-12 15:33:20,523 - INFO - Epoch: 1.65 | loss: 0.000100 | grad_norm: 0.000536 | learning_rate: 0.000017 2025-04-12 15:33:26,713 - INFO - Epoch: 1.65 | loss: 0.000100 | grad_norm: 0.000517 | learning_rate: 0.000017 2025-04-12 15:33:32,497 - INFO - Epoch: 1.65 | loss: 0.000100 | grad_norm: 0.000527 | learning_rate: 0.000017 2025-04-12 15:33:38,645 - INFO - Epoch: 1.66 | loss: 0.000100 | grad_norm: 0.000408 | learning_rate: 0.000017 2025-04-12 15:33:44,554 - INFO - Epoch: 1.66 | loss: 0.000100 | grad_norm: 0.000519 | learning_rate: 0.000017 2025-04-12 15:33:50,954 - INFO - Epoch: 1.66 | loss: 0.000100 | grad_norm: 0.000523 | learning_rate: 0.000017 2025-04-12 15:33:57,089 - INFO - Epoch: 1.67 | loss: 0.000100 | grad_norm: 0.000475 | learning_rate: 0.000017 2025-04-12 15:34:03,177 - INFO - Epoch: 1.67 | loss: 0.000100 | grad_norm: 0.000740 | learning_rate: 0.000017 2025-04-12 15:34:09,129 - INFO - Epoch: 1.67 | loss: 0.000100 | grad_norm: 0.000364 | learning_rate: 0.000017 2025-04-12 15:34:14,951 - INFO - Epoch: 1.67 | loss: 0.000100 | grad_norm: 0.000433 | learning_rate: 0.000017 2025-04-12 15:34:20,796 - INFO - Epoch: 1.68 | loss: 0.000100 | grad_norm: 0.000433 | learning_rate: 0.000017 2025-04-12 15:34:26,969 - INFO - Epoch: 1.68 | loss: 0.000100 | grad_norm: 0.000371 | learning_rate: 0.000017 2025-04-12 15:34:33,019 - INFO - Epoch: 1.68 | loss: 0.000100 | grad_norm: 0.000394 | learning_rate: 0.000017 2025-04-12 15:34:39,131 - INFO - Epoch: 1.69 | loss: 0.000100 | grad_norm: 0.000568 | learning_rate: 0.000017 2025-04-12 15:34:45,292 - INFO 
- Epoch: 1.69 | loss: 0.000100 | grad_norm: 0.000718 | learning_rate: 0.000017 2025-04-12 15:34:51,828 - INFO - Epoch: 1.69 | loss: 0.000100 | grad_norm: 0.000491 | learning_rate: 0.000017 2025-04-12 15:34:58,024 - INFO - Epoch: 1.69 | loss: 0.000100 | grad_norm: 0.000419 | learning_rate: 0.000017 2025-04-12 15:35:04,286 - INFO - Epoch: 1.70 | loss: 0.000100 | grad_norm: 0.000502 | learning_rate: 0.000017 2025-04-12 15:35:10,257 - INFO - Epoch: 1.70 | loss: 0.000100 | grad_norm: 0.000301 | learning_rate: 0.000017 2025-04-12 15:35:16,518 - INFO - Epoch: 1.70 | loss: 0.000100 | grad_norm: 0.000311 | learning_rate: 0.000017 2025-04-12 15:35:22,767 - INFO - Epoch: 1.70 | loss: 0.000100 | grad_norm: 0.000464 | learning_rate: 0.000017 2025-04-12 15:35:29,019 - INFO - Epoch: 1.71 | loss: 0.000100 | grad_norm: 0.000652 | learning_rate: 0.000017 2025-04-12 15:35:35,198 - INFO - Epoch: 1.71 | loss: 0.000100 | grad_norm: 0.000426 | learning_rate: 0.000017 2025-04-12 15:35:41,156 - INFO - Epoch: 1.71 | loss: 0.000100 | grad_norm: 0.000445 | learning_rate: 0.000017 2025-04-12 15:35:47,292 - INFO - Epoch: 1.72 | loss: 0.000100 | grad_norm: 0.000492 | learning_rate: 0.000017 2025-04-12 15:35:53,516 - INFO - Epoch: 1.72 | loss: 0.000100 | grad_norm: 0.000365 | learning_rate: 0.000017 2025-04-12 15:35:59,892 - INFO - Epoch: 1.72 | loss: 0.000100 | grad_norm: 0.000625 | learning_rate: 0.000017 2025-04-12 15:36:05,960 - INFO - Epoch: 1.72 | loss: 0.000100 | grad_norm: 0.000417 | learning_rate: 0.000017 2025-04-12 15:36:11,699 - INFO - Epoch: 1.73 | loss: 0.000100 | grad_norm: 0.000495 | learning_rate: 0.000017 2025-04-12 15:36:17,927 - INFO - Epoch: 1.73 | loss: 0.000100 | grad_norm: 0.000288 | learning_rate: 0.000017 2025-04-12 15:36:23,888 - INFO - Epoch: 1.73 | loss: 0.000100 | grad_norm: 0.000581 | learning_rate: 0.000017 2025-04-12 15:36:30,093 - INFO - Epoch: 1.73 | loss: 0.000100 | grad_norm: 0.000329 | learning_rate: 0.000017 2025-04-12 15:36:35,898 - INFO - Epoch: 1.74 | 
loss: 0.000100 | grad_norm: 0.000449 | learning_rate: 0.000016 2025-04-12 15:36:41,961 - INFO - Epoch: 1.74 | loss: 0.000100 | grad_norm: 0.000306 | learning_rate: 0.000016 2025-04-12 15:36:47,782 - INFO - Epoch: 1.74 | loss: 0.000100 | grad_norm: 0.000421 | learning_rate: 0.000016 2025-04-12 15:36:53,868 - INFO - Epoch: 1.75 | loss: 0.000100 | grad_norm: 0.000760 | learning_rate: 0.000016 2025-04-12 15:36:59,820 - INFO - Epoch: 1.75 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000016 2025-04-12 15:37:05,899 - INFO - Epoch: 1.75 | loss: 0.000100 | grad_norm: 0.000557 | learning_rate: 0.000016 2025-04-12 15:37:12,407 - INFO - Epoch: 1.75 | loss: 0.000100 | grad_norm: 0.000454 | learning_rate: 0.000016 2025-04-12 15:37:18,730 - INFO - Epoch: 1.76 | loss: 0.000100 | grad_norm: 0.000383 | learning_rate: 0.000016 2025-04-12 15:37:24,822 - INFO - Epoch: 1.76 | loss: 0.000100 | grad_norm: 0.000418 | learning_rate: 0.000016 2025-04-12 15:37:31,133 - INFO - Epoch: 1.76 | loss: 0.000100 | grad_norm: 0.000445 | learning_rate: 0.000016 2025-04-12 15:37:37,471 - INFO - Epoch: 1.76 | loss: 0.000100 | grad_norm: 0.000566 | learning_rate: 0.000016 2025-04-12 15:37:43,940 - INFO - Epoch: 1.77 | loss: 0.000100 | grad_norm: 0.000363 | learning_rate: 0.000016 2025-04-12 15:37:50,447 - INFO - Epoch: 1.77 | loss: 0.000100 | grad_norm: 0.000411 | learning_rate: 0.000016 2025-04-12 15:37:56,279 - INFO - Epoch: 1.77 | loss: 0.000100 | grad_norm: 0.000344 | learning_rate: 0.000016 2025-04-12 15:38:02,587 - INFO - Epoch: 1.78 | loss: 0.000100 | grad_norm: 0.000305 | learning_rate: 0.000016 2025-04-12 15:38:09,160 - INFO - Epoch: 1.78 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000016 2025-04-12 15:38:15,001 - INFO - Epoch: 1.78 | loss: 0.000100 | grad_norm: 0.000412 | learning_rate: 0.000016 2025-04-12 15:38:21,005 - INFO - Epoch: 1.78 | loss: 0.000100 | grad_norm: 0.000604 | learning_rate: 0.000016 2025-04-12 15:38:26,740 - INFO - Epoch: 1.79 | loss: 0.000100 | 
grad_norm: 0.000788 | learning_rate: 0.000016 2025-04-12 15:38:32,686 - INFO - Epoch: 1.79 | loss: 0.000100 | grad_norm: 0.000498 | learning_rate: 0.000016 2025-04-12 15:38:38,655 - INFO - Epoch: 1.79 | loss: 0.000100 | grad_norm: 0.000404 | learning_rate: 0.000016 2025-04-12 15:38:44,809 - INFO - Epoch: 1.79 | loss: 0.000100 | grad_norm: 0.000536 | learning_rate: 0.000016 2025-04-12 15:38:50,729 - INFO - Epoch: 1.80 | loss: 0.000100 | grad_norm: 0.000385 | learning_rate: 0.000016 2025-04-12 15:38:56,927 - INFO - Epoch: 1.80 | loss: 0.000100 | grad_norm: 0.000459 | learning_rate: 0.000016 2025-04-12 15:39:02,964 - INFO - Epoch: 1.80 | loss: 0.000100 | grad_norm: 0.000533 | learning_rate: 0.000016 2025-04-12 15:39:08,972 - INFO - Epoch: 1.81 | loss: 0.000100 | grad_norm: 0.000379 | learning_rate: 0.000016 2025-04-12 15:39:15,169 - INFO - Epoch: 1.81 | loss: 0.000100 | grad_norm: 0.000403 | learning_rate: 0.000016 2025-04-12 15:39:21,110 - INFO - Epoch: 1.81 | loss: 0.000100 | grad_norm: 0.000574 | learning_rate: 0.000016 2025-04-12 15:39:27,128 - INFO - Epoch: 1.81 | loss: 0.000100 | grad_norm: 0.000762 | learning_rate: 0.000016 2025-04-12 15:39:33,091 - INFO - Epoch: 1.82 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000016 2025-04-12 15:39:39,286 - INFO - Epoch: 1.82 | loss: 0.000100 | grad_norm: 0.000517 | learning_rate: 0.000016 2025-04-12 15:39:45,175 - INFO - Epoch: 1.82 | loss: 0.000100 | grad_norm: 0.000369 | learning_rate: 0.000016 2025-04-12 15:39:51,391 - INFO - Epoch: 1.82 | loss: 0.000100 | grad_norm: 0.000606 | learning_rate: 0.000016 2025-04-12 15:39:57,088 - INFO - Epoch: 1.83 | loss: 0.000100 | grad_norm: 0.000462 | learning_rate: 0.000016 2025-04-12 15:40:03,348 - INFO - Epoch: 1.83 | loss: 0.000100 | grad_norm: 0.000413 | learning_rate: 0.000016 2025-04-12 15:40:09,340 - INFO - Epoch: 1.83 | loss: 0.000100 | grad_norm: 0.000320 | learning_rate: 0.000016 2025-04-12 15:40:15,471 - INFO - Epoch: 1.84 | loss: 0.000100 | grad_norm: 0.000505 
| learning_rate: 0.000016 2025-04-12 15:40:21,674 - INFO - Epoch: 1.84 | loss: 0.000100 | grad_norm: 0.000756 | learning_rate: 0.000016 2025-04-12 15:40:28,022 - INFO - Epoch: 1.84 | loss: 0.000100 | grad_norm: 0.000448 | learning_rate: 0.000016 2025-04-12 15:40:34,148 - INFO - Epoch: 1.84 | loss: 0.000100 | grad_norm: 0.000566 | learning_rate: 0.000016 2025-04-12 15:40:40,421 - INFO - Epoch: 1.85 | loss: 0.000100 | grad_norm: 0.000405 | learning_rate: 0.000016 2025-04-12 15:40:46,569 - INFO - Epoch: 1.85 | loss: 0.000100 | grad_norm: 0.000620 | learning_rate: 0.000016 2025-04-12 15:40:52,810 - INFO - Epoch: 1.85 | loss: 0.000100 | grad_norm: 0.000318 | learning_rate: 0.000016 2025-04-12 15:40:58,987 - INFO - Epoch: 1.85 | loss: 0.000100 | grad_norm: 0.000697 | learning_rate: 0.000016 2025-04-12 15:41:05,217 - INFO - Epoch: 1.86 | loss: 0.000100 | grad_norm: 0.000570 | learning_rate: 0.000016 2025-04-12 15:41:11,633 - INFO - Epoch: 1.86 | loss: 0.000100 | grad_norm: 0.000606 | learning_rate: 0.000016 2025-04-12 15:41:17,638 - INFO - Epoch: 1.86 | loss: 0.000100 | grad_norm: 0.000404 | learning_rate: 0.000016 2025-04-12 15:41:24,167 - INFO - Epoch: 1.87 | loss: 0.000100 | grad_norm: 0.000520 | learning_rate: 0.000016 2025-04-12 15:41:30,024 - INFO - Epoch: 1.87 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000016 2025-04-12 15:41:35,825 - INFO - Epoch: 1.87 | loss: 0.000100 | grad_norm: 0.000819 | learning_rate: 0.000016 2025-04-12 15:41:42,239 - INFO - Epoch: 1.87 | loss: 0.000100 | grad_norm: 0.000283 | learning_rate: 0.000016 2025-04-12 15:41:47,909 - INFO - Epoch: 1.88 | loss: 0.000100 | grad_norm: 0.000475 | learning_rate: 0.000016 2025-04-12 15:41:54,241 - INFO - Epoch: 1.88 | loss: 0.000100 | grad_norm: 0.000783 | learning_rate: 0.000016 2025-04-12 15:41:59,756 - INFO - Epoch: 1.88 | loss: 0.000100 | grad_norm: 0.000496 | learning_rate: 0.000016 2025-04-12 15:42:05,708 - INFO - Epoch: 1.89 | loss: 0.000100 | grad_norm: 0.000422 | learning_rate: 
0.000016 2025-04-12 15:42:12,311 - INFO - Epoch: 1.89 | loss: 0.000100 | grad_norm: 0.000335 | learning_rate: 0.000016 2025-04-12 15:42:18,327 - INFO - Epoch: 1.89 | loss: 0.000100 | grad_norm: 0.000277 | learning_rate: 0.000016 2025-04-12 15:42:24,760 - INFO - Epoch: 1.89 | loss: 0.000100 | grad_norm: 0.000332 | learning_rate: 0.000016 2025-04-12 15:42:30,860 - INFO - Epoch: 1.90 | loss: 0.000100 | grad_norm: 0.000559 | learning_rate: 0.000016 2025-04-12 15:42:36,913 - INFO - Epoch: 1.90 | loss: 0.000100 | grad_norm: 0.000539 | learning_rate: 0.000016 2025-04-12 15:42:43,080 - INFO - Epoch: 1.90 | loss: 0.000100 | grad_norm: 0.000508 | learning_rate: 0.000016 2025-04-12 15:42:49,431 - INFO - Epoch: 1.90 | loss: 0.000100 | grad_norm: 0.000581 | learning_rate: 0.000016 2025-04-12 15:42:55,513 - INFO - Epoch: 1.91 | loss: 0.000100 | grad_norm: 0.000329 | learning_rate: 0.000016 2025-04-12 15:43:01,668 - INFO - Epoch: 1.91 | loss: 0.000100 | grad_norm: 0.000551 | learning_rate: 0.000016 2025-04-12 15:43:07,800 - INFO - Epoch: 1.91 | loss: 0.000100 | grad_norm: 0.000404 | learning_rate: 0.000016 2025-04-12 15:43:13,874 - INFO - Epoch: 1.92 | loss: 0.000100 | grad_norm: 0.000771 | learning_rate: 0.000016 2025-04-12 15:43:20,035 - INFO - Epoch: 1.92 | loss: 0.000100 | grad_norm: 0.000403 | learning_rate: 0.000015 2025-04-12 15:43:26,066 - INFO - Epoch: 1.92 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 0.000015 2025-04-12 15:43:32,335 - INFO - Epoch: 1.92 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 0.000015 2025-04-12 15:43:38,258 - INFO - Epoch: 1.93 | loss: 0.000100 | grad_norm: 0.000320 | learning_rate: 0.000015 2025-04-12 15:43:44,102 - INFO - Epoch: 1.93 | loss: 0.000100 | grad_norm: 0.000559 | learning_rate: 0.000015 2025-04-12 15:43:49,787 - INFO - Epoch: 1.93 | loss: 0.000100 | grad_norm: 0.000393 | learning_rate: 0.000015 2025-04-12 15:43:55,921 - INFO - Epoch: 1.93 | loss: 0.000100 | grad_norm: 0.000321 | learning_rate: 0.000015 2025-04-12 
15:44:02,061 - INFO - Epoch: 1.94 | loss: 0.000100 | grad_norm: 0.000634 | learning_rate: 0.000015 2025-04-12 15:44:08,192 - INFO - Epoch: 1.94 | loss: 0.000100 | grad_norm: 0.000373 | learning_rate: 0.000015 2025-04-12 15:44:14,365 - INFO - Epoch: 1.94 | loss: 0.000100 | grad_norm: 0.000336 | learning_rate: 0.000015 2025-04-12 15:44:20,890 - INFO - Epoch: 1.95 | loss: 0.000100 | grad_norm: 0.000464 | learning_rate: 0.000015 2025-04-12 15:44:26,667 - INFO - Epoch: 1.95 | loss: 0.000100 | grad_norm: 0.000628 | learning_rate: 0.000015 2025-04-12 15:44:32,930 - INFO - Epoch: 1.95 | loss: 0.000100 | grad_norm: 0.000277 | learning_rate: 0.000015 2025-04-12 15:44:38,898 - INFO - Epoch: 1.95 | loss: 0.000100 | grad_norm: 0.000591 | learning_rate: 0.000015 2025-04-12 15:44:45,312 - INFO - Epoch: 1.96 | loss: 0.000100 | grad_norm: 0.000502 | learning_rate: 0.000015 2025-04-12 15:44:51,287 - INFO - Epoch: 1.96 | loss: 0.000100 | grad_norm: 0.000630 | learning_rate: 0.000015 2025-04-12 15:44:57,430 - INFO - Epoch: 1.96 | loss: 0.000100 | grad_norm: 0.000394 | learning_rate: 0.000015 2025-04-12 15:45:03,821 - INFO - Epoch: 1.96 | loss: 0.000100 | grad_norm: 0.000523 | learning_rate: 0.000015 2025-04-12 15:45:09,811 - INFO - Epoch: 1.97 | loss: 0.000100 | grad_norm: 0.000421 | learning_rate: 0.000015 2025-04-12 15:45:16,025 - INFO - Epoch: 1.97 | loss: 0.000100 | grad_norm: 0.000394 | learning_rate: 0.000015 2025-04-12 15:45:21,840 - INFO - Epoch: 1.97 | loss: 0.000100 | grad_norm: 0.000421 | learning_rate: 0.000015 2025-04-12 15:45:28,317 - INFO - Epoch: 1.98 | loss: 0.000100 | grad_norm: 0.000387 | learning_rate: 0.000015 2025-04-12 15:45:34,895 - INFO - Epoch: 1.98 | loss: 0.000100 | grad_norm: 0.000772 | learning_rate: 0.000015 2025-04-12 15:45:40,974 - INFO - Epoch: 1.98 | loss: 0.000100 | grad_norm: 0.000331 | learning_rate: 0.000015 2025-04-12 15:45:46,687 - INFO - Epoch: 1.98 | loss: 0.000100 | grad_norm: 0.000444 | learning_rate: 0.000015 2025-04-12 15:45:52,690 - INFO 
- Epoch: 1.99 | loss: 0.000100 | grad_norm: 0.000682 | learning_rate: 0.000015 2025-04-12 15:45:59,002 - INFO - Epoch: 1.99 | loss: 0.000100 | grad_norm: 0.000335 | learning_rate: 0.000015 2025-04-12 15:46:05,004 - INFO - Epoch: 1.99 | loss: 0.000100 | grad_norm: 0.000389 | learning_rate: 0.000015 2025-04-12 15:46:11,265 - INFO - Epoch: 1.99 | loss: 0.000100 | grad_norm: 0.000417 | learning_rate: 0.000015 2025-04-12 15:46:16,738 - INFO - Epoch: 2.00 | loss: 0.000100 | grad_norm: 0.000282 | learning_rate: 0.000015 2025-04-12 15:46:22,366 - INFO - Epoch: 2.00 | loss: 0.000100 | grad_norm: 0.000289 | learning_rate: 0.000015 2025-04-12 15:46:31,674 - INFO - Epoch: 2.00 | loss: 0.000100 | grad_norm: 0.000416 | learning_rate: 0.000015 2025-04-12 15:46:38,007 - INFO - Epoch: 2.01 | loss: 0.000100 | grad_norm: 0.000460 | learning_rate: 0.000015 2025-04-12 15:46:44,440 - INFO - Epoch: 2.01 | loss: 0.000100 | grad_norm: 0.000319 | learning_rate: 0.000015 2025-04-12 15:46:50,132 - INFO - Epoch: 2.01 | loss: 0.000100 | grad_norm: 0.000356 | learning_rate: 0.000015 2025-04-12 15:46:55,693 - INFO - Epoch: 2.01 | loss: 0.000100 | grad_norm: 0.000497 | learning_rate: 0.000015 2025-04-12 15:47:01,618 - INFO - Epoch: 2.02 | loss: 0.000100 | grad_norm: 0.000372 | learning_rate: 0.000015 2025-04-12 15:47:08,023 - INFO - Epoch: 2.02 | loss: 0.000100 | grad_norm: 0.000508 | learning_rate: 0.000015 2025-04-12 15:47:14,168 - INFO - Epoch: 2.02 | loss: 0.000100 | grad_norm: 0.000224 | learning_rate: 0.000015 2025-04-12 15:47:19,711 - INFO - Epoch: 2.02 | loss: 0.000100 | grad_norm: 0.000628 | learning_rate: 0.000015 2025-04-12 15:47:26,057 - INFO - Epoch: 2.03 | loss: 0.000100 | grad_norm: 0.000341 | learning_rate: 0.000015 2025-04-12 15:47:32,100 - INFO - Epoch: 2.03 | loss: 0.000100 | grad_norm: 0.000360 | learning_rate: 0.000015 2025-04-12 15:47:38,113 - INFO - Epoch: 2.03 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 0.000015 2025-04-12 15:47:44,183 - INFO - Epoch: 2.04 | 
loss: 0.000100 | grad_norm: 0.000453 | learning_rate: 0.000015 2025-04-12 15:47:49,885 - INFO - Epoch: 2.04 | loss: 0.000100 | grad_norm: 0.000355 | learning_rate: 0.000015 2025-04-12 15:47:55,996 - INFO - Epoch: 2.04 | loss: 0.000100 | grad_norm: 0.000510 | learning_rate: 0.000015 2025-04-12 15:48:02,348 - INFO - Epoch: 2.04 | loss: 0.000100 | grad_norm: 0.000447 | learning_rate: 0.000015 2025-04-12 15:48:08,282 - INFO - Epoch: 2.05 | loss: 0.000100 | grad_norm: 0.000283 | learning_rate: 0.000015 2025-04-12 15:48:14,178 - INFO - Epoch: 2.05 | loss: 0.000100 | grad_norm: 0.000402 | learning_rate: 0.000015 2025-04-12 15:48:20,162 - INFO - Epoch: 2.05 | loss: 0.000100 | grad_norm: 0.001253 | learning_rate: 0.000015 2025-04-12 15:48:25,997 - INFO - Epoch: 2.05 | loss: 0.000100 | grad_norm: 0.000551 | learning_rate: 0.000015 2025-04-12 15:48:31,697 - INFO - Epoch: 2.06 | loss: 0.000100 | grad_norm: 0.000555 | learning_rate: 0.000015 2025-04-12 15:48:37,433 - INFO - Epoch: 2.06 | loss: 0.000100 | grad_norm: 0.000378 | learning_rate: 0.000015 2025-04-12 15:48:43,637 - INFO - Epoch: 2.06 | loss: 0.000100 | grad_norm: 0.000659 | learning_rate: 0.000015 2025-04-12 15:48:49,523 - INFO - Epoch: 2.07 | loss: 0.000100 | grad_norm: 0.000671 | learning_rate: 0.000015 2025-04-12 15:48:55,810 - INFO - Epoch: 2.07 | loss: 0.000100 | grad_norm: 0.000659 | learning_rate: 0.000015 2025-04-12 15:49:01,814 - INFO - Epoch: 2.07 | loss: 0.000100 | grad_norm: 0.000354 | learning_rate: 0.000015 2025-04-12 15:49:07,760 - INFO - Epoch: 2.07 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000015 2025-04-12 15:49:13,792 - INFO - Epoch: 2.08 | loss: 0.000100 | grad_norm: 0.000338 | learning_rate: 0.000015 2025-04-12 15:49:19,971 - INFO - Epoch: 2.08 | loss: 0.000100 | grad_norm: 0.000465 | learning_rate: 0.000015 2025-04-12 15:49:26,486 - INFO - Epoch: 2.08 | loss: 0.000100 | grad_norm: 0.000399 | learning_rate: 0.000014 2025-04-12 15:49:32,411 - INFO - Epoch: 2.08 | loss: 0.000100 | 
grad_norm: 0.000491 | learning_rate: 0.000014 2025-04-12 15:49:38,341 - INFO - Epoch: 2.09 | loss: 0.000100 | grad_norm: 0.000585 | learning_rate: 0.000014 2025-04-12 15:49:44,166 - INFO - Epoch: 2.09 | loss: 0.000100 | grad_norm: 0.000479 | learning_rate: 0.000014 2025-04-12 15:49:50,324 - INFO - Epoch: 2.09 | loss: 0.000100 | grad_norm: 0.000669 | learning_rate: 0.000014 2025-04-12 15:49:56,241 - INFO - Epoch: 2.10 | loss: 0.000100 | grad_norm: 0.000434 | learning_rate: 0.000014 2025-04-12 15:50:02,319 - INFO - Epoch: 2.10 | loss: 0.000100 | grad_norm: 0.000276 | learning_rate: 0.000014 2025-04-12 15:50:08,297 - INFO - Epoch: 2.10 | loss: 0.000100 | grad_norm: 0.000514 | learning_rate: 0.000014 2025-04-12 15:50:14,135 - INFO - Epoch: 2.10 | loss: 0.000100 | grad_norm: 0.000366 | learning_rate: 0.000014 2025-04-12 15:50:20,316 - INFO - Epoch: 2.11 | loss: 0.000100 | grad_norm: 0.000476 | learning_rate: 0.000014 2025-04-12 15:50:25,996 - INFO - Epoch: 2.11 | loss: 0.000100 | grad_norm: 0.000414 | learning_rate: 0.000014 2025-04-12 15:50:31,919 - INFO - Epoch: 2.11 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000014 2025-04-12 15:50:38,146 - INFO - Epoch: 2.12 | loss: 0.000100 | grad_norm: 0.000310 | learning_rate: 0.000014 2025-04-12 15:50:44,125 - INFO - Epoch: 2.12 | loss: 0.000100 | grad_norm: 0.000376 | learning_rate: 0.000014 2025-04-12 15:50:50,154 - INFO - Epoch: 2.12 | loss: 0.000100 | grad_norm: 0.000687 | learning_rate: 0.000014 2025-04-12 15:50:56,337 - INFO - Epoch: 2.12 | loss: 0.000100 | grad_norm: 0.000419 | learning_rate: 0.000014 2025-04-12 15:51:02,361 - INFO - Epoch: 2.13 | loss: 0.000100 | grad_norm: 0.000316 | learning_rate: 0.000014 2025-04-12 15:51:08,798 - INFO - Epoch: 2.13 | loss: 0.000100 | grad_norm: 0.000297 | learning_rate: 0.000014 2025-04-12 15:51:15,246 - INFO - Epoch: 2.13 | loss: 0.000100 | grad_norm: 0.000460 | learning_rate: 0.000014 2025-04-12 15:51:21,175 - INFO - Epoch: 2.13 | loss: 0.000100 | grad_norm: 0.001031 
| learning_rate: 0.000014 2025-04-12 15:51:27,285 - INFO - Epoch: 2.14 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000014 2025-04-12 15:51:33,566 - INFO - Epoch: 2.14 | loss: 0.000100 | grad_norm: 0.000512 | learning_rate: 0.000014 2025-04-12 15:51:39,788 - INFO - Epoch: 2.14 | loss: 0.000100 | grad_norm: 0.000535 | learning_rate: 0.000014 2025-04-12 15:51:45,941 - INFO - Epoch: 2.15 | loss: 0.000100 | grad_norm: 0.000388 | learning_rate: 0.000014 2025-04-12 15:51:52,062 - INFO - Epoch: 2.15 | loss: 0.000100 | grad_norm: 0.000276 | learning_rate: 0.000014 2025-04-12 15:51:58,024 - INFO - Epoch: 2.15 | loss: 0.000100 | grad_norm: 0.000646 | learning_rate: 0.000014 2025-04-12 15:52:03,722 - INFO - Epoch: 2.15 | loss: 0.000100 | grad_norm: 0.000412 | learning_rate: 0.000014 2025-04-12 15:52:09,960 - INFO - Epoch: 2.16 | loss: 0.000100 | grad_norm: 0.000578 | learning_rate: 0.000014 2025-04-12 15:52:16,056 - INFO - Epoch: 2.16 | loss: 0.000100 | grad_norm: 0.000508 | learning_rate: 0.000014 2025-04-12 15:52:22,327 - INFO - Epoch: 2.16 | loss: 0.000100 | grad_norm: 0.000340 | learning_rate: 0.000014 2025-04-12 15:52:28,650 - INFO - Epoch: 2.16 | loss: 0.000100 | grad_norm: 0.000372 | learning_rate: 0.000014 2025-04-12 15:52:34,598 - INFO - Epoch: 2.17 | loss: 0.000100 | grad_norm: 0.000370 | learning_rate: 0.000014 2025-04-12 15:52:40,729 - INFO - Epoch: 2.17 | loss: 0.000100 | grad_norm: 0.000463 | learning_rate: 0.000014 2025-04-12 15:52:46,697 - INFO - Epoch: 2.17 | loss: 0.000100 | grad_norm: 0.000367 | learning_rate: 0.000014 2025-04-12 15:52:52,909 - INFO - Epoch: 2.18 | loss: 0.000100 | grad_norm: 0.000432 | learning_rate: 0.000014 2025-04-12 15:52:59,045 - INFO - Epoch: 2.18 | loss: 0.000100 | grad_norm: 0.000463 | learning_rate: 0.000014 2025-04-12 15:53:05,167 - INFO - Epoch: 2.18 | loss: 0.000100 | grad_norm: 0.000401 | learning_rate: 0.000014 2025-04-12 15:53:11,525 - INFO - Epoch: 2.18 | loss: 0.000100 | grad_norm: 0.000453 | learning_rate: 
0.000014 2025-04-12 15:53:17,321 - INFO - Epoch: 2.19 | loss: 0.000100 | grad_norm: 0.000501 | learning_rate: 0.000014 2025-04-12 15:53:23,478 - INFO - Epoch: 2.19 | loss: 0.000100 | grad_norm: 0.000386 | learning_rate: 0.000014 2025-04-12 15:53:29,397 - INFO - Epoch: 2.19 | loss: 0.000100 | grad_norm: 0.000367 | learning_rate: 0.000014 2025-04-12 15:53:35,350 - INFO - Epoch: 2.19 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 0.000014 2025-04-12 15:53:41,496 - INFO - Epoch: 2.20 | loss: 0.000100 | grad_norm: 0.000416 | learning_rate: 0.000014 2025-04-12 15:53:47,405 - INFO - Epoch: 2.20 | loss: 0.000100 | grad_norm: 0.000404 | learning_rate: 0.000014 2025-04-12 15:53:53,961 - INFO - Epoch: 2.20 | loss: 0.000100 | grad_norm: 0.000298 | learning_rate: 0.000014 2025-04-12 15:54:00,242 - INFO - Epoch: 2.21 | loss: 0.000100 | grad_norm: 0.000540 | learning_rate: 0.000014 2025-04-12 15:54:06,548 - INFO - Epoch: 2.21 | loss: 0.000100 | grad_norm: 0.000310 | learning_rate: 0.000014 2025-04-12 15:54:12,715 - INFO - Epoch: 2.21 | loss: 0.000100 | grad_norm: 0.000314 | learning_rate: 0.000014 2025-04-12 15:54:18,871 - INFO - Epoch: 2.21 | loss: 0.000100 | grad_norm: 0.000671 | learning_rate: 0.000014 2025-04-12 15:54:25,014 - INFO - Epoch: 2.22 | loss: 0.000100 | grad_norm: 0.000309 | learning_rate: 0.000014 2025-04-12 15:54:31,064 - INFO - Epoch: 2.22 | loss: 0.000100 | grad_norm: 0.000657 | learning_rate: 0.000014 2025-04-12 15:54:37,647 - INFO - Epoch: 2.22 | loss: 0.000100 | grad_norm: 0.000391 | learning_rate: 0.000014 2025-04-12 15:54:43,733 - INFO - Epoch: 2.22 | loss: 0.000100 | grad_norm: 0.000294 | learning_rate: 0.000014 2025-04-12 15:54:49,579 - INFO - Epoch: 2.23 | loss: 0.000100 | grad_norm: 0.000413 | learning_rate: 0.000014 2025-04-12 15:54:55,896 - INFO - Epoch: 2.23 | loss: 0.000100 | grad_norm: 0.000624 | learning_rate: 0.000014 2025-04-12 15:55:01,808 - INFO - Epoch: 2.23 | loss: 0.000100 | grad_norm: 0.000431 | learning_rate: 0.000014 2025-04-12 
15:55:08,129 - INFO - Epoch: 2.24 | loss: 0.000100 | grad_norm: 0.000505 | learning_rate: 0.000014 2025-04-12 15:55:14,188 - INFO - Epoch: 2.24 | loss: 0.000100 | grad_norm: 0.000346 | learning_rate: 0.000013 2025-04-12 15:55:20,536 - INFO - Epoch: 2.24 | loss: 0.000100 | grad_norm: 0.000386 | learning_rate: 0.000013 2025-04-12 15:55:26,913 - INFO - Epoch: 2.24 | loss: 0.000100 | grad_norm: 0.000526 | learning_rate: 0.000013 2025-04-12 15:55:33,430 - INFO - Epoch: 2.25 | loss: 0.000100 | grad_norm: 0.000547 | learning_rate: 0.000013 2025-04-12 15:55:39,708 - INFO - Epoch: 2.25 | loss: 0.000100 | grad_norm: 0.000422 | learning_rate: 0.000013 2025-04-12 15:55:46,026 - INFO - Epoch: 2.25 | loss: 0.000100 | grad_norm: 0.000422 | learning_rate: 0.000013 2025-04-12 15:55:52,174 - INFO - Epoch: 2.25 | loss: 0.000100 | grad_norm: 0.000328 | learning_rate: 0.000013 2025-04-12 15:55:58,201 - INFO - Epoch: 2.26 | loss: 0.000100 | grad_norm: 0.000322 | learning_rate: 0.000013 2025-04-12 15:56:04,047 - INFO - Epoch: 2.26 | loss: 0.000100 | grad_norm: 0.000443 | learning_rate: 0.000013 2025-04-12 15:56:10,087 - INFO - Epoch: 2.26 | loss: 0.000100 | grad_norm: 0.000491 | learning_rate: 0.000013 2025-04-12 15:56:16,271 - INFO - Epoch: 2.27 | loss: 0.000100 | grad_norm: 0.000497 | learning_rate: 0.000013 2025-04-12 15:56:22,471 - INFO - Epoch: 2.27 | loss: 0.000100 | grad_norm: 0.000408 | learning_rate: 0.000013 2025-04-12 15:56:28,415 - INFO - Epoch: 2.27 | loss: 0.000100 | grad_norm: 0.000405 | learning_rate: 0.000013 2025-04-12 15:56:34,316 - INFO - Epoch: 2.27 | loss: 0.000100 | grad_norm: 0.000292 | learning_rate: 0.000013 2025-04-12 15:56:40,462 - INFO - Epoch: 2.28 | loss: 0.000100 | grad_norm: 0.000454 | learning_rate: 0.000013 2025-04-12 15:56:46,429 - INFO - Epoch: 2.28 | loss: 0.000100 | grad_norm: 0.000339 | learning_rate: 0.000013 2025-04-12 15:56:52,684 - INFO - Epoch: 2.28 | loss: 0.000100 | grad_norm: 0.000423 | learning_rate: 0.000013 2025-04-12 15:56:58,920 - INFO 
- Epoch: 2.28 | loss: 0.000100 | grad_norm: 0.000322 | learning_rate: 0.000013 2025-04-12 15:57:04,912 - INFO - Epoch: 2.29 | loss: 0.000100 | grad_norm: 0.000767 | learning_rate: 0.000013 2025-04-12 15:57:11,193 - INFO - Epoch: 2.29 | loss: 0.000100 | grad_norm: 0.000525 | learning_rate: 0.000013 2025-04-12 15:57:17,025 - INFO - Epoch: 2.29 | loss: 0.000100 | grad_norm: 0.000360 | learning_rate: 0.000013 2025-04-12 15:57:23,260 - INFO - Epoch: 2.30 | loss: 0.000100 | grad_norm: 0.000377 | learning_rate: 0.000013 2025-04-12 15:57:29,584 - INFO - Epoch: 2.30 | loss: 0.000100 | grad_norm: 0.000316 | learning_rate: 0.000013 2025-04-12 15:57:35,443 - INFO - Epoch: 2.30 | loss: 0.000100 | grad_norm: 0.000536 | learning_rate: 0.000013 2025-04-12 15:57:41,075 - INFO - Epoch: 2.30 | loss: 0.000100 | grad_norm: 0.000247 | learning_rate: 0.000013 2025-04-12 15:57:47,045 - INFO - Epoch: 2.31 | loss: 0.000100 | grad_norm: 0.000463 | learning_rate: 0.000013 2025-04-12 15:57:53,354 - INFO - Epoch: 2.31 | loss: 0.000100 | grad_norm: 0.000281 | learning_rate: 0.000013 2025-04-12 15:57:59,283 - INFO - Epoch: 2.31 | loss: 0.000100 | grad_norm: 0.000383 | learning_rate: 0.000013 2025-04-12 15:58:05,257 - INFO - Epoch: 2.32 | loss: 0.000100 | grad_norm: 0.000345 | learning_rate: 0.000013 2025-04-12 15:58:11,366 - INFO - Epoch: 2.32 | loss: 0.000100 | grad_norm: 0.000261 | learning_rate: 0.000013 2025-04-12 15:58:17,634 - INFO - Epoch: 2.32 | loss: 0.000100 | grad_norm: 0.000264 | learning_rate: 0.000013 2025-04-12 15:58:24,098 - INFO - Epoch: 2.32 | loss: 0.000100 | grad_norm: 0.000339 | learning_rate: 0.000013 2025-04-12 15:58:30,006 - INFO - Epoch: 2.33 | loss: 0.000100 | grad_norm: 0.000399 | learning_rate: 0.000013 2025-04-12 15:58:36,121 - INFO - Epoch: 2.33 | loss: 0.000100 | grad_norm: 0.000302 | learning_rate: 0.000013 2025-04-12 15:58:42,371 - INFO - Epoch: 2.33 | loss: 0.000100 | grad_norm: 0.000483 | learning_rate: 0.000013 2025-04-12 15:58:48,446 - INFO - Epoch: 2.33 | 
loss: 0.000100 | grad_norm: 0.000312 | learning_rate: 0.000013 2025-04-12 15:58:54,511 - INFO - Epoch: 2.34 | loss: 0.000100 | grad_norm: 0.000572 | learning_rate: 0.000013 2025-04-12 15:59:00,844 - INFO - Epoch: 2.34 | loss: 0.000100 | grad_norm: 0.000347 | learning_rate: 0.000013 2025-04-12 15:59:06,978 - INFO - Epoch: 2.34 | loss: 0.000100 | grad_norm: 0.000496 | learning_rate: 0.000013 2025-04-12 15:59:12,587 - INFO - Epoch: 2.35 | loss: 0.000100 | grad_norm: 0.000511 | learning_rate: 0.000013 2025-04-12 15:59:19,023 - INFO - Epoch: 2.35 | loss: 0.000100 | grad_norm: 0.000288 | learning_rate: 0.000013 2025-04-12 15:59:25,238 - INFO - Epoch: 2.35 | loss: 0.000100 | grad_norm: 0.000535 | learning_rate: 0.000013 2025-04-12 15:59:31,613 - INFO - Epoch: 2.35 | loss: 0.000100 | grad_norm: 0.000307 | learning_rate: 0.000013 2025-04-12 15:59:37,816 - INFO - Epoch: 2.36 | loss: 0.000100 | grad_norm: 0.000347 | learning_rate: 0.000013 2025-04-12 15:59:43,773 - INFO - Epoch: 2.36 | loss: 0.000100 | grad_norm: 0.000471 | learning_rate: 0.000013 2025-04-12 15:59:49,668 - INFO - Epoch: 2.36 | loss: 0.000100 | grad_norm: 0.000915 | learning_rate: 0.000013 2025-04-12 15:59:55,458 - INFO - Epoch: 2.36 | loss: 0.000100 | grad_norm: 0.000447 | learning_rate: 0.000013 2025-04-12 16:00:01,390 - INFO - Epoch: 2.37 | loss: 0.000100 | grad_norm: 0.000519 | learning_rate: 0.000013 2025-04-12 16:00:07,671 - INFO - Epoch: 2.37 | loss: 0.000100 | grad_norm: 0.000309 | learning_rate: 0.000013 2025-04-12 16:00:13,786 - INFO - Epoch: 2.37 | loss: 0.000100 | grad_norm: 0.000572 | learning_rate: 0.000013 2025-04-12 16:00:19,876 - INFO - Epoch: 2.38 | loss: 0.000100 | grad_norm: 0.000586 | learning_rate: 0.000013 2025-04-12 16:00:26,101 - INFO - Epoch: 2.38 | loss: 0.000100 | grad_norm: 0.000359 | learning_rate: 0.000013 2025-04-12 16:00:32,227 - INFO - Epoch: 2.38 | loss: 0.000100 | grad_norm: 0.000756 | learning_rate: 0.000013 2025-04-12 16:00:38,150 - INFO - Epoch: 2.38 | loss: 0.000100 | 
grad_norm: 0.000409 | learning_rate: 0.000013 2025-04-12 16:00:44,338 - INFO - Epoch: 2.39 | loss: 0.000100 | grad_norm: 0.000708 | learning_rate: 0.000013 2025-04-12 16:00:50,470 - INFO - Epoch: 2.39 | loss: 0.000100 | grad_norm: 0.000714 | learning_rate: 0.000012 2025-04-12 16:00:56,524 - INFO - Epoch: 2.39 | loss: 0.000100 | grad_norm: 0.000337 | learning_rate: 0.000012 2025-04-12 16:01:02,956 - INFO - Epoch: 2.39 | loss: 0.000100 | grad_norm: 0.000435 | learning_rate: 0.000012 2025-04-12 16:01:08,881 - INFO - Epoch: 2.40 | loss: 0.000100 | grad_norm: 0.000505 | learning_rate: 0.000012 2025-04-12 16:01:15,251 - INFO - Epoch: 2.40 | loss: 0.000100 | grad_norm: 0.000259 | learning_rate: 0.000012 2025-04-12 16:01:21,430 - INFO - Epoch: 2.40 | loss: 0.000100 | grad_norm: 0.000346 | learning_rate: 0.000012 2025-04-12 16:01:27,427 - INFO - Epoch: 2.41 | loss: 0.000100 | grad_norm: 0.000531 | learning_rate: 0.000012 2025-04-12 16:01:33,390 - INFO - Epoch: 2.41 | loss: 0.000100 | grad_norm: 0.000428 | learning_rate: 0.000012 2025-04-12 16:01:39,571 - INFO - Epoch: 2.41 | loss: 0.000100 | grad_norm: 0.000377 | learning_rate: 0.000012 2025-04-12 16:01:45,962 - INFO - Epoch: 2.41 | loss: 0.000100 | grad_norm: 0.000383 | learning_rate: 0.000012 2025-04-12 16:01:52,431 - INFO - Epoch: 2.42 | loss: 0.000100 | grad_norm: 0.000441 | learning_rate: 0.000012 2025-04-12 16:01:58,442 - INFO - Epoch: 2.42 | loss: 0.000100 | grad_norm: 0.000573 | learning_rate: 0.000012 2025-04-12 16:02:04,694 - INFO - Epoch: 2.42 | loss: 0.000100 | grad_norm: 0.000383 | learning_rate: 0.000012 2025-04-12 16:02:10,861 - INFO - Epoch: 2.42 | loss: 0.000100 | grad_norm: 0.000529 | learning_rate: 0.000012 2025-04-12 16:02:16,935 - INFO - Epoch: 2.43 | loss: 0.000100 | grad_norm: 0.000583 | learning_rate: 0.000012 2025-04-12 16:02:23,030 - INFO - Epoch: 2.43 | loss: 0.000100 | grad_norm: 0.000477 | learning_rate: 0.000012 2025-04-12 16:02:28,994 - INFO - Epoch: 2.43 | loss: 0.000100 | grad_norm: 0.000335 
| learning_rate: 0.000012 2025-04-12 16:02:35,045 - INFO - Epoch: 2.44 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000012 2025-04-12 16:02:41,220 - INFO - Epoch: 2.44 | loss: 0.000100 | grad_norm: 0.000601 | learning_rate: 0.000012 2025-04-12 16:02:47,447 - INFO - Epoch: 2.44 | loss: 0.000100 | grad_norm: 0.000447 | learning_rate: 0.000012 2025-04-12 16:02:53,947 - INFO - Epoch: 2.44 | loss: 0.000100 | grad_norm: 0.000354 | learning_rate: 0.000012 2025-04-12 16:02:59,921 - INFO - Epoch: 2.45 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000012 2025-04-12 16:03:06,080 - INFO - Epoch: 2.45 | loss: 0.000100 | grad_norm: 0.000850 | learning_rate: 0.000012 2025-04-12 16:03:12,216 - INFO - Epoch: 2.45 | loss: 0.000100 | grad_norm: 0.000408 | learning_rate: 0.000012 2025-04-12 16:03:18,324 - INFO - Epoch: 2.45 | loss: 0.000100 | grad_norm: 0.000299 | learning_rate: 0.000012 2025-04-12 16:03:24,732 - INFO - Epoch: 2.46 | loss: 0.000100 | grad_norm: 0.000400 | learning_rate: 0.000012 2025-04-12 16:03:30,665 - INFO - Epoch: 2.46 | loss: 0.000100 | grad_norm: 0.000634 | learning_rate: 0.000012 2025-04-12 16:03:36,817 - INFO - Epoch: 2.46 | loss: 0.000100 | grad_norm: 0.000622 | learning_rate: 0.000012 2025-04-12 16:03:43,208 - INFO - Epoch: 2.47 | loss: 0.000100 | grad_norm: 0.000451 | learning_rate: 0.000012 2025-04-12 16:03:49,450 - INFO - Epoch: 2.47 | loss: 0.000100 | grad_norm: 0.000640 | learning_rate: 0.000012 2025-04-12 16:03:55,682 - INFO - Epoch: 2.47 | loss: 0.000100 | grad_norm: 0.000465 | learning_rate: 0.000012 2025-04-12 16:04:01,992 - INFO - Epoch: 2.47 | loss: 0.000100 | grad_norm: 0.000401 | learning_rate: 0.000012 2025-04-12 16:04:07,900 - INFO - Epoch: 2.48 | loss: 0.000100 | grad_norm: 0.000487 | learning_rate: 0.000012 2025-04-12 16:04:13,676 - INFO - Epoch: 2.48 | loss: 0.000100 | grad_norm: 0.000593 | learning_rate: 0.000012 2025-04-12 16:04:19,573 - INFO - Epoch: 2.48 | loss: 0.000100 | grad_norm: 0.000472 | learning_rate: 
0.000012 2025-04-12 16:04:25,860 - INFO - Epoch: 2.48 | loss: 0.000100 | grad_norm: 0.000330 | learning_rate: 0.000012 2025-04-12 16:04:31,831 - INFO - Epoch: 2.49 | loss: 0.000100 | grad_norm: 0.000410 | learning_rate: 0.000012 2025-04-12 16:04:37,839 - INFO - Epoch: 2.49 | loss: 0.000100 | grad_norm: 0.000574 | learning_rate: 0.000012 2025-04-12 16:04:43,547 - INFO - Epoch: 2.49 | loss: 0.000100 | grad_norm: 0.000431 | learning_rate: 0.000012 2025-04-12 16:04:49,512 - INFO - Epoch: 2.50 | loss: 0.000100 | grad_norm: 0.000502 | learning_rate: 0.000012 2025-04-12 16:04:55,476 - INFO - Epoch: 2.50 | loss: 0.000100 | grad_norm: 0.000377 | learning_rate: 0.000012 2025-04-12 16:05:01,724 - INFO - Epoch: 2.50 | loss: 0.000100 | grad_norm: 0.000349 | learning_rate: 0.000012 2025-04-12 16:05:07,724 - INFO - Epoch: 2.50 | loss: 0.000100 | grad_norm: 0.000566 | learning_rate: 0.000012 2025-04-12 16:05:14,054 - INFO - Epoch: 2.51 | loss: 0.000100 | grad_norm: 0.000473 | learning_rate: 0.000012 2025-04-12 16:05:19,737 - INFO - Epoch: 2.51 | loss: 0.000100 | grad_norm: 0.000562 | learning_rate: 0.000012 2025-04-12 16:05:25,791 - INFO - Epoch: 2.51 | loss: 0.000100 | grad_norm: 0.000366 | learning_rate: 0.000012 2025-04-12 16:05:32,421 - INFO - Epoch: 2.52 | loss: 0.000100 | grad_norm: 0.000650 | learning_rate: 0.000012 2025-04-12 16:05:38,752 - INFO - Epoch: 2.52 | loss: 0.000100 | grad_norm: 0.000496 | learning_rate: 0.000012 2025-04-12 16:05:44,876 - INFO - Epoch: 2.52 | loss: 0.000100 | grad_norm: 0.000587 | learning_rate: 0.000012 2025-04-12 16:05:50,730 - INFO - Epoch: 2.52 | loss: 0.000100 | grad_norm: 0.000463 | learning_rate: 0.000012 2025-04-12 16:05:56,970 - INFO - Epoch: 2.53 | loss: 0.000100 | grad_norm: 0.000318 | learning_rate: 0.000012 2025-04-12 16:06:03,028 - INFO - Epoch: 2.53 | loss: 0.000100 | grad_norm: 0.000406 | learning_rate: 0.000012 2025-04-12 16:06:09,691 - INFO - Epoch: 2.53 | loss: 0.000100 | grad_norm: 0.001681 | learning_rate: 0.000012 2025-04-12 
16:06:16,224 - INFO - Epoch: 2.53 | loss: 0.000100 | grad_norm: 0.000374 | learning_rate: 0.000011 2025-04-12 16:06:22,495 - INFO - Epoch: 2.54 | loss: 0.000100 | grad_norm: 0.000339 | learning_rate: 0.000011 2025-04-12 16:06:28,638 - INFO - Epoch: 2.54 | loss: 0.000100 | grad_norm: 0.000536 | learning_rate: 0.000011 2025-04-12 16:06:34,974 - INFO - Epoch: 2.54 | loss: 0.000100 | grad_norm: 0.000498 | learning_rate: 0.000011 2025-04-12 16:06:40,873 - INFO - Epoch: 2.55 | loss: 0.000100 | grad_norm: 0.000488 | learning_rate: 0.000011 2025-04-12 16:06:47,003 - INFO - Epoch: 2.55 | loss: 0.000100 | grad_norm: 0.000470 | learning_rate: 0.000011 2025-04-12 16:06:53,118 - INFO - Epoch: 2.55 | loss: 0.000100 | grad_norm: 0.000351 | learning_rate: 0.000011 2025-04-12 16:06:59,346 - INFO - Epoch: 2.55 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000011 2025-04-12 16:07:05,445 - INFO - Epoch: 2.56 | loss: 0.000100 | grad_norm: 0.000455 | learning_rate: 0.000011 2025-04-12 16:07:11,724 - INFO - Epoch: 2.56 | loss: 0.000100 | grad_norm: 0.000378 | learning_rate: 0.000011 2025-04-12 16:07:17,773 - INFO - Epoch: 2.56 | loss: 0.000100 | grad_norm: 0.000533 | learning_rate: 0.000011 2025-04-12 16:07:23,694 - INFO - Epoch: 2.56 | loss: 0.000100 | grad_norm: 0.000304 | learning_rate: 0.000011 2025-04-12 16:07:29,678 - INFO - Epoch: 2.57 | loss: 0.000100 | grad_norm: 0.000315 | learning_rate: 0.000011 2025-04-12 16:07:35,828 - INFO - Epoch: 2.57 | loss: 0.000100 | grad_norm: 0.000470 | learning_rate: 0.000011 2025-04-12 16:07:41,902 - INFO - Epoch: 2.57 | loss: 0.000100 | grad_norm: 0.000432 | learning_rate: 0.000011 2025-04-12 16:07:48,170 - INFO - Epoch: 2.58 | loss: 0.000100 | grad_norm: 0.000420 | learning_rate: 0.000011 2025-04-12 16:07:54,645 - INFO - Epoch: 2.58 | loss: 0.000100 | grad_norm: 0.000727 | learning_rate: 0.000011 2025-04-12 16:08:00,825 - INFO - Epoch: 2.58 | loss: 0.000100 | grad_norm: 0.000627 | learning_rate: 0.000011 2025-04-12 16:08:06,849 - INFO 
- Epoch: 2.58 | loss: 0.000100 | grad_norm: 0.000383 | learning_rate: 0.000011 2025-04-12 16:08:12,808 - INFO - Epoch: 2.59 | loss: 0.000100 | grad_norm: 0.000382 | learning_rate: 0.000011 2025-04-12 16:08:19,074 - INFO - Epoch: 2.59 | loss: 0.000100 | grad_norm: 0.000491 | learning_rate: 0.000011 2025-04-12 16:08:25,196 - INFO - Epoch: 2.59 | loss: 0.000100 | grad_norm: 0.000355 | learning_rate: 0.000011 2025-04-12 16:08:30,974 - INFO - Epoch: 2.59 | loss: 0.000100 | grad_norm: 0.000352 | learning_rate: 0.000011 2025-04-12 16:08:37,156 - INFO - Epoch: 2.60 | loss: 0.000100 | grad_norm: 0.000700 | learning_rate: 0.000011 2025-04-12 16:08:43,007 - INFO - Epoch: 2.60 | loss: 0.000100 | grad_norm: 0.000371 | learning_rate: 0.000011 2025-04-12 16:08:49,323 - INFO - Epoch: 2.60 | loss: 0.000100 | grad_norm: 0.000459 | learning_rate: 0.000011 2025-04-12 16:08:55,310 - INFO - Epoch: 2.61 | loss: 0.000100 | grad_norm: 0.000504 | learning_rate: 0.000011 2025-04-12 16:09:01,309 - INFO - Epoch: 2.61 | loss: 0.000100 | grad_norm: 0.000408 | learning_rate: 0.000011 2025-04-12 16:09:07,241 - INFO - Epoch: 2.61 | loss: 0.000100 | grad_norm: 0.000442 | learning_rate: 0.000011 2025-04-12 16:09:13,578 - INFO - Epoch: 2.61 | loss: 0.000100 | grad_norm: 0.000396 | learning_rate: 0.000011 2025-04-12 16:09:19,504 - INFO - Epoch: 2.62 | loss: 0.000100 | grad_norm: 0.000290 | learning_rate: 0.000011 2025-04-12 16:09:25,822 - INFO - Epoch: 2.62 | loss: 0.000100 | grad_norm: 0.000414 | learning_rate: 0.000011 2025-04-12 16:09:31,833 - INFO - Epoch: 2.62 | loss: 0.000100 | grad_norm: 0.000538 | learning_rate: 0.000011 2025-04-12 16:09:38,148 - INFO - Epoch: 2.62 | loss: 0.000100 | grad_norm: 0.000326 | learning_rate: 0.000011 2025-04-12 16:09:44,158 - INFO - Epoch: 2.63 | loss: 0.000100 | grad_norm: 0.000540 | learning_rate: 0.000011 2025-04-12 16:09:50,107 - INFO - Epoch: 2.63 | loss: 0.000100 | grad_norm: 0.000348 | learning_rate: 0.000011 2025-04-12 16:09:55,797 - INFO - Epoch: 2.63 | 
loss: 0.000100 | grad_norm: 0.000416 | learning_rate: 0.000011 2025-04-12 16:10:01,918 - INFO - Epoch: 2.64 | loss: 0.000100 | grad_norm: 0.000325 | learning_rate: 0.000011 2025-04-12 16:10:08,019 - INFO - Epoch: 2.64 | loss: 0.000100 | grad_norm: 0.000412 | learning_rate: 0.000011 2025-04-12 16:10:14,523 - INFO - Epoch: 2.64 | loss: 0.000100 | grad_norm: 0.000469 | learning_rate: 0.000011 2025-04-12 16:10:20,769 - INFO - Epoch: 2.64 | loss: 0.000100 | grad_norm: 0.000455 | learning_rate: 0.000011 2025-04-12 16:10:27,029 - INFO - Epoch: 2.65 | loss: 0.000100 | grad_norm: 0.000561 | learning_rate: 0.000011 2025-04-12 16:10:33,635 - INFO - Epoch: 2.65 | loss: 0.000100 | grad_norm: 0.000392 | learning_rate: 0.000011 2025-04-12 16:10:39,671 - INFO - Epoch: 2.65 | loss: 0.000100 | grad_norm: 0.000385 | learning_rate: 0.000011 2025-04-12 16:10:45,556 - INFO - Epoch: 2.65 | loss: 0.000100 | grad_norm: 0.000583 | learning_rate: 0.000011 2025-04-12 16:10:51,105 - INFO - Epoch: 2.66 | loss: 0.000100 | grad_norm: 0.000499 | learning_rate: 0.000011 2025-04-12 16:10:57,408 - INFO - Epoch: 2.66 | loss: 0.000100 | grad_norm: 0.000389 | learning_rate: 0.000011 2025-04-12 16:11:03,559 - INFO - Epoch: 2.66 | loss: 0.000100 | grad_norm: 0.000338 | learning_rate: 0.000011 2025-04-12 16:11:09,598 - INFO - Epoch: 2.67 | loss: 0.000100 | grad_norm: 0.000401 | learning_rate: 0.000011 2025-04-12 16:11:15,899 - INFO - Epoch: 2.67 | loss: 0.000100 | grad_norm: 0.000349 | learning_rate: 0.000011 2025-04-12 16:11:22,162 - INFO - Epoch: 2.67 | loss: 0.000100 | grad_norm: 0.000447 | learning_rate: 0.000011 2025-04-12 16:11:28,258 - INFO - Epoch: 2.67 | loss: 0.000100 | grad_norm: 0.000365 | learning_rate: 0.000011 2025-04-12 16:11:34,241 - INFO - Epoch: 2.68 | loss: 0.000100 | grad_norm: 0.000352 | learning_rate: 0.000011 2025-04-12 16:11:40,182 - INFO - Epoch: 2.68 | loss: 0.000100 | grad_norm: 0.000431 | learning_rate: 0.000010 2025-04-12 16:11:46,184 - INFO - Epoch: 2.68 | loss: 0.000100 | 
grad_norm: 0.000427 | learning_rate: 0.000010 2025-04-12 16:11:52,580 - INFO - Epoch: 2.69 | loss: 0.000100 | grad_norm: 0.000362 | learning_rate: 0.000010 2025-04-12 16:11:58,693 - INFO - Epoch: 2.69 | loss: 0.000100 | grad_norm: 0.000437 | learning_rate: 0.000010 2025-04-12 16:12:05,138 - INFO - Epoch: 2.69 | loss: 0.000100 | grad_norm: 0.000493 | learning_rate: 0.000010 2025-04-12 16:12:10,838 - INFO - Epoch: 2.69 | loss: 0.000100 | grad_norm: 0.000383 | learning_rate: 0.000010 2025-04-12 16:12:16,653 - INFO - Epoch: 2.70 | loss: 0.000100 | grad_norm: 0.000335 | learning_rate: 0.000010 2025-04-12 16:12:22,595 - INFO - Epoch: 2.70 | loss: 0.000100 | grad_norm: 0.000396 | learning_rate: 0.000010 2025-04-12 16:12:28,560 - INFO - Epoch: 2.70 | loss: 0.000100 | grad_norm: 0.000638 | learning_rate: 0.000010 2025-04-12 16:12:34,724 - INFO - Epoch: 2.70 | loss: 0.000100 | grad_norm: 0.000474 | learning_rate: 0.000010 2025-04-12 16:12:40,930 - INFO - Epoch: 2.71 | loss: 0.000100 | grad_norm: 0.000386 | learning_rate: 0.000010 2025-04-12 16:12:47,131 - INFO - Epoch: 2.71 | loss: 0.000100 | grad_norm: 0.000346 | learning_rate: 0.000010 2025-04-12 16:12:53,245 - INFO - Epoch: 2.71 | loss: 0.000100 | grad_norm: 0.000456 | learning_rate: 0.000010 2025-04-12 16:12:59,662 - INFO - Epoch: 2.72 | loss: 0.000100 | grad_norm: 0.000330 | learning_rate: 0.000010 2025-04-12 16:13:05,763 - INFO - Epoch: 2.72 | loss: 0.000100 | grad_norm: 0.000495 | learning_rate: 0.000010 2025-04-12 16:13:12,099 - INFO - Epoch: 2.72 | loss: 0.000100 | grad_norm: 0.000319 | learning_rate: 0.000010 2025-04-12 16:13:18,188 - INFO - Epoch: 2.72 | loss: 0.000100 | grad_norm: 0.000618 | learning_rate: 0.000010 2025-04-12 16:13:24,074 - INFO - Epoch: 2.73 | loss: 0.000100 | grad_norm: 0.000297 | learning_rate: 0.000010 2025-04-12 16:13:29,898 - INFO - Epoch: 2.73 | loss: 0.000100 | grad_norm: 0.000268 | learning_rate: 0.000010 2025-04-12 16:13:35,745 - INFO - Epoch: 2.73 | loss: 0.000100 | grad_norm: 0.000423 
| learning_rate: 0.000010 2025-04-12 16:13:41,770 - INFO - Epoch: 2.73 | loss: 0.000100 | grad_norm: 0.000579 | learning_rate: 0.000010 2025-04-12 16:13:47,991 - INFO - Epoch: 2.74 | loss: 0.000100 | grad_norm: 0.000405 | learning_rate: 0.000010 2025-04-12 16:13:54,074 - INFO - Epoch: 2.74 | loss: 0.000100 | grad_norm: 0.000361 | learning_rate: 0.000010 2025-04-12 16:14:00,618 - INFO - Epoch: 2.74 | loss: 0.000100 | grad_norm: 0.000330 | learning_rate: 0.000010 2025-04-12 16:14:06,691 - INFO - Epoch: 2.75 | loss: 0.000100 | grad_norm: 0.000481 | learning_rate: 0.000010 2025-04-12 16:14:12,823 - INFO - Epoch: 2.75 | loss: 0.000100 | grad_norm: 0.000390 | learning_rate: 0.000010 2025-04-12 16:14:19,394 - INFO - Epoch: 2.75 | loss: 0.000100 | grad_norm: 0.000336 | learning_rate: 0.000010 2025-04-12 16:14:25,413 - INFO - Epoch: 2.75 | loss: 0.000100 | grad_norm: 0.000336 | learning_rate: 0.000010 2025-04-12 16:14:31,655 - INFO - Epoch: 2.76 | loss: 0.000100 | grad_norm: 0.000310 | learning_rate: 0.000010 2025-04-12 16:14:37,673 - INFO - Epoch: 2.76 | loss: 0.000100 | grad_norm: 0.000444 | learning_rate: 0.000010 2025-04-12 16:14:43,730 - INFO - Epoch: 2.76 | loss: 0.000100 | grad_norm: 0.000382 | learning_rate: 0.000010 2025-04-12 16:14:49,655 - INFO - Epoch: 2.76 | loss: 0.000100 | grad_norm: 0.000423 | learning_rate: 0.000010 2025-04-12 16:14:55,549 - INFO - Epoch: 2.77 | loss: 0.000100 | grad_norm: 0.000374 | learning_rate: 0.000010 2025-04-12 16:15:01,943 - INFO - Epoch: 2.77 | loss: 0.000100 | grad_norm: 0.000339 | learning_rate: 0.000010 2025-04-12 16:15:08,128 - INFO - Epoch: 2.77 | loss: 0.000100 | grad_norm: 0.001203 | learning_rate: 0.000010 2025-04-12 16:15:14,293 - INFO - Epoch: 2.78 | loss: 0.000100 | grad_norm: 0.000353 | learning_rate: 0.000010 2025-04-12 16:15:20,656 - INFO - Epoch: 2.78 | loss: 0.000100 | grad_norm: 0.000291 | learning_rate: 0.000010 2025-04-12 16:15:26,834 - INFO - Epoch: 2.78 | loss: 0.000100 | grad_norm: 0.000336 | learning_rate: 
0.000010 2025-04-12 16:15:33,060 - INFO - Epoch: 2.78 | loss: 0.000100 | grad_norm: 0.000512 | learning_rate: 0.000010 2025-04-12 16:15:38,892 - INFO - Epoch: 2.79 | loss: 0.000100 | grad_norm: 0.000534 | learning_rate: 0.000010 2025-04-12 16:15:44,927 - INFO - Epoch: 2.79 | loss: 0.000100 | grad_norm: 0.000459 | learning_rate: 0.000010 2025-04-12 16:15:50,603 - INFO - Epoch: 2.79 | loss: 0.000100 | grad_norm: 0.000379 | learning_rate: 0.000010 2025-04-12 16:15:56,473 - INFO - Epoch: 2.79 | loss: 0.000100 | grad_norm: 0.000352 | learning_rate: 0.000010 2025-04-12 16:16:02,688 - INFO - Epoch: 2.80 | loss: 0.000100 | grad_norm: 0.000383 | learning_rate: 0.000010 2025-04-12 16:16:08,564 - INFO - Epoch: 2.80 | loss: 0.000100 | grad_norm: 0.001311 | learning_rate: 0.000010 2025-04-12 16:16:14,453 - INFO - Epoch: 2.80 | loss: 0.000100 | grad_norm: 0.000330 | learning_rate: 0.000010 2025-04-12 16:16:20,626 - INFO - Epoch: 2.81 | loss: 0.000100 | grad_norm: 0.000294 | learning_rate: 0.000010 2025-04-12 16:16:26,832 - INFO - Epoch: 2.81 | loss: 0.000100 | grad_norm: 0.000363 | learning_rate: 0.000010 2025-04-12 16:16:33,082 - INFO - Epoch: 2.81 | loss: 0.000100 | grad_norm: 0.000346 | learning_rate: 0.000010 2025-04-12 16:16:38,792 - INFO - Epoch: 2.81 | loss: 0.000100 | grad_norm: 0.000319 | learning_rate: 0.000010 2025-04-12 16:16:45,088 - INFO - Epoch: 2.82 | loss: 0.000100 | grad_norm: 0.000320 | learning_rate: 0.000010 2025-04-12 16:16:50,735 - INFO - Epoch: 2.82 | loss: 0.000100 | grad_norm: 0.000456 | learning_rate: 0.000010 2025-04-12 16:16:56,635 - INFO - Epoch: 2.82 | loss: 0.000100 | grad_norm: 0.000228 | learning_rate: 0.000009 2025-04-12 16:17:02,785 - INFO - Epoch: 2.82 | loss: 0.000100 | grad_norm: 0.000486 | learning_rate: 0.000009 2025-04-12 16:17:08,727 - INFO - Epoch: 2.83 | loss: 0.000100 | grad_norm: 0.000485 | learning_rate: 0.000009 2025-04-12 16:17:14,730 - INFO - Epoch: 2.83 | loss: 0.000100 | grad_norm: 0.000287 | learning_rate: 0.000009 2025-04-12 
16:17:20,595 - INFO - Epoch: 2.83 | loss: 0.000100 | grad_norm: 0.000285 | learning_rate: 0.000009 2025-04-12 16:17:26,501 - INFO - Epoch: 2.84 | loss: 0.000100 | grad_norm: 0.000366 | learning_rate: 0.000009 2025-04-12 16:17:32,605 - INFO - Epoch: 2.84 | loss: 0.000100 | grad_norm: 0.000307 | learning_rate: 0.000009 2025-04-12 16:17:38,401 - INFO - Epoch: 2.84 | loss: 0.000100 | grad_norm: 0.000294 | learning_rate: 0.000009 2025-04-12 16:17:44,572 - INFO - Epoch: 2.84 | loss: 0.000100 | grad_norm: 0.000390 | learning_rate: 0.000009 2025-04-12 16:17:50,920 - INFO - Epoch: 2.85 | loss: 0.000100 | grad_norm: 0.000446 | learning_rate: 0.000009 2025-04-12 16:17:56,909 - INFO - Epoch: 2.85 | loss: 0.000100 | grad_norm: 0.000464 | learning_rate: 0.000009 2025-04-12 16:18:02,980 - INFO - Epoch: 2.85 | loss: 0.000100 | grad_norm: 0.000396 | learning_rate: 0.000009 2025-04-12 16:18:09,309 - INFO - Epoch: 2.85 | loss: 0.000100 | grad_norm: 0.000413 | learning_rate: 0.000009 2025-04-12 16:18:15,544 - INFO - Epoch: 2.86 | loss: 0.000100 | grad_norm: 0.000518 | learning_rate: 0.000009 2025-04-12 16:18:21,768 - INFO - Epoch: 2.86 | loss: 0.000100 | grad_norm: 0.000568 | learning_rate: 0.000009 2025-04-12 16:18:28,091 - INFO - Epoch: 2.86 | loss: 0.000100 | grad_norm: 0.000564 | learning_rate: 0.000009 2025-04-12 16:18:34,301 - INFO - Epoch: 2.87 | loss: 0.000100 | grad_norm: 0.000256 | learning_rate: 0.000009 2025-04-12 16:18:40,307 - INFO - Epoch: 2.87 | loss: 0.000100 | grad_norm: 0.001129 | learning_rate: 0.000009 2025-04-12 16:18:46,490 - INFO - Epoch: 2.87 | loss: 0.000100 | grad_norm: 0.000456 | learning_rate: 0.000009 2025-04-12 16:18:53,051 - INFO - Epoch: 2.87 | loss: 0.000100 | grad_norm: 0.000417 | learning_rate: 0.000009 2025-04-12 16:18:59,252 - INFO - Epoch: 2.88 | loss: 0.000100 | grad_norm: 0.000302 | learning_rate: 0.000009 2025-04-12 16:19:05,612 - INFO - Epoch: 2.88 | loss: 0.000100 | grad_norm: 0.000276 | learning_rate: 0.000009 2025-04-12 16:19:11,736 - INFO 
- Epoch: 2.88 | loss: 0.000100 | grad_norm: 0.000348 | learning_rate: 0.000009 2025-04-12 16:19:18,143 - INFO - Epoch: 2.89 | loss: 0.000100 | grad_norm: 0.000387 | learning_rate: 0.000009 2025-04-12 16:19:24,436 - INFO - Epoch: 2.89 | loss: 0.000100 | grad_norm: 0.000274 | learning_rate: 0.000009 2025-04-12 16:19:30,210 - INFO - Epoch: 2.89 | loss: 0.000100 | grad_norm: 0.000507 | learning_rate: 0.000009 2025-04-12 16:19:36,288 - INFO - Epoch: 2.89 | loss: 0.000100 | grad_norm: 0.000252 | learning_rate: 0.000009 2025-04-12 16:19:42,846 - INFO - Epoch: 2.90 | loss: 0.000100 | grad_norm: 0.000485 | learning_rate: 0.000009 2025-04-12 16:19:49,050 - INFO - Epoch: 2.90 | loss: 0.000100 | grad_norm: 0.000371 | learning_rate: 0.000009 2025-04-12 16:19:55,460 - INFO - Epoch: 2.90 | loss: 0.000100 | grad_norm: 0.000236 | learning_rate: 0.000009 2025-04-12 16:20:01,211 - INFO - Epoch: 2.90 | loss: 0.000100 | grad_norm: 0.000611 | learning_rate: 0.000009 2025-04-12 16:20:07,138 - INFO - Epoch: 2.91 | loss: 0.000100 | grad_norm: 0.000296 | learning_rate: 0.000009 2025-04-12 16:20:12,961 - INFO - Epoch: 2.91 | loss: 0.000100 | grad_norm: 0.000457 | learning_rate: 0.000009 2025-04-12 16:20:19,530 - INFO - Epoch: 2.91 | loss: 0.000100 | grad_norm: 0.000286 | learning_rate: 0.000009 2025-04-12 16:20:25,412 - INFO - Epoch: 2.92 | loss: 0.000100 | grad_norm: 0.000499 | learning_rate: 0.000009 2025-04-12 16:20:31,353 - INFO - Epoch: 2.92 | loss: 0.000100 | grad_norm: 0.000337 | learning_rate: 0.000009 2025-04-12 16:20:37,638 - INFO - Epoch: 2.92 | loss: 0.000100 | grad_norm: 0.000461 | learning_rate: 0.000009 2025-04-12 16:20:44,031 - INFO - Epoch: 2.92 | loss: 0.000100 | grad_norm: 0.000309 | learning_rate: 0.000009 2025-04-12 16:20:50,097 - INFO - Epoch: 2.93 | loss: 0.000100 | grad_norm: 0.000455 | learning_rate: 0.000009 2025-04-12 16:20:56,227 - INFO - Epoch: 2.93 | loss: 0.000100 | grad_norm: 0.000543 | learning_rate: 0.000009 2025-04-12 16:21:02,233 - INFO - Epoch: 2.93 | 
loss: 0.000100 | grad_norm: 0.000294 | learning_rate: 0.000009 2025-04-12 16:21:08,627 - INFO - Epoch: 2.93 | loss: 0.000100 | grad_norm: 0.000320 | learning_rate: 0.000009 2025-04-12 16:21:14,632 - INFO - Epoch: 2.94 | loss: 0.000100 | grad_norm: 0.000328 | learning_rate: 0.000009 2025-04-12 16:21:20,544 - INFO - Epoch: 2.94 | loss: 0.000100 | grad_norm: 0.000510 | learning_rate: 0.000009 2025-04-12 16:21:26,541 - INFO - Epoch: 2.94 | loss: 0.000100 | grad_norm: 0.000220 | learning_rate: 0.000009 2025-04-12 16:21:32,972 - INFO - Epoch: 2.95 | loss: 0.000100 | grad_norm: 0.000509 | learning_rate: 0.000009 2025-04-12 16:21:39,238 - INFO - Epoch: 2.95 | loss: 0.000100 | grad_norm: 0.000440 | learning_rate: 0.000009 2025-04-12 16:21:45,363 - INFO - Epoch: 2.95 | loss: 0.000100 | grad_norm: 0.000492 | learning_rate: 0.000009 2025-04-12 16:21:51,300 - INFO - Epoch: 2.95 | loss: 0.000100 | grad_norm: 0.000212 | learning_rate: 0.000009 2025-04-12 16:21:57,342 - INFO - Epoch: 2.96 | loss: 0.000100 | grad_norm: 0.000471 | learning_rate: 0.000009 2025-04-12 16:22:03,538 - INFO - Epoch: 2.96 | loss: 0.000100 | grad_norm: 0.000363 | learning_rate: 0.000009 2025-04-12 16:22:09,823 - INFO - Epoch: 2.96 | loss: 0.000100 | grad_norm: 0.000492 | learning_rate: 0.000009 2025-04-12 16:22:15,689 - INFO - Epoch: 2.96 | loss: 0.000100 | grad_norm: 0.000511 | learning_rate: 0.000009 2025-04-12 16:22:21,506 - INFO - Epoch: 2.97 | loss: 0.000100 | grad_norm: 0.000583 | learning_rate: 0.000008 2025-04-12 16:22:27,690 - INFO - Epoch: 2.97 | loss: 0.000100 | grad_norm: 0.000674 | learning_rate: 0.000008 2025-04-12 16:22:33,555 - INFO - Epoch: 2.97 | loss: 0.000100 | grad_norm: 0.000870 | learning_rate: 0.000008 2025-04-12 16:22:39,766 - INFO - Epoch: 2.98 | loss: 0.000100 | grad_norm: 0.000444 | learning_rate: 0.000008 2025-04-12 16:22:46,091 - INFO - Epoch: 2.98 | loss: 0.000100 | grad_norm: 0.000266 | learning_rate: 0.000008 2025-04-12 16:22:51,830 - INFO - Epoch: 2.98 | loss: 0.000100 | 
grad_norm: 0.000344 | learning_rate: 0.000008 2025-04-12 16:22:57,803 - INFO - Epoch: 2.98 | loss: 0.000100 | grad_norm: 0.000364 | learning_rate: 0.000008 2025-04-12 16:23:04,091 - INFO - Epoch: 2.99 | loss: 0.000100 | grad_norm: 0.000374 | learning_rate: 0.000008 2025-04-12 16:23:10,230 - INFO - Epoch: 2.99 | loss: 0.000100 | grad_norm: 0.000409 | learning_rate: 0.000008 2025-04-12 16:23:16,370 - INFO - Epoch: 2.99 | loss: 0.000100 | grad_norm: 0.000307 | learning_rate: 0.000008 2025-04-12 16:23:22,152 - INFO - Epoch: 2.99 | loss: 0.000100 | grad_norm: 0.000383 | learning_rate: 0.000008 2025-04-12 16:23:28,303 - INFO - Epoch: 3.00 | loss: 0.000100 | grad_norm: 0.000402 | learning_rate: 0.000008 2025-04-12 16:23:34,081 - INFO - Epoch: 3.00 | loss: 0.000100 | grad_norm: 0.000338 | learning_rate: 0.000008 2025-04-12 16:23:44,100 - INFO - Epoch: 3.00 | loss: 0.000100 | grad_norm: 0.000428 | learning_rate: 0.000008 2025-04-12 16:23:50,014 - INFO - Epoch: 3.01 | loss: 0.000100 | grad_norm: 0.000424 | learning_rate: 0.000008 2025-04-12 16:23:56,069 - INFO - Epoch: 3.01 | loss: 0.000100 | grad_norm: 0.000250 | learning_rate: 0.000008 2025-04-12 16:24:02,215 - INFO - Epoch: 3.01 | loss: 0.000100 | grad_norm: 0.000219 | learning_rate: 0.000008 2025-04-12 16:24:08,534 - INFO - Epoch: 3.01 | loss: 0.000100 | grad_norm: 0.000363 | learning_rate: 0.000008 2025-04-12 16:24:14,659 - INFO - Epoch: 3.02 | loss: 0.000100 | grad_norm: 0.000318 | learning_rate: 0.000008 2025-04-12 16:24:20,541 - INFO - Epoch: 3.02 | loss: 0.000100 | grad_norm: 0.000333 | learning_rate: 0.000008 2025-04-12 16:24:26,868 - INFO - Epoch: 3.02 | loss: 0.000100 | grad_norm: 0.000376 | learning_rate: 0.000008 2025-04-12 16:24:33,012 - INFO - Epoch: 3.02 | loss: 0.000100 | grad_norm: 0.000381 | learning_rate: 0.000008 2025-04-12 16:24:39,286 - INFO - Epoch: 3.03 | loss: 0.000100 | grad_norm: 0.000509 | learning_rate: 0.000008 2025-04-12 16:24:44,963 - INFO - Epoch: 3.03 | loss: 0.000100 | grad_norm: 0.000387 
| learning_rate: 0.000008 2025-04-12 16:24:51,191 - INFO - Epoch: 3.03 | loss: 0.000100 | grad_norm: 0.000444 | learning_rate: 0.000008 2025-04-12 16:24:57,333 - INFO - Epoch: 3.04 | loss: 0.000100 | grad_norm: 0.000298 | learning_rate: 0.000008 2025-04-12 16:25:03,533 - INFO - Epoch: 3.04 | loss: 0.000100 | grad_norm: 0.000500 | learning_rate: 0.000008 2025-04-12 16:25:09,870 - INFO - Epoch: 3.04 | loss: 0.000100 | grad_norm: 0.000266 | learning_rate: 0.000008 2025-04-12 16:25:15,781 - INFO - Epoch: 3.04 | loss: 0.000100 | grad_norm: 0.000255 | learning_rate: 0.000008 2025-04-12 16:25:21,930 - INFO - Epoch: 3.05 | loss: 0.000100 | grad_norm: 0.000361 | learning_rate: 0.000008 2025-04-12 16:25:27,871 - INFO - Epoch: 3.05 | loss: 0.000100 | grad_norm: 0.000403 | learning_rate: 0.000008 2025-04-12 16:25:34,031 - INFO - Epoch: 3.05 | loss: 0.000100 | grad_norm: 0.000254 | learning_rate: 0.000008 2025-04-12 16:25:40,301 - INFO - Epoch: 3.05 | loss: 0.000100 | grad_norm: 0.000270 | learning_rate: 0.000008 2025-04-12 16:25:46,495 - INFO - Epoch: 3.06 | loss: 0.000100 | grad_norm: 0.000346 | learning_rate: 0.000008 2025-04-12 16:25:52,506 - INFO - Epoch: 3.06 | loss: 0.000100 | grad_norm: 0.000480 | learning_rate: 0.000008 2025-04-12 16:25:58,897 - INFO - Epoch: 3.06 | loss: 0.000100 | grad_norm: 0.000357 | learning_rate: 0.000008 2025-04-12 16:26:04,701 - INFO - Epoch: 3.07 | loss: 0.000100 | grad_norm: 0.000386 | learning_rate: 0.000008 2025-04-12 16:26:10,901 - INFO - Epoch: 3.07 | loss: 0.000100 | grad_norm: 0.000457 | learning_rate: 0.000008 2025-04-12 16:26:16,860 - INFO - Epoch: 3.07 | loss: 0.000100 | grad_norm: 0.000408 | learning_rate: 0.000008 2025-04-12 16:26:23,274 - INFO - Epoch: 3.07 | loss: 0.000100 | grad_norm: 0.000480 | learning_rate: 0.000008 2025-04-12 16:26:29,277 - INFO - Epoch: 3.08 | loss: 0.000100 | grad_norm: 0.000424 | learning_rate: 0.000008 2025-04-12 16:26:35,704 - INFO - Epoch: 3.08 | loss: 0.000100 | grad_norm: 0.000594 | learning_rate: 
0.000008 2025-04-12 16:26:42,191 - INFO - Epoch: 3.08 | loss: 0.000100 | grad_norm: 0.000535 | learning_rate: 0.000008 2025-04-12 16:26:48,390 - INFO - Epoch: 3.08 | loss: 0.000100 | grad_norm: 0.000414 | learning_rate: 0.000008 2025-04-12 16:26:54,749 - INFO - Epoch: 3.09 | loss: 0.000100 | grad_norm: 0.000428 | learning_rate: 0.000008 2025-04-12 16:27:00,613 - INFO - Epoch: 3.09 | loss: 0.000100 | grad_norm: 0.000356 | learning_rate: 0.000008 2025-04-12 16:27:07,059 - INFO - Epoch: 3.09 | loss: 0.000100 | grad_norm: 0.000728 | learning_rate: 0.000008 2025-04-12 16:27:13,307 - INFO - Epoch: 3.10 | loss: 0.000100 | grad_norm: 0.000361 | learning_rate: 0.000008 2025-04-12 16:27:19,496 - INFO - Epoch: 3.10 | loss: 0.000100 | grad_norm: 0.000343 | learning_rate: 0.000008 2025-04-12 16:27:25,648 - INFO - Epoch: 3.10 | loss: 0.000100 | grad_norm: 0.000464 | learning_rate: 0.000008 2025-04-12 16:27:32,247 - INFO - Epoch: 3.10 | loss: 0.000100 | grad_norm: 0.000497 | learning_rate: 0.000008 2025-04-12 16:27:38,232 - INFO - Epoch: 3.11 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000008 2025-04-12 16:27:44,550 - INFO - Epoch: 3.11 | loss: 0.000100 | grad_norm: 0.000421 | learning_rate: 0.000008 2025-04-12 16:27:50,705 - INFO - Epoch: 3.11 | loss: 0.000100 | grad_norm: 0.000329 | learning_rate: 0.000007 2025-04-12 16:27:56,720 - INFO - Epoch: 3.12 | loss: 0.000100 | grad_norm: 0.000523 | learning_rate: 0.000007 2025-04-12 16:28:03,065 - INFO - Epoch: 3.12 | loss: 0.000100 | grad_norm: 0.000295 | learning_rate: 0.000007 2025-04-12 16:28:09,141 - INFO - Epoch: 3.12 | loss: 0.000100 | grad_norm: 0.000344 | learning_rate: 0.000007 2025-04-12 16:28:15,095 - INFO - Epoch: 3.12 | loss: 0.000100 | grad_norm: 0.000366 | learning_rate: 0.000007 2025-04-12 16:28:21,152 - INFO - Epoch: 3.13 | loss: 0.000100 | grad_norm: 0.000491 | learning_rate: 0.000007 2025-04-12 16:28:27,116 - INFO - Epoch: 3.13 | loss: 0.000100 | grad_norm: 0.000448 | learning_rate: 0.000007 2025-04-12 
16:28:33,326 - INFO - Epoch: 3.13 | loss: 0.000100 | grad_norm: 0.000262 | learning_rate: 0.000007 2025-04-12 16:28:39,100 - INFO - Epoch: 3.13 | loss: 0.000100 | grad_norm: 0.000334 | learning_rate: 0.000007 2025-04-12 16:28:45,120 - INFO - Epoch: 3.14 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000007 2025-04-12 16:28:51,174 - INFO - Epoch: 3.14 | loss: 0.000100 | grad_norm: 0.000712 | learning_rate: 0.000007 2025-04-12 16:28:57,564 - INFO - Epoch: 3.14 | loss: 0.000100 | grad_norm: 0.000243 | learning_rate: 0.000007 2025-04-12 16:29:03,995 - INFO - Epoch: 3.15 | loss: 0.000100 | grad_norm: 0.000358 | learning_rate: 0.000007 2025-04-12 16:29:10,171 - INFO - Epoch: 3.15 | loss: 0.000100 | grad_norm: 0.000337 | learning_rate: 0.000007 2025-04-12 16:29:16,510 - INFO - Epoch: 3.15 | loss: 0.000100 | grad_norm: 0.000319 | learning_rate: 0.000007 2025-04-12 16:29:22,574 - INFO - Epoch: 3.15 | loss: 0.000100 | grad_norm: 0.000314 | learning_rate: 0.000007 2025-04-12 16:29:29,063 - INFO - Epoch: 3.16 | loss: 0.000100 | grad_norm: 0.000357 | learning_rate: 0.000007 2025-04-12 16:29:35,224 - INFO - Epoch: 3.16 | loss: 0.000100 | grad_norm: 0.000384 | learning_rate: 0.000007 2025-04-12 16:29:41,284 - INFO - Epoch: 3.16 | loss: 0.000100 | grad_norm: 0.000390 | learning_rate: 0.000007 2025-04-12 16:29:47,565 - INFO - Epoch: 3.16 | loss: 0.000100 | grad_norm: 0.000271 | learning_rate: 0.000007 2025-04-12 16:29:54,107 - INFO - Epoch: 3.17 | loss: 0.000100 | grad_norm: 0.000360 | learning_rate: 0.000007 2025-04-12 16:30:00,419 - INFO - Epoch: 3.17 | loss: 0.000100 | grad_norm: 0.000551 | learning_rate: 0.000007 2025-04-12 16:30:06,662 - INFO - Epoch: 3.17 | loss: 0.000100 | grad_norm: 0.000382 | learning_rate: 0.000007 2025-04-12 16:30:13,050 - INFO - Epoch: 3.18 | loss: 0.000100 | grad_norm: 0.000823 | learning_rate: 0.000007 2025-04-12 16:30:19,101 - INFO - Epoch: 3.18 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 0.000007 2025-04-12 16:30:25,356 - INFO 
- Epoch: 3.18 | loss: 0.000100 | grad_norm: 0.000405 | learning_rate: 0.000007 2025-04-12 16:30:31,452 - INFO - Epoch: 3.18 | loss: 0.000100 | grad_norm: 0.000332 | learning_rate: 0.000007 2025-04-12 16:30:37,633 - INFO - Epoch: 3.19 | loss: 0.000100 | grad_norm: 0.000269 | learning_rate: 0.000007 2025-04-12 16:30:44,170 - INFO - Epoch: 3.19 | loss: 0.000100 | grad_norm: 0.000704 | learning_rate: 0.000007 2025-04-12 16:30:50,425 - INFO - Epoch: 3.19 | loss: 0.000100 | grad_norm: 0.000537 | learning_rate: 0.000007 2025-04-12 16:30:56,680 - INFO - Epoch: 3.19 | loss: 0.000100 | grad_norm: 0.000301 | learning_rate: 0.000007 2025-04-12 16:31:03,027 - INFO - Epoch: 3.20 | loss: 0.000100 | grad_norm: 0.000297 | learning_rate: 0.000007 2025-04-12 16:31:09,293 - INFO - Epoch: 3.20 | loss: 0.000100 | grad_norm: 0.000332 | learning_rate: 0.000007 2025-04-12 16:31:15,545 - INFO - Epoch: 3.20 | loss: 0.000100 | grad_norm: 0.000302 | learning_rate: 0.000007 2025-04-12 16:31:21,705 - INFO - Epoch: 3.21 | loss: 0.000100 | grad_norm: 0.000436 | learning_rate: 0.000007 2025-04-12 16:31:27,890 - INFO - Epoch: 3.21 | loss: 0.000100 | grad_norm: 0.000315 | learning_rate: 0.000007 2025-04-12 16:31:33,742 - INFO - Epoch: 3.21 | loss: 0.000100 | grad_norm: 0.000417 | learning_rate: 0.000007 2025-04-12 16:31:39,797 - INFO - Epoch: 3.21 | loss: 0.000100 | grad_norm: 0.000509 | learning_rate: 0.000007 2025-04-12 16:31:45,806 - INFO - Epoch: 3.22 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000007 2025-04-12 16:31:52,043 - INFO - Epoch: 3.22 | loss: 0.000100 | grad_norm: 0.000653 | learning_rate: 0.000007 2025-04-12 16:31:58,020 - INFO - Epoch: 3.22 | loss: 0.000100 | grad_norm: 0.000486 | learning_rate: 0.000007 2025-04-12 16:32:04,019 - INFO - Epoch: 3.22 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000007 2025-04-12 16:32:09,931 - INFO - Epoch: 3.23 | loss: 0.000100 | grad_norm: 0.000301 | learning_rate: 0.000007 2025-04-12 16:32:15,958 - INFO - Epoch: 3.23 | 
loss: 0.000100 | grad_norm: 0.000424 | learning_rate: 0.000007 2025-04-12 16:32:21,705 - INFO - Epoch: 3.23 | loss: 0.000100 | grad_norm: 0.000484 | learning_rate: 0.000007 2025-04-12 16:32:27,853 - INFO - Epoch: 3.24 | loss: 0.000100 | grad_norm: 0.000290 | learning_rate: 0.000007 2025-04-12 16:32:33,935 - INFO - Epoch: 3.24 | loss: 0.000100 | grad_norm: 0.000285 | learning_rate: 0.000007 2025-04-12 16:32:39,992 - INFO - Epoch: 3.24 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000007 2025-04-12 16:32:45,964 - INFO - Epoch: 3.24 | loss: 0.000100 | grad_norm: 0.000390 | learning_rate: 0.000007 2025-04-12 16:32:52,308 - INFO - Epoch: 3.25 | loss: 0.000100 | grad_norm: 0.000297 | learning_rate: 0.000007 2025-04-12 16:32:58,534 - INFO - Epoch: 3.25 | loss: 0.000100 | grad_norm: 0.000390 | learning_rate: 0.000007 2025-04-12 16:33:04,656 - INFO - Epoch: 3.25 | loss: 0.000100 | grad_norm: 0.000365 | learning_rate: 0.000007 2025-04-12 16:33:10,903 - INFO - Epoch: 3.25 | loss: 0.000100 | grad_norm: 0.000575 | learning_rate: 0.000007 2025-04-12 16:33:16,661 - INFO - Epoch: 3.26 | loss: 0.000100 | grad_norm: 0.000497 | learning_rate: 0.000007 2025-04-12 16:33:22,971 - INFO - Epoch: 3.26 | loss: 0.000100 | grad_norm: 0.000313 | learning_rate: 0.000007 2025-04-12 16:33:29,146 - INFO - Epoch: 3.26 | loss: 0.000100 | grad_norm: 0.000451 | learning_rate: 0.000006 2025-04-12 16:33:35,251 - INFO - Epoch: 3.27 | loss: 0.000100 | grad_norm: 0.000501 | learning_rate: 0.000006 2025-04-12 16:33:41,267 - INFO - Epoch: 3.27 | loss: 0.000100 | grad_norm: 0.000369 | learning_rate: 0.000006 2025-04-12 16:33:47,383 - INFO - Epoch: 3.27 | loss: 0.000100 | grad_norm: 0.000355 | learning_rate: 0.000006 2025-04-12 16:33:53,518 - INFO - Epoch: 3.27 | loss: 0.000100 | grad_norm: 0.000398 | learning_rate: 0.000006 2025-04-12 16:33:59,566 - INFO - Epoch: 3.28 | loss: 0.000100 | grad_norm: 0.000341 | learning_rate: 0.000006 2025-04-12 16:34:05,579 - INFO - Epoch: 3.28 | loss: 0.000100 | 
grad_norm: 0.000330 | learning_rate: 0.000006 2025-04-12 16:34:11,658 - INFO - Epoch: 3.28 | loss: 0.000100 | grad_norm: 0.000333 | learning_rate: 0.000006 2025-04-12 16:34:17,855 - INFO - Epoch: 3.28 | loss: 0.000100 | grad_norm: 0.000336 | learning_rate: 0.000006 2025-04-12 16:34:24,289 - INFO - Epoch: 3.29 | loss: 0.000100 | grad_norm: 0.000357 | learning_rate: 0.000006 2025-04-12 16:34:30,910 - INFO - Epoch: 3.29 | loss: 0.000100 | grad_norm: 0.000382 | learning_rate: 0.000006 2025-04-12 16:34:36,862 - INFO - Epoch: 3.29 | loss: 0.000100 | grad_norm: 0.000841 | learning_rate: 0.000006 2025-04-12 16:34:43,027 - INFO - Epoch: 3.30 | loss: 0.000100 | grad_norm: 0.000782 | learning_rate: 0.000006 2025-04-12 16:34:49,195 - INFO - Epoch: 3.30 | loss: 0.000100 | grad_norm: 0.000669 | learning_rate: 0.000006 2025-04-12 16:34:55,303 - INFO - Epoch: 3.30 | loss: 0.000100 | grad_norm: 0.000533 | learning_rate: 0.000006 2025-04-12 16:35:01,621 - INFO - Epoch: 3.30 | loss: 0.000100 | grad_norm: 0.000349 | learning_rate: 0.000006 2025-04-12 16:35:07,619 - INFO - Epoch: 3.31 | loss: 0.000100 | grad_norm: 0.000447 | learning_rate: 0.000006 2025-04-12 16:35:14,002 - INFO - Epoch: 3.31 | loss: 0.000100 | grad_norm: 0.000307 | learning_rate: 0.000006 2025-04-12 16:35:20,435 - INFO - Epoch: 3.31 | loss: 0.000100 | grad_norm: 0.000415 | learning_rate: 0.000006 2025-04-12 16:35:26,704 - INFO - Epoch: 3.32 | loss: 0.000100 | grad_norm: 0.000589 | learning_rate: 0.000006 2025-04-12 16:35:33,090 - INFO - Epoch: 3.32 | loss: 0.000100 | grad_norm: 0.000374 | learning_rate: 0.000006 2025-04-12 16:35:38,907 - INFO - Epoch: 3.32 | loss: 0.000100 | grad_norm: 0.000340 | learning_rate: 0.000006 2025-04-12 16:35:45,086 - INFO - Epoch: 3.32 | loss: 0.000100 | grad_norm: 0.000331 | learning_rate: 0.000006 2025-04-12 16:35:50,800 - INFO - Epoch: 3.33 | loss: 0.000100 | grad_norm: 0.000519 | learning_rate: 0.000006 2025-04-12 16:35:56,943 - INFO - Epoch: 3.33 | loss: 0.000100 | grad_norm: 0.000270 
| learning_rate: 0.000006 2025-04-12 16:36:03,342 - INFO - Epoch: 3.33 | loss: 0.000100 | grad_norm: 0.000491 | learning_rate: 0.000006 2025-04-12 16:36:09,609 - INFO - Epoch: 3.33 | loss: 0.000100 | grad_norm: 0.000309 | learning_rate: 0.000006 2025-04-12 16:36:15,736 - INFO - Epoch: 3.34 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000006 2025-04-12 16:36:21,840 - INFO - Epoch: 3.34 | loss: 0.000100 | grad_norm: 0.000413 | learning_rate: 0.000006 2025-04-12 16:36:28,168 - INFO - Epoch: 3.34 | loss: 0.000100 | grad_norm: 0.000259 | learning_rate: 0.000006 2025-04-12 16:36:34,606 - INFO - Epoch: 3.35 | loss: 0.000100 | grad_norm: 0.000693 | learning_rate: 0.000006 2025-04-12 16:36:40,481 - INFO - Epoch: 3.35 | loss: 0.000100 | grad_norm: 0.000514 | learning_rate: 0.000006 2025-04-12 16:36:46,907 - INFO - Epoch: 3.35 | loss: 0.000100 | grad_norm: 0.000395 | learning_rate: 0.000006 2025-04-12 16:36:53,112 - INFO - Epoch: 3.35 | loss: 0.000100 | grad_norm: 0.000457 | learning_rate: 0.000006 2025-04-12 16:36:59,139 - INFO - Epoch: 3.36 | loss: 0.000100 | grad_norm: 0.000353 | learning_rate: 0.000006 2025-04-12 16:37:05,074 - INFO - Epoch: 3.36 | loss: 0.000100 | grad_norm: 0.000425 | learning_rate: 0.000006 2025-04-12 16:37:11,111 - INFO - Epoch: 3.36 | loss: 0.000100 | grad_norm: 0.000321 | learning_rate: 0.000006 2025-04-12 16:37:17,300 - INFO - Epoch: 3.36 | loss: 0.000100 | grad_norm: 0.000299 | learning_rate: 0.000006 2025-04-12 16:37:23,441 - INFO - Epoch: 3.37 | loss: 0.000100 | grad_norm: 0.000311 | learning_rate: 0.000006 2025-04-12 16:37:29,609 - INFO - Epoch: 3.37 | loss: 0.000100 | grad_norm: 0.000609 | learning_rate: 0.000006 2025-04-12 16:37:35,762 - INFO - Epoch: 3.37 | loss: 0.000100 | grad_norm: 0.000276 | learning_rate: 0.000006 2025-04-12 16:37:41,465 - INFO - Epoch: 3.38 | loss: 0.000100 | grad_norm: 0.000889 | learning_rate: 0.000006 2025-04-12 16:37:47,750 - INFO - Epoch: 3.38 | loss: 0.000100 | grad_norm: 0.000291 | learning_rate: 
0.000006 2025-04-12 16:37:53,524 - INFO - Epoch: 3.38 | loss: 0.000100 | grad_norm: 0.000632 | learning_rate: 0.000006 2025-04-12 16:37:59,527 - INFO - Epoch: 3.38 | loss: 0.000100 | grad_norm: 0.001224 | learning_rate: 0.000006 2025-04-12 16:38:05,923 - INFO - Epoch: 3.39 | loss: 0.000100 | grad_norm: 0.000304 | learning_rate: 0.000006 2025-04-12 16:38:12,026 - INFO - Epoch: 3.39 | loss: 0.000100 | grad_norm: 0.000275 | learning_rate: 0.000006 2025-04-12 16:38:18,013 - INFO - Epoch: 3.39 | loss: 0.000100 | grad_norm: 0.000312 | learning_rate: 0.000006 2025-04-12 16:38:24,195 - INFO - Epoch: 3.39 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000006 2025-04-12 16:38:30,519 - INFO - Epoch: 3.40 | loss: 0.000100 | grad_norm: 0.000479 | learning_rate: 0.000006 2025-04-12 16:38:36,635 - INFO - Epoch: 3.40 | loss: 0.000100 | grad_norm: 0.000364 | learning_rate: 0.000006 2025-04-12 16:38:42,660 - INFO - Epoch: 3.40 | loss: 0.000100 | grad_norm: 0.000710 | learning_rate: 0.000006 2025-04-12 16:38:48,910 - INFO - Epoch: 3.41 | loss: 0.000100 | grad_norm: 0.000302 | learning_rate: 0.000006 2025-04-12 16:38:54,938 - INFO - Epoch: 3.41 | loss: 0.000100 | grad_norm: 0.000315 | learning_rate: 0.000006 2025-04-12 16:39:01,215 - INFO - Epoch: 3.41 | loss: 0.000100 | grad_norm: 0.000368 | learning_rate: 0.000006 2025-04-12 16:39:07,186 - INFO - Epoch: 3.41 | loss: 0.000100 | grad_norm: 0.000618 | learning_rate: 0.000006 2025-04-12 16:39:13,301 - INFO - Epoch: 3.42 | loss: 0.000100 | grad_norm: 0.000492 | learning_rate: 0.000006 2025-04-12 16:39:19,524 - INFO - Epoch: 3.42 | loss: 0.000100 | grad_norm: 0.000426 | learning_rate: 0.000005 2025-04-12 16:39:25,579 - INFO - Epoch: 3.42 | loss: 0.000100 | grad_norm: 0.000333 | learning_rate: 0.000005 2025-04-12 16:39:31,871 - INFO - Epoch: 3.42 | loss: 0.000100 | grad_norm: 0.000337 | learning_rate: 0.000005 2025-04-12 16:39:37,861 - INFO - Epoch: 3.43 | loss: 0.000100 | grad_norm: 0.000495 | learning_rate: 0.000005 2025-04-12 
16:39:43,991 - INFO - Epoch: 3.43 | loss: 0.000100 | grad_norm: 0.000484 | learning_rate: 0.000005 2025-04-12 16:39:50,268 - INFO - Epoch: 3.43 | loss: 0.000100 | grad_norm: 0.000390 | learning_rate: 0.000005 2025-04-12 16:39:56,170 - INFO - Epoch: 3.44 | loss: 0.000100 | grad_norm: 0.000236 | learning_rate: 0.000005 2025-04-12 16:40:02,182 - INFO - Epoch: 3.44 | loss: 0.000100 | grad_norm: 0.000463 | learning_rate: 0.000005 2025-04-12 16:40:08,238 - INFO - Epoch: 3.44 | loss: 0.000100 | grad_norm: 0.000312 | learning_rate: 0.000005 2025-04-12 16:40:14,447 - INFO - Epoch: 3.44 | loss: 0.000100 | grad_norm: 0.000362 | learning_rate: 0.000005 2025-04-12 16:40:20,744 - INFO - Epoch: 3.45 | loss: 0.000100 | grad_norm: 0.000276 | learning_rate: 0.000005 2025-04-12 16:40:26,523 - INFO - Epoch: 3.45 | loss: 0.000100 | grad_norm: 0.000381 | learning_rate: 0.000005 2025-04-12 16:40:32,426 - INFO - Epoch: 3.45 | loss: 0.000100 | grad_norm: 0.000509 | learning_rate: 0.000005 2025-04-12 16:40:38,201 - INFO - Epoch: 3.45 | loss: 0.000100 | grad_norm: 0.001008 | learning_rate: 0.000005 2025-04-12 16:40:44,094 - INFO - Epoch: 3.46 | loss: 0.000100 | grad_norm: 0.000490 | learning_rate: 0.000005 2025-04-12 16:40:50,239 - INFO - Epoch: 3.46 | loss: 0.000100 | grad_norm: 0.000345 | learning_rate: 0.000005 2025-04-12 16:40:56,377 - INFO - Epoch: 3.46 | loss: 0.000100 | grad_norm: 0.000245 | learning_rate: 0.000005 2025-04-12 16:41:02,650 - INFO - Epoch: 3.47 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000005 2025-04-12 16:41:08,978 - INFO - Epoch: 3.47 | loss: 0.000100 | grad_norm: 0.000487 | learning_rate: 0.000005 2025-04-12 16:41:14,778 - INFO - Epoch: 3.47 | loss: 0.000100 | grad_norm: 0.000377 | learning_rate: 0.000005 2025-04-12 16:41:20,769 - INFO - Epoch: 3.47 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000005 2025-04-12 16:41:27,043 - INFO - Epoch: 3.48 | loss: 0.000100 | grad_norm: 0.000554 | learning_rate: 0.000005 2025-04-12 16:41:33,231 - INFO 
- Epoch: 3.48 | loss: 0.000100 | grad_norm: 0.000365 | learning_rate: 0.000005 2025-04-12 16:41:39,099 - INFO - Epoch: 3.48 | loss: 0.000100 | grad_norm: 0.000485 | learning_rate: 0.000005 2025-04-12 16:41:44,972 - INFO - Epoch: 3.48 | loss: 0.000100 | grad_norm: 0.000340 | learning_rate: 0.000005 2025-04-12 16:41:51,216 - INFO - Epoch: 3.49 | loss: 0.000100 | grad_norm: 0.000337 | learning_rate: 0.000005 2025-04-12 16:41:56,935 - INFO - Epoch: 3.49 | loss: 0.000100 | grad_norm: 0.000363 | learning_rate: 0.000005 2025-04-12 16:42:03,294 - INFO - Epoch: 3.49 | loss: 0.000100 | grad_norm: 0.000304 | learning_rate: 0.000005 2025-04-12 16:42:09,518 - INFO - Epoch: 3.50 | loss: 0.000100 | grad_norm: 0.000319 | learning_rate: 0.000005 2025-04-12 16:42:15,583 - INFO - Epoch: 3.50 | loss: 0.000100 | grad_norm: 0.000542 | learning_rate: 0.000005 2025-04-12 16:42:21,690 - INFO - Epoch: 3.50 | loss: 0.000100 | grad_norm: 0.000365 | learning_rate: 0.000005 2025-04-12 16:42:27,723 - INFO - Epoch: 3.50 | loss: 0.000100 | grad_norm: 0.000335 | learning_rate: 0.000005 2025-04-12 16:42:33,661 - INFO - Epoch: 3.51 | loss: 0.000100 | grad_norm: 0.000266 | learning_rate: 0.000005 2025-04-12 16:42:39,791 - INFO - Epoch: 3.51 | loss: 0.000100 | grad_norm: 0.000300 | learning_rate: 0.000005 2025-04-12 16:42:45,896 - INFO - Epoch: 3.51 | loss: 0.000100 | grad_norm: 0.000488 | learning_rate: 0.000005 2025-04-12 16:42:51,991 - INFO - Epoch: 3.52 | loss: 0.000100 | grad_norm: 0.000282 | learning_rate: 0.000005 2025-04-12 16:42:57,842 - INFO - Epoch: 3.52 | loss: 0.000100 | grad_norm: 0.000434 | learning_rate: 0.000005 2025-04-12 16:43:03,611 - INFO - Epoch: 3.52 | loss: 0.000100 | grad_norm: 0.000308 | learning_rate: 0.000005 2025-04-12 16:43:09,801 - INFO - Epoch: 3.52 | loss: 0.000100 | grad_norm: 0.000250 | learning_rate: 0.000005 2025-04-12 16:43:15,879 - INFO - Epoch: 3.53 | loss: 0.000100 | grad_norm: 0.000307 | learning_rate: 0.000005 2025-04-12 16:43:21,858 - INFO - Epoch: 3.53 | 
loss: 0.000100 | grad_norm: 0.000333 | learning_rate: 0.000005 2025-04-12 16:43:28,174 - INFO - Epoch: 3.53 | loss: 0.000100 | grad_norm: 0.000418 | learning_rate: 0.000005 2025-04-12 16:43:34,357 - INFO - Epoch: 3.53 | loss: 0.000100 | grad_norm: 0.000445 | learning_rate: 0.000005 2025-04-12 16:43:40,523 - INFO - Epoch: 3.54 | loss: 0.000100 | grad_norm: 0.000412 | learning_rate: 0.000005 2025-04-12 16:43:47,045 - INFO - Epoch: 3.54 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000005 2025-04-12 16:43:53,049 - INFO - Epoch: 3.54 | loss: 0.000100 | grad_norm: 0.000313 | learning_rate: 0.000005 2025-04-12 16:43:59,245 - INFO - Epoch: 3.55 | loss: 0.000100 | grad_norm: 0.000349 | learning_rate: 0.000005 2025-04-12 16:44:05,337 - INFO - Epoch: 3.55 | loss: 0.000100 | grad_norm: 0.000343 | learning_rate: 0.000005 2025-04-12 16:44:11,593 - INFO - Epoch: 3.55 | loss: 0.000100 | grad_norm: 0.000196 | learning_rate: 0.000005 2025-04-12 16:44:17,503 - INFO - Epoch: 3.55 | loss: 0.000100 | grad_norm: 0.000372 | learning_rate: 0.000005 2025-04-12 16:44:24,069 - INFO - Epoch: 3.56 | loss: 0.000100 | grad_norm: 0.000265 | learning_rate: 0.000005 2025-04-12 16:44:30,278 - INFO - Epoch: 3.56 | loss: 0.000100 | grad_norm: 0.000274 | learning_rate: 0.000005 2025-04-12 16:44:36,343 - INFO - Epoch: 3.56 | loss: 0.000100 | grad_norm: 0.000407 | learning_rate: 0.000005 2025-04-12 16:44:42,551 - INFO - Epoch: 3.56 | loss: 0.000100 | grad_norm: 0.000594 | learning_rate: 0.000005 2025-04-12 16:44:48,842 - INFO - Epoch: 3.57 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000005 2025-04-12 16:44:54,978 - INFO - Epoch: 3.57 | loss: 0.000100 | grad_norm: 0.000313 | learning_rate: 0.000005 2025-04-12 16:45:01,126 - INFO - Epoch: 3.57 | loss: 0.000100 | grad_norm: 0.000388 | learning_rate: 0.000005 2025-04-12 16:45:07,370 - INFO - Epoch: 3.58 | loss: 0.000100 | grad_norm: 0.000318 | learning_rate: 0.000005 2025-04-12 16:45:13,431 - INFO - Epoch: 3.58 | loss: 0.000100 | 
grad_norm: 0.000337 | learning_rate: 0.000005 2025-04-12 16:45:19,388 - INFO - Epoch: 3.58 | loss: 0.000100 | grad_norm: 0.000270 | learning_rate: 0.000005 2025-04-12 16:45:25,044 - INFO - Epoch: 3.58 | loss: 0.000100 | grad_norm: 0.000550 | learning_rate: 0.000004 2025-04-12 16:45:31,216 - INFO - Epoch: 3.59 | loss: 0.000100 | grad_norm: 0.000421 | learning_rate: 0.000004 2025-04-12 16:45:37,416 - INFO - Epoch: 3.59 | loss: 0.000100 | grad_norm: 0.000414 | learning_rate: 0.000004 2025-04-12 16:45:43,173 - INFO - Epoch: 3.59 | loss: 0.000100 | grad_norm: 0.000301 | learning_rate: 0.000004 2025-04-12 16:45:49,379 - INFO - Epoch: 3.59 | loss: 0.000100 | grad_norm: 0.000719 | learning_rate: 0.000004 2025-04-12 16:45:55,152 - INFO - Epoch: 3.60 | loss: 0.000100 | grad_norm: 0.000676 | learning_rate: 0.000004 2025-04-12 16:46:01,446 - INFO - Epoch: 3.60 | loss: 0.000100 | grad_norm: 0.000382 | learning_rate: 0.000004 2025-04-12 16:46:07,591 - INFO - Epoch: 3.60 | loss: 0.000100 | grad_norm: 0.000398 | learning_rate: 0.000004 2025-04-12 16:46:13,892 - INFO - Epoch: 3.61 | loss: 0.000100 | grad_norm: 0.000432 | learning_rate: 0.000004 2025-04-12 16:46:19,792 - INFO - Epoch: 3.61 | loss: 0.000100 | grad_norm: 0.000368 | learning_rate: 0.000004 2025-04-12 16:46:25,635 - INFO - Epoch: 3.61 | loss: 0.000100 | grad_norm: 0.000263 | learning_rate: 0.000004 2025-04-12 16:46:31,697 - INFO - Epoch: 3.61 | loss: 0.000100 | grad_norm: 0.000439 | learning_rate: 0.000004 2025-04-12 16:46:38,182 - INFO - Epoch: 3.62 | loss: 0.000100 | grad_norm: 0.000478 | learning_rate: 0.000004 2025-04-12 16:46:44,470 - INFO - Epoch: 3.62 | loss: 0.000100 | grad_norm: 0.000282 | learning_rate: 0.000004 2025-04-12 16:46:50,418 - INFO - Epoch: 3.62 | loss: 0.000100 | grad_norm: 0.000309 | learning_rate: 0.000004 2025-04-12 16:46:56,409 - INFO - Epoch: 3.62 | loss: 0.000100 | grad_norm: 0.000250 | learning_rate: 0.000004 2025-04-12 16:47:02,873 - INFO - Epoch: 3.63 | loss: 0.000100 | grad_norm: 0.000345 
| learning_rate: 0.000004 2025-04-12 16:47:09,023 - INFO - Epoch: 3.63 | loss: 0.000100 | grad_norm: 0.000283 | learning_rate: 0.000004 2025-04-12 16:47:15,064 - INFO - Epoch: 3.63 | loss: 0.000100 | grad_norm: 0.000389 | learning_rate: 0.000004 2025-04-12 16:47:21,293 - INFO - Epoch: 3.64 | loss: 0.000100 | grad_norm: 0.000524 | learning_rate: 0.000004 2025-04-12 16:47:27,281 - INFO - Epoch: 3.64 | loss: 0.000100 | grad_norm: 0.000351 | learning_rate: 0.000004 2025-04-12 16:47:33,164 - INFO - Epoch: 3.64 | loss: 0.000100 | grad_norm: 0.000384 | learning_rate: 0.000004 2025-04-12 16:47:39,459 - INFO - Epoch: 3.64 | loss: 0.000100 | grad_norm: 0.000255 | learning_rate: 0.000004 2025-04-12 16:47:45,854 - INFO - Epoch: 3.65 | loss: 0.000100 | grad_norm: 0.000384 | learning_rate: 0.000004 2025-04-12 16:47:52,171 - INFO - Epoch: 3.65 | loss: 0.000100 | grad_norm: 0.000323 | learning_rate: 0.000004 2025-04-12 16:47:58,128 - INFO - Epoch: 3.65 | loss: 0.000100 | grad_norm: 0.000426 | learning_rate: 0.000004 2025-04-12 16:48:03,884 - INFO - Epoch: 3.65 | loss: 0.000100 | grad_norm: 0.000597 | learning_rate: 0.000004 2025-04-12 16:48:10,494 - INFO - Epoch: 3.66 | loss: 0.000100 | grad_norm: 0.000493 | learning_rate: 0.000004 2025-04-12 16:48:16,510 - INFO - Epoch: 3.66 | loss: 0.000100 | grad_norm: 0.000502 | learning_rate: 0.000004 2025-04-12 16:48:22,505 - INFO - Epoch: 3.66 | loss: 0.000100 | grad_norm: 0.000358 | learning_rate: 0.000004 2025-04-12 16:48:28,885 - INFO - Epoch: 3.67 | loss: 0.000100 | grad_norm: 0.000433 | learning_rate: 0.000004 2025-04-12 16:48:34,929 - INFO - Epoch: 3.67 | loss: 0.000100 | grad_norm: 0.000419 | learning_rate: 0.000004 2025-04-12 16:48:41,167 - INFO - Epoch: 3.67 | loss: 0.000100 | grad_norm: 0.000254 | learning_rate: 0.000004 2025-04-12 16:48:46,868 - INFO - Epoch: 3.67 | loss: 0.000100 | grad_norm: 0.000431 | learning_rate: 0.000004 2025-04-12 16:48:53,081 - INFO - Epoch: 3.68 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 
0.000004 2025-04-12 16:48:59,086 - INFO - Epoch: 3.68 | loss: 0.000100 | grad_norm: 0.000921 | learning_rate: 0.000004 2025-04-12 16:49:05,365 - INFO - Epoch: 3.68 | loss: 0.000100 | grad_norm: 0.000331 | learning_rate: 0.000004 2025-04-12 16:49:11,061 - INFO - Epoch: 3.69 | loss: 0.000100 | grad_norm: 0.000386 | learning_rate: 0.000004 2025-04-12 16:49:16,923 - INFO - Epoch: 3.69 | loss: 0.000100 | grad_norm: 0.000443 | learning_rate: 0.000004 2025-04-12 16:49:23,166 - INFO - Epoch: 3.69 | loss: 0.000100 | grad_norm: 0.000548 | learning_rate: 0.000004 2025-04-12 16:49:28,908 - INFO - Epoch: 3.69 | loss: 0.000100 | grad_norm: 0.000445 | learning_rate: 0.000004 2025-04-12 16:49:35,173 - INFO - Epoch: 3.70 | loss: 0.000100 | grad_norm: 0.000542 | learning_rate: 0.000004 2025-04-12 16:49:41,747 - INFO - Epoch: 3.70 | loss: 0.000100 | grad_norm: 0.000248 | learning_rate: 0.000004 2025-04-12 16:49:48,035 - INFO - Epoch: 3.70 | loss: 0.000100 | grad_norm: 0.000287 | learning_rate: 0.000004 2025-04-12 16:49:54,537 - INFO - Epoch: 3.70 | loss: 0.000100 | grad_norm: 0.000435 | learning_rate: 0.000004 2025-04-12 16:50:00,765 - INFO - Epoch: 3.71 | loss: 0.000100 | grad_norm: 0.000356 | learning_rate: 0.000004 2025-04-12 16:50:07,067 - INFO - Epoch: 3.71 | loss: 0.000100 | grad_norm: 0.000274 | learning_rate: 0.000004 2025-04-12 16:50:13,225 - INFO - Epoch: 3.71 | loss: 0.000100 | grad_norm: 0.000272 | learning_rate: 0.000004 2025-04-12 16:50:19,309 - INFO - Epoch: 3.72 | loss: 0.000100 | grad_norm: 0.000294 | learning_rate: 0.000004 2025-04-12 16:50:25,612 - INFO - Epoch: 3.72 | loss: 0.000100 | grad_norm: 0.000436 | learning_rate: 0.000004 2025-04-12 16:50:31,788 - INFO - Epoch: 3.72 | loss: 0.000100 | grad_norm: 0.000272 | learning_rate: 0.000004 2025-04-12 16:50:38,067 - INFO - Epoch: 3.72 | loss: 0.000100 | grad_norm: 0.000317 | learning_rate: 0.000004 2025-04-12 16:50:44,470 - INFO - Epoch: 3.73 | loss: 0.000100 | grad_norm: 0.000433 | learning_rate: 0.000004 2025-04-12 
16:50:50,372 - INFO - Epoch: 3.73 | loss: 0.000100 | grad_norm: 0.000344 | learning_rate: 0.000004 2025-04-12 16:50:56,391 - INFO - Epoch: 3.73 | loss: 0.000100 | grad_norm: 0.000497 | learning_rate: 0.000004 2025-04-12 16:51:02,258 - INFO - Epoch: 3.73 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000004 2025-04-12 16:51:08,200 - INFO - Epoch: 3.74 | loss: 0.000100 | grad_norm: 0.000456 | learning_rate: 0.000004 2025-04-12 16:51:14,364 - INFO - Epoch: 3.74 | loss: 0.000100 | grad_norm: 0.000261 | learning_rate: 0.000004 2025-04-12 16:51:20,398 - INFO - Epoch: 3.74 | loss: 0.000100 | grad_norm: 0.000418 | learning_rate: 0.000004 2025-04-12 16:51:26,667 - INFO - Epoch: 3.75 | loss: 0.000100 | grad_norm: 0.000314 | learning_rate: 0.000004 2025-04-12 16:51:32,549 - INFO - Epoch: 3.75 | loss: 0.000100 | grad_norm: 0.000251 | learning_rate: 0.000004 2025-04-12 16:51:38,657 - INFO - Epoch: 3.75 | loss: 0.000100 | grad_norm: 0.000582 | learning_rate: 0.000004 2025-04-12 16:51:44,226 - INFO - Epoch: 3.75 | loss: 0.000100 | grad_norm: 0.000286 | learning_rate: 0.000004 2025-04-12 16:51:50,217 - INFO - Epoch: 3.76 | loss: 0.000100 | grad_norm: 0.000318 | learning_rate: 0.000004 2025-04-12 16:51:56,032 - INFO - Epoch: 3.76 | loss: 0.000100 | grad_norm: 0.000366 | learning_rate: 0.000004 2025-04-12 16:52:02,472 - INFO - Epoch: 3.76 | loss: 0.000100 | grad_norm: 0.000422 | learning_rate: 0.000004 2025-04-12 16:52:08,412 - INFO - Epoch: 3.76 | loss: 0.000100 | grad_norm: 0.000374 | learning_rate: 0.000003 2025-04-12 16:52:14,328 - INFO - Epoch: 3.77 | loss: 0.000100 | grad_norm: 0.000297 | learning_rate: 0.000003 2025-04-12 16:52:20,588 - INFO - Epoch: 3.77 | loss: 0.000100 | grad_norm: 0.000434 | learning_rate: 0.000003 2025-04-12 16:52:26,933 - INFO - Epoch: 3.77 | loss: 0.000100 | grad_norm: 0.000426 | learning_rate: 0.000003 2025-04-12 16:52:33,178 - INFO - Epoch: 3.78 | loss: 0.000100 | grad_norm: 0.000303 | learning_rate: 0.000003 2025-04-12 16:52:39,057 - INFO 
- Epoch: 3.78 | loss: 0.000100 | grad_norm: 0.000604 | learning_rate: 0.000003 2025-04-12 16:52:44,987 - INFO - Epoch: 3.78 | loss: 0.000100 | grad_norm: 0.000425 | learning_rate: 0.000003 2025-04-12 16:52:51,087 - INFO - Epoch: 3.78 | loss: 0.000100 | grad_norm: 0.000257 | learning_rate: 0.000003 2025-04-12 16:52:56,793 - INFO - Epoch: 3.79 | loss: 0.000100 | grad_norm: 0.000442 | learning_rate: 0.000003 2025-04-12 16:53:02,843 - INFO - Epoch: 3.79 | loss: 0.000100 | grad_norm: 0.000529 | learning_rate: 0.000003 2025-04-12 16:53:08,799 - INFO - Epoch: 3.79 | loss: 0.000100 | grad_norm: 0.000364 | learning_rate: 0.000003 2025-04-12 16:53:15,025 - INFO - Epoch: 3.79 | loss: 0.000100 | grad_norm: 0.000426 | learning_rate: 0.000003 2025-04-12 16:53:20,926 - INFO - Epoch: 3.80 | loss: 0.000100 | grad_norm: 0.000430 | learning_rate: 0.000003 2025-04-12 16:53:27,187 - INFO - Epoch: 3.80 | loss: 0.000100 | grad_norm: 0.000451 | learning_rate: 0.000003 2025-04-12 16:53:33,497 - INFO - Epoch: 3.80 | loss: 0.000100 | grad_norm: 0.000300 | learning_rate: 0.000003 2025-04-12 16:53:39,234 - INFO - Epoch: 3.81 | loss: 0.000100 | grad_norm: 0.000425 | learning_rate: 0.000003 2025-04-12 16:53:45,382 - INFO - Epoch: 3.81 | loss: 0.000100 | grad_norm: 0.000411 | learning_rate: 0.000003 2025-04-12 16:53:51,293 - INFO - Epoch: 3.81 | loss: 0.000100 | grad_norm: 0.000321 | learning_rate: 0.000003 2025-04-12 16:53:57,620 - INFO - Epoch: 3.81 | loss: 0.000100 | grad_norm: 0.000346 | learning_rate: 0.000003 2025-04-12 16:54:03,598 - INFO - Epoch: 3.82 | loss: 0.000100 | grad_norm: 0.000358 | learning_rate: 0.000003 2025-04-12 16:54:09,510 - INFO - Epoch: 3.82 | loss: 0.000100 | grad_norm: 0.000372 | learning_rate: 0.000003 2025-04-12 16:54:15,842 - INFO - Epoch: 3.82 | loss: 0.000100 | grad_norm: 0.000395 | learning_rate: 0.000003 2025-04-12 16:54:21,775 - INFO - Epoch: 3.82 | loss: 0.000100 | grad_norm: 0.000334 | learning_rate: 0.000003 2025-04-12 16:54:27,915 - INFO - Epoch: 3.83 | 
loss: 0.000100 | grad_norm: 0.000351 | learning_rate: 0.000003 2025-04-12 16:54:34,248 - INFO - Epoch: 3.83 | loss: 0.000100 | grad_norm: 0.000386 | learning_rate: 0.000003 2025-04-12 16:54:40,457 - INFO - Epoch: 3.83 | loss: 0.000100 | grad_norm: 0.000473 | learning_rate: 0.000003 2025-04-12 16:54:46,823 - INFO - Epoch: 3.84 | loss: 0.000100 | grad_norm: 0.000366 | learning_rate: 0.000003 2025-04-12 16:54:53,034 - INFO - Epoch: 3.84 | loss: 0.000100 | grad_norm: 0.000568 | learning_rate: 0.000003 2025-04-12 16:54:59,329 - INFO - Epoch: 3.84 | loss: 0.000100 | grad_norm: 0.000309 | learning_rate: 0.000003 2025-04-12 16:55:05,231 - INFO - Epoch: 3.84 | loss: 0.000100 | grad_norm: 0.000256 | learning_rate: 0.000003 2025-04-12 16:55:11,480 - INFO - Epoch: 3.85 | loss: 0.000100 | grad_norm: 0.000285 | learning_rate: 0.000003 2025-04-12 16:55:17,198 - INFO - Epoch: 3.85 | loss: 0.000100 | grad_norm: 0.000368 | learning_rate: 0.000003 2025-04-12 16:55:23,449 - INFO - Epoch: 3.85 | loss: 0.000100 | grad_norm: 0.000277 | learning_rate: 0.000003 2025-04-12 16:55:29,657 - INFO - Epoch: 3.85 | loss: 0.000100 | grad_norm: 0.000292 | learning_rate: 0.000003 2025-04-12 16:55:35,907 - INFO - Epoch: 3.86 | loss: 0.000100 | grad_norm: 0.000343 | learning_rate: 0.000003 2025-04-12 16:55:41,891 - INFO - Epoch: 3.86 | loss: 0.000100 | grad_norm: 0.000299 | learning_rate: 0.000003 2025-04-12 16:55:47,897 - INFO - Epoch: 3.86 | loss: 0.000100 | grad_norm: 0.000353 | learning_rate: 0.000003 2025-04-12 16:55:53,910 - INFO - Epoch: 3.87 | loss: 0.000100 | grad_norm: 0.000490 | learning_rate: 0.000003 2025-04-12 16:56:00,111 - INFO - Epoch: 3.87 | loss: 0.000100 | grad_norm: 0.000355 | learning_rate: 0.000003 2025-04-12 16:56:06,258 - INFO - Epoch: 3.87 | loss: 0.000100 | grad_norm: 0.000458 | learning_rate: 0.000003 2025-04-12 16:56:12,252 - INFO - Epoch: 3.87 | loss: 0.000100 | grad_norm: 0.000353 | learning_rate: 0.000003 2025-04-12 16:56:18,673 - INFO - Epoch: 3.88 | loss: 0.000100 | 
grad_norm: 0.000336 | learning_rate: 0.000003 2025-04-12 16:56:24,664 - INFO - Epoch: 3.88 | loss: 0.000100 | grad_norm: 0.000449 | learning_rate: 0.000003 2025-04-12 16:56:30,844 - INFO - Epoch: 3.88 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000003 2025-04-12 16:56:37,250 - INFO - Epoch: 3.89 | loss: 0.000100 | grad_norm: 0.000449 | learning_rate: 0.000003 2025-04-12 16:56:43,292 - INFO - Epoch: 3.89 | loss: 0.000100 | grad_norm: 0.000297 | learning_rate: 0.000003 2025-04-12 16:56:49,384 - INFO - Epoch: 3.89 | loss: 0.000100 | grad_norm: 0.000328 | learning_rate: 0.000003 2025-04-12 16:56:55,119 - INFO - Epoch: 3.89 | loss: 0.000100 | grad_norm: 0.000414 | learning_rate: 0.000003 2025-04-12 16:57:00,818 - INFO - Epoch: 3.90 | loss: 0.000100 | grad_norm: 0.000267 | learning_rate: 0.000003 2025-04-12 16:57:06,771 - INFO - Epoch: 3.90 | loss: 0.000100 | grad_norm: 0.000605 | learning_rate: 0.000003 2025-04-12 16:57:12,814 - INFO - Epoch: 3.90 | loss: 0.000100 | grad_norm: 0.000312 | learning_rate: 0.000003 2025-04-12 16:57:19,000 - INFO - Epoch: 3.90 | loss: 0.000100 | grad_norm: 0.000442 | learning_rate: 0.000003 2025-04-12 16:57:25,126 - INFO - Epoch: 3.91 | loss: 0.000100 | grad_norm: 0.000401 | learning_rate: 0.000003 2025-04-12 16:57:31,325 - INFO - Epoch: 3.91 | loss: 0.000100 | grad_norm: 0.000423 | learning_rate: 0.000003 2025-04-12 16:57:37,056 - INFO - Epoch: 3.91 | loss: 0.000100 | grad_norm: 0.000414 | learning_rate: 0.000003 2025-04-12 16:57:43,107 - INFO - Epoch: 3.92 | loss: 0.000100 | grad_norm: 0.000469 | learning_rate: 0.000003 2025-04-12 16:57:48,713 - INFO - Epoch: 3.92 | loss: 0.000100 | grad_norm: 0.000441 | learning_rate: 0.000003 2025-04-12 16:57:54,449 - INFO - Epoch: 3.92 | loss: 0.000100 | grad_norm: 0.000281 | learning_rate: 0.000003 2025-04-12 16:58:00,512 - INFO - Epoch: 3.92 | loss: 0.000100 | grad_norm: 0.000238 | learning_rate: 0.000003 2025-04-12 16:58:06,214 - INFO - Epoch: 3.93 | loss: 0.000100 | grad_norm: 0.000551 
| learning_rate: 0.000003 2025-04-12 16:58:12,494 - INFO - Epoch: 3.93 | loss: 0.000100 | grad_norm: 0.000276 | learning_rate: 0.000003 2025-04-12 16:58:18,506 - INFO - Epoch: 3.93 | loss: 0.000100 | grad_norm: 0.000361 | learning_rate: 0.000003 2025-04-12 16:58:24,537 - INFO - Epoch: 3.93 | loss: 0.000100 | grad_norm: 0.000323 | learning_rate: 0.000003 2025-04-12 16:58:30,485 - INFO - Epoch: 3.94 | loss: 0.000100 | grad_norm: 0.000413 | learning_rate: 0.000003 2025-04-12 16:58:36,965 - INFO - Epoch: 3.94 | loss: 0.000100 | grad_norm: 0.000316 | learning_rate: 0.000003 2025-04-12 16:58:42,956 - INFO - Epoch: 3.94 | loss: 0.000100 | grad_norm: 0.000348 | learning_rate: 0.000003 2025-04-12 16:58:49,446 - INFO - Epoch: 3.95 | loss: 0.000100 | grad_norm: 0.000360 | learning_rate: 0.000003 2025-04-12 16:58:55,419 - INFO - Epoch: 3.95 | loss: 0.000100 | grad_norm: 0.000334 | learning_rate: 0.000003 2025-04-12 16:59:01,381 - INFO - Epoch: 3.95 | loss: 0.000100 | grad_norm: 0.000398 | learning_rate: 0.000003 2025-04-12 16:59:07,552 - INFO - Epoch: 3.95 | loss: 0.000100 | grad_norm: 0.000463 | learning_rate: 0.000003 2025-04-12 16:59:13,363 - INFO - Epoch: 3.96 | loss: 0.000100 | grad_norm: 0.000314 | learning_rate: 0.000003 2025-04-12 16:59:19,908 - INFO - Epoch: 3.96 | loss: 0.000100 | grad_norm: 0.000406 | learning_rate: 0.000003 2025-04-12 16:59:26,565 - INFO - Epoch: 3.96 | loss: 0.000100 | grad_norm: 0.000212 | learning_rate: 0.000003 2025-04-12 16:59:33,058 - INFO - Epoch: 3.96 | loss: 0.000100 | grad_norm: 0.000654 | learning_rate: 0.000002 2025-04-12 16:59:39,080 - INFO - Epoch: 3.97 | loss: 0.000100 | grad_norm: 0.000447 | learning_rate: 0.000002 2025-04-12 16:59:45,410 - INFO - Epoch: 3.97 | loss: 0.000100 | grad_norm: 0.000306 | learning_rate: 0.000002 2025-04-12 16:59:51,154 - INFO - Epoch: 3.97 | loss: 0.000100 | grad_norm: 0.000388 | learning_rate: 0.000002 2025-04-12 16:59:57,537 - INFO - Epoch: 3.98 | loss: 0.000100 | grad_norm: 0.000513 | learning_rate: 
0.000002 2025-04-12 17:00:03,493 - INFO - Epoch: 3.98 | loss: 0.000100 | grad_norm: 0.000309 | learning_rate: 0.000002 2025-04-12 17:00:09,604 - INFO - Epoch: 3.98 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000002 2025-04-12 17:00:15,559 - INFO - Epoch: 3.98 | loss: 0.000100 | grad_norm: 0.000532 | learning_rate: 0.000002 2025-04-12 17:00:21,586 - INFO - Epoch: 3.99 | loss: 0.000100 | grad_norm: 0.000492 | learning_rate: 0.000002 2025-04-12 17:00:27,818 - INFO - Epoch: 3.99 | loss: 0.000100 | grad_norm: 0.000306 | learning_rate: 0.000002 2025-04-12 17:00:33,890 - INFO - Epoch: 3.99 | loss: 0.000100 | grad_norm: 0.000281 | learning_rate: 0.000002 2025-04-12 17:00:39,425 - INFO - Epoch: 3.99 | loss: 0.000100 | grad_norm: 0.000638 | learning_rate: 0.000002 2025-04-12 17:00:45,026 - INFO - Epoch: 4.00 | loss: 0.000100 | grad_norm: 0.000265 | learning_rate: 0.000002 2025-04-12 17:00:50,736 - INFO - Epoch: 4.00 | loss: 0.000100 | grad_norm: 0.000339 | learning_rate: 0.000002 2025-04-12 17:01:00,591 - INFO - Epoch: 4.00 | loss: 0.000100 | grad_norm: 0.000411 | learning_rate: 0.000002 2025-04-12 17:01:06,496 - INFO - Epoch: 4.01 | loss: 0.000100 | grad_norm: 0.000661 | learning_rate: 0.000002 2025-04-12 17:01:12,472 - INFO - Epoch: 4.01 | loss: 0.000100 | grad_norm: 0.000450 | learning_rate: 0.000002 2025-04-12 17:01:18,786 - INFO - Epoch: 4.01 | loss: 0.000100 | grad_norm: 0.000300 | learning_rate: 0.000002 2025-04-12 17:01:25,106 - INFO - Epoch: 4.01 | loss: 0.000100 | grad_norm: 0.000341 | learning_rate: 0.000002 2025-04-12 17:01:31,071 - INFO - Epoch: 4.02 | loss: 0.000100 | grad_norm: 0.000270 | learning_rate: 0.000002 2025-04-12 17:01:37,274 - INFO - Epoch: 4.02 | loss: 0.000100 | grad_norm: 0.000299 | learning_rate: 0.000002 2025-04-12 17:01:43,317 - INFO - Epoch: 4.02 | loss: 0.000100 | grad_norm: 0.000693 | learning_rate: 0.000002 2025-04-12 17:01:49,221 - INFO - Epoch: 4.02 | loss: 0.000100 | grad_norm: 0.000397 | learning_rate: 0.000002 2025-04-12 
17:01:55,243 - INFO - Epoch: 4.03 | loss: 0.000100 | grad_norm: 0.000418 | learning_rate: 0.000002 2025-04-12 17:02:01,301 - INFO - Epoch: 4.03 | loss: 0.000100 | grad_norm: 0.000333 | learning_rate: 0.000002 2025-04-12 17:02:07,492 - INFO - Epoch: 4.03 | loss: 0.000100 | grad_norm: 0.000254 | learning_rate: 0.000002 2025-04-12 17:02:13,869 - INFO - Epoch: 4.04 | loss: 0.000100 | grad_norm: 0.000361 | learning_rate: 0.000002 2025-04-12 17:02:19,987 - INFO - Epoch: 4.04 | loss: 0.000100 | grad_norm: 0.000403 | learning_rate: 0.000002 2025-04-12 17:02:25,811 - INFO - Epoch: 4.04 | loss: 0.000100 | grad_norm: 0.000198 | learning_rate: 0.000002 2025-04-12 17:02:32,011 - INFO - Epoch: 4.04 | loss: 0.000100 | grad_norm: 0.000352 | learning_rate: 0.000002 2025-04-12 17:02:37,825 - INFO - Epoch: 4.05 | loss: 0.000100 | grad_norm: 0.000406 | learning_rate: 0.000002 2025-04-12 17:02:43,802 - INFO - Epoch: 4.05 | loss: 0.000100 | grad_norm: 0.000347 | learning_rate: 0.000002 2025-04-12 17:02:50,525 - INFO - Epoch: 4.05 | loss: 0.000100 | grad_norm: 0.000480 | learning_rate: 0.000002 2025-04-12 17:02:56,803 - INFO - Epoch: 4.05 | loss: 0.000100 | grad_norm: 0.000401 | learning_rate: 0.000002 2025-04-12 17:03:03,063 - INFO - Epoch: 4.06 | loss: 0.000100 | grad_norm: 0.000443 | learning_rate: 0.000002 2025-04-12 17:03:09,158 - INFO - Epoch: 4.06 | loss: 0.000100 | grad_norm: 0.000395 | learning_rate: 0.000002 2025-04-12 17:03:15,187 - INFO - Epoch: 4.06 | loss: 0.000100 | grad_norm: 0.000348 | learning_rate: 0.000002 2025-04-12 17:03:21,255 - INFO - Epoch: 4.07 | loss: 0.000100 | grad_norm: 0.000324 | learning_rate: 0.000002 2025-04-12 17:03:27,444 - INFO - Epoch: 4.07 | loss: 0.000100 | grad_norm: 0.000313 | learning_rate: 0.000002 2025-04-12 17:03:34,603 - INFO - Epoch: 4.07 | loss: 0.000100 | grad_norm: 0.000394 | learning_rate: 0.000002 2025-04-12 17:03:40,716 - INFO - Epoch: 4.07 | loss: 0.000100 | grad_norm: 0.000311 | learning_rate: 0.000002 2025-04-12 17:03:46,658 - INFO 
- Epoch: 4.08 | loss: 0.000100 | grad_norm: 0.000408 | learning_rate: 0.000002 2025-04-12 17:03:52,988 - INFO - Epoch: 4.08 | loss: 0.000100 | grad_norm: 0.000422 | learning_rate: 0.000002 2025-04-12 17:03:58,938 - INFO - Epoch: 4.08 | loss: 0.000100 | grad_norm: 0.000360 | learning_rate: 0.000002 2025-04-12 17:04:05,035 - INFO - Epoch: 4.08 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 0.000002 2025-04-12 17:04:11,299 - INFO - Epoch: 4.09 | loss: 0.000100 | grad_norm: 0.000387 | learning_rate: 0.000002 2025-04-12 17:04:17,522 - INFO - Epoch: 4.09 | loss: 0.000100 | grad_norm: 0.000364 | learning_rate: 0.000002 2025-04-12 17:04:24,154 - INFO - Epoch: 4.09 | loss: 0.000100 | grad_norm: 0.000272 | learning_rate: 0.000002 2025-04-12 17:04:30,199 - INFO - Epoch: 4.10 | loss: 0.000100 | grad_norm: 0.000347 | learning_rate: 0.000002 2025-04-12 17:04:36,134 - INFO - Epoch: 4.10 | loss: 0.000100 | grad_norm: 0.000547 | learning_rate: 0.000002 2025-04-12 17:04:42,567 - INFO - Epoch: 4.10 | loss: 0.000100 | grad_norm: 0.000286 | learning_rate: 0.000002 2025-04-12 17:04:48,748 - INFO - Epoch: 4.10 | loss: 0.000100 | grad_norm: 0.000321 | learning_rate: 0.000002 2025-04-12 17:04:54,810 - INFO - Epoch: 4.11 | loss: 0.000100 | grad_norm: 0.000302 | learning_rate: 0.000002 2025-04-12 17:05:00,710 - INFO - Epoch: 4.11 | loss: 0.000100 | grad_norm: 0.001027 | learning_rate: 0.000002 2025-04-12 17:05:06,866 - INFO - Epoch: 4.11 | loss: 0.000100 | grad_norm: 0.000373 | learning_rate: 0.000002 2025-04-12 17:05:12,674 - INFO - Epoch: 4.12 | loss: 0.000100 | grad_norm: 0.000356 | learning_rate: 0.000002 2025-04-12 17:05:18,869 - INFO - Epoch: 4.12 | loss: 0.000100 | grad_norm: 0.000453 | learning_rate: 0.000002 2025-04-12 17:05:24,855 - INFO - Epoch: 4.12 | loss: 0.000100 | grad_norm: 0.000275 | learning_rate: 0.000002 2025-04-12 17:05:30,956 - INFO - Epoch: 4.12 | loss: 0.000100 | grad_norm: 0.000243 | learning_rate: 0.000002 2025-04-12 17:05:37,470 - INFO - Epoch: 4.13 | 
loss: 0.000100 | grad_norm: 0.000404 | learning_rate: 0.000002 2025-04-12 17:05:43,468 - INFO - Epoch: 4.13 | loss: 0.000100 | grad_norm: 0.000333 | learning_rate: 0.000002 2025-04-12 17:05:49,645 - INFO - Epoch: 4.13 | loss: 0.000100 | grad_norm: 0.000290 | learning_rate: 0.000002 2025-04-12 17:05:55,658 - INFO - Epoch: 4.13 | loss: 0.000100 | grad_norm: 0.000344 | learning_rate: 0.000002 2025-04-12 17:06:01,676 - INFO - Epoch: 4.14 | loss: 0.000100 | grad_norm: 0.000536 | learning_rate: 0.000002 2025-04-12 17:06:07,726 - INFO - Epoch: 4.14 | loss: 0.000100 | grad_norm: 0.000276 | learning_rate: 0.000002 2025-04-12 17:06:13,916 - INFO - Epoch: 4.14 | loss: 0.000100 | grad_norm: 0.000494 | learning_rate: 0.000002 2025-04-12 17:06:19,817 - INFO - Epoch: 4.15 | loss: 0.000100 | grad_norm: 0.000428 | learning_rate: 0.000002 2025-04-12 17:06:26,055 - INFO - Epoch: 4.15 | loss: 0.000100 | grad_norm: 0.000265 | learning_rate: 0.000002 2025-04-12 17:06:32,123 - INFO - Epoch: 4.15 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000002 2025-04-12 17:06:38,408 - INFO - Epoch: 4.15 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 0.000002 2025-04-12 17:06:44,418 - INFO - Epoch: 4.16 | loss: 0.000100 | grad_norm: 0.000456 | learning_rate: 0.000002 2025-04-12 17:06:50,701 - INFO - Epoch: 4.16 | loss: 0.000100 | grad_norm: 0.000490 | learning_rate: 0.000002 2025-04-12 17:06:56,887 - INFO - Epoch: 4.16 | loss: 0.000100 | grad_norm: 0.000284 | learning_rate: 0.000002 2025-04-12 17:07:02,932 - INFO - Epoch: 4.16 | loss: 0.000100 | grad_norm: 0.000308 | learning_rate: 0.000002 2025-04-12 17:07:08,900 - INFO - Epoch: 4.17 | loss: 0.000100 | grad_norm: 0.000278 | learning_rate: 0.000002 2025-04-12 17:07:15,015 - INFO - Epoch: 4.17 | loss: 0.000100 | grad_norm: 0.000391 | learning_rate: 0.000002 2025-04-12 17:07:21,082 - INFO - Epoch: 4.17 | loss: 0.000100 | grad_norm: 0.000292 | learning_rate: 0.000002 2025-04-12 17:07:27,081 - INFO - Epoch: 4.18 | loss: 0.000100 | 
grad_norm: 0.000367 | learning_rate: 0.000002 2025-04-12 17:07:32,953 - INFO - Epoch: 4.18 | loss: 0.000100 | grad_norm: 0.000303 | learning_rate: 0.000002 2025-04-12 17:07:38,688 - INFO - Epoch: 4.18 | loss: 0.000100 | grad_norm: 0.000208 | learning_rate: 0.000002 2025-04-12 17:07:45,250 - INFO - Epoch: 4.18 | loss: 0.000100 | grad_norm: 0.000387 | learning_rate: 0.000002 2025-04-12 17:07:50,805 - INFO - Epoch: 4.19 | loss: 0.000100 | grad_norm: 0.000616 | learning_rate: 0.000002 2025-04-12 17:07:56,948 - INFO - Epoch: 4.19 | loss: 0.000100 | grad_norm: 0.000338 | learning_rate: 0.000002 2025-04-12 17:08:03,108 - INFO - Epoch: 4.19 | loss: 0.000100 | grad_norm: 0.000278 | learning_rate: 0.000002 2025-04-12 17:08:09,452 - INFO - Epoch: 4.19 | loss: 0.000100 | grad_norm: 0.000419 | learning_rate: 0.000002 2025-04-12 17:08:15,602 - INFO - Epoch: 4.20 | loss: 0.000100 | grad_norm: 0.000658 | learning_rate: 0.000002 2025-04-12 17:08:21,702 - INFO - Epoch: 4.20 | loss: 0.000100 | grad_norm: 0.000241 | learning_rate: 0.000002 2025-04-12 17:08:27,410 - INFO - Epoch: 4.20 | loss: 0.000100 | grad_norm: 0.000334 | learning_rate: 0.000002 2025-04-12 17:08:33,487 - INFO - Epoch: 4.21 | loss: 0.000100 | grad_norm: 0.000555 | learning_rate: 0.000001 2025-04-12 17:08:39,541 - INFO - Epoch: 4.21 | loss: 0.000100 | grad_norm: 0.000288 | learning_rate: 0.000001 2025-04-12 17:08:45,634 - INFO - Epoch: 4.21 | loss: 0.000100 | grad_norm: 0.000323 | learning_rate: 0.000001 2025-04-12 17:08:52,039 - INFO - Epoch: 4.21 | loss: 0.000100 | grad_norm: 0.000251 | learning_rate: 0.000001 2025-04-12 17:08:58,212 - INFO - Epoch: 4.22 | loss: 0.000100 | grad_norm: 0.000385 | learning_rate: 0.000001 2025-04-12 17:09:04,361 - INFO - Epoch: 4.22 | loss: 0.000100 | grad_norm: 0.000368 | learning_rate: 0.000001 2025-04-12 17:09:10,442 - INFO - Epoch: 4.22 | loss: 0.000100 | grad_norm: 0.000325 | learning_rate: 0.000001 2025-04-12 17:09:16,503 - INFO - Epoch: 4.22 | loss: 0.000100 | grad_norm: 0.000267 
| learning_rate: 0.000001 2025-04-12 17:09:22,602 - INFO - Epoch: 4.23 | loss: 0.000100 | grad_norm: 0.000448 | learning_rate: 0.000001 2025-04-12 17:09:28,745 - INFO - Epoch: 4.23 | loss: 0.000100 | grad_norm: 0.000367 | learning_rate: 0.000001 2025-04-12 17:09:34,784 - INFO - Epoch: 4.23 | loss: 0.000100 | grad_norm: 0.000272 | learning_rate: 0.000001 2025-04-12 17:09:41,053 - INFO - Epoch: 4.24 | loss: 0.000100 | grad_norm: 0.000299 | learning_rate: 0.000001 2025-04-12 17:09:46,940 - INFO - Epoch: 4.24 | loss: 0.000100 | grad_norm: 0.000286 | learning_rate: 0.000001 2025-04-12 17:09:52,918 - INFO - Epoch: 4.24 | loss: 0.000100 | grad_norm: 0.000425 | learning_rate: 0.000001 2025-04-12 17:09:58,926 - INFO - Epoch: 4.24 | loss: 0.000100 | grad_norm: 0.000274 | learning_rate: 0.000001 2025-04-12 17:10:05,242 - INFO - Epoch: 4.25 | loss: 0.000100 | grad_norm: 0.000396 | learning_rate: 0.000001 2025-04-12 17:10:11,387 - INFO - Epoch: 4.25 | loss: 0.000100 | grad_norm: 0.000472 | learning_rate: 0.000001 2025-04-12 17:10:17,873 - INFO - Epoch: 4.25 | loss: 0.000100 | grad_norm: 0.000233 | learning_rate: 0.000001 2025-04-12 17:10:24,107 - INFO - Epoch: 4.25 | loss: 0.000100 | grad_norm: 0.000471 | learning_rate: 0.000001 2025-04-12 17:10:30,235 - INFO - Epoch: 4.26 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000001 2025-04-12 17:10:36,243 - INFO - Epoch: 4.26 | loss: 0.000100 | grad_norm: 0.000368 | learning_rate: 0.000001 2025-04-12 17:10:42,158 - INFO - Epoch: 4.26 | loss: 0.000100 | grad_norm: 0.000502 | learning_rate: 0.000001 2025-04-12 17:10:48,094 - INFO - Epoch: 4.27 | loss: 0.000100 | grad_norm: 0.000277 | learning_rate: 0.000001 2025-04-12 17:10:54,179 - INFO - Epoch: 4.27 | loss: 0.000100 | grad_norm: 0.000317 | learning_rate: 0.000001 2025-04-12 17:11:00,391 - INFO - Epoch: 4.27 | loss: 0.000100 | grad_norm: 0.000290 | learning_rate: 0.000001 2025-04-12 17:11:06,429 - INFO - Epoch: 4.27 | loss: 0.000100 | grad_norm: 0.000489 | learning_rate: 
0.000001 2025-04-12 17:11:12,787 - INFO - Epoch: 4.28 | loss: 0.000100 | grad_norm: 0.000246 | learning_rate: 0.000001 2025-04-12 17:11:18,979 - INFO - Epoch: 4.28 | loss: 0.000100 | grad_norm: 0.000668 | learning_rate: 0.000001 2025-04-12 17:11:25,311 - INFO - Epoch: 4.28 | loss: 0.000100 | grad_norm: 0.000250 | learning_rate: 0.000001 2025-04-12 17:11:31,450 - INFO - Epoch: 4.28 | loss: 0.000100 | grad_norm: 0.000374 | learning_rate: 0.000001 2025-04-12 17:11:37,650 - INFO - Epoch: 4.29 | loss: 0.000100 | grad_norm: 0.000345 | learning_rate: 0.000001 2025-04-12 17:11:44,248 - INFO - Epoch: 4.29 | loss: 0.000100 | grad_norm: 0.000477 | learning_rate: 0.000001 2025-04-12 17:11:50,166 - INFO - Epoch: 4.29 | loss: 0.000100 | grad_norm: 0.000424 | learning_rate: 0.000001 2025-04-12 17:11:56,518 - INFO - Epoch: 4.30 | loss: 0.000100 | grad_norm: 0.000352 | learning_rate: 0.000001 2025-04-12 17:12:02,209 - INFO - Epoch: 4.30 | loss: 0.000100 | grad_norm: 0.000493 | learning_rate: 0.000001 2025-04-12 17:12:08,201 - INFO - Epoch: 4.30 | loss: 0.000100 | grad_norm: 0.000581 | learning_rate: 0.000001 2025-04-12 17:12:14,055 - INFO - Epoch: 4.30 | loss: 0.000100 | grad_norm: 0.000306 | learning_rate: 0.000001 2025-04-12 17:12:20,379 - INFO - Epoch: 4.31 | loss: 0.000100 | grad_norm: 0.000281 | learning_rate: 0.000001 2025-04-12 17:12:26,487 - INFO - Epoch: 4.31 | loss: 0.000100 | grad_norm: 0.000362 | learning_rate: 0.000001 2025-04-12 17:12:32,661 - INFO - Epoch: 4.31 | loss: 0.000100 | grad_norm: 0.000643 | learning_rate: 0.000001 2025-04-12 17:12:38,795 - INFO - Epoch: 4.32 | loss: 0.000100 | grad_norm: 0.000466 | learning_rate: 0.000001 2025-04-12 17:12:44,953 - INFO - Epoch: 4.32 | loss: 0.000100 | grad_norm: 0.000364 | learning_rate: 0.000001 2025-04-12 17:12:51,215 - INFO - Epoch: 4.32 | loss: 0.000100 | grad_norm: 0.000372 | learning_rate: 0.000001 2025-04-12 17:12:57,159 - INFO - Epoch: 4.32 | loss: 0.000100 | grad_norm: 0.000489 | learning_rate: 0.000001 2025-04-12 
17:13:03,258 - INFO - Epoch: 4.33 | loss: 0.000100 | grad_norm: 0.000349 | learning_rate: 0.000001 2025-04-12 17:13:09,699 - INFO - Epoch: 4.33 | loss: 0.000100 | grad_norm: 0.000384 | learning_rate: 0.000001 2025-04-12 17:13:15,995 - INFO - Epoch: 4.33 | loss: 0.000100 | grad_norm: 0.000274 | learning_rate: 0.000001 2025-04-12 17:13:21,689 - INFO - Epoch: 4.33 | loss: 0.000100 | grad_norm: 0.000506 | learning_rate: 0.000001 2025-04-12 17:13:27,785 - INFO - Epoch: 4.34 | loss: 0.000100 | grad_norm: 0.000567 | learning_rate: 0.000001 2025-04-12 17:13:33,818 - INFO - Epoch: 4.34 | loss: 0.000100 | grad_norm: 0.000399 | learning_rate: 0.000001 2025-04-12 17:13:40,186 - INFO - Epoch: 4.34 | loss: 0.000100 | grad_norm: 0.000269 | learning_rate: 0.000001 2025-04-12 17:13:46,221 - INFO - Epoch: 4.35 | loss: 0.000100 | grad_norm: 0.000241 | learning_rate: 0.000001 2025-04-12 17:13:52,267 - INFO - Epoch: 4.35 | loss: 0.000100 | grad_norm: 0.000349 | learning_rate: 0.000001 2025-04-12 17:13:58,474 - INFO - Epoch: 4.35 | loss: 0.000100 | grad_norm: 0.000512 | learning_rate: 0.000001 2025-04-12 17:14:04,885 - INFO - Epoch: 4.35 | loss: 0.000100 | grad_norm: 0.000248 | learning_rate: 0.000001 2025-04-12 17:14:11,238 - INFO - Epoch: 4.36 | loss: 0.000100 | grad_norm: 0.000531 | learning_rate: 0.000001 2025-04-12 17:14:17,344 - INFO - Epoch: 4.36 | loss: 0.000100 | grad_norm: 0.000625 | learning_rate: 0.000001 2025-04-12 17:14:23,676 - INFO - Epoch: 4.36 | loss: 0.000100 | grad_norm: 0.000325 | learning_rate: 0.000001 2025-04-12 17:14:29,577 - INFO - Epoch: 4.36 | loss: 0.000100 | grad_norm: 0.000362 | learning_rate: 0.000001 2025-04-12 17:14:36,021 - INFO - Epoch: 4.37 | loss: 0.000100 | grad_norm: 0.000851 | learning_rate: 0.000001 2025-04-12 17:14:41,906 - INFO - Epoch: 4.37 | loss: 0.000100 | grad_norm: 0.000249 | learning_rate: 0.000001 2025-04-12 17:14:48,449 - INFO - Epoch: 4.37 | loss: 0.000100 | grad_norm: 0.000337 | learning_rate: 0.000001 2025-04-12 17:14:54,594 - INFO 
- Epoch: 4.38 | loss: 0.000100 | grad_norm: 0.000426 | learning_rate: 0.000001 2025-04-12 17:15:00,652 - INFO - Epoch: 4.38 | loss: 0.000100 | grad_norm: 0.000294 | learning_rate: 0.000001 2025-04-12 17:15:06,963 - INFO - Epoch: 4.38 | loss: 0.000100 | grad_norm: 0.000393 | learning_rate: 0.000001 2025-04-12 17:15:13,063 - INFO - Epoch: 4.38 | loss: 0.000100 | grad_norm: 0.000219 | learning_rate: 0.000001 2025-04-12 17:15:19,304 - INFO - Epoch: 4.39 | loss: 0.000100 | grad_norm: 0.000305 | learning_rate: 0.000001 2025-04-12 17:15:25,632 - INFO - Epoch: 4.39 | loss: 0.000100 | grad_norm: 0.000248 | learning_rate: 0.000001 2025-04-12 17:15:31,892 - INFO - Epoch: 4.39 | loss: 0.000100 | grad_norm: 0.000419 | learning_rate: 0.000001 2025-04-12 17:15:38,061 - INFO - Epoch: 4.39 | loss: 0.000100 | grad_norm: 0.000356 | learning_rate: 0.000001 2025-04-12 17:15:44,345 - INFO - Epoch: 4.40 | loss: 0.000100 | grad_norm: 0.000394 | learning_rate: 0.000001 2025-04-12 17:15:50,148 - INFO - Epoch: 4.40 | loss: 0.000100 | grad_norm: 0.000454 | learning_rate: 0.000001 2025-04-12 17:15:56,212 - INFO - Epoch: 4.40 | loss: 0.000100 | grad_norm: 0.000789 | learning_rate: 0.000001 2025-04-12 17:16:02,218 - INFO - Epoch: 4.41 | loss: 0.000100 | grad_norm: 0.000673 | learning_rate: 0.000001 2025-04-12 17:16:08,663 - INFO - Epoch: 4.41 | loss: 0.000100 | grad_norm: 0.000277 | learning_rate: 0.000001 2025-04-12 17:16:15,058 - INFO - Epoch: 4.41 | loss: 0.000100 | grad_norm: 0.000420 | learning_rate: 0.000001 2025-04-12 17:16:20,680 - INFO - Epoch: 4.41 | loss: 0.000100 | grad_norm: 0.000350 | learning_rate: 0.000001 2025-04-12 17:16:26,737 - INFO - Epoch: 4.42 | loss: 0.000100 | grad_norm: 0.000551 | learning_rate: 0.000001 2025-04-12 17:16:33,169 - INFO - Epoch: 4.42 | loss: 0.000100 | grad_norm: 0.000373 | learning_rate: 0.000001 2025-04-12 17:16:39,365 - INFO - Epoch: 4.42 | loss: 0.000100 | grad_norm: 0.000453 | learning_rate: 0.000001 2025-04-12 17:16:45,191 - INFO - Epoch: 4.42 | 
loss: 0.000100 | grad_norm: 0.000260 | learning_rate: 0.000001 2025-04-12 17:16:51,289 - INFO - Epoch: 4.43 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000001 2025-04-12 17:16:57,210 - INFO - Epoch: 4.43 | loss: 0.000100 | grad_norm: 0.000302 | learning_rate: 0.000001 2025-04-12 17:17:03,165 - INFO - Epoch: 4.43 | loss: 0.000100 | grad_norm: 0.000368 | learning_rate: 0.000001 2025-04-12 17:17:09,362 - INFO - Epoch: 4.44 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000001 2025-04-12 17:17:15,172 - INFO - Epoch: 4.44 | loss: 0.000100 | grad_norm: 0.000353 | learning_rate: 0.000001 2025-04-12 17:17:21,275 - INFO - Epoch: 4.44 | loss: 0.000100 | grad_norm: 0.000353 | learning_rate: 0.000001 2025-04-12 17:17:27,258 - INFO - Epoch: 4.44 | loss: 0.000100 | grad_norm: 0.000390 | learning_rate: 0.000001 2025-04-12 17:17:33,481 - INFO - Epoch: 4.45 | loss: 0.000100 | grad_norm: 0.000341 | learning_rate: 0.000001 2025-04-12 17:17:39,555 - INFO - Epoch: 4.45 | loss: 0.000100 | grad_norm: 0.000330 | learning_rate: 0.000001 2025-04-12 17:17:45,532 - INFO - Epoch: 4.45 | loss: 0.000100 | grad_norm: 0.000547 | learning_rate: 0.000001 2025-04-12 17:17:51,827 - INFO - Epoch: 4.45 | loss: 0.000100 | grad_norm: 0.000356 | learning_rate: 0.000001 2025-04-12 17:17:58,010 - INFO - Epoch: 4.46 | loss: 0.000100 | grad_norm: 0.000396 | learning_rate: 0.000001 2025-04-12 17:18:03,747 - INFO - Epoch: 4.46 | loss: 0.000100 | grad_norm: 0.000278 | learning_rate: 0.000001 2025-04-12 17:18:09,869 - INFO - Epoch: 4.46 | loss: 0.000100 | grad_norm: 0.000462 | learning_rate: 0.000001 2025-04-12 17:18:15,941 - INFO - Epoch: 4.47 | loss: 0.000100 | grad_norm: 0.000264 | learning_rate: 0.000001 2025-04-12 17:18:21,875 - INFO - Epoch: 4.47 | loss: 0.000100 | grad_norm: 0.000365 | learning_rate: 0.000001 2025-04-12 17:18:27,790 - INFO - Epoch: 4.47 | loss: 0.000100 | grad_norm: 0.000501 | learning_rate: 0.000001 2025-04-12 17:18:33,865 - INFO - Epoch: 4.47 | loss: 0.000100 | 
grad_norm: 0.000264 | learning_rate: 0.000001 2025-04-12 17:18:39,904 - INFO - Epoch: 4.48 | loss: 0.000100 | grad_norm: 0.000336 | learning_rate: 0.000001 2025-04-12 17:18:45,990 - INFO - Epoch: 4.48 | loss: 0.000100 | grad_norm: 0.000339 | learning_rate: 0.000001 2025-04-12 17:18:52,403 - INFO - Epoch: 4.48 | loss: 0.000100 | grad_norm: 0.000302 | learning_rate: 0.000001 2025-04-12 17:18:58,721 - INFO - Epoch: 4.48 | loss: 0.000100 | grad_norm: 0.000294 | learning_rate: 0.000001 2025-04-12 17:19:04,667 - INFO - Epoch: 4.49 | loss: 0.000100 | grad_norm: 0.000933 | learning_rate: 0.000001 2025-04-12 17:19:10,763 - INFO - Epoch: 4.49 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000001 2025-04-12 17:19:16,759 - INFO - Epoch: 4.49 | loss: 0.000100 | grad_norm: 0.000234 | learning_rate: 0.000001 2025-04-12 17:19:22,973 - INFO - Epoch: 4.50 | loss: 0.000100 | grad_norm: 0.000373 | learning_rate: 0.000001 2025-04-12 17:19:29,171 - INFO - Epoch: 4.50 | loss: 0.000100 | grad_norm: 0.000229 | learning_rate: 0.000001 2025-04-12 17:19:35,151 - INFO - Epoch: 4.50 | loss: 0.000100 | grad_norm: 0.000417 | learning_rate: 0.000001 2025-04-12 17:19:41,573 - INFO - Epoch: 4.50 | loss: 0.000100 | grad_norm: 0.000263 | learning_rate: 0.000001 2025-04-12 17:19:48,039 - INFO - Epoch: 4.51 | loss: 0.000100 | grad_norm: 0.000332 | learning_rate: 0.000001 2025-04-12 17:19:54,142 - INFO - Epoch: 4.51 | loss: 0.000100 | grad_norm: 0.000268 | learning_rate: 0.000001 2025-04-12 17:20:00,216 - INFO - Epoch: 4.51 | loss: 0.000100 | grad_norm: 0.000282 | learning_rate: 0.000001 2025-04-12 17:20:06,600 - INFO - Epoch: 4.52 | loss: 0.000100 | grad_norm: 0.000335 | learning_rate: 0.000001 2025-04-12 17:20:12,612 - INFO - Epoch: 4.52 | loss: 0.000100 | grad_norm: 0.000482 | learning_rate: 0.000001 2025-04-12 17:20:18,851 - INFO - Epoch: 4.52 | loss: 0.000100 | grad_norm: 0.000332 | learning_rate: 0.000001 2025-04-12 17:20:25,182 - INFO - Epoch: 4.52 | loss: 0.000100 | grad_norm: 0.000409 
| learning_rate: 0.000001 2025-04-12 17:20:31,504 - INFO - Epoch: 4.53 | loss: 0.000100 | grad_norm: 0.000733 | learning_rate: 0.000001 2025-04-12 17:20:37,817 - INFO - Epoch: 4.53 | loss: 0.000100 | grad_norm: 0.000381 | learning_rate: 0.000001 2025-04-12 17:20:44,130 - INFO - Epoch: 4.53 | loss: 0.000100 | grad_norm: 0.000635 | learning_rate: 0.000001 2025-04-12 17:20:50,604 - INFO - Epoch: 4.53 | loss: 0.000100 | grad_norm: 0.000190 | learning_rate: 0.000001 2025-04-12 17:20:56,908 - INFO - Epoch: 4.54 | loss: 0.000100 | grad_norm: 0.000406 | learning_rate: 0.000001 2025-04-12 17:21:02,748 - INFO - Epoch: 4.54 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000001 2025-04-12 17:21:08,941 - INFO - Epoch: 4.54 | loss: 0.000100 | grad_norm: 0.000498 | learning_rate: 0.000001 2025-04-12 17:21:15,191 - INFO - Epoch: 4.55 | loss: 0.000100 | grad_norm: 0.000555 | learning_rate: 0.000000 2025-04-12 17:21:21,090 - INFO - Epoch: 4.55 | loss: 0.000100 | grad_norm: 0.000254 | learning_rate: 0.000000 2025-04-12 17:21:27,295 - INFO - Epoch: 4.55 | loss: 0.000100 | grad_norm: 0.000300 | learning_rate: 0.000000 2025-04-12 17:21:33,474 - INFO - Epoch: 4.55 | loss: 0.000100 | grad_norm: 0.000565 | learning_rate: 0.000000 2025-04-12 17:21:39,654 - INFO - Epoch: 4.56 | loss: 0.000100 | grad_norm: 0.000422 | learning_rate: 0.000000 2025-04-12 17:21:45,438 - INFO - Epoch: 4.56 | loss: 0.000100 | grad_norm: 0.000524 | learning_rate: 0.000000 2025-04-12 17:21:51,396 - INFO - Epoch: 4.56 | loss: 0.000100 | grad_norm: 0.000349 | learning_rate: 0.000000 2025-04-12 17:21:57,456 - INFO - Epoch: 4.56 | loss: 0.000100 | grad_norm: 0.000325 | learning_rate: 0.000000 2025-04-12 17:22:03,516 - INFO - Epoch: 4.57 | loss: 0.000100 | grad_norm: 0.000535 | learning_rate: 0.000000 2025-04-12 17:22:09,470 - INFO - Epoch: 4.57 | loss: 0.000100 | grad_norm: 0.000293 | learning_rate: 0.000000 2025-04-12 17:22:15,775 - INFO - Epoch: 4.57 | loss: 0.000100 | grad_norm: 0.000409 | learning_rate: 
0.000000 2025-04-12 17:22:21,767 - INFO - Epoch: 4.58 | loss: 0.000100 | grad_norm: 0.000475 | learning_rate: 0.000000 2025-04-12 17:22:27,790 - INFO - Epoch: 4.58 | loss: 0.000100 | grad_norm: 0.000481 | learning_rate: 0.000000 2025-04-12 17:22:34,104 - INFO - Epoch: 4.58 | loss: 0.000100 | grad_norm: 0.000256 | learning_rate: 0.000000 2025-04-12 17:22:40,217 - INFO - Epoch: 4.58 | loss: 0.000100 | grad_norm: 0.000389 | learning_rate: 0.000000 2025-04-12 17:22:46,103 - INFO - Epoch: 4.59 | loss: 0.000100 | grad_norm: 0.000422 | learning_rate: 0.000000 2025-04-12 17:22:52,271 - INFO - Epoch: 4.59 | loss: 0.000100 | grad_norm: 0.000382 | learning_rate: 0.000000 2025-04-12 17:22:58,689 - INFO - Epoch: 4.59 | loss: 0.000100 | grad_norm: 0.000407 | learning_rate: 0.000000 2025-04-12 17:23:04,976 - INFO - Epoch: 4.59 | loss: 0.000100 | grad_norm: 0.000373 | learning_rate: 0.000000 2025-04-12 17:23:11,638 - INFO - Epoch: 4.60 | loss: 0.000100 | grad_norm: 0.000340 | learning_rate: 0.000000 2025-04-12 17:23:18,262 - INFO - Epoch: 4.60 | loss: 0.000100 | grad_norm: 0.000253 | learning_rate: 0.000000 2025-04-12 17:23:24,221 - INFO - Epoch: 4.60 | loss: 0.000100 | grad_norm: 0.000355 | learning_rate: 0.000000 2025-04-12 17:23:30,439 - INFO - Epoch: 4.61 | loss: 0.000100 | grad_norm: 0.000429 | learning_rate: 0.000000 2025-04-12 17:23:36,863 - INFO - Epoch: 4.61 | loss: 0.000100 | grad_norm: 0.000336 | learning_rate: 0.000000 2025-04-12 17:23:42,587 - INFO - Epoch: 4.61 | loss: 0.000100 | grad_norm: 0.000729 | learning_rate: 0.000000 2025-04-12 17:23:48,720 - INFO - Epoch: 4.61 | loss: 0.000100 | grad_norm: 0.000286 | learning_rate: 0.000000 2025-04-12 17:23:54,716 - INFO - Epoch: 4.62 | loss: 0.000100 | grad_norm: 0.000376 | learning_rate: 0.000000 2025-04-12 17:24:00,671 - INFO - Epoch: 4.62 | loss: 0.000100 | grad_norm: 0.000622 | learning_rate: 0.000000 2025-04-12 17:24:06,946 - INFO - Epoch: 4.62 | loss: 0.000100 | grad_norm: 0.000418 | learning_rate: 0.000000 2025-04-12 
17:24:12,714 - INFO - Epoch: 4.62 | loss: 0.000100 | grad_norm: 0.000328 | learning_rate: 0.000000 2025-04-12 17:24:19,167 - INFO - Epoch: 4.63 | loss: 0.000100 | grad_norm: 0.000253 | learning_rate: 0.000000 2025-04-12 17:24:25,427 - INFO - Epoch: 4.63 | loss: 0.000100 | grad_norm: 0.000322 | learning_rate: 0.000000 2025-04-12 17:24:31,707 - INFO - Epoch: 4.63 | loss: 0.000100 | grad_norm: 0.000402 | learning_rate: 0.000000 2025-04-12 17:24:37,217 - INFO - Epoch: 4.64 | loss: 0.000100 | grad_norm: 0.000292 | learning_rate: 0.000000 2025-04-12 17:24:43,261 - INFO - Epoch: 4.64 | loss: 0.000100 | grad_norm: 0.000368 | learning_rate: 0.000000 2025-04-12 17:24:49,516 - INFO - Epoch: 4.64 | loss: 0.000100 | grad_norm: 0.000297 | learning_rate: 0.000000 2025-04-12 17:24:55,465 - INFO - Epoch: 4.64 | loss: 0.000100 | grad_norm: 0.000387 | learning_rate: 0.000000 2025-04-12 17:25:01,264 - INFO - Epoch: 4.65 | loss: 0.000100 | grad_norm: 0.000450 | learning_rate: 0.000000 2025-04-12 17:25:07,440 - INFO - Epoch: 4.65 | loss: 0.000100 | grad_norm: 0.000359 | learning_rate: 0.000000 2025-04-12 17:25:13,310 - INFO - Epoch: 4.65 | loss: 0.000100 | grad_norm: 0.000293 | learning_rate: 0.000000 2025-04-12 17:25:19,559 - INFO - Epoch: 4.65 | loss: 0.000100 | grad_norm: 0.000362 | learning_rate: 0.000000 2025-04-12 17:25:26,021 - INFO - Epoch: 4.66 | loss: 0.000100 | grad_norm: 0.000322 | learning_rate: 0.000000 2025-04-12 17:25:32,240 - INFO - Epoch: 4.66 | loss: 0.000100 | grad_norm: 0.000475 | learning_rate: 0.000000 2025-04-12 17:25:38,561 - INFO - Epoch: 4.66 | loss: 0.000100 | grad_norm: 0.000435 | learning_rate: 0.000000 2025-04-12 17:25:45,029 - INFO - Epoch: 4.67 | loss: 0.000100 | grad_norm: 0.000281 | learning_rate: 0.000000 2025-04-12 17:25:51,003 - INFO - Epoch: 4.67 | loss: 0.000100 | grad_norm: 0.000545 | learning_rate: 0.000000 2025-04-12 17:25:57,431 - INFO - Epoch: 4.67 | loss: 0.000100 | grad_norm: 0.000366 | learning_rate: 0.000000 2025-04-12 17:26:03,831 - INFO 
- Epoch: 4.67 | loss: 0.000100 | grad_norm: 0.000433 | learning_rate: 0.000000 2025-04-12 17:26:10,022 - INFO - Epoch: 4.68 | loss: 0.000100 | grad_norm: 0.000337 | learning_rate: 0.000000 2025-04-12 17:26:16,464 - INFO - Epoch: 4.68 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000000 2025-04-12 17:26:22,541 - INFO - Epoch: 4.68 | loss: 0.000100 | grad_norm: 0.000714 | learning_rate: 0.000000 2025-04-12 17:26:28,768 - INFO - Epoch: 4.69 | loss: 0.000100 | grad_norm: 0.000327 | learning_rate: 0.000000 2025-04-12 17:26:34,861 - INFO - Epoch: 4.69 | loss: 0.000100 | grad_norm: 0.000459 | learning_rate: 0.000000 2025-04-12 17:26:40,774 - INFO - Epoch: 4.69 | loss: 0.000100 | grad_norm: 0.000374 | learning_rate: 0.000000 2025-04-12 17:26:46,845 - INFO - Epoch: 4.69 | loss: 0.000100 | grad_norm: 0.000373 | learning_rate: 0.000000 2025-04-12 17:26:53,238 - INFO - Epoch: 4.70 | loss: 0.000100 | grad_norm: 0.000334 | learning_rate: 0.000000 2025-04-12 17:26:59,290 - INFO - Epoch: 4.70 | loss: 0.000100 | grad_norm: 0.000357 | learning_rate: 0.000000 2025-04-12 17:27:05,703 - INFO - Epoch: 4.70 | loss: 0.000100 | grad_norm: 0.000303 | learning_rate: 0.000000 2025-04-12 17:27:11,936 - INFO - Epoch: 4.70 | loss: 0.000100 | grad_norm: 0.000285 | learning_rate: 0.000000 2025-04-12 17:27:17,925 - INFO - Epoch: 4.71 | loss: 0.000100 | grad_norm: 0.000425 | learning_rate: 0.000000 2025-04-12 17:27:23,813 - INFO - Epoch: 4.71 | loss: 0.000100 | grad_norm: 0.000407 | learning_rate: 0.000000 2025-04-12 17:27:30,174 - INFO - Epoch: 4.71 | loss: 0.000100 | grad_norm: 0.000321 | learning_rate: 0.000000 2025-04-12 17:27:36,413 - INFO - Epoch: 4.72 | loss: 0.000100 | grad_norm: 0.000280 | learning_rate: 0.000000 2025-04-12 17:27:42,342 - INFO - Epoch: 4.72 | loss: 0.000100 | grad_norm: 0.000622 | learning_rate: 0.000000 2025-04-12 17:27:48,449 - INFO - Epoch: 4.72 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000000 2025-04-12 17:27:54,785 - INFO - Epoch: 4.72 | 
loss: 0.000100 | grad_norm: 0.000367 | learning_rate: 0.000000 2025-04-12 17:28:00,975 - INFO - Epoch: 4.73 | loss: 0.000100 | grad_norm: 0.000310 | learning_rate: 0.000000 2025-04-12 17:28:06,784 - INFO - Epoch: 4.73 | loss: 0.000100 | grad_norm: 0.000565 | learning_rate: 0.000000 2025-04-12 17:28:13,149 - INFO - Epoch: 4.73 | loss: 0.000100 | grad_norm: 0.000260 | learning_rate: 0.000000 2025-04-12 17:28:19,234 - INFO - Epoch: 4.73 | loss: 0.000100 | grad_norm: 0.000330 | learning_rate: 0.000000 2025-04-12 17:28:24,976 - INFO - Epoch: 4.74 | loss: 0.000100 | grad_norm: 0.000282 | learning_rate: 0.000000 2025-04-12 17:28:31,149 - INFO - Epoch: 4.74 | loss: 0.000100 | grad_norm: 0.000435 | learning_rate: 0.000000 2025-04-12 17:28:37,169 - INFO - Epoch: 4.74 | loss: 0.000100 | grad_norm: 0.000468 | learning_rate: 0.000000 2025-04-12 17:28:42,974 - INFO - Epoch: 4.75 | loss: 0.000100 | grad_norm: 0.000597 | learning_rate: 0.000000 2025-04-12 17:28:48,995 - INFO - Epoch: 4.75 | loss: 0.000100 | grad_norm: 0.000448 | learning_rate: 0.000000 2025-04-12 17:28:54,940 - INFO - Epoch: 4.75 | loss: 0.000100 | grad_norm: 0.000529 | learning_rate: 0.000000 2025-04-12 17:29:01,159 - INFO - Epoch: 4.75 | loss: 0.000100 | grad_norm: 0.000375 | learning_rate: 0.000000 2025-04-12 17:29:07,042 - INFO - Epoch: 4.76 | loss: 0.000100 | grad_norm: 0.000284 | learning_rate: 0.000000 2025-04-12 17:29:13,046 - INFO - Epoch: 4.76 | loss: 0.000100 | grad_norm: 0.000345 | learning_rate: 0.000000 2025-04-12 17:29:19,085 - INFO - Epoch: 4.76 | loss: 0.000100 | grad_norm: 0.000522 | learning_rate: 0.000000 2025-04-12 17:29:24,867 - INFO - Epoch: 4.76 | loss: 0.000100 | grad_norm: 0.000387 | learning_rate: 0.000000 2025-04-12 17:29:31,225 - INFO - Epoch: 4.77 | loss: 0.000100 | grad_norm: 0.000249 | learning_rate: 0.000000 2025-04-12 17:29:37,759 - INFO - Epoch: 4.77 | loss: 0.000100 | grad_norm: 0.000984 | learning_rate: 0.000000 2025-04-12 17:29:43,914 - INFO - Epoch: 4.77 | loss: 0.000100 | 
grad_norm: 0.000363 | learning_rate: 0.000000 2025-04-12 17:29:50,108 - INFO - Epoch: 4.78 | loss: 0.000100 | grad_norm: 0.000350 | learning_rate: 0.000000 2025-04-12 17:29:56,062 - INFO - Epoch: 4.78 | loss: 0.000100 | grad_norm: 0.000300 | learning_rate: 0.000000 2025-04-12 17:30:02,525 - INFO - Epoch: 4.78 | loss: 0.000100 | grad_norm: 0.000498 | learning_rate: 0.000000 2025-04-12 17:30:08,606 - INFO - Epoch: 4.78 | loss: 0.000100 | grad_norm: 0.000271 | learning_rate: 0.000000 2025-04-12 17:30:14,537 - INFO - Epoch: 4.79 | loss: 0.000100 | grad_norm: 0.000737 | learning_rate: 0.000000 2025-04-12 17:30:20,117 - INFO - Epoch: 4.79 | loss: 0.000100 | grad_norm: 0.000415 | learning_rate: 0.000000 2025-04-12 17:30:26,133 - INFO - Epoch: 4.79 | loss: 0.000100 | grad_norm: 0.000310 | learning_rate: 0.000000 2025-04-12 17:30:32,209 - INFO - Epoch: 4.79 | loss: 0.000100 | grad_norm: 0.000400 | learning_rate: 0.000000 2025-04-12 17:30:38,148 - INFO - Epoch: 4.80 | loss: 0.000100 | grad_norm: 0.000750 | learning_rate: 0.000000 2025-04-12 17:30:44,222 - INFO - Epoch: 4.80 | loss: 0.000100 | grad_norm: 0.000334 | learning_rate: 0.000000 2025-04-12 17:30:49,953 - INFO - Epoch: 4.80 | loss: 0.000100 | grad_norm: 0.000339 | learning_rate: 0.000000 2025-04-12 17:30:55,706 - INFO - Epoch: 4.81 | loss: 0.000100 | grad_norm: 0.000299 | learning_rate: 0.000000 2025-04-12 17:31:02,029 - INFO - Epoch: 4.81 | loss: 0.000100 | grad_norm: 0.000342 | learning_rate: 0.000000 2025-04-12 17:31:07,679 - INFO - Epoch: 4.81 | loss: 0.000100 | grad_norm: 0.000328 | learning_rate: 0.000000 2025-04-12 17:31:13,509 - INFO - Epoch: 4.81 | loss: 0.000100 | grad_norm: 0.000319 | learning_rate: 0.000000 2025-04-12 17:31:19,812 - INFO - Epoch: 4.82 | loss: 0.000100 | grad_norm: 0.000348 | learning_rate: 0.000000 2025-04-12 17:31:26,254 - INFO - Epoch: 4.82 | loss: 0.000100 | grad_norm: 0.000548 | learning_rate: 0.000000 2025-04-12 17:31:32,361 - INFO - Epoch: 4.82 | loss: 0.000100 | grad_norm: 0.000661 
| learning_rate: 0.000000 2025-04-12 17:31:38,138 - INFO - Epoch: 4.82 | loss: 0.000100 | grad_norm: 0.000483 | learning_rate: 0.000000 2025-04-12 17:31:44,208 - INFO - Epoch: 4.83 | loss: 0.000100 | grad_norm: 0.000334 | learning_rate: 0.000000 2025-04-12 17:31:50,541 - INFO - Epoch: 4.83 | loss: 0.000100 | grad_norm: 0.000495 | learning_rate: 0.000000 2025-04-12 17:31:56,737 - INFO - Epoch: 4.83 | loss: 0.000100 | grad_norm: 0.000374 | learning_rate: 0.000000 2025-04-12 17:32:03,007 - INFO - Epoch: 4.84 | loss: 0.000100 | grad_norm: 0.000295 | learning_rate: 0.000000 2025-04-12 17:32:09,245 - INFO - Epoch: 4.84 | loss: 0.000100 | grad_norm: 0.000338 | learning_rate: 0.000000 2025-04-12 17:32:15,055 - INFO - Epoch: 4.84 | loss: 0.000100 | grad_norm: 0.000338 | learning_rate: 0.000000 2025-04-12 17:32:21,545 - INFO - Epoch: 4.84 | loss: 0.000100 | grad_norm: 0.000361 | learning_rate: 0.000000 2025-04-12 17:32:27,613 - INFO - Epoch: 4.85 | loss: 0.000100 | grad_norm: 0.000351 | learning_rate: 0.000000 2025-04-12 17:32:33,832 - INFO - Epoch: 4.85 | loss: 0.000100 | grad_norm: 0.000372 | learning_rate: 0.000000 2025-04-12 17:32:40,108 - INFO - Epoch: 4.85 | loss: 0.000100 | grad_norm: 0.000495 | learning_rate: 0.000000 2025-04-12 17:32:46,250 - INFO - Epoch: 4.85 | loss: 0.000100 | grad_norm: 0.000391 | learning_rate: 0.000000 2025-04-12 17:32:52,619 - INFO - Epoch: 4.86 | loss: 0.000100 | grad_norm: 0.000529 | learning_rate: 0.000000 2025-04-12 17:32:58,601 - INFO - Epoch: 4.86 | loss: 0.000100 | grad_norm: 0.000376 | learning_rate: 0.000000 2025-04-12 17:33:04,351 - INFO - Epoch: 4.86 | loss: 0.000100 | grad_norm: 0.000514 | learning_rate: 0.000000 2025-04-12 17:33:10,433 - INFO - Epoch: 4.87 | loss: 0.000100 | grad_norm: 0.000571 | learning_rate: 0.000000 2025-04-12 17:33:16,321 - INFO - Epoch: 4.87 | loss: 0.000100 | grad_norm: 0.000511 | learning_rate: 0.000000 2025-04-12 17:33:22,240 - INFO - Epoch: 4.87 | loss: 0.000100 | grad_norm: 0.000401 | learning_rate: 
0.000000 2025-04-12 17:33:28,364 - INFO - Epoch: 4.87 | loss: 0.000100 | grad_norm: 0.000434 | learning_rate: 0.000000 2025-04-12 17:33:34,808 - INFO - Epoch: 4.88 | loss: 0.000100 | grad_norm: 0.000213 | learning_rate: 0.000000 2025-04-12 17:33:40,997 - INFO - Epoch: 4.88 | loss: 0.000100 | grad_norm: 0.000361 | learning_rate: 0.000000 2025-04-12 17:33:47,009 - INFO - Epoch: 4.88 | loss: 0.000100 | grad_norm: 0.000410 | learning_rate: 0.000000 2025-04-12 17:33:52,980 - INFO - Epoch: 4.89 | loss: 0.000100 | grad_norm: 0.000275 | learning_rate: 0.000000 2025-04-12 17:33:58,865 - INFO - Epoch: 4.89 | loss: 0.000100 | grad_norm: 0.000272 | learning_rate: 0.000000 2025-04-12 17:34:05,010 - INFO - Epoch: 4.89 | loss: 0.000100 | grad_norm: 0.000297 | learning_rate: 0.000000 2025-04-12 17:34:11,204 - INFO - Epoch: 4.89 | loss: 0.000100 | grad_norm: 0.000324 | learning_rate: 0.000000 2025-04-12 17:34:17,425 - INFO - Epoch: 4.90 | loss: 0.000100 | grad_norm: 0.000318 | learning_rate: 0.000000 2025-04-12 17:34:23,355 - INFO - Epoch: 4.90 | loss: 0.000100 | grad_norm: 0.000425 | learning_rate: 0.000000 2025-04-12 17:34:29,226 - INFO - Epoch: 4.90 | loss: 0.000100 | grad_norm: 0.000234 | learning_rate: 0.000000 2025-04-12 17:34:35,144 - INFO - Epoch: 4.90 | loss: 0.000100 | grad_norm: 0.000401 | learning_rate: 0.000000 2025-04-12 17:34:41,604 - INFO - Epoch: 4.91 | loss: 0.000100 | grad_norm: 0.000395 | learning_rate: 0.000000 2025-04-12 17:34:47,508 - INFO - Epoch: 4.91 | loss: 0.000100 | grad_norm: 0.000263 | learning_rate: 0.000000 2025-04-12 17:34:53,872 - INFO - Epoch: 4.91 | loss: 0.000100 | grad_norm: 0.000444 | learning_rate: 0.000000 2025-04-12 17:35:00,007 - INFO - Epoch: 4.92 | loss: 0.000100 | grad_norm: 0.000279 | learning_rate: 0.000000 2025-04-12 17:35:06,066 - INFO - Epoch: 4.92 | loss: 0.000100 | grad_norm: 0.000458 | learning_rate: 0.000000 2025-04-12 17:35:12,004 - INFO - Epoch: 4.92 | loss: 0.000100 | grad_norm: 0.000385 | learning_rate: 0.000000 2025-04-12 
17:35:18,148 - INFO - Epoch: 4.92 | loss: 0.000100 | grad_norm: 0.000371 | learning_rate: 0.000000
2025-04-12 17:35:24,486 - INFO - Epoch: 4.93 | loss: 0.000100 | grad_norm: 0.000417 | learning_rate: 0.000000
2025-04-12 17:35:30,752 - INFO - Epoch: 4.93 | loss: 0.000100 | grad_norm: 0.000434 | learning_rate: 0.000000
2025-04-12 17:35:36,835 - INFO - Epoch: 4.93 | loss: 0.000100 | grad_norm: 0.000322 | learning_rate: 0.000000
2025-04-12 17:35:42,938 - INFO - Epoch: 4.93 | loss: 0.000100 | grad_norm: 0.000255 | learning_rate: 0.000000
2025-04-12 17:35:49,315 - INFO - Epoch: 4.94 | loss: 0.000100 | grad_norm: 0.000352 | learning_rate: 0.000000
2025-04-12 17:35:55,339 - INFO - Epoch: 4.94 | loss: 0.000100 | grad_norm: 0.000488 | learning_rate: 0.000000
2025-04-12 17:36:01,507 - INFO - Epoch: 4.94 | loss: 0.000100 | grad_norm: 0.000464 | learning_rate: 0.000000
2025-04-12 17:36:07,663 - INFO - Epoch: 4.95 | loss: 0.000100 | grad_norm: 0.000419 | learning_rate: 0.000000
2025-04-12 17:36:14,039 - INFO - Epoch: 4.95 | loss: 0.000100 | grad_norm: 0.000273 | learning_rate: 0.000000
2025-04-12 17:36:20,151 - INFO - Epoch: 4.95 | loss: 0.000100 | grad_norm: 0.000627 | learning_rate: 0.000000
2025-04-12 17:36:26,097 - INFO - Epoch: 4.95 | loss: 0.000100 | grad_norm: 0.000312 | learning_rate: 0.000000
2025-04-12 17:36:31,849 - INFO - Epoch: 4.96 | loss: 0.000100 | grad_norm: 0.000553 | learning_rate: 0.000000
2025-04-12 17:36:37,730 - INFO - Epoch: 4.96 | loss: 0.000100 | grad_norm: 0.000340 | learning_rate: 0.000000
2025-04-12 17:36:43,740 - INFO - Epoch: 4.96 | loss: 0.000100 | grad_norm: 0.000437 | learning_rate: 0.000000
2025-04-12 17:36:49,871 - INFO - Epoch: 4.96 | loss: 0.000100 | grad_norm: 0.000314 | learning_rate: 0.000000
2025-04-12 17:36:55,909 - INFO - Epoch: 4.97 | loss: 0.000100 | grad_norm: 0.000452 | learning_rate: 0.000000
2025-04-12 17:37:02,091 - INFO - Epoch: 4.97 | loss: 0.000100 | grad_norm: 0.000271 | learning_rate: 0.000000
2025-04-12 17:37:08,147 - INFO - Epoch: 4.97 | loss: 0.000100 | grad_norm: 0.000382 | learning_rate: 0.000000
2025-04-12 17:37:14,349 - INFO - Epoch: 4.98 | loss: 0.000100 | grad_norm: 0.000310 | learning_rate: 0.000000
2025-04-12 17:37:20,554 - INFO - Epoch: 4.98 | loss: 0.000100 | grad_norm: 0.000393 | learning_rate: 0.000000
2025-04-12 17:37:26,471 - INFO - Epoch: 4.98 | loss: 0.000100 | grad_norm: 0.000405 | learning_rate: 0.000000
2025-04-12 17:37:32,777 - INFO - Epoch: 4.98 | loss: 0.000100 | grad_norm: 0.000572 | learning_rate: 0.000000
2025-04-12 17:37:38,391 - INFO - Epoch: 4.99 | loss: 0.000100 | grad_norm: 0.000234 | learning_rate: 0.000000
2025-04-12 17:37:44,502 - INFO - Epoch: 4.99 | loss: 0.000100 | grad_norm: 0.000276 | learning_rate: 0.000000
2025-04-12 17:37:50,909 - INFO - Epoch: 4.99 | loss: 0.000100 | grad_norm: 0.000269 | learning_rate: 0.000000
2025-04-12 17:37:57,006 - INFO - Epoch: 4.99 | loss: 0.000100 | grad_norm: 0.000599 | learning_rate: 0.000000
2025-04-12 17:38:03,050 - INFO - Epoch: 5.00 | loss: 0.000100 | grad_norm: 0.000355 | learning_rate: 0.000000
2025-04-12 17:38:09,346 - INFO - Epoch: 5.00 | train_runtime: 11173.360900 | train_samples_per_second: 13.065000 | train_steps_per_second: 1.633000 | total_flos: 0.000000 | train_loss: 0.000087
2025-04-12 17:38:09,427 - INFO - Training complete. Attempting to save final model...
2025-04-12 17:38:09,427 - INFO - Attempting standard model save...
2025-04-12 17:38:10,544 - INFO - Model successfully saved with standard method to gliner_finetuned_20250412_143141
2025-04-12 17:38:10,545 - INFO - Model successfully saved to gliner_finetuned_20250412_143141
2025-04-12 17:38:10,545 - INFO - Testing the saved model...
2025-04-12 17:38:10,545 - INFO - Testing the saved model...
2025-04-12 17:38:14,485 - INFO - Model loaded successfully with standard method
2025-04-12 17:38:14,486 - INFO - Running prediction on test text...
2025-04-12 17:38:16,236 - INFO - Predicted entities:
2025-04-12 17:38:16,236 - INFO - Ola Nordmann => PERSON
2025-04-12 17:38:16,236 - INFO - 15.04.2025 => DATE_TIME
2025-04-12 17:38:16,236 - INFO - Kari Hansen => PERSON
2025-04-12 17:38:16,236 - INFO - Storgata 123, Oslo => NO_ADDRESS
2025-04-12 17:38:16,236 - INFO - +47 98765432 => NO_PHONE_NUMBER
2025-04-12 17:38:16,236 - INFO - kari.hansen@example.no => EMAIL_ADDRESS
2025-04-12 17:38:16,236 - INFO - sesongallergi => HEALTH_INFO
2025-04-12 17:38:16,236 - INFO - Model testing completed successfully!
2025-04-12 17:38:16,371 - INFO - Performing detailed evaluation...
2025-04-12 17:38:16,371 - INFO - Performing detailed model evaluation...
2025-04-12 17:41:13,577 - INFO - Evaluation Results:
2025-04-12 17:41:13,577 - INFO - Entity Type          Precision  Recall     F1 Score   Support
2025-04-12 17:41:13,577 - INFO - AGE                  0.0000     0.0000     0.0000     1
2025-04-12 17:41:13,577 - INFO - AGE_INFO             0.0000     0.0000     0.0000     1
2025-04-12 17:41:13,577 - INFO - ANIMAL_INFO          0.0000     0.0000     0.0000     0
2025-04-12 17:41:13,577 - INFO - BEHAVIORAL_PATTERN   0.0000     0.0000     0.0000     1584
2025-04-12 17:41:13,577 - INFO - CONTEXT_SENSITIVE    0.0000     0.0000     0.0000     2234
2025-04-12 17:41:13,577 - INFO - CRIMINAL_RECORD      0.0000     0.0000     0.0000     3567
2025-04-12 17:41:13,577 - INFO - DATE_TIME            0.0000     0.0000     0.0000     5790
2025-04-12 17:41:13,578 - INFO - ECONOMIC_STATUS      0.0000     0.0000     0.0000     1489
2025-04-12 17:41:13,578 - INFO - EMAIL_ADDRESS        0.0000     0.0000     0.0000     3099
2025-04-12 17:41:13,578 - INFO - EMPLOYMENT_INFO      0.0000     0.0000     0.0000     2130
2025-04-12 17:41:13,578 - INFO - FAMILY_RELATION      0.0000     0.0000     0.0000     2598
2025-04-12 17:41:13,578 - INFO - FINANCIAL_INFO       0.0000     0.0000     0.0000     2093
2025-04-12 17:41:13,578 - INFO - GOV_ID               0.0000     0.0000     0.0000     2211
2025-04-12 17:41:13,578 - INFO - HEALTH_INFO          0.0009     0.0006     0.0007     4879
2025-04-12 17:41:13,578 - INFO - IDENTIFIABLE_IMAGE   0.0008     0.0007     0.0008     1407
2025-04-12 17:41:13,578 - INFO - NO_ADDRESS           0.0000     0.0000     0.0000     4247
2025-04-12 17:41:13,578 - INFO - NO_PHONE_NUMBER      0.0000     0.0000     0.0000     3103
2025-04-12 17:41:13,578 - INFO - PERSON               0.0001     0.0001     0.0001     7990
2025-04-12 17:41:13,578 - INFO - POLITICAL_CASE       0.0000     0.0000     0.0000     1930
2025-04-12 17:41:13,578 - INFO - POSTAL_CODE          0.0043     0.0029     0.0035     690
2025-04-12 17:41:13,578 - INFO - SEXUAL_ORIENTATION   0.0083     0.0066     0.0073     1221
2025-04-12 17:41:13,578 - INFO - --------------------------------------------------------------------------------
2025-04-12 17:41:13,578 - INFO - Overall              0.0004     0.0003     0.0003     52264
2025-04-12 17:41:13,578 - WARNING - Could not create confusion matrix visualization: No module named 'matplotlib'
2025-04-12 17:41:13,579 - INFO - Evaluation results saved to gliner_finetuned_20250412_143141/evaluation_results.json
2025-04-12 17:41:13,579 - INFO - Model testing completed successfully!
2025-04-12 17:41:13,579 - INFO - ---------------------------------------
2025-04-12 17:41:13,579 - INFO - Final Model Evaluation Test
2025-04-12 17:41:13,579 - INFO - ---------------------------------------
2025-04-12 17:41:17,204 - INFO - Running entity detection test...
2025-04-12 17:41:19,131 - INFO - Detected entities in test:
2025-04-12 17:41:19,131 - INFO - Ola Nordmann => PERSON (confidence: 0.965)
2025-04-12 17:41:19,131 - INFO - 22 40 00 00 => NO_PHONE_NUMBER (confidence: 0.980)
2025-04-12 17:41:19,131 - INFO - postmottak@mattilsynet.no => EMAIL_ADDRESS (confidence: 0.952)
2025-04-12 17:41:19,131 - INFO - Felles postmottak, Postboks 383 2381 Brumunddal => NO_ADDRESS (confidence: 0.650)
2025-04-12 17:41:19,131 - INFO - 22. februar 2025 => DATE_TIME (confidence: 0.953)
2025-04-12 17:41:19,131 - INFO - Training and evaluation completed. Model is ready for use.
2025-04-12 17:41:19,131 - INFO - ===============================================
2025-04-12 17:41:19,131 - INFO - GLiNER training process complete. Output directory: gliner_finetuned_20250412_143141
2025-04-12 17:41:19,131 - INFO - ================================================