[INFO|2025-04-21 17:36:49] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/config.json
[INFO|2025-04-21 17:36:49] configuration_utils.py:771 >> Model config LlavaNextConfig {
  "_name_or_path": "llava-hf/llava-v1.6-mistral-7b-hf",
  "architectures": [
    "LlavaNextForConditionalGeneration"
  ],
  "ignore_index": -100,
  "image_grid_pinpoints": [
    [336, 672],
    [672, 336],
    [672, 672],
    [1008, 336],
    [336, 1008]
  ],
  "image_seq_length": 576,
  "image_token_index": 32000,
  "model_type": "llava_next",
  "multimodal_projector_bias": true,
  "projector_hidden_act": "gelu",
  "text_config": {
    "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
    "architectures": [
      "MistralForCausalLM"
    ],
    "intermediate_size": 14336,
    "max_position_embeddings": 32768,
    "model_type": "mistral",
    "num_key_value_heads": 8,
    "rms_norm_eps": 1e-05,
    "rope_theta": 1000000.0,
    "sliding_window": null,
    "torch_dtype": "bfloat16",
    "vocab_size": 32064
  },
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.49.0",
  "use_image_newline_parameter": true,
  "vision_config": {
    "hidden_size": 1024,
    "image_size": 336,
    "intermediate_size": 4096,
    "model_type": "clip_vision_model",
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "patch_size": 14,
    "projection_dim": 768,
    "vocab_size": 32000
  },
  "vision_feature_layer": -2,
  "vision_feature_select_strategy": "default",
  "vocab_size": 32064
}
[INFO|2025-04-21 17:36:49] tokenization_utils_base.py:2050 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/tokenizer.model
[INFO|2025-04-21 17:36:49] tokenization_utils_base.py:2050 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/tokenizer.json
[INFO|2025-04-21 17:36:49] tokenization_utils_base.py:2050 >> loading file added_tokens.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/added_tokens.json
[INFO|2025-04-21 17:36:49] tokenization_utils_base.py:2050 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/special_tokens_map.json
[INFO|2025-04-21 17:36:49] tokenization_utils_base.py:2050 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/tokenizer_config.json
[INFO|2025-04-21 17:36:49] tokenization_utils_base.py:2050 >> loading file chat_template.jinja from cache at None
[INFO|2025-04-21 17:36:49] tokenization_utils_base.py:2313 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
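The two entry groups above pull the LlavaNextConfig and the tokenizer files from the local Hugging Face cache. A minimal sketch of reproducing that step with the public transformers API; only the model ID is taken from the log, the rest is stock usage:

```python
# Hedged sketch: reproduce the config/tokenizer loading shown in the log.
from transformers import AutoConfig, AutoTokenizer

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # from the log

config = AutoConfig.from_pretrained(model_id)        # resolves to LlavaNextConfig
tokenizer = AutoTokenizer.from_pretrained(model_id)  # resolves to LlamaTokenizerFast

print(config.model_type)         # "llava_next"
print(config.image_token_index)  # 32000, the image placeholder token id
print(len(tokenizer))            # base vocab plus the added special tokens
```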
[INFO|2025-04-21 17:36:50] processing_utils.py:816 >> loading configuration file processor_config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/processor_config.json
[INFO|2025-04-21 17:36:50] image_processing_base.py:381 >> loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/preprocessor_config.json
[WARNING|2025-04-21 17:36:50] logging.py:329 >> Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[INFO|2025-04-21 17:36:50] image_processing_base.py:434 >> Image processor LlavaNextImageProcessor {
  "aspect_ratio_setting": "anyres",
  "crop_size": {
    "height": 336,
    "width": 336
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_pad": true,
  "do_rescale": true,
  "do_resize": true,
  "image_grid_pinpoints": [
    [336, 672],
    [672, 336],
    [672, 672],
    [1008, 336],
    [336, 1008]
  ],
  "image_mean": [0.48145466, 0.4578275, 0.40821073],
  "image_processor_type": "LlavaNextImageProcessor",
  "image_std": [0.26862954, 0.26130258, 0.27577711],
  "processor_class": "LlavaNextProcessor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 336
  }
}
[INFO|2025-04-21 17:36:50] tokenization_utils_base.py:2050 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/tokenizer.model
[INFO|2025-04-21 17:36:50] tokenization_utils_base.py:2050 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/tokenizer.json
[INFO|2025-04-21 17:36:50] tokenization_utils_base.py:2050 >> loading file added_tokens.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/added_tokens.json
[INFO|2025-04-21 17:36:50] tokenization_utils_base.py:2050 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/special_tokens_map.json
[INFO|2025-04-21 17:36:50] tokenization_utils_base.py:2050 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/tokenizer_config.json
[INFO|2025-04-21 17:36:50] tokenization_utils_base.py:2050 >> loading file chat_template.jinja from cache at None
[INFO|2025-04-21 17:36:50] tokenization_utils_base.py:2313 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
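The WARNING above is informational: the checkpoint was saved with a slow (PIL-based) image processor, and transformers is announcing that `use_fast=True` will become the default. Passing the flag explicitly pins the behavior either way; a sketch, assuming reproducibility with the saved processor matters more than speed:

```python
# Hedged sketch: pin the image-processor implementation to silence the warning.
from transformers import AutoImageProcessor

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"

# use_fast=False keeps the slow processor the checkpoint was saved with;
# use_fast=True opts into the fast variant (minor numerical differences).
image_processor = AutoImageProcessor.from_pretrained(model_id, use_fast=False)
print(type(image_processor).__name__)  # LlavaNextImageProcessor
```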
[INFO|2025-04-21 17:36:50] processing_utils.py:816 >> loading configuration file processor_config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/processor_config.json
[INFO|2025-04-21 17:36:51] processing_utils.py:876 >> Processor LlavaNextProcessor:
- image_processor: LlavaNextImageProcessor {
  "aspect_ratio_setting": "anyres",
  "crop_size": {
    "height": 336,
    "width": 336
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_pad": true,
  "do_rescale": true,
  "do_resize": true,
  "image_grid_pinpoints": [
    [336, 672],
    [672, 336],
    [672, 672],
    [1008, 336],
    [336, 1008]
  ],
  "image_mean": [0.48145466, 0.4578275, 0.40821073],
  "image_processor_type": "LlavaNextImageProcessor",
  "image_std": [0.26862954, 0.26130258, 0.27577711],
  "processor_class": "LlavaNextProcessor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 336
  }
}
- tokenizer: LlamaTokenizerFast(name_or_path='llava-hf/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>', 'image_token': '<image>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
  0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
)
{
  "image_token": "<image>",
  "num_additional_image_tokens": 1,
  "patch_size": 14,
  "processor_class": "LlavaNextProcessor",
  "vision_feature_select_strategy": "default"
}
[INFO|2025-04-21 17:36:51] logging.py:157 >> Loading dataset MattCoddity/dockerNLcommands...
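The processor bundles the image processor and tokenizer dumped above, and the run then loads the MattCoddity/dockerNLcommands dataset. A sketch of the same two steps; the dataset's column layout is not shown in the log, so inspect an example before writing any mapping code:

```python
# Hedged sketch: load the processor and the dataset named in the log.
from datasets import load_dataset
from transformers import LlavaNextProcessor

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
dataset = load_dataset("MattCoddity/dockerNLcommands")

print(dataset)              # split names and row counts
print(dataset["train"][0])  # one raw example; column names vary by dataset
```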
[INFO|2025-04-21 17:36:53] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/config.json
[INFO|2025-04-21 17:36:53] configuration_utils.py:771 >> Model config LlavaNextConfig {
  "_name_or_path": "llava-hf/llava-v1.6-mistral-7b-hf",
  "architectures": [
    "LlavaNextForConditionalGeneration"
  ],
  "ignore_index": -100,
  "image_grid_pinpoints": [
    [336, 672],
    [672, 336],
    [672, 672],
    [1008, 336],
    [336, 1008]
  ],
  "image_seq_length": 576,
  "image_token_index": 32000,
  "model_type": "llava_next",
  "multimodal_projector_bias": true,
  "projector_hidden_act": "gelu",
  "text_config": {
    "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
    "architectures": [
      "MistralForCausalLM"
    ],
    "intermediate_size": 14336,
    "max_position_embeddings": 32768,
    "model_type": "mistral",
    "num_key_value_heads": 8,
    "rms_norm_eps": 1e-05,
    "rope_theta": 1000000.0,
    "sliding_window": null,
    "torch_dtype": "bfloat16",
    "vocab_size": 32064
  },
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.49.0",
  "use_image_newline_parameter": true,
  "vision_config": {
    "hidden_size": 1024,
    "image_size": 336,
    "intermediate_size": 4096,
    "model_type": "clip_vision_model",
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "patch_size": 14,
    "projection_dim": 768,
    "vocab_size": 32000
  },
  "vision_feature_layer": -2,
  "vision_feature_select_strategy": "default",
  "vocab_size": 32064
}
[INFO|2025-04-21 17:36:53] logging.py:157 >> Quantizing model to 4 bit with bitsandbytes.
[INFO|2025-04-21 17:36:53] modeling_utils.py:3982 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/model.safetensors.index.json
[INFO|2025-04-21 17:36:53] modeling_utils.py:1633 >> Instantiating LlavaNextForConditionalGeneration model under default dtype torch.float16.
[INFO|2025-04-21 17:36:53] configuration_utils.py:1140 >> Generate config GenerationConfig {}
[INFO|2025-04-21 17:36:54] modeling_utils.py:1633 >> Instantiating CLIPVisionModel model under default dtype torch.float16.
[INFO|2025-04-21 17:36:54] modeling_utils.py:1633 >> Instantiating MistralForCausalLM model under default dtype torch.float16.
[INFO|2025-04-21 17:36:54] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}
[INFO|2025-04-21 17:37:57] modeling_utils.py:4970 >> All model checkpoint weights were used when initializing LlavaNextForConditionalGeneration.
[INFO|2025-04-21 17:37:57] modeling_utils.py:4978 >> All the weights of LlavaNextForConditionalGeneration were initialized from the model checkpoint at llava-hf/llava-v1.6-mistral-7b-hf. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlavaNextForConditionalGeneration for predictions without further training.
[INFO|2025-04-21 17:37:58] configuration_utils.py:1095 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/generation_config.json
[INFO|2025-04-21 17:37:58] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}
[INFO|2025-04-21 17:37:58] logging.py:157 >> Gradient checkpointing enabled.
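The log reports 4-bit bitsandbytes quantization, a float16 load, and gradient checkpointing, but not the exact quantization settings. A minimal sketch assuming the common QLoRA-style defaults; treat nf4, double quantization, and the compute dtype as assumptions:

```python
# Hedged sketch of the 4-bit model load. nf4 / double-quant / fp16 compute are
# assumptions -- the log only says "Quantizing model to 4 bit with bitsandbytes."
import torch
from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # matches torch_dtype float16 in the log
)

model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)
model.gradient_checkpointing_enable()  # "Gradient checkpointing enabled."
```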
[INFO|2025-04-21 17:37:58] logging.py:157 >> Casting multimodal projector outputs in torch.float16.
[INFO|2025-04-21 17:37:58] logging.py:157 >> Using torch SDPA for faster training and inference.
[INFO|2025-04-21 17:37:58] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2025-04-21 17:37:58] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2025-04-21 17:37:58] logging.py:157 >> Found linear modules: q_proj,v_proj,k_proj,gate_proj,up_proj,o_proj,down_proj
[INFO|2025-04-21 17:37:58] logging.py:157 >> Set vision model not trainable: ['vision_tower'].
[INFO|2025-04-21 17:37:58] logging.py:157 >> Set multi model projector not trainable: multi_modal_projector.
[INFO|2025-04-21 17:37:58] logging.py:157 >> trainable params: 20,971,520 || all params: 7,587,719,168 || trainable%: 0.2764
[INFO|2025-04-21 17:37:58] trainer.py:746 >> Using auto half precision backend
[WARNING|2025-04-21 17:37:58] trainer.py:781 >> No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
[INFO|2025-04-21 17:37:59] trainer.py:2405 >> ***** Running training *****
[INFO|2025-04-21 17:37:59] trainer.py:2406 >> Num examples = 2,415
[INFO|2025-04-21 17:37:59] trainer.py:2407 >> Num Epochs = 3
[INFO|2025-04-21 17:37:59] trainer.py:2408 >> Instantaneous batch size per device = 2
[INFO|2025-04-21 17:37:59] trainer.py:2411 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|2025-04-21 17:37:59] trainer.py:2412 >> Gradient Accumulation steps = 8
[INFO|2025-04-21 17:37:59] trainer.py:2413 >> Total optimization steps = 453
[INFO|2025-04-21 17:37:59] trainer.py:2414 >> Number of trainable parameters = 20,971,520
[WARNING|2025-04-21 17:38:00] logging.py:329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
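Two things in this block are worth unpacking. First, the batch arithmetic: 2 examples per device with 8 gradient-accumulation steps gives the effective batch of 16, and ceil(2,415 / 16) = 151 optimizer updates per epoch, times 3 epochs, yields the logged 453 total optimization steps. Second, the LoRA geometry: with the seven listed target modules on a 32-layer Mistral-7B (hidden 4096, intermediate 14336, 8 KV heads), each unit of rank costs 2,621,440 adapter parameters, so the logged 20,971,520 trainable parameters imply rank r = 8. A sketch of an equivalent PEFT setup; lora_alpha and dropout are guesses, since the log does not record them:

```python
# Hedged sketch of the LoRA setup implied by the log. target_modules is verbatim
# from "Found linear modules"; r=8 is inferred from the trainable-parameter count.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,     # assumption, not in the log
    lora_dropout=0.0,  # assumption, not in the log
    target_modules=["q_proj", "v_proj", "k_proj", "gate_proj",
                    "up_proj", "o_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Upcasts the trainable parameters to float32, matching "Upcasting trainable
# params to float32" above; `model` is the 4-bit model loaded earlier.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 20,971,520 || all params: 7,587,719,168 || trainable%: 0.2764
```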
[INFO|2025-04-21 17:42:39] logging.py:157 >> {'loss': 10.8058, 'learning_rate': 1.9998e-04, 'epoch': 0.03, 'throughput': 350.83}
[INFO|2025-04-21 17:47:20] logging.py:157 >> {'loss': 2.5613, 'learning_rate': 1.9985e-04, 'epoch': 0.07, 'throughput': 350.51}
[INFO|2025-04-21 17:52:02] logging.py:157 >> {'loss': 1.4579, 'learning_rate': 1.9959e-04, 'epoch': 0.10, 'throughput': 350.51}
[INFO|2025-04-21 17:56:43] logging.py:157 >> {'loss': 1.1407, 'learning_rate': 1.9922e-04, 'epoch': 0.13, 'throughput': 350.55}
[INFO|2025-04-21 18:01:24] logging.py:157 >> {'loss': 0.7902, 'learning_rate': 1.9873e-04, 'epoch': 0.17, 'throughput': 350.53}
[INFO|2025-04-21 18:06:04] logging.py:157 >> {'loss': 0.9830, 'learning_rate': 1.9812e-04, 'epoch': 0.20, 'throughput': 350.53}
[INFO|2025-04-21 18:10:45] logging.py:157 >> {'loss': 0.9299, 'learning_rate': 1.9739e-04, 'epoch': 0.23, 'throughput': 350.57}
[INFO|2025-04-21 18:15:26] logging.py:157 >> {'loss': 0.6314, 'learning_rate': 1.9655e-04, 'epoch': 0.26, 'throughput': 350.66}
[INFO|2025-04-21 18:20:06] logging.py:157 >> {'loss': 0.8311, 'learning_rate': 1.9559e-04, 'epoch': 0.30, 'throughput': 350.69}
[INFO|2025-04-21 18:24:46] logging.py:157 >> {'loss': 0.5493, 'learning_rate': 1.9451e-04, 'epoch': 0.33, 'throughput': 350.73}
[INFO|2025-04-21 18:29:27] logging.py:157 >> {'loss': 0.4694, 'learning_rate': 1.9332e-04, 'epoch': 0.36, 'throughput': 350.78}
[INFO|2025-04-21 18:34:07] logging.py:157 >> {'loss': 0.5595, 'learning_rate': 1.9202e-04, 'epoch': 0.40, 'throughput': 350.83}
[INFO|2025-04-21 18:38:47] logging.py:157 >> {'loss': 0.2787, 'learning_rate': 1.9061e-04, 'epoch': 0.43, 'throughput': 350.84}
[INFO|2025-04-21 18:43:28] logging.py:157 >> {'loss': 0.5269, 'learning_rate': 1.8971e-04, 'epoch': 0.46, 'throughput': 350.83}
[INFO|2025-04-21 18:48:09] logging.py:157 >> {'loss': 0.7782, 'learning_rate': 1.8812e-04, 'epoch': 0.50, 'throughput': 350.82}
[INFO|2025-04-21 18:52:48] logging.py:157 >> {'loss': 0.5458, 'learning_rate': 1.8643e-04, 'epoch': 0.53, 'throughput': 350.83}
[INFO|2025-04-21 18:57:27] logging.py:157 >> {'loss': 0.3148, 'learning_rate': 1.8463e-04, 'epoch': 0.56, 'throughput': 350.84}
[INFO|2025-04-21 19:02:08] logging.py:157 >> {'loss': 0.3010, 'learning_rate': 1.8274e-04, 'epoch': 0.60, 'throughput': 350.83}
[INFO|2025-04-21 19:06:49] logging.py:157 >> {'loss': 0.6369, 'learning_rate': 1.8074e-04, 'epoch': 0.63, 'throughput': 350.82}
[INFO|2025-04-21 19:09:03] trainer.py:2657 >> Training completed. Do not forget to share your model on huggingface.co/models =)
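The loss falls from 10.8 to the 0.3-0.8 range within the first epoch, and the logged learning rates decay smoothly from the 2e-4 peak. The decay is roughly consistent with cosine annealing over the 453 optimization steps; this is an inference from the numbers, not something the log states, so the sketch below is a plausibility check rather than the run's actual scheduler:

```python
# Hedged check: cosine annealing from a 2e-4 peak to zero over 453 steps,
# no warmup assumed. The early logged values (1.9998e-04, 1.9985e-04, ...)
# line up with this curve to within a step or two of logging offset.
import math

PEAK_LR = 2e-4
TOTAL_STEPS = 453

def cosine_lr(step: int) -> float:
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * step / TOTAL_STEPS))

for step in (3, 8, 150, 453):
    print(f"step {step:3d}: lr = {cosine_lr(step):.4e}")
```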
[INFO|2025-04-21 19:09:03] image_processing_base.py:261 >> Image processor saved in saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28/preprocessor_config.json
[INFO|2025-04-21 19:09:03] tokenization_utils_base.py:2500 >> tokenizer config file saved in saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28/tokenizer_config.json
[INFO|2025-04-21 19:09:03] tokenization_utils_base.py:2509 >> Special tokens file saved in saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28/special_tokens_map.json
[INFO|2025-04-21 19:09:03] processing_utils.py:638 >> chat template saved in saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28/chat_template.json
[INFO|2025-04-21 19:09:03] processing_utils.py:644 >> processor saved in saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28/processor_config.json
[INFO|2025-04-21 19:09:03] trainer.py:3942 >> Saving model checkpoint to saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28
[INFO|2025-04-21 19:09:03] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--llava-hf--llava-v1.6-mistral-7b-hf/snapshots/144bfb964d4eef1502a22af4c5ff20d0d4a94cc1/config.json
[INFO|2025-04-21 19:09:03] configuration_utils.py:771 >> Model config LlavaNextConfig {
  "architectures": [
    "LlavaNextForConditionalGeneration"
  ],
  "ignore_index": -100,
  "image_grid_pinpoints": [
    [336, 672],
    [672, 336],
    [672, 672],
    [1008, 336],
    [336, 1008]
  ],
  "image_seq_length": 576,
  "image_token_index": 32000,
  "model_type": "llava_next",
  "multimodal_projector_bias": true,
  "projector_hidden_act": "gelu",
  "text_config": {
    "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
    "architectures": [
      "MistralForCausalLM"
    ],
    "intermediate_size": 14336,
    "max_position_embeddings": 32768,
    "model_type": "mistral",
    "num_key_value_heads": 8,
    "rms_norm_eps": 1e-05,
    "rope_theta": 1000000.0,
    "sliding_window": null,
    "torch_dtype": "bfloat16",
    "vocab_size": 32064
  },
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.49.0",
  "use_image_newline_parameter": true,
  "vision_config": {
    "hidden_size": 1024,
    "image_size": 336,
    "intermediate_size": 4096,
    "model_type": "clip_vision_model",
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "patch_size": 14,
    "projection_dim": 768,
    "vocab_size": 32000
  },
  "vision_feature_layer": -2,
  "vision_feature_select_strategy": "default",
  "vocab_size": 32064
}
[INFO|2025-04-21 19:09:04] tokenization_utils_base.py:2500 >> tokenizer config file saved in saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28/tokenizer_config.json
[INFO|2025-04-21 19:09:04] tokenization_utils_base.py:2509 >> Special tokens file saved in saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28/special_tokens_map.json
[WARNING|2025-04-21 19:09:04] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2025-04-21 19:09:04] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2025-04-21 19:09:04] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
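The run leaves a self-contained adapter directory: LoRA weights plus the tokenizer, processor, and chat template saved above. A sketch of reloading it for inference; the save path is copied from the log, while the prompt format and generation settings are illustrative:

```python
# Hedged sketch: reload the saved LoRA adapter on top of the base model.
import torch
from peft import PeftModel
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

adapter_dir = "saves/LLaVA-NeXT-Mistral-7B-Chat/lora/train_2025-04-21-17-35-28"

base = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_dir)
processor = LlavaNextProcessor.from_pretrained(adapter_dir)

# Text-only query in Mistral instruct format (illustrative, dataset-style).
prompt = "[INST] Show me all running containers. [/INST]"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```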