---
library_name: transformers
license: apache-2.0
base_model: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
tags:
- axolotl
- generated_from_trainer
datasets:
- PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
- PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
model-index:
- name: MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS
  results: []
---

# BeaverAI/MS-2501-DPE-QwQify-v0.1-24B

A test model that tries to give an existing model QwQ's thoughts. This version is built on top of [`PocketDoc/Dans-PersonalityEngine-V1.2.0-24b`](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.2.0-24b) (a jack-of-all-trades instruct model), which was itself trained on top of [`mistralai/Mistral-Small-24B-Base-2501`](https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501).

Prompt formatting and usage should be the same as with QwQ: use ChatML, and remove the thinking from previous turns. If thoughts aren't being generated automatically, add `<think>\n` to the start of the assistant turn; the model should then follow the formatting of its previous turns (a minimal prompt-building sketch is shown below). On the first few turns of a conversation you may need to regenerate a few times, and possibly edit the model's responses, to get it to your liking.

![image/png](https://i.imgur.com/pQmSCcN.png)

![image/png](https://i.imgur.com/EnULiEI.png)

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
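As a concrete illustration of the usage notes above, here is a minimal Python sketch of the prompting scheme. The `build_prompt` helper is my own illustration, not part of this repo, and frontend-specific stop strings are omitted:

```python
# Minimal sketch of the prompting scheme described above: ChatML turns,
# thinking stripped from previous assistant turns, and a "<think>\n"
# prefill so the model starts its reply with its reasoning.
import re

def build_prompt(system: str, turns: list[tuple[str, str]], user_msg: str) -> str:
    prompt = f"<|im_start|>system\n{system}<|im_end|>\n"
    for role, text in turns:
        if role == "assistant":
            # Remove <think>...</think> blocks from earlier assistant turns.
            text = re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)
        prompt += f"<|im_start|>{role}\n{text}<|im_end|>\n"
    prompt += f"<|im_start|>user\n{user_msg}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n<think>\n"  # prefill to force thoughts
    return prompt
```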
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0.dev0`
```yaml
mlflow_tracking_uri: http://127.0.0.1:7860
mlflow_experiment_name: MS-2501-DPE-QwQify-v0.1-24B-LoRA

# Hugging Face saving config
hub_model_id: BeaverAI/MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS
hub_strategy: every_save

# Model checkpointing config
output_dir: ./Outputs/MS-2501-DPE-QwQify-v0.1-24B-LoRA
resume_from_checkpoint:
save_steps: 50
save_safetensors: true
save_total_limit: 3
save_only_model: false

# Model architecture config
base_model: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
model_type: MistralForCausalLM
tokenizer_type: AutoTokenizer

# Mixed precision training config
bf16: true
fp16: false
tf32: false

# Model loading config
load_in_8bit: false
load_in_4bit: false
strict: false

# Sequence config
sequence_len: 8192
min_sample_len: 256
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# LoRA adapter config
adapter: lora
lora_model_dir:
lora_r: 128
lora_alpha: 128
lora_dropout: 0.125
peft_layers_to_transform:
peft_use_dora:
peft_use_rslora:
peft_layer_replication:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:

# Fix uninitialized tokens (such as <|start_header_id|> on the base L3 models)
fix_untrained_tokens:

# Dataset config
# https://github.com/xzuyn/axolotl/blob/came-plus-formatters/src/axolotl/prompt_strategies/customchatml-regex-last-only.py
datasets:
  - path: PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
    split: train
    type: customchatml-regex-last-only
test_datasets:
  - path: PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
  - path: PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
    split: test
    type: customchatml-regex-last-only
val_set_size: 0
eval_strategy: steps
eval_steps: 50
dataset_prepared_path: ./00-Tokenized-Datasets/MS-2501-DPE-QwQify-v0.1-24B-customchatml-regex-last-only
shuffle_merged_datasets: true
dataset_processes:

# Training hyperparameters
num_epochs: 2
gradient_accumulation_steps: 1
micro_batch_size: 8  # x4 GPUs = 32
eval_batch_size: 8  # x4 GPUs = 32
warmup_steps: 0
optimizer: came_pytorch
optim_args:
optim_target_modules:
lr_scheduler: rex
learning_rate: 2e-5
cosine_min_lr_ratio:
loraplus_lr_ratio:
loraplus_lr_embedding:
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 1

# Model optimization
gradient_checkpointing: unsloth
flash_attention: true
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_cross_entropy: false
liger_fused_linear_cross_entropy: false
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false

# DeepSpeed
deepspeed: deepspeed_configs/zero3_bf16.json

# Garbage Collection
gc_steps: 1

# Debug config
debug: true
seed: 42

# Token config
special_tokens:
  bos_token: "<s>"
  eos_token: "<|im_end|>"
  pad_token: "<pad>"
tokens:
```

</details>
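The trained adapter published at the `hub_model_id` above can be applied to the base model at inference time. Below is a minimal sketch with `transformers` and `peft`, assuming bf16 weights and enough VRAM; the sample messages and generation settings are placeholders, not recommendations from this repo:

```python
# Load the base model, apply the LoRA adapter, and generate with a
# ChatML prompt that prefills "<think>\n" (see the usage notes above).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "PocketDoc/Dans-PersonalityEngine-V1.2.0-24b"
ADAPTER = "BeaverAI/MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello!<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=1024,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
```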

# MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS

This model is a fine-tuned version of [PocketDoc/Dans-PersonalityEngine-V1.2.0-24b](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.2.0-24b) on the PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled, the PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled and the PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled datasets.
It achieves the following results on the evaluation set:
- Loss: 1.1949

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_HF with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 2.0

Note: the optimizer and scheduler entries above are auto-generated by the trainer; the Axolotl config specifies the CAME optimizer (`came_pytorch`) with a `rex` scheduler.

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.9925        | 0.0019 | 1    | 1.9225          |
| 1.4228        | 0.0936 | 50   | 1.4329          |
| 1.3473        | 0.1873 | 100  | 1.3722          |
| 1.3259        | 0.2809 | 150  | 1.3414          |
| 1.2795        | 0.3745 | 200  | 1.3199          |
| 1.2817        | 0.4682 | 250  | 1.3029          |
| 1.2365        | 0.5618 | 300  | 1.2910          |
| 1.2134        | 0.6554 | 350  | 1.2803          |
| 1.2655        | 0.7491 | 400  | 1.2700          |
| 1.2297        | 0.8427 | 450  | 1.2614          |
| 1.178         | 0.9363 | 500  | 1.2524          |
| 1.1525        | 1.0300 | 550  | 1.2467          |
| 1.1751        | 1.1236 | 600  | 1.2411          |
| 1.216         | 1.2172 | 650  | 1.2366          |
| 1.1706        | 1.3109 | 700  | 1.2302          |
| 1.1363        | 1.4045 | 750  | 1.2256          |
| 1.1563        | 1.4981 | 800  | 1.2194          |
| 1.1559        | 1.5918 | 850  | 1.2147          |
| 1.1263        | 1.6854 | 900  | 1.2090          |
| 1.099         | 1.7790 | 950  | 1.2038          |
| 1.1786        | 1.8727 | 1000 | 1.1994          |
| 1.1057        | 1.9663 | 1050 | 1.1949          |

### Framework versions

- PEFT 0.14.0
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.1
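Since the card title refers to the full model (`BeaverAI/MS-2501-DPE-QwQify-v0.1-24B`) while this repository holds the `-LoRA-WS` adapter, a standalone model can presumably be reproduced by merging the adapter into the base. A minimal sketch with `peft`; the output directory is a placeholder:

```python
# Merge the LoRA adapter into the base weights to get a standalone model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "PocketDoc/Dans-PersonalityEngine-V1.2.0-24b"
ADAPTER = "BeaverAI/MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS"
OUT = "./MS-2501-DPE-QwQify-v0.1-24B"  # placeholder output path

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
merged.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```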