The model no longer knows how to use <think></think>, so just treat it like any other Samantha tune.
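Below is a minimal inference sketch, under a few stated assumptions: the repo id is a placeholder (this card does not state the upload path), the weights are a PEFT LoRA adapter on top of Qwen/Qwen3-4B (PEFT is listed in the framework versions), and the tokenizer keeps the base Qwen3 chat template, so thinking is switched off explicitly to match the note above.

```python
# Minimal inference sketch. Assumes this repo ships the LoRA adapter rather than
# merged weights; the adapter repo id below is a placeholder, not the actual path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B"
adapter_id = "your-username/Qwen3-Samantha-v0.1-4B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Hi Samantha, how are you today?"}]
# enable_thinking=False matches this tune: it no longer produces <think> blocks.
# (Assumes the tokenizer still carries the base Qwen3 chat template.)
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```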
See the axolotl config below (axolotl version 0.6.0):
```yaml
# Weights and Biases logging config
wandb_project: Qwen3-4B
wandb_entity:
wandb_watch:
wandb_name: Qwen3-Samantha-v0.1-4B-LoRA-run4
wandb_log_model:

# Model checkpointing config
output_dir: ./Outputs/Qwen3-Samantha-v0.1-4B-LoRA-run4
save_steps: 10
save_safetensors: true
save_total_limit: 2
save_only_model: true

# Model architecture config
base_model: Qwen/Qwen3-4B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# Mixed precision training config
bf16: true
fp16: false
tf32: false

# Model loading config
load_in_8bit: false
load_in_4bit: false
strict: false

# Sequence config
sequence_len: 2048
min_sample_len:
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# LoRA adapter config
adapter: lora
lora_r: 64
lora_alpha: 64
lora_dropout: 0.125
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

# Fix uninitialized tokens (such as <|start_header_id|> on the base L3 models)
fix_untrained_tokens:

# Dataset config
datasets:
  - path: digitalpipelines/samantha-1.1-uncensored
    type: customchatml-regex
  - path: lodrick-the-lafted/Samantha-Opus
    type: customchatml-regex
test_datasets:
val_set_size: 0.05
eval_strategy: steps
eval_steps: 10
dataset_prepared_path: ./00-Tokenized-Datasets/Qwen3-Samantha-v0.1-4B-FFT-seed42
shuffle_merged_datasets: true

# Training hyperparameters
num_epochs: 2
gradient_accumulation_steps: 2
micro_batch_size: 16
eval_batch_size: 16
warmup_steps: 0
optimizer: came_pytorch
optim_args:
  enable_stochastic_rounding: true
  enable_cautious: true
  enable_8bit: true
optim_target_modules:
lr_scheduler: rex
learning_rate: 1e-5
cosine_min_lr_ratio: 0.05
loraplus_lr_ratio:
loraplus_lr_embedding:
weight_decay: 0.1
max_grad_norm: 0.5
logging_steps: 1

# Model optimization
gradient_checkpointing: offload
sdp_attention: true
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_cross_entropy: false
liger_fused_linear_cross_entropy: false
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false

# DeepSpeed
deepspeed:

# Garbage Collection
gc_steps: 1

# Debug config
debug: true
seed: 42

# Token config
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|endoftext|>"
tokens:
```
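For readers who want to recreate the adapter outside axolotl, the "LoRA adapter config" section above maps roughly onto a peft `LoraConfig`. The sketch below just mirrors the YAML values; axolotl builds its own configuration internally, so treat it as illustrative rather than the exact object it creates.

```python
# Sketch: peft-level equivalent of the "LoRA adapter config" block above.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.125,
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # shows how small the trainable adapter is
```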
Qwen3-Samantha-v0.1-4B
This model is a fine-tuned version of Qwen/Qwen3-4B on the digitalpipelines/samantha-1.1-uncensored and the lodrick-the-lafted/Samantha-Opus datasets. It achieves the following results on the evaluation set:
- Loss: 1.2390
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: OptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 2.0
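As a quick sanity check on the batch-size arithmetic (a single GPU is assumed; the card does not report a world size):

```python
# Effective batch size reported above: micro batch x gradient accumulation.
train_batch_size = 16
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 32

# The results table below logs about 231 optimizer steps over 2 epochs
# (~115 steps/epoch), which at batch size 32 suggests roughly
# 115 * 32 ≈ 3.7k training samples (an estimate, not a reported figure).
```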
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.3445 | 0.0087 | 1 | 2.0134 |
1.4708 | 0.0866 | 10 | 1.3584 |
1.2713 | 0.1732 | 20 | 1.3324 |
1.3535 | 0.2597 | 30 | 1.3123 |
1.1763 | 0.3463 | 40 | 1.2995 |
1.2369 | 0.4329 | 50 | 1.2897 |
1.3555 | 0.5195 | 60 | 1.2847 |
1.2712 | 0.6061 | 70 | 1.2751 |
1.1881 | 0.6926 | 80 | 1.2679 |
1.2572 | 0.7792 | 90 | 1.2626 |
1.2649 | 0.8658 | 100 | 1.2596 |
1.326 | 0.9524 | 110 | 1.2567 |
1.1402 | 1.0346 | 120 | 1.2587 |
1.1053 | 1.1212 | 130 | 1.2570 |
1.1195 | 1.2078 | 140 | 1.2540 |
1.2079 | 1.2944 | 150 | 1.2514 |
1.1103 | 1.3810 | 160 | 1.2491 |
1.1749 | 1.4675 | 170 | 1.2486 |
1.1748 | 1.5541 | 180 | 1.2465 |
1.0526 | 1.6407 | 190 | 1.2438 |
1.0474 | 1.7273 | 200 | 1.2435 |
1.0655 | 1.8139 | 210 | 1.2411 |
1.0812 | 1.9004 | 220 | 1.2400 |
1.0806 | 1.9870 | 230 | 1.2390 |
Framework versions
- PEFT 0.14.0
- Transformers 4.51.3
- Pytorch 2.8.0.dev20250502+rocm6.3
- Datasets 3.3.1
- Tokenizers 0.21.0