Built with Axolotl

See axolotl config

axolotl version: 0.9.0

seed: 42
# Base model settings for training
base_model: meta-llama/Llama-3.2-1B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# Settings for uploading the trained model to the Hugging Face Hub
# hub_model_id: kajuma/Llama-3.2-1B-inst
# hub_strategy: "end"
# push_dataset_to_hub:
# hf_use_auth_token: true

# Liger Kernel settings (reduces memory use and speeds up training)
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_cross_entropy: false
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

# Quantization settings
load_in_8bit: false
load_in_4bit: false

# Chat template used for SFT (see the rendering sketch after this config)
chat_template: llama3
# chat_template_jinja: "{{- bos_token }}\n{%- if custom_tools is defined %}\n    {%- set tools = custom_tools %}\n{%- endif %}\n{%- if not tools_in_user_message is defined %}\n    {%- set tools_in_user_message = true %}\n{%- endif %}\n{%- if not date_string is defined %}\n    {%- if strftime_now is defined %}\n        {%- set date_string = strftime_now(\"%d %b %Y\") %}\n    {%- else %}\n        {%- set date_string = \"26 Jul 2024\" %}\n    {%- endif %}\n{%- endif %}\n{%- if not tools is defined %}\n    {%- set tools = none %}\n{%- endif %}\n\n{#- This block extracts the system message, so we can slot it into the right place. #}\n{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content']|trim %}\n    {%- set messages = messages[1:] %}\n{%- else %}\n    {%- set system_message = \"\" %}\n{%- endif %}\n\n{#- System message #}\n{{- \"<|start_header_id|>system<|end_header_id|>\\n\\n\" }}\n{%- if tools is not none %}\n    {{- \"Environment: ipython\\n\" }}\n{%- endif %}\n{{- \"Cutting Knowledge Date: December 2023\\n\" }}\n{{- \"Today Date: \" + date_string + \"\\n\\n\" }}\n{%- if tools is not none and not tools_in_user_message %}\n    {{- \"You have access to the following functions. To call a function, please respond with JSON for a function call.\" }}\n    {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' }}\n    {{- \"Do not use variables.\\n\\n\" }}\n    {%- for t in tools %}\n        {{- t | tojson(indent=4) }}\n        {{- \"\\n\\n\" }}\n    {%- endfor %}\n{%- endif %}\n{{- system_message }}\n{{- \"<|eot_id|>\" }}\n\n{#- Custom tools are passed in a user message with some extra guidance #}\n{%- if tools_in_user_message and not tools is none %}\n    {#- Extract the first user message so we can plug it in here #}\n    {%- if messages | length != 0 %}\n        {%- set first_user_message = messages[0]['content']|trim %}\n        {%- set messages = messages[1:] %}\n    {%- else %}\n        {{- raise_exception(\"Cannot put tools in the first user message when there's no first user message!\") }}\n{%- endif %}\n    {{- '<|start_header_id|>user<|end_header_id|>\\n\\n' -}}\n    {{- \"Given the following functions, please respond with a JSON for a function call \" }}\n    {{- \"with its proper arguments that best answers the given prompt.\\n\\n\" }}\n    {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' }}\n    {{- \"Do not use variables.\\n\\n\" }}\n    {%- for t in tools %}\n        {{- t | tojson(indent=4) }}\n        {{- \"\\n\\n\" }}\n    {%- endfor %}\n    {{- first_user_message + \"<|eot_id|>\"}}\n{%- endif %}\n\n{%- for message in messages %}\n    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}\n        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' }}\n    {%- elif 'tool_calls' in message %}\n        {%- if not message.tool_calls|length == 1 %}\n            {{- raise_exception(\"This model only supports single tool-calls at once!\") }}\n        {%- endif %}\n        {%- set tool_call = message.tool_calls[0].function %}\n        {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}}\n        {{- '{\"name\": \"' + tool_call.name + '\", ' }}\n        {{- '\"parameters\": ' }}\n        {{- tool_call.arguments | tojson }}\n        {{- \"}\" }}\n        {{- \"<|eot_id|>\" }}\n    {%- elif message.role == \"tool\" or message.role == \"ipython\" %}\n        {{- \"<|start_header_id|>ipython<|end_header_id|>\\n\\n\" }}\n        {%- if message.content is mapping or message.content is iterable %}\n            {{- message.content | tojson }}\n        {%- else %}\n            {{- message.content }}\n        {%- endif %}\n        {{- \"<|eot_id|>\" }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}\n{%- endif %}\n"

# Preprocessing settings for the training datasets
datasets:
  - path: Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered
    split: train
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
  - path: Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted
    split: train
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: role
      content: content
  - path: Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered
    split: train
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: role
      content: content
  - path: Aratako/Open-Platypus-Japanese-masked-formatted
    split: train
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: role
      content: content
  - path: kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja
    split: train
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
  - path: kanhatakeyama/ramdom-to-fixed-multiturn-Calm3
    split: 20240806filtered[0:10000]
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
  - path: Aratako/magpie-ultra-v0.1-formatted
    split: train
    type: chat_template
    field_messages: conversations
    message_property_mappings:
      role: role
      content: content
  - path: Aratako/orca-agentinstruct-1M-v1-selected
    split: train
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

# Output locations for the prepared dataset and the trained model
shuffle_merged_datasets: true
dataset_prepared_path: /home/kazuma/codes/halcyon_sft/data/sft-data
output_dir: /home/kazuma/codes/halcyon_sft/data/models/Llama-3.2-1B-inst-lora

# Size of the validation dataset
val_set_size: 0.01

# LoRA settings (leave all of these blank for full fine-tuning)
adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_modules_to_save:
  - embed_tokens
  - lm_head
lora_fan_in_fan_out:

lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true

# Weights & Biases (wandb) settings
wandb_project: halcyon
wandb_entity: tepic
wandb_watch:
wandb_name: Llama-3.2-1B-inst-1-lora
wandb_log_model:

# Miscellaneous training settings
sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

gradient_accumulation_steps: 64
micro_batch_size: 4
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.1
learning_rate: 3e-5

train_on_inputs: false
group_by_length: false
bfloat16: true
fp16:
tf32: false

gradient_checkpointing: false
early_stopping_patience:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

save_strategy: steps
save_steps: 100
save_total_limit: 2

warmup_steps: 10
eval_steps: 100
eval_batch_size: 8
eval_table_size:
eval_max_new_tokens:
debug:
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|eot_id|>
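
The `chat_template: llama3` and `special_tokens` settings above determine how each conversation is rendered into training text. A minimal rendering sketch, assuming the tokenizer published with this adapter (repo id kajuma/Llama-3.2-1B-inst-w-lora) carries the same template and special tokens as the training config:

```python
# Sketch only: render a conversation with the llama3 chat template used for SFT.
# Assumes the tokenizer uploaded with this adapter keeps the template configured above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kajuma/Llama-3.2-1B-inst-w-lora")

messages = [
    {"role": "user", "content": "..."},       # one turn from the `field_messages` column
    {"role": "assistant", "content": "..."},  # keys mapped via message_property_mappings
]

text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)  # expect <|start_header_id|>user<|end_header_id|> ... <|eot_id|> blocks
```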

home/kazuma/codes/halcyon_sft/data/models/Llama-3.2-1B-inst-lora

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained on the following datasets:

  • Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered
  • Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted
  • Aratako/magpie-reasoning-llama-nemotron-70b-100k-filtered
  • Aratako/Open-Platypus-Japanese-masked-formatted
  • kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja
  • kanhatakeyama/ramdom-to-fixed-multiturn-Calm3
  • Aratako/magpie-ultra-v0.1-formatted
  • Aratako/orca-agentinstruct-1M-v1-selected

It achieves the following results on the evaluation set:

  • Loss: 1.0384
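
A minimal inference sketch, assuming the LoRA adapter and its tokenizer are available as kajuma/Llama-3.2-1B-inst-w-lora (a local path such as the `output_dir` from the config works the same way); the example prompt is illustrative only:

```python
# Sketch only: load the LoRA adapter on top of meta-llama/Llama-3.2-1B and generate.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "kajuma/Llama-3.2-1B-inst-w-lora"  # assumption: adapter repo with tokenizer files

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=256,
        eos_token_id=tokenizer.eos_token_id,  # <|eot_id|> per the special_tokens setting
    )
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```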

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 256
  • optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 1.0
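
The total train batch size above is simply the product of the per-device micro batch size and the gradient accumulation steps from the config; a quick check, assuming a single GPU since no FSDP or multi-device settings are configured:

```python
# Effective batch size implied by the config above.
micro_batch_size = 4                # micro_batch_size
gradient_accumulation_steps = 64    # gradient_accumulation_steps
num_devices = 1                     # assumption: single GPU

assert micro_batch_size * gradient_accumulation_steps * num_devices == 256
```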

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.276         | 0.0013 | 1    | 1.2994          |
| 1.1198        | 0.1259 | 100  | 1.1198          |
| 1.0641        | 0.2518 | 200  | 1.0796          |
| 1.0592        | 0.3777 | 300  | 1.0604          |
| 1.0549        | 0.5036 | 400  | 1.0498          |
| 1.0261        | 0.6295 | 500  | 1.0437          |
| 1.0342        | 0.7554 | 600  | 1.0402          |
| 1.0525        | 0.8813 | 700  | 1.0384          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.7.0+cu128
  • Datasets 3.5.0
  • Tokenizers 0.21.1