AXCXEPT/EZO-Qwen2.5-32B-Instruct

Introduction

This model is based on Qwen/Qwen2.5-32B-Instruct, with multiple rounds of tuning applied to improve overall performance over the base model. It excels at Japanese-language tasks, but is designed to meet a variety of global needs.

On the Japanese MT Bench, using gpt-4o as the evaluator, this model running under 4-bit quantization achieved a score approaching that of gpt-4-turbo.


[Benchmark Results]

(Japanese MT Bench benchmark chart)
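For context on how an LLM-as-a-judge evaluation like this works, here is a minimal sketch of scoring a single answer with gpt-4o through the OpenAI API. The judge prompt and the scoring scale below are simplified placeholders, not the actual Japanese MT Bench harness or its official judge template.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "ไป•ไบ‹ใฎ็†ฑๆ„ใ‚’ๅ–ใ‚Šๆˆปใ™ใŸใ‚ใฎใ‚ขใ‚คใƒ‡ใ‚ขใ‚’5ใคๆŒ™ใ’ใฆใใ ใ•ใ„ใ€‚"
model_answer = "..."  # a response generated by EZO-Qwen2.5-32B-Instruct (see [Usage] below)

# Simplified judge prompt; the real MT Bench uses its own fixed judge templates.
judge_prompt = (
    "You are an impartial judge. Rate the assistant's answer to the user's question "
    "on a scale of 1-10 and briefly explain your rating.\n\n"
    f"[Question]\n{question}\n\n[Assistant's Answer]\n{model_answer}"
)

review = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": judge_prompt}],
)
print(review.choices[0].message.content)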

[Usage]

The following code snippet shows how to load the tokenizer and model and how to generate content using apply_chat_template.

pip install bitsandbytes transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "AXCXEPT/EZO-Qwen2.5-32B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    load_in_4bit=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "ไป•ไบ‹ใฎ็†ฑๆ„ใ‚’ๅ–ใ‚Šๆˆปใ™ใŸใ‚ใฎใ‚ขใ‚คใƒ‡ใ‚ขใ‚’5ใคๆŒ™ใ’ใฆใใ ใ•ใ„ใ€‚"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt")
# If you do not use load_in_4bit, move the inputs to the model's device instead: model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
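Note that recent versions of transformers deprecate passing load_in_4bit directly to from_pretrained. A minimal sketch of the equivalent loading step with an explicit BitsAndBytesConfig (assuming a CUDA GPU with bitsandbytes installed):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with bfloat16 compute, equivalent to load_in_4bit=True above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "AXCXEPT/EZO-Qwen2.5-32B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)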

[Training Dataset]

We extracted high-quality data from Japanese Wikipedia and FineWeb to create instruction data. Our innovative training approach allows for performance improvements across various languages and domains, making the model suitable for global use despite its focus on Japanese data.

ๆ—ฅๆœฌ่ชžใฎWikiใƒ‡ใƒผใ‚ฟใŠใ‚ˆใณใ€FineWebใ‹ใ‚‰่‰ฏ่ณชใชใƒ‡ใƒผใ‚ฟใฎใฟใ‚’ๆŠฝๅ‡บใ—ใ€Instructionใƒ‡ใƒผใ‚ฟใ‚’ไฝœๆˆใ—ใพใ—ใŸใ€‚ใ“ใฎใƒขใƒ‡ใƒซใงใฏๆ—ฅๆœฌ่ชžใซ็‰นๅŒ–ใ•ใ›ใฆใ„ใพใ™ใŒใ€ไธ–็•Œไธญใฎใฉใ‚“ใชใƒฆใƒผใ‚นใ‚ฑใƒผใ‚นใงใ‚‚ๅˆฉ็”จๅฏ่ƒฝใชใ‚ขใƒ—ใƒญใƒผใƒใงใ™ใ€‚

https://huggingface.co/datasets/legacy-datasets/wikipedia
https://huggingface.co/datasets/HuggingFaceFW/fineweb
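As an illustration of the kind of extraction step described above, the sketch below streams a published FineWeb sample with the datasets library and applies a toy length-based quality filter. The sample-10BT config name and the filter thresholds are illustrative choices, not the actual pipeline used for this model.

from datasets import load_dataset

# Stream a published FineWeb sample so nothing is downloaded in full.
fineweb = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",
    split="train",
    streaming=True,
)

def looks_high_quality(example):
    # Toy heuristic: keep documents of a reasonable length.
    # The actual quality criteria used for this model are not published here.
    return 500 < len(example["text"]) < 20000

filtered = (ex["text"] for ex in fineweb if looks_high_quality(ex))

# Collect a few passages that could later be turned into instruction data.
seed_passages = [next(filtered) for _ in range(3)]
print(f"collected {len(seed_passages)} passages")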

Data Preprocessing

We used a plain instruction tuning method to train the model on exemplary responses. This approach enhances the model's ability to understand and generate high-quality responses across various languages and contexts.

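For reference, plain supervised instruction tuning of a chat model generally means rendering each example with the model's chat template and computing the loss on the exemplary response. The sketch below shows only that formatting step; the instruction/response field names and the example pair are illustrative assumptions, not the actual training recipe.

from transformers import AutoTokenizer

# Use the base model's tokenizer so the chat template matches the one in [Usage].
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

# Hypothetical instruction/response pair; the real pairs come from the corpora above.
example = {
    "instruction": "Summarize the four seasons of Japan in one sentence.",
    "response": "Japan has four distinct seasons: spring, summer, autumn, and winter.",
}

messages = [
    {"role": "user", "content": example["instruction"]},
    {"role": "assistant", "content": example["response"]},
]

# Render the full conversation; a trainer would tokenize this and typically mask the
# prompt tokens so the loss is computed only on the exemplary response tokens.
full_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(full_text)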

Implementation Information

[Pre-Instruction Training]

https://huggingface.co/instruction-pretrain/instruction-synthesizer
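As a rough sketch of how a synthesizer like the one linked above can be driven, the snippet below loads it as an ordinary causal LM and asks it to turn a raw passage into instruction data. The raw passage and the bare prompt are placeholders; the exact input template the synthesizer expects is documented on its model page and should be used instead.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

synth_name = "instruction-pretrain/instruction-synthesizer"
synth_tok = AutoTokenizer.from_pretrained(synth_name)
synth = AutoModelForCausalLM.from_pretrained(
    synth_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Placeholder raw passage; in practice this would come from the filtered corpora above.
raw_text = "Mount Fuji is the highest mountain in Japan, standing at 3,776 meters."

inputs = synth_tok(raw_text, return_tensors="pt").to(synth.device)
outputs = synth.generate(**inputs, max_new_tokens=256)
print(synth_tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))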

[Disclaimer]

ใ“ใฎใƒขใƒ‡ใƒซใฏ็ ”็ฉถ้–‹็™บใฎใฟใ‚’็›ฎ็š„ใจใ—ใฆๆไพ›ใ•ใ‚Œใ‚‹ใ‚‚ใฎใงใ‚ใ‚Šใ€ๅฎŸ้จ“็š„ใชใƒ—ใƒญใƒˆใ‚ฟใ‚คใƒ—ใจใฟใชใ•ใ‚Œใ‚‹ในใใƒขใƒ‡ใƒซใงใ™ใ€‚ ๅ•†ๆฅญ็š„ใชไฝฟ็”จใ‚„ใƒŸใƒƒใ‚ทใƒงใƒณใ‚ฏใƒชใƒ†ใ‚ฃใ‚ซใƒซใช็’ฐๅขƒใธใฎ้…ๅ‚™ใ‚’ๆ„ๅ›ณใ—ใŸใ‚‚ใฎใงใฏใ‚ใ‚Šใพใ›ใ‚“ใ€‚ ๆœฌใƒขใƒ‡ใƒซใฎไฝฟ็”จใฏใ€ไฝฟ็”จ่€…ใฎ่ฒฌไปปใซใŠใ„ใฆ่กŒใ‚ใ‚Œใ‚‹ใ‚‚ใฎใจใ—ใ€ใใฎๆ€ง่ƒฝใŠใ‚ˆใณ็ตๆžœใฏไฟ่จผใ•ใ‚Œใพใ›ใ‚“ใ€‚ Axcxeptๆ ชๅผไผš็คพใฏใ€็›ดๆŽฅ็š„ใ€้–“ๆŽฅ็š„ใ€็‰นๅˆฅใ€ๅถ็™บ็š„ใ€็ตๆžœ็š„ใชๆๅฎณใ€ใพใŸใฏๆœฌใƒขใƒ‡ใƒซใฎไฝฟ็”จใ‹ใ‚‰็”Ÿใ˜ใ‚‹ใ„ใ‹ใชใ‚‹ๆๅคฑใซๅฏพใ—ใฆใ‚‚ใ€ๅพ—ใ‚‰ใ‚ŒใŸ็ตๆžœใซใ‹ใ‹ใ‚ใ‚‰ใšใ€ไธ€ๅˆ‡ใฎ่ฒฌไปปใ‚’่ฒ ใ„ใพใ›ใ‚“ใ€‚ ๅˆฉ็”จ่€…ใฏใ€ๆœฌใƒขใƒ‡ใƒซใฎไฝฟ็”จใซไผดใ†ใƒชใ‚นใ‚ฏใ‚’ๅๅˆ†ใซ็†่งฃใ—ใ€่‡ชๅทฑใฎๅˆคๆ–ญใงไฝฟ็”จใ™ใ‚‹ใ‚‚ใฎใจใ—ใพใ™ใ€‚

[Hardware]

A100 × 4 (approximately 32 hours)

[Sponsored]

ใ“ใฎๆดปๅ‹•ใฏใ€ใ‚ฏใƒฉใ‚ฆใƒ‰ใƒ•ใ‚กใƒณใƒ‡ใ‚ฃใƒณใ‚ฐใงใฎๆ”ฏๆดใŒใ‚ใ‚Š็ถ™็ถšใŒๅฎŸ็พใ—ใŸๆดปๅ‹•ใงใ™ใ€‚ไปฅไธ‹ใฎๆ–นใ€…ใ‹ใ‚‰ใฎใ”ๆ”ฏๆดใซๆ„Ÿ่ฌ็”ณใ—ไธŠใ’ใพใ™ใ€‚

๏ผˆ้ †ไธๅŒ๏ผ‰

・MK
・ใ—ใƒผใŸ
・yamako
・ChatGPT็ ”็ฉถๆ‰€
・kotarosan

[่ฌ่พž]

We would like to express our gratitude and respect to Alibaba Cloud and the team of developers who developed this base model, as well as to the many others who contributed to the automated evaluation methodology.

Company: Axcxept co., ltd.
