AXCXEPT/EZO-Qwen2.5-32B-Instruct
Introduction
This model is based on Qwen/Qwen2.5-32B-Instruct and has been tuned in multiple stages to improve overall performance over the base model. It excels at Japanese-language tasks but is designed to meet a wide range of global needs.
On the Japanese MT Bench, with gpt-4o as the evaluator, the 4-bit quantized version of this model achieved a score approaching that of gpt-4-turbo.
[Benchmark Results]
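The score above comes from using gpt-4o as an automated judge. As a rough illustration of how such LLM-as-judge scoring is typically set up (this is a generic sketch, not the actual Japanese MT Bench harness; the rubric wording and the score parsing are our assumptions), using the openai Python client:

from openai import OpenAI
import re

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_answer(question: str, answer: str) -> int:
    # Ask gpt-4o to grade a model answer on a 1-10 scale (illustrative rubric).
    judge_prompt = (
        "You are an impartial judge. Rate the assistant's answer to the user's "
        "question on a scale of 1 to 10 and reply with the number only.\n\n"
        f"[Question]\n{question}\n\n[Answer]\n{answer}"
    )
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": judge_prompt}],
    )
    match = re.search(r"\d+", completion.choices[0].message.content)
    return int(match.group()) if match else 0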
[Usage]
The following code snippet shows how to load the tokenizer and model, build a prompt with apply_chat_template, and generate content.
pip install bitsandbytes transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "AXCXEPT/EZO-Qwen2.5-32B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
device_map="auto",
load_in_4bit=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "ไปไบใฎ็ฑๆใๅใๆปใใใใฎใขใคใใขใ5ใคๆใใฆใใ ใใใ"
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt")
# If you do not use "load_in_4bit", move the inputs to the model's device instead:
# model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
**model_inputs,
max_new_tokens=512
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
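Note: recent versions of transformers deprecate passing load_in_4bit directly to from_pretrained. If your installed version warns about this, an equivalent way to load the model in 4-bit is to pass a BitsAndBytesConfig (a minimal sketch; the compute-dtype choice here is our assumption):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "AXCXEPT/EZO-Qwen2.5-32B-Instruct",
    quantization_config=quant_config,
    device_map="auto",
)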
[Training Dataset]
We extracted high-quality data from Japanese Wikipedia and FineWeb to create instruction data. Our innovative training approach allows for performance improvements across various languages and domains, making the model suitable for global use despite its focus on Japanese data.
https://huggingface.co/datasets/legacy-datasets/wikipedia
https://huggingface.co/datasets/HuggingFaceFW/fineweb
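The exact extraction pipeline is not published here. As a minimal sketch of what streaming FineWeb and applying simple quality heuristics might look like (the "sample-10BT" config name, the length bounds, and the language-score threshold below are illustrative assumptions, not the filters actually used):

from datasets import load_dataset

# Stream a FineWeb sample and keep only documents that pass simple quality heuristics.
fineweb = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT", split="train", streaming=True)

def looks_high_quality(example):
    text = example["text"]
    return 200 <= len(text) <= 20000 and example.get("language_score", 0.0) > 0.9

filtered = (ex for ex in fineweb if looks_high_quality(ex))
print(next(filtered)["text"][:200])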
Data Preprocessing
We used a plain instruction tuning method to train the model on exemplary responses. This approach enhances the model's ability to understand and generate high-quality responses across various languages and contexts.
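A common way to implement plain instruction tuning on exemplary responses is to render each (instruction, response) pair with the chat template and mask the prompt tokens out of the loss. The sketch below illustrates that label-masking step under those assumptions; it is not the authors' actual training code:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AXCXEPT/EZO-Qwen2.5-32B-Instruct")

def build_sft_example(instruction: str, response: str):
    # Render the prompt with the chat template, then append the exemplary response.
    prompt_text = tokenizer.apply_chat_template(
        [{"role": "user", "content": instruction}],
        tokenize=False,
        add_generation_prompt=True,
    )
    full_text = prompt_text + response + tokenizer.eos_token
    input_ids = tokenizer(full_text, add_special_tokens=False)["input_ids"]
    prompt_len = len(tokenizer(prompt_text, add_special_tokens=False)["input_ids"])
    # Standard SFT label masking: only the response tokens contribute to the loss.
    labels = [-100] * prompt_len + input_ids[prompt_len:]
    return {"input_ids": input_ids, "labels": labels}

example = build_sft_example("富士山の高さを教えてください。", "富士山の標高は3,776メートルです。")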
Implementation Information
[Pre-Instruction Training]
https://huggingface.co/instruction-pretrain/instruction-synthesizer
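The linked instruction-synthesizer converts raw corpus text into instruction-response pairs that can be used for pre-instruction training. The sketch below only shows the general call pattern of generating from that model with transformers; the exact input/output formatting it expects is defined in its own model card and is not reproduced here:

from transformers import AutoModelForCausalLM, AutoTokenizer

synth_name = "instruction-pretrain/instruction-synthesizer"
synth_tokenizer = AutoTokenizer.from_pretrained(synth_name)
synth_model = AutoModelForCausalLM.from_pretrained(synth_name, device_map="auto")

# Feed a raw-text passage and let the synthesizer generate instruction-response pairs.
raw_text = "富士山は日本で最も高い山であり、標高は3,776メートルである。"
inputs = synth_tokenizer(raw_text, return_tensors="pt").to(synth_model.device)
outputs = synth_model.generate(**inputs, max_new_tokens=256)
print(synth_tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))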
[Disclaimer]
This model is provided for research and development purposes only and should be considered an experimental prototype. It is not intended for commercial use or deployment in mission-critical environments. Use of this model is at the user's own risk, and its performance and results are not guaranteed in any way. Axcxept Co., Ltd. assumes no responsibility for any direct, indirect, special, incidental, or consequential damages, or for any loss arising from the use of this model, regardless of the results obtained. Users must fully understand the risks involved in using this model and use it at their own discretion.
[Hardware]
A100 × 4 (running for 32 hours)
[Sponsored]
This work is made possible and sustained by crowdfunding support. We would like to thank the following supporters:
(in no particular order)
・MK
・ダー
・yamako
・ChatGPT研究所
・kotarosan
[Acknowledgements]
We would like to express our gratitude and respect to Alibaba Cloud and the team of developers who developed this base model, as well as to the many others who contributed to the automated evaluation methodology.
Company: