💨📟 Vikhr-Qwen-2.5-0.5B-Instruct

RU

ะ˜ะฝัั‚ั€ัƒะบั‚ะธะฒะฝะฐั ะผะพะดะตะปัŒ ะฝะฐ ะพัะฝะพะฒะต Qwen-2.5-0.5B-Instruct, ะพะฑัƒั‡ะตะฝะฝะฐั ะฝะฐ ั€ัƒััะบะพัะทั‹ั‡ะฝะพะผ ะดะฐั‚ะฐัะตั‚ะต GrandMaster-PRO-MAX. ะ’ 4 ั€ะฐะทะฐ ัั„ั„ะตะบั‚ะธะฒะฝะตะต ะฑะฐะทะพะฒะพะน ะผะพะดะตะปะธ, ะธ ะธะดะตะฐะปัŒะฝะพ ะฟะพะดั…ะพะดะธั‚ ะดะปั ะทะฐะฟัƒัะบะฐ ะฝะฐ ัะปะฐะฑั‹ั… ะผะพะฑะธะปัŒะฝั‹ั… ัƒัั‚ั€ะพะนัั‚ะฒะฐั….

EN

An instruction-tuned model based on Qwen-2.5-0.5B-Instruct, trained on the Russian-language GrandMaster-PRO-MAX dataset. It is 4 times more efficient than the base model, making it well suited for deployment on low-end mobile devices.

GGUF

ะžัะพะฑะตะฝะฝะพัั‚ะธ:

  • 📚 Основа / Base: Qwen-2.5-0.5B-Instruct
  • 🇷🇺 Специализация / Specialization: RU
  • 💾 Датасет / Dataset: GrandMaster-PRO-MAX

Попробовать / Try now:

Open In Colab

Описание / Description:

RU

Vikhr-Qwen-2.5-0.5B-instruct представляет собой компактную языковую модель, обученную на датасете GrandMaster-PRO-MAX и специально дообученную для обработки русского языка. Эффективность модели в 4 раза выше, чем у базовой модели, а её размер составляет 1 ГБ, что делает её отличным выбором для запуска на слабых мобильных устройствах.

EN

Vikhr-Qwen-2.5-0.5B-instruct is a compact language model trained on the GrandMaster-PRO-MAX dataset and specifically fine-tuned for processing the Russian language. Its efficiency is 4 times higher than that of the base model, and its size is roughly 1 GB, making it an excellent choice for deployment on low-end mobile devices.
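
As a rough sanity check on the size figure: the checkpoint has about 0.5B parameters, and FP16 stores 2 bytes per parameter, so the weights alone come to roughly 1 GB. A minimal sketch that prints the actual numbers from the loaded model (illustrative only, not part of the original card):

import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint in half precision (the format the ~1 GB figure refers to)
model = AutoModelForCausalLM.from_pretrained(
    "Vikhrmodels/Vikhr-Qwen-2.5-0.5B-Instruct",
    torch_dtype=torch.float16,
)

n_params = model.num_parameters()
print(f"parameters: {n_params / 1e6:.0f}M")
print(f"approx. FP16 weight size: {n_params * 2 / 1024**3:.2f} GiB")  # 2 bytes per parameter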

ะžะฑัƒั‡ะตะฝะธะต / Train:

RU

ะ”ะปั ัะพะทะดะฐะฝะธั Vikhr-Qwen-2.5-0.5B-Instruct ะธัะฟะพะปัŒะทะพะฒะฐะปัั ะผะตั‚ะพะด SFT (Supervised Fine-Tuning). ะœั‹ ะพะฑัƒั‡ะธะปะธ ะผะพะดะตะปัŒ ะฝะฐ ัะธะฝั‚ะตั‚ะธั‡ะตัะบะพะผ ะดะฐั‚ะฐัะตั‚ะต Vikhrmodels/GrandMaster-PRO-MAX (150k ะธะฝัั‚ั€ัƒะบั†ะธะน) ั ะฟะพะดะดะตั€ะถะบะพะน CoT (Chain-Of-Thought), ะธัะฟะพะปัŒะทัƒั ะฟั€ะพะผะฟั‚ั‹ ะดะปั GPT-4-turbo.

EN

Vikhr-Qwen-2.5-0.5B-Instruct was created with SFT (Supervised Fine-Tuning). We trained the model on the synthetic Vikhrmodels/GrandMaster-PRO-MAX dataset (150k instructions) with CoT (Chain-Of-Thought) support, generated using prompts for GPT-4-turbo.
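
The exact training code is not published here. As an illustration only, the sketch below formats GrandMaster-PRO-MAX conversations with the tokenizer's chat template and runs plain causal-LM fine-tuning with the Hugging Face Trainer; the "conversation" column name and all hyperparameters are assumptions, and loss is computed over the full sequence rather than only the assistant turns:

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

dataset = load_dataset("Vikhrmodels/GrandMaster-PRO-MAX", split="train")

def to_features(example):
    # Render the multi-turn conversation into one training string via the chat
    # template, then tokenize it ("conversation" is a hypothetical column name).
    text = tokenizer.apply_chat_template(example["conversation"], tokenize=False)
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="vikhr-sft", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-5),
    train_dataset=tokenized,
    # Causal-LM collator: labels are the input ids, shifted inside the model
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()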

Пример кода для запуска / Sample code to run:

Рекомендуемая температура для генерации: 0.3 / Recommended generation temperature: 0.3.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "Vikhrmodels/Vikhr-Qwen-2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare the input text
input_text = "Напиши очень краткую рецензию о книге Гарри Поттер."

messages = [
    {"role": "system", "content": "Вы - Vikhr, помощник с искусственным интеллектом, созданный компанией Vikhr models, чтобы быть полезным, безобидным и честным."},
    {"role": "user", "content": input_text},
]

# Apply the chat template, then generate text
input_ids = tokenizer.apply_chat_template(messages, truncation=True, add_generation_prompt=True, return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=1512,
    do_sample=True,  # required for temperature/top_k/top_p to take effect
    temperature=0.3,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
    top_k=50,
    top_p=0.95,
)

# Decode and print the result
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

Ответ модели / Model response:

Книга "Гарри Поттер" – это серия книг, написанных британским писателем Джоан Роулинг. Это одно из самых известных произведений в мире литературы и популярного детского творчества.

Основные черты серии:

  1. Сюжет: События разворачиваются вокруг мальчика по имени Гарри Поттер, который учится в Школе волшебства и философии в Университете Хогвартс. Он сталкивается с различными препятствиями, включая борьбу со злом, поиск друзей и самопознание.

  2. Персонажи: В книге представлены множество персонажей, каждый из которых имеет свои уникальные черты характера, мотивации и прошлое. Главный герой, Гарри Поттер, является примером доброго и смелого человека, а также необычной личностью.

  3. Темы и идеи: Рассказы книги затрагивают темы любви, дружбы, справедливости, морали, человеческой неповиновенности и важности обучения через приключения.

  4. История и развитие персонажей: Через события и взаимодействие с другими персонажами книга исследует глубокие психологические и философские вопросы.

  5. Влияние на культуру: "Гарри Поттер" оказал огромное влияние на мировую литературу, превратившись в культовый жанр и символ знаний и мудрости.

  6. Доступность: Книги серии доступны для широкой аудитории и пользуются большим спросом, что делает их популярным выбором среди читателей всех возрастов.

  7. Развитие жанра: Несмотря на то что "Гарри Поттер" является частью серии, он продолжает быть любимым и актуальным, так как продолжает удивлять читателей новыми историями и персонажами.

Эта серия книг остается одной из самых значительных и влиятельных в истории литературы, оказав влияние на развитие мировой культуры и образование.

ะะฒั‚ะพั€ั‹ / Authors

@article{nikolich2024vikhr,
  title={Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian},
  author={Aleksandr Nikolich and Konstantin Korolev and Sergey Bratchikov and Nikolay Kompanets and Artem Shelmanov},
  journal={arXiv preprint arXiv:2405.13929},
  year={2024},
  url={https://arxiv.org/pdf/2405.13929}
}