Qwen-7B-Chat


๐Ÿค— Hugging Face   |   ๐Ÿค– ModelScope   |    ๐Ÿ“‘ Paper    ๏ฝœ   ๐Ÿ–ฅ๏ธ Demo
WeChat (ๅพฎไฟก)   |   Discord   ๏ฝœ   API


ไป‹็ป๏ผˆIntroduction๏ผ‰

้€šไน‰ๅƒ้—ฎ-7B๏ผˆQwen-7B๏ผ‰ๆ˜ฏ้˜ฟ้‡Œไบ‘็ ”ๅ‘็š„้€šไน‰ๅƒ้—ฎๅคงๆจกๅž‹็ณปๅˆ—็š„70ไบฟๅ‚ๆ•ฐ่ง„ๆจก็š„ๆจกๅž‹ใ€‚Qwen-7Bๆ˜ฏๅŸบไบŽTransformer็š„ๅคง่ฏญ่จ€ๆจกๅž‹, ๅœจ่ถ…ๅคง่ง„ๆจก็š„้ข„่ฎญ็ปƒๆ•ฐๆฎไธŠ่ฟ›่กŒ่ฎญ็ปƒๅพ—ๅˆฐใ€‚้ข„่ฎญ็ปƒๆ•ฐๆฎ็ฑปๅž‹ๅคšๆ ท๏ผŒ่ฆ†็›–ๅนฟๆณ›๏ผŒๅŒ…ๆ‹ฌๅคง้‡็ฝ‘็ปœๆ–‡ๆœฌใ€ไธ“ไธšไนฆ็ฑใ€ไปฃ็ ็ญ‰ใ€‚ๅŒๆ—ถ๏ผŒๅœจQwen-7B็š„ๅŸบ็ก€ไธŠ๏ผŒๆˆ‘ไปฌไฝฟ็”จๅฏน้ฝๆœบๅˆถๆ‰“้€ ไบ†ๅŸบไบŽๅคง่ฏญ่จ€ๆจกๅž‹็š„AIๅŠฉๆ‰‹Qwen-7B-Chatใ€‚็›ธ่พƒไบŽๆœ€ๅˆๅผ€ๆบ็š„Qwen-7Bๆจกๅž‹๏ผŒๆˆ‘ไปฌ็Žฐๅทฒๅฐ†้ข„่ฎญ็ปƒๆจกๅž‹ๅ’ŒChatๆจกๅž‹ๆ›ดๆ–ฐๅˆฐๆ•ˆๆžœๆ›ดไผ˜็š„็‰ˆๆœฌใ€‚ๆœฌไป“ๅบ“ไธบQwen-7B-Chat็š„ไป“ๅบ“ใ€‚

ๅฆ‚ๆžœๆ‚จๆƒณไบ†่งฃๆ›ดๅคšๅ…ณไบŽ้€šไน‰ๅƒ้—ฎ-7Bๅผ€ๆบๆจกๅž‹็š„็ป†่Š‚๏ผŒๆˆ‘ไปฌๅปบ่ฎฎๆ‚จๅ‚้˜…GitHubไปฃ็ ๅบ“ใ€‚

Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant, which is trained with alignment techniques. Now we have updated both our pretrained and chat models with better performances. This repository is the one for Qwen-7B-Chat.

For more details about Qwen, please refer to the GitHub code repository.

่ฆๆฑ‚๏ผˆRequirements๏ผ‰

  • python 3.8ๅŠไปฅไธŠ็‰ˆๆœฌ
  • pytorch 1.12ๅŠไปฅไธŠ็‰ˆๆœฌ๏ผŒๆŽจ่2.0ๅŠไปฅไธŠ็‰ˆๆœฌ
  • ๅปบ่ฎฎไฝฟ็”จCUDA 11.4ๅŠไปฅไธŠ๏ผˆGPU็”จๆˆทใ€flash-attention็”จๆˆท็ญ‰้œ€่€ƒ่™‘ๆญค้€‰้กน๏ผ‰
  • python 3.8 and above
  • pytorch 1.12 and above, 2.0 and above are recommended
  • CUDA 11.4 and above are recommended (this is for GPU users, flash-attention users, etc.)

ไพ่ต–้กน๏ผˆDependency๏ผ‰

่ฟ่กŒQwen-7B-Chat๏ผŒ่ฏท็กฎไฟๆปก่ถณไธŠ่ฟฐ่ฆๆฑ‚๏ผŒๅ†ๆ‰ง่กŒไปฅไธ‹pipๅ‘ฝไปคๅฎ‰่ฃ…ไพ่ต–ๅบ“

To run Qwen-7B-Chat, please make sure you meet the above requirements, and then execute the following pip commands to install the dependent libraries.

pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed

ๅฆๅค–๏ผŒๆŽจ่ๅฎ‰่ฃ…flash-attentionๅบ“๏ผˆๅฝ“ๅ‰ๅทฒๆ”ฏๆŒflash attention 2๏ผ‰๏ผŒไปฅๅฎž็Žฐๆ›ด้ซ˜็š„ๆ•ˆ็Ž‡ๅ’Œๆ›ดไฝŽ็š„ๆ˜พๅญ˜ๅ ็”จใ€‚

In addition, it is recommended to install the flash-attention library (we support flash attention 2 now.) for higher efficiency and lower memory usage.

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# ไธ‹ๆ–นๅฎ‰่ฃ…ๅฏ้€‰๏ผŒๅฎ‰่ฃ…ๅฏ่ƒฝๆฏ”่พƒ็ผ“ๆ…ขใ€‚
# pip install csrc/layer_norm
# pip install csrc/rotary
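
After installation, a quick sanity check (a minimal sketch; it assumes the pip package above installs the module under the name flash_attn) can confirm that flash-attention is importable:

# Sanity check: flash_attn is the module installed by the flash-attention package above.
import flash_attn
print(flash_attn.__version__)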

ๅฟซ้€Ÿไฝฟ็”จ๏ผˆQuickstart๏ผ‰

ไธ‹้ขๆˆ‘ไปฌๅฑ•็คบไบ†ไธ€ไธชไฝฟ็”จQwen-7B-Chatๆจกๅž‹๏ผŒ่ฟ›่กŒๅคš่ฝฎๅฏน่ฏไบคไบ’็š„ๆ ทไพ‹๏ผš

We show an example of multi-turn interaction with Qwen-7B-Chat in the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation. If you use transformers>=4.32.0, there is no need to do this.
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)  # You can specify a different generation length, top_p, and other related hyperparameters here.

# ็ฌฌไธ€่ฝฎๅฏน่ฏ 1st dialogue turn
response, history = model.chat(tokenizer, "ไฝ ๅฅฝ", history=None)
print(response)
# ไฝ ๅฅฝ๏ผๅพˆ้ซ˜ๅ…ดไธบไฝ ๆไพ›ๅธฎๅŠฉใ€‚

# ็ฌฌไบŒ่ฝฎๅฏน่ฏ 2nd dialogue turn
response, history = model.chat(tokenizer, "็ป™ๆˆ‘่ฎฒไธ€ไธชๅนด่ฝปไบบๅฅ‹ๆ–—ๅˆ›ไธšๆœ€็ปˆๅ–ๅพ—ๆˆๅŠŸ็š„ๆ•…ไบ‹ใ€‚", history=history)
print(response)
# ่ฟ™ๆ˜ฏไธ€ไธชๅ…ณไบŽไธ€ไธชๅนด่ฝปไบบๅฅ‹ๆ–—ๅˆ›ไธšๆœ€็ปˆๅ–ๅพ—ๆˆๅŠŸ็š„ๆ•…ไบ‹ใ€‚
# ๆ•…ไบ‹็š„ไธปไบบๅ…ฌๅซๆŽๆ˜Ž๏ผŒไป–ๆฅ่‡ชไธ€ไธชๆ™ฎ้€š็š„ๅฎถๅบญ๏ผŒ็ˆถๆฏ้ƒฝๆ˜ฏๆ™ฎ้€š็š„ๅทฅไบบใ€‚ไปŽๅฐ๏ผŒๆŽๆ˜Žๅฐฑ็ซ‹ไธ‹ไบ†ไธ€ไธช็›ฎๆ ‡๏ผš่ฆๆˆไธบไธ€ๅๆˆๅŠŸ็š„ไผไธšๅฎถใ€‚
# ไธบไบ†ๅฎž็Žฐ่ฟ™ไธช็›ฎๆ ‡๏ผŒๆŽๆ˜Žๅ‹คๅฅ‹ๅญฆไน ๏ผŒ่€ƒไธŠไบ†ๅคงๅญฆใ€‚ๅœจๅคงๅญฆๆœŸ้—ด๏ผŒไป–็งฏๆžๅ‚ๅŠ ๅ„็งๅˆ›ไธšๆฏ”่ต›๏ผŒ่Žทๅพ—ไบ†ไธๅฐ‘ๅฅ–้กนใ€‚ไป–่ฟ˜ๅˆฉ็”จ่ฏพไฝ™ๆ—ถ้—ดๅŽปๅฎžไน ๏ผŒ็งฏ็ดฏไบ†ๅฎ่ดต็š„็ป้ชŒใ€‚
# ๆฏ•ไธšๅŽ๏ผŒๆŽๆ˜Žๅ†ณๅฎšๅผ€ๅง‹่‡ชๅทฑ็š„ๅˆ›ไธšไน‹่ทฏใ€‚ไป–ๅผ€ๅง‹ๅฏปๆ‰พๆŠ•่ต„ๆœบไผš๏ผŒไฝ†ๅคšๆฌก้ƒฝ่ขซๆ‹’็ปไบ†ใ€‚็„ถ่€Œ๏ผŒไป–ๅนถๆฒกๆœ‰ๆ”พๅผƒใ€‚ไป–็ปง็ปญๅŠชๅŠ›๏ผŒไธๆ–ญๆ”น่ฟ›่‡ชๅทฑ็š„ๅˆ›ไธš่ฎกๅˆ’๏ผŒๅนถๅฏปๆ‰พๆ–ฐ็š„ๆŠ•่ต„ๆœบไผšใ€‚
# ๆœ€็ปˆ๏ผŒๆŽๆ˜ŽๆˆๅŠŸๅœฐ่Žทๅพ—ไบ†ไธ€็ฌ”ๆŠ•่ต„๏ผŒๅผ€ๅง‹ไบ†่‡ชๅทฑ็š„ๅˆ›ไธšไน‹่ทฏใ€‚ไป–ๆˆ็ซ‹ไบ†ไธ€ๅฎถ็ง‘ๆŠ€ๅ…ฌๅธ๏ผŒไธ“ๆณจไบŽๅผ€ๅ‘ๆ–ฐๅž‹่ฝฏไปถใ€‚ๅœจไป–็š„้ข†ๅฏผไธ‹๏ผŒๅ…ฌๅธ่ฟ…้€Ÿๅ‘ๅฑ•่ตทๆฅ๏ผŒๆˆไธบไบ†ไธ€ๅฎถๆˆๅŠŸ็š„็ง‘ๆŠ€ไผไธšใ€‚
# ๆŽๆ˜Ž็š„ๆˆๅŠŸๅนถไธๆ˜ฏๅถ็„ถ็š„ใ€‚ไป–ๅ‹คๅฅ‹ใ€ๅš้Ÿงใ€ๅ‹‡ไบŽๅ†’้™ฉ๏ผŒไธๆ–ญๅญฆไน ๅ’Œๆ”น่ฟ›่‡ชๅทฑใ€‚ไป–็š„ๆˆๅŠŸไนŸ่ฏๆ˜Žไบ†๏ผŒๅช่ฆๅŠชๅŠ›ๅฅ‹ๆ–—๏ผŒไปปไฝ•ไบบ้ƒฝๆœ‰ๅฏ่ƒฝๅ–ๅพ—ๆˆๅŠŸใ€‚

# ็ฌฌไธ‰่ฝฎๅฏน่ฏ 3rd dialogue turn
response, history = model.chat(tokenizer, "็ป™่ฟ™ไธชๆ•…ไบ‹่ตทไธ€ไธชๆ ‡้ข˜", history=history)
print(response)
# ใ€Šๅฅ‹ๆ–—ๅˆ›ไธš๏ผšไธ€ไธชๅนด่ฝปไบบ็š„ๆˆๅŠŸไน‹่ทฏใ€‹

ๅ…ณไบŽๆ›ดๅคš็š„ไฝฟ็”จ่ฏดๆ˜Ž๏ผŒ่ฏทๅ‚่€ƒๆˆ‘ไปฌ็š„GitHub repo่Žทๅ–ๆ›ดๅคšไฟกๆฏใ€‚

For more information, please refer to our GitHub repo for more information.

Tokenizer

Note: as a technical term, "tokenization" has no commonly agreed Chinese equivalent, so the English word is used throughout this document.

Our tokenizer, based on tiktoken, is different from other tokenizers, e.g., the sentencepiece tokenizer. In particular, you need to pay attention to special tokens during finetuning. For more detailed information on the tokenizer and its use in finetuning, please refer to the documentation.
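
As a minimal sketch (the sample text is illustrative, not taken from the documentation), you can load the tokenizer and round-trip a mixed Chinese/English string to see how it is segmented:

from transformers import AutoTokenizer

# Load the tiktoken-based Qwen tokenizer (remote code is required).
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

text = "Qwen-7B-Chat supports both 中文 and English, e.g. print('hello')."
ids = tokenizer.encode(text)           # list of token ids
print(len(ids), ids[:10])              # number of tokens and the first few ids
print(tokenizer.decode(ids) == text)   # decoding should recover the original string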

้‡ๅŒ– (Quantization)

็”จๆณ• (Usage)

่ฏทๆณจๆ„๏ผšๆˆ‘ไปฌๆ›ดๆ–ฐ้‡ๅŒ–ๆ–นๆกˆไธบๅŸบไบŽAutoGPTQ็š„้‡ๅŒ–๏ผŒๆไพ›Qwen-7B-Chat็š„Int4้‡ๅŒ–ๆจกๅž‹็‚นๅ‡ป่ฟ™้‡Œใ€‚็›ธๆฏ”ๆญคๅ‰ๆ–นๆกˆ๏ผŒ่ฏฅๆ–นๆกˆๅœจๆจกๅž‹่ฏ„ๆต‹ๆ•ˆๆžœๅ‡ ไนŽๆ— ๆŸ๏ผŒไธ”ๅญ˜ๅ‚จ้œ€ๆฑ‚ๆ›ดไฝŽ๏ผŒๆŽจ็†้€Ÿๅบฆๆ›ดไผ˜ใ€‚

Note: we provide a new solution based on AutoGPTQ, and release an Int4 quantized model for Qwen-7B-Chat Click here, which achieves nearly lossless model effects but improved performance on both memory costs and inference speed, in comparison with the previous solution.

ไปฅไธ‹ๆˆ‘ไปฌๆไพ›็คบไพ‹่ฏดๆ˜Žๅฆ‚ไฝ•ไฝฟ็”จInt4้‡ๅŒ–ๆจกๅž‹ใ€‚ๅœจๅผ€ๅง‹ไฝฟ็”จๅ‰๏ผŒ่ฏทๅ…ˆไฟ่ฏๆปก่ถณ่ฆๆฑ‚๏ผˆๅฆ‚torch 2.0ๅŠไปฅไธŠ๏ผŒtransformers็‰ˆๆœฌไธบ4.32.0ๅŠไปฅไธŠ๏ผŒ็ญ‰็ญ‰๏ผ‰๏ผŒๅนถๅฎ‰่ฃ…ๆ‰€้œ€ๅฎ‰่ฃ…ๅŒ…๏ผš

Here we demonstrate how to use our provided quantized models for inference. Before you start, make sure you meet the requirements of auto-gptq (e.g., torch 2.0 and above, transformers 4.32.0 and above, etc.) and install the required packages:

pip install auto-gptq optimum

ๅฆ‚ๅฎ‰่ฃ…auto-gptq้‡ๅˆฐ้—ฎ้ข˜๏ผŒๆˆ‘ไปฌๅปบ่ฎฎๆ‚จๅˆฐๅฎ˜ๆ–นrepoๆœ็ดขๅˆ้€‚็š„้ข„็ผ–่ฏ‘wheelใ€‚

้šๅŽๅณๅฏไฝฟ็”จๅ’ŒไธŠ่ฟฐไธ€่‡ด็š„็”จๆณ•่ฐƒ็”จ้‡ๅŒ–ๆจกๅž‹๏ผš

If you meet problems installing auto-gptq, we advise you to check out the official repo to find a pre-build wheel.

Then you can load the quantized model easily and run inference as same as usual:

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True
).eval()
response, history = model.chat(tokenizer, "ไฝ ๅฅฝ", history=None)

ๆ•ˆๆžœ่ฏ„ๆต‹

ๆˆ‘ไปฌๅฏนBF16๏ผŒInt8ๅ’ŒInt4ๆจกๅž‹ๅœจๅŸบๅ‡†่ฏ„ๆต‹ไธŠๅšไบ†ๆต‹่ฏ•๏ผˆไฝฟ็”จzero-shot่ฎพ็ฝฎ๏ผ‰๏ผŒๅ‘็Žฐ้‡ๅŒ–ๆจกๅž‹ๆ•ˆๆžœๆŸๅคฑ่พƒๅฐ๏ผŒ็ป“ๆžœๅฆ‚ไธ‹ๆ‰€็คบ๏ผš

We illustrate the zero-shot performance of both BF16, Int8 and Int4 models on the benchmark, and we find that the quantized model does not suffer from significant performance degradation. Results are shown below:

Quantization MMLU C-Eval (val) GSM8K HumanEval
BF16 55.8 59.7 50.3 37.2
Int8 55.4 59.4 48.3 34.8
Int4 55.1 59.2 49.7 29.9

ๆŽจ็†้€Ÿๅบฆ (Inference Speed)

ๆˆ‘ไปฌๆต‹็ฎ—ไบ†ไธๅŒ็ฒพๅบฆๆจกๅž‹ไปฅๅŠไธๅŒFlashAttnๅบ“็‰ˆๆœฌไธ‹ๆจกๅž‹็”Ÿๆˆ2048ๅ’Œ8192ไธชtoken็š„ๅนณๅ‡ๆŽจ็†้€Ÿๅบฆใ€‚ๅฆ‚ๅ›พๆ‰€็คบ๏ผš

We measured the average inference speed of generating 2048 and 8192 tokens with different quantization levels and versions of flash-attention, respectively.

Quantization FlashAttn Speed (2048 tokens) Speed (8192 tokens)
BF16 v2 40.93 36.14
Int8 v2 37.47 32.54
Int4 v2 50.09 38.61
BF16 v1 40.75 35.34
Int8 v1 37.51 32.39
Int4 v1 45.98 36.47
BF16 Disabled 37.55 33.56
Int8 Disabled 37.84 32.65
Int4 Disabled 48.12 36.70

ๅ…ทไฝ“่€Œ่จ€๏ผŒๆˆ‘ไปฌ่ฎฐๅฝ•ๅœจ้•ฟๅบฆไธบ1็š„ไธŠไธ‹ๆ–‡็š„ๆกไปถไธ‹็”Ÿๆˆ8192ไธชtoken็š„ๆ€ง่ƒฝใ€‚่ฏ„ๆต‹่ฟ่กŒไบŽๅ•ๅผ A100-SXM4-80G GPU๏ผŒไฝฟ็”จPyTorch 2.0.1ๅ’ŒCUDA 11.8ใ€‚ๆŽจ็†้€Ÿๅบฆๆ˜ฏ็”Ÿๆˆ8192ไธชtoken็š„้€Ÿๅบฆๅ‡ๅ€ผใ€‚

In detail, the setting of profiling is generating 8192 new tokens with 1 context token. The profiling runs on a single A100-SXM4-80G GPU with PyTorch 2.0.1 and CUDA 11.8. The inference speed is averaged over the generated 8192 tokens.
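
As a rough illustration (this is not the official profiling script linked further below, and the prompt is arbitrary), a tokens-per-second figure can be obtained by timing model.generate on a GPU:

import time
import torch

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
torch.cuda.synchronize()
start = time.time()
# The chat model may stop early at its stop tokens; the official script controls for this.
out = model.generate(**inputs, max_new_tokens=2048)
torch.cuda.synchronize()
elapsed = time.time() - start
new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/s")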

ๆณจๆ„๏ผšไปฅไธŠInt4/Int8ๆจกๅž‹็”Ÿๆˆ้€Ÿๅบฆไฝฟ็”จautogptqๅบ“็ป™ๅ‡บ๏ผŒๅฝ“ๅ‰AutoModelForCausalLM.from_pretrained่ฝฝๅ…ฅ็š„ๆจกๅž‹็”Ÿๆˆ้€Ÿๅบฆไผšๆ…ขๅคง็บฆ20%ใ€‚ๆˆ‘ไปฌๅทฒ็ปๅฐ†่ฏฅ้—ฎ้ข˜ๆฑ‡ๆŠฅ็ป™HuggingFaceๅ›ข้˜Ÿ๏ผŒ่‹ฅๆœ‰่งฃๅ†ณๆ–นๆกˆๅฐ†ๅณๆ—ถๆ›ดๆ–ฐใ€‚

Note: The generation speed of the Int4/Int8 models mentioned above is provided by the autogptq library. The current speed of the model loaded using "AutoModelForCausalLM.from_pretrained" will be approximately 20% slower. We have reported this issue to the HuggingFace team and will update it promptly if a solution is available.

ๆ˜พๅญ˜ไฝฟ็”จ (GPU Memory Usage)

ๆˆ‘ไปฌ่ฟ˜ๆต‹็ฎ—ไบ†ไธๅŒๆจกๅž‹็ฒพๅบฆ็ผ–็ 2048ไธชtokenๅŠ็”Ÿๆˆ8192ไธชtoken็š„ๅณฐๅ€ผๆ˜พๅญ˜ๅ ็”จๆƒ…ๅ†ตใ€‚๏ผˆๆ˜พๅญ˜ๆถˆ่€—ๅœจๆ˜ฏๅฆไฝฟ็”จFlashAttn็š„ๆƒ…ๅ†ตไธ‹ๅ‡็ฑปไผผใ€‚๏ผ‰็ป“ๆžœๅฆ‚ไธ‹ๆ‰€็คบ๏ผš

We also profile the peak GPU memory usage for encoding 2048 tokens as context (and generating single token) and generating 8192 tokens (with single token as context) under different quantization levels, respectively. ๏ผˆThe GPU memory usage is similar when using flash-attention or not.๏ผ‰The results are shown below.

Quantization Level Peak Usage for Encoding 2048 Tokens Peak Usage for Generating 8192 Tokens
BF16 16.99GB 22.53GB
Int8 11.20GB 16.62GB
Int4 8.21GB 13.63GB

ไธŠ่ฟฐๆ€ง่ƒฝๆต‹็ฎ—ไฝฟ็”จๆญค่„šๆœฌๅฎŒๆˆใ€‚

The above speed and memory profiling are conducted using this script.
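
For a quick, unofficial check of peak memory (a sketch, not the linked script; adjust max_new_tokens to the setting you want to profile), PyTorch's CUDA memory statistics can be read around a generation call:

import torch

torch.cuda.reset_peak_memory_stats()
inputs = tokenizer("你好", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=2048)
peak_gib = torch.cuda.max_memory_allocated() / 2**30
print(f"peak GPU memory: {peak_gib:.2f} GiB")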

ๆจกๅž‹็ป†่Š‚๏ผˆModel๏ผ‰

ไธŽQwen-7B้ข„่ฎญ็ปƒๆจกๅž‹็›ธๅŒ๏ผŒQwen-7B-Chatๆจกๅž‹่ง„ๆจกๅŸบๆœฌๆƒ…ๅ†ตๅฆ‚ไธ‹ๆ‰€็คบ:

The details of the model architecture of Qwen-7B-Chat are listed as follows:

Hyperparameter Value
n_layers 32
n_heads 32
d_model 4096
vocab size 151851
sequence length 8192

ๅœจไฝ็ฝฎ็ผ–็ ใ€FFNๆฟ€ๆดปๅ‡ฝๆ•ฐๅ’Œnormalization็š„ๅฎž็Žฐๆ–นๅผไธŠ๏ผŒๆˆ‘ไปฌไนŸ้‡‡็”จไบ†็›ฎๅ‰ๆœ€ๆต่กŒ็š„ๅšๆณ•๏ผŒ ๅณRoPE็›ธๅฏนไฝ็ฝฎ็ผ–็ ใ€SwiGLUๆฟ€ๆดปๅ‡ฝๆ•ฐใ€RMSNorm๏ผˆๅฏ้€‰ๅฎ‰่ฃ…flash-attentionๅŠ ้€Ÿ๏ผ‰ใ€‚

ๅœจๅˆ†่ฏๅ™จๆ–น้ข๏ผŒ็›ธๆฏ”็›ฎๅ‰ไธปๆตๅผ€ๆบๆจกๅž‹ไปฅไธญ่‹ฑ่ฏ่กจไธบไธป๏ผŒQwen-7B-Chatไฝฟ็”จไบ†็บฆ15ไธ‡tokenๅคงๅฐ็š„่ฏ่กจใ€‚ ่ฏฅ่ฏ่กจๅœจGPT-4ไฝฟ็”จ็š„BPE่ฏ่กจcl100k_baseๅŸบ็ก€ไธŠ๏ผŒๅฏนไธญๆ–‡ใ€ๅคš่ฏญ่จ€่ฟ›่กŒไบ†ไผ˜ๅŒ–๏ผŒๅœจๅฏนไธญใ€่‹ฑใ€ไปฃ็ ๆ•ฐๆฎ็š„้ซ˜ๆ•ˆ็ผ–่งฃ็ ็š„ๅŸบ็ก€ไธŠ๏ผŒๅฏน้ƒจๅˆ†ๅคš่ฏญ่จ€ๆ›ดๅŠ ๅ‹ๅฅฝ๏ผŒๆ–นไพฟ็”จๆˆทๅœจไธๆ‰ฉๅฑ•่ฏ่กจ็š„ๆƒ…ๅ†ตไธ‹ๅฏน้ƒจๅˆ†่ฏญ็ง่ฟ›่กŒ่ƒฝๅŠ›ๅขžๅผบใ€‚ ่ฏ่กจๅฏนๆ•ฐๅญ—ๆŒ‰ๅ•ไธชๆ•ฐๅญ—ไฝๅˆ‡ๅˆ†ใ€‚่ฐƒ็”จ่พƒไธบ้ซ˜ๆ•ˆ็š„tiktokenๅˆ†่ฏๅบ“่ฟ›่กŒๅˆ†่ฏใ€‚

For position encoding, FFN activation function, and normalization calculation methods, we adopt the prevalent practices, i.e., RoPE relative position encoding, SwiGLU for activation function, and RMSNorm for normalization (optional installation of flash-attention for acceleration).

For tokenization, compared to the current mainstream open-source models based on Chinese and English vocabularies, Qwen-7B-Chat uses a vocabulary of over 150K tokens. It first considers efficient encoding of Chinese, English, and code data, and is also more friendly to multilingual languages, enabling users to directly enhance the capability of some languages without expanding the vocabulary. It segments numbers by single digit, and calls the tiktoken tokenizer library for efficient tokenization.
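
As a small illustration, the table above can be checked against the released checkpoint's configuration; the attribute names below follow the usual transformers conventions and the checkpoint's config.json, and are assumptions rather than part of this card:

from transformers import AutoConfig

# Read the architecture hyperparameters from the released checkpoint (assumed attribute names).
cfg = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
print(cfg.num_hidden_layers)    # n_layers
print(cfg.num_attention_heads)  # n_heads
print(cfg.hidden_size)          # d_model
print(cfg.seq_length)           # sequence length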

่ฏ„ๆต‹ๆ•ˆๆžœ๏ผˆEvaluation๏ผ‰

ๅฏนไบŽQwen-7B-Chatๆจกๅž‹๏ผŒๆˆ‘ไปฌๅŒๆ ท่ฏ„ๆต‹ไบ†ๅธธ่ง„็š„ไธญๆ–‡็†่งฃ๏ผˆC-Eval๏ผ‰ใ€่‹ฑๆ–‡็†่งฃ๏ผˆMMLU๏ผ‰ใ€ไปฃ็ ๏ผˆHumanEval๏ผ‰ๅ’Œๆ•ฐๅญฆ๏ผˆGSM8K๏ผ‰็ญ‰ๆƒๅจไปปๅŠก๏ผŒๅŒๆ—ถๅŒ…ๅซไบ†้•ฟๅบๅˆ—ไปปๅŠก็š„่ฏ„ๆต‹็ป“ๆžœใ€‚็”ฑไบŽQwen-7B-Chatๆจกๅž‹็ป่ฟ‡ๅฏน้ฝๅŽ๏ผŒๆฟ€ๅ‘ไบ†่พƒๅผบ็š„ๅค–้ƒจ็ณป็ปŸ่ฐƒ็”จ่ƒฝๅŠ›๏ผŒๆˆ‘ไปฌ่ฟ˜่ฟ›่กŒไบ†ๅทฅๅ…ทไฝฟ็”จ่ƒฝๅŠ›ๆ–น้ข็š„่ฏ„ๆต‹ใ€‚

ๆ็คบ๏ผš็”ฑไบŽ็กฌไปถๅ’Œๆก†ๆžถ้€ ๆˆ็š„่ˆๅ…ฅ่ฏฏๅทฎ๏ผŒๅค็Žฐ็ป“ๆžœๅฆ‚ๆœ‰ๆณขๅŠจๅฑžไบŽๆญฃๅธธ็Žฐ่ฑกใ€‚

For Qwen-7B-Chat, we also evaluate the model on C-Eval, MMLU, HumanEval, GSM8K, etc., as well as the benchmark evaluation for long-context understanding, and tool usage.

Note: Due to rounding errors caused by hardware and framework, differences in reproduced results are possible.

ไธญๆ–‡่ฏ„ๆต‹๏ผˆChinese Evaluation๏ผ‰

C-Eval

ๅœจC-Eval้ชŒ่ฏ้›†ไธŠ๏ผŒๆˆ‘ไปฌ่ฏ„ไปทไบ†Qwen-7B-Chatๆจกๅž‹็š„0-shot & 5-shotๅ‡†็กฎ็Ž‡

We demonstrate the 0-shot & 5-shot accuracy of Qwen-7B-Chat on C-Eval validation set

Model Avg. Acc.
LLaMA2-7B-Chat 31.9
LLaMA2-13B-Chat 36.2
LLaMA2-70B-Chat 44.3
ChatGLM2-6B-Chat 52.6
InternLM-7B-Chat 53.6
Baichuan2-7B-Chat 55.6
Baichuan2-13B-Chat 56.7
Qwen-7B-Chat (original) (0-shot) 54.2
Qwen-7B-Chat (0-shot) 59.7
Qwen-7B-Chat (5-shot) 59.3
Qwen-14B-Chat (0-shot) 69.8
Qwen-14B-Chat (5-shot) 71.7

C-Evalๆต‹่ฏ•้›†ไธŠ๏ผŒQwen-7B-Chatๆจกๅž‹็š„zero-shotๅ‡†็กฎ็Ž‡็ป“ๆžœๅฆ‚ไธ‹๏ผš

The zero-shot accuracy of Qwen-7B-Chat on C-Eval testing set is provided below:

Model Avg. STEM Social Sciences Humanities Others
Chinese-Alpaca-Plus-13B 41.5 36.6 49.7 43.1 41.2
Chinese-Alpaca-2-7B 40.3 - - - -
ChatGLM2-6B-Chat 50.1 46.4 60.4 50.6 46.9
Baichuan-13B-Chat 51.5 43.7 64.6 56.2 49.2
Qwen-7B-Chat (original) 54.6 47.8 67.6 59.3 50.6
Qwen-7B-Chat 58.6 53.3 72.1 62.8 52.0
Qwen-14B-Chat 69.1 65.1 80.9 71.2 63.4

ๅœจ7B่ง„ๆจกๆจกๅž‹ไธŠ๏ผŒ็ป่ฟ‡ไบบ็ฑปๆŒ‡ไปคๅฏน้ฝ็š„Qwen-7B-Chatๆจกๅž‹๏ผŒๅ‡†็กฎ็Ž‡ๅœจๅŒ็ฑป็›ธ่ฟ‘่ง„ๆจกๆจกๅž‹ไธญไป็„ถๅค„ไบŽๅ‰ๅˆ—ใ€‚

Compared with other pretrained models with comparable model size, the human-aligned Qwen-7B-Chat performs well in C-Eval accuracy.

่‹ฑๆ–‡่ฏ„ๆต‹๏ผˆEnglish Evaluation๏ผ‰

MMLU

MMLU่ฏ„ๆต‹้›†ไธŠ๏ผŒQwen-7B-Chatๆจกๅž‹็š„ 0-shot & 5-shot ๅ‡†็กฎ็Ž‡ๅฆ‚ไธ‹๏ผŒๆ•ˆๆžœๅŒๆ ทๅœจๅŒ็ฑปๅฏน้ฝๆจกๅž‹ไธญๅŒๆ ท่กจ็Žฐ่พƒไผ˜ใ€‚

The 0-shot & 5-shot accuracy of Qwen-7B-Chat on MMLU is provided below. The performance of Qwen-7B-Chat still on the top between other human-aligned models with comparable size.

Model Avg. Acc.
ChatGLM2-6B-Chat 46.0
LLaMA2-7B-Chat 46.2
InternLM-7B-Chat 51.1
Baichuan2-7B-Chat 52.9
LLaMA2-13B-Chat 54.6
Baichuan2-13B-Chat 57.3
LLaMA2-70B-Chat 63.8
Qwen-7B-Chat (original) (0-shot) 53.9
Qwen-7B-Chat (0-shot) 55.8
Qwen-7B-Chat (5-shot) 57.0
Qwen-14B-Chat (0-shot) 64.6
Qwen-14B-Chat (5-shot) 66.5

ไปฃ็ ่ฏ„ๆต‹๏ผˆCoding Evaluation๏ผ‰

Qwen-7B-ChatๅœจHumanEval็š„zero-shot Pass@1ๆ•ˆๆžœๅฆ‚ไธ‹

The zero-shot Pass@1 of Qwen-7B-Chat on HumanEval is demonstrated below

Model Pass@1
ChatGLM2-6B-Chat 11.0
LLaMA2-7B-Chat 12.2
Baichuan2-7B-Chat 13.4
InternLM-7B-Chat 14.6
Baichuan2-13B-Chat 17.7
LLaMA2-13B-Chat 18.9
LLaMA2-70B-Chat 32.3
Qwen-7B-Chat (original) 24.4
Qwen-7B-Chat 37.2
Qwen-14B-Chat 43.9

ๆ•ฐๅญฆ่ฏ„ๆต‹๏ผˆMathematics Evaluation๏ผ‰

ๅœจ่ฏ„ๆต‹ๆ•ฐๅญฆ่ƒฝๅŠ›็š„GSM8KไธŠ๏ผŒQwen-7B-Chat็š„ๅ‡†็กฎ็Ž‡็ป“ๆžœๅฆ‚ไธ‹

The accuracy of Qwen-7B-Chat on GSM8K is shown below

Model Acc.
LLaMA2-7B-Chat 26.3
ChatGLM2-6B-Chat 28.8
Baichuan2-7B-Chat 32.8
InternLM-7B-Chat 33.0
LLaMA2-13B-Chat 37.1
Baichuan2-13B-Chat 55.3
LLaMA2-70B-Chat 59.3
Qwen-7B-Chat (original) (0-shot) 41.1
Qwen-7B-Chat (0-shot) 50.3
Qwen-7B-Chat (8-shot) 54.1
Qwen-14B-Chat (0-shot) 60.1
Qwen-14B-Chat (8-shot) 59.3

้•ฟๅบๅˆ—่ฏ„ๆต‹๏ผˆLong-Context Understanding๏ผ‰

้€š่ฟ‡NTKๆ’ๅ€ผ๏ผŒLogNๆณจๆ„ๅŠ›็ผฉๆ”พๅฏไปฅๆ‰ฉๅฑ•Qwen-7B-Chat็š„ไธŠไธ‹ๆ–‡้•ฟๅบฆใ€‚ๅœจ้•ฟๆ–‡ๆœฌๆ‘˜่ฆๆ•ฐๆฎ้›†VCSUMไธŠ๏ผˆๆ–‡ๆœฌๅนณๅ‡้•ฟๅบฆๅœจ15Kๅทฆๅณ๏ผ‰๏ผŒQwen-7B-Chat็š„Rouge-L็ป“ๆžœๅฆ‚ไธ‹๏ผš

(่‹ฅ่ฆๅฏ็”จ่ฟ™ไบ›ๆŠ€ๅทง๏ผŒ่ฏทๅฐ†config.json้‡Œ็š„use_dynamic_ntkๅ’Œuse_logn_attn่ฎพ็ฝฎไธบtrue)

We introduce NTK-aware interpolation, LogN attention scaling to extend the context length of Qwen-7B-Chat. The Rouge-L results of Qwen-7B-Chat on long-text summarization dataset VCSUM (The average length of this dataset is around 15K) are shown below:

(To use these tricks, please set use_dynamic_ntk and use_long_attn to true in config.json.)
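
A minimal sketch of enabling both flags without editing config.json by hand (the flag names come from the note above; the rest of the call mirrors the Quickstart):

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
config.use_dynamic_ntk = True   # NTK-aware interpolation
config.use_logn_attn = True     # LogN attention scaling
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", config=config, device_map="auto", trust_remote_code=True
).eval()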

Model VCSUM (zh)
GPT-3.5-Turbo-16k 16.0
LLaMA2-7B-Chat 0.2
InternLM-7B-Chat 13.0
ChatGLM2-6B-Chat 16.3
Qwen-7B-Chat 16.6

ๅทฅๅ…ทไฝฟ็”จ่ƒฝๅŠ›็š„่ฏ„ๆต‹๏ผˆTool Usage๏ผ‰

ReAct Prompting

ๅƒ้—ฎๆ”ฏๆŒ้€š่ฟ‡ ReAct Prompting ่ฐƒ็”จๆ’ไปถ/ๅทฅๅ…ท/APIใ€‚ReAct ไนŸๆ˜ฏ LangChain ๆก†ๆžถ้‡‡็”จ็š„ไธป่ฆๆ–นๅผไน‹ไธ€ใ€‚ๅœจๆˆ‘ไปฌๅผ€ๆบ็š„ใ€็”จไบŽ่ฏ„ไผฐๅทฅๅ…ทไฝฟ็”จ่ƒฝๅŠ›็š„่ฏ„ๆต‹ๅŸบๅ‡†ไธŠ๏ผŒๅƒ้—ฎ็š„่กจ็Žฐๅฆ‚ไธ‹๏ผš

Qwen-Chat supports calling plugins/tools/APIs through ReAct Prompting. ReAct is also one of the main approaches used by the LangChain framework. In our evaluation benchmark for assessing tool usage capabilities, Qwen-Chat's performance is as follows:

Chinese Tool-Use Benchmark
Model Tool Selection (Acc.↑) Tool Input (Rouge-L↑) False Positive Error↓
GPT-4 95% 0.90 15.0%
GPT-3.5 85% 0.88 75.0%
Qwen-7B-Chat 98% 0.91 7.3%
Qwen-14B-Chat 98% 0.93 2.4%

่ฏ„ๆต‹ๅŸบๅ‡†ไธญๅ‡บ็Žฐ็š„ๆ’ไปถๅ‡ๆฒกๆœ‰ๅ‡บ็Žฐๅœจๅƒ้—ฎ็š„่ฎญ็ปƒ้›†ไธญใ€‚่ฏฅๅŸบๅ‡†่ฏ„ไผฐไบ†ๆจกๅž‹ๅœจๅคšไธชๅ€™้€‰ๆ’ไปถไธญ้€‰ๆ‹ฉๆญฃ็กฎๆ’ไปถ็š„ๅ‡†็กฎ็Ž‡ใ€ไผ ๅ…ฅๆ’ไปถ็š„ๅ‚ๆ•ฐ็š„ๅˆ็†ๆ€งใ€ไปฅๅŠๅ‡้˜ณ็Ž‡ใ€‚ๅ‡้˜ณ็Ž‡๏ผˆFalse Positive๏ผ‰ๅฎšไน‰๏ผšๅœจๅค„็†ไธ่ฏฅ่ฐƒ็”จๆ’ไปถ็š„่ฏทๆฑ‚ๆ—ถ๏ผŒ้”™่ฏฏๅœฐ่ฐƒ็”จไบ†ๆ’ไปถใ€‚

The plugins that appear in the evaluation set do not appear in the training set of Qwen. This benchmark evaluates the accuracy of the model in selecting the correct plugin from multiple candidate plugins, the rationality of the parameters passed into the plugin, and the false positive rate. False Positive: Incorrectly invoking a plugin when it should not have been called when responding to a query.
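
For illustration, a ReAct-style prompt generally has the shape sketched below. This is a generic sketch of the ReAct format, not Qwen's exact template (which is provided in the GitHub repository's examples), and the tool description and question are made up:

# A generic ReAct-style prompt skeleton (illustrative only; see the GitHub repo for Qwen's exact template).
TOOLS = "search: useful for looking up facts. Input: a search query."

react_prompt = f"""Answer the following question as best you can. You have access to the following tools:

{TOOLS}

Use the following format:

Question: the input question
Thought: what to do next
Action: the tool to use
Action Input: the input to the tool
Observation: the result returned by the tool
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the final answer to the question

Question: How is the weather in Beijing today?
"""

# The model is expected to emit Thought/Action/Action Input; the caller parses them,
# invokes the tool, appends an Observation line, and asks the model to continue.
response, _ = model.chat(tokenizer, react_prompt, history=None)
print(response)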

Code Interpreter

To assess Qwen's ability to use the Python Code Interpreter for tasks such as mathematical problem solving, data visualization, and general-purpose tasks such as file handling and web scraping, we have created and open-sourced a benchmark specifically designed for evaluating these capabilities. You can find the benchmark at this link.

We observe that Qwen performs well in terms of both code executability and result accuracy when generating code:

Executable Rate of Generated Code (%)
ModelMathโ†‘Visualizationโ†‘Generalโ†‘
GPT-491.985.982.8
GPT-3.589.265.074.1
LLaMA2-7B-Chat 41.9 33.1 24.1
LLaMA2-13B-Chat 50.0 40.5 48.3
CodeLLaMA-7B-Instruct 85.1 54.0 70.7
CodeLLaMA-13B-Instruct 93.2 55.8 74.1
InternLM-7B-Chat-v1.1 78.4 44.2 62.1
InternLM-20B-Chat 70.3 44.2 65.5
Qwen-7B-Chat 82.4 64.4 67.2
Qwen-14B-Chat 89.2 84.1 65.5
Accuracy of Code Execution Results (%)
ModelMathโ†‘Visualization-Hardโ†‘Visualization-Easyโ†‘
GPT-482.866.760.8
GPT-3.547.333.355.7
LLaMA2-7B-Chat 3.9 14.3 39.2
LLaMA2-13B-Chat 8.3 8.3 40.5
CodeLLaMA-7B-Instruct 14.3 26.2 60.8
CodeLLaMA-13B-Instruct 28.2 27.4 62.0
InternLM-7B-Chat-v1.1 28.5 4.8 40.5
InternLM-20B-Chat 34.6 21.4 45.6
Qwen-7B-Chat 41.9 40.5 54.4
Qwen-14B-Chat 58.4 53.6 59.5



HuggingFace Agent

Qwen-Chat can also serve as a HuggingFace Agent. Its performance on the run-mode benchmark provided by HuggingFace is as follows:

HuggingFace Agent Benchmark - Run Mode
Model Tool Selection↑ Tool Used↑ Code↑
GPT-4 100 100 97.4
GPT-3.5 95.4 96.3 87.0
StarCoder-Base-15B 86.1 87.0 68.9
StarCoder-15B 87.0 88.0 68.9
Qwen-7B-Chat 87.0 87.0 71.5
Qwen-14B-Chat 93.5 94.4 87.0

HuggingFace Agent Benchmark - Chat Mode
Model Tool Selection↑ Tool Used↑ Code↑
GPT-4 97.9 97.9 98.5
GPT-3.5 97.3 96.8 89.6
StarCoder-Base-15B 97.9 97.9 91.1
StarCoder-15B 97.9 97.9 89.6
Qwen-7B-Chat 94.7 94.7 85.1
Qwen-14B-Chat 97.9 97.9 95.5
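
As a sketch only (this relies on the Agents API shipped with transformers 4.29+, e.g. LocalAgent, which is not covered by this card; the task prompt is arbitrary), a locally loaded Qwen-7B-Chat can be wrapped as an agent like this:

from transformers import AutoModelForCausalLM, AutoTokenizer, LocalAgent

# Wrap a locally loaded Qwen-7B-Chat as a transformers Agent (run mode).
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

agent = LocalAgent(model, tokenizer)
agent.run("Draw me a picture of rivers and lakes.")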

x86 ๅนณๅฐ (x86 Platforms)

ๅœจ ้…ท็ฟโ„ข/่‡ณๅผบยฎ ๅฏๆ‰ฉๅฑ•ๅค„็†ๅ™จๆˆ– Arcโ„ข GPU ไธŠ้ƒจ็ฝฒ้‡ๅŒ–ๆจกๅž‹ๆ—ถ๏ผŒๅปบ่ฎฎไฝฟ็”จ OpenVINOโ„ข Toolkitไปฅๅ……ๅˆ†ๅˆฉ็”จ็กฌไปถ๏ผŒๅฎž็Žฐๆ›ดๅฅฝ็š„ๆŽจ็†ๆ€ง่ƒฝใ€‚ๆ‚จๅฏไปฅๅฎ‰่ฃ…ๅนถ่ฟ่กŒๆญค example notebookใ€‚็›ธๅ…ณ้—ฎ้ข˜๏ผŒๆ‚จๅฏๅœจOpenVINO repoไธญๆไบคใ€‚

When deploy on Coreโ„ข/Xeonยฎ Scalable Processors or with Arcโ„ข GPU, OpenVINOโ„ข Toolkit is recommended. You can install and run this example notebook. For related issues, you are welcome to file an issue at OpenVINO repo.

FAQ

If you meet problems, please refer to the FAQ and existing issues to search for a solution before you launch a new issue.

ๅผ•็”จ (Citation)

ๅฆ‚ๆžœไฝ ่ง‰ๅพ—ๆˆ‘ไปฌ็š„ๅทฅไฝœๅฏนไฝ ๆœ‰ๅธฎๅŠฉ๏ผŒๆฌข่ฟŽๅผ•็”จ๏ผ

If you find our work helpful, feel free to give us a cite.

@article{qwen,
  title={Qwen Technical Report},
  author={Jinze Bai and Shuai Bai and Yunfei Chu and Zeyu Cui and Kai Dang and Xiaodong Deng and Yang Fan and Wenbin Ge and Yu Han and Fei Huang and Binyuan Hui and Luo Ji and Mei Li and Junyang Lin and Runji Lin and Dayiheng Liu and Gao Liu and Chengqiang Lu and Keming Lu and Jianxin Ma and Rui Men and Xingzhang Ren and Xuancheng Ren and Chuanqi Tan and Sinan Tan and Jianhong Tu and Peng Wang and Shijie Wang and Wei Wang and Shengguang Wu and Benfeng Xu and Jin Xu and An Yang and Hao Yang and Jian Yang and Shusheng Yang and Yang Yao and Bowen Yu and Hongyi Yuan and Zheng Yuan and Jianwei Zhang and Xingxuan Zhang and Yichang Zhang and Zhenru Zhang and Chang Zhou and Jingren Zhou and Xiaohuan Zhou and Tianhang Zhu},
  journal={arXiv preprint arXiv:2309.16609},
  year={2023}
}

ไฝฟ็”จๅ่ฎฎ๏ผˆLicense Agreement๏ผ‰

ๆˆ‘ไปฌ็š„ไปฃ็ ๅ’Œๆจกๅž‹ๆƒ้‡ๅฏนๅญฆๆœฏ็ ”็ฉถๅฎŒๅ…จๅผ€ๆ”พ๏ผŒๅนถๆ”ฏๆŒๅ•†็”จใ€‚่ฏทๆŸฅ็œ‹LICENSEไบ†่งฃๅ…ทไฝ“็š„ๅผ€ๆบๅ่ฎฎ็ป†่Š‚ใ€‚ๅฆ‚้œ€ๅ•†็”จ๏ผŒ่ฏทๅกซๅ†™้—ฎๅท็”ณ่ฏทใ€‚

Our code and checkpoints are open to research purpose, and they are allowed for commercial purposes. Check LICENSE for more details about the license. If you have requirements for commercial use, please fill out the form to apply.

่”็ณปๆˆ‘ไปฌ๏ผˆContact Us๏ผ‰

ๅฆ‚ๆžœไฝ ๆƒณ็ป™ๆˆ‘ไปฌ็š„็ ”ๅ‘ๅ›ข้˜Ÿๅ’Œไบงๅ“ๅ›ข้˜Ÿ็•™่จ€๏ผŒๆฌข่ฟŽๅŠ ๅ…ฅๆˆ‘ไปฌ็š„ๅพฎไฟก็พคใ€้’‰้’‰็พคไปฅๅŠDiscord๏ผๅŒๆ—ถ๏ผŒไนŸๆฌข่ฟŽ้€š่ฟ‡้‚ฎไปถ๏ผˆ[email protected]๏ผ‰่”็ณปๆˆ‘ไปฌใ€‚

If you are interested to leave a message to either our research team or product team, join our Discord or WeChat groups! Also, feel free to send an email to [email protected].
