Geralt-Targaryen/QwQ-Math-7B-Persona

Geralt-Targaryen committed on Jan 21

Commit 9d0ad9a · verified · 1 Parent(s): 8615ce3

Update README.md

Files changed (1)

README.md +78 -3

@@ -1,3 +1,78 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# QwQ-Math-7B-Persona
+## Introduction
+QwQ-Math-7B-Persona is fine-tuned from Qwen2.5-Math-7B-Instruct on 1 million math persona examples (see [this paper](https://arxiv.org/abs/2406.20094) for details on how the data is constructed).
+QwQ-Math-7B-Persona is currently intended as a draft model for losslessly accelerating QwQ-32B inference, but it can also be used as a standalone model.
+## Quickstart
+Here is a code snippet that uses QwQ-Math-7B-Persona as the draft model to accelerate QwQ-32B-Preview inference:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Target model: its outputs are the ones that are kept.
+# device_map={'': 0} places the whole model on GPU 0.
+model = AutoModelForCausalLM.from_pretrained(
+    "Qwen/QwQ-32B-Preview",
+    torch_dtype="auto",
+    device_map={'': 0}
+)
+# Draft model: proposes tokens that the target model then verifies.
+draft_model = AutoModelForCausalLM.from_pretrained(
+    "Geralt-Targaryen/QwQ-Math-7B-Persona",
+    torch_dtype="auto",
+    device_map={'': 0}
+)
+tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B-Preview")
+
+prompt = "How many r in strawberry."
+messages = [
+    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
+    {"role": "user", "content": prompt}
+]
+# Render the chat messages into a single prompt string.
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+# Passing assistant_model enables assisted (speculative) decoding.
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512,
+    assistant_model=draft_model
+)
+# Strip the prompt tokens so that only the newly generated tokens remain.
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
+For the more advanced SVIP draft length policy, please refer to [this GitHub repo](https://github.com/Geralt-Targaryen/SVIP).
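To illustrate why a draft model can speed up generation without changing the output, here is a toy sketch of greedy speculative decoding. It is not the `transformers` implementation and not SVIP: the "models" are hypothetical pure-Python functions (`toy_target`, `toy_draft`) that map a token prefix to the next token, `draft_len` is a fixed draft length chosen for illustration, and the target check is done in a loop rather than in one batched forward pass. Sampling-based decoding needs a different acceptance rule, which this sketch does not cover.

```python
from typing import Callable, List

# A toy "model": maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_generate(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens[len(prompt):]


def speculative_generate(target: NextToken, draft: NextToken, prompt: List[int],
                         max_new_tokens: int, draft_len: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1) The draft model proposes up to draft_len tokens.
        k = min(draft_len, goal - len(tokens))
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2) The target model checks every proposed position.
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] != expected:
                # First mismatch: keep the agreed prefix plus the target's token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every proposed token matched the target's own choice.
            tokens.extend(proposal)
    return tokens[len(prompt):]


def toy_target(prefix: List[int]) -> int:
    # Hypothetical deterministic "large" model.
    return (sum(prefix) * 31 + len(prefix)) % 50


def toy_draft(prefix: List[int]) -> int:
    # Hypothetical "small" model: disagrees whenever len(prefix) is a multiple of 3.
    guess = toy_target(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 50


baseline = greedy_generate(toy_target, [1, 2, 3], 20)
accelerated = speculative_generate(toy_target, toy_draft, [1, 2, 3], 20)
assert accelerated == baseline  # identical output, hence "lossless"
```

Every token that is kept equals the target model's own greedy choice for the current prefix, so the output matches plain greedy decoding no matter how often the draft is wrong. The draft's accuracy only affects how many tokens are accepted per verification round.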
+## Citation
+If you find QwQ-Math-7B-Persona helpful, please cite the following paper.
+```
+@misc{zhang2024svip,
+      title={Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding},
+      author={Ziyin Zhang and Jiahao Xu and Tian Liang and Xingyu Chen and Zhiwei He and Rui Wang and Zhaopeng Tu},
+      year={2024},
+      eprint={2411.18462},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2411.18462},
+}
+```