LeanQuant committed on
Commit e20dde2 · verified · 1 Parent(s): d61ad4f

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. README.md +68 -0
  3. added_tokens.json +28 -0
  4. config.json +48 -0
  5. generation_config.json +13 -0
  6. lm_head.safetensors +3 -0
  7. merges.txt +0 -0
  8. model.safetensors +3 -0
  9. model_embed_tokens.safetensors +3 -0
  10. model_layers_0.safetensors +3 -0
  11. model_layers_1.safetensors +3 -0
  12. model_layers_10.safetensors +3 -0
  13. model_layers_11.safetensors +3 -0
  14. model_layers_12.safetensors +3 -0
  15. model_layers_13.safetensors +3 -0
  16. model_layers_14.safetensors +3 -0
  17. model_layers_15.safetensors +3 -0
  18. model_layers_16.safetensors +3 -0
  19. model_layers_17.safetensors +3 -0
  20. model_layers_18.safetensors +3 -0
  21. model_layers_19.safetensors +3 -0
  22. model_layers_2.safetensors +3 -0
  23. model_layers_20.safetensors +3 -0
  24. model_layers_21.safetensors +3 -0
  25. model_layers_22.safetensors +3 -0
  26. model_layers_23.safetensors +3 -0
  27. model_layers_24.safetensors +3 -0
  28. model_layers_25.safetensors +3 -0
  29. model_layers_26.safetensors +3 -0
  30. model_layers_27.safetensors +3 -0
  31. model_layers_28.safetensors +3 -0
  32. model_layers_29.safetensors +3 -0
  33. model_layers_3.safetensors +3 -0
  34. model_layers_30.safetensors +3 -0
  35. model_layers_31.safetensors +3 -0
  36. model_layers_32.safetensors +3 -0
  37. model_layers_33.safetensors +3 -0
  38. model_layers_34.safetensors +3 -0
  39. model_layers_35.safetensors +3 -0
  40. model_layers_36.safetensors +3 -0
  41. model_layers_37.safetensors +3 -0
  42. model_layers_38.safetensors +3 -0
  43. model_layers_39.safetensors +3 -0
  44. model_layers_4.safetensors +3 -0
  45. model_layers_40.safetensors +3 -0
  46. model_layers_41.safetensors +3 -0
  47. model_layers_42.safetensors +3 -0
  48. model_layers_43.safetensors +3 -0
  49. model_layers_44.safetensors +3 -0
  50. model_layers_45.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ base_model: Qwen/QwQ-32B
+ base_model_relation: quantized
+ tags:
+ - dfloat11
+ - df11
+ - lossless compression
+ - 70% size, 100% accuracy
+ ---
+
+ ## DFloat11 Compressed Model: `Qwen/QwQ-32B`
+
+ This is a **losslessly compressed** version of [`Qwen/QwQ-32B`](https://huggingface.co/Qwen/QwQ-32B) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+
+ ### 🔍 How It Works
+
+ DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+
+ Key benefits:
+
+ * **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+ * **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+ * DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+ * At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+ * The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+
+ ### 🔧 How to Use
+
+ 1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+
+ ```bash
+ pip install -U dfloat11[cuda12]
+ # or if you have CUDA version 11:
+ # pip install -U dfloat11[cuda11]
+ ```
+
+ 2. To use the DFloat11 model, run the following example code in Python:
+
+ ```python
+ import torch
+ from dfloat11 import DFloat11Model
+ from transformers import AutoTokenizer
+
+ model_id = "DFloat11/QwQ-32B-DF11"
+
+ model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ prompt = "Question: What is a binary tree and its applications? Answer:"
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+
+ with torch.no_grad():
+     output = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         do_sample=True,
+     )
+
+ print(tokenizer.batch_decode(output, skip_special_tokens=True))
+ ```
+
+ ### 📄 Learn More
+
+ * **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+ * **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+ * **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
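Editorial note on the README's "How It Works" section: the idea of Huffman-coding only the BFloat16 exponent bits can be tried on the CPU with the standard library. The sketch below is not the DFloat11 kernel or its file format. It assumes Gaussian-distributed synthetic weights (the standard deviation 0.02 is borrowed from `initializer_range` in `config.json` purely as a stand-in for real weights), and all function names are hypothetical.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x after truncation to BFloat16."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16            # BFloat16 keeps the top 16 bits of float32
    return (bits16 >> 7) & 0xFF      # layout: 1 sign | 8 exponent | 7 mantissa


def huffman_code_lengths(freqs: Counter) -> dict:
    """Compute the Huffman code length (in bits) of every symbol in freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap items: (weight, unique tie breaker, symbols in this subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:          # each merge adds one bit to these codes
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


def average_bits_per_weight(weights) -> float:
    """Sign (1 bit) and mantissa (7 bits) stay raw; the exponent is Huffman-coded."""
    freqs = Counter(bf16_exponent(w) for w in weights)
    lengths = huffman_code_lengths(freqs)
    total = sum(freqs.values())
    exponent_bits = sum(freqs[s] * lengths[s] for s in freqs) / total
    return 1 + 7 + exponent_bits


random.seed(0)
synthetic_weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]
bits = average_bits_per_weight(synthetic_weights)
print(f"average bits per weight: {bits:.2f} (BFloat16 uses 16)")
```

On this synthetic data a handful of exponent values dominate, so the average lands well below 16 bits, which is in line with the roughly 70% size named in the paper title. Real weights and the actual DFloat11 encoding will differ in detail.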
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
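Editorial note: the keys in `added_tokens.json` are sorted alphabetically, which hides how the IDs are laid out. The short sketch below re-sorts the same mapping by ID and looks up the two IDs that `config.json` in this commit uses for `bos_token_id` (151643) and `eos_token_id` (151645). Variable names are illustrative only.

```python
import json

ADDED_TOKENS_JSON = """
{
  "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
  "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
  "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
  "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
  "<|im_start|>": 151644, "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647, "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651, "<|quad_start|>": 151650, "<|repo_name|>": 151663,
  "<|video_pad|>": 151656, "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654, "<|vision_start|>": 151652
}
"""

added = json.loads(ADDED_TOKENS_JSON)

# Sort by token ID instead of by token string.
by_id = sorted(added.items(), key=lambda item: item[1])
ids = [token_id for _, token_id in by_id]

# True when the IDs form one gap-free block.
is_contiguous = ids == list(range(ids[0], ids[-1] + 1))
print(len(by_id), ids[0], ids[-1], is_contiguous)  # 26 151643 151668 True

# Reverse lookup for the IDs referenced in config.json.
id_to_token = {token_id: token for token, token_id in added.items()}
print(id_to_token[151643], id_to_token[151645])    # <|endoftext|> <|im_end|>
```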
config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "dfloat11_config": {
+     "bytes_per_thread": 8,
+     "pattern_dict": {
+       "lm_head": [],
+       "model.embed_tokens": [],
+       "model.layers.\\d+": [
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj"
+       ]
+     },
+     "threads_per_block": [
+       512
+     ],
+     "version": "0.2.0"
+   },
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 27648,
+   "max_position_embeddings": 40960,
+   "max_window_layers": 64,
+   "model_type": "qwen2",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 64,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": 32768,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 152064
+ }
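Editorial note: the `pattern_dict` keys in `dfloat11_config` look like regular expressions over module names, and the values look like sub-module suffixes. This diff does not show how the `dfloat11` package actually consumes them, so the sketch below is only one plausible reading. It uses `re.fullmatch`, treats an empty list as "the module itself", and the helper name `compressed_submodules` is hypothetical.

```python
import re

# Copied from dfloat11_config.pattern_dict (the JSON "\\d" decodes to "\d").
PATTERN_DICT = {
    "lm_head": [],
    "model.embed_tokens": [],
    r"model.layers.\d+": [
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
}


def compressed_submodules(module_name, pattern_dict=PATTERN_DICT):
    """Return the weight-bearing modules grouped under module_name, or None.

    Assumption: a module is handled as one compressed group when its full
    name matches a key; the listed suffixes name the linear layers inside it.
    """
    for pattern, suffixes in pattern_dict.items():
        if re.fullmatch(pattern, module_name):
            # Empty suffix list is read here as "compress the module itself".
            return [f"{module_name}.{s}" for s in suffixes] or [module_name]
    return None


print(compressed_submodules("model.layers.10"))
print(compressed_submodules("model.embed_tokens"))
print(compressed_submodules("model.norm"))  # no key matches, so None
```

Under this reading the grouping lines up with the file list in this commit: one `model_layers_N.safetensors` per decoder block, plus `lm_head.safetensors` and `model_embed_tokens.safetensors`.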
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "temperature": 0.6,
+   "top_k": 40,
+   "top_p": 0.95,
+   "transformers_version": "4.51.3"
+ }
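Editorial note: `temperature`, `top_k`, and `top_p` in `generation_config.json` jointly shape the distribution that sampling draws from. The sketch below shows the effect on a toy logit vector using only the standard library. It applies the three steps in the order temperature, top-k, top-p, and it is an illustration rather than the `transformers` implementation. The function names and the toy logits are made up for the example.

```python
import math
import random


def filtered_distribution(logits, temperature=0.6, top_k=40, top_p=0.95):
    """Apply temperature, then top-k, then top-p; return {token_id: prob}."""
    scaled = [value / temperature for value in logits]

    # Top-k: keep the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    ranked = ranked[:top_k]

    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    exps = [math.exp(scaled[i] - peak) for i in ranked]
    norm = sum(exps)
    probs = [e / norm for e in exps]

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, prob in zip(ranked, probs):
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    total = sum(prob for _, prob in kept)
    return {token_id: prob / total for token_id, prob in kept}


def sample(logits, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = filtered_distribution(logits)
    token_ids, weights = zip(*dist.items())
    return rng.choices(token_ids, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
dist = filtered_distribution(toy_logits)
print(sorted(dist))  # [0, 1, 2]: the two least likely tokens are cut by top-p
```

With a temperature below 1 the distribution sharpens before the cut-offs apply, so fewer tokens survive the top-p step than would at temperature 1.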
lm_head.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70d47386531959f4e760a20adf712f6ed82335895140870156d12e9e9fb8ce2a
+ size 1056938932
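Editorial note: each `.safetensors` entry in this diff is a Git LFS pointer, three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses such a pointer and shows how a downloaded file could be checked against it. The names `LfsPointer`, `parse_lfs_pointer`, and `matches_pointer` are hypothetical helpers and not part of any Git LFS tooling.

```python
import hashlib
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    hash_algo: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


def matches_pointer(path: str, pointer: LfsPointer, chunk: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 against the pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        while block := handle.read(chunk):
            hasher.update(block)
            size += len(block)
    return size == pointer.size and hasher.hexdigest() == pointer.digest


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:70d47386531959f4e760a20adf712f6ed82335895140870156d12e9e9fb8ce2a\n"
    "size 1056938932\n"
)
print(f"{pointer.hash_algo} digest, {pointer.size / 1e9:.2f} GB")  # sha256 digest, 1.06 GB
```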
merges.txt ADDED
The diff for this file is too large to render.
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e285573e9b9ae1fa383b0f47708694e0338eca3031b144585367cf23b8d22fc4
+ size 10360
model_embed_tokens.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1dd0e2e5d4663801a94ecd45f2718455445cf869139807288d686d055f693bfc
+ size 1074577818
model_layers_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78838424a1bb5027ae58648d34d0618783d4d443337ecd78021058ba2ad0c22e
+ size 662055522
model_layers_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57d4a81d06e13733dd06d901b166570ca68358762208de4ac9a96e97312f9047
+ size 723928827
model_layers_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7abc5768e5d6a2705cbca5148e44ddbb3b95eb8fc358eb402ec17a7c2f9363bd
+ size 659600696
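Editorial note: the README's "approximately 30%" memory reduction can be sanity-checked from numbers in this diff. The sketch below counts the BFloat16 bytes of the seven projections listed in `pattern_dict`, using shapes from `config.json`, and compares them with the size recorded for `model_layers_10.safetensors`. It assumes `head_dim = hidden_size / num_attention_heads` (not stated in the config) and ignores biases, norm weights, and any decoding metadata stored in the file, so the ratio is an approximation.

```python
# Shapes from config.json in this commit.
hidden_size = 5120
intermediate_size = 27648
num_attention_heads = 40
num_key_value_heads = 8

head_dim = hidden_size // num_attention_heads   # assumed: 128
kv_dim = num_key_value_heads * head_dim         # 1024

# Parameter counts of the seven projections named in pattern_dict.
params = {
    "self_attn.q_proj": hidden_size * hidden_size,
    "self_attn.k_proj": hidden_size * kv_dim,
    "self_attn.v_proj": hidden_size * kv_dim,
    "self_attn.o_proj": hidden_size * hidden_size,
    "mlp.gate_proj": hidden_size * intermediate_size,
    "mlp.up_proj": hidden_size * intermediate_size,
    "mlp.down_proj": intermediate_size * hidden_size,
}

bf16_bytes = 2 * sum(params.values())   # BFloat16 stores 2 bytes per weight
df11_bytes = 659_600_696                # model_layers_10.safetensors, per its LFS pointer

ratio = df11_bytes / bf16_bytes
print(f"BF16: {bf16_bytes:,} bytes, DF11 file: {df11_bytes:,} bytes, ratio {ratio:.3f}")
# BF16: 975,175,680 bytes, DF11 file: 659,600,696 bytes, ratio 0.676
```

A ratio near 0.68 corresponds to a reduction of a little over 30% for this layer, consistent with the README's claim.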
model_layers_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15523d7a4ed7143a649181bd48750f031e7f6a80e34f7591859b17fc81b9535d
+ size 659571948
model_layers_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70e4814cae42cfc1feb4e14a685a79f66f978e2901f8009e19b5112bda51c63e
+ size 659080427
model_layers_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:260b816337b930ed7fdf9a7ef311d0a86b8dc7de1d1a1bf59259650a79f7647a
+ size 659360936
model_layers_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9c4e4b4484bef4cd5f07d4a9a2dfa8adb2c3f97513930605496a037576d6141
+ size 659356974
model_layers_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5db47f3d0adfab047a298ce9d3ca7e3099bd5b99b9f3a7673904ff6ce1b0c160
+ size 659709049
model_layers_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7fb9253e392f112ce56d220f64fde4097f3b78971f86adba84bfde47448d7e50
+ size 659579600
model_layers_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49e9ae0ada5dd9af3ebdd77ab714bc7202c457e745632106571257bec4cc7723
+ size 659488337
model_layers_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:210fdc838c258d1db34d09505e8c3f08d491eb531852f336b218946c949e3ede
+ size 659513908
model_layers_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5bc34adb5a3f4a8b58980180ca4612c43390404b8398ade837d450ef1e220837
+ size 659559584
model_layers_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c9d25f2d1cdb1af86032e89e555993cdffb6dcfc5da95e70877e85aa6bbcedb0
+ size 719742735
model_layers_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1129cce2d740b80e224ba64ab07e93e47dd40ce1b551c09c6b89421f65758c69
+ size 659504224
model_layers_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7ee83af2ec47995c56ce84eaad709dc2d70963dc71174602c80c42b8504fd51
+ size 659500101
model_layers_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95f052379ac92f75e2d9c3bdd7b2f2018e71caab098da3f0a9ab7fc1137acc36
+ size 659479278
model_layers_23.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7de178a00465240ad6802e80b560749f844dceb71a02faf81fe04d48c92feca
+ size 659566263
model_layers_24.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:876626d56ef40ba1276473a17242aa1d7129e42d49dc7cb734f7b009dba97b6e
+ size 659507944
model_layers_25.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19ebf1e0db8cf7e9afa48af5365b3ca25e2bcbb95f52d08062a7fd135c0b4484
+ size 659516512
model_layers_26.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa9a8cc351b48526bb9259e168dc6f1522fb377c9fbd134f007954c919ea66a3
+ size 659614787
model_layers_27.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d952b370bc91837f98b968799800598c85eba962301dad799ff323b3e42ed2ac
+ size 659673947
model_layers_28.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cddd054d16f5b862ca3c72936d87fdfac67faf333d39942b63e6762c1c6732f9
+ size 659769658
model_layers_29.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb0ac8792e56a5d37f5bb8cabbe649b650daec0d0f76032287c6c6261317fd17
+ size 659740698
model_layers_3.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:21f0f25061232c850ea50d3191067843bc06cc678ea169172d4c99eb32775d44
+ size 715771427
model_layers_30.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:53da6aa1caf4199a1c7a5d1ceb48039c1e95675ea7f4b70f8bc7998ad278baaf
+ size 659677250
model_layers_31.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0fc21fed06361a3b7154d44d1c97c4cd331e506f5b5b21bbdec060795038dc30
+ size 659585047
model_layers_32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93128e0cd05234111cb0225c624fe9ea40dcece02e77192b58512c4b9ae6dcda
+ size 659808068
model_layers_33.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b0125ce4b999caec7ac0ef591ecee407992d78011ac09e13ba95540cbc7bf9d
+ size 659715005
model_layers_34.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76003ee84059d017fbb79f663e1d3fcff13e398884858aba5784e818de5c0f46
+ size 659846359
model_layers_35.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa763e43343791eb7af11a300e65fe0eafc5e09db5a15f036b1d3767fb78e052
+ size 660017867
model_layers_36.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44682f3dbaf0e8ed513ec9f3f013b67688678ed5af7245675a73c9efb9b7e59b
+ size 660045307
model_layers_37.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c7d46835453bed9e1f7293465c5fcff698fdec0934e0b89262cf93e736894ea
+ size 659976371
model_layers_38.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc34c8398113377a5e1976a94d67b05584185632454191be47961211a7fbe0e9
+ size 659924652
model_layers_39.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:786d8bcd0f38aee70266ebc8a110dc963128d3b4a59948adf9c09260bfa85c10
+ size 659991416
model_layers_4.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:395709048222c41e60b67b26d84ee75cd164e3495ddb8ee0a6c3b5c391541324
+ size 713511641
model_layers_40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89f9e73d3c7c768dd90bac48b6d7e800c954ad97f7dbfa9fef12621cbe9a6e79
+ size 659877437
model_layers_41.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:faf7e85b3582fbd4e8db8c81ce4901f09880120f63878eda26b2876c0370bf81
+ size 659986816
model_layers_42.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8cb343b5968801ec1adff2febf259af45b730f043d8280c830540c37c4ead67d
+ size 660041371
model_layers_43.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45b526d5f6cd8e0a99b141f91f2371f866e1e94bed1adaf58e1c4511352bdb49
+ size 660087671
model_layers_44.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7054e4f13c43f61dc808beb4d19d3d1a50dd1deb482f7e4ad17c387531bc7d1e
+ size 660209988
model_layers_45.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2422b66e1b8fe12e1126e9638237bef24ca06f95300216fe7391ac3ca5c62850
+ size 659948373