LeanQuant committed
Commit a77c717 · verified · 1 parent: 309d6c1

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. README.md +58 -0
  3. added_tokens.json +28 -0
  4. config.json +50 -0
  5. generation_config.json +13 -0
  6. lm_head.safetensors +3 -0
  7. merges.txt +0 -0
  8. model.safetensors +3 -0
  9. model_embed_tokens.safetensors +3 -0
  10. model_layers_0.safetensors +3 -0
  11. model_layers_1.safetensors +3 -0
  12. model_layers_10.safetensors +3 -0
  13. model_layers_11.safetensors +3 -0
  14. model_layers_12.safetensors +3 -0
  15. model_layers_13.safetensors +3 -0
  16. model_layers_14.safetensors +3 -0
  17. model_layers_15.safetensors +3 -0
  18. model_layers_16.safetensors +3 -0
  19. model_layers_17.safetensors +3 -0
  20. model_layers_18.safetensors +3 -0
  21. model_layers_19.safetensors +3 -0
  22. model_layers_2.safetensors +3 -0
  23. model_layers_20.safetensors +3 -0
  24. model_layers_21.safetensors +3 -0
  25. model_layers_22.safetensors +3 -0
  26. model_layers_23.safetensors +3 -0
  27. model_layers_24.safetensors +3 -0
  28. model_layers_25.safetensors +3 -0
  29. model_layers_26.safetensors +3 -0
  30. model_layers_27.safetensors +3 -0
  31. model_layers_28.safetensors +3 -0
  32. model_layers_29.safetensors +3 -0
  33. model_layers_3.safetensors +3 -0
  34. model_layers_30.safetensors +3 -0
  35. model_layers_31.safetensors +3 -0
  36. model_layers_32.safetensors +3 -0
  37. model_layers_33.safetensors +3 -0
  38. model_layers_34.safetensors +3 -0
  39. model_layers_35.safetensors +3 -0
  40. model_layers_36.safetensors +3 -0
  41. model_layers_37.safetensors +3 -0
  42. model_layers_38.safetensors +3 -0
  43. model_layers_39.safetensors +3 -0
  44. model_layers_4.safetensors +3 -0
  45. model_layers_40.safetensors +3 -0
  46. model_layers_41.safetensors +3 -0
  47. model_layers_42.safetensors +3 -0
  48. model_layers_43.safetensors +3 -0
  49. model_layers_44.safetensors +3 -0
  50. model_layers_45.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,58 @@
+ ## DFloat11 Compressed Model: `Qwen/Qwen3-32B`
+
+ This is a **losslessly compressed** version of [`Qwen/Qwen3-32B`](https://huggingface.co/Qwen/Qwen3-32B) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+
+ ### 🔍 How It Works
+
+ DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+
+ Key benefits:
+
+ * **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+ * **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+ * DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+ * At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+ * The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+
+ ### 🔧 How to Use
+
+ 1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+
+ ```bash
+ pip install dfloat11[cuda12]
+ # or if you have CUDA version 11:
+ # pip install dfloat11[cuda11]
+ ```
+
+ 2. To use the DFloat11 model, run the following example code in Python:
+
+ ```python
+ import torch
+ from dfloat11 import DFloat11Model
+ from transformers import AutoTokenizer
+
+ model_id = "DFloat11/Qwen3-32B-DF11"
+
+ model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ prompt = "Question: What is a binary tree and its applications? Answer:"
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+
+ with torch.no_grad():
+     output = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         do_sample=True,
+     )
+
+ print(tokenizer.batch_decode(output, skip_special_tokens=True))
+ ```
+
+ ### 📄 Learn More
+
+ * **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+ * **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+ * **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
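Reviewer note: the README attributes the savings to Huffman coding of the BFloat16 exponent bits. The sketch below is not the DFloat11 implementation; it only illustrates why the exponent field is a good target. It assumes synthetic zero-mean Gaussian values in place of the real weights, converts to BFloat16 by truncation, and uses Shannon entropy as a stand-in for the average Huffman code length (a Huffman code is within one bit of the entropy). The helper names `bf16_exponent` and `exponent_entropy` are hypothetical.

```python
import math
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x after truncation to BFloat16."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16           # BFloat16 keeps the top 16 bits of float32
    return (bits16 >> 7) & 0xFF     # layout: 1 sign | 8 exponent | 7 mantissa


def exponent_entropy(values) -> float:
    """Shannon entropy (bits per value) of the BFloat16 exponent field."""
    counts = Counter(bf16_exponent(v) for v in values)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


random.seed(0)
# Assumption: synthetic Gaussian "weights", not the real checkpoint.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

h = exponent_entropy(weights)
# Sign (1 bit) and mantissa (7 bits) are kept as-is; only the exponent is coded.
bits_per_weight = 1 + 7 + h
print(f"exponent entropy ≈ {h:.2f} bits (of 8)")
print(f"estimated size   ≈ {bits_per_weight / 16:.0%} of BFloat16")
```

Under these assumptions the exponent needs far fewer than its 8 raw bits, which is the kind of headroom the README's figure of roughly 30% relies on; real checkpoints will differ in detail.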
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "architectures": [
+     "Qwen3ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "dfloat11_config": {
+     "bytes_per_thread": 8,
+     "pattern_dict": {
+       "lm_head": [],
+       "model.embed_tokens": [],
+       "model.layers.\\d+": [
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj"
+       ]
+     },
+     "threads_per_block": [
+       512
+     ],
+     "version": "0.2.0"
+   },
+   "eos_token_id": 151645,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 25600,
+   "max_position_embeddings": 40960,
+   "max_window_layers": 64,
+   "model_type": "qwen3",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 64,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
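Reviewer note: the config values can be used to sanity-check the README's size claim against one of the shards in this commit. The sketch below counts the weights of the seven projections listed under `pattern_dict` for a single decoder layer and compares their BFloat16 size with the Git LFS size reported for `model_layers_10.safetensors`. It assumes the usual Qwen3 layout, in which `q_proj` maps `hidden_size` to `num_attention_heads * head_dim` and `k_proj`/`v_proj` map to `num_key_value_heads * head_dim`; it ignores the small normalization weights and any decoding metadata stored in the shard, so the ratio is only approximate.

```python
# Values copied from config.json in this commit.
hidden_size = 5120
intermediate_size = 25600
num_attention_heads = 64
num_key_value_heads = 8
head_dim = 128

q_out = num_attention_heads * head_dim      # 8192
kv_out = num_key_value_heads * head_dim     # 1024

# Assumed (in_features, out_features) of each projection in pattern_dict.
shapes = {
    "self_attn.q_proj": (hidden_size, q_out),
    "self_attn.k_proj": (hidden_size, kv_out),
    "self_attn.v_proj": (hidden_size, kv_out),
    "self_attn.o_proj": (q_out, hidden_size),
    "mlp.gate_proj": (hidden_size, intermediate_size),
    "mlp.up_proj": (hidden_size, intermediate_size),
    "mlp.down_proj": (intermediate_size, hidden_size),
}

params_per_layer = sum(i * o for i, o in shapes.values())
bf16_bytes = params_per_layer * 2           # BFloat16 uses 2 bytes per weight

# Size reported by the Git LFS pointer for model_layers_10.safetensors.
shard_bytes = 658_473_845

ratio = shard_bytes / bf16_bytes
print(f"linear weights per layer: {params_per_layer:,}")
print(f"BF16 size: {bf16_bytes:,} bytes; shard: {shard_bytes:,} bytes")
print(f"shard / BF16 ≈ {ratio:.1%}")
```

If the shape assumptions hold, the shard comes out at roughly two thirds of the uncompressed layer, in line with the README's "approximately 30%" reduction.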
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "temperature": 0.6,
+   "top_k": 20,
+   "top_p": 0.95,
+   "transformers_version": "4.51.3"
+ }
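Reviewer note: for readers unfamiliar with the three sampling knobs set above, here is a minimal pure-Python sketch of how `temperature`, `top_k` and `top_p` are commonly combined when `do_sample` is true. It is an illustration only, not the `transformers` implementation, which may order the steps or handle edge cases differently; `sample_token` is a hypothetical helper.

```python
import math
import random


def sample_token(logits, temperature=0.6, top_k=20, top_p=0.95, rng=random):
    """Pick a token index from raw logits using temperature, top-k and top-p."""
    # 1. Temperature: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # 2. Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx, p in zip(ranked, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Sample from the survivors; random.choices normalises the weights itself.
    indices, kept_probs = zip(*kept)
    return rng.choices(indices, weights=kept_probs, k=1)[0]


print(sample_token([2.0, 1.5, 0.3, -1.0]))
```

Because sampling is enabled, two runs of the original and the compressed model only produce the same text when they share the same random seed, even though their logits are identical.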
lm_head.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05cdfc8728ebdc1f35bb326b28678356f99173853545e3a23f6bccdfc2f15f39
+ size 1054877539
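Reviewer note: every `.safetensors` entry in this commit is a three-line Git LFS pointer rather than the tensor data itself. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes the simple `key value` layout visible in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text for lm_head.safetensors, copied from this diff.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:05cdfc8728ebdc1f35bb326b28678356f99173853545e3a23f6bccdfc2f15f39
size 1054877539
"""

info = parse_lfs_pointer(pointer)
print(f"{info['algo']} {info['digest'][:12]}... {info['size'] / 2**30:.2f} GiB")
```

The `size` field is the byte length of the real file, so summing it across pointers gives the download size without fetching any shard.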
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c059b9923b407bbd4d8bb5c4021b9cdb9c1ecc30b0f180ff655f8abd460475c
+ size 10360
model_embed_tokens.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c53531eacab9da6c0c6a33dcaef6a4ab584577a873abd9399477d88c3a7b590b
+ size 1058489203
model_layers_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a5fd08a1dad32ce0ea65d8568c81f5162f612c98f9025e8510d0f4a1472d275
+ size 666102678
model_layers_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38a9da78265dca69a6770e6fdd329b0cf2f5b334c642e76b513ba59d039af028
+ size 717204587
model_layers_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec48c6381a7443121cba7744c7b463c6d7ad7c1e5ad09c014dea6b61a37d6482
+ size 658473845
model_layers_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:671da055943806291af58bdcbe1a51b6a926162d3573dd7cc02949190ff19ab2
+ size 658505849
model_layers_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e74e88ff1d7299e9368b882340bfb7a8904540a74b03c46fc6bb4db72c173dc3
+ size 659316099
model_layers_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:855173ed9c284d8261707ff30433129c457651ab49f88ea23cab4ed05040e3f5
+ size 659377807
model_layers_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec5bc6b5783267e86caa7a2ec2c3323d96a91502dd5a323f8356d0ffea7a52cc
+ size 660506237
model_layers_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a11554da39814f597834f1f2a65fc5587632216bbf1cc952c1b5487fec09a3fd
+ size 659344819
model_layers_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b68bb7e44cb5156a70afcc0c4d621c3d2a411d1ff79ff0dbb36f287dbf46187f
+ size 659677758
model_layers_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b12a89cb6eadc7d99558524fb8350ea4b61137c98badbeb1b82e68a541937735
+ size 660393373
model_layers_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d9d8cbbb9684cf6810b6c82b77673d5bf16222be0abb7a93937be75bc864cd60
+ size 659807455
model_layers_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe33bff5372f0cfca7d06e633436829a4c74db5db7d55a4bedb716ab3357f73e
+ size 659369489
model_layers_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:136f6e4d9780480f84f13f89e63d10fdab2766c03f061ce32962e4857ef10c52
+ size 715739961
model_layers_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d84c5b78cd18d3b0bd7a4c082fd3770f3b85f3634fcc11ed0fb755a9cad4d7f
+ size 659844174
model_layers_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5c48cd26aa08322c01a42ce9b584eecfc252aea44bcf726f8231816b0b0e223
+ size 659272504
model_layers_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:25a7f2ea0168763ca7fff2eb8580f54582f635d97516898859e4ef45cff97094
+ size 659355561
model_layers_23.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be50f9c9032b67c0e65e975c9477a2c78d210d27494b1b9d27f5e4c29cd87c7d
+ size 659921969
model_layers_24.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a319fd466cd97bb5836feb6525028fcdd481c44beb77cd1d8ad2f6231b6a1b7
+ size 659035002
model_layers_25.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:365aaea652f0113afdb4d2a6c74d04e3bc3fb690bfe477e705d802ad62dfc1f7
+ size 658958886
model_layers_26.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:818bfebe6bb92aeb77f229de3bf568fafd5cb8e6a9899e668f6df746c69c6976
+ size 659149200
model_layers_27.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f60caa75454b835024ff97939bf7bdda1a6563822caef76d2b7864a35dd4f070
+ size 659341198
model_layers_28.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f84bd11c230343acaecc603816432cee8d4e59bcc658237181a00c56c2de9769
+ size 659385659
model_layers_29.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e36a493e8a5fe173a7a97d340da65200d0c29a56d3858717c9e425969ca5c1d6
+ size 659379681
model_layers_3.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20c8eed33d00bbe38b179f73b9a081b372311439ff5b6457dedb66af9921aaa8
+ size 707649301
model_layers_30.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66e5174b8952f0d987dc6825696974be26603b41d6b7a909782d1d98fffb92fe
+ size 661506869
model_layers_31.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8387f325d22678929e18fe9b29a7ba5d4db64aca1429f0e2bea6984dd67a85b
+ size 661311896
model_layers_32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b26b67ec41f3d1f568273ab4548d2cc92c89ef71218584d26786743cee48607
+ size 659766382
model_layers_33.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71e50c6f42a65b5e6f33cbd92b18098db8d26a37f3f6493e7bbde0c2c2f537f7
+ size 659905876
model_layers_34.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:81f0f87e569ec108dc6fb54ba2644fc1f50016c26dfcee4f13799ead38c908ad
+ size 658183158
model_layers_35.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7e3bee630e2f1eeed5b8ab30689e3a08c0cad014cb3642a733b0d7b889ff79e
+ size 658247792
model_layers_36.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44989d9c01c4a252309fb02bfcaa83603d7c8fce3c93ca995da4361d77a4689a
+ size 658721791
model_layers_37.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9caea1cc59bf506bffecfa83c52d34c72fe9ce8ce4530524af4ade9868784424
+ size 659077121
model_layers_38.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:84cf9f71cf22392b7343ec763ca889f6dfac3e29417a05ec7e1db158d29fd868
+ size 659216449
model_layers_39.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb8ddf1fdc2e3ac20f46f806559b958f3af4f43f294789889c8d5f8cac5c75b4
+ size 659042564
model_layers_4.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6381ffd35ffa38fd4b4c2e0f9cfcbe04f45e836ea612b1ca3feaf27c1f9f3976
+ size 694879348
model_layers_40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c34764b4233e18b9b7268057e818de90b9c3af7c39cd69ffb684bf9bacd0493
+ size 659425034
model_layers_41.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ad59090816b089eec433cbe7a4b349ddd6db5a582b7809efefaedbffd27bdaa
+ size 659719828
model_layers_42.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f016757764dee021dc0354047fd9c1b94d6527c50a5c39aed2fa6020a67b0c4
+ size 660161683
model_layers_43.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52b29ef94c5ec69e34242117bce7ad1ccd6228593c34bf10ea0bf478895ea03b
+ size 660325959
model_layers_44.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66994cf45225b9f3ce0a1894e1a3c5d23231a62942cb5950a37b0750f05f114f
+ size 660582706
model_layers_45.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1ab35340c8d37aa9bd89d16ba6fa5b39d2b8f7d996b89f441a9f045ec1e1ea5b
+ size 660224074