LeanQuant committed on
Commit 11cc13f · verified · 1 Parent(s): 5ffb283

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. README.md +58 -0
  3. config.json +48 -0
  4. generation_config.json +9 -0
  5. lm_head.safetensors +3 -0
  6. model.safetensors +3 -0
  7. model_embed_tokens.safetensors +3 -0
  8. model_layers_0.safetensors +3 -0
  9. model_layers_1.safetensors +3 -0
  10. model_layers_10.safetensors +3 -0
  11. model_layers_11.safetensors +3 -0
  12. model_layers_12.safetensors +3 -0
  13. model_layers_13.safetensors +3 -0
  14. model_layers_14.safetensors +3 -0
  15. model_layers_15.safetensors +3 -0
  16. model_layers_16.safetensors +3 -0
  17. model_layers_17.safetensors +3 -0
  18. model_layers_18.safetensors +3 -0
  19. model_layers_19.safetensors +3 -0
  20. model_layers_2.safetensors +3 -0
  21. model_layers_20.safetensors +3 -0
  22. model_layers_21.safetensors +3 -0
  23. model_layers_22.safetensors +3 -0
  24. model_layers_23.safetensors +3 -0
  25. model_layers_24.safetensors +3 -0
  26. model_layers_25.safetensors +3 -0
  27. model_layers_26.safetensors +3 -0
  28. model_layers_27.safetensors +3 -0
  29. model_layers_28.safetensors +3 -0
  30. model_layers_29.safetensors +3 -0
  31. model_layers_3.safetensors +3 -0
  32. model_layers_30.safetensors +3 -0
  33. model_layers_31.safetensors +3 -0
  34. model_layers_32.safetensors +3 -0
  35. model_layers_33.safetensors +3 -0
  36. model_layers_34.safetensors +3 -0
  37. model_layers_35.safetensors +3 -0
  38. model_layers_36.safetensors +3 -0
  39. model_layers_37.safetensors +3 -0
  40. model_layers_38.safetensors +3 -0
  41. model_layers_39.safetensors +3 -0
  42. model_layers_4.safetensors +3 -0
  43. model_layers_40.safetensors +3 -0
  44. model_layers_41.safetensors +3 -0
  45. model_layers_42.safetensors +3 -0
  46. model_layers_43.safetensors +3 -0
  47. model_layers_44.safetensors +3 -0
  48. model_layers_45.safetensors +3 -0
  49. model_layers_46.safetensors +3 -0
  50. model_layers_47.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,58 @@
+ ## DFloat11 Compressed Model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`
+
+ This is a **losslessly compressed** version of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+
+ ### 🔍 How It Works
+
+ DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
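To make the idea of Huffman-coding the exponent bits concrete, here is a minimal pure-Python sketch. It is not the DFloat11 implementation: the synthetic Gaussian weights (standard deviation 0.02), the truncation to BFloat16, the helper names, and the assumption that the sign and mantissa bits are stored uncompressed are all illustrative choices. It only estimates how many bits per weight such a scheme would need on this toy data.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8 exponent bits of x after truncating float32 to BFloat16."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    bits16 = bits32 >> 16          # BF16 keeps the top 16 bits of float32
    return (bits16 >> 7) & 0xFF    # layout: 1 sign | 8 exponent | 7 mantissa


def huffman_code_lengths(freqs):
    """Map each symbol to the length (in bits) of its Huffman code."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every merge adds one bit to these codes
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]  # synthetic stand-in
freqs = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(freqs)

total = sum(freqs.values())
avg_exp_bits = sum(freqs[s] * lengths[s] for s in freqs) / total
avg_bits_per_weight = 1 + 7 + avg_exp_bits  # sign + mantissa kept raw (assumption)

print(f"distinct exponent values: {len(freqs)}")
print(f"average bits per weight: {avg_bits_per_weight:.2f} (BFloat16 uses 16)")
```

The saving comes from the skew in the exponent histogram: a handful of exponent values cover almost all weights, so they receive short codes.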
+
+ Key benefits:
+
+ * **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+ * **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+ * DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+ * At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+ * The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
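The batch-size claims in the list above can be illustrated with a toy cost model. The sketch below assumes, purely for illustration, that the BF16 step time grows linearly with batch size and that the decompression cost is a fixed amount per forward pass. The millisecond values are hypothetical and chosen only so that batch size 1 comes out 2× slower, matching the README; real kernels rarely scale this cleanly.

```python
def relative_slowdown(batch_size: int,
                      decompress_ms: float,
                      compute_ms_per_sample: float) -> float:
    """Ratio of (decompress + compute) time to compute-only time.

    Toy model: compute time is linear in batch size (an assumption),
    decompression time is constant per forward pass.
    """
    baseline = batch_size * compute_ms_per_sample
    return (decompress_ms + baseline) / baseline


# Hypothetical numbers: equal costs give the 2x slowdown at batch size 1.
DECOMPRESS_MS = 10.0
COMPUTE_MS_PER_SAMPLE = 10.0

for b in (1, 2, 4, 8, 16, 32):
    ratio = relative_slowdown(b, DECOMPRESS_MS, COMPUTE_MS_PER_SAMPLE)
    print(f"batch={b:>2}  slowdown={ratio:.2f}x")
```

Under these assumptions the slowdown is `1 + 1/batch_size`, so the fixed decompression cost is spread over more work as the batch grows.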
+
+ ### 🔧 How to Use
+
+ 1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+
+ ```bash
+ pip install dfloat11[cuda12]
+ # or if you have CUDA version 11:
+ # pip install dfloat11[cuda11]
+ ```
+
+ 2. To use the DFloat11 model, run the following example code in Python:
+
+ ```python
+ import torch
+ from dfloat11 import DFloat11Model
+ from transformers import AutoTokenizer
+
+ model_id = "DFloat11/DeepSeek-R1-Distill-Qwen-32B-DF11"
+
+ model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ prompt = "Question: What is a binary tree and its applications? Answer:"
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+
+ with torch.no_grad():
+     output = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         do_sample=True,
+     )
+
+ print(tokenizer.batch_decode(output, skip_special_tokens=True))
+ ```
+
+ ### 📄 Learn More
+
+ * **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+ * **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+ * **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "dfloat11_config": {
+     "bytes_per_thread": 8,
+     "pattern_dict": {
+       "lm_head": [],
+       "model.embed_tokens": [],
+       "model.layers.\\d+": [
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj"
+       ]
+     },
+     "threads_per_block": [
+       512
+     ],
+     "version": "0.2.0"
+   },
+   "eos_token_id": 151643,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 27648,
+   "max_position_embeddings": 131072,
+   "max_window_layers": 64,
+   "model_type": "qwen2",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 64,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": 131072,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 152064
+ }
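The keys of `pattern_dict` look like regular expressions over module names, and the shard files in this commit are named after those modules with dots replaced by underscores. The sketch below shows one plausible reading of that mapping. How the `dfloat11` package actually consumes `pattern_dict` is not shown in this commit, so the use of `re.fullmatch`, the treatment of an empty list as "the module itself", and both helper names are assumptions.

```python
import re
from typing import List, Optional

# Copied from dfloat11_config; the JSON string "\\d+" decodes to the regex \d+.
pattern_dict = {
    "lm_head": [],
    "model.embed_tokens": [],
    r"model.layers.\d+": [
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
}


def compressed_submodules(module_name: str) -> Optional[List[str]]:
    """Return the weights grouped under module_name, or None if no pattern matches."""
    for pattern, children in pattern_dict.items():
        if re.fullmatch(pattern, module_name):
            # Assumption: an empty child list means the module itself is the unit.
            return [f"{module_name}.{c}" for c in children] or [module_name]
    return None


def shard_filename(module_name: str) -> str:
    """Guess the shard name from the file list: dots become underscores."""
    return module_name.replace(".", "_") + ".safetensors"


print(compressed_submodules("model.layers.12"))
print(shard_filename("model.layers.12"))
print(compressed_submodules("model.norm"))
```

Under this reading, the seven projection matrices of each decoder layer would be grouped into a single shard, while layer norms and other small tensors would fall outside the patterns.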
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 151646,
+   "do_sample": true,
+   "eos_token_id": 151643,
+   "temperature": 0.6,
+   "top_p": 0.95,
+   "transformers_version": "4.51.3"
+ }
lm_head.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:262f0cdad054005a32309fe1b157f7b426a07ae903ebf7b736813de8d3ee2003
+ size 1056885536
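Each `.safetensors` entry in this diff is a Git LFS pointer file (version, object ID, size) and not the tensor data itself. As a small illustration, the sketch below parses such a pointer and checks a downloaded file against it. The function names and the local-path argument are hypothetical; the pointer text is the one shown for `lm_head.safetensors`.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, info: dict, chunk_size: int = 1 << 20) -> bool:
    """Check that the file at path has the size and SHA-256 digest in the pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
    return size == info["size"] and hasher.hexdigest() == info["digest"]


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:262f0cdad054005a32309fe1b157f7b426a07ae903ebf7b736813de8d3ee2003
size 1056885536
"""

info = parse_lfs_pointer(pointer)
print(f"{info['algo']} digest, {info['size'] / 2**30:.2f} GiB")
```

After downloading the real file, `matches_pointer("lm_head.safetensors", info)` would confirm that the local copy is the object this commit refers to.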
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7372e0345e6413aeb3caed3266d35640bbb13159a20f9349bafd50c3c2ee1cb1
+ size 10360
model_embed_tokens.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f3d754feea1cfb7eb41cbd1bcd18055f8313dc53c7fad0d36bea999143711ac1
+ size 1073106128
model_layers_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28e556d9a487202fc83e1eaff3a5f88abbb36aebcade45c708c9d9f737ffdd5f
+ size 662441978
model_layers_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c3a2af66877831e05bf8ee4c0f58d11c795051c34482e79704fff5fe964d132c
+ size 725166597
model_layers_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d4bed358e0658a03c157ba5ced51ecb132c101106a09f942177035c425c3c5e
+ size 659979946
model_layers_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:228cd95e46bd52cfbfae70e04933ed9d7dc41a82888dc0f47199429121b386ba
+ size 659918803
model_layers_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88ced64d84d3d6918ec19e6152e5e02e7658d913e6af1ec6907cf2524f0ee0bc
+ size 659389766
model_layers_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f24235a9337b972a840f21defe923b903c5d4c2975c8085ef3cb9c36034e8ec7
+ size 659693888
model_layers_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63d8de556cbea7c1dbaa89ffde95581537182afba37d3038cdb7d2421f18f647
+ size 659693904
model_layers_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b46adbc65842166dcdbd3ab5438a329640ff71ca7be7a8a492961e10113c50b6
+ size 660158903
model_layers_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00ea785666a9f7e9ff8e91dfe6ffdba58fea79f5cb7593905c908c171ecaf9d3
+ size 659882224
model_layers_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbc546e926d8063087dbaa6da4dba3745b0aa30d441050cde8059b7427596a76
+ size 659742874
model_layers_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a78c4bce9f8293d5882b3761ac222cdd7ffb370ca66fbdf66988c80a0bab6d24
+ size 659930747
model_layers_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80d914f4114c2c54493cf69414996bcb071f6427f4937d80109b2ca810e79906
+ size 660045727
model_layers_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f00ad02e65119dead518589a1e2ac382470d906980dd58305ff3f5be3b3d001e
+ size 720484054
model_layers_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7b2f59693b89676df3d8e07556651cc93004c0ac81bd6233c94ba8fc1765e492
+ size 660008541
model_layers_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e9a00c2a3a819c691b61a3b6344b075ad2625ec0d8e932560a5e33f91bb21fc
+ size 660025414
model_layers_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27431a21b432735837d72d8ea0748b2e923d92e1ef2cfb343574a34230d92011
+ size 660041687
model_layers_23.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0a23c6d94ba63a1f33f6f5d461c50720c46bc5c4f1213736db9af57386b3a57
+ size 660102695
model_layers_24.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b52a4c49ae539bfa7309a56904b088b99a65717bed4662da1b21e5d71509f49
+ size 659879208
model_layers_25.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4631ddf9a3a228ffde06d53798672e25694e19b68b64848cfba5795dbbe92e72
+ size 659860281
model_layers_26.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd35a238487ac3a1c05f3afb84ef9abd5f4785103a259fbd8af28aad4420c679
+ size 659937778
model_layers_27.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01bf4b3141223368471cea3580de1fced077ac5331ba91b18328b8841e71ff37
+ size 660095500
model_layers_28.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb52dc9d522c9833b67c0874e9dbf6dc272a3bd808a05ec3307a29a34144152a
+ size 660349657
model_layers_29.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:506f46a23fe2f36ffe27850a3bb927403b5c5c5d4a6d2f8978d5ed65cc66be20
+ size 660238782
model_layers_3.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2eedd33b98f5da3123d346e92dfc9e25ea72b4709419a6f4459b4643a416b32
+ size 716178698
model_layers_30.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba1b2e12aea9e57c8fa96260082383573acbcd283a7d00b73bdc1f2786f89990
+ size 660154409
model_layers_31.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05718dfa286672dc4b2b71f7b040c26e5d33326ad4350683f2253d03fcf9ed2b
+ size 660085711
model_layers_32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ecb318a215c56dade952ebf4c513bf17d640508e8a9bf4bd6161041a895591b
+ size 660131842
model_layers_33.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f5d38b55219e05bdd43a13eaf0c07321c4541554a7d0008b908773dadd02690
+ size 659999114
model_layers_34.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a661ad479392473b13e4733fbe0f5389646ad7a41814da558f97e3e5747e69ff
+ size 660178412
model_layers_35.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f0d3a8d010aaaea9fc638af2939e807adc7a7b72be8fe7f70f51fe7ade0d0ec
+ size 660284891
model_layers_36.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:33a0057ad0e1d13732a8758f50c30610c45673e0af91a505a9a4d616b98d35d1
+ size 660306945
model_layers_37.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5326682e243820a870ac9e28303b01c3fdfc5363c49d5bf334b50c5fd115967
+ size 660254993
model_layers_38.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:876242355e5f0ce8cdedd1256f581b18811924c3f96fa776529755b05cc2ca3d
+ size 660325297
model_layers_39.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:36de3203a71c654b83266fff9e24d6a455ad22c772d7bbce21821f7108d80251
+ size 660467243
model_layers_4.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6bad41f176b4e3b871369a0a03bd58bc89d0bcb6bccb4c4fa2a06011df0709cf
+ size 714368386
model_layers_40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60bb8269125282554b424d5a8061254d423d032ebe64e4b32c3d75e389bfebff
+ size 660206507
model_layers_41.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64704696897292329cf325060769a67c4d159cac3c98b8c186f513f5fb7ed8ab
+ size 660317617
model_layers_42.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa780b39fd43aac49e6e728c64a183394bb171a76e5466e658b551c9f267b92b
+ size 660258726
model_layers_43.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1075e1d05911b07102d9b931046b21b67234560302a378cbe78310b2d4a7ade0
+ size 660384825
model_layers_44.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc1ff11c68214120aa1b62b261af4befa3a82aa86b2613f1b68be59afdf1500c
+ size 660669818
model_layers_45.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d442edc5540a500225a803d9c0d72ec2c88ef2c6dbb82153ac70cdd74e82b5b
+ size 660444923
model_layers_46.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8ef6022314f23eb41b92cf1c1904ff9f89af535e82b8b573e0fa2dae81186ea
+ size 660008546
model_layers_47.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:640485599adf9fcda532463cc0beb916fcacd7e38b3e65bd5e365fd1fda76f61
+ size 659742991
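As a rough cross-check of the README's "approximately 30%" figure, the per-layer shard sizes can be compared with the BFloat16 size of one decoder layer computed from `config.json`. This is a back-of-the-envelope sketch: it counts only the seven projection matrices listed in `pattern_dict`, ignores biases and norm weights, assumes the head dimension is `hidden_size / num_attention_heads`, and does not account for whatever metadata the shard file carries besides the encoded weights.

```python
# Dimensions taken from config.json in this commit.
HIDDEN = 5120
INTERMEDIATE = 27648
NUM_HEADS = 40
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS  # assumed: hidden_size / num_attention_heads

# Parameter counts of the seven projection matrices in one decoder layer.
q_proj = o_proj = HIDDEN * NUM_HEADS * HEAD_DIM
k_proj = v_proj = HIDDEN * NUM_KV_HEADS * HEAD_DIM
mlp = 3 * HIDDEN * INTERMEDIATE  # gate_proj, up_proj, down_proj

bf16_bytes = 2 * (q_proj + k_proj + v_proj + o_proj + mlp)  # 2 bytes per weight

shard_bytes = 659_742_991  # size of model_layers_47.safetensors from its LFS pointer
ratio = shard_bytes / bf16_bytes

print(f"BF16 projections per layer: {bf16_bytes:,} bytes")
print(f"shard size / BF16 size: {ratio:.3f}")
```

Under these assumptions the shard comes out at roughly 0.68 of the uncompressed projection weights, which is in line with the stated memory reduction.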