LeanQuant committed on
Commit 52fa776 · verified
1 Parent(s): 872e9cb

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. README.md +68 -0
  3. added_tokens.json +24 -0
  4. config.json +48 -0
  5. generation_config.json +14 -0
  6. lm_head.safetensors +3 -0
  7. merges.txt +0 -0
  8. model.safetensors +3 -0
  9. model_embed_tokens.safetensors +3 -0
  10. model_layers_0.safetensors +3 -0
  11. model_layers_1.safetensors +3 -0
  12. model_layers_10.safetensors +3 -0
  13. model_layers_11.safetensors +3 -0
  14. model_layers_12.safetensors +3 -0
  15. model_layers_13.safetensors +3 -0
  16. model_layers_14.safetensors +3 -0
  17. model_layers_15.safetensors +3 -0
  18. model_layers_16.safetensors +3 -0
  19. model_layers_17.safetensors +3 -0
  20. model_layers_18.safetensors +3 -0
  21. model_layers_19.safetensors +3 -0
  22. model_layers_2.safetensors +3 -0
  23. model_layers_20.safetensors +3 -0
  24. model_layers_21.safetensors +3 -0
  25. model_layers_22.safetensors +3 -0
  26. model_layers_23.safetensors +3 -0
  27. model_layers_24.safetensors +3 -0
  28. model_layers_25.safetensors +3 -0
  29. model_layers_26.safetensors +3 -0
  30. model_layers_27.safetensors +3 -0
  31. model_layers_28.safetensors +3 -0
  32. model_layers_29.safetensors +3 -0
  33. model_layers_3.safetensors +3 -0
  34. model_layers_30.safetensors +3 -0
  35. model_layers_31.safetensors +3 -0
  36. model_layers_32.safetensors +3 -0
  37. model_layers_33.safetensors +3 -0
  38. model_layers_34.safetensors +3 -0
  39. model_layers_35.safetensors +3 -0
  40. model_layers_36.safetensors +3 -0
  41. model_layers_37.safetensors +3 -0
  42. model_layers_38.safetensors +3 -0
  43. model_layers_39.safetensors +3 -0
  44. model_layers_4.safetensors +3 -0
  45. model_layers_40.safetensors +3 -0
  46. model_layers_41.safetensors +3 -0
  47. model_layers_42.safetensors +3 -0
  48. model_layers_43.safetensors +3 -0
  49. model_layers_44.safetensors +3 -0
  50. model_layers_45.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ base_model: Qwen/Qwen2.5-14B-Instruct
+ base_model_relation: quantized
+ tags:
+ - dfloat11
+ - df11
+ - lossless compression
+ - 70% size, 100% accuracy
+ ---
+
+ ## DFloat11 Compressed Model: `Qwen/Qwen2.5-14B-Instruct`
+
+ This is a **losslessly compressed** version of [`Qwen/Qwen2.5-14B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+
+ ### 🔍 How It Works
+
+ DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+
+ Key benefits:
+
+ * **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+ * **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+ * DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+ * At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+ * The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+
+ ### 🔧 How to Use
+
+ 1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+
+ ```bash
+ pip install -U dfloat11[cuda12]
+ # or if you have CUDA version 11:
+ # pip install -U dfloat11[cuda11]
+ ```
+
+ 2. To use the DFloat11 model, run the following example code in Python:
+
+ ```python
+ import torch
+ from dfloat11 import DFloat11Model
+ from transformers import AutoTokenizer
+
+ model_id = "DFloat11/Qwen2.5-14B-Instruct-DF11"
+
+ model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ prompt = "Question: What is a binary tree and its applications? Answer:"
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+
+ with torch.no_grad():
+     output = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         do_sample=True,
+     )
+
+ print(tokenizer.batch_decode(output, skip_special_tokens=True))
+ ```
+
+ ### 📄 Learn More
+
+ * **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+ * **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+ * **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
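Reviewer note (not part of the commit): the README attributes the savings to Huffman coding of the BFloat16 exponent bits. The sketch below illustrates why that field is compressible, by measuring the Shannon entropy of the 8-bit exponent over synthetic weights. The Gaussian weights with standard deviation 0.02 are an assumption for illustration only, not the model's actual weights, and the helper names `bf16_exponent` and `entropy_bits` are hypothetical. Float32 values are truncated to BFloat16 here for simplicity.

```python
import math
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x after truncation to BFloat16."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16        # BF16 keeps the top 16 bits of a float32
    return (bits16 >> 7) & 0xFF  # layout: 1 sign | 8 exponent | 7 mantissa


def entropy_bits(counts: Counter) -> float:
    """Shannon entropy (in bits) of the empirical distribution in counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Assumption: synthetic Gaussian weights stand in for real model weights.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

exp_entropy = entropy_bits(Counter(bf16_exponent(w) for w in weights))
# An ideal entropy coder needs about exp_entropy bits for the exponent;
# the sign bit and the 7 mantissa bits are stored as-is in this estimate.
effective_bits = 1 + 7 + exp_entropy

print(f"exponent entropy: {exp_entropy:.2f} bits (of 8)")
print(f"ideal bits per weight: {effective_bits:.2f} (of 16)")
```

If the exponent needs only about 3 bits on average, a weight costs about 11 bits, and 11/16 is roughly 70%, which is in line with the "70% size" tag in the README front matter.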
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "</tool_call>": 151658,
+   "<tool_call>": 151657,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "dfloat11_config": {
+     "bytes_per_thread": 8,
+     "pattern_dict": {
+       "lm_head": [],
+       "model.embed_tokens": [],
+       "model.layers.\\d+": [
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj"
+       ]
+     },
+     "threads_per_block": [
+       512
+     ],
+     "version": "0.2.0"
+   },
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 13824,
+   "max_position_embeddings": 32768,
+   "max_window_layers": 70,
+   "model_type": "qwen2",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 48,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": 131072,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 152064
+ }
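Reviewer note (not part of the commit): the keys of `pattern_dict` in `dfloat11_config` look like regular expressions over module names, and the file list suggests that each matched block is stored in a file named after it with dots replaced by underscores (for example `model_layers_0.safetensors`). The sketch below shows one way such a mapping could work. How the `dfloat11` library actually consumes `pattern_dict` is an assumption here, and `matching_pattern` and `shard_filename` are hypothetical helpers.

```python
import re

# Copied from the dfloat11_config block of config.json in this commit.
PATTERN_DICT = {
    "lm_head": [],
    "model.embed_tokens": [],
    r"model.layers.\d+": [
        "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
        "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj",
    ],
}


def matching_pattern(module_name: str):
    """Return the first pattern that fully matches module_name, else None."""
    for pattern in PATTERN_DICT:
        if re.fullmatch(pattern, module_name):
            return pattern
    return None


def shard_filename(module_name: str) -> str:
    """Assumed naming rule inferred from the file list: dots become underscores."""
    return module_name.replace(".", "_") + ".safetensors"


for name in ["lm_head", "model.embed_tokens", "model.layers.12", "model.norm"]:
    pattern = matching_pattern(name)
    if pattern is None:
        print(f"{name}: no pattern matches")
    else:
        print(f"{name}: {pattern!r} -> {shard_filename(name)}")
```

Under this reading, modules such as `model.norm` match no pattern, which would be consistent with the small `model.safetensors` file (10360 bytes) holding whatever is not covered by the patterns; that interpretation is also an assumption.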
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "repetition_penalty": 1.05,
+   "temperature": 0.7,
+   "top_k": 20,
+   "top_p": 0.8,
+   "transformers_version": "4.51.3"
+ }
lm_head.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05c23c266ab8ea4644d474c06e97827a752fb1f71278a9e9de9a82aa30ece442
+ size 1056687515
merges.txt ADDED
The diff for this file is too large to render.
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d196a7586fb627a3f91bd4ba6516d0a540caa7dbd59bf61dba46152ae03615e7
+ size 10360
model_embed_tokens.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8353b9640344a9a3e469c469f63dd53c7a10fc48ff6d55e3145ce6819af9d616
+ size 1071180380
model_layers_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89e8e247741a45c2806ed365cf56033a1e069111d5c502a05677180ea7f8cb7b
+ size 373256706
model_layers_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ff6b3837db0433d7fa31a31253f48d7d69584c2db3b76eeaeb251fa5e2f2c4c
+ size 406484748
model_layers_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c94c0f3e035aa6301ba460ed43d175c1f435607cd1ace3402b2997dd4bda952
+ size 372391092
model_layers_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8fc7799740d5b3b9da81b9ab89b0893fb28a504608da458d4e60efc46b75f81
+ size 372482539
model_layers_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:33a2f05170f6084759501607f221776404e68d3dc3549a5af13a6dbd1feb5daf
+ size 372257266
model_layers_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10c6dd64dfbb2ef812b6c197a894d623099f503bb1c76e85b95b75eb19fed2b0
+ size 372354324
model_layers_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3f82ad92357de639b0394ad09b0395a6cd0edf8adf62587220a26530f3c207d
+ size 372422278
model_layers_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:188b9e48683e000a4bdca3e06e3712ba5454cdedfae32133a60942bc7e6e7bae
+ size 372477370
model_layers_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f065eda0d8b4170ca6c90d5b74ac76d19344fd0e25b48e6efaae3b8a92b32b88
+ size 372648557
model_layers_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e92796bb910725ad4bf28f4f5852f21a9298486017c9380cecbfa044ebae9b8
+ size 372665588
model_layers_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f18ac6251648b12869b59658507b1e7ba53e407833f133afd3595569a3f0dcec
+ size 372733375
model_layers_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb23cd851fe215da05db5929082df230202423abd09c3516847ee21752d7a619
+ size 372792975
model_layers_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:32c1700f4af40e3d7df4ec6927211cdd2c4375add31f0548d5f8d380b863ba43
+ size 408064978
model_layers_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e5f8080181b742e03948bd71870a958a0eae96944b7fb02956809e304af6777
+ size 372860519
model_layers_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6315d4d7faf9d6a433b218484c3f4f23407dd25a285df923543bd9887fec1f2b
+ size 372801120
model_layers_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:82471f9614e94faeee356b57defcfd84698c4fade71c8961b11359edcb86b275
+ size 372777734
model_layers_23.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a66679017a5da11381d933a0c6ef6c02f34992b0cf90ef4d09e797983e74a810
+ size 372808801
model_layers_24.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85e3a588571bf0a83d83519b16627118b53e23648a6d9501155c5051bc5ecb6b
+ size 372841217
model_layers_25.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd4a0d0c20f1418ea3137c4f140c548494fb400c14a9d0c8d8a57075c690d580
+ size 372902112
model_layers_26.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1cee3cc9da1014eac2871fd331653836a3e8d3bc57cdb6f563754c1dff4297c
+ size 372861926
model_layers_27.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30af40cb5bfb0840c8eac112feb4eba36d1891792788d9fbfa372f220cd395fa
+ size 372893037
model_layers_28.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:527e740b05b83b0455d71db30b2a8daae3ccfd8ca1d42352cfa147925e038f5f
+ size 372931412
model_layers_29.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:163d638f51f3a491cd9a5c4b59aeb7c7e665df0577168c5a7b0a33b47c3bdc1f
+ size 372905593
model_layers_3.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31e1582a1f4823770adf416b2f80a6429a9a39518c1b483fcadbcd8b03319c2a
+ size 402083011
model_layers_30.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e08bef49ac321aab4a18bf3f6b739dee71b94c0831adf719943ae7931188c27
+ size 372682758
model_layers_31.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:269e5cb7709b7de674f02100090fae9251cafdedc9e87862aa0c356cf17b6292
+ size 372557390
model_layers_32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eac675a0b93391a0e3db03f61630baac58934b8ed2ec5a98068c05e7a1ca192e
+ size 372525663
model_layers_33.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e56520584d271b2de80672470fe5ee0c7bc85d78df1abe157e4f067a9f21f2b
+ size 372567622
model_layers_34.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69f3beb29c17177f1e4fba35e40dd08b7432ef46325fcaee4bfb6c8d39d4298a
+ size 372529440
model_layers_35.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fee8a7fdf7385abc6b8ddbc89f8f2a64c1fb31a235edfae088faaf13f3eccee5
+ size 372514582
model_layers_36.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a878f163989c9d5055de12d9be4341536c99d2d0d8886f9411ab2ed2f56fff8
+ size 372581465
model_layers_37.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:485583d61114b831ba8d3bbed11e46b1b32560fc13a7d4231ebf3e82f69606d2
+ size 372347205
model_layers_38.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d94ed8520cf038fd85bb775e1c8cedca58b62e9d15ae2f0c995c37d17a77757
+ size 372291387
model_layers_39.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc2c02ca1bc18e7254992e60a34586fc72ae6336de83680875d03c19328e4297
+ size 372303952
model_layers_4.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93ecf2959477739310fe2d0dd0f4d96bf0470d25d2ac6b3a04b711df637a1cc5
+ size 399836153
model_layers_40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26f72f5f5d292dd2430c0275913df1efb5adac9d85418a8f997ade5482dec935
+ size 372319664
model_layers_41.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64f626e77db75ebc2fd212ff01fcbf98ce14d1aff47e5bcb45d17f16d2906ae1
+ size 372297297
model_layers_42.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4598111fd672c816e58b80895ae2ee8ec3e3f6e60343d45d343f75d1de4648c
+ size 372344250
model_layers_43.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c3fb2b0aa4c8320b55da3d9305eb4556f84e077140e714549e3df653c91fd3d
+ size 372583922
model_layers_44.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3007b96538a23a77fdba0abd873c69e0b87137d4d88a28ea004b419fc5fa40a2
+ size 372679269
model_layers_45.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8919f255f3556263604aabd101b0710e7662ad20516bc2395bbb43761a11843
+ size 372710655
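
Reviewer note (not part of the commit): each `.safetensors` entry in this diff is a Git LFS pointer, which records the SHA-256 digest (`oid`) and byte length (`size`) of the real file. A downloaded shard can be checked against its pointer with a short script. The sketch below uses only the Python standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the example pointer text is the `model.safetensors` entry from this commit.

```python
import hashlib
from pathlib import Path

# Pointer text for model.safetensors, copied from this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d196a7586fb627a3f91bd4ba6516d0a540caa7dbd59bf61dba46152ae03615e7
size 10360
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at path has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large shards are never fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

To check a local download, call `matches_pointer("model.safetensors", pointer)`; the local path is an assumption about where the file was saved.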