LeanQuant committed
Commit a3ef3c8 · verified · 1 Parent(s): e8973e6

Add files using upload-large-folder tool

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. README.md +68 -0
  3. config.json +59 -0
  4. generation_config.json +12 -0
  5. lm_head.safetensors +3 -0
  6. model.safetensors +3 -0
  7. model_embed_tokens.safetensors +3 -0
  8. model_layers_0.safetensors +3 -0
  9. model_layers_1.safetensors +3 -0
  10. model_layers_10.safetensors +3 -0
  11. model_layers_11.safetensors +3 -0
  12. model_layers_12.safetensors +3 -0
  13. model_layers_13.safetensors +3 -0
  14. model_layers_14.safetensors +3 -0
  15. model_layers_15.safetensors +3 -0
  16. model_layers_16.safetensors +3 -0
  17. model_layers_17.safetensors +3 -0
  18. model_layers_18.safetensors +3 -0
  19. model_layers_19.safetensors +3 -0
  20. model_layers_2.safetensors +3 -0
  21. model_layers_20.safetensors +3 -0
  22. model_layers_21.safetensors +3 -0
  23. model_layers_22.safetensors +3 -0
  24. model_layers_23.safetensors +3 -0
  25. model_layers_24.safetensors +3 -0
  26. model_layers_25.safetensors +3 -0
  27. model_layers_26.safetensors +3 -0
  28. model_layers_27.safetensors +3 -0
  29. model_layers_28.safetensors +3 -0
  30. model_layers_29.safetensors +3 -0
  31. model_layers_3.safetensors +3 -0
  32. model_layers_30.safetensors +3 -0
  33. model_layers_31.safetensors +3 -0
  34. model_layers_32.safetensors +3 -0
  35. model_layers_33.safetensors +3 -0
  36. model_layers_34.safetensors +3 -0
  37. model_layers_35.safetensors +3 -0
  38. model_layers_36.safetensors +3 -0
  39. model_layers_37.safetensors +3 -0
  40. model_layers_38.safetensors +3 -0
  41. model_layers_39.safetensors +3 -0
  42. model_layers_4.safetensors +3 -0
  43. model_layers_40.safetensors +3 -0
  44. model_layers_41.safetensors +3 -0
  45. model_layers_42.safetensors +3 -0
  46. model_layers_43.safetensors +3 -0
  47. model_layers_44.safetensors +3 -0
  48. model_layers_45.safetensors +3 -0
  49. model_layers_46.safetensors +3 -0
  50. model_layers_47.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ base_model: meta-llama/Llama-3.3-70B-Instruct
+ base_model_relation: quantized
+ tags:
+ - dfloat11
+ - df11
+ - lossless compression
+ - 70% size, 100% accuracy
+ ---
+
+ ## DFloat11 Compressed Model: `meta-llama/Llama-3.3-70B-Instruct`
+
+ This is a **losslessly compressed** version of [`meta-llama/Llama-3.3-70B-Instruct`](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+
+ ### 🔍 How It Works
+
+ DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+
+ Key benefits:
+
+ * **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+ * **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+ * DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+ * At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+ * The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+
+ ### 🔧 How to Use
+
+ 1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+
+ ```bash
+ pip install -U dfloat11[cuda12]
+ # or if you have CUDA version 11:
+ # pip install -U dfloat11[cuda11]
+ ```
+
+ 2. To use the DFloat11 model, run the following example code in Python:
+
+ ```python
+ import torch
+ from dfloat11 import DFloat11Model
+ from transformers import AutoTokenizer
+
+ model_id = "DFloat11/Llama-3.3-70B-Instruct-DF11"
+
+ model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ prompt = "Question: What is a binary tree and its applications? Answer:"
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+
+ with torch.no_grad():
+     output = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         do_sample=True,
+     )
+
+ print(tokenizer.batch_decode(output, skip_special_tokens=True))
+ ```
+
+ ### 📄 Learn More
+
+ * **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+ * **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+ * **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
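The README attributes the size reduction to Huffman coding of the BFloat16 exponent bits. The sketch below illustrates why that can land near 70% of the original size. It is not the DFloat11 implementation: it uses synthetic Gaussian weights (an assumption, not real model weights), truncates float32 to BFloat16 instead of rounding, and the helper names `bf16_exponent` and `huffman_code_lengths` are hypothetical.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8 exponent bits of x; BFloat16 shares them with float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def huffman_code_lengths(freqs: dict) -> dict:
    """Map each symbol to the length in bits of its Huffman code."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (count, tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


random.seed(0)
# Assumption: synthetic weights drawn from N(0, 0.02^2), not real model weights.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]
freqs = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(freqs)

avg_exponent_bits = sum(freqs[s] * lengths[s] for s in freqs) / len(weights)
# 1 sign bit + 7 mantissa bits stay uncompressed; only the exponent is coded.
bits_per_weight = 1 + 7 + avg_exponent_bits
ratio = bits_per_weight / 16
print(f"average coded exponent length: {avg_exponent_bits:.2f} bits")
print(f"estimated size relative to BF16: {ratio:.1%}")
```

Because the exponents of such weights cluster in a narrow range, their Huffman codes average far fewer than 8 bits, which is where the saving comes from in this toy setting.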
config.json ADDED
@@ -0,0 +1,59 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 128000,
+   "dfloat11_config": {
+     "bytes_per_thread": 8,
+     "pattern_dict": {
+       "lm_head": [],
+       "model\\.embed_tokens": [],
+       "model\\.layers\\.\\d+": [
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj"
+       ]
+     },
+     "threads_per_block": [
+       512
+     ],
+     "version": "0.2.0"
+   },
+   "eos_token_id": [
+     128001,
+     128008,
+     128009
+   ],
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 8192,
+   "initializer_range": 0.02,
+   "intermediate_size": 28672,
+   "max_position_embeddings": 131072,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 80,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": {
+     "factor": 8.0,
+     "high_freq_factor": 4.0,
+     "low_freq_factor": 1.0,
+     "original_max_position_embeddings": 8192,
+     "rope_type": "llama3"
+   },
+   "rope_theta": 500000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "vocab_size": 128256
+ }
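The `pattern_dict` keys in `dfloat11_config` look like regular expressions over module names, and each value lists sub-modules under the matched module. The sketch below shows one plausible reading of that mapping, using only Python's `re` module. The interpretation (an empty list means the module itself, a non-empty list means the named children) is an assumption, and `resolve_targets` is a hypothetical helper, not a DFloat11 API.

```python
import re

# Copied from dfloat11_config.pattern_dict in config.json (JSON escapes decoded).
pattern_dict = {
    r"lm_head": [],
    r"model\.embed_tokens": [],
    r"model\.layers\.\d+": [
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
}


def resolve_targets(module_name: str):
    """Return the module names a pattern would cover, or None if no key matches.

    Assumed semantics: an empty child list refers to the matched module itself.
    """
    for pattern, children in pattern_dict.items():
        if re.fullmatch(pattern, module_name):
            if not children:
                return [module_name]
            return [f"{module_name}.{child}" for child in children]
    return None


print(resolve_targets("model.layers.12"))
print(resolve_targets("lm_head"))
print(resolve_targets("model.norm"))
```

Under this reading, every decoder layer contributes its seven linear projections, which lines up with the one-file-per-layer layout (`model_layers_N.safetensors`) in this commit.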
generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "bos_token_id": 128000,
+   "do_sample": true,
+   "eos_token_id": [
+     128001,
+     128008,
+     128009
+   ],
+   "temperature": 0.6,
+   "top_p": 0.9,
+   "transformers_version": "4.51.3"
+ }
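The defaults above enable sampling with `temperature` 0.6 and `top_p` 0.9. As a rough illustration of what those two settings do to a next-token distribution, here is a simplified pure-Python sketch. It is not the `transformers` implementation, and `sample_top_p` and the example logits are hypothetical.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample one token index with temperature scaling and nucleus filtering."""
    # Temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise over the kept tokens and draw one of them.
    kept_probs = [probs[i] / cumulative for i in kept]
    choice = rng.choices(kept, weights=kept_probs, k=1)[0]
    return choice, kept


token, kept = sample_top_p([2.0, 1.0, 0.0, -1.0], rng=random.Random(0))
print("candidate tokens:", kept, "sampled:", token)
```

With these example logits only the two most likely tokens survive the filter, so the low-probability tail can never be sampled.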
lm_head.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:46c090b4d335c7b0933c67226989e2c73f7ad9bc0eb013a9bc75c755f2d0d995
+ size 1423358924
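Each `.safetensors` entry in this commit is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes of the real file. A minimal sketch of reading one, using the `lm_head.safetensors` pointer above as input (`parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:46c090b4d335c7b0933c67226989e2c73f7ad9bc0eb013a9bc75c755f2d0d995
size 1423358924
"""

info = parse_lfs_pointer(pointer)
print(f"{info['hash_algo']} object, {info['size'] / 1e9:.2f} GB")
```

The same function applies to every pointer below; only the digest and size differ.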
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:039ff0691e1282c8fdd29c9c944af5dc5eec942ec8e621eaf8052ab8def5cba0
+ size 16504
model_embed_tokens.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:069ca889f29017156744afc0df278ba632f97fb3e0dbc2a3cc6c4b8464574df3
+ size 1425659798
model_layers_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:254bb931c0f8296245d5ac799701d118b480c232534df8239e01038caefc9d8c
+ size 1184264874
model_layers_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2bc645366b4864ca8b228308de8d2aae95d865904e3dfb8252b8fb36e4f39050
+ size 1163448837
model_layers_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c77924d3fbffccf26b43cc70fe722f3b8984d8a99b847a6c2bcd8714960f1de
+ size 1157115996
model_layers_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c642f884ee2de5059463ab469b41dd17a0ef5ae1b13c1df8fb7bb338202dd6d5
+ size 1156385598
model_layers_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:90d970699de0f638ea7e3ae30e6a9b256a6c419826cf892913fbd498ed15fbce
+ size 1156692364
model_layers_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:daf0cf0b0e674f8642af2e2761b70b498898c219d667ccc9d4197b584b2c1ee2
+ size 1157021634
model_layers_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:12b1a65db878733f86d7d85c4d5b960851c0b007b6061e7832362ca94496a235
+ size 1157204005
model_layers_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e42022d45a163771db7f24df14820d2b96cd4edaab5c10a13240e523d0e9ac34
+ size 1156818968
model_layers_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2ba876cf32dffe02e5feae9e638f57861963fae51b2327e97b1055389c0c050b
+ size 1156731338
model_layers_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f23bbcdab675a751a15106f0928e5a117fef164ced1ba27ce0c04de70e1ea36
+ size 1156996149
model_layers_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06cd2835997cf30ccda91e3754cba35d84f2d0efb304d3a0ede8763809a5dab7
+ size 1156929280
model_layers_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ad6d1b8ec0dfeb9a98ba1aac46c5131a6910b56b81ab6fda48554a4032c9c8a
+ size 1156512080
model_layers_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38c93632acc82d7d36a584026fe93d5e12c07f3d43562f341af9f706343c1562
+ size 1159907766
model_layers_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d85f8dfe834ed429f1178655a290808506e4368800f3701766be171a130cd00
+ size 1156025976
model_layers_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03f899813adbf0296ec832e101628aba715edd49917ed30275a0b827b17f5924
+ size 1156434583
model_layers_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d77dd02dcc396de88c7577d69bc9687070e4395a587e4728f3e336fddbdacbc
+ size 1156120613
model_layers_23.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a22813bc56092c8e81f9aa6854f5cb1f93d55e482992eea092911abc7523eab
+ size 1156266787
model_layers_24.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a460ab01d1aab00281780807c67097a0583d3a054ae491c7d5fd45b7675cbf3
+ size 1156178627
model_layers_25.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9149bf70b7c8ae9d2fdb5c2374e314408d3559e589704571a51411d92bcd0a98
+ size 1156171670
model_layers_26.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85a37aaadd2bfffc7feeb6e6b848a398746bd39cb5a07e1335881e53fa526516
+ size 1156040296
model_layers_27.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb8463b250b2b542e00062d122adc0aa401c1c05ee95da25750636b15ef5c563
+ size 1156821884
model_layers_28.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b6f469f831b5a5cb9c58fc027319fe25081ff064a332b2b839e1dd035d315ba
+ size 1156773429
model_layers_29.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fbaa0a2e27b8f5f854d7859c7e8094fbca7c0473ede6e469e9599f0b308bc21b
+ size 1156749948
model_layers_3.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e68c1a74db63fed8b15ec0cef7a0fcd3154e8fe392008cdcaec3a7fa3db9329
+ size 1158316999
model_layers_30.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e28c8e33904fb2ddd66337c915507379f4401d7c7593aa2436ec1beff142121
+ size 1157050612
model_layers_31.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:292d9a3d685837edd537d120527791a22ca4fc1b9b441e0b8ea3901e66e0b183
+ size 1157366887
model_layers_32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c96ed9271413dd61295934b27aeaf9f44468f1ea5f1ddeb9f11736d309f35de3
+ size 1156775505
model_layers_33.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a343bb812026cac39a7847732b669fc476589649c704453301c4e21c3cae714
+ size 1157299042
model_layers_34.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ae383541b6404a9225fc0318d88a14de6f5e09d1c8b759ee39de20fb4e3aec0
+ size 1157281698
model_layers_35.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27fe6cffdb6fe925ad870ee82eb7e13e4e9f7ece8c4492f07d63fdcd3fdddc7f
+ size 1157327678
model_layers_36.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c190ff9afc3c8dc24ab7781917a0d6b301eb2312ff1bdb83ac54d3ff14f077ee
+ size 1156911167
model_layers_37.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa995c7793f4ed3d2c51b730f70a62ecd8c55cc979af0534eb6892755e85c402
+ size 1156904251
model_layers_38.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76403c150d83e375d2987b48893bc29103604bcb69a0002de69c1a4fdc6dd256
+ size 1156730330
model_layers_39.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66d296649a5e6f597d4083b22b1b7e82eaeeb558fc58d2cc534db6d9014d2ff5
+ size 1156541498
model_layers_4.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dff52847a2e232c85ba190e9f636ad677a63da93e39daef00e7afcd294e281e0
+ size 1157766210
model_layers_40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e60d9c84bf265c072e37b63c7a115bec877c1206498e9e7ca1f1a03a0ded91a
+ size 1156031005
model_layers_41.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50790ab3717f94afc4054caa3c203bef42beb46beb74922280b3a3c4bbbfba86
+ size 1156344960
model_layers_42.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d78e121ab4b1f6431528a87c79c83c439ce0bdc0bd0a0ed13117e1e4a5935b1f
+ size 1155967004
model_layers_43.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e465c66e1f6fa59ec8340f5e52948d95152ff244f2fa8b03214fe5f3b7af8cd
+ size 1155883852
model_layers_44.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3650e6ee5e2dbbf0fdc90249155eef63124237b8519c857b59055785afa359dd
+ size 1156120094
model_layers_45.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:093477186dcbf6ebfe7c2785659a23e9de6d8a49e5e2b602cc3e36d3dc73383c
+ size 1156086722
model_layers_46.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f1d0a6997c8476eba0c2b31b349874591d6a2367c43e4acaf3c211eca7c67f6
+ size 1155270096
model_layers_47.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c3481c3728541bb43760c2e145da5bbc7d4a4f56d545bf5a2ffce100f5bf255
+ size 1155692958
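The pointer sizes can be checked against the README's "70% size" claim using the shapes in `config.json`. The sketch below computes the BFloat16 size of the seven linear projections in one decoder layer and of the embedding matrix, then compares them with a few file sizes from this commit. It is only an approximation: it assumes each file holds just those weights, and it ignores the small norm weights and whatever metadata the DFloat11 format stores alongside them.

```python
# Values copied from config.json in this commit.
hidden_size = 8192
intermediate_size = 28672
num_key_value_heads = 8
head_dim = 128
vocab_size = 128256

kv_dim = num_key_value_heads * head_dim
layer_params = (
    2 * hidden_size * hidden_size          # q_proj, o_proj
    + 2 * hidden_size * kv_dim             # k_proj, v_proj
    + 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
)
bf16_layer_bytes = 2 * layer_params                # 2 bytes per BFloat16 weight
bf16_embed_bytes = 2 * vocab_size * hidden_size    # same shape as lm_head

# (compressed size from the LFS pointer, estimated BF16 size) per file.
files = {
    "model_layers_0": (1184264874, bf16_layer_bytes),
    "model_layers_47": (1155692958, bf16_layer_bytes),
    "model_embed_tokens": (1425659798, bf16_embed_bytes),
    "lm_head": (1423358924, bf16_embed_bytes),
}
ratios = {name: size / ref for name, (size, ref) in files.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of the estimated BF16 size")
```

Under these assumptions each file comes out at roughly 68% to 69% of its estimated BFloat16 size, in line with the approximately 30% reduction stated in the README.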