Commit 05a9032 (verified) by LeanQuant · 1 parent: 76cb1cf

Add files using upload-large-folder tool

Files changed (44)
  1. .gitattributes +1 -0
  2. README.md +58 -0
  3. added_tokens.json +3 -0
  4. config.json +80 -0
  5. generation_config.json +13 -0
  6. language_model_model_layers_0.safetensors +3 -0
  7. language_model_model_layers_1.safetensors +3 -0
  8. language_model_model_layers_10.safetensors +3 -0
  9. language_model_model_layers_11.safetensors +3 -0
  10. language_model_model_layers_12.safetensors +3 -0
  11. language_model_model_layers_13.safetensors +3 -0
  12. language_model_model_layers_14.safetensors +3 -0
  13. language_model_model_layers_15.safetensors +3 -0
  14. language_model_model_layers_16.safetensors +3 -0
  15. language_model_model_layers_17.safetensors +3 -0
  16. language_model_model_layers_18.safetensors +3 -0
  17. language_model_model_layers_19.safetensors +3 -0
  18. language_model_model_layers_2.safetensors +3 -0
  19. language_model_model_layers_20.safetensors +3 -0
  20. language_model_model_layers_21.safetensors +3 -0
  21. language_model_model_layers_22.safetensors +3 -0
  22. language_model_model_layers_23.safetensors +3 -0
  23. language_model_model_layers_24.safetensors +3 -0
  24. language_model_model_layers_25.safetensors +3 -0
  25. language_model_model_layers_26.safetensors +3 -0
  26. language_model_model_layers_27.safetensors +3 -0
  27. language_model_model_layers_28.safetensors +3 -0
  28. language_model_model_layers_29.safetensors +3 -0
  29. language_model_model_layers_3.safetensors +3 -0
  30. language_model_model_layers_30.safetensors +3 -0
  31. language_model_model_layers_31.safetensors +3 -0
  32. language_model_model_layers_32.safetensors +3 -0
  33. language_model_model_layers_33.safetensors +3 -0
  34. language_model_model_layers_4.safetensors +3 -0
  35. language_model_model_layers_5.safetensors +3 -0
  36. language_model_model_layers_6.safetensors +3 -0
  37. language_model_model_layers_7.safetensors +3 -0
  38. language_model_model_layers_8.safetensors +3 -0
  39. language_model_model_layers_9.safetensors +3 -0
  40. model.safetensors +3 -0
  41. special_tokens_map.json +33 -0
  42. tokenizer.json +3 -0
  43. tokenizer.model +3 -0
  44. tokenizer_config.json +0 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,58 @@
+ ## DFloat11 Compressed Model: `google/gemma-3-4b-it`
+
+ This is a **losslessly compressed** version of [`google/gemma-3-4b-it`](https://huggingface.co/google/gemma-3-4b-it) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+
+ ### 🔍 How It Works
+
+ DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+
+ Key benefits:
+
+ * **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+ * **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+ * DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+ * At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+ * The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+
+ ### 🔧 How to Use
+
+ 1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+
+ ```bash
+ pip install dfloat11[cuda12]
+ # or if you have CUDA version 11:
+ # pip install dfloat11[cuda11]
+ ```
+
+ 2. To use the DFloat11 model, run the following example code in Python:
+
+ ```python
+ import torch
+ from dfloat11 import DFloat11Model
+ from transformers import AutoTokenizer
+
+ model_id = "DFloat11/gemma-3-4b-it-DF11"
+
+ model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ prompt = "Question: What is a binary tree and its applications? Answer:"
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+
+ with torch.no_grad():
+     output = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         do_sample=True,
+     )
+
+ print(tokenizer.batch_decode(output, skip_special_tokens=True))
+ ```
+
+ ### 📄 Learn More
+
+ * **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+ * **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+ * **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
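The README's "How It Works" section attributes the savings to Huffman coding of the BFloat16 exponent bits. The sketch below is not the DFloat11 implementation; it only illustrates why exponents compress well. It assumes Gaussian-distributed weights as a stand-in for real model weights, truncates float32 to BFloat16 instead of rounding, and uses hypothetical helper names (`bf16_exponent`, `huffman_code_lengths`).

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(value: float) -> int:
    """Return the 8-bit exponent field of `value` viewed as BFloat16.

    BFloat16 is the top 16 bits of an IEEE-754 float32 (1 sign bit,
    8 exponent bits, 7 mantissa bits); truncation is used for simplicity.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    bits16 = bits32 >> 16
    return (bits16 >> 7) & 0xFF


def huffman_code_lengths(counts: Counter) -> dict:
    """Map each symbol to its Huffman code length in bits."""
    if len(counts) == 1:
        return {symbol: 1 for symbol in counts}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {symbol: 0}) for i, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


random.seed(0)
# Assumption: synthetic weights, not weights taken from the model.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]
counts = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(counts)

total = sum(counts.values())
avg_exponent_bits = sum(counts[s] * lengths[s] for s in counts) / total
# Sign (1 bit) and mantissa (7 bits) are stored as-is; only the exponent is coded.
avg_bits_per_weight = 1 + 7 + avg_exponent_bits

print(f"distinct exponents: {len(counts)}")
print(f"average exponent bits: {avg_exponent_bits:.2f} (vs. 8 uncoded)")
print(f"average bits per weight: {avg_bits_per_weight:.2f} (vs. 16 for BF16)")
```

Because weight magnitudes cluster in a narrow range, only a few exponent values are common, and the Huffman code spends far fewer than 8 bits on most of them.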
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<image_soft_token>": 262144
+ }
config.json ADDED
@@ -0,0 +1,80 @@
+ {
+   "architectures": [
+     "Gemma3ForConditionalGeneration"
+   ],
+   "boi_token_index": 255999,
+   "dfloat11_config": {
+     "bytes_per_thread": 8,
+     "pattern_dict": {
+       "language_model.model.layers.\\d+": [
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj"
+       ]
+     },
+     "threads_per_block": [
+       512
+     ],
+     "version": "0.2.0"
+   },
+   "eoi_token_index": 256000,
+   "eos_token_id": [
+     1,
+     106
+   ],
+   "image_token_index": 262144,
+   "initializer_range": 0.02,
+   "mm_tokens_per_image": 256,
+   "model_type": "gemma3",
+   "text_config": {
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "attn_logit_softcapping": null,
+     "cache_implementation": "hybrid",
+     "final_logit_softcapping": null,
+     "head_dim": 256,
+     "hidden_activation": "gelu_pytorch_tanh",
+     "hidden_size": 2560,
+     "initializer_range": 0.02,
+     "intermediate_size": 10240,
+     "max_position_embeddings": 131072,
+     "model_type": "gemma3_text",
+     "num_attention_heads": 8,
+     "num_hidden_layers": 34,
+     "num_key_value_heads": 4,
+     "query_pre_attn_scalar": 256,
+     "rms_norm_eps": 1e-06,
+     "rope_local_base_freq": 10000.0,
+     "rope_scaling": {
+       "factor": 8.0,
+       "rope_type": "linear"
+     },
+     "rope_theta": 1000000.0,
+     "sliding_window": 1024,
+     "sliding_window_pattern": 6,
+     "torch_dtype": "bfloat16",
+     "use_cache": true,
+     "vocab_size": 262208
+   },
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "vision_config": {
+     "attention_dropout": 0.0,
+     "hidden_act": "gelu_pytorch_tanh",
+     "hidden_size": 1152,
+     "image_size": 896,
+     "intermediate_size": 4304,
+     "layer_norm_eps": 1e-06,
+     "model_type": "siglip_vision_model",
+     "num_attention_heads": 16,
+     "num_channels": 3,
+     "num_hidden_layers": 27,
+     "patch_size": 14,
+     "torch_dtype": "bfloat16",
+     "vision_use_head": false
+   }
+ }
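The `pattern_dict` in `dfloat11_config` pairs a regular expression over module names with the linear sub-modules that are stored in compressed form. How the `dfloat11` package consumes this mapping is not shown in this commit, so the following is only a sketch of one plausible reading: full-match the regex against a module name, then expand it to the listed projections. The function names, the non-matching module name, and the file-name rule (dots replaced by underscores, which happens to match the `language_model_model_layers_N.safetensors` files in this commit) are all assumptions.

```python
import re

# Copied from "dfloat11_config" -> "pattern_dict" in config.json
# (the JSON string "\\d+" decodes to the regex \d+).
PATTERN_DICT = {
    r"language_model.model.layers.\d+": [
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
}


def compressed_modules(module_name: str) -> list[str]:
    """Return the fully qualified sub-modules covered by a matching pattern."""
    for pattern, submodules in PATTERN_DICT.items():
        if re.fullmatch(pattern, module_name):
            return [f"{module_name}.{sub}" for sub in submodules]
    return []


def shard_file_name(module_name: str) -> str:
    """Hypothetical naming rule: dots in the module name become underscores."""
    return module_name.replace(".", "_") + ".safetensors"


print(compressed_modules("language_model.model.layers.0")[:2])
print(shard_file_name("language_model.model.layers.33"))
# A made-up module name outside the pattern yields no compressed sub-modules.
print(compressed_modules("vision_tower.some_layer"))
```

Under this reading, only the attention and MLP projections of the 34 decoder layers are compressed; embeddings, norms, and the vision tower are left as ordinary BF16 tensors.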
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "bos_token_id": 2,
+   "cache_implementation": "hybrid",
+   "do_sample": true,
+   "eos_token_id": [
+     1,
+     106
+   ],
+   "pad_token_id": 0,
+   "top_k": 64,
+   "top_p": 0.95,
+   "transformers_version": "4.51.3"
+ }
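With `do_sample` enabled, `top_k` and `top_p` restrict which tokens can be drawn at each step, so two runs on the same prompt can produce different text unless the random seed is fixed. The sketch below is a simplified, pure-Python illustration of top-k followed by top-p (nucleus) filtering on a toy vocabulary. It is not the `transformers` implementation, and the function names are hypothetical.

```python
import math
import random


def filter_top_k_top_p(logits, top_k=64, top_p=0.95):
    """Return {token_id: probability} after top-k, then top-p filtering."""
    # Keep the top_k highest-scoring token ids, best first.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the surviving logits (shifted by the max for stability).
    peak = logits[ranked[0]]
    exps = [math.exp(logits[i] - peak) for i in ranked]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample(logits, rng, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filter_top_k_top_p(logits, **kwargs)
    token_ids = list(dist)
    return rng.choices(token_ids, weights=[dist[t] for t in token_ids], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # toy 5-token vocabulary
print(filter_top_k_top_p(toy_logits, top_k=3, top_p=0.9))
print(sample(toy_logits, random.Random(0)))
```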
language_model_model_layers_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d96d1c18d4c910f85cdfaca653bfbeef0c278dd580735694f7abee75c3b2d41d
+ size 130067084
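Each `.safetensors` entry in this diff is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. As a sketch (the helper names are hypothetical, and the path in the commented usage line is a placeholder for wherever the file was downloaded), a local file can be checked against its pointer like this:

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into {"version", "oid", "size"}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so large weight files are never fully loaded in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["oid"]


# Pointer for language_model_model_layers_0.safetensors, copied from this diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d96d1c18d4c910f85cdfaca653bfbeef0c278dd580735694f7abee75c3b2d41d
size 130067084
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"])
# matches_pointer("language_model_model_layers_0.safetensors", pointer)
```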
language_model_model_layers_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94b140291206b39729a80108c10ad39b7446a868014cf1f44ca93e8bdc8de40d
+ size 131243860
language_model_model_layers_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:312a41e6f08d6bf508384626ea3ac2184067d68670231f50f9b0f5095be36f1e
+ size 129889362
language_model_model_layers_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8e073cca130406b8724356539bcf82de791971efad0df9802b6d6aaf7d70684
+ size 131255936
language_model_model_layers_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:271b7e427e45fd879fb1e270afb5634ac5c6a44ebb9aee94635966d92310682c
+ size 129072774
language_model_model_layers_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:734e44ed162ef2ab001993ec00e30f946f019c1e199458050dd44c3f07830646
+ size 129713937
language_model_model_layers_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2cdb8f6e7a199a75ffa91197af09f3703f68dcc1235d17f4841dbced277329e4
+ size 130166670
language_model_model_layers_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:14044517d99440b9724e5433b21781ed744735d96adb123eb8fec1e6cb1454af
+ size 131318060
language_model_model_layers_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d4b473b45d6d2ad0b617cabcc82c169126317f5d106eb9f3e79c84d51e0be0e3
+ size 129423821
language_model_model_layers_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cf677d9648711d84b91e1c6a48e47ee2419f862e961bd44730c0b8aa7725354d
+ size 129659537
language_model_model_layers_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aecfc721f3577d14bbb72e4fe92b788385b0cc815a2ce85a6fa86fb756c863e6
+ size 129900626
language_model_model_layers_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4c7197c56985889aa6d398f3aa7a735858aa11e26c33213b857e1754186d289
+ size 130376486
language_model_model_layers_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:983e98ca6dbfd10a2209331665305dbdc7651d6f5ce0b800f50fdec0662ba168
+ size 130189406
language_model_model_layers_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b2b53aa7d9201c305918861b57c7e44ee1646b23152124dd2756f1d527e7b8a
+ size 130110260
language_model_model_layers_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67036e439f81ec79e6a3076bb5976a5e5e0a912416031ececb936487d902320c
+ size 130046228
language_model_model_layers_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf6500e71f8609251355517380e603869e186b1b87ad63b85c1db32e26f0859f
+ size 129728210
language_model_model_layers_23.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44c17c0583a9c1cce8946fe80439e7af5de0508297830ef6e1f8d046c3a91495
+ size 130031413
language_model_model_layers_24.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e88444ad556a35457d334e91060f38476cb886edfa7e3dc07e6966489077fb6a
+ size 130844436
language_model_model_layers_25.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0be9e9f850cd96b1f4072bbaa13a86b8dacc891d0850498f3963c9470176291
+ size 130842754
language_model_model_layers_26.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c712012d5cac0b3faf6c718c12b48a324aa763e8d1662df8180e123405a1085
+ size 130986040
language_model_model_layers_27.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e64a448b881750d83f5b01725ff151d1b031972a15e03394d4c76ab61d374f23
+ size 128983307
language_model_model_layers_28.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f3f163e0b5b554b8f9939a622bab5561a97d00b818c83fcb01f77040ed0ccac
+ size 129339139
language_model_model_layers_29.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:911ad3c606a9225c126ef2d7b45d5ad063c80b10e0ba9d045674e542f98c53d4
+ size 129636099
language_model_model_layers_3.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57cb796cb40f0a7dfd54f69747d83716d900f7aa5473f52e61e3a96c63b308a3
+ size 129269069
language_model_model_layers_30.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a5b1215dcc25fc97b812b4480e8c01278775cf46f481c3bb0db95aa3b6a234f
+ size 130210658
language_model_model_layers_31.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e949b6653f27749f6141510d3e3af0f7b077f553a4a8a21156749738707fcde7
+ size 129042385
language_model_model_layers_32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3db564ac2741992a44b4c7bfdfa94c311a42341702410e2645441fcb62cef9a8
+ size 130803443
language_model_model_layers_33.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1e06a57cb8b2e3c0f00a0ecc44d460de2576c4a79db85370e578e8ca4aecb19
+ size 129538026
language_model_model_layers_4.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:483e521ad8f080c6302b2f04d1f8d052d3a1264756eeb56f48bdab5f6c2449f0
+ size 130128696
language_model_model_layers_5.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5cc6389e3d9973b6e34d3fbe9fe3b2fb3365dee4cde56cabf9c0e1e62ecf2649
+ size 131390778
language_model_model_layers_6.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d92d1f5a862f24f595233ccc9edee93cbd658ff01e63a0e428d19e11b4ba996a
+ size 129612687
language_model_model_layers_7.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44b9b1f90d39e034c66c4f3041f61c8e0c347f81a4d4c4a839e97335ccc80d13
+ size 129364046
language_model_model_layers_8.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e42ee77737412c45c23ce55720066bb91b67188cd8420afa16a7ef88207ee6cf
+ size 128874261
language_model_model_layers_9.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:32a3b29c62806c01f91971a15712c5336b7e6be741a75f8b37c476e3d08e883d
+ size 129368394
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b013904f9a949dc3c02db07bc8d42ba27ef0eb2829bc8d8be53908efee9b333c
+ size 2182203576
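The per-layer sizes in the LFS pointers can be cross-checked against `config.json`. The sketch below computes the BF16 size of the seven compressed projections in one decoder layer from `text_config` and compares it with each `language_model_model_layers_N.safetensors` file. It assumes standard Gemma-style projection shapes (for example `q_proj` mapping `hidden_size` to `num_attention_heads * head_dim`) and ignores the safetensors header and any auxiliary tensors stored alongside the compressed weights, so the ratios are approximate.

```python
# Values copied from "text_config" in config.json.
HIDDEN, HEADS, KV_HEADS, HEAD_DIM, INTERMEDIATE = 2560, 8, 4, 256, 10240

# File sizes in bytes from the LFS pointers in this commit, layers 0..33.
LAYER_FILE_BYTES = [
    130067084, 131243860, 130189406, 129269069, 130128696, 131390778,
    129612687, 129364046, 128874261, 129368394, 129889362, 131255936,
    129072774, 129713937, 130166670, 131318060, 129423821, 129659537,
    129900626, 130376486, 130110260, 130046228, 129728210, 130031413,
    130844436, 130842754, 130986040, 128983307, 129339139, 129636099,
    130210658, 129042385, 130803443, 129538026,
]


def bf16_bytes_per_layer() -> int:
    """BF16 size of the seven projections named in pattern_dict (assumed shapes)."""
    attention = (
        HIDDEN * HEADS * HEAD_DIM             # q_proj
        + 2 * HIDDEN * KV_HEADS * HEAD_DIM    # k_proj and v_proj
        + HEADS * HEAD_DIM * HIDDEN           # o_proj
    )
    mlp = 3 * HIDDEN * INTERMEDIATE           # gate_proj, up_proj, down_proj
    return 2 * (attention + mlp)              # 2 bytes per BF16 weight


baseline = bf16_bytes_per_layer()
ratios = [size / baseline for size in LAYER_FILE_BYTES]

print(f"BF16 bytes per layer: {baseline:,}")
print(f"compressed / BF16 ratio: min {min(ratios):.3f}, max {max(ratios):.3f}")
```

Under these assumptions every layer file lands at roughly 68% to 70% of its BF16 size, in line with the "70% Size" in the paper title and the README's figure of approximately 30% savings.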
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "boi_token": "<start_of_image>",
+   "bos_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eoi_token": "<end_of_image>",
+   "eos_token": {
+     "content": "<eos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "image_token": "<image_soft_token>",
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
+ size 33384568
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+ size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render.