LeanQuant committed 23 days ago

Commit a77c717 (verified) · 1 parent: 309d6c1

Add files using upload-large-folder tool

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

.gitattributes +1 -0
README.md +58 -0
added_tokens.json +28 -0
config.json +50 -0
generation_config.json +13 -0
lm_head.safetensors +3 -0
merges.txt +0 -0
model.safetensors +3 -0
model_embed_tokens.safetensors +3 -0
model_layers_0.safetensors +3 -0
model_layers_1.safetensors +3 -0
model_layers_10.safetensors +3 -0
model_layers_11.safetensors +3 -0
model_layers_12.safetensors +3 -0
model_layers_13.safetensors +3 -0
model_layers_14.safetensors +3 -0
model_layers_15.safetensors +3 -0
model_layers_16.safetensors +3 -0
model_layers_17.safetensors +3 -0
model_layers_18.safetensors +3 -0
model_layers_19.safetensors +3 -0
model_layers_2.safetensors +3 -0
model_layers_20.safetensors +3 -0
model_layers_21.safetensors +3 -0
model_layers_22.safetensors +3 -0
model_layers_23.safetensors +3 -0
model_layers_24.safetensors +3 -0
model_layers_25.safetensors +3 -0
model_layers_26.safetensors +3 -0
model_layers_27.safetensors +3 -0
model_layers_28.safetensors +3 -0
model_layers_29.safetensors +3 -0
model_layers_3.safetensors +3 -0
model_layers_30.safetensors +3 -0
model_layers_31.safetensors +3 -0
model_layers_32.safetensors +3 -0
model_layers_33.safetensors +3 -0
model_layers_34.safetensors +3 -0
model_layers_35.safetensors +3 -0
model_layers_36.safetensors +3 -0
model_layers_37.safetensors +3 -0
model_layers_38.safetensors +3 -0
model_layers_39.safetensors +3 -0
model_layers_4.safetensors +3 -0
model_layers_40.safetensors +3 -0
model_layers_41.safetensors +3 -0
model_layers_42.safetensors +3 -0
model_layers_43.safetensors +3 -0
model_layers_44.safetensors +3 -0
model_layers_45.safetensors +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,58 @@

+## DFloat11 Compressed Model: `Qwen/Qwen3-32B`
+This is a **losslessly compressed** version of [`Qwen/Qwen3-32B`](https://huggingface.co/Qwen/Qwen3-32B) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to those of the original BFloat16 model, while GPU memory consumption is reduced by approximately **30%**.
+### 🔍 How It Works
+DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+Key benefits:
+* **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+* **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+* DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+* At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+* The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+### 🔧 How to Use
+1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and an existing PyTorch installation)*:
+    ```bash
+    pip install dfloat11[cuda12]
+    # or if you have CUDA version 11:
+    # pip install dfloat11[cuda11]
+    ```
+2. To use the DFloat11 model, run the following example code in Python:
+    ```python
+    import torch
+    from dfloat11 import DFloat11Model
+    from transformers import AutoTokenizer
+    model_id = "DFloat11/Qwen3-32B-DF11"
+    model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    tokenizer.pad_token = tokenizer.eos_token
+    prompt = "Question: What is a binary tree and its applications? Answer:"
+    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+    with torch.no_grad():
+        output = model.generate(
+            **inputs,
+            max_new_tokens=256,
+            do_sample=True,
+        )
+    print(tokenizer.batch_decode(output, skip_special_tokens=True))
+    ```
+### 📄 Learn More
+* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
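
The README's "How It Works" section says DFloat11 applies Huffman coding to the BFloat16 exponent bits. The sketch below illustrates why that can save space; it is not the DFloat11 implementation. It assumes synthetic Gaussian weights (the standard deviation 0.02 is borrowed from `initializer_range` in `config.json`, and real trained weights will differ), truncates each float32 value to BFloat16, and builds a plain textbook Huffman code over the 8-bit exponent field. The helper names `bf16_exponent` and `huffman_code_lengths` are hypothetical.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x after truncation to BFloat16."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16        # BF16 keeps the top 16 bits of float32 (truncation for simplicity)
    return (bits16 >> 7) & 0xFF  # layout: 1 sign bit | 8 exponent bits | 7 mantissa bits


def huffman_code_lengths(freqs):
    """Map each symbol to the length in bits of its Huffman code."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries are (count, tie_breaker, symbols); the tie breaker keeps
    # the comparison from ever reaching the symbol lists.
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, syms1 = heapq.heappop(heap)
        c2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1    # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (c1 + c2, tie, syms1 + syms2))
        tie += 1
    return lengths


random.seed(0)
# Assumption: synthetic weights, not the real Qwen3-32B tensors.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]
freqs = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(freqs)

total = sum(freqs.values())
avg_exp_bits = sum(freqs[s] * lengths[s] for s in freqs) / total
bits_per_weight = 1 + 7 + avg_exp_bits  # sign and mantissa stay uncompressed

print(f"distinct exponent values: {len(freqs)} of 256")
print(f"average exponent code length: {avg_exp_bits:.2f} bits (raw: 8)")
print(f"effective bits per weight: {bits_per_weight:.2f} (raw: 16)")
```

Because the exponents of such weights cluster in a narrow range, the average code length falls well below 8 bits, which is the effect the README attributes its memory savings to.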

added_tokens.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

config.json ADDED

	@@ -0,0 +1,50 @@

+{
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dfloat11_config": {
+    "bytes_per_thread": 8,
+    "pattern_dict": {
+      "lm_head": [],
+      "model.embed_tokens": [],
+      "model.layers.\\d+": [
+        "self_attn.q_proj",
+        "self_attn.k_proj",
+        "self_attn.v_proj",
+        "self_attn.o_proj",
+        "mlp.gate_proj",
+        "mlp.up_proj",
+        "mlp.down_proj"
+      ]
+    },
+    "threads_per_block": [
+      512
+    ],
+    "version": "0.2.0"
+  },
+  "eos_token_id": 151645,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 25600,
+  "max_position_embeddings": 40960,
+  "max_window_layers": 64,
+  "model_type": "qwen3",
+  "num_attention_heads": 64,
+  "num_hidden_layers": 64,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000,
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.51.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
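
The `pattern_dict` in `dfloat11_config` maps regular expressions over module names to lists of submodules. How the `dfloat11` package consumes this mapping is not shown in the commit, so the sketch below is only one plausible reading: each key is matched against a full module name, and an empty list is taken to mean that the matched module itself holds the weight. The function `compressed_submodules` is hypothetical.

```python
import re

# Copied from "pattern_dict" in config.json (the JSON string "\\d+" is the regex \d+).
pattern_dict = {
    "lm_head": [],
    "model.embed_tokens": [],
    r"model.layers.\d+": [
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
}


def compressed_submodules(module_name):
    """Return the weight-bearing module names grouped under module_name, or None."""
    for pattern, children in pattern_dict.items():
        if re.fullmatch(pattern, module_name):
            # Assumption: an empty list means the module itself is the weight holder.
            return [f"{module_name}.{child}" for child in children] or [module_name]
    return None


print(compressed_submodules("model.layers.45"))
print(compressed_submodules("lm_head"))
print(compressed_submodules("model.norm"))  # no pattern matches
```

Under this reading, each matched name corresponds to one uploaded file in the list above, with dots replaced by underscores (`model.layers.45` and `model_layers_45.safetensors`). Note that the dots in the patterns are unescaped, so they match any character; that is harmless for these names.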

generation_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "temperature": 0.6,
+  "top_k": 20,
+  "top_p": 0.95,
+  "transformers_version": "4.51.3"
+}
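
The sampling defaults in `generation_config.json` (`temperature` 0.6, `top_k` 20, `top_p` 0.95) act as successive filters on the next-token distribution. The following is a simplified, pure-Python sketch of that pipeline, written for illustration; it is not the `transformers` implementation, and the function name `sample_token` is hypothetical.

```python
import math
import random


def sample_token(logits, temperature=0.6, top_k=20, top_p=0.95, rng=random):
    """Sample one token index using temperature, then top-k, then top-p filtering."""
    # 1. Temperature: values below 1 sharpen the distribution.
    scaled = [value / temperature for value in logits]

    # 2. Top-k: keep only the k highest-scoring token indices.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # 3. Softmax over the surviving tokens (shifted by the max for stability).
    peak = max(scaled[i] for i in order)
    exps = [math.exp(scaled[i] - peak) for i in order]
    norm = sum(exps)
    probs = [e / norm for e in exps]

    # 4. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for index, prob in zip(order, probs):
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # 5. Draw from the renormalized remainder.
    threshold = rng.random() * sum(prob for _, prob in kept)
    running = 0.0
    for index, prob in kept:
        running += prob
        if threshold <= running:
            return index
    return kept[-1][0]


random.seed(0)
logits = [0.1 * i for i in range(50)]
print([sample_token(logits) for _ in range(5)])
```

With these settings a single dominant logit is always selected, while flatter distributions are restricted to at most the 20 most likely tokens before the draw.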

lm_head.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05cdfc8728ebdc1f35bb326b28678356f99173853545e3a23f6bccdfc2f15f39
+size 1054877539

merges.txt ADDED

The diff for this file is too large to render.

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6c059b9923b407bbd4d8bb5c4021b9cdb9c1ecc30b0f180ff655f8abd460475c
+size 10360

model_embed_tokens.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c53531eacab9da6c0c6a33dcaef6a4ab584577a873abd9399477d88c3a7b590b
+size 1058489203

model_layers_0.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a5fd08a1dad32ce0ea65d8568c81f5162f612c98f9025e8510d0f4a1472d275
+size 666102678

model_layers_1.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:38a9da78265dca69a6770e6fdd329b0cf2f5b334c642e76b513ba59d039af028
+size 717204587

model_layers_10.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec48c6381a7443121cba7744c7b463c6d7ad7c1e5ad09c014dea6b61a37d6482
+size 658473845

model_layers_11.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:671da055943806291af58bdcbe1a51b6a926162d3573dd7cc02949190ff19ab2
+size 658505849

model_layers_12.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e74e88ff1d7299e9368b882340bfb7a8904540a74b03c46fc6bb4db72c173dc3
+size 659316099

model_layers_13.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:855173ed9c284d8261707ff30433129c457651ab49f88ea23cab4ed05040e3f5
+size 659377807

model_layers_14.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec5bc6b5783267e86caa7a2ec2c3323d96a91502dd5a323f8356d0ffea7a52cc
+size 660506237

model_layers_15.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a11554da39814f597834f1f2a65fc5587632216bbf1cc952c1b5487fec09a3fd
+size 659344819

model_layers_16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b68bb7e44cb5156a70afcc0c4d621c3d2a411d1ff79ff0dbb36f287dbf46187f
+size 659677758

model_layers_17.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b12a89cb6eadc7d99558524fb8350ea4b61137c98badbeb1b82e68a541937735
+size 660393373

model_layers_18.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d9d8cbbb9684cf6810b6c82b77673d5bf16222be0abb7a93937be75bc864cd60
+size 659807455

model_layers_19.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fe33bff5372f0cfca7d06e633436829a4c74db5db7d55a4bedb716ab3357f73e
+size 659369489

model_layers_2.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:136f6e4d9780480f84f13f89e63d10fdab2766c03f061ce32962e4857ef10c52
+size 715739961

model_layers_20.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2d84c5b78cd18d3b0bd7a4c082fd3770f3b85f3634fcc11ed0fb755a9cad4d7f
+size 659844174

model_layers_21.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e5c48cd26aa08322c01a42ce9b584eecfc252aea44bcf726f8231816b0b0e223
+size 659272504

model_layers_22.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:25a7f2ea0168763ca7fff2eb8580f54582f635d97516898859e4ef45cff97094
+size 659355561

model_layers_23.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be50f9c9032b67c0e65e975c9477a2c78d210d27494b1b9d27f5e4c29cd87c7d
+size 659921969

model_layers_24.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a319fd466cd97bb5836feb6525028fcdd481c44beb77cd1d8ad2f6231b6a1b7
+size 659035002

model_layers_25.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:365aaea652f0113afdb4d2a6c74d04e3bc3fb690bfe477e705d802ad62dfc1f7
+size 658958886

model_layers_26.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:818bfebe6bb92aeb77f229de3bf568fafd5cb8e6a9899e668f6df746c69c6976
+size 659149200

model_layers_27.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f60caa75454b835024ff97939bf7bdda1a6563822caef76d2b7864a35dd4f070
+size 659341198

model_layers_28.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f84bd11c230343acaecc603816432cee8d4e59bcc658237181a00c56c2de9769
+size 659385659

model_layers_29.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e36a493e8a5fe173a7a97d340da65200d0c29a56d3858717c9e425969ca5c1d6
+size 659379681

model_layers_3.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:20c8eed33d00bbe38b179f73b9a081b372311439ff5b6457dedb66af9921aaa8
+size 707649301

model_layers_30.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66e5174b8952f0d987dc6825696974be26603b41d6b7a909782d1d98fffb92fe
+size 661506869

model_layers_31.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b8387f325d22678929e18fe9b29a7ba5d4db64aca1429f0e2bea6984dd67a85b
+size 661311896

model_layers_32.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1b26b67ec41f3d1f568273ab4548d2cc92c89ef71218584d26786743cee48607
+size 659766382

model_layers_33.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:71e50c6f42a65b5e6f33cbd92b18098db8d26a37f3f6493e7bbde0c2c2f537f7
+size 659905876

model_layers_34.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:81f0f87e569ec108dc6fb54ba2644fc1f50016c26dfcee4f13799ead38c908ad
+size 658183158

model_layers_35.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7e3bee630e2f1eeed5b8ab30689e3a08c0cad014cb3642a733b0d7b889ff79e
+size 658247792

model_layers_36.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:44989d9c01c4a252309fb02bfcaa83603d7c8fce3c93ca995da4361d77a4689a
+size 658721791

model_layers_37.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9caea1cc59bf506bffecfa83c52d34c72fe9ce8ce4530524af4ade9868784424
+size 659077121

model_layers_38.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:84cf9f71cf22392b7343ec763ca889f6dfac3e29417a05ec7e1db158d29fd868
+size 659216449

model_layers_39.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cb8ddf1fdc2e3ac20f46f806559b958f3af4f43f294789889c8d5f8cac5c75b4
+size 659042564

model_layers_4.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6381ffd35ffa38fd4b4c2e0f9cfcbe04f45e836ea612b1ca3feaf27c1f9f3976
+size 694879348

model_layers_40.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3c34764b4233e18b9b7268057e818de90b9c3af7c39cd69ffb684bf9bacd0493
+size 659425034

model_layers_41.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3ad59090816b089eec433cbe7a4b349ddd6db5a582b7809efefaedbffd27bdaa
+size 659719828

model_layers_42.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f016757764dee021dc0354047fd9c1b94d6527c50a5c39aed2fa6020a67b0c4
+size 660161683

model_layers_43.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:52b29ef94c5ec69e34242117bce7ad1ccd6228593c34bf10ea0bf478895ea03b
+size 660325959

model_layers_44.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66994cf45225b9f3ce0a1894e1a3c5d23231a62942cb5950a37b0750f05f114f
+size 660582706

model_layers_45.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1ab35340c8d37aa9bd89d16ba6fa5b39d2b8f7d996b89f441a9f045ec1e1ea5b
+size 660224074
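
The Git LFS pointer sizes above can be checked against the README's claim of roughly 30% memory savings. The sketch below parses one pointer (the text for `model_layers_10.safetensors`, copied from this diff) and compares its size with the uncompressed BFloat16 size of the seven projection matrices in one decoder layer, computed from `config.json`. It assumes the standard Qwen3 projection shapes with no bias terms (`attention_bias` is false), ignores the small normalization weights, and ignores any metadata the compressed file may carry, so the ratio is approximate. The helper `parse_lfs_pointer` is hypothetical.

```python
# Values copied from config.json in this commit.
hidden_size = 5120
intermediate_size = 25600
num_attention_heads = 64
num_key_value_heads = 8
head_dim = 128


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:ec48c6381a7443121cba7744c7b463c6d7ad7c1e5ad09c014dea6b61a37d6482
size 658473845
"""
)

q_out = num_attention_heads * head_dim     # 8192 query features
kv_out = num_key_value_heads * head_dim    # 1024 key/value features (grouped-query attention)

params = (
    hidden_size * q_out                    # self_attn.q_proj
    + 2 * hidden_size * kv_out             # self_attn.k_proj, self_attn.v_proj
    + q_out * hidden_size                  # self_attn.o_proj
    + 3 * hidden_size * intermediate_size  # mlp.gate_proj, mlp.up_proj, mlp.down_proj
)
bf16_bytes = params * 2                    # BFloat16 stores 2 bytes per parameter
ratio = pointer["size"] / bf16_bytes

print(f"parameters per layer: {params:,}")
print(f"BF16 bytes per layer: {bf16_bytes:,}")
print(f"compressed / BF16:    {ratio:.3f}")
```

Under these assumptions the compressed layer is about two thirds the size of its BFloat16 counterpart, which is in line with the README's figure of approximately 30% savings.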