LeanQuant committed 23 days ago

Commit

11cc13f

verified

1 Parent(s): 5ffb283

Add files using upload-large-folder tool

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

.gitattributes +1 -0
README.md +58 -0
config.json +48 -0
generation_config.json +9 -0
lm_head.safetensors +3 -0
model.safetensors +3 -0
model_embed_tokens.safetensors +3 -0
model_layers_0.safetensors +3 -0
model_layers_1.safetensors +3 -0
model_layers_10.safetensors +3 -0
model_layers_11.safetensors +3 -0
model_layers_12.safetensors +3 -0
model_layers_13.safetensors +3 -0
model_layers_14.safetensors +3 -0
model_layers_15.safetensors +3 -0
model_layers_16.safetensors +3 -0
model_layers_17.safetensors +3 -0
model_layers_18.safetensors +3 -0
model_layers_19.safetensors +3 -0
model_layers_2.safetensors +3 -0
model_layers_20.safetensors +3 -0
model_layers_21.safetensors +3 -0
model_layers_22.safetensors +3 -0
model_layers_23.safetensors +3 -0
model_layers_24.safetensors +3 -0
model_layers_25.safetensors +3 -0
model_layers_26.safetensors +3 -0
model_layers_27.safetensors +3 -0
model_layers_28.safetensors +3 -0
model_layers_29.safetensors +3 -0
model_layers_3.safetensors +3 -0
model_layers_30.safetensors +3 -0
model_layers_31.safetensors +3 -0
model_layers_32.safetensors +3 -0
model_layers_33.safetensors +3 -0
model_layers_34.safetensors +3 -0
model_layers_35.safetensors +3 -0
model_layers_36.safetensors +3 -0
model_layers_37.safetensors +3 -0
model_layers_38.safetensors +3 -0
model_layers_39.safetensors +3 -0
model_layers_4.safetensors +3 -0
model_layers_40.safetensors +3 -0
model_layers_41.safetensors +3 -0
model_layers_42.safetensors +3 -0
model_layers_43.safetensors +3 -0
model_layers_44.safetensors +3 -0
model_layers_45.safetensors +3 -0
model_layers_46.safetensors +3 -0
model_layers_47.safetensors +3 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED Viewed

	@@ -0,0 +1,58 @@

+## DFloat11 Compressed Model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`
+This is a **losslessly compressed** version of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+### 🔍 How It Works
+DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+Key benefits:
+* **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+* **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+* DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+* At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+* The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+### 🔧 How to Use
+1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+    ```bash
+    pip install dfloat11[cuda12]
+    # or if you have CUDA version 11:
+    # pip install dfloat11[cuda11]
+    ```
+2. To use the DFloat11 model, run the following example code in Python:
+    ```python
+    import torch
+    from dfloat11 import DFloat11Model
+    from transformers import AutoTokenizer
+    model_id = "DFloat11/DeepSeek-R1-Distill-Qwen-32B-DF11"
+    model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    tokenizer.pad_token = tokenizer.eos_token
+    prompt = "Question: What is a binary tree and its applications? Answer:"
+    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+    with torch.no_grad():
+        output = model.generate(
+            **inputs,
+            max_new_tokens=256,
+            do_sample=True,
+        )
+    print(tokenizer.batch_decode(output, skip_special_tokens=True))
+    ```
+### 📄 Learn More
+* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
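
Note on the README's "How It Works" section: the idea of Huffman-coding only the BFloat16 exponent bits can be illustrated with a small standard-library sketch. This is not the DFloat11 implementation. It assumes synthetic Gaussian weights (standard deviation 0.02, borrowed from `initializer_range` in `config.json`) stand in for real model weights, and it truncates float32 to BFloat16 instead of rounding. The helper names `bf16_exponent` and `huffman_code_lengths` are hypothetical.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8 exponent bits of x after truncating float32 to BFloat16.

    Real conversions round to nearest even; truncation is a simplification.
    """
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bf16 = bits32 >> 16  # keeps sign (1 bit) + exponent (8) + mantissa (7)
    return (bf16 >> 7) & 0xFF


def huffman_code_lengths(freqs: Counter) -> dict:
    """Return the Huffman code length (in bits) of every symbol in freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (count, unique tie-breaker, symbols under this subtree).
    heap = [(count, idx, [sym]) for idx, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # merging pushes these symbols one level deeper
        heapq.heappush(heap, (c1 + c2, next_id, s1 + s2))
        next_id += 1
    return lengths


random.seed(0)
# Assumption: Gaussian samples as a stand-in for trained weights.
weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]
exponents = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(exponents)

avg_exp_bits = sum(exponents[s] * lengths[s] for s in exponents) / len(weights)
bits_per_weight = 1 + 7 + avg_exp_bits  # sign and mantissa stay uncompressed

print(f"distinct exponent values: {len(exponents)}")
print(f"average exponent bits:    {avg_exp_bits:.2f} (vs. 8 uncompressed)")
print(f"bits per weight:          {bits_per_weight:.2f} (vs. 16 for BFloat16)")
```

Only a few exponent values occur often, so the variable-length code spends far fewer than 8 bits on the exponent on average, while the sign and mantissa bits are stored unchanged. That is why the scheme can be lossless.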

config.json ADDED Viewed

	@@ -0,0 +1,48 @@

+{
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dfloat11_config": {
+    "bytes_per_thread": 8,
+    "pattern_dict": {
+      "lm_head": [],
+      "model.embed_tokens": [],
+      "model.layers.\\d+": [
+        "self_attn.q_proj",
+        "self_attn.k_proj",
+        "self_attn.v_proj",
+        "self_attn.o_proj",
+        "mlp.gate_proj",
+        "mlp.up_proj",
+        "mlp.down_proj"
+      ]
+    },
+    "threads_per_block": [
+      512
+    ],
+    "version": "0.2.0"
+  },
+  "eos_token_id": 151643,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 27648,
+  "max_position_embeddings": 131072,
+  "max_window_layers": 64,
+  "model_type": "qwen2",
+  "num_attention_heads": 40,
+  "num_hidden_layers": 64,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": 131072,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.51.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 152064
+}
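
Note on `dfloat11_config.pattern_dict`: the keys look like regular expressions over module names, and the values list submodules inside each match. The sketch below shows one plausible reading, not the package's actual loader. It assumes that a key is matched against the full module name, that an empty list means the matched module itself is compressed as one unit, and that each matched module is stored in a file named after it with dots replaced by underscores (which agrees with the file list of this commit). The functions `compressed_targets` and `shard_filename` are hypothetical.

```python
import re

# Copied from config.json (the JSON string "model.layers.\\d+" decodes to this).
pattern_dict = {
    "lm_head": [],
    "model.embed_tokens": [],
    r"model.layers.\d+": [
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
}


def compressed_targets(module_name: str, patterns: dict) -> list:
    """Return the weights assumed to be compressed for one module name."""
    for pattern, submodules in patterns.items():
        if re.fullmatch(pattern, module_name):
            # Assumption: an empty list means the module itself is the target.
            return [f"{module_name}.{s}" for s in submodules] or [module_name]
    return []  # no pattern matched: assumed to stay uncompressed


def shard_filename(module_name: str) -> str:
    """Assumed naming scheme, inferred from the files in this commit."""
    return module_name.replace(".", "_") + ".safetensors"


for name in ["lm_head", "model.layers.10", "model.norm"]:
    print(name, "->", compressed_targets(name, pattern_dict))
print(shard_filename("model.layers.10"))
```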

generation_config.json ADDED Viewed

	@@ -0,0 +1,9 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 151646,
+  "do_sample": true,
+  "eos_token_id": 151643,
+  "temperature": 0.6,
+  "top_p": 0.95,
+  "transformers_version": "4.51.3"
+}
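
Note on `temperature` and `top_p`: these defaults apply whenever `do_sample` is true. A minimal pure-Python sketch of what the two settings do to a next-token distribution follows. It is an illustration of temperature scaling plus nucleus (top-p) filtering in general, not the `transformers` implementation, and the function name `top_p_filter` and the toy logits are made up for the example.

```python
import math


def top_p_filter(logits, temperature=0.6, top_p=0.95):
    """Return {token_index: probability} after temperature and top-p filtering."""
    # Temperature < 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


toy_logits = [2.0, 1.0, 0.0, -5.0]  # hypothetical values
filtered = top_p_filter(toy_logits)
print(filtered)
```

Because sampling draws from this filtered distribution, two runs can produce different text unless the random seed is fixed.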

lm_head.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:262f0cdad054005a32309fe1b157f7b426a07ae903ebf7b736813de8d3ee2003
+size 1056885536
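
Note on the `.safetensors` hunks: each one adds a three-line Git LFS pointer (`version`, `oid`, `size`), not the weights themselves. A small sketch of reading such a pointer and checking a downloaded file against it is shown below. The functions `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this note, and the pointer text is the `lm_head.safetensors` entry above.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, info: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's byte size and SHA-256 digest against a pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # Stream in chunks so multi-gigabyte files need little memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == info["size"] and digest.hexdigest() == info["digest"]


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:262f0cdad054005a32309fe1b157f7b426a07ae903ebf7b736813de8d3ee2003
size 1056885536
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"], f"{info['size'] / 2**30:.2f} GiB")
```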

model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7372e0345e6413aeb3caed3266d35640bbb13159a20f9349bafd50c3c2ee1cb1
+size 10360

model_embed_tokens.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f3d754feea1cfb7eb41cbd1bcd18055f8313dc53c7fad0d36bea999143711ac1
+size 1073106128

model_layers_0.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28e556d9a487202fc83e1eaff3a5f88abbb36aebcade45c708c9d9f737ffdd5f
+size 662441978
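
Note on file sizes: the pointer sizes can be compared with the uncompressed BFloat16 size implied by `config.json`, as a rough check of the README's "approximately 30%" figure. The arithmetic below counts only the seven linear projections named in `pattern_dict` and assumes `head_dim = hidden_size / num_attention_heads`. The files also hold a safetensors header, small uncompressed tensors (biases, norms), and format metadata, so the ratios are estimates.

```python
# Values from config.json.
hidden_size = 5120
intermediate_size = 27648
num_attention_heads = 40
num_key_value_heads = 8
vocab_size = 152064

# Assumption: the usual head-dimension convention.
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim

layer_params = (
    hidden_size * hidden_size            # self_attn.q_proj
    + hidden_size * kv_dim               # self_attn.k_proj
    + hidden_size * kv_dim               # self_attn.v_proj
    + hidden_size * hidden_size          # self_attn.o_proj
    + 3 * hidden_size * intermediate_size  # mlp.gate_proj, up_proj, down_proj
)
embed_params = vocab_size * hidden_size  # same shape for lm_head

BYTES_PER_BF16 = 2

# Sizes in bytes, taken from the LFS pointers in this commit.
sizes = {
    "model_layers_0": (662441978, layer_params),
    "model_embed_tokens": (1073106128, embed_params),
    "lm_head": (1056885536, embed_params),
}

ratios = {}
for name, (compressed_bytes, params) in sizes.items():
    ratios[name] = compressed_bytes / (params * BYTES_PER_BF16)
    print(f"{name}: {ratios[name]:.3f} of the BFloat16 size")
```

All three ratios land a little under 0.7, which is consistent with the roughly 30% reduction stated in the README.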

model_layers_1.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c3a2af66877831e05bf8ee4c0f58d11c795051c34482e79704fff5fe964d132c
+size 725166597

model_layers_10.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3d4bed358e0658a03c157ba5ced51ecb132c101106a09f942177035c425c3c5e
+size 659979946

model_layers_11.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:228cd95e46bd52cfbfae70e04933ed9d7dc41a82888dc0f47199429121b386ba
+size 659918803

model_layers_12.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88ced64d84d3d6918ec19e6152e5e02e7658d913e6af1ec6907cf2524f0ee0bc
+size 659389766

model_layers_13.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f24235a9337b972a840f21defe923b903c5d4c2975c8085ef3cb9c36034e8ec7
+size 659693888

model_layers_14.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:63d8de556cbea7c1dbaa89ffde95581537182afba37d3038cdb7d2421f18f647
+size 659693904

model_layers_15.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b46adbc65842166dcdbd3ab5438a329640ff71ca7be7a8a492961e10113c50b6
+size 660158903

model_layers_16.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:00ea785666a9f7e9ff8e91dfe6ffdba58fea79f5cb7593905c908c171ecaf9d3
+size 659882224

model_layers_17.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dbc546e926d8063087dbaa6da4dba3745b0aa30d441050cde8059b7427596a76
+size 659742874

model_layers_18.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a78c4bce9f8293d5882b3761ac222cdd7ffb370ca66fbdf66988c80a0bab6d24
+size 659930747

model_layers_19.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:80d914f4114c2c54493cf69414996bcb071f6427f4937d80109b2ca810e79906
+size 660045727

model_layers_2.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f00ad02e65119dead518589a1e2ac382470d906980dd58305ff3f5be3b3d001e
+size 720484054

model_layers_20.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7b2f59693b89676df3d8e07556651cc93004c0ac81bd6233c94ba8fc1765e492
+size 660008541

model_layers_21.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1e9a00c2a3a819c691b61a3b6344b075ad2625ec0d8e932560a5e33f91bb21fc
+size 660025414

model_layers_22.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:27431a21b432735837d72d8ea0748b2e923d92e1ef2cfb343574a34230d92011
+size 660041687

model_layers_23.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f0a23c6d94ba63a1f33f6f5d461c50720c46bc5c4f1213736db9af57386b3a57
+size 660102695

model_layers_24.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6b52a4c49ae539bfa7309a56904b088b99a65717bed4662da1b21e5d71509f49
+size 659879208

model_layers_25.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4631ddf9a3a228ffde06d53798672e25694e19b68b64848cfba5795dbbe92e72
+size 659860281

model_layers_26.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd35a238487ac3a1c05f3afb84ef9abd5f4785103a259fbd8af28aad4420c679
+size 659937778

model_layers_27.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:01bf4b3141223368471cea3580de1fced077ac5331ba91b18328b8841e71ff37
+size 660095500

model_layers_28.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cb52dc9d522c9833b67c0874e9dbf6dc272a3bd808a05ec3307a29a34144152a
+size 660349657

model_layers_29.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:506f46a23fe2f36ffe27850a3bb927403b5c5c5d4a6d2f8978d5ed65cc66be20
+size 660238782

model_layers_3.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e2eedd33b98f5da3123d346e92dfc9e25ea72b4709419a6f4459b4643a416b32
+size 716178698

model_layers_30.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ba1b2e12aea9e57c8fa96260082383573acbcd283a7d00b73bdc1f2786f89990
+size 660154409

model_layers_31.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05718dfa286672dc4b2b71f7b040c26e5d33326ad4350683f2253d03fcf9ed2b
+size 660085711

model_layers_32.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ecb318a215c56dade952ebf4c513bf17d640508e8a9bf4bd6161041a895591b
+size 660131842

model_layers_33.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f5d38b55219e05bdd43a13eaf0c07321c4541554a7d0008b908773dadd02690
+size 659999114

model_layers_34.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a661ad479392473b13e4733fbe0f5389646ad7a41814da558f97e3e5747e69ff
+size 660178412

model_layers_35.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8f0d3a8d010aaaea9fc638af2939e807adc7a7b72be8fe7f70f51fe7ade0d0ec
+size 660284891

model_layers_36.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:33a0057ad0e1d13732a8758f50c30610c45673e0af91a505a9a4d616b98d35d1
+size 660306945

model_layers_37.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f5326682e243820a870ac9e28303b01c3fdfc5363c49d5bf334b50c5fd115967
+size 660254993

model_layers_38.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:876242355e5f0ce8cdedd1256f581b18811924c3f96fa776529755b05cc2ca3d
+size 660325297

model_layers_39.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:36de3203a71c654b83266fff9e24d6a455ad22c772d7bbce21821f7108d80251
+size 660467243

model_layers_4.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6bad41f176b4e3b871369a0a03bd58bc89d0bcb6bccb4c4fa2a06011df0709cf
+size 714368386

model_layers_40.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60bb8269125282554b424d5a8061254d423d032ebe64e4b32c3d75e389bfebff
+size 660206507

model_layers_41.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64704696897292329cf325060769a67c4d159cac3c98b8c186f513f5fb7ed8ab
+size 660317617

model_layers_42.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aa780b39fd43aac49e6e728c64a183394bb171a76e5466e658b551c9f267b92b
+size 660258726

model_layers_43.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1075e1d05911b07102d9b931046b21b67234560302a378cbe78310b2d4a7ade0
+size 660384825

model_layers_44.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fc1ff11c68214120aa1b62b261af4befa3a82aa86b2613f1b68be59afdf1500c
+size 660669818

model_layers_45.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5d442edc5540a500225a803d9c0d72ec2c88ef2c6dbb82153ac70cdd74e82b5b
+size 660444923

model_layers_46.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a8ef6022314f23eb41b92cf1c1904ff9f89af535e82b8b573e0fa2dae81186ea
+size 660008546

model_layers_47.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:640485599adf9fcda532463cc0beb916fcacd7e38b3e65bd5e365fd1fda76f61
+size 659742991