LeanQuant committed 22 days ago

Commit

52fa776

verified

1 Parent(s): 872e9cb

Add files using upload-large-folder tool


This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +1 -0
README.md +68 -0
added_tokens.json +24 -0
config.json +48 -0
generation_config.json +14 -0
lm_head.safetensors +3 -0
merges.txt +0 -0
model.safetensors +3 -0
model_embed_tokens.safetensors +3 -0
model_layers_0.safetensors +3 -0
model_layers_1.safetensors +3 -0
model_layers_10.safetensors +3 -0
model_layers_11.safetensors +3 -0
model_layers_12.safetensors +3 -0
model_layers_13.safetensors +3 -0
model_layers_14.safetensors +3 -0
model_layers_15.safetensors +3 -0
model_layers_16.safetensors +3 -0
model_layers_17.safetensors +3 -0
model_layers_18.safetensors +3 -0
model_layers_19.safetensors +3 -0
model_layers_2.safetensors +3 -0
model_layers_20.safetensors +3 -0
model_layers_21.safetensors +3 -0
model_layers_22.safetensors +3 -0
model_layers_23.safetensors +3 -0
model_layers_24.safetensors +3 -0
model_layers_25.safetensors +3 -0
model_layers_26.safetensors +3 -0
model_layers_27.safetensors +3 -0
model_layers_28.safetensors +3 -0
model_layers_29.safetensors +3 -0
model_layers_3.safetensors +3 -0
model_layers_30.safetensors +3 -0
model_layers_31.safetensors +3 -0
model_layers_32.safetensors +3 -0
model_layers_33.safetensors +3 -0
model_layers_34.safetensors +3 -0
model_layers_35.safetensors +3 -0
model_layers_36.safetensors +3 -0
model_layers_37.safetensors +3 -0
model_layers_38.safetensors +3 -0
model_layers_39.safetensors +3 -0
model_layers_4.safetensors +3 -0
model_layers_40.safetensors +3 -0
model_layers_41.safetensors +3 -0
model_layers_42.safetensors +3 -0
model_layers_43.safetensors +3 -0
model_layers_44.safetensors +3 -0
model_layers_45.safetensors +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+base_model: Qwen/Qwen2.5-14B-Instruct
+base_model_relation: quantized
+tags:
+- dfloat11
+- df11
+- lossless compression
+- 70% size, 100% accuracy
+---
+## DFloat11 Compressed Model: `Qwen/Qwen2.5-14B-Instruct`
+This is a **losslessly compressed** version of [`Qwen/Qwen2.5-14B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+### 🔍 How It Works
+DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+Key benefits:
+* **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+* **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+* DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+* At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+* The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+### 🔧 How to Use
+1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+    ```bash
+    pip install -U "dfloat11[cuda12]"
+    # or, for CUDA 11:
+    # pip install -U "dfloat11[cuda11]"
+    ```
+2. To use the DFloat11 model, run the following example code in Python:
+    ```python
+    import torch
+    from dfloat11 import DFloat11Model
+    from transformers import AutoTokenizer
+    model_id = "DFloat11/Qwen2.5-14B-Instruct-DF11"
+    model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    tokenizer.pad_token = tokenizer.eos_token
+    prompt = "Question: What is a binary tree and its applications? Answer:"
+    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+    with torch.no_grad():
+        output = model.generate(
+            **inputs,
+            max_new_tokens=256,
+            do_sample=True,  # sampling is stochastic; use do_sample=False for repeatable output
+        )
+    print(tokenizer.batch_decode(output, skip_special_tokens=True))
+    ```
+### 📄 Learn More
+* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
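
The "How It Works" section of the README above attributes the savings to Huffman coding of the BFloat16 exponent bits. The sketch below is only an illustration of why that can pay off, not the DFloat11 implementation: it draws synthetic Gaussian weights (an assumption; real checkpoints will differ), truncates them to BFloat16, and estimates the average bits per weight when the sign and mantissa are stored raw and only the 8-bit exponent is Huffman-coded. All function names are hypothetical.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_bits(x: float) -> int:
    """Top 16 bits of the float32 encoding (a truncating conversion to BFloat16)."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def huffman_code_lengths(freqs):
    """Return the Huffman code length in bits for each symbol in freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, unique tie breaker, symbols in this subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def estimate_bits_per_weight(weights):
    """1 sign bit + 7 mantissa bits kept raw, plus the Huffman-coded exponent."""
    exponents = [(bf16_bits(w) >> 7) & 0xFF for w in weights]
    freqs = Counter(exponents)
    lengths = huffman_code_lengths(freqs)
    avg_exponent_bits = sum(freqs[s] * lengths[s] for s in freqs) / len(exponents)
    return 1 + 7 + avg_exponent_bits


random.seed(0)
synthetic = [random.gauss(0.0, 0.02) for _ in range(20_000)]  # assumed weight distribution
bits = estimate_bits_per_weight(synthetic)
print(f"estimated bits per weight: {bits:.2f} (raw BFloat16 uses 16)")
```

Because trained weights tend to cluster in a narrow range of magnitudes, only a few exponent values occur often, which is what lets an entropy code shorten them. The estimate ignores code tables and any block metadata a real format would need.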

added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

config.json ADDED

	@@ -0,0 +1,48 @@

+{
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dfloat11_config": {
+    "bytes_per_thread": 8,
+    "pattern_dict": {
+      "lm_head": [],
+      "model.embed_tokens": [],
+      "model.layers.\\d+": [
+        "self_attn.q_proj",
+        "self_attn.k_proj",
+        "self_attn.v_proj",
+        "self_attn.o_proj",
+        "mlp.gate_proj",
+        "mlp.up_proj",
+        "mlp.down_proj"
+      ]
+    },
+    "threads_per_block": [
+      512
+    ],
+    "version": "0.2.0"
+  },
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 13824,
+  "max_position_embeddings": 32768,
+  "max_window_layers": 70,
+  "model_type": "qwen2",
+  "num_attention_heads": 40,
+  "num_hidden_layers": 48,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": 131072,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.51.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 152064
+}
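
The `pattern_dict` in `dfloat11_config` above maps regular expressions over module names to lists of submodules. One plausible reading, sketched below, is that each key selects a group of modules and each list names the linear layers inside that group whose weights are compressed, with an empty list meaning the matched module's own weight. That interpretation, the use of `re.fullmatch`, and the helper name `compressed_submodules` are assumptions for illustration; the `dfloat11` package may resolve patterns differently.

```python
import re

# Copied from the "pattern_dict" entry in config.json (JSON "\\d" decodes to "\d").
PATTERN_DICT = {
    "lm_head": [],
    "model.embed_tokens": [],
    r"model.layers.\d+": [
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
}


def compressed_submodules(module_name):
    """Return the module names a pattern would cover, or None if nothing matches."""
    for pattern, children in PATTERN_DICT.items():
        if re.fullmatch(pattern, module_name):
            # Assumption: an empty list means the matched module itself is the target.
            return [f"{module_name}.{child}" for child in children] or [module_name]
    return None


for name in ("model.layers.12", "lm_head", "model.norm"):
    print(name, "->", compressed_submodules(name))
```

Under this reading, the three keys line up with the file naming in this commit: one `model_layers_N.safetensors` per decoder layer, plus `lm_head.safetensors` and `model_embed_tokens.safetensors`.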

generation_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "repetition_penalty": 1.05,
+  "temperature": 0.7,
+  "top_k": 20,
+  "top_p": 0.8,
+  "transformers_version": "4.51.3"
+}

lm_head.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05c23c266ab8ea4644d474c06e97827a752fb1f71278a9e9de9a82aa30ece442
+size 1056687515
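
Each `.safetensors` entry in this commit is a Git LFS pointer (a `version` line, a `sha256` object ID, and a byte `size`) rather than the tensor data itself. The following is a minimal sketch, using only the Python standard library, of checking a downloaded file against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and the local path is whatever the reader downloaded.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """True if the file at path has the size and hash recorded in the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text for lm_head.safetensors, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:05c23c266ab8ea4644d474c06e97827a752fb1f71278a9e9de9a82aa30ece442\n"
    "size 1056687515\n"
)
print(pointer["algo"], pointer["size"])
```

A call such as `matches_pointer("lm_head.safetensors", pointer)` would then confirm that a local copy is complete and uncorrupted.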

merges.txt ADDED

The diff for this file is too large to render.

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d196a7586fb627a3f91bd4ba6516d0a540caa7dbd59bf61dba46152ae03615e7
+size 10360

model_embed_tokens.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8353b9640344a9a3e469c469f63dd53c7a10fc48ff6d55e3145ce6819af9d616
+size 1071180380

model_layers_0.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:89e8e247741a45c2806ed365cf56033a1e069111d5c502a05677180ea7f8cb7b
+size 373256706
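
The README's "approximately 30%" figure can be sanity-checked against this pointer using the dimensions in `config.json`. The arithmetic below is a rough sketch: it counts only the seven projection matrices listed in `pattern_dict` for one decoder layer and assumes the usual `head_dim = hidden_size / num_attention_heads`. The stored file also holds small uncompressed tensors and format metadata, so the true per-matrix ratio is slightly lower than computed here.

```python
# Values from config.json in this commit.
hidden_size = 5120
intermediate_size = 13824
num_attention_heads = 40
num_key_value_heads = 8

head_dim = hidden_size // num_attention_heads   # assumed standard head size
kv_dim = num_key_value_heads * head_dim         # width of the k_proj / v_proj outputs

# Projection matrices named in dfloat11_config.pattern_dict for one decoder layer.
attention_params = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o, k, v
mlp_params = 3 * hidden_size * intermediate_size                              # gate, up, down
matrix_params = attention_params + mlp_params

bf16_bytes = 2 * matrix_params   # BFloat16 stores 2 bytes per weight
lfs_size = 373_256_706           # "size" from the model_layers_0.safetensors pointer

ratio = lfs_size / bf16_bytes
print(f"{matrix_params:,} weights, {bf16_bytes:,} BF16 bytes, stored/raw = {ratio:.3f}")
```

The stored layer comes out at roughly two thirds of its raw BFloat16 size, which is in line with the "70% size" tag in the README front matter.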

model_layers_1.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ff6b3837db0433d7fa31a31253f48d7d69584c2db3b76eeaeb251fa5e2f2c4c
+size 406484748

model_layers_10.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4c94c0f3e035aa6301ba460ed43d175c1f435607cd1ace3402b2997dd4bda952
+size 372391092

model_layers_11.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a8fc7799740d5b3b9da81b9ab89b0893fb28a504608da458d4e60efc46b75f81
+size 372482539

model_layers_12.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:33a2f05170f6084759501607f221776404e68d3dc3549a5af13a6dbd1feb5daf
+size 372257266

model_layers_13.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:10c6dd64dfbb2ef812b6c197a894d623099f503bb1c76e85b95b75eb19fed2b0
+size 372354324

model_layers_14.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a3f82ad92357de639b0394ad09b0395a6cd0edf8adf62587220a26530f3c207d
+size 372422278

model_layers_15.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:188b9e48683e000a4bdca3e06e3712ba5454cdedfae32133a60942bc7e6e7bae
+size 372477370

model_layers_16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f065eda0d8b4170ca6c90d5b74ac76d19344fd0e25b48e6efaae3b8a92b32b88
+size 372648557

model_layers_17.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5e92796bb910725ad4bf28f4f5852f21a9298486017c9380cecbfa044ebae9b8
+size 372665588

model_layers_18.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f18ac6251648b12869b59658507b1e7ba53e407833f133afd3595569a3f0dcec
+size 372733375

model_layers_19.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cb23cd851fe215da05db5929082df230202423abd09c3516847ee21752d7a619
+size 372792975

model_layers_2.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:32c1700f4af40e3d7df4ec6927211cdd2c4375add31f0548d5f8d380b863ba43
+size 408064978

model_layers_20.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e5f8080181b742e03948bd71870a958a0eae96944b7fb02956809e304af6777
+size 372860519

model_layers_21.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6315d4d7faf9d6a433b218484c3f4f23407dd25a285df923543bd9887fec1f2b
+size 372801120

model_layers_22.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:82471f9614e94faeee356b57defcfd84698c4fade71c8961b11359edcb86b275
+size 372777734

model_layers_23.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a66679017a5da11381d933a0c6ef6c02f34992b0cf90ef4d09e797983e74a810
+size 372808801

model_layers_24.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:85e3a588571bf0a83d83519b16627118b53e23648a6d9501155c5051bc5ecb6b
+size 372841217

model_layers_25.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd4a0d0c20f1418ea3137c4f140c548494fb400c14a9d0c8d8a57075c690d580
+size 372902112

model_layers_26.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e1cee3cc9da1014eac2871fd331653836a3e8d3bc57cdb6f563754c1dff4297c
+size 372861926

model_layers_27.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:30af40cb5bfb0840c8eac112feb4eba36d1891792788d9fbfa372f220cd395fa
+size 372893037

model_layers_28.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:527e740b05b83b0455d71db30b2a8daae3ccfd8ca1d42352cfa147925e038f5f
+size 372931412

model_layers_29.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:163d638f51f3a491cd9a5c4b59aeb7c7e665df0577168c5a7b0a33b47c3bdc1f
+size 372905593

model_layers_3.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:31e1582a1f4823770adf416b2f80a6429a9a39518c1b483fcadbcd8b03319c2a
+size 402083011

model_layers_30.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e08bef49ac321aab4a18bf3f6b739dee71b94c0831adf719943ae7931188c27
+size 372682758

model_layers_31.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:269e5cb7709b7de674f02100090fae9251cafdedc9e87862aa0c356cf17b6292
+size 372557390

model_layers_32.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eac675a0b93391a0e3db03f61630baac58934b8ed2ec5a98068c05e7a1ca192e
+size 372525663

model_layers_33.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5e56520584d271b2de80672470fe5ee0c7bc85d78df1abe157e4f067a9f21f2b
+size 372567622

model_layers_34.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:69f3beb29c17177f1e4fba35e40dd08b7432ef46325fcaee4bfb6c8d39d4298a
+size 372529440

model_layers_35.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fee8a7fdf7385abc6b8ddbc89f8f2a64c1fb31a235edfae088faaf13f3eccee5
+size 372514582

model_layers_36.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a878f163989c9d5055de12d9be4341536c99d2d0d8886f9411ab2ed2f56fff8
+size 372581465

model_layers_37.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:485583d61114b831ba8d3bbed11e46b1b32560fc13a7d4231ebf3e82f69606d2
+size 372347205

model_layers_38.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d94ed8520cf038fd85bb775e1c8cedca58b62e9d15ae2f0c995c37d17a77757
+size 372291387

model_layers_39.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bc2c02ca1bc18e7254992e60a34586fc72ae6336de83680875d03c19328e4297
+size 372303952

model_layers_4.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:93ecf2959477739310fe2d0dd0f4d96bf0470d25d2ac6b3a04b711df637a1cc5
+size 399836153

model_layers_40.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:26f72f5f5d292dd2430c0275913df1efb5adac9d85418a8f997ade5482dec935
+size 372319664

model_layers_41.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64f626e77db75ebc2fd212ff01fcbf98ce14d1aff47e5bcb45d17f16d2906ae1
+size 372297297

model_layers_42.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b4598111fd672c816e58b80895ae2ee8ec3e3f6e60343d45d343f75d1de4648c
+size 372344250

model_layers_43.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c3fb2b0aa4c8320b55da3d9305eb4556f84e077140e714549e3df653c91fd3d
+size 372583922

model_layers_44.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3007b96538a23a77fdba0abd873c69e0b87137d4d88a28ea004b419fc5fa40a2
+size 372679269

model_layers_45.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e8919f255f3556263604aabd101b0710e7662ad20516bc2395bbb43761a11843
+size 372710655