LeanQuant committed 15 days ago

Commit a3ef3c8 (verified) · 1 parent: e8973e6

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +1 -0
README.md +68 -0
config.json +59 -0
generation_config.json +12 -0
lm_head.safetensors +3 -0
model.safetensors +3 -0
model_embed_tokens.safetensors +3 -0
model_layers_0.safetensors +3 -0
model_layers_1.safetensors +3 -0
model_layers_10.safetensors +3 -0
model_layers_11.safetensors +3 -0
model_layers_12.safetensors +3 -0
model_layers_13.safetensors +3 -0
model_layers_14.safetensors +3 -0
model_layers_15.safetensors +3 -0
model_layers_16.safetensors +3 -0
model_layers_17.safetensors +3 -0
model_layers_18.safetensors +3 -0
model_layers_19.safetensors +3 -0
model_layers_2.safetensors +3 -0
model_layers_20.safetensors +3 -0
model_layers_21.safetensors +3 -0
model_layers_22.safetensors +3 -0
model_layers_23.safetensors +3 -0
model_layers_24.safetensors +3 -0
model_layers_25.safetensors +3 -0
model_layers_26.safetensors +3 -0
model_layers_27.safetensors +3 -0
model_layers_28.safetensors +3 -0
model_layers_29.safetensors +3 -0
model_layers_3.safetensors +3 -0
model_layers_30.safetensors +3 -0
model_layers_31.safetensors +3 -0
model_layers_32.safetensors +3 -0
model_layers_33.safetensors +3 -0
model_layers_34.safetensors +3 -0
model_layers_35.safetensors +3 -0
model_layers_36.safetensors +3 -0
model_layers_37.safetensors +3 -0
model_layers_38.safetensors +3 -0
model_layers_39.safetensors +3 -0
model_layers_4.safetensors +3 -0
model_layers_40.safetensors +3 -0
model_layers_41.safetensors +3 -0
model_layers_42.safetensors +3 -0
model_layers_43.safetensors +3 -0
model_layers_44.safetensors +3 -0
model_layers_45.safetensors +3 -0
model_layers_46.safetensors +3 -0
model_layers_47.safetensors +3 -0
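
The list stops at `model_layers_47.safetensors` even though `config.json` declares `num_hidden_layers: 80`. Layers 5 to 9 are missing as well. A minimal sketch, assuming the page sorts paths in plain ASCII order and then keeps the first 50, reproduces exactly this selection. Only files visible in this commit are included, and any other repository files are assumed to sort after the layer files.

```python
# Sketch: reproduce the 50-file cutoff of this diff view.
# ASSUMPTION: the page sorts paths in plain ASCII order, then truncates at 50.
NUM_HIDDEN_LAYERS = 80  # from config.json
VIEW_LIMIT = 50         # "This view is limited to 50 files"

other_files = [
    ".gitattributes",
    "README.md",
    "config.json",
    "generation_config.json",
    "lm_head.safetensors",
    "model.safetensors",
    "model_embed_tokens.safetensors",
]
layer_files = [f"model_layers_{i}.safetensors" for i in range(NUM_HIDDEN_LAYERS)]

# Python compares str by code point, so "model_layers_1." sorts before
# "model_layers_10" ("." < "0"), and "model_layers_19" before "model_layers_2".
shown = sorted(other_files + layer_files)[:VIEW_LIMIT]

shown_layers = sorted(
    int(name.removeprefix("model_layers_").removesuffix(".safetensors"))
    for name in shown
    if name.startswith("model_layers_")
)
hidden_layers = sorted(set(range(NUM_HIDDEN_LAYERS)) - set(shown_layers))

print(shown[-1])  # model_layers_47.safetensors
print(hidden_layers[:5], "...", hidden_layers[-1])
```

Under this assumption, the hidden layer files are `model_layers_5` to `model_layers_9` and `model_layers_48` to `model_layers_79`.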

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+base_model: meta-llama/Llama-3.3-70B-Instruct
+base_model_relation: quantized
+tags:
+- dfloat11
+- df11
+- lossless compression
+- 70% size, 100% accuracy
+---
+## DFloat11 Compressed Model: `meta-llama/Llama-3.3-70B-Instruct`
+This is a **losslessly compressed** version of [`meta-llama/Llama-3.3-70B-Instruct`](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+### 🔍 How It Works
+DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+Key benefits:
+* **No CPU decompression or host-device data transfer**: all operations are handled entirely on the GPU.
+* **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+* DFloat11 is **much faster than CPU-offloading approaches** because the compressed weights stay in GPU memory, enabling practical deployment in memory-constrained environments.
+* At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+* The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+### 🔧 How to Use
+1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+    ```bash
+    pip install -U 'dfloat11[cuda12]'
+    # or, for CUDA 11:
+    # pip install -U 'dfloat11[cuda11]'
+    ```
+2. To use the DFloat11 model, run the following example code in Python:
+    ```python
+    import torch
+    from dfloat11 import DFloat11Model
+    from transformers import AutoTokenizer
+    model_id = "DFloat11/Llama-3.3-70B-Instruct-DF11"
+    model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    tokenizer.pad_token = tokenizer.eos_token
+    prompt = "Question: What is a binary tree and its applications? Answer:"
+    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+    with torch.no_grad():
+        output = model.generate(
+            **inputs,
+            max_new_tokens=256,
+            do_sample=True,
+        )
+    print(tokenizer.batch_decode(output, skip_special_tokens=True))
+    ```
+### 📄 Learn More
+* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
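
The "How It Works" section attributes the savings to Huffman coding of the BFloat16 exponent bits. The following sketch illustrates only that idea and is not DFloat11's GPU-oriented encoding. It draws synthetic Gaussian weights (the standard deviation of 0.02 is an assumption borrowed from `initializer_range` in `config.json`), extracts each value's 8-bit BF16 exponent, builds a textbook Huffman code over the exponent frequencies, and estimates bits per weight when the sign bit and 7-bit mantissa are kept verbatim. The helpers `bf16_exponent` and `huffman_code_lengths` are hypothetical names.

```python
# Sketch (not the DFloat11 implementation): Huffman-code the BF16 exponent,
# keep sign + mantissa verbatim, and estimate the resulting bits per weight.
# Synthetic Gaussian weights stand in for real model weights (ASSUMPTION).
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """8-bit exponent field of x after truncating float32 to BF16."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bf16 = bits >> 16              # BF16 keeps the top 16 bits of float32
    return (bf16 >> 7) & 0xFF      # drop 7 mantissa bits, mask off the sign


def huffman_code_lengths(freqs: Counter) -> dict:
    """Return the Huffman code length in bits for each symbol."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (subtree frequency, unique tie-breaker, {symbol: depth}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]  # synthetic
exp_counts = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(exp_counts)

avg_exp_bits = sum(exp_counts[s] * lengths[s] for s in exp_counts) / len(weights)
bits_per_weight = 1 + 7 + avg_exp_bits  # sign + mantissa + Huffman exponent
print(f"distinct exponents: {len(exp_counts)}")
print(f"average exponent code length: {avg_exp_bits:.2f} bits (vs 8)")
print(f"estimated size vs BF16: {bits_per_weight / 16:.1%}")
```

Because the weight magnitudes cluster in a narrow range, a few exponent values dominate and get short codes, so the average exponent length falls well below 8 bits. The sketch leaves the sign and mantissa uncompressed, matching the README's description of coding only the exponent.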

config.json ADDED

	@@ -0,0 +1,59 @@

+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "dfloat11_config": {
+    "bytes_per_thread": 8,
+    "pattern_dict": {
+      "lm_head": [],
+      "model\\.embed_tokens": [],
+      "model\\.layers\\.\\d+": [
+        "self_attn.q_proj",
+        "self_attn.k_proj",
+        "self_attn.v_proj",
+        "self_attn.o_proj",
+        "mlp.gate_proj",
+        "mlp.up_proj",
+        "mlp.down_proj"
+      ]
+    },
+    "threads_per_block": [
+      512
+    ],
+    "version": "0.2.0"
+  },
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 8192,
+  "initializer_range": 0.02,
+  "intermediate_size": 28672,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 64,
+  "num_hidden_layers": 80,
+  "num_key_value_heads": 8,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 8.0,
+    "high_freq_factor": 4.0,
+    "low_freq_factor": 1.0,
+    "original_max_position_embeddings": 8192,
+    "rope_type": "llama3"
+  },
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.51.3",
+  "use_cache": true,
+  "vocab_size": 128256
+}
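
`dfloat11_config.pattern_dict` maps module-name patterns to the submodules whose weights are stored compressed. The doubled backslashes are JSON string escaping, so after parsing the keys are ordinary regular expressions such as `model\.layers\.\d+`. A minimal sketch expands the dict into concrete weight names. It assumes that each key is matched in full against Hugging Face module names and that an empty list means the matched module's own weight. This mirrors the config's shape, not the `dfloat11` package's actual loader, and `compressed_weights` is a hypothetical helper.

```python
# Sketch: expand dfloat11_config.pattern_dict into concrete weight names.
# ASSUMPTIONS: keys are regexes matched in full against Hugging Face module
# names; an empty list means "the matched module's own weight".
import json
import re

# Copied from config.json; the JSON parser turns "\\." into the regex "\.".
config_fragment = r"""
{
  "pattern_dict": {
    "lm_head": [],
    "model\\.embed_tokens": [],
    "model\\.layers\\.\\d+": [
      "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
      "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"
    ]
  }
}
"""
pattern_dict = json.loads(config_fragment)["pattern_dict"]
num_hidden_layers = 80  # from config.json

# A subset of the module names in a Hugging Face LlamaForCausalLM.
module_names = ["model.embed_tokens", "lm_head", "model.norm"]
module_names += [f"model.layers.{i}" for i in range(num_hidden_layers)]


def compressed_weights(names, patterns):
    """Return '<module>[.<submodule>].weight' names selected by patterns."""
    selected = []
    for name in names:
        for pattern, submodules in patterns.items():
            if re.fullmatch(pattern, name):
                for sub in submodules or [""]:  # [] -> the module itself
                    parts = [name, sub, "weight"]
                    selected.append(".".join(p for p in parts if p))
    return selected


weights = compressed_weights(module_names, pattern_dict)
print(len(weights))  # 562 = embed_tokens + lm_head + 80 layers x 7
print(weights[2])    # model.layers.0.self_attn.q_proj.weight
```

Under these assumptions, `model.norm` and the per-layer norm weights match no pattern and would stay uncompressed.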

generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "bos_token_id": 128000,
+  "do_sample": true,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.51.3"
+}
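
`generation_config.json` enables sampling with `temperature: 0.6` and `top_p: 0.9`. The following sketch illustrates what those two values do: temperature scaling followed by nucleus (top-p) filtering. It is a plain reference implementation, not the code path `transformers` uses, and the toy logits are made up.

```python
# Sketch: temperature scaling + nucleus (top-p) sampling with the defaults
# from generation_config.json. Illustrative only, not transformers' code.
import math
import random


def nucleus_distribution(logits, temperature=0.6, top_p=0.9):
    """Return {token_id: prob} over the smallest set of most likely tokens
    whose cumulative probability reaches top_p, renormalized to sum to 1."""
    scaled = [l / temperature for l in logits]  # T < 1 sharpens the softmax
    m = max(scaled)                             # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {i: p / norm for i, p in kept}


def sample(logits, rng, temperature=0.6, top_p=0.9):
    """Draw one token id from the filtered distribution."""
    dist = nucleus_distribution(logits, temperature, top_p)
    ids, weights = zip(*dist.items())
    return rng.choices(ids, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # toy logits for a 5-token vocabulary
dist = nucleus_distribution(logits)
print({i: round(p, 3) for i, p in dist.items()})
print(sample(logits, random.Random(0)))
```

With these toy logits, only tokens 0 and 1 survive the 0.9 cutoff, so the long tail can never be sampled.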

lm_head.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:46c090b4d335c7b0933c67226989e2c73f7ad9bc0eb013a9bc75c755f2d0d995
+size 1423358924
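
Every `.safetensors` entry in this commit is a three-line Git LFS pointer, which is why each one shows `+3 -0`. The tensor bytes themselves live in LFS storage. A small parser follows the key/value layout of the Git LFS v1 pointer format and turns the pointer into the object's hash and its real size in bytes. `parse_lfs_pointer` is a hypothetical helper.

```python
# Sketch: parse a Git LFS v1 pointer file ("key value" per line) into a dict.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:46c090b4d335c7b0933c67226989e2c73f7ad9bc0eb013a9bc75c755f2d0d995
size 1423358924
"""
info = parse_lfs_pointer(pointer)
print(info["algorithm"], f"{info['size'] / 1e9:.2f} GB")
```

The `size` field is the byte count of the actual file, about 1.42 GB for `lm_head.safetensors`. The `oid` is the SHA-256 of that file's contents.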

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:039ff0691e1282c8fdd29c9c944af5dc5eec942ec8e621eaf8052ab8def5cba0
+size 16504
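
At 16,504 bytes, `model.safetensors` is far too small to hold any of the model's weight matrices, which are split across the per-module files. One way to see what it does contain is to read just its JSON header. The sketch follows the safetensors file layout: an 8-byte little-endian header length, then a JSON header mapping tensor names to `dtype`, `shape` and `data_offsets`. It uses only the standard library, and the demo writes a made-up one-tensor file rather than reading the real one.

```python
# Sketch: list tensors in a .safetensors file with the standard library only.
# Layout: 8-byte little-endian unsigned header length N, then N bytes of JSON
# {name: {"dtype", "shape", "data_offsets"}, optional "__metadata__"}, then
# the raw tensor bytes.
import json
import os
import struct
import tempfile


def read_safetensors_header(path: str) -> dict:
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}}."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # free-form string metadata, not a tensor
    return header


# Demo on a synthetic file; "demo.weight" is a made-up tensor name.
demo_header = {"demo.weight": {"dtype": "BF16", "shape": [2], "data_offsets": [0, 4]}}
blob = json.dumps(demo_header).encode("utf-8")
with tempfile.NamedTemporaryFile(suffix=".safetensors", delete=False) as tmp:
    tmp.write(struct.pack("<Q", len(blob)) + blob + bytes(4))  # two BF16 zeros
try:
    demo = read_safetensors_header(tmp.name)
    for name, meta in demo.items():
        print(name, meta["dtype"], meta["shape"])
finally:
    os.remove(tmp.name)

# For the real file: read_safetensors_header("model.safetensors")
```

With the `safetensors` package installed, `safetensors.safe_open(path, framework="pt")`, used as a context manager, exposes the same tensor names through `.keys()`.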

model_embed_tokens.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:069ca889f29017156744afc0df278ba632f97fb3e0dbc2a3cc6c4b8464574df3
+size 1425659798
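
The two vocabulary-sized matrices offer an easy spot check of the README's "approximately 30%" saving. This worked calculation compares the LFS `size` fields with the BF16 size implied by `config.json`. It assumes that `lm_head.safetensors` and `model_embed_tokens.safetensors` each hold a single `vocab_size x hidden_size` matrix and that their header and metadata are negligible.

```python
# Worked arithmetic: LFS sizes vs. the BF16 size implied by config.json.
# ASSUMPTION: each file holds one vocab_size x hidden_size matrix plus a
# negligible header.
vocab_size = 128256   # config.json
hidden_size = 8192    # config.json
BYTES_PER_BF16 = 2

# BF16 size of one vocab_size x hidden_size matrix.
bf16_bytes = vocab_size * hidden_size * BYTES_PER_BF16  # 2,101,346,304 bytes

# "size" fields of the LFS pointers in this commit.
lfs_sizes = {
    "lm_head.safetensors": 1_423_358_924,
    "model_embed_tokens.safetensors": 1_425_659_798,
}
for name, size in lfs_sizes.items():
    print(f"{name}: {size / bf16_bytes:.1%} of BF16")
```

Both files come out at about 68% of their BF16 size, consistent with the roughly 30% reduction stated in the README.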

model_layers_0.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:254bb931c0f8296245d5ac799701d118b480c232534df8239e01038caefc9d8c
+size 1184264874
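
The same check works per decoder layer. `pattern_dict` lists seven projection matrices per layer, and their shapes follow from `hidden_size`, `intermediate_size`, `num_attention_heads`, `num_key_value_heads` and `head_dim` in `config.json`. This worked calculation assumes that each layer file is dominated by those seven matrices and ignores small tensors such as RMSNorm weights.

```python
# Worked arithmetic: BF16 size of one decoder layer's seven projections,
# compared with the LFS size of model_layers_0.safetensors.
# ASSUMPTION: the layer file is dominated by these matrices; norms ignored.
hidden = 8192          # hidden_size
intermediate = 28672   # intermediate_size
heads, kv_heads, head_dim = 64, 8, 128
vocab, num_layers = 128256, 80

q_o = 2 * hidden * heads * head_dim      # q_proj + o_proj: 8192 x 8192 each
k_v = 2 * hidden * kv_heads * head_dim   # k_proj + v_proj: 8192 x 1024 (GQA)
mlp = 3 * hidden * intermediate          # gate_proj, up_proj, down_proj
params_per_layer = q_o + k_v + mlp       # 855,638,016

bf16_bytes = 2 * params_per_layer        # 1,711,276,032 bytes
lfs_size = 1_184_264_874                 # model_layers_0.safetensors

# Whole model: all layers + embed_tokens + lm_head (norm weights ignored).
total_params = num_layers * params_per_layer + 2 * vocab * hidden

print(f"{params_per_layer:,} params per layer, {bf16_bytes / 1e9:.2f} GB in BF16")
print(f"layer 0 on disk: {lfs_size / bf16_bytes:.1%} of BF16")
print(f"total: {total_params / 1e9:.2f}B params")
```

Layer 0 comes out at about 69% of its BF16 size. The other layer files in this commit are slightly smaller (about 1.156 GB, or roughly 68%). The total of about 70.55 billion parameters, before norm weights, matches the 70B base model.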

model_layers_1.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2bc645366b4864ca8b228308de8d2aae95d865904e3dfb8252b8fb36e4f39050
+size 1163448837

model_layers_10.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c77924d3fbffccf26b43cc70fe722f3b8984d8a99b847a6c2bcd8714960f1de
+size 1157115996

model_layers_11.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c642f884ee2de5059463ab469b41dd17a0ef5ae1b13c1df8fb7bb338202dd6d5
+size 1156385598

model_layers_12.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:90d970699de0f638ea7e3ae30e6a9b256a6c419826cf892913fbd498ed15fbce
+size 1156692364

model_layers_13.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:daf0cf0b0e674f8642af2e2761b70b498898c219d667ccc9d4197b584b2c1ee2
+size 1157021634

model_layers_14.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:12b1a65db878733f86d7d85c4d5b960851c0b007b6061e7832362ca94496a235
+size 1157204005

model_layers_15.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e42022d45a163771db7f24df14820d2b96cd4edaab5c10a13240e523d0e9ac34
+size 1156818968

model_layers_16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2ba876cf32dffe02e5feae9e638f57861963fae51b2327e97b1055389c0c050b
+size 1156731338

model_layers_17.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2f23bbcdab675a751a15106f0928e5a117fef164ced1ba27ce0c04de70e1ea36
+size 1156996149

model_layers_18.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:06cd2835997cf30ccda91e3754cba35d84f2d0efb304d3a0ede8763809a5dab7
+size 1156929280

model_layers_19.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ad6d1b8ec0dfeb9a98ba1aac46c5131a6910b56b81ab6fda48554a4032c9c8a
+size 1156512080

model_layers_2.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:38c93632acc82d7d36a584026fe93d5e12c07f3d43562f341af9f706343c1562
+size 1159907766

model_layers_20.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3d85f8dfe834ed429f1178655a290808506e4368800f3701766be171a130cd00
+size 1156025976

model_layers_21.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:03f899813adbf0296ec832e101628aba715edd49917ed30275a0b827b17f5924
+size 1156434583

model_layers_22.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2d77dd02dcc396de88c7577d69bc9687070e4395a587e4728f3e336fddbdacbc
+size 1156120613

model_layers_23.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5a22813bc56092c8e81f9aa6854f5cb1f93d55e482992eea092911abc7523eab
+size 1156266787

model_layers_24.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0a460ab01d1aab00281780807c67097a0583d3a054ae491c7d5fd45b7675cbf3
+size 1156178627

model_layers_25.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9149bf70b7c8ae9d2fdb5c2374e314408d3559e589704571a51411d92bcd0a98
+size 1156171670

model_layers_26.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:85a37aaadd2bfffc7feeb6e6b848a398746bd39cb5a07e1335881e53fa526516
+size 1156040296

model_layers_27.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb8463b250b2b542e00062d122adc0aa401c1c05ee95da25750636b15ef5c563
+size 1156821884

model_layers_28.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4b6f469f831b5a5cb9c58fc027319fe25081ff064a332b2b839e1dd035d315ba
+size 1156773429

model_layers_29.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fbaa0a2e27b8f5f854d7859c7e8094fbca7c0473ede6e469e9599f0b308bc21b
+size 1156749948

model_layers_3.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5e68c1a74db63fed8b15ec0cef7a0fcd3154e8fe392008cdcaec3a7fa3db9329
+size 1158316999

model_layers_30.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e28c8e33904fb2ddd66337c915507379f4401d7c7593aa2436ec1beff142121
+size 1157050612

model_layers_31.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:292d9a3d685837edd537d120527791a22ca4fc1b9b441e0b8ea3901e66e0b183
+size 1157366887

model_layers_32.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c96ed9271413dd61295934b27aeaf9f44468f1ea5f1ddeb9f11736d309f35de3
+size 1156775505

model_layers_33.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8a343bb812026cac39a7847732b669fc476589649c704453301c4e21c3cae714
+size 1157299042

model_layers_34.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7ae383541b6404a9225fc0318d88a14de6f5e09d1c8b759ee39de20fb4e3aec0
+size 1157281698

model_layers_35.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:27fe6cffdb6fe925ad870ee82eb7e13e4e9f7ece8c4492f07d63fdcd3fdddc7f
+size 1157327678

model_layers_36.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c190ff9afc3c8dc24ab7781917a0d6b301eb2312ff1bdb83ac54d3ff14f077ee
+size 1156911167

model_layers_37.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fa995c7793f4ed3d2c51b730f70a62ecd8c55cc979af0534eb6892755e85c402
+size 1156904251

model_layers_38.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:76403c150d83e375d2987b48893bc29103604bcb69a0002de69c1a4fdc6dd256
+size 1156730330

model_layers_39.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66d296649a5e6f597d4083b22b1b7e82eaeeb558fc58d2cc534db6d9014d2ff5
+size 1156541498

model_layers_4.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dff52847a2e232c85ba190e9f636ad677a63da93e39daef00e7afcd294e281e0
+size 1157766210

model_layers_40.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2e60d9c84bf265c072e37b63c7a115bec877c1206498e9e7ca1f1a03a0ded91a
+size 1156031005

model_layers_41.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:50790ab3717f94afc4054caa3c203bef42beb46beb74922280b3a3c4bbbfba86
+size 1156344960

model_layers_42.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d78e121ab4b1f6431528a87c79c83c439ce0bdc0bd0a0ed13117e1e4a5935b1f
+size 1155967004

model_layers_43.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e465c66e1f6fa59ec8340f5e52948d95152ff244f2fa8b03214fe5f3b7af8cd
+size 1155883852

model_layers_44.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3650e6ee5e2dbbf0fdc90249155eef63124237b8519c857b59055785afa359dd
+size 1156120094

model_layers_45.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:093477186dcbf6ebfe7c2785659a23e9de6d8a49e5e2b602cc3e36d3dc73383c
+size 1156086722

model_layers_46.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f1d0a6997c8476eba0c2b31b349874591d6a2367c43e4acaf3c211eca7c67f6
+size 1155270096

model_layers_47.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7c3481c3728541bb43760c2e145da5bbc7d4a4f56d545bf5a2ffce100f5bf255
+size 1155692958