---
base_model:
- black-forest-labs/FLUX.1-Depth-dev
base_model_relation: quantized
pipeline_tag: text-to-image
tags:
- dfloat11
- df11
- lossless compression
- 70% size, 100% accuracy
---

## DFloat11 Compressed Model: `black-forest-labs/FLUX.1-Depth-dev`

This is a **losslessly compressed** version of [`black-forest-labs/FLUX.1-Depth-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to those of the original BFloat16 model, while GPU memory consumption is reduced by approximately **30%**.

### 🔍 How It Works

DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.

Key benefits:

* **No CPU decompression or host-device data transfer**: all operations are handled entirely on the GPU.
* DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
* The compression is **fully lossless**, guaranteeing that the model's outputs are **bit-for-bit identical** to those of the original model.

### 🔧 How to Use

1. Install or upgrade the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:

   ```bash
   pip install -U dfloat11[cuda12]
   # or if you have CUDA version 11:
   # pip install -U dfloat11[cuda11]
   ```

2. Install or upgrade the `diffusers` and `image_gen_aux` packages:

   ```bash
   pip install -U diffusers
   pip install git+https://github.com/asomoza/image_gen_aux.git
   ```

3. To use the DFloat11 model, run the following example code in Python:

   ```python
   import torch
   from diffusers import FluxControlPipeline
   from diffusers.utils import load_image
   from image_gen_aux import DepthPreprocessor
   from dfloat11 import DFloat11Model

   # Load the original BFloat16 pipeline; its transformer is swapped for DFloat11 weights below.
   pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Depth-dev", torch_dtype=torch.bfloat16)
   # Keep idle components on the CPU and move each one to the GPU only while it runs.
   # Without this call (or an explicit move to CUDA), the pipeline would stay on the CPU.
   pipe.enable_model_cpu_offload()

   # Replace the transformer's weights in place with the compressed DFloat11 weights.
   DFloat11Model.from_pretrained('DFloat11/FLUX.1-Depth-dev-DF11', device='cpu', bfloat16_model=pipe.transformer)

   prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
   control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

   # Estimate a depth map from the control image and convert it to RGB for the pipeline.
   processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
   control_image = processor(control_image)[0].convert("RGB")

   image = pipe(
       prompt=prompt,
       control_image=control_image,
       height=1024,
       width=1024,
       num_inference_steps=30,
       guidance_scale=10.0,
       generator=torch.Generator().manual_seed(42),  # fixed seed for reproducible output
   ).images[0]
   image.save("output.png")
   ```

### 📄 Learn More

* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
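
As a rough illustration of the idea described under "How It Works", the sketch below Huffman-codes the 8 exponent bits of BFloat16 values and leaves the 1 sign bit and 7 mantissa bits untouched. It is not the DFloat11 implementation and has nothing to do with its GPU kernel. The weights are synthetic Gaussian samples rather than real model weights, the float32-to-BFloat16 conversion is simplified to truncation, and the helper names (`bf16_bits`, `huffman_code`, `decode`) are hypothetical. It only shows why a skewed exponent distribution can bring the average size per weight well below 16 bits without losing information.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_bits(x: float) -> int:
    """Return the upper 16 bits of the float32 encoding of x.

    Simplification: this truncates; real BFloat16 conversion usually rounds.
    """
    (u32,) = struct.unpack(">I", struct.pack(">f", x))
    return u32 >> 16


def huffman_code(freqs: Counter) -> dict:
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    if len(freqs) == 1:  # a single symbol still needs one bit
        return {next(iter(freqs)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


def decode(bitstream: str, code: dict) -> list:
    """Decode a bitstring produced with `code` back into symbols."""
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for bit in bitstream:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


# Synthetic stand-in for model weights (an assumption, not real FLUX weights).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]
words = [bf16_bits(w) for w in weights]
exponents = [(w >> 7) & 0xFF for w in words]  # bits 14..7 of a BFloat16 word

code = huffman_code(Counter(exponents))
encoded = "".join(code[e] for e in exponents)

avg_exp_bits = len(encoded) / len(exponents)
bits_per_weight = 1 + 7 + avg_exp_bits  # sign + mantissa stored as-is
ratio = bits_per_weight / 16
roundtrip_ok = decode(encoded, code) == exponents

print(f"average exponent code length: {avg_exp_bits:.2f} bits")
print(f"estimated size relative to BFloat16: {ratio:.1%}")
print(f"exponents recovered exactly: {roundtrip_ok}")
```

Because the exponents decode back exactly and the sign and mantissa bits are never modified, every original 16-bit word can be rebuilt, which is the sense in which this kind of compression is lossless. The printed ratio depends on the synthetic distribution chosen here and should not be read as a measurement of this model.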