---
base_model:
- black-forest-labs/FLUX.1-Canny-dev
base_model_relation: quantized
pipeline_tag: text-to-image
tags:
- dfloat11
- df11
- lossless compression
- 70% size, 100% accuracy
---

## DFloat11 Compressed Model: `black-forest-labs/FLUX.1-Canny-dev`

This is a **losslessly compressed** version of [`black-forest-labs/FLUX.1-Canny-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to those of the original BFloat16 model, while GPU memory consumption is reduced by approximately **30%**.

### 🔍 How It Works

DFloat11 compresses model weights using **Huffman coding** of the BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize the memory footprint. (For intuition, an illustrative sketch of the exponent coding is included at the end of this card.)

Key benefits:

* **No CPU decompression or host-device data transfer**: all operations are handled entirely on the GPU.
* DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
* The compression is **fully lossless**, guaranteeing that the model's outputs are **bit-for-bit identical** to those of the original model.

### 🔧 How to Use

1. Install or upgrade the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and an existing PyTorch installation)*:

    ```bash
    pip install -U dfloat11[cuda12]
    # or if you have CUDA version 11:
    # pip install -U dfloat11[cuda11]
    ```

2. Install or upgrade the `diffusers` and `controlnet_aux` packages:

    ```bash
    pip install -U diffusers controlnet_aux
    ```

3. To use the DFloat11 model, run the following example code in Python:

    ```python
    import torch
    from controlnet_aux import CannyDetector
    from diffusers import FluxControlPipeline
    from diffusers.utils import load_image

    from dfloat11 import DFloat11Model

    # Load the original pipeline in BFloat16, then swap the transformer weights
    # for the losslessly compressed DFloat11 version.
    pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()

    DFloat11Model.from_pretrained('DFloat11/FLUX.1-Canny-dev-DF11', device='cpu', bfloat16_model=pipe.transformer)

    prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."

    # Prepare the Canny-edge control image.
    control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
    processor = CannyDetector()
    control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

    image = pipe(
        prompt=prompt,
        control_image=control_image,
        height=1024,
        width=1024,
        num_inference_steps=50,
        guidance_scale=30.0,
    ).images[0]
    image.save("output.png")
    ```

    An optional snippet for measuring peak GPU memory during generation is provided at the end of this card.

### 📄 Learn More

* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
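
### 🧮 Appendix: Why Exponent Coding Helps (Illustrative Sketch)

The sketch below is **purely illustrative and is not the DFloat11 implementation** (DFloat11 performs the decoding with custom CUDA kernels during inference). It only demonstrates the idea behind the format: in trained BFloat16 weights the 8 exponent bits are highly skewed, so a Huffman code represents them in far fewer than 8 bits on average, while the sign and mantissa bits are kept as-is. The random-normal tensor stands in for real model weights, so the exact numbers will differ from the published ~30% figure.

```python
import heapq
from collections import Counter

import torch

def huffman_code_lengths(freqs):
    """Return the Huffman code length (in bits) for each symbol in `freqs`."""
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, lens1 = heapq.heappop(heap)
        f2, _, lens2 = heapq.heappop(heap)
        # Merging two subtrees increases the depth of every symbol in them by one.
        merged = {s: depth + 1 for s, depth in {**lens1, **lens2}.items()}
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Stand-in for trained weights (real checkpoints are typically even more skewed).
weights = torch.randn(1_000_000).to(torch.bfloat16)

# Reinterpret the BFloat16 bit pattern: 1 sign bit | 8 exponent bits | 7 mantissa bits.
raw = weights.view(torch.int16).to(torch.int32) & 0xFFFF
exponents = ((raw >> 7) & 0xFF).tolist()

freqs = Counter(exponents)
lengths = huffman_code_lengths(freqs)

total = len(exponents)
avg_exponent_bits = sum(freqs[s] * lengths[s] for s in freqs) / total
bits_per_weight = 1 + avg_exponent_bits + 7  # sign + coded exponent + raw mantissa

print(f"average coded exponent length: {avg_exponent_bits:.2f} bits (vs. 8 uncompressed)")
print(f"~{bits_per_weight:.1f} bits per weight vs. 16 for BFloat16 "
      f"({bits_per_weight / 16:.0%} of the original size)")
```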
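
### 📊 Optional: Checking Peak GPU Memory

To verify the memory saving on your own hardware, you can record peak GPU memory around the generation call from step 3. The helper below is a **hypothetical convenience wrapper, not part of the DFloat11 or diffusers APIs**; it only uses standard PyTorch CUDA memory statistics.

```python
import torch

def generate_with_peak_memory(pipe, **pipe_kwargs):
    """Run the pipeline once and report the peak GPU memory allocated by PyTorch."""
    torch.cuda.reset_peak_memory_stats()
    image = pipe(**pipe_kwargs).images[0]
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak GPU memory during generation: {peak_gib:.2f} GiB")
    return image

# Example (reusing `pipe`, `prompt`, and `control_image` from step 3):
# image = generate_with_peak_memory(
#     pipe,
#     prompt=prompt,
#     control_image=control_image,
#     height=1024,
#     width=1024,
#     num_inference_steps=50,
#     guidance_scale=30.0,
# )
```

Note that `torch.cuda.max_memory_allocated()` only tracks memory allocated through PyTorch's caching allocator, so the reported number is a lower bound on total GPU memory use.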