DFloat11/FLUX.1-Fill-dev-DF11

DFloat11 Compressed Model: `black-forest-labs/FLUX.1-Fill-dev`

This is a losslessly compressed version of black-forest-labs/FLUX.1-Fill-dev using our custom DFloat11 format. The outputs of this compressed model are bit-for-bit identical to the original BFloat16 model, while reducing GPU memory consumption by approximately 30%.
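A rough back-of-the-envelope check of the ~30% figure follows. This is our own arithmetic, not a measured result. BFloat16 stores 16 bits per weight, so a 30% reduction leaves about 11 bits per weight on average. That matches the "11" in DFloat11 and the "70% Size" in the paper title listed under Learn More. The size estimate assumes the roughly 12 billion parameters Black Forest Labs cites for its FLUX.1 models, and it covers only the transformer weights (not the text encoders or VAE).

$$
\begin{aligned}
\text{bits per weight} &\approx 16 \times (1 - 0.30) = 11.2 \\
\text{BF16 transformer size} &\approx 12 \times 10^{9} \times 2\ \text{bytes} = 24\ \text{GB} \\
\text{DF11 transformer size} &\approx 24\ \text{GB} \times 0.70 \approx 16.8\ \text{GB}
\end{aligned}
$$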

🔍 How It Works

DFloat11 compresses model weights using Huffman coding of BFloat16 exponent bits, combined with hardware-aware algorithmic designs that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are decompressed just before matrix multiplications, then immediately discarded after use to minimize memory footprint.
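The core idea can be illustrated on the CPU with a toy Python sketch. This is our own illustration, not the DFloat11 implementation. The sketch:

- splits each BFloat16 bit pattern into its 8 exponent bits, which are Huffman-coded;
- stores the sign bit and 7 mantissa bits as-is;
- decodes the result and checks that the round trip is bit-exact.

Several simplifications apply. Random Gaussian values stand in for real model weights. BF16 conversion is done by truncating float32, whereas real conversions usually round. The sequential bit-string decoder bears no resemblance to the parallel CUDA kernel. All helper names (`to_bf16_bits`, `huffman_code_lengths`, `canonical_codes`, `compress`, `decompress`) are hypothetical.

```python
import heapq
import random
import struct
from collections import Counter


def to_bf16_bits(x):
    """16-bit BFloat16 pattern of x, by truncating float32 (simplification:
    real BF16 conversion usually rounds to nearest even)."""
    f32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return f32 >> 16


def huffman_code_lengths(freqs):
    """Huffman code length in bits for each symbol, given symbol frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def canonical_codes(lengths):
    """Assign canonical prefix-free codes (as bit strings) from code lengths."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


def compress(patterns):
    """Huffman-code the exponents; keep sign + mantissa as one raw byte per weight."""
    exponents = [(p >> 7) & 0xFF for p in patterns]  # bits 14..7
    codes = canonical_codes(huffman_code_lengths(Counter(exponents)))
    exp_stream = "".join(codes[e] for e in exponents)
    # Sign (bit 15) moves to bit 7; mantissa (bits 6..0) stays in place.
    sign_mantissa = bytes(((p >> 8) & 0x80) | (p & 0x7F) for p in patterns)
    return exp_stream, sign_mantissa, codes


def decompress(exp_stream, sign_mantissa, codes):
    """Decode the exponent bit stream and reassemble the original BF16 patterns."""
    decode = {c: s for s, c in codes.items()}
    out, buf, i = [], "", 0
    for bit in exp_stream:
        buf += bit
        if buf in decode:  # prefix-free: the first match is the symbol
            sm = sign_mantissa[i]
            out.append(((sm & 0x80) << 8) | (decode[buf] << 7) | (sm & 0x7F))
            buf, i = "", i + 1
    return out


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(20_000)]  # stand-in weights
patterns = [to_bf16_bits(w) for w in weights]

exp_stream, sign_mantissa, codes = compress(patterns)
assert decompress(exp_stream, sign_mantissa, codes) == patterns  # lossless

# Average storage per weight, ignoring the (small) cost of storing the code table.
bits_per_weight = (len(exp_stream) + 8 * len(sign_mantissa)) / len(patterns)
print(f"{bits_per_weight:.2f} bits/weight vs. 16 for BF16")
```

Only the exponent field is re-encoded, and Huffman codes are prefix-free, so decoding reproduces every original bit pattern exactly. The saving comes from the skewed distribution of exponents in typical weight values.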

Key benefits:

- No CPU decompression or host-device data transfer: all operations are handled entirely on the GPU.
- DFloat11 is much faster than CPU-offloading approaches, enabling practical deployment in memory-constrained environments.
- The compression is fully lossless, guaranteeing that the model’s outputs are bit-for-bit identical to those of the original model.

🔧 How to Use

Install or upgrade the DFloat11 pip package (installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed):
```bash
pip install -U "dfloat11[cuda12]"
# or, if you have CUDA 11:
# pip install -U "dfloat11[cuda11]"
```
Install or upgrade the diffusers package:
```bash
pip install -U diffusers
```
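You may be unsure which DFloat11 extra to pick. One heuristic, which is our assumption and not from the DFloat11 documentation, is to match the CUDA major version that your PyTorch build reports in `torch.version.cuda`. The hypothetical helper below maps that string to one of the two extras listed above.

```python
def dfloat11_extra(cuda_version):
    """Map a CUDA version string such as "12.4" to a DFloat11 pip extra.

    Assumption: the extra should match the CUDA major version of your
    PyTorch build. Only the two extras listed in the install step are handled.
    """
    if not cuda_version:
        # torch.version.cuda is None on CPU-only PyTorch builds.
        raise ValueError("no CUDA version reported; a CUDA build of PyTorch is required")
    major = int(cuda_version.split(".")[0])
    if major == 12:
        return "cuda12"
    if major == 11:
        return "cuda11"
    raise ValueError(f"no DFloat11 extra listed for CUDA {cuda_version}")
```

For example, after `import torch`, calling `dfloat11_extra(torch.version.cuda)` returns `"cuda12"` for a PyTorch build compiled against CUDA 12.x.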

To use the DFloat11 model, run the following example code in Python:

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image
from dfloat11 import DFloat11Model

# Input image and inpainting mask (white mask pixels mark the region to fill)
image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")

# Load the original pipeline in BFloat16 and offload idle components to the CPU
pipe = FluxFillPipeline.from_pretrained("black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

# Load the losslessly compressed DFloat11 weights into the pipeline's transformer
DFloat11Model.from_pretrained("DFloat11/FLUX.1-Fill-dev-DF11", device="cpu", bfloat16_model=pipe.transformer)

result = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducible output
).images[0]
result.save("flux-fill-dev.png")
```

📄 Learn More

- Paper: 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
- GitHub: https://github.com/LeanModels/DFloat11
- HuggingFace: https://huggingface.co/DFloat11
