DFloat11 Compressed Model: ByteDance-Seed/BAGEL-7B-MoT
This model uses DFloat11 lossless compression. It's 32% smaller than the original BFloat16 model, yet produces bit-identical outputs and runs efficiently on GPUs.
Performance Comparison
Metric | BAGEL-7B-MoT (BFloat16) | BAGEL-7B-MoT (DFloat11) |
---|---|---|
Model Size | 29.21 GB | 19.89 GB |
Peak GPU Memory (1024x1024 image generation) | 30.07 GB | 21.76 GB |
Generation Time (on an A100 GPU) | 54 seconds | 58 seconds |
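
The relative changes implied by the table can be checked with a few lines of arithmetic. The sketch below only reuses the figures listed above; the bits-per-weight estimate additionally assumes the whole checkpoint consists of 16-bit weights, which is a simplification rather than something stated here.

```python
# Figures copied from the comparison table above.
bf16 = {"size_gb": 29.21, "peak_mem_gb": 30.07, "time_s": 54}
df11 = {"size_gb": 19.89, "peak_mem_gb": 21.76, "time_s": 58}


def relative_change(before, after):
    """Fractional change from before to after (negative means a reduction)."""
    return (after - before) / before


size_change = relative_change(bf16["size_gb"], df11["size_gb"])
mem_change = relative_change(bf16["peak_mem_gb"], df11["peak_mem_gb"])
time_change = relative_change(bf16["time_s"], df11["time_s"])

# Assumption: file size scales linearly with the average bits stored per weight.
bits_per_weight = 16 * df11["size_gb"] / bf16["size_gb"]

print(f"model size:      {size_change:+.1%}")   # -31.9%
print(f"peak GPU memory: {mem_change:+.1%}")    # -27.6%
print(f"generation time: {time_change:+.1%}")   # +7.4%
print(f"approx. bits per weight: {bits_per_weight:.1f}")  # 10.9
```

Under that assumption, the compressed model stores roughly 11 bits per weight on average, at the cost of about 7% longer generation time on the A100 run reported above.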
How It Works
We apply Huffman coding to the exponent bits of BFloat16 model weights, which are highly compressible. We leverage hardware-aware algorithmic designs to enable highly efficient, on-the-fly weight decompression directly on the GPU. Find out more in our research paper.
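
To illustrate why the exponent bits compress well, here is a minimal CPU-only sketch that builds a Huffman code over the 8 exponent bits of BFloat16 values and estimates the resulting bits per weight. It is not the DFloat11 implementation: the weights are synthetic Gaussian samples (an assumption standing in for real model weights), the helper names `bf16_exponent` and `huffman_code_lengths` are hypothetical, and the GPU decompression kernels are not modeled at all.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x):
    """Return the 8 exponent bits of x after truncating float32 to BFloat16."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    bf16_bits = bits32 >> 16        # BFloat16 keeps the top 16 bits of float32
    return (bf16_bits >> 7) & 0xFF  # layout: 1 sign | 8 exponent | 7 mantissa


def huffman_code_lengths(freqs):
    """Map each symbol to its Huffman code length in bits."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


# Assumption: synthetic weights drawn from a narrow Gaussian.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

freqs = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(freqs)
total = sum(freqs.values())
avg_exponent_bits = sum(freqs[s] * lengths[s] for s in freqs) / total

# Sign (1 bit) and mantissa (7 bits) are stored uncompressed in this sketch.
bits_per_weight = 1 + 7 + avg_exponent_bits

print(f"distinct exponent values: {len(freqs)} of 256 possible")
print(f"average exponent code length: {avg_exponent_bits:.2f} bits (vs. 8 raw)")
print(f"estimated bits per weight: {bits_per_weight:.2f} (vs. 16 for BFloat16)")
```

Because only a small fraction of the 256 possible exponent values occur in such data, and a few of them dominate, the variable-length code spends far fewer than 8 bits on the exponent. Since Huffman coding is lossless, decoding recovers the original bits exactly.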
How to Use
A complete usage guide is available in our GitHub repository (forked from the official Bagel repository): https://github.com/LeanModels/Bagel-DFloat11.
Learn More