---
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
- bd3lm
- diffusion
- autoregressive
- language-modeling
---

# Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (ICLR 2025 Oral)

By [Marianne Arriola](https://m-arriola.com/), [Aaron Gokaslan](https://skylion007.github.io), [Justin T Chiu](https://justinchiu.netlify.app), [Zhihan Yang](https://zhihanyang2022.github.io/), [Zhixuan Qi](https://zhixuanqi.com/), [Jiaqi Han](https://hanjq17.github.io/), [Subham Sekhar Sahoo](https://s-sahoo.github.io), [Volodymyr Kuleshov](https://www.cs.cornell.edu/~kuleshov/)

[![Paper](https://img.shields.io/badge/Paper_📃-green)](https://arxiv.org/abs/2503.09573) [![GitHub](https://img.shields.io/badge/GitHub_🧑‍💻-blue)](https://github.com/kuleshov-group/bd3lms) [![Blog](https://img.shields.io/badge/Blog_📝%20%20-8A2BE2)](https://m-arriola.com/bd3lms/) [![HuggingFace](https://img.shields.io/badge/HuggingFace_🤗%20-BD3LMs%20-orange)](https://huggingface.co/collections/kuleshov-group/bd3-lms-67be95f81b96b15fec50d53f)

We introduce ***BD3-LMs***, a family of **B**lock **D**iscrete **D**enoising **D**iffusion **L**anguage **M**odels that achieve state-of-the-art likelihoods among diffusion models and enable generation of arbitrary-length sequences. BD3-LMs combine the strengths of autoregressive and diffusion language models by decomposing a token sequence into blocks and performing discrete diffusion within each block. By tuning the block size, we interpolate between autoregressive and diffusion models, which introduces a trade-off between quality and sample efficiency. We propose a recipe for building effective BD3-LMs that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules that minimize this variance.

## Model Description

BD3-LMs are Block Discrete Denoising Diffusion Language Models.
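Conceptually, generation proceeds autoregressively over blocks, while the tokens *within* each block are filled in over several discrete-diffusion (iterative unmasking) steps conditioned on all previous blocks. The toy sketch below illustrates only this control flow; the `MASK` sentinel, the random unmasking schedule, and the `denoise_step` stand-in are hypothetical placeholders, not the released implementation, where a trained model predicts token distributions.

```python
import random

MASK = -1  # hypothetical sentinel id for a masked (noised) token

def generate_blockwise(seq_len, block_size, denoise_step, num_steps=4, seed=0):
    """Toy block-diffusion sampler: blocks are produced left to right
    (autoregressively); tokens inside a block start fully masked and are
    revealed over several denoising steps, mimicking discrete diffusion."""
    rng = random.Random(seed)
    tokens = []  # all previously generated blocks (the conditioning context)
    for start in range(0, seq_len, block_size):
        block = [MASK] * min(block_size, seq_len - start)
        for _ in range(num_steps):
            masked = [i for i, t in enumerate(block) if t == MASK]
            if not masked:
                break
            # unmask a random subset, conditioning on all previous blocks
            for i in rng.sample(masked, max(1, len(masked) // 2)):
                block[i] = denoise_step(tokens, block, i)
        # any tokens still masked are resolved in a final pass
        for i, t in enumerate(block):
            if t == MASK:
                block[i] = denoise_step(tokens, block, i)
        tokens.extend(block)
    return tokens

# stand-in "denoiser": a real BD3-LM would sample from a predicted distribution
toy_denoiser = lambda context, block, i: len(context) + i
sample = generate_blockwise(seq_len=10, block_size=4, denoise_step=toy_denoiser)
print(sample)  # 10 tokens; no MASK entries remain
```

Setting `block_size=1` recovers token-by-token autoregressive decoding, while `block_size=seq_len` recovers a single full-sequence diffusion pass, which is the interpolation the block size controls.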
They combine the strengths of autoregressive and diffusion language models by decomposing a token sequence into blocks and performing discrete diffusion within each block.

## How to use

See our [GitHub README](https://github.com/kuleshov-group/bd3lms), where we provide sample scripts for training, likelihood evaluation, and generation.

## Citation

```
@inproceedings{arriola2025block,
  title={Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models},
  author={Marianne Arriola and Aaron Gokaslan and Justin T Chiu and Zhihan Yang and Zhixuan Qi and Jiaqi Han and Subham Sekhar Sahoo and Volodymyr Kuleshov},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://arxiv.org/abs/2503.09573}
}
```