---
license: other
license_name: fair-noncommercial-research-license
license_link: https://huggingface.co/facebook/blt/blob/main/LICENSE
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  I accept the terms and conditions: checkbox
  geo: ip_location
language:
- en
tags:
- facebook
- meta-pytorch
- blt
---
|
|
|
# Byte Latent Transformer (BLT)
|
|
|
This repository contains the model weights for our paper, "Byte Latent Transformer: Patches Scale Better Than Tokens".
|
|
|
- [Paper Link](https://dl.fbaipublicfiles.com/blt/BLT__Patches_Scale_Better_Than_Tokens.pdf)
- [HF Paper Link](https://huggingface.co/papers/2412.09871)
|
|
|
## Abstract
|
|
|
We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale, with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented dynamically based on the entropy of the next byte, allocating more compute and model capacity where there is more data complexity. The BLT architecture includes new attention mechanisms to maximize the information flow between byte and patch hidden representations and a new type of byte-sequence memory. We present the first scaling study of byte-level models up to 8B parameters and 8T training bytes, showing for the first time that we can train a model end-to-end at scale from bytes with no tokenization or other preprocessing. Scaling trends reveal training and inference efficiency benefits from dynamically selecting very long patches on average, along with qualitative improvements in reasoning and long-tail generalization from modeling byte sequences.
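
As a rough illustration of the patching idea described above, the sketch below segments a byte stream by starting a new patch whenever a small entropy model is "surprised" by the next byte. This is a minimal, hypothetical sketch: `next_byte_entropy` and `THRESHOLD` are stand-ins for the paper's small byte-level LM and its tuned boundary criterion, not the actual implementation in the repository.

```python
# Minimal sketch of entropy-based patch segmentation (illustrative only).
# The real implementation lives in https://github.com/facebookresearch/blt;
# `next_byte_entropy` and `THRESHOLD` are hypothetical stand-ins.
from typing import Callable, List

THRESHOLD = 2.0  # hypothetical entropy threshold; the paper tunes this criterion


def segment_into_patches(
    data: bytes,
    next_byte_entropy: Callable[[bytes], float],
) -> List[bytes]:
    """Start a new patch wherever the next-byte entropy exceeds the threshold."""
    patches: List[bytes] = []
    current = bytearray()
    for i, b in enumerate(data):
        # High next-byte entropy marks a hard-to-predict region, so a patch
        # boundary is placed there and the latent transformer spends more
        # compute where the data is more complex.
        if current and next_byte_entropy(data[:i]) > THRESHOLD:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches
```

Predictable spans (e.g. the tail of a common word) yield long patches and fewer latent transformer steps, while unpredictable spans yield short patches and more compute.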
|
|
|
To run the model, see the README in the code repository: https://github.com/facebookresearch/blt
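
As a starting point, the weights can be fetched from the Hub with `huggingface_hub` (a sketch; the actual inference entry points and configs are documented in the repository README linked above):

```python
# Fetch the BLT-1B checkpoint from the Hugging Face Hub.
# Running inference requires the facebookresearch/blt code; this step only
# downloads the weight files to a local cache directory.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="facebook/blt-1b")
print(f"BLT-1B weights downloaded to: {local_dir}")
```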
|
|
|
## Links
|
|
|
- Code: https://github.com/facebookresearch/blt
- BLT 1B Weights: https://huggingface.co/facebook/blt-1b
- BLT 7B Weights: https://huggingface.co/facebook/blt-7b
- BLT Weight Collection: https://huggingface.co/collections/facebook/blt-6801263d4ac1704702a192a6
|
|
|