	---
	license: apple-amlr
	---

	# FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

	[`Website`](https://flextok.epfl.ch) \| [`arXiv`](https://arxiv.org/abs/2502.13967) \| [`GitHub`](https://github.com/apple/ml-flextok) \| [`🤗 Demo`](https://huggingface.co/spaces/EPFL-VILAB/FlexTok) \| [`BibTeX`](#citation)

	Official implementation and pre-trained models for: <br>
	[FlexTok: Resampling Images into 1D Token Sequences of Flexible Length](https://arxiv.org/abs/2502.13967), arXiv 2025 <br>
	[Roman Bachmann](https://roman-bachmann.github.io/), [Jesse Allardice](https://github.com/JesseAllardice), [David Mizrahi](https://dmizrahi.com/), [Enrico Fini](https://scholar.google.com/citations?user=OQMtSKIAAAAJ), [Oğuzhan Fatih Kar](https://ofkar.github.io/), [Elmira Amirloo](https://elamirloo.github.io/), [Alaaeldin El-Nouby](https://aelnouby.github.io/), [Amir Zamir](https://vilab.epfl.ch/zamir/), [Afshin Dehghan](https://scholar.google.com/citations?user=wcX-UW4AAAAJ)


	## Installation
	For installation instructions, see https://github.com/apple/ml-flextok.


	## Usage

	To load the `FlexTok d18-d18 ImageNet-1k` model directly from the Hugging Face Hub, call:
	```python
	from flextok.flextok_wrapper import FlexTokFromHub
	model = FlexTokFromHub.from_pretrained('EPFL-VILAB/flextok_d18_d18_in1k').eval()
	```

	The model can also be loaded by downloading the `model.safetensors` checkpoint in this repository manually and loading it using our helper functions:
	```python
	from hydra.utils import instantiate
	from flextok.utils.checkpoint import load_safetensors

	ckpt, config = load_safetensors('/path/to/model.safetensors')
	model = instantiate(config).eval()
	model.load_state_dict(ckpt)
	```

	After loading a FlexTok model, image batches can be encoded using:
	```python
	from flextok.utils.demo import imgs_from_urls
	# Load example images of shape (B, 3, 256, 256), normalized to [-1,1]
	imgs = imgs_from_urls(urls=['https://storage.googleapis.com/flextok_site/nb_demo_images/0.png'])

	# tokens_list is a list of [1, 256] discrete token sequences
	tokens_list = model.tokenize(imgs)
	```

	The list of token sequences can be truncated in a nested fashion:
	```python
	k_keep = 64 # For example, only keep the first 64 out of 256 tokens
	tokens_list = [t[:,:k_keep] for t in tokens_list]
	```
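
The nested property means that a shorter truncation is always a prefix of a longer one. The sketch below illustrates this with plain nested lists standing in for the `[1, 256]` token tensors, so it runs without FlexTok installed. The helper name `truncate_tokens` and the chosen lengths are assumptions made for illustration, not part of the FlexTok API.

```python
# Stand-in for the output of model.tokenize: one [1, 256] sequence per image.
# Plain nested lists replace torch tensors here (an assumption for illustration).
tokens_list = [[list(range(256))], [list(range(1000, 1256))]]


def truncate_tokens(tokens_list, k_keep):
    """Keep the first k_keep tokens of every [1, L] sequence (hypothetical helper)."""
    return [[row[:k_keep] for row in seq] for seq in tokens_list]


# Illustrative lengths only; any l <= 256 can be used.
lengths = [1, 4, 16, 64, 256]
truncated = {k: truncate_tokens(tokens_list, k) for k in lengths}

# Nested property: each shorter sequence is a prefix of every longer one.
for short, longer in zip(lengths, lengths[1:]):
    for seq_s, seq_l in zip(truncated[short], truncated[longer]):
        assert seq_l[0][:short] == seq_s[0]

# Number of tokens kept per image at each truncation length.
print({k: len(v[0][0]) for k, v in truncated.items()})
```

With real tensors, the same slicing is what `t[:,:k_keep]` does in the snippet above.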

	To decode the tokens with FlexTok's rectified flow decoder, call:
	```python
	# tokens_list is a list of [1, l] discrete token sequences, with l <= 256
	# reconst is a [B, 3, 256, 256] tensor, normalized to [-1,1]
	reconst = model.detokenize(
	    tokens_list,
	    timesteps=20, # Number of denoising steps
	    guidance_scale=7.5, # Classifier-free guidance scale
	    perform_norm_guidance=True, # See https://arxiv.org/abs/2410.02416
	)
	```
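
Since `reconst` is normalized to [-1,1], it usually has to be mapped back to 8-bit pixel values before it can be saved or displayed. The following is a minimal sketch of that step, assuming a linear mapping from [-1,1] to [0,255] (the README only states the range, not the exact mapping). The helper `to_uint8` is hypothetical, and a tiny nested list stands in for the real `[B, 3, 256, 256]` tensor.

```python
def to_uint8(value):
    """Map a float in [-1, 1] to an int in [0, 255], clamping out-of-range values."""
    clamped = max(-1.0, min(1.0, value))
    return int(round((clamped + 1.0) * 127.5))


# Stand-in for a decoded batch with shape [1, 3, 1, 2] (assumption for illustration).
reconst = [[[[-1.0, 0.0]], [[0.5, 1.0]], [[1.2, -1.3]]]]

# Apply the mapping elementwise while keeping the [B, C, H, W] nesting.
pixels = [
    [[[to_uint8(v) for v in row] for row in channel] for channel in img]
    for img in reconst
]
print(pixels)
```

Clamping first guards against values that fall slightly outside the nominal range. With tensors, the same arithmetic would be applied elementwise.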


	## Citation

	If you find this repository helpful, please consider citing our work:
	```
	@article{flextok,
	  title={{FlexTok}: Resampling Images into 1D Token Sequences of Flexible Length},
	  author={Roman Bachmann and Jesse Allardice and David Mizrahi and Enrico Fini and O{\u{g}}uzhan Fatih Kar and Elmira Amirloo and Alaaeldin El-Nouby and Amir Zamir and Afshin Dehghan},
	  journal={arXiv 2025},
	  year={2025},
	}
	```

	## License

	The model weights in this repository are released under the Apple Model License for Research.