Abstract
REOrder discovers task-optimal patch orderings for long-sequence transformers, significantly improving accuracy over fixed orderings such as the default row-major scan.
Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically involves flattening images using a fixed row-major (raster-scan) order. While full self-attention is permutation-equivariant, modern long-sequence transformers increasingly rely on architectural approximations that break this invariance and introduce sensitivity to patch ordering. We show that patch order significantly affects model performance in such settings, with simple alternatives like column-major or Hilbert curves yielding notable accuracy shifts. Motivated by this, we propose REOrder, a two-stage framework for discovering task-optimal patch orderings. First, we derive an information-theoretic prior by evaluating the compressibility of various patch sequences. Then, we optimize a Plackett-Luce policy over permutations using REINFORCE, which enables efficient learning in a combinatorial permutation space. REOrder improves top-1 accuracy over row-major ordering by up to 3.01% on ImageNet-1K and 13.35% on Functional Map of the World.
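The second stage can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: `PatchOrderPolicy`, the toy reward, and all hyperparameters are assumptions. It samples permutations from a Plackett-Luce distribution via the Gumbel-argsort trick and updates per-patch scores with a REINFORCE gradient estimate.

```python
import torch

class PatchOrderPolicy(torch.nn.Module):
    """Hypothetical Plackett-Luce policy over patch positions (illustrative only)."""

    def __init__(self, num_patches: int):
        super().__init__()
        # One learnable score per patch; higher score -> earlier in the sequence.
        self.scores = torch.nn.Parameter(torch.zeros(num_patches))

    def sample(self) -> torch.Tensor:
        # Adding Gumbel noise and argsorting draws a permutation from the
        # Plackett-Luce distribution defined by the scores.
        gumbel = -torch.log(-torch.log(torch.rand_like(self.scores)))
        return torch.argsort(self.scores + gumbel, descending=True)

    def log_prob(self, perm: torch.Tensor) -> torch.Tensor:
        # log P(perm) = sum_k [ s_{perm_k} - logsumexp(s_{perm_k}, ..., s_{perm_n}) ]
        s = self.scores[perm]
        tail_lse = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
        return (s - tail_lse).sum()


policy = PatchOrderPolicy(num_patches=196)          # e.g. a 14x14 ViT patch grid
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
target = torch.arange(196, dtype=torch.float)

for step in range(200):
    perm = policy.sample()
    # In REOrder the reward would come from the downstream task (e.g. accuracy
    # of the long-sequence transformer fed patches in this order); here it is a
    # toy stand-in that prefers orderings close to row-major.
    reward = -((perm.float() - target) ** 2).mean() / 1e4
    loss = -reward.detach() * policy.log_prob(perm)  # REINFORCE estimator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a real setup the reward would be a task metric measured with the sampled ordering, and one natural way to use the paper's information-theoretic prior would be to initialize or regularize the per-patch scores; both choices here are assumptions for illustration.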
Community
This paper introduces a method for finding optimal orderings of patches in a linearized sequence for long-sequence vision transformers.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling (2025)
- LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers (2025)
- Distilling semantically aware orders for autoregressive image generation (2025)
- A 2D Semantic-Aware Position Encoding for Vision Transformers (2025)
- MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention (2025)
- Stronger ViTs With Octic Equivariance (2025)
- Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement (2025)