Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms
Abstract
Tropical attention, a novel attention mechanism operating in the max-plus semiring, enhances Neural Algorithmic Reasoning models by improving out-of-distribution performance and robustness to adversarial attacks compared to softmax attention.
Dynamic programming (DP) algorithms for combinatorial optimization combine maximization (or minimization) with classical addition in their recursions, and the associated value functions correspond to convex polyhedra in the max-plus semiring. Existing Neural Algorithmic Reasoning models, however, rely on softmax-normalized dot-product attention, whose smooth exponential weighting blurs these sharp polyhedral structures and collapses when evaluated in out-of-distribution (OOD) settings. We introduce Tropical attention, a novel attention function that operates natively in the max-plus semiring of tropical geometry. We prove that Tropical attention can approximate tropical circuits of DP-type combinatorial algorithms, and we show empirically that Tropical transformers improve OOD performance on algorithmic reasoning tasks, in both length generalization and value generalization, surpassing softmax baselines while remaining stable under adversarial attacks. We also propose adversarial-attack generalization as a third axis for Neural Algorithmic Reasoning benchmarking. Our results demonstrate that Tropical attention restores the sharp, scale-invariant reasoning absent from softmax.
Community
It is known that softmax attention fades as sequences grow. That blur is why many attention mechanisms stumble on algorithmic and reasoning tasks. Well, we have a tropical solution from algebraic geometry.
1️⃣ We introduce Tropical Attention -- the first Neural Algorithmic Reasoner that operates in the tropical semiring, achieving SOTA OOD performance when executing several combinatorial algorithms.
2️⃣ In the Tropical (max, +) geometry, “addition” is max and “multiplication” is +.
Many algorithms already live here, carving exact polyhedral decision boundaries --> so why force them through exponential probabilities?
Let's ditch softmax and embrace tropical geometry.
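To make item 2️⃣ concrete, here is a minimal sketch (our own illustration, not code from the paper) of how a familiar DP recursion is built from the two tropical operations; the `knapsack` function and its item list are illustrative choices:

```python
# A minimal sketch of the max-plus semiring, where tropical
# "addition" is max and tropical "multiplication" is +.
def trop_add(a, b):
    return max(a, b)

def trop_mul(a, b):
    return a + b

# Example: the 0/1 knapsack recursion
#     V[w] = max(V[w], V[w - weight] + value)
# uses exactly one tropical addition (max) and one tropical
# multiplication (+) per update, so the whole DP is a max-plus computation.
def knapsack(items, capacity):
    V = [0] * (capacity + 1)  # V[w]: best value using total weight at most w
    for weight, value in items:
        for w in range(capacity, weight - 1, -1):
            V[w] = trop_add(V[w], trop_mul(V[w - weight], value))
    return V[capacity]
```

For instance, `knapsack([(2, 3), (3, 4), (4, 5)], 5)` returns 7, picking the first two items.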
3️⃣ Tropical Attention runs each head natively in max-plus. Result:
Strong OOD length generalization with sharp attention maps across several algorithmic tasks, including the notorious Quickselect algorithm (another settlement of the challenge identified by Michael Galkin).
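One plausible way to run a head "natively in max-plus" is to replace both the dot-product scores and the softmax-weighted average with tropical matrix products. This is a hedged sketch of the idea, not the paper's exact formulation:

```python
import numpy as np

def maxplus_matmul(A, B):
    # Tropical matrix product: (A ⊗ B)[i, j] = max_k (A[i, k] + B[k, j]).
    return (A[:, :, None] + B[None, :, :]).max(axis=1)

def tropical_attention(Q, K, V):
    # Scores are tropical inner products of queries and keys ...
    S = maxplus_matmul(Q, K.T)   # shape (n_queries, n_keys)
    # ... and values are aggregated by max-plus instead of a softmax
    # average, so each output coordinate is a hard max of sums.
    return maxplus_matmul(S, V)  # shape (n_queries, d_value)
```

Note that max and + both commute with adding a constant, so shifting all scores shifts the output by the same amount rather than reweighting it exponentially, which is one way to read the scale-invariance claim.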
4️⃣ We benchmarked on 11 canonical combinatorial tasks. Tropical attention beat vanilla & adaptive softmax attention on all three OOD axes: length, value, and adversarial-attack generalization.
5️⃣ We also show that each Tropical attention head can function as a tropical gate, so compositions of heads can simulate any max-plus circuit.
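A toy illustration of that claim (our own sketch, with a made-up 3-node graph): a tropical gate computes a max of sums, and composing such gates, here via tropical matrix powers, evaluates a max-plus circuit such as a longest-path DP:

```python
import numpy as np

NEG = float("-inf")  # tropical zero: the identity for max

def maxplus_matmul(A, B):
    # One layer of tropical gates: each entry is a max of sums.
    return (A[:, :, None] + B[None, :, :]).max(axis=1)

# Edge weights of a tiny 3-node DAG (made-up example); NEG = no edge.
W = np.array([
    [NEG, 2.0, NEG],
    [NEG, NEG, 3.0],
    [NEG, NEG, NEG],
])

# The tropical power W ⊗ W holds the best two-edge path lengths,
# i.e. the output of a depth-two max-plus circuit.
W2 = maxplus_matmul(W, W)
print(W2[0, 2])  # longest 0 -> 2 path: 2.0 + 3.0 = 5.0
```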
6️⃣ Our message: Better reasoning might come not from bigger models, but from choosing the right algebra/geometry.