Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Abstract
Unsupervised reconstruction networks based on self-attention transformers have achieved state-of-the-art performance for multi-class (unified) anomaly detection with a single model. However, these self-attention reconstruction models operate primarily on target features, and their high consistency with the surrounding context can cause both normal and anomalous features to be reconstructed perfectly, so anomalies go undetected. Additionally, these models often produce inaccurate anomaly segmentation because reconstruction is performed in a low-spatial-resolution latent space. To let reconstruction models retain high efficiency while improving their generalization for unified anomaly detection, we propose a simple yet effective method that reconstructs normal features and restores anomalous features with just One Normal Image Prompt (OneNIP). In contrast to previous work, OneNIP is the first to reconstruct or restore anomalies with just one normal image prompt, effectively boosting unified anomaly detection performance. Furthermore, we propose a supervised refiner that regresses reconstruction errors using both real normal and synthesized anomalous images, which significantly improves pixel-level anomaly segmentation. OneNIP outperforms previous methods on three industrial anomaly detection benchmarks: MVTec, BTAD, and VisA. The code and pre-trained models are available at https://github.com/gaobb/OneNIP.
Community
OneNIP consists of three components: Unsupervised Reconstruction, Unsupervised Restoration, and a Supervised Refiner. Unsupervised Reconstruction and Unsupervised Restoration share the same encoder-decoder architecture and weights. The Supervised Refiner is implemented with two transposed-convolution blocks, each followed by a 1×1 convolution layer.
- Unsupervised Reconstruction reconstructs normal tokens;
- Unsupervised Restoration restores pseudo anomaly tokens to the corresponding normal tokens;
- Supervised Refiner refines reconstruction/restoration errors to achieve more accurate anomaly segmentation.
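The refiner described above can be sketched in PyTorch as two stride-2 transposed-convolution blocks, each followed by a 1×1 convolution, that upsample a low-resolution reconstruction-error map into a higher-resolution anomaly segmentation map. This is a minimal illustration, not the released implementation: the channel widths, activation choice, and single-channel output are assumptions.

```python
import torch
import torch.nn as nn

class SupervisedRefiner(nn.Module):
    """Hypothetical sketch of the Supervised Refiner: two transposed-conv
    blocks, each followed by a 1x1 convolution (channel sizes assumed)."""

    def __init__(self, in_ch: int = 256, mid_ch: int = 64):
        super().__init__()
        self.block1 = nn.Sequential(
            # stride-2 transposed conv doubles spatial resolution
            nn.ConvTranspose2d(in_ch, mid_ch, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=1),  # 1x1 conv after the block
        )
        self.block2 = nn.Sequential(
            nn.ConvTranspose2d(mid_ch, mid_ch, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, 1, kernel_size=1),  # per-pixel anomaly score
        )

    def forward(self, err: torch.Tensor) -> torch.Tensor:
        # err: (B, C, H, W) reconstruction/restoration-error features
        # output: (B, 1, 4H, 4W) refined anomaly map
        return self.block2(self.block1(err))

# Each stride-2 block doubles resolution, so a 14x14 latent error map
# becomes a 56x56 segmentation map.
refiner = SupervisedRefiner()
out = refiner(torch.randn(2, 256, 14, 14))
```

Because the refiner is trained with supervision from real normal and synthesized anomalous images, its output can be regressed directly against pixel-level pseudo-anomaly masks.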