Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing
Abstract
RankNovo is a deep reranking framework that enhances de novo peptide sequencing using multiple models and axial attention, achieving superior performance and generalization.
De novo peptide sequencing is a critical task in proteomics. However, the performance of current deep learning-based methods is limited by the inherent complexity of mass spectrometry data and the heterogeneous distribution of noise signals, leading to data-specific biases. We present RankNovo, the first deep reranking framework that enhances de novo peptide sequencing by leveraging the complementary strengths of multiple sequencing models. RankNovo employs a list-wise reranking approach, modeling candidate peptides as multiple sequence alignments and utilizing axial attention to extract informative features across candidates. Additionally, we introduce two new metrics, PMD (Peptide Mass Deviation) and RMD (Residual Mass Deviation), which offer fine-grained supervision by quantifying mass differences between peptides at both the sequence and residue levels. Extensive experiments demonstrate that RankNovo not only surpasses the base models used to generate training candidates for reranking pre-training, but also sets a new state-of-the-art benchmark. Moreover, RankNovo exhibits strong zero-shot generalization to unseen models whose generations were not exposed during training, highlighting its robustness and potential as a universal reranking framework for peptide sequencing. Our work presents a novel reranking strategy that fundamentally challenges existing single-model paradigms and advances the frontier of accurate de novo sequencing. Our source code is provided on GitHub.
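To make the idea of mass-based supervision concrete, here is a minimal Python sketch of how the mass difference between two candidate peptides could be measured at the sequence level and at the residue level. It is an illustration only, not the paper's definition of PMD or RMD: the function names (`peptide_mass`, `sequence_mass_gap`, `residue_mass_gaps`) are hypothetical, and the residue-level comparison assumes two equal-length peptides compared position by position, which is a simplification. The residue masses are standard monoisotopic values for the 20 unmodified amino acids.

```python
# Standard monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the sum of residues


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def sequence_mass_gap(a, b):
    """Hypothetical sequence-level measure: absolute total mass difference."""
    return abs(peptide_mass(a) - peptide_mass(b))


def residue_mass_gaps(a, b):
    """Hypothetical residue-level measure: per-position mass differences.

    Assumes equal-length peptides; this is a simplification for illustration.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length peptides")
    return [abs(RESIDUE_MASS[x] - RESIDUE_MASS[y]) for x, y in zip(a, b)]


if __name__ == "__main__":
    # I and L are isobaric, so these two candidates have the same total mass.
    print(sequence_mass_gap("PEPTIDE", "PEPTLDE"))
    # Swapping G and A leaves the total mass unchanged but not the residues.
    print(sequence_mass_gap("GA", "AG"), residue_mass_gaps("GA", "AG"))
```

The example shows why both levels can matter: two candidates may agree in total mass while differing at individual positions, so a sequence-level signal alone cannot tell them apart.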
Community
Reranking biological sequences with an MSA Transformer to select the single best candidate
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment (2025)
- Bidirectional Hierarchical Protein Multi-Modal Representation Learning (2025)
- ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings (2025)
- Prot42: a Novel Family of Protein Language Models for Target-aware Protein Binder Generation (2025)
- Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations (2025)
- Foundation model for mass spectrometry proteomics (2025)
- iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification (2025)