Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Abstract
The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on the WMT and our datasets show that our method matches or surpasses a range of baselines in translation quality while achieving 2.4–6.5× inference speedups and a 75% reduction in the memory footprint of the KV cache. It also demonstrates strong generalization across a variety of translation-related tasks.
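The sketch below illustrates the architecture the abstract describes: a pre-trained decoder-only LLM reused as the source-side encoder, with a standard Transformer NMT decoder cross-attending to its hidden states. It is a minimal illustration, not the authors' implementation; the choice of `gpt2` as the LLM, the frozen-encoder setup, the linear adaptor, and all dimensions are assumptions made for the example.

```python
# Minimal sketch (assumptions, not the paper's code): LLM as NMT encoder,
# unchanged Transformer decoder with cross-attention over the LLM states.
import torch
import torch.nn as nn
from transformers import AutoModel


class LLMEncoderNMT(nn.Module):
    def __init__(self, llm_name="gpt2", dec_dim=512, dec_layers=6,
                 dec_heads=8, tgt_vocab_size=32000):
        super().__init__()
        # Decoder-only LLM reused purely as a source-text encoder (frozen here).
        self.llm = AutoModel.from_pretrained(llm_name)
        for p in self.llm.parameters():
            p.requires_grad = False
        # Adaptor that maps LLM hidden states into the NMT decoder's space
        # (a simple linear projection; the paper's adaptation method may differ).
        self.adaptor = nn.Linear(self.llm.config.hidden_size, dec_dim)
        # Conventional NMT decoder: self-attention plus cross-attention to the memory.
        layer = nn.TransformerDecoderLayer(d_model=dec_dim, nhead=dec_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=dec_layers)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, dec_dim)
        self.out_proj = nn.Linear(dec_dim, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source once with the LLM; its last hidden states serve as
        # the encoder memory, so no target-side KV cache is kept for the LLM.
        memory = self.llm(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        memory = self.adaptor(memory)
        # Causal mask so each target position only sees earlier target tokens.
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1)).to(tgt_ids.device)
        h = self.decoder(self.tgt_embed(tgt_ids), memory,
                         tgt_mask=causal,
                         memory_key_padding_mask=(src_mask == 0))
        return self.out_proj(h)  # logits over the target vocabulary
```

In this setup only the adaptor and the lightweight decoder carry translation-specific parameters, which is one way the encoder-style reuse of an LLM can reduce decoding cost relative to running the full LLM autoregressively.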
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Language Fusion for Parameter-Efficient Cross-lingual Transfer (2025)
- Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study (2025)
- LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy (2025)
- Lost in Literalism: How Supervised Training Shapes Translationese in LLMs (2025)
- MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities (2025)
- Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks (2025)
- Multilingual Language Model Pretraining using Machine-translated Data (2025)