arxiv:2505.00949

Llama-Nemotron: Efficient Reasoning Models

Published on May 2
· Submitted by akhaliq on May 5
#2 Paper of the day
Abstract

We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.
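The abstract notes that the models expose a dynamic reasoning toggle that switches between standard chat and reasoning modes at inference time. Below is a minimal sketch of how such a toggle could be exercised, assuming it is controlled through the system prompt; the repository id and the "detailed thinking on/off" strings follow NVIDIA's model-card conventions but should be verified against the released models.

```python
# Minimal sketch: toggling reasoning mode via the system prompt (assumed interface).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"  # assumed repo id for LN-Nano
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def generate(question: str, reasoning: bool) -> str:
    # The system message is assumed to act as the reasoning toggle described in the paper.
    messages = [
        {"role": "system",
         "content": "detailed thinking on" if reasoning else "detailed thinking off"},
        {"role": "user", "content": question},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=1024)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Reasoning mode: the model is expected to produce an extended chain of thought.
print(generate("What is 17 * 24?", reasoning=True))
# Standard chat mode: a concise direct answer.
print(generate("What is 17 * 24?", reasoning=False))
```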

Community

Paper submitter

[Screenshot attached: 2025-05-05 at 11.34.05 AM]

So nowadays, 8B models are considered "nano" 😂

This is incredibly impressive work by the NVIDIA team, particularly the focus on both high-level reasoning and inference efficiency in the Llama-Nemotron family, along with the commendable open release of models, data, and codebases!

At the AI Studio in Uganda (backed by the Office of the President), we're leading Project Crane, developing sovereign, culturally-grounded LLMs for our nation's diverse context (40+ languages). We're currently fine-tuning Gemma 3 models with support from the Google DeepMind team and building the Ugandan Cultural Context Benchmark (UCCB) for evaluation.

The efficiency focus of Llama-Nemotron, especially the smaller Nano (8B) variant, and its open license are highly relevant to our goals of deploying capable models on local, potentially resource-constrained infrastructure (Afriqloud). Adapting such powerful reasoning models for low-resource languages and specific cultural nuances is the core challenge we're tackling.

We're closely following this work and exploring how models like Llama-Nemotron could potentially become part of the "Crane" family in the future. It's inspiring to see architectures optimized for both performance and accessibility. Congratulations on this significant contribution to the open-source community! We hope initiatives like ours in Africa can benefit from and potentially contribute back to ecosystems built around powerful open models like Llama-Nemotron.

Kato Steven Mubiru
Project Lead, Crane Gemma (AI Studio, Uganda)

