Nora
Nora is an open vision-language-action model trained on robot manipulation episodes from the Open X-Embodiment dataset. The model takes language instructions and camera images as input and generates robot actions. Nora is trained directly from Qwen 2.5 VL-3B. All Nora checkpoints, as well as our training codebase, are released under the MIT License.
Model Description
- Model type: Vision-language-action (language, image => robot actions)
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: Qwen 2.5 VL-3B
Model Sources
- Repository: https://github.com/declare-lab/nora
- Paper: https://www.arxiv.org/abs/2504.19854
- Demo: https://declare-lab.github.io/nora
Usage
Nora takes a language instruction and a camera image of the robot workspace as input and predicts (normalized) robot actions: 7-DoF end-effector deltas of the form (x, y, z, roll, pitch, yaw, gripper). To execute on an actual robot platform, these actions must be un-normalized using statistics computed on a per-robot, per-dataset basis, as sketched below.
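As an illustration of that un-normalization step, the sketch below maps a normalized action from [-1, 1] back to a robot's native action range using per-dimension statistics. It is a minimal example under assumed conventions: the linear mapping, the `low`/`high` arrays, and the numbers shown are illustrative stand-ins, not the exact statistics or formula used for any particular robot or dataset.

import numpy as np

def unnormalize_action(norm_action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map a normalized 7-DoF action from [-1, 1] back to the robot's native range.
    `low` and `high` are assumed per-dimension statistics for the target dataset."""
    # Linear rescale per dimension: -1 maps to low, +1 maps to high.
    return 0.5 * (norm_action + 1.0) * (high - low) + low

# Illustrative (made-up) statistics for a (x, y, z, roll, pitch, yaw, gripper) delta
low = np.array([-0.05, -0.05, -0.05, -0.25, -0.25, -0.25, 0.0])
high = np.array([0.05, 0.05, 0.05, 0.25, 0.25, 0.25, 1.0])
action = unnormalize_action(np.zeros(7), low, high)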
Getting Started For Inference
To get started with loading and running Nora for inference, we provide a lightweight interface with minimal dependencies.
git clone https://github.com/declare-lab/nora
cd nora/inference
pip install -r requirements.txt
For example, to load Nora for zero-shot instruction following in the BridgeData V2 environments with a WidowX robot:
# Load VLA
from PIL import Image
from inference.nora import Nora

nora = Nora(device='cuda')

# Get inputs
image: Image.Image = camera(...)       # current RGB frame of the robot workspace
instruction: str = "<INSTRUCTION>"     # natural-language task instruction

# Predict action (7-DoF; un-normalized for BridgeData V2)
action = nora.inference(
    image=image,
    instruction=instruction,
    unnorm_key='bridge_orig'           # optional; dataset statistics to un-normalize with
)

# Execute...
robot.act(action, ...)
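In practice, this single-step call is wrapped in a closed loop that re-queries the model with a fresh camera frame at every control step. A minimal sketch, reusing the hypothetical camera(...) and robot.act(...) interfaces from the snippet above and an arbitrary fixed horizon:

from inference.nora import Nora

nora = Nora(device='cuda')
instruction = "<INSTRUCTION>"              # natural-language task instruction

for step in range(100):                    # placeholder horizon; stop on task success in practice
    image = camera(...)                    # grab the latest workspace frame (platform-specific)
    action = nora.inference(
        image=image,
        instruction=instruction,
        unnorm_key='bridge_orig'           # un-normalize for BridgeData V2 / WidowX
    )
    robot.act(action, ...)                 # send the 7-DoF delta to the low-level controller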