Support for an ONNX version?

#1 by anudit - opened

I tried converting the model, but I'm running into some errors when I try to use it on the Rust side.

```python
!pip install onnx onnxruntime
from transformers import AutoTokenizer, AutoModel
import torch
import onnx
import onnxruntime as ort
import numpy as np

NAME="colnomic-embed-multimodal-3b"

tokenizer = AutoTokenizer.from_pretrained(f"nomic-ai/{NAME}")
model = AutoModel.from_pretrained(f"nomic-ai/{NAME}", trust_remote_code=True)

model_out = "/content/model.onnx"

# Set the model in evaluation mode
model.eval()

# Example input for export
inputs = tokenizer("Example input text", return_tensors="pt")

# Define the export function
torch.onnx.export(
    model,                                           # The model to export
    (inputs["input_ids"], inputs["attention_mask"]), # Model input
    model_out,                                       # The path to save the ONNX file
    export_params=True,                              # Store the trained parameter weights
    opset_version=20,                                # The ONNX version to use
    input_names=['input_ids', 'attention_mask'],     # Model's input names
    output_names=['output'],                         # Model's output names
    dynamic_axes={
        'input_ids': {0: 'batch_size', 1: 'sequence_length'},    # Dynamic axes for input_ids
        'attention_mask': {0: 'batch_size', 1: 'sequence_length'}, # Dynamic axes for attention_mask
        'output': {0: 'batch_size', 1: 'sequence_length'}        # Dynamic axes for output
    }
)

print("## ONNX Model Exported")

# Verify the ONNX model
print("## Verifying Onnx")

ort_session = ort.InferenceSession(model_out)

if "token_type_ids" in inputs:
    del inputs["token_type_ids"]

# Prepare inputs for ONNX inference
ort_inputs = {k: v.cpu().detach().numpy() for k, v in inputs.items()}
ort_outs = ort_session.run(None, ort_inputs)

print("ONNX output shape:", ort_outs[0].shape)
print("ONNX output:", ort_outs[0])

with torch.no_grad():
    pytorch_outputs = model(**inputs)
    pytorch_output_array = pytorch_outputs.last_hidden_state.cpu().numpy()

print("PyTorch output shape:", pytorch_output_array.shape)
print("PyTorch output:", pytorch_output_array)

# Compare the outputs
if np.allclose(pytorch_output_array, ort_outs[0], atol=1e-5):
    print("The ONNX model output matches the PyTorch model output!")
else:
    print("The ONNX model output does NOT match the PyTorch model output.")
Nomic AI org

What errors do you get? Are you able to successfully get the ONNX outputs?


FWIW, I tried the above script and got this error:

```
ValueError: Target modules (.*(model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$) not found in the base model. Please check the target modules and try again.
```

Edit: you cannot use AutoModel. This model has additional weights that are not defined in any built-in HF transformers model. You have to use ColQwen2_5 explicitly.
Edit 2: This script doesn't seem to be specifically designed for a vision model though.
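
For reference, loading through colpali-engine instead of AutoModel looks roughly like this (a minimal sketch, assuming colpali-engine is installed; the device/dtype settings are illustrative, not required):

```python
# Sketch: load the model via ColQwen2_5 instead of AutoModel
import torch
from colpali_engine.models import ColQwen2_5, ColQwen2_5_Processor

model = ColQwen2_5.from_pretrained(
    "nomic-ai/colnomic-embed-multimodal-3b",
    torch_dtype=torch.bfloat16,   # illustrative; pick what your hardware supports
    device_map="cpu",             # or "cuda:0" if a GPU is available
).eval()
processor = ColQwen2_5_Processor.from_pretrained("nomic-ai/colnomic-embed-multimodal-3b")

# Text queries go through process_queries; page images through process_images
batch = processor.process_queries(["example query"]).to(model.device)
with torch.no_grad():
    embeddings = model(**batch)   # multi-vector (per-token) embeddings
```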

@Cebtenzzre @zpn Yeah, this is a script I used for a text embedding model. I'd love to get an official ONNX version like the one for v1.5.
Maybe https://huggingface.co/docs/optimum/en/exporters/onnx/usage_guides/export_a_model helps?
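
For anyone who wants to try that route, the programmatic export from the Optimum guide would look roughly like this (untested against this model; a custom architecture like this one may well need a custom ONNX config, and the task name here is a guess):

```python
# Sketch: Optimum's programmatic ONNX export (may fail on custom architectures)
from optimum.exporters.onnx import main_export

main_export(
    "nomic-ai/colnomic-embed-multimodal-3b",
    output="colnomic_onnx",       # directory for the exported ONNX files
    task="feature-extraction",    # assumption; the correct task may differ
    trust_remote_code=True,       # the repo ships custom modeling code
)
```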
