Tuwhy
/

Llama-3.2V-11B-Sherlock-Offline

Image-Text-to-Text

text-generation-inference

Model card Files Files and versions Community

Sherlock: Self-Correcting Reasoning in Vision-Language Models

Introduction

Sherlock is a training framework focus on improving Vision-Language Models reasoning and self-correction capabilities.

GitHub repo: https://github.com/DripNowhy/Sherlock

Project Page: https://dripnowhy.github.io/Sherlock/

arXiv: https://arxiv.org/abs/2505.22651

Downloads last month: 10

Safetensors

Model size

10.7B params

Tensor type

BF16

·

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Tuwhy/Llama-3.2V-11B-Sherlock-Offline

Base model

meta-llama/Llama-3.2-11B-Vision-Instruct

Finetuned

(131)

this model

Dataset used to train Tuwhy/Llama-3.2V-11B-Sherlock-Offline

Collection including Tuwhy/Llama-3.2V-11B-Sherlock-Offline

Sherlock

Series model of paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models" • 5 items • Updated 4 days ago • 1