---
task_categories:
- visual-question-answering
language:
- en
tags:
- remyx
- SpatialReasoning
- spatial-reasoning
- test-time-compute
- thinking
- reasoning
- multimodal
- vlm
- vision-language
- distance-estimation
- quantitative-spatial-reasoning
pretty_name: SpaceOm
license: apache-2.0
---

# SpaceOm (Coming Soon)

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/647777304ae93470ffc28913/5cPsHwrmzqPOjd7zUgzss.gif)

## Model Overview

OpenAI's plan to release a SOTA text-in, text-out LLM with toggleable reasoning means the most performant Vision-Language Model (VLM) will likely be built on that LLM backbone.

Meanwhile, updated methods of reasoning synthesis, including improvements to localization and captioning using "Describe Anything" as well as step-by-step instructions, are [in the works](https://github.com/andrewliao11/Q-Spatial-Bench-code/blob/main/prompt_templates/spatial_prompt_steps.txt).

Check out [SpaceThinker](https://huggingface.co/remyxai/SpaceThinker-Qwen2.5VL-3B) for more on the cutting edge of quantitative spatial reasoning.