license: apache-2.0
base_model:
  - OpenGVLab/InternVL2-8B

SpiritSight Agent: Advanced GUI Agent with One Look

Introduction

SpiritSight is a vision-based, end-to-end GUI agent that excels at GUI navigation tasks across various GUI platforms.

Inference

conda create -n spiritsight-agent python=3.9
conda activate spiritsight-agent

pip install -r requirements.txt
pip install flash-attn==2.3.6 --no-build-isolation

python infer_SSAgent-8B.py
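
Before running the inference script, it can help to confirm that the environment was set up as intended. The following is a minimal sketch of such a pre-flight check, not part of the SpiritSight repository. The package list is an assumption (the contents of requirements.txt are not shown here): `flash_attn` is the import name of the `flash-attn` package installed above, while `torch` and `transformers` are guesses based on the base model. The helper names `missing_packages` and `python_matches` are hypothetical.

```python
import importlib.util
import sys

# Assumed import names; requirements.txt is not shown here, so adjust as needed.
ASSUMED_PACKAGES = ["torch", "transformers", "flash_attn"]
EXPECTED_PYTHON = (3, 9)  # matches the version in the conda command above


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(expected=EXPECTED_PYTHON, actual=None):
    """Return True if the interpreter's major.minor version equals `expected`."""
    actual = tuple(sys.version_info[:2]) if actual is None else tuple(actual)
    return actual == tuple(expected)


if __name__ == "__main__":
    if not python_matches():
        print(
            f"Note: running Python {sys.version_info.major}.{sys.version_info.minor}, "
            f"but the environment above uses {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}"
        )
    missing = missing_packages(ASSUMED_PACKAGES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the script reports missing packages, re-running the pip commands inside the activated `spiritsight-agent` environment is the likely fix.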

Acknowledgments

We thank the following amazing projects that truly inspired us: