Ryukijano's Collections

Computer vision

updated Dec 4, 2024

  • Unsupervised Universal Image Segmentation

    Paper • 2312.17243 • Published Dec 28, 2023 • 20

  • Denoising Vision Transformers

    Paper • 2401.02957 • Published Jan 5, 2024 • 31

  • timm/ViT-B-16-SigLIP

    Zero-Shot Image Classification • Updated Oct 25, 2023 • 18.6k • 31

  • Slimsam 🌖

    Space • Runtime error • 19

    Small yet powerful mask generation application ⚡️


  • InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

    Paper • 2402.05937 • Published Feb 8, 2024 • 14

  • microsoft/OmniParser

    Image-Text-to-Text • Updated Dec 2, 2024 • 826 • 1.66k

  • meta-llama/Llama-3.2-11B-Vision-Instruct

    Image-Text-to-Text • Updated Dec 4, 2024 • 520k • 1.43k

  • LSM 🦀

    Space • Runtime error • 52

    LargeSpatialModel: End-to-end Unposed Images to Semantic 3D


  • Mini Dust3r 🌖

    Space • Running on Zero • 55

    Run a web app for creating 3D models


  • Junyi42/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt

    Image-to-3D • Updated Oct 30, 2024 • 6.05k • 18

  • OminiControl 🌍

    Space • Running on Zero • 888

    Generate images based on text prompts and condition images
