CVPR Demo Track

non-profit

http://cvpr2022.thecvf.com/

Activity Feed Request to join this org

AI & ML interests

CVPR Demo Track @ CVPR 2022

Recent Activity

lkeab authored a paper 11 days ago

TAPIP3D: Tracking Any Point in Persistent 3D Geometry

paulengstler authored a paper about 1 month ago

SynCity: Training-Free Generation of 3D Worlds

samirg authored a paper about 2 months ago

Should VLMs be Pre-trained with Image Data?

View all activity

CVPR's activity

abidlabs

posted an update 1 day ago

Post

1861

HOW TO ADD MCP SUPPORT TO ANY 🤗 SPACE

Gradio now supports MCP! If you want to convert an existing Space, like this one hexgrad/Kokoro-TTS, so that you can use it with Claude Desktop / Cursor / Cline / TinyAgents / or any LLM that supports MCP, here's all you need to do:

1. Duplicate the Space (in the Settings Tab)
2. Upgrade the Gradio sdk_version to 5.28 (in the README.md)
3. Set mcp_server=True in launch()
4. (Optionally) add docstrings to the function so that the LLM knows how to use it, like this:

def generate(text, speed=1):
    """
    Convert text to speech audio.

    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.

That's it! Now your LLM will be able to talk to you 🤯

abidlabs

posted an update 2 days ago

Post

2151

Hi folks! Excited to share a new feature from the Gradio team along with a tutorial.

If you don't already know, Gradio is an open-source Python library used to build interfaces for machine learning models. Beyond just creating UIs, Gradio also exposes API capabilities and now, Gradio apps can be launched Model Context Protocol (MCP) servers for LLMs.

If you already know how to use Gradio, there are only two additional things you need to do:
* Add standard docstrings to your function (these will be used to generate the descriptions for your tools for the LLM)
* Set mcp_server=True in launch()

Here's a complete example (make sure you already have the latest version of Gradio installed):

import gradio as gr

def letter_counter(word, letter):
    """Count the occurrences of a specific letter in a word.
    
    Args:
        word: The word or phrase to analyze
        letter: The letter to count occurrences of
        
    Returns:
        The number of times the letter appears in the word
    """
    return word.lower().count(letter.lower())

demo = gr.Interface(
    fn=letter_counter,
    inputs=["text", "text"],
    outputs="number",
    title="Letter Counter",
    description="Count how many times a letter appears in a word"
)

demo.launch(mcp_server=True)

This is a very simple example, but you can add the ability to generate Ghibli images or speak emotions to any LLM that supports MCP. Once you have an MCP running locally, you can copy-paste the same app to host it on [Hugging Face Spaces](https://huggingface.co/spaces/) as well.

All free and open-source of course! Full tutorial: https://www.gradio.app/guides/building-mcp-server-with-gradio

2 replies

Yosun

posted an update 6 days ago

Post

456

Is it possible to pay for more ZeroGPU usage quota?

1 reply

victor

posted an update 10 days ago

Post

2826

DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B

abidlabs

posted an update 29 days ago

Post

3768

JOURNEY TO 1 MILLION DEVELOPERS

5 years ago, we launched Gradio as a simple Python library to let researchers at Stanford easily demo computer vision models with a web interface.

Today, Gradio is used by >1 million developers each month to build and share AI web apps. This includes some of the most popular open-source projects of all time, like Automatic1111, Fooocus, Oobabooga’s Text WebUI, Dall-E Mini, and LLaMA-Factory.

How did we get here? How did Gradio keep growing in the very crowded field of open-source Python libraries? I get this question a lot from folks who are building their own open-source libraries. This post distills some of the lessons that I have learned over the past few years:

1. Invest in good primitives, not high-level abstractions
2. Embed virality directly into your library
3. Focus on a (growing) niche
4. Your only roadmap should be rapid iteration
5. Maximize ways users can consume your library's outputs

1. Invest in good primitives, not high-level abstractions

When we first launched Gradio, we offered only one high-level class (gr.Interface), which created a complete web app from a single Python function. We quickly realized that developers wanted to create other kinds of apps (e.g. multi-step workflows, chatbots, streaming applications), but as we started listing out the apps users wanted to build, we realized what we needed to do:

Read the rest here: https://x.com/abidlabs/status/1907886

paulengstler

authored a paper about 1 month ago

SynCity: Training-Free Generation of 3D Worlds

Paper • 2503.16420 • Published Mar 20 • 25

ZhangYuanhan

authored a paper about 2 months ago

EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published Mar 5 • 43

victor

posted an update 3 months ago

Post

6028

Hey everyone, we've given https://hf.co/spaces page a fresh update!

Smart Search: Now just type what you want to do—like "make a viral meme" or "generate music"—and our search gets it.

New Categories: Check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.

We’d love to hear what you think—drop us some feedback plz!

6 replies

jojo23333

authored a paper 3 months ago

Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation

Paper • 2501.14275 • Published Jan 24 • 1

victor

posted an update 3 months ago

Post

3136

Finally, an open-source AI that turns your lyrics into full songs is here—meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Letsss go!

m-a-p/YuE-s1-7B-anneal-en-cot

ZhangYuanhan

authored a paper 3 months ago

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published Jan 23 • 26

yumingj

authored a paper 3 months ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22 • 91

hysts

updated a Space 4 months ago

CVPR2022 Papers

🔥

Find CVPR 2022 papers by title

victor

posted an update 5 months ago

Post

2222

Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: victor/qwq-misguided-attention

2 replies

victor

posted an update 5 months ago

Post

2610

Perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFMPEG and this can be quite complex but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT4 and it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore, let's go open weights 🚀.

victor

posted an update 5 months ago

Post

1855

Qwen2.5-72B is now the default HuggingChat model.
This model is so good that you must try it! I often get better results on rephrasing with it than Sonnet or GPT-4!!