open-source-metrics (Hugging Face OSS Metrics)

clem

posted an update about 11 hours ago

Post

789

What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the hub but still feels early & a lot more to build. What would be useful to you?

3 replies

·

fdaudens

posted an update 5 days ago

Post

2850

Forget everything you know about transcription models - NVIDIA's parakeet-tdt-0.6b-v2 changed the game for me!

Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn’t sped up.

3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)

NVIDIA also released a demo where you can click any transcribed segment to play it instantly.

The improvement is significant: number 1 on the ASR Leaderboard, 6% error rate (best in class) with complete commercial freedom (cc-by-4.0 license).

Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!

Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard

1 reply

·

clem

posted an update 5 days ago

Post

1457

LeRobot-worldwide-hackathon is already scheduled in 30 cities all over the world!

Check if there's one in your city here: LeRobot-worldwide-hackathon/worldwide-map

clem

posted an update 5 days ago

Post

1403

The

meta-llama org just crossed 40,000 followers on Hugging Face. Grateful for all their impact on the field sharing the Llama weights openly and much more!

We need more of this from all other big tech to make the AI more open, collaborative and beneficial to all!

abidlabs

posted an update 5 days ago

Post

3481

HOW TO ADD MCP SUPPORT TO ANY 🤗 SPACE

Gradio now supports MCP! If you want to convert an existing Space, like this one hexgrad/Kokoro-TTS, so that you can use it with Claude Desktop / Cursor / Cline / TinyAgents / or any LLM that supports MCP, here's all you need to do:

1. Duplicate the Space (in the Settings Tab)
2. Upgrade the Gradio sdk_version to 5.28 (in the README.md)
3. Set mcp_server=True in launch()
4. (Optionally) add docstrings to the function so that the LLM knows how to use it, like this:

def generate(text, speed=1):
    """
    Convert text to speech audio.

    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.

That's it! Now your LLM will be able to talk to you 🤯

fdaudens

posted an update 6 days ago

Post

426

I just gave my chatbots a massive upgrade: they can now generate audio from text, modify images — you name it. Here’s how:

The Gradio team shipped MCP support. That means you can plug any AI app built with it into Claude or Cursor using the Model Context Protocol (MCP) — think of it like a USB port for LLMs.

I put it to the test:
- Whipped up a quick text-to-speech app with Kokoro on HF (with an LLM riding shotgun, naturally)
- Added "mcp_server=True" in the code
- Connected it to Claude

Now I can generate audio from any text. The possibilities are next-level: you can potentially plug any of the 500K+ AI apps on Hugging Face to your favorite LLM.

Is this the new UI for AI?

- My tts app (feel free to use/duplicate it): fdaudens/kokoro-mcp
- Blog post: https://huggingface.co/blog/gradio-mcp

abidlabs

posted an update 6 days ago

Post

2409

Hi folks! Excited to share a new feature from the Gradio team along with a tutorial.

If you don't already know, Gradio is an open-source Python library used to build interfaces for machine learning models. Beyond just creating UIs, Gradio also exposes API capabilities and now, Gradio apps can be launched Model Context Protocol (MCP) servers for LLMs.

If you already know how to use Gradio, there are only two additional things you need to do:
* Add standard docstrings to your function (these will be used to generate the descriptions for your tools for the LLM)
* Set mcp_server=True in launch()

Here's a complete example (make sure you already have the latest version of Gradio installed):

import gradio as gr

def letter_counter(word, letter):
    """Count the occurrences of a specific letter in a word.
    
    Args:
        word: The word or phrase to analyze
        letter: The letter to count occurrences of
        
    Returns:
        The number of times the letter appears in the word
    """
    return word.lower().count(letter.lower())

demo = gr.Interface(
    fn=letter_counter,
    inputs=["text", "text"],
    outputs="number",
    title="Letter Counter",
    description="Count how many times a letter appears in a word"
)

demo.launch(mcp_server=True)

This is a very simple example, but you can add the ability to generate Ghibli images or speak emotions to any LLM that supports MCP. Once you have an MCP running locally, you can copy-paste the same app to host it on [Hugging Face Spaces](https://huggingface.co/spaces/) as well.

All free and open-source of course! Full tutorial: https://www.gradio.app/guides/building-mcp-server-with-gradio

2 replies

·

fdaudens

posted an update 7 days ago

Post

1696

Want to know which AI models are least likely to hallucinate — and how to keep yours from spiking hallucinations by 20%?

A new benchmark called Phare, by Giskard, tested leading models across multiple languages, revealing three key findings:

1️⃣ Popular models aren't necessarily factual. Some models ranking highest in user satisfaction benchmarks like LMArena are actually more prone to hallucination.

2️⃣ The way you ask matters - a lot. When users present claims confidently ("My teacher said..."), models are 15% less likely to correct misinformation vs. neutral framing ("I heard...").

3️⃣ Telling models to "be concise" can increase hallucination by up to 20%.

What's also cool is that the full dataset is public - use them to test your own models or dive deeper into the results! H/t @davidberenstein1957 for the link.

- Study: https://www.giskard.ai/knowledge/good-answers-are-not-necessarily-factual-answers-an-analysis-of-hallucination-in-leading-llms
- Leaderboard: https://phare.giskard.ai/
- Dataset: giskardai/phare

Xenova

posted an update 9 days ago

Post

5222

Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗

Check it out! 👇
Demo: onnx-community/model-explorer
Dataset: onnx-community/model-explorer
Source code: https://github.com/xenova/model-explorer

fdaudens

posted an update 13 days ago

Post

2107

@thomwolf and @m-ric teaming up as the perfect instructor duo for DeepLearning.ai’s new course: Building Code Agents with Hugging Face smolagents!

https://www.deeplearning.ai/short-courses/building-code-agents-with-hugging-face-smolagents/

victor

posted an update 14 days ago

Post

3077

DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B

sayakpaul

authored a paper 14 days ago

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Paper • 2504.16080 • Published 14 days ago • 15

davanstrien

posted an update 14 days ago

Post

1932

Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets - Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)

clem

posted an update 14 days ago

Post

3938

Energy is a massive constraint for AI but do you even know what energy your chatGPT convos are using?

We're trying to change this by releasing ChatUI-energy, the first interface where you see in real-time what energy your AI conversations consume. Great work from @jdelavande powered by spaces & TGI, available for a dozen of open-source models like Llama, Mistral, Qwen, Gemma and more.

jdelavande/chat-ui-energy

Should all chat interfaces have this? Just like ingredients have to be shown on products you buy, we need more transparency in AI for users!

3 replies

·

clem

posted an update 15 days ago

Post

2912

Just crossed half a million public apps on Hugging Face. A new public app is created every minute these days 🤯🤯🤯

What's your favorite? http://hf.co/spaces

3 replies

·

pagezyhf

posted an update 15 days ago

Post

1926

If you haven't had the chance to test the latest open model from Meta, Llama 4 Maverick, go try it on AMD MI 300 on Hugging Face!

amd/llama4-maverick-17b-128e-mi-amd

albertvillanova

posted an update 15 days ago

Post

2469

smolagents v1.14.0 is out! 🚀
🔌 MCPClient: A sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
🪨 Amazon Bedrock: Native support for Bedrock-hosted models.
SmolAgents is now more powerful, flexible, and enterprise-ready. 💼

Full release 👉 https://github.com/huggingface/smolagents/releases/tag/v1.14.0
#smolagents #LLM #AgenticAI

clem

posted an update 20 days ago

Post

1461

You can now bill your inference costs from all our inference partners (together, fireworks, fal, sambanova, cerebras, hyperbolic,...) to your Hugging Face organization.

Useful to drive more company-wide usage of AI without the billing headaches!

1 reply

·

Xenova

posted an update 20 days ago

Post

2539

Reasoning models like o3 and o4-mini are advancing faster than ever, but imagine what will be possible when they can run locally in your browser! 🤯

Well, with 🤗 Transformers.js, you can do just that! Here's Zyphra's new ZR1 model running at over 100 tokens/second on WebGPU! ⚡️

Giving models access to browser APIs (like File System, Screen Capture, and more) could unlock an entirely new class of web experiences that are personalized, interactive, and run locally in a secure, sandboxed environment.

For now, try out the demo! 👇
webml-community/Zyphra-ZR1-WebGPU

1 reply

·

fdaudens

posted an update 20 days ago

Post

1558

Just tested something this morning that feels kind of game-changing for how we publish, discover, and consume news with AI: connecting Claude directly to the New York Times through MCP.

Picture this: You ask Claude about a topic, and it instantly pulls verified and trusted NYT content — no more guessing if the info is accurate.

The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.

It’s still a bit technical to set up right now, but this could get super simple soon — like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.

Not saying it solves everything, but it’s definitely a new way to distribute content — and maybe even find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.

Curious what folks think — could MCPs be a real opportunity for journalism?

1 reply

·

Hugging Face OSS Metrics

AI & ML interests

Recent Activity

open-source-metrics's activity

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

AI & ML interests

Recent Activity

Team members 49

open-source-metrics's activity