huggingPartyParis

community

https://partiful.com/e/oWOMGoPxB5D37qw5F8yN


AI & ML interests

None defined yet.

Recent Activity

andreamaduzzi authored a paper 19 days ago

Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training

loubnabnl authored a paper about 1 month ago

SmolVLM: Redefining small and efficient multimodal models

dovpie authored a paper about 1 month ago

Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation


HuggingPartyParis's activity

m-ric

posted an update 1 day ago

Post

2184

I've made an open version of Google's NotebookLM, and it shows the superiority of the open-source tech stack! 💪

The app's workflow is simple. Given a source PDF or URL, it extracts the content, then tasks Meta's Llama 3.3-70B with writing the podcast script, using a good prompt crafted by @gabrielchua ("two hosts, with lively discussion, fun notes, insightful questions, etc.").
Then it hands off the text-to-speech conversion to Kokoro-82M, and there you go: you have two hosts discussing any article.
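
The workflow above can be sketched as a two-step pipeline: a script writer followed by a line-by-line synthesizer. This is only a minimal sketch, not the app's actual code: `generate_podcast`, `fake_write_script` and `fake_synthesize` are hypothetical names, and the two fake functions stand in for the Llama 3.3-70B call and the Kokoro-82M call. Yielding one line at a time mirrors the idea of streaming audio as it is produced.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical type aliases: the real app calls an LLM and a TTS model here.
ScriptWriter = Callable[[str], List[Tuple[str, str]]]  # text -> [(speaker, line), ...]
Synthesizer = Callable[[str, str], bytes]              # (speaker, line) -> audio bytes


def generate_podcast(
    source_text: str,
    write_script: ScriptWriter,
    synthesize: Synthesizer,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (speaker, audio) chunks one dialogue line at a time."""
    script = write_script(source_text)
    for speaker, line in script:
        # Each line is synthesized and yielded as soon as it is ready,
        # so playback could start before the whole script is voiced.
        yield speaker, synthesize(speaker, line)


def fake_write_script(text: str) -> List[Tuple[str, str]]:
    """Stand-in for the LLM call that writes a two-host script."""
    return [
        ("Host 1", f"Today we look at: {text[:40]}"),
        ("Host 2", "Tell me more!"),
    ]


def fake_synthesize(speaker: str, line: str) -> bytes:
    """Stand-in for the TTS call; returns tagged text instead of audio."""
    return f"[{speaker}] {line}".encode("utf-8")
```

Swapping the two fake functions for real model calls would leave `generate_podcast` unchanged, which is the point of passing them in as arguments.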

The generation is nearly instant, because:
> Llama 3.3 70B is running at 1,000 tokens/second with Cerebras inference
> The audio is generated in streaming mode by the tiny (yet powerful) Kokoro, generating voices faster than real-time.

And the audio generation runs for free on Zero GPUs, hosted by HF on H200s.

Overall, open source solutions rival the quality of closed-source solutions at close to no cost!

Try it here 👉👉 m-ric/open-notebooklm

2 replies

clem

posted an update 2 days ago

Post

3631

NVIDIA is dominating the top trending open datasets these days!

http://hf.co/datasets

clem

posted an update 4 days ago

Post

3867

What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the hub but still feels early & a lot more to build. What would be useful to you?

6 replies

clem

posted an update 9 days ago

Post

1582

LeRobot-worldwide-hackathon is already scheduled in 30 cities all over the world!

Check if there's one in your city here: LeRobot-worldwide-hackathon/worldwide-map

clem

posted an update 9 days ago

Post

1478

The meta-llama org just crossed 40,000 followers on Hugging Face. Grateful for all their impact on the field, sharing the Llama weights openly and much more!

We need more of this from all the other big tech companies to make AI more open, collaborative and beneficial to all!

clem

posted an update 18 days ago

Post

3983

Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT convos are using?

We're trying to change this by releasing ChatUI-energy, the first interface where you see in real time how much energy your AI conversations consume. Great work from @jdelavande, powered by Spaces & TGI, available for a dozen open-source models like Llama, Mistral, Qwen, Gemma and more.

jdelavande/chat-ui-energy

Should all chat interfaces have this? Just like ingredients have to be shown on products you buy, we need more transparency in AI for users!

3 replies

clem

posted an update 19 days ago

Post

2939

Just crossed half a million public apps on Hugging Face. A new public app is created every minute these days 🤯🤯🤯

What's your favorite? http://hf.co/spaces

3 replies

m-ric

posted an update 23 days ago

Post

2737

New king of open VLMs: InternVL3 takes Qwen 2.5's crown! 👑

InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline.

➡️ Most vision language models (VLMs) these days are built like Frankenstein's monster: take a good text-only Large Language Model (LLM) backbone and stitch a vision transformer (ViT) on top of it. The training is then sequential 🔢: 1. Freeze the LLM weights while you train only the ViT to work with the LLM part, then 2. Unfreeze all weights and train them together.

💫 The Shanghai Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly from the Qwen-2.5 series, did I tell you I'm a huge fan of Qwen? ❤️) and stitch the ViT on top, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase 🎨.
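
To make the contrast concrete, here is a toy sketch that writes both recipes down as lists of stages and records which component is trainable in each one. It is not InternVL3's training code: `Stage`, `sequential_recipe`, `native_recipe` and `frozen_components` are hypothetical names, and the stage labels are invented for illustration. Only the freeze/unfreeze pattern comes from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Stage:
    name: str                   # illustrative label, not an official stage name
    trainable: Dict[str, bool]  # component -> are its weights updated?
    data: str = "not specified in the post"


def sequential_recipe() -> List[Stage]:
    """The usual two-step recipe: train the ViT with the LLM frozen, then train everything."""
    return [
        Stage("1. train ViT, LLM frozen", {"vit": True, "llm": False}),
        Stage("2. unfreeze and train all", {"vit": True, "llm": True}),
    ]


def native_recipe() -> List[Stage]:
    """The 'native' recipe: one pre-training phase with nothing frozen."""
    return [
        Stage(
            "single pre-training phase",
            {"vit": True, "llm": True},
            data="interleaved text and image understanding data",
        )
    ]


def frozen_components(recipe: List[Stage]) -> Set[str]:
    """Components that are frozen in at least one stage of the recipe."""
    return {
        component
        for stage in recipe
        for component, is_trainable in stage.trainable.items()
        if not is_trainable
    }
```

Here `frozen_components(sequential_recipe())` contains `"llm"`, while `frozen_components(native_recipe())` is empty, which is the whole difference between the two approaches as described.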

They claim it results in more seamless interactions between modalities. And the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents. 👑

2 replies

clem

posted an update 24 days ago

Post

1471

You can now bill your inference costs from all our inference partners (together, fireworks, fal, sambanova, cerebras, hyperbolic,...) to your Hugging Face organization.

Useful to drive more company-wide usage of AI without the billing headaches!

1 reply

nv-nguyen

authored a paper about 1 month ago

BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

Paper • 2504.02812 • Published Apr 3 • 5

eliebak

authored a paper about 1 month ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 180

clem

posted an update about 1 month ago

Post

2656

Llama 4 is in transformers!

Fun example using the instruction-tuned Maverick model responding about two images, using tensor parallel for maximum speed.

From https://huggingface.co/blog/llama4-release

1 reply

clem

posted an update about 1 month ago

Post

1997

Llama models (arguably the most successful open AI models of all time) represented just 3% of total model downloads on Hugging Face in March.

People and media like stories of winner takes all & one model/company to rule them all but the reality is much more nuanced than this!

Kudos to all the small AI builders out there!

2 replies

clem

posted an update about 1 month ago

Post

1355

Now in Enterprise Hub organizations, you can centralize your billing not only for HF usage but also inference through our inference partners.

Will prevent some headaches for your finance & accounting teams haha (so feel free to share that with them).

3 replies

clem

posted an update about 1 month ago

Post

4031

Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible—just look at the “T” in ChatGPT, which comes from the Transformer architecture openly shared by Google.

Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating.

With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization—powered by openness and collaboration, in the US and around the world.

This is incredibly exciting. Let’s go, open science and open-source AI!

5 replies

m-ric

posted an update about 1 month ago

Post

2331

🚀 DeepSeek R1 moment has come for GUI agents: Rule-based Reinforcement Learning gives better results than SFT with 500x smaller datasets!

Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant collecting huge datasets of screen captures from people using computers, and using these to fine-tune your model. 📚

👉 But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks.
This is big news: with RL, maybe we could build good agents without the need for huge datasets.

UI-R1 uses a unified reward function that evaluates multiple responses from models, optimizing via policy algorithms like Group Relative Policy Optimization (GRPO).

Specifically, the reward function assesses:
🎯 Action type accuracy: Does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): Is the predicted click within the correct bounding box?
📑 Output format: Does the model clearly articulate both its reasoning and final action?
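
The three checks above can be sketched as a small rule-based reward, followed by the group-relative normalization that GRPO applies to the rewards of several responses sampled for the same prompt. This is a sketch under assumptions, not the paper's implementation: the `<think>`/`<answer>` template, the equal weighting of the three terms, and all function names are assumptions, and the population standard deviation is one possible choice for the normalization.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Assumed output template: reasoning in <think>, final action in <answer>.
FORMAT_RE = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)


def action_type_reward(pred_action: str, true_action: str) -> float:
    """1.0 if the predicted action type matches the ground truth."""
    return 1.0 if pred_action == true_action else 0.0


def coordinate_reward(
    pred_action: str,
    click_xy: Optional[Tuple[float, float]],
    target_box: Optional[BBox],
) -> float:
    """1.0 if a predicted click lands inside the ground-truth bounding box."""
    if pred_action != "click" or click_xy is None or target_box is None:
        return 0.0
    x, y = click_xy
    x_min, y_min, x_max, y_max = target_box
    return 1.0 if x_min <= x <= x_max and y_min <= y <= y_max else 0.0


def format_reward(raw_output: str) -> float:
    """1.0 if the output contains both reasoning and a final action."""
    return 1.0 if FORMAT_RE.match(raw_output.strip()) else 0.0


def total_reward(
    raw_output: str,
    pred_action: str,
    true_action: str,
    click_xy: Optional[Tuple[float, float]] = None,
    target_box: Optional[BBox] = None,
) -> float:
    """Unweighted sum of the three rule-based terms (the weighting is an assumption)."""
    return (
        action_type_reward(pred_action, true_action)
        + coordinate_reward(pred_action, click_xy, target_box)
        + format_reward(raw_output)
    )


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same: no response is preferred.
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]
```

Because every term is a deterministic rule, no learned reward model is needed; the group-relative step then only tells the policy which of its own sampled responses did better than average.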

Using just 136 carefully selected mobile tasks—compared to 76,000 tasks for larger models like OS-Atlas—UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
🌐 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
🔍 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.

The paper tests this RL-based method only in low-level GUI tasks. Could it generalize to more complex interactions? 🧐

Read the full paper here 👉 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (2503.21620)

clem

posted an update about 1 month ago

Post

2423

What's this cool purple banner haha 😶😶😶

4 replies

osanseviero

authored a paper about 1 month ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 50

clem

posted an update about 1 month ago

Post

2254

Very interesting security section by @yjernite @lvwerra @reach-vb @dvilasuero & the team replicating R1. Broadly applicable to most open-source models & some to APIs (but APIs have a lot more additional risks because you're not in control of the underlying system):

https://huggingface.co/blog/open-r1/update-4#is-it-safe

1 reply

clem

posted an update about 2 months ago

Post

1581

A repository is created every ~15 secs on Hugging Face so @kramp added a "Getting Started" to make it easier & a model release checklist: https://huggingface.co/docs/hub/model-release-checklist

What are you uploading today?

1 reply

Team members 961