We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!
You can apply for yourself, or your entire organization. Head over to your account settings for more information or join anywhere you see the Xet logo on a repository you know.
Have questions? Join the conversation below 👇 or open a discussion on the Xet team page xet-team/README
Ever wanted 45 min with one of AI’s most fascinating minds? Was with @thomwolf at HumanX Vegas. Sharing my notes of his Q&A with the press—completely changed how I think about AI’s future:
1️⃣ The next wave of successful AI companies won’t be defined by who has the best model but by who builds the most useful real-world solutions. "We all have engines in our cars, but that’s rarely the only reason we buy one. We expect it to work well, and that’s enough. LLMs will be the same."
2️⃣ Big players are pivoting: "Closed-source companies—OpenAI being the first—have largely shifted from LLM announcements to product announcements."
3️⃣ Open source is changing everything: "DeepSeek was open source AI’s ChatGPT moment. Basically, everyone outside the bubble realized you can get a model for free—and it’s just as good as the paid ones."
4️⃣ Product innovation is being democratized: Take Manus, for example—they built a product on top of Anthropic’s models that’s "actually better than Anthropic’s own product for now, in terms of agents." This proves that anyone can build great products with existing models.
We’re entering a "multi-LLM world," where models are becoming commoditized, and all the tools to build are readily available—just look at the flurry of daily new releases on Hugging Face.
Thom's comparison to the internet era is spot-on: "In the beginning you made a lot of money by making websites... but nowadays the huge internet companies are not the companies that built websites. Like Airbnb, Uber, Facebook, they just use the internet as a medium to make something for real life use cases."
Google just dropped an exciting technical report for the brand-new Gemma3 model! 🚀 Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:
1) Architecture choices: > No more softcaping, replace by QK-Norm > Both Pre AND Post Norm > Wider MLP than Qwen2.5, ~ same depth > SWA with 5:1 and 1024 (very small and cool ablation on the paper!) > No MLA to save KV cache, SWA do the job!
2) Long context > Only increase the rope in the global layer (to 1M) > Confirmation that it's harder to do long context for smol models, no 128k for the 1B > Pretrained with 32k context? seems very high > No yarn nor llama3 like rope extension
3) Distillation > Only keep te first 256 logits for the teacher > Ablation on the teacher gap (tl;dr you need some "patience" to see that using a small teacher is better) > On policy distillation yeahh (by @agarwl_ et al), not sure if the teacher gap behave the same here, curious if someone have more info?
4) Others > Checkpoint with QAT, that's very cool > RL using improve version of BOND, WARM/WARP good excuse to look at @ramealexandre papers > Only use Zero3, no TP/PP if i understand correctly ? > Training budget relatively similar than gemma2
We find that OlympicCoder models outperform Claude 3.7 Sonnet, as well as others over 100x larger 💪
Together with the models, we are releasing:
📊CodeForces-CoTs: new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python open-r1/codeforces-cots
🏆 IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance open-r1/ioi