HuggingFaceGECLM

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

clefourrier authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

anton-l authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

thomwolf authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

View all activity

HuggingFaceGECLM's activity

thomwolf

posted an update 4 days ago

Post

2151

We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder ( open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!

And the best part is that we're open-sourcing all about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets are are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions

clefourrier

posted an update 5 days ago

Post

1609

Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards on the other hand, comparison will be apples to apples, but in a potentially unoptimal way for a given model family (like some user interact sub-optimally with models)

Also contains a cool section (6) on training data memorization rate too! Important to see if your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.

clefourrier

authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 204

anton-l

authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 204

thomwolf

authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 204

thomwolf

authored a paper about 2 months ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 56

anton-l

posted an update 3 months ago

Post

2598

Introducing 📐𝐅𝐢𝐧𝐞𝐌𝐚𝐭𝐡: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.

We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observe notable gains compared to the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We’re also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2

thomwolf

posted an update 3 months ago

Post

6086

We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️languages.

We applied the same data-driven approach that led to SOTA English performance in🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!

In the mean time come ask us question on our chat place: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi

2 replies

clefourrier

authored a paper 3 months ago

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 18

thomwolf

posted an update 3 months ago

Post

1610

Exponentially growing number of open-source AI models over the course of the past 30 months – from a few thousands to over 1 million and more

Interactive data viz: huggingface/open-source-ai-year-in-review-2024

thomwolf

posted an update 3 months ago

Post

1575

Most liked and most downloaded open-source AI models from 2022 to 2024

Interactive viz: https://aiworld.eu/embed/model/model/treemap
Discussion: huggingface/open-source-ai-year-in-review-2024

thomwolf

posted an update 4 months ago

Post

1770

Interesting long read from @evanmiller-anthropic on having a better founded statistical approach to Language Model Evaluations:
https://www.anthropic.com/research/statistical-approach-to-model-evals

Worth a read if you're into LLM evaluations!

Cc @clefourrier

1 reply

SaylorTwift

posted an update 4 months ago

Post

594

How do I test an LLM for my unique needs?
If you work in finance, law, or medicine, generic benchmarks are not enough.
This blog post uses Argilla, Distilllabel and 🌤️Lighteval to generate evaluation dataset and evaluate models.

https://github.com/argilla-io/argilla-cookbook/blob/main/domain-eval/README.md

thomwolf

posted an update 4 months ago

Post

1487

Very exciting new mistralai/Pixtral-Large-Instruct-2411 model from Mistral-AI

Impressive performances, huge congrats @patrickvonplaten @sgvaze @pandora-s @devendrachaplot @sophiamyang and team!

Very nice to have SOTA Multilingual OCR and Chart understanding in an open-weights model