Clémentine Fourrier

clefourrier

AI & ML interests

None yet

Organizations

Hugging Face, Long Range Graph Benchmark, Evaluation datasets, BigScience: LMs for Historical Texts, HuggingFaceBR4, Cohere For AI, Huggingface Projects, Open Graph Benchmark, HuggingFaceGECLM, Pretrained Graph Transformers, Graph Datasets, BigCode, Hugging Face H4, InternLM, Vectara, GAIA, Hugging Face Smol Cluster, plfe, Open LLM Leaderboard, Qwen, Secure Learning Lab, Open Life Science AI, LLM360, TTS Eval (OLD), hallucinations-leaderboard, Bias Leaderboard Development, Leaderboard Organization, Demo Leaderboard, Demo leaderboard with an integrated backend, gg-hf, AIM-Harvard, Clinical & Biomedical ML Leaderboards, Women on Hugging Face, LMLLO2, Lighthouz AI, Open Arabic LLM Leaderboard, mx-test, IBM Granite, HuggingFaceFW, HF-contamination-detection, TTS AGI, Leader Board Test Org, Social Post Explorers, hsramall, Open RL Leaderboard, The Fin AI, La Leaderboard, Open Hebrew LLM's Leaderboard, gg-tt, HuggingFaceEval, HP Inc., Novel Challenge, Open LLM Leaderboard Archive, LLHF, SLLHF, lbhf, Inception, nltpt, Lighteval testing org, CléMax, Hugging Face Science, test_org, Coordination Nationale pour l'IA, LeMaterial, open-llm-leaderboard-react, Prompt Leaderboard, wut?, UBC-NLP Collaborations, smolagents, Your Bench, leaderboard explorer, Open R1, SIMS, OpenEvals

clefourrier's activity

posted an update 4 days ago
The Gemma3 family is out! I've been reading the tech report, and one section was really interesting to me from a methods/scientific-fairness point of view.

Instead of making over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but usually doesn't say.)

For a tech report, it makes a lot of sense to report model performance when the model is used optimally!
On leaderboards, on the other hand, comparisons are apples to apples, but potentially suboptimal for a given model family (much as some users interact sub-optimally with models).

The report also contains a cool section (6) on training-data memorization rates! It's important to check whether your model will output the training data it has seen verbatim: always an issue for privacy/copyright/..., but very much for evaluation too!

Because if your model knows its evals by heart, you're not testing for generalization.
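To make that concrete, here's a toy sketch of a verbatim-memorization check in Python: it flags an eval sample when a long character n-gram from it appears in the training corpus. This only illustrates the idea; it is not the methodology from the Gemma3 report, and the data is made up.

```python
# Toy verbatim-memorization check: flag an eval sample whose text
# shares a long character n-gram with the training corpus.
# Illustrative only; real pipelines use scalable indexes, not set scans.

def char_ngrams(text: str, n: int) -> set[str]:
    """All character n-grams of length n in `text`."""
    return {text[i : i + n] for i in range(len(text) - n + 1)}

def is_contaminated(eval_sample: str, train_ngrams: set[str], n: int) -> bool:
    """True if any n-gram of the eval sample occurs in the training data."""
    return not train_ngrams.isdisjoint(char_ngrams(eval_sample, n))

# Hypothetical usage with tiny in-memory corpora:
train_docs = ["some unrelated training text", "What is the capital of France? Paris."]
N = 20
train_ngrams = set().union(*(char_ngrams(d, N) for d in train_docs))
print(is_contaminated("What is the capital of France? Paris.", train_ngrams, N))  # True
print(is_contaminated("Name the largest moon of Saturn.", train_ngrams, N))       # False
```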
reacted to m-ric's post with 🔥 about 1 month ago
Now you can launch a code agent directly from your terminal!
✨ πšœπš–πš˜πš•πšŠπšπšŽπš—πš "πšˆπš˜πšžπš› πšπšŠπšœπš”" directly launches a CodeAgent
▢️ This also works with web agents (replace πšœπš–πš˜πš•πšŠπšπšŽπš—πš with πš πšŽπš‹πšŠπšπšŽπš—πš) thanks to @merve !

💾 Another treat from the smolagents 1.7.0 release:
Agents now have a memory mechanism, enabling many possibilities like replaying the last run with `agent.replay()`. Thank you @clefourrier!

Check the release notes here πŸ‘‰ https://github.com/huggingface/smolagents/releases/tag/v1.7.0
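For context, here is a minimal sketch of what both features look like in Python, assuming the smolagents v1.7 API (the model class and the example task are illustrative):

```python
# pip install smolagents
from smolagents import CodeAgent, HfApiModel

# Rough equivalent of running `smolagent "Your task"` in the terminal:
agent = CodeAgent(tools=[], model=HfApiModel())  # HfApiModel: default Inference API model
agent.run("How many seconds does it take a leopard at full speed to cross Pont des Arts?")

# New in 1.7.0: the agent keeps a memory of its steps,
# so the last run can be replayed step by step:
agent.replay()
```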
reacted to BrigitteTousi's post with ❤️ 2 months ago
Community fine-tuned models are more carbon efficient than the models they are derived from! 🥳🌿

@alozowski @clefourrier @SaylorTwift @albertvillanova evaluated the CO₂ emissions associated with model inference for over 3,000 models on the Open LLM Leaderboard. Interesting trends and new insights emerged... 👀

Blog Post: https://huggingface.co/blog/leaderboard-emissions-analysis

Leaderboard: open-llm-leaderboard/open_llm_leaderboard
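For a rough sense of how such inference emissions can be estimated, here is a back-of-the-envelope sketch, not the blog post's methodology; every constant below is an assumed placeholder:

```python
# Back-of-the-envelope CO2 estimate for inference: energy x carbon intensity.
# Every constant is an assumed placeholder, not a value from the analysis.

GPU_POWER_KW = 0.40      # assumed average draw of one inference GPU, in kW
PUE = 1.2                # assumed datacenter power usage effectiveness
GRID_INTENSITY = 0.38    # assumed grid carbon intensity, kg CO2e per kWh

def inference_co2_kg(gpu_hours: float) -> float:
    """Estimated kg CO2e for a given number of GPU-hours of inference."""
    energy_kwh = gpu_hours * GPU_POWER_KW * PUE
    return energy_kwh * GRID_INTENSITY

print(f"{inference_co2_kg(gpu_hours=5.0):.2f} kg CO2e")  # -> 0.91 kg CO2e
```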
reacted to fdaudens's post with ❤️ 3 months ago
Keeping up with open-source AI in 2024 = overwhelming.

Here's help: We're launching our Year in Review on what actually matters, starting today!

Fresh content dropping daily until year end. Come along for the ride - first piece out now with @clem 's predictions for 2025.

Think of it as your end-of-year AI chocolate calendar.

Kudos to @BrigitteTousi @clefourrier @Wauplin @thomwolf for making it happen. We teamed up with aiworld.eu for awesome visualizations to make this digestibleβ€”it's a charm to work with their team.

Check it out: huggingface/open-source-ai-year-in-review-2024
reacted to thomwolf's post with 🧠🔥 4 months ago
reacted to malhajar's post with 🔥 4 months ago
🇫🇷 Official launch of the OpenLLM French Leaderboard: an open-source initiative to serve as a reference for evaluating French-language LLMs

After a great deal of effort and sweat with Alexandre Lavallee, we are delighted to announce that the OpenLLMFrenchLeaderboard is live on Hugging Face (space url: le-leadboard/OpenLLMFrenchLeaderboard), the very first platform dedicated to evaluating large language models (LLMs) in French. 🇫🇷✨

This long-haul project is above all a labor of passion, but most of all an absolute necessity. It is becoming urgent and vital to push for more transparency in the strategic field of so-called multilingual LLMs. The first building block is therefore a systematic and systemic evaluation of current and future models.

Is your French AI model ready to stand out? Submit it in our space, and see how you compare against the other models.

❓ How it works:
Submit your French LLM for evaluation, and we will test it on reference benchmarks specifically adapted to the French language. Our benchmark suite includes:

- BBH-fr: complex reasoning
- IFEval-fr: instruction following
- GPQA-fr: advanced knowledge
- MUSR-fr: narrative reasoning
- MATH_LVL5-fr: mathematical abilities
- MMMLU-fr: multitask understanding

The process is still manual, but we are working on automating it, with the support of the Hugging Face community.

@clem, shall we get ready for a space upgrade? 😍👀

This is not just about numbers: it is about building an AI that truly reflects our language, our culture, and our values. The OpenLLMFrenchLeaderboard is our personal contribution to shaping the future of LLMs in France.
reacted to fdaudens's post with ❤️ 9 months ago
Look at that 👀

Current benchmarks have become too easy for recent models, much like grading high school students on middle school problems makes little sense. So the team worked on a new version of the Open LLM Leaderboard with new benchmarks.

Stellar work from @clefourrier @SaylorTwift and the team!

👉 Read the blog post: open-llm-leaderboard/blog
👉 Explore the leaderboard: open-llm-leaderboard/open_llm_leaderboard
reacted to alvdansen's post with 👍 9 months ago
I had a backlog of LoRA model weights for SDXL that I decided to prioritize and publish this weekend. I know many are using SD3 right now; still, if you have the time to try these, I hope you enjoy them.

I intend to start writing more fully on the thought process behind my approach to curating and training style and subject finetunes, beginning next week.

Thank you for reading this post! You can find the models on my page and I'll drop a few previews here.
reacted to fffiloni's post with ❤️ 10 months ago
🇫🇷
What is AI's impact on the film, audiovisual, and video game industries?
A forward-looking study for professionals
— CNC & BearingPoint | 09/04/2024

While Artificial Intelligence (AI) has long been used in the film, audiovisual, and video game sectors, the new applications of generative AI are upending our view of what a machine is capable of, and they carry an unprecedented potential for transformation. The quality of their output is striking, and they consequently spark many debates, caught between expectations and apprehensions.

The CNC has therefore decided to launch a new AI Observatory in order to better understand how AI is used and its real impact on the image industries. As part of this Observatory, the CNC set out to draw up a first overview through a mapping of current and potential uses of AI at each stage of the creation and distribution of a work, identifying the associated opportunities and risks, notably in terms of professions and employment. This CNC / BearingPoint study presented its main findings on March 6, during the CNC day "Créer, produire, diffuser à l'heure de l'intelligence artificielle" (creating, producing, and distributing in the age of artificial intelligence).

The CNC is publishing the expanded version of this mapping of AI uses in the film, audiovisual, and video game industries.

Link to the full mapping: https://www.cnc.fr/documents/36995/2097582/Cartographie+des+usages+IA_rapport+complet.pdf/96532829-747e-b85e-c74b-af313072cab7?t=1712309387891
reacted to alielfilali01's post with 🔥 10 months ago
reacted to giux78's post with ❤️ 10 months ago
@FinancialSupport and I just released a new version of the Italian LLMs leaderboard https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard
using the super useful https://huggingface.co/demo-leaderboard template from @clefourrier.
We've evaluated over 50 models (base, merged, fine-tuned, etc.) from:
- Major companies like Meta, Mistral, Google...
- University groups such as https://huggingface.co/sapienzanlp or https://huggingface.co/swap-uniba
- Italian companies like https://huggingface.co/MoxoffSpA, https://huggingface.co/FairMind or https://huggingface.co/raicrits
- Various communities and individuals
All models were tested on #Italian benchmarks #mmlu #arc-c #hellaswag, which we contributed to the open-source lm-evaluation-harness library from https://huggingface.co/EleutherAI.
Plus, you can now submit your model for automatic evaluation, thanks to computation sponsored by https://huggingface.co/seeweb.
Curious about the top Italian models? Check out the leaderboard and submit your model!

https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard
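As an aside, community evaluations like these can be scripted against the lm-evaluation-harness Python API. Below is a minimal sketch under assumptions: the model id is just an example, and the Italian task names should be checked against the harness's task registry:

```python
# pip install lm-eval
import lm_eval

# Hypothetical run: the model id is an example, and the Italian task
# names are assumptions; check the harness task registry for the real ones.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=some-org/some-italian-model",
    tasks=["arc_it", "hellaswag_it", "m_mmlu_it"],
    batch_size=8,
)
print(results["results"])
```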

reacted to singhsidhukuldeep's post with ❤️ 10 months ago
🎉 A new LLM is launched! 🚀
After checking whether it's open-source or not, 🤔
you rush to see the benchmarks... 🏃‍♂️💨

Which benchmark does everyone check first? 🔍

MMLU (Massive Multitask Language Understanding)? 📚

Benchmarks like MMLU are reaching saturation... and most of the time the performance does not translate to real-world use cases! 🌍❗

Meet MMLU-Pro, released by TIGER-Lab on @huggingface! 🐯🌍

🧪 12,217 questions across biology, business, chemistry, computer science, economics, engineering, health, history, law, mathematics, philosophy, physics, and psychology, carefully validated by humans 🧑‍🔬

🔟 Goes up to 10 options per question instead of 4; the extra options make the evaluation more realistic and reduce random guessing 🎯

📊 56% of questions come from MMLU, 34% from STEM websites, and the rest from TheoremQA and SciBench 📈

🤖 LLMs with weak chain-of-thought reasoning tend to score lower, indicating the benchmark is more challenging and more representative of real-world expectations 🧠💡

Any guess who tops it and who bombs it? 🤔📉📈

GPT-4o drops by 17 points (from 0.887 to 0.7149) 📉
Llama-3-70B drops by 27 points (from 0.820 to 0.5541) 📉

🔗 TIGER-Lab/MMLU-Pro
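If you want to poke at MMLU-Pro yourself, here is a minimal sketch using the datasets library; the split and field names ("question", "options", "answer") are assumptions based on the dataset card, so verify them before relying on this:

```python
# pip install datasets
from datasets import load_dataset

# Load MMLU-Pro (split and field names below are assumed from the dataset card).
ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")

sample = ds[0]
letters = "ABCDEFGHIJ"  # up to 10 options per question
prompt = sample["question"] + "\n" + "\n".join(
    f"{letters[i]}. {opt}" for i, opt in enumerate(sample["options"])
)
print(prompt)
print("Gold answer:", sample["answer"])
```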
reacted to georgewritescode's post with ❤️🔥 11 months ago
Excited to bring our benchmarking leaderboard of >100 LLM API endpoints to HF!

Speed and price are often just as important as quality when building applications with LLMs. We bring together all the data you need to weigh all three when picking a model and API provider.

Coverage:
‣ Quality (index of evals, MMLU, Chatbot Arena, HumanEval, MT-Bench)
‣ Throughput (tokens/s: median, P5, P25, P75, P95)
‣ Latency (TTFT: median, P5, P25, P75, P95)
‣ Context window
‣ OpenAI library compatibility

Link to Space: ArtificialAnalysis/LLM-Performance-Leaderboard

Blog post: https://huggingface.co/blog/leaderboard-artificial-analysis
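For intuition on how metrics like TTFT and throughput can be measured client-side, here is a simplified sketch, not Artificial Analysis's actual harness; the endpoint URL and model id are placeholders:

```python
# pip install openai
# Simplified client-side timing of TTFT and output throughput.
# Not the leaderboard's methodology; base_url and model are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="PLACEHOLDER")

start = time.perf_counter()
first_token_at = None
n_chunks = 0  # stream chunks roughly approximate output tokens

stream = client.chat.completions.create(
    model="placeholder-model",
    messages=[{"role": "user", "content": "Write one paragraph about rivers."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first token
        n_chunks += 1
end = time.perf_counter()

print(f"TTFT: {first_token_at - start:.2f}s")
print(f"~Throughput: {n_chunks / (end - first_token_at):.1f} chunks/s")
```

Percentiles such as P5 and P95 then come from repeating runs like this many times and aggregating.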
reacted to osanseviero's post with 🔥 11 months ago
Diaries of Open Source. Part 15 🤗

🕵️‍♀️ Idefics 2 is out, a multimodal open-source model with very nice capabilities
Models, demo, and datasets: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blog: https://hf.co/blog/idefics2

💾 Snowflake released snowflake-arctic-embed, a family of powerful small embedding models
Model: Snowflake/snowflake-arctic-embed-m
Blog: https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/

✨ Pile-T5, EleutherAI's T5 model trained on 2T tokens
Blog: https://blog.eleuther.ai/pile-t5/
Models: EleutherAI/pile-t5-65a76a0d0022dd270b385a66
GitHub: https://github.com/EleutherAI/improved-t5

🤖 CodeQwen1.5-7B base and chat models: trained on 3T tokens, with strong benchmark results for code generation, editing, and SQL
Blog post: https://qwenlm.github.io/blog/codeqwen1.5/
Demo: Qwen/CodeQwen1.5-7b-Chat-demo
Models: Qwen/CodeQwen1.5-7B and Qwen/CodeQwen1.5-7B-Chat

Misc
🦉 DocOwl1.5: Unified Structure Learning for OCR-free Document Understanding mPLUG/DocOwl
👀 Cerule - a tiny Vision LM model Tensoic/Cerule-v0.1
⚗️ ChemLLM - an LLM for chemistry and molecule science https://hf.co/AI4Chem/ChemLLM-7B-Chat-1.5-DPO
Distil Whisper Large
📝 New pdf/OCR datasets with 19 samples pixparse/pdf-document-ocr-datasets-660701430b0346f97c4bc628
🔥 Gretel AI high-quality text-to-sql synthetic dataset gretelai/synthetic_text_to_sql
reacted to isidentical's post with ❤️ 11 months ago
reacted to abidlabs's post with ❤️ 11 months ago
Open Models vs. Closed APIs for Software Engineers
-----------------------------------------------------------------------

If you're an ML researcher / scientist, you probably don't need much convincing to use open models instead of closed APIs -- open models give you reproducibility and let you deeply investigate the model's behavior.

But what if you are a software engineer building products on top of LLMs? I'd argue that open models are a much better option even if you are using them as APIs, for at least 3 reasons:

1) The most obvious reason is the reliability of your product. Relying on a closed API means your product has a single point of failure. On the other hand, at least 7 different API providers already offer Llama3 70B, as well as libraries that abstract over these providers, so a single request can be routed to different providers depending on availability/latency.

2) Another benefit is consistency if you eventually go local. If your product takes off, it will be more economical and lower latency to have a dedicated inference endpoint running in your VPC than to call external APIs. If you've started with an open-source model, you can always deploy the same model locally; you don't need to modify prompts or change any surrounding logic to get consistent behavior. Minimize your technical debt from the beginning.

3) Finally, open models give you much more flexibility. Even if you keep using APIs, you might want to tradeoff latency vs. cost, or use APIs that support batches of inputs, etc. Because different API providers have different infrastructure, you can use the API provider that makes the most sense for your product -- or you can even use multiple API providers for different users (free vs. paid) or different parts of your product (priority features vs. nice-to-haves)
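To make reason 1 concrete, here is a hedged sketch of failover across several OpenAI-compatible providers serving the same open model; the provider URLs, keys, and model id are all placeholders:

```python
# Failover across OpenAI-compatible providers serving the same open model.
# URLs, keys, and model id below are placeholders, not real endpoints.
from openai import OpenAI

PROVIDERS = [
    {"base_url": "https://provider-a.example/v1", "api_key": "KEY_A"},
    {"base_url": "https://provider-b.example/v1", "api_key": "KEY_B"},
]
MODEL = "meta-llama/Meta-Llama-3-70B-Instruct"  # same open weights everywhere

def chat(prompt: str) -> str:
    last_error = None
    for p in PROVIDERS:  # try providers in order: no single point of failure
        try:
            client = OpenAI(base_url=p["base_url"], api_key=p["api_key"])
            resp = client.chat.completions.create(
                model=MODEL,
                messages=[{"role": "user", "content": prompt}],
                timeout=30,
            )
            return resp.choices[0].message.content
        except Exception as exc:
            last_error = exc  # fall through to the next provider
    raise RuntimeError("All providers failed") from last_error

print(chat("Summarize the benefits of open models in one sentence."))
```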
reacted to alozowski's post with 🔥❤️ 11 months ago
Do I need to make it a tradition to post here every Friday? Well, here we are again!

This week, I'm happy to share that we have two official Mistral models on the Leaderboard! 🔥 You can check them out: mistralai/Mixtral-8x22B-Instruct-v0.1 and mistralai/Mixtral-8x22B-v0.1

The most exciting thing here? The mistralai/Mixtral-8x22B-Instruct-v0.1 model took first place among pretrained models with an impressive average score of 79.15! 🥇 Not far behind is Mixtral-8x22B-v0.1, in second place with an average score of 74.47! Well done, Mistral AI! 👏

Check out my screenshot here or explore it yourself at https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

The second piece of news: the CohereForAI/c4ai-command-r-plus model in 4-bit quantization got a great average score of 70.08. Cool stuff, Cohere! 😎 (and I also have a screenshot for this, don't miss it)

The last piece of news might seem small but is still significant: the Leaderboard frontpage now supports Python 3.12.1. This means we're on our way to speeding up the Leaderboard's performance! 🚀

If you have any comments or suggestions, feel free to tag me on X (Twitter) as well, I'll try to help – [at]ailozovskaya

Have a nice weekend! ✨