@burtenshaw on Hugging Face: "The open LLM leaderboard is completed, retired, dead, ‘ascended to a higher…"

burtenshaw

posted an update 2 days ago


The open LLM leaderboard is completed, retired, dead, ‘ascended to a higher plane’. And in its shadow we have an amazing range of leaderboards built and maintained by the community.

In this post, I just want to list some of those great leaderboards that you should bookmark for staying up to date:

- Chatbot Arena LLM Leaderboard is the first port of call for checking out the best model. It’s not the fastest because humans will need to use the models to get scores, but it’s worth the wait. lmarena-ai/chatbot-arena-leaderboard

- OpenVLM Leaderboard is great for getting scores on vision language models. opencompass/open_vlm_leaderboard

- Ai2 are doing a great job on RewardBench and I hope they keep it up because reward models are the unsexy workhorse of the field. allenai/reward-bench

- The GAIA leaderboard is great for evaluating agent applications. gaia-benchmark/leaderboard

🤩 This seems like such a sustainable way of building for the long term, where rather than leaning on a single company to evaluate all LLMs, we share the load.

ashercn97

2 days ago

Don't forget the MTEB leaderboard.

LeroyDyer

about 7 hours ago

In truth, I don't think these leaderboards have value!

They only contain the model providers' own models and evals, which are heavily pre-prepped!

The main leaderboard was the definitive leaderboard for all models!
The only thing was that it just needed to change evaluation datasets every few months.
The main leaderboard only accepted your first submission, so in truth it is valid!
So when newer models are evaluated, it is expected that they are up to date with current LLM philosophies and testing metrics!

As you will see, it is not the providers at the top of the leaderboard but the sub-developers in the open source community!

Again, there was a problem with the main leaderboard!!!

Only models of a specific size should have been on there! I.e. BIG MODELS SHOULD BE SEPARATED, and perhaps moved to a paid leaderboard, as evaluating these models does take resources and money!

I am not pleased they disbanded this most important resource despite its failures!!

Very disappointed that Hugging Face have even allowed it to fail!!!

This, I think, is very bad for the site!
The leaderboard should have been its own section of the site and not just a Space connected to Spaces!

Now we actually do not have a leaderboard to measure or compete with other models!

Only the fake clones, which are just for the faking of results, and the inability to compare your own LLM training with the so-called official figures!
Wow!

Completely unbelievable that you have also suggested these non-open-source sites to replace the original!

(Even the lists of best models produced by the leaderboards were sponsored!!!!!) (My model was actually the top Mistral 7B model... it did not feature on the best models list!! In fact, it was probably a friends-list-only thing!)

LeroyDyer

about 7 hours ago

I will say again: Hugging Face completely messed up there!
