I’m excited to share that I’ve completed the Hugging Face Agents Course and earned my certificate.
Over the past few months, I explored how to build intelligent, autonomous agents using cutting-edge tools like smolagents, LlamaIndex, and LangGraph. The course covered everything from the fundamentals of agents to advanced topics like fine-tuning for function-calling, observability, evaluation, and even agents in games.
Some key content included:
1. Introduction to AI Agents
2. Agentic RAG use cases
3. Multi-framework implementation: smolagents, LlamaIndex, and LangGraph
4. Building, testing, and certifying a complete agent project
This was a hands-on, practical experience that deepened my understanding of how to design reliable, tool-using LLM agents. Looking forward to leveraging these skills in real-world applications in healthcare, logistics, and beyond.
Many thanks to the Hugging Face team for putting this together. Let’s build safe and useful agents!
I am fascinated by models learning from prompts and rewards alone - no example answers needed, unlike in Supervised Fine-Tuning.
After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...
I wanted a different challenge, like 𝘁𝗲𝗮𝗰𝗵𝗶𝗻𝗴 𝗮 𝗺𝗼𝗱𝗲𝗹 𝘁𝗼 𝗰𝗿𝗲𝗮𝘁𝗲 𝗮 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗳𝗿𝗼𝗺 𝗮 𝗹𝗶𝘀𝘁 𝗼𝗳 𝗲𝘃𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝗽𝗿𝗶𝗼𝗿𝗶𝘁𝗶𝗲𝘀.
Choosing an original problem forced me to:
🤔 Think about the problem setting
🧬 Generate data
🤏 Choose the right base model
🏆 Design reward functions (and experience reward hacking)
🔄 Run multiple rounds of training, hoping the model would learn something.
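Reward design for GRPO typically means several small verifiable functions scored per completion. Below is a minimal sketch of what rewards for the scheduling task could look like; the tag format, priority labels, and helper names are my assumptions, not the author's actual setup.

```python
# Hypothetical reward functions for a schedule-generation task trained
# with GRPO. Each returns a score in [0, 1] for one completion.

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps its answer in a <schedule> block (assumed format)."""
    return 1.0 if "<schedule>" in completion and "</schedule>" in completion else 0.0

def priority_reward(scheduled, priorities) -> float:
    """Fraction of high-priority events that made it into the schedule."""
    high = [event for event, level in priorities.items() if level == "high"]
    if not high:
        return 1.0
    return sum(event in scheduled for event in high) / len(high)

def no_overlap_reward(intervals) -> float:
    """1.0 if no two scheduled (start, end) intervals overlap, else 0.0."""
    ordered = sorted(intervals)
    return 1.0 if all(a_end <= b_start
                      for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])) else 0.0
```

Keeping each reward narrow and independently checkable makes reward hacking easier to spot: when one score saturates while the others stay flat, the model has usually found a shortcut.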
In this work, we tackle major challenges in Arabic multi-label emotion classification, especially the class imbalance and label correlation issues that often hurt model performance, particularly for minority emotions.
Our approach:
Stacked contextual embeddings from fine-tuned ArabicBERT, MarBERT, and AraBERT models.
A meta-learning strategy that builds richer representations.
A hybrid loss function combining class weighting, label correlation matrices, and contrastive learning to better handle class imbalances.
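To make the hybrid loss idea concrete, here is a minimal sketch of its general shape: class-weighted binary cross-entropy plus a penalty that pulls predictions of correlated labels together. The weights, penalty form, and scaling factor are my assumptions for illustration; the paper's actual formulation may differ.

```python
import math

def weighted_bce(probs, labels, class_weights):
    """Class-weighted binary cross-entropy over one multi-label example;
    larger weights upweight rare (minority) emotion classes."""
    eps = 1e-7
    total = 0.0
    for p, y, w in zip(probs, labels, class_weights):
        p = min(max(p, eps), 1 - eps)  # clamp for numerical stability
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def correlation_penalty(probs, corr):
    """Penalize disagreement between predictions of co-occurring labels,
    using a label correlation matrix `corr`."""
    total = 0.0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            total += corr[i][j] * (probs[i] - probs[j]) ** 2
    return total

def hybrid_loss(probs, labels, class_weights, corr, lam=0.1):
    """Assumed combination: weighted BCE + lam * correlation term."""
    return weighted_bce(probs, labels, class_weights) + lam * correlation_penalty(probs, corr)
```

In a real training setup these terms would operate on logits in a deep learning framework, with the contrastive component added as a third term over embedding pairs.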
🔍 Extensive experiments show significant improvements across Precision, Recall, F1-Score, Jaccard Accuracy, and Hamming Loss. 🌟 The hybrid loss function in particular helped close the gap between majority and minority classes!
We also performed ablation studies to break down each component's contribution, and the results consistently validated our design choices.
This framework isn't just for Arabic: it offers a generalizable path for improving multi-label emotion classification in other low-resource languages and domains.
Big thanks to my co-authors: Muhammad Azeem Aslam, Wang Jun, Nisar Ahmed, Li Yanan, Hu Hongfei, Wang Shiyu, and Xin Liu!
When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it? Spoiler: Wordle turned out to be a surprisingly effective benchmark. So Romain Cosentino Ph.D. and I dug in and analyzed the results of several hundred runs.
🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks.
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉
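Part of what makes Wordle a clean benchmark is that each guess can be scored mechanically. Here is a minimal feedback function of the kind such a harness needs; this is my own sketch, not the evaluation code we actually used.

```python
def wordle_feedback(guess: str, answer: str) -> str:
    """Per-letter feedback: G = green (right spot), Y = yellow (wrong spot),
    - = gray (absent), handling repeated letters the way Wordle does."""
    feedback = ["-"] * len(guess)
    remaining = list(answer)
    # First pass: exact position matches consume their letter.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
            remaining.remove(g)
    # Second pass: right letter, wrong position, from leftover letters.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and g in remaining:
            feedback[i] = "Y"
            remaining.remove(g)
    return "".join(feedback)
```

Checking whether an agent's stated interpretation of the colored tiles matches this ground truth is exactly where the perception failures show up.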
We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!
Another impressive model that joined the ranking today is ALLaM-AI/ALLaM-7B-Instruct-preview. After a long wait, ALLaM is finally here, and it is IMPRESSIVE given its size!
Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥
> Three new models: 3B, 10B, and 28B, at resolutions 224 and 448 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯
🚀 Excited to share our technical report on the Southeast Asian multilingual model Sailor2 and its latest updates!
Our 49-page report details Sailor2's development journey, including multilingual data cleaning, small-model data mixture simulations, multi-stage continual pre-training, multi-stage post-training, and multi-cultural, multi-lingual evaluations. Sailor2 aims to streamline multilingual model pre-training for the community.
🧭 We highlight Sailor2's impressive performance in low-resource language translation scenarios and its cultural understanding advantages in Southeast Asia, promoting practical applications for regional languages.
Model updates include:
💡 More precise outputs: reduced redundancy in model outputs through refined post-training data and optimization techniques.
🌈 Handling longer texts: expanded to a 128K context length in Southeast Asian languages through long-text training.
⚡️ Faster inference: 2.5x faster inference speed with speculative decoding.
🌪️ More model sizes: new 3B and 14B sizes introduced through model pruning.
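The speculative-decoding speedup works by letting a cheap draft model propose several tokens that the target model then verifies in one pass, keeping the longest accepted prefix. Below is a toy illustration of that accept/reject loop; the two models are stand-in stub functions, not Sailor2's actual setup.

```python
def draft_tokens(prefix, k=4):
    """Stub draft model: propose k candidate tokens after the prefix."""
    return [f"tok{i}" for i in range(len(prefix), len(prefix) + k)]

def target_accepts(prefix, token):
    """Stub target-model check (here: accept even-numbered tokens)."""
    return int(token[3:]) % 2 == 0

def speculative_step(prefix, k=4):
    """Keep the longest draft prefix the target model agrees with."""
    accepted = []
    for tok in draft_tokens(prefix, k):
        if target_accepts(prefix + accepted, tok):
            accepted.append(tok)
        else:
            break  # a real implementation would resample this token
                   # from the target model before stopping
    return prefix + accepted
```

When the draft model agrees with the target often (as with a pruned 3B draft of a larger sibling), several tokens are committed per target-model forward pass, which is where the speedup comes from.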
🌟 All models are Apache-licensed for commercial use; development tools (code, resources) are open-source.
🚀 HuggingFace Spaces Ranking Tracker - Your Complete AI Trend Analytics!
Introducing the Spaces Ranking Tracker, a comprehensive analytics dashboard that tracks and analyzes every AI application in the HuggingFace ecosystem.
✨ Key Features:
• Real-time tracking of daily ranking changes over 30 days
• Detailed analysis of top 100 trending spaces
• User-based integrated score visualization
• One-click access to space details
• Interactive rank change graphs
📊 Dashboard Components:
1. Main Dashboard
- Daily rank trend graphs
- Top 20 creators' combined score chart
- Detailed space information cards
- Real-time trending score updates
2. Space Detailed Analysis
- Creation date, current rank, and trending score
- 30-day ranking history
- Direct space access
- Custom color coding for intuitive rank display
3. Interactive Features
- Custom filtering options
- Sorting by various metrics
- Detailed performance statistics
- Comprehensive trending scores
- Historical data tracking
🎯 How to Use:
• Monitor latest AI community trends
• Track your project's performance
• Discover popular AI demos
• Analyze competing projects
• Follow AI ecosystem dynamics
Stay on top of every movement in the HuggingFace ecosystem with daily ranking updates! 👉 Try it now!
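A "creators' combined score" chart boils down to a per-author aggregation over per-space scores. Here is a hypothetical sketch of that step; the field names and scoring formula are my assumptions, not the tracker's actual implementation.

```python
from collections import defaultdict

def combined_scores(spaces):
    """Sum each creator's per-space trending scores, ranked descending."""
    totals = defaultdict(float)
    for space in spaces:
        totals[space["author"]] += space["trending_score"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative data only (not real rankings):
spaces = [
    {"author": "alice", "trending_score": 12.0},
    {"author": "bob", "trending_score": 7.5},
    {"author": "alice", "trending_score": 3.0},
]
leaderboard = combined_scores(spaces)  # alice totals 15.0 and ranks first
```

In the live dashboard the input would come from the Hub's space listings rather than a hard-coded list.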
There's so much you could do with these developments. Especially combining them together into agentic applications or fine-tuning them on your use case.
DeepSeek-R1 & DeepSeek-R1-Zero: two 660B reasoning models are here, alongside 6 distilled dense models (based on Llama & Qwen) for the community! deepseek-ai/DeepSeek-R1
✨ MIT License: enabling distillation for custom models
✨ 32B & 70B models match OpenAI o1-mini in multiple capabilities
✨ API live now! Access Chain of Thought reasoning with model='deepseek-reasoner'