
Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

Organizations

MLX Community

Jaward's activity

posted an update 3 days ago
This is the most exciting of this week's releases for me: Gemini Robotics, a SOTA generalist Vision-Language-Action model that brings intelligence to the physical world. It comes with an Embodied Reasoning QA benchmark of verifiable real-world knowledge. The cool part is that the model can be specialized for new tasks through fast adaptation, and those adaptations transfer to new robot embodiments like humanoids. Looking forward to the model and data on HF; it's about time I go full physical :)
Technical Report: https://storage.googleapis.com/deepmind-media/gemini-robotics/gemini_robotics_report.pdf
posted an update 5 days ago
Super Interesting Paper!
Proposes neural networks (convolutional recurrent neural networks, CRNNs) that can learn to produce traveling waves in their hidden state in response to visual stimuli, enabling the transfer and integration of spatial information across neural connections. In other words, they show that neural networks can exhibit wave-like dynamics that blend and process visual information over time (toy sketch below). It's cool to see a union of AI and physics in this way.
Paper: https://arxiv.org/pdf/2502.06034
Code: https://github.com/KempnerInstitute/traveling-waves-integrate
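A toy sketch of the core idea, assuming a simple 1D convolutional recurrence (my own illustration, not the paper's code): because the hidden state is a spatial map updated with a local convolution, an input impulse spreads across positions step by step, like a traveling wave.

```python
import torch
import torch.nn as nn

class ToyConvRNN(nn.Module):
    # Toy 1D convolutional RNN: the hidden state is a spatial map updated by a
    # local convolution, so activity propagates across positions over time.
    def __init__(self, channels=8, width=32):
        super().__init__()
        self.inp = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.rec = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.channels, self.width = channels, width

    def forward(self, x_seq):  # x_seq: (T, B, 1, width) sequence of 1D "visual" inputs
        h = torch.zeros(x_seq.size(1), self.channels, self.width)
        states = []
        for x_t in x_seq:
            h = torch.tanh(self.inp(x_t) + self.rec(h))  # info moves ~1 position per step
            states.append(h)
        return torch.stack(states)  # (T, B, C, width) hidden-state "wave field"

x = torch.zeros(20, 1, 1, 32)
x[0, 0, 0, 16] = 1.0          # single impulse at the center
waves = ToyConvRNN()(x)       # activity spreads outward over the 20 steps
```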
posted an update 7 days ago
posted an update 16 days ago
replied to their post 25 days ago

Bro, if you had read the repo you would see that this implementation is for educational purposes; it's not done because it's easy. Not to mention Unsloth uses TRL's GRPO trainer, which is super slow on CPU and doesn't scale for models under 500M params; I tried it on both CPU and GPU. This custom implementation cuts most of the heavy lifting, allowing you to train and scale faster even on CPU, plus a bunch of custom configs with a simplified GRPO trainer in under 500 lines of code. There's a lot one can learn from it.

posted an update 27 days ago
Finally, here it is: a faster, custom, scalable GRPO trainer for smaller models with under 500M params. It can train on an 8GB-RAM CPU and also supports GPU for sanity's sake (includes support for vLLM + Flash Attention). Uses smolLM2-135M/360M-Instruct as ref & base models. Experience your own "aha" moment 🐳 on 8GB of RAM.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb
posted an update about 1 month ago
ByteDance drops OmniHuman 🔥
This is peak SOTA performance: flawless natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA-level talking heads, only this time with hands and body motion.
Project: https://omnihuman-lab.github.io/
posted an update about 1 month ago
The beauty of GRPO is that it doesn't care whether the rewards are rule-based or learned. The hack: let the data self-normalize. Trajectories in a batch compete against their mean, no value model, no extra params, just clean, efficient RL that cuts memory usage by ~50% while maintaining SOTA performance. Btw, it was introduced 9 months prior to R1: arxiv.org/pdf/2402.03300
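A minimal sketch of that group-normalization trick (hypothetical tensors, not the exact code from my trainer above): each completion for a prompt is scored, then z-scored against its own group, so no value model is needed.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    # rewards: (num_prompts, group_size), rule-based or learned, it doesn't matter.
    # Each completion competes against its group's mean instead of a critic's estimate.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # z-scored within each prompt's group

# four completions for one prompt, scored by a rule-based reward
adv = group_relative_advantages(torch.tensor([[1.0, 0.0, 0.5, 1.0]]))
```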
reacted to mlabonne's post with 🧠 about 2 months ago
🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
posted an update 2 months ago
posted an update 2 months ago
posted an update 2 months ago
Damn, I love NVIDIA's bullish stance on taking AI to the edge: from being the overlord of compute to cutting-edge physical AI, with SOTA multiverse simulation engines that bring the scaling laws under your control!!

My favorite: Cosmos, a fully open-sourced, open-weight, physics-based video gen platform. What an incredible way to start off the year ✨

Code: https://github.com/NVIDIA/Cosmos
Models: nvidia/cosmos-6751e884dc10e013a0a0d8e6
Paper: https://d1qx31qr3h6wln.cloudfront.net/publications/NVIDIA%20Cosmos_2.pdf
posted an update 3 months ago
nanoBLT: a simplified, lightweight implementation of a character-level Byte Latent Transformer (under 500 lines of code). The model is 2x4x2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder), trained on ~1M bytes of Tiny Shakespeare with a patch size of 4.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb
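As a rough illustration of the entry point (hypothetical helper, not the notebook's exact code), this is what grouping raw bytes into patches of size 4 looks like before the latent transformer sees them:

```python
import torch

def patch_bytes(text: str, patch_size: int = 4) -> torch.Tensor:
    # Group raw UTF-8 bytes into fixed-size patches for the byte-level encoder.
    data = list(text.encode("utf-8"))
    data += [0] * ((-len(data)) % patch_size)       # pad to a multiple of patch_size
    return torch.tensor(data).view(-1, patch_size)  # (num_patches, patch_size)

patches = patch_bytes("To be, or not to be")        # each row is one 4-byte patch
```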
replied to their post 3 months ago

Btw, the background songs in the videos are actually what I listen to while implementing.

posted an update 3 months ago
Implements, from first principles, a discrete flow matching model for code generation. Trained a small 2D DFM model on two variations of binary search code. The result was amazing; code below:
Code: https://github.com/Jaykef/ai-algorithms/blob/main/dfm.ipynb
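For a sense of the training step, here is a toy sketch assuming a mask-based probability path (MASK_ID, VOCAB_SIZE and the model signature are hypothetical; the notebook's details differ): corrupt the clean tokens toward a mask token according to the sampled time, then train the model to recover them.

```python
import torch
import torch.nn.functional as F

MASK_ID, VOCAB_SIZE = 0, 128  # hypothetical mask token id and token vocab size

def dfm_training_step(model, x1):
    # x1: (B, L) clean code tokens; corrupt them along a mask-based path,
    # then train the model to predict the clean tokens at that noise level.
    t = torch.rand(x1.size(0), 1)                        # time in [0, 1]
    keep = torch.rand(x1.shape) < t                      # keep more tokens as t -> 1
    x_t = torch.where(keep, x1, torch.full_like(x1, MASK_ID))
    logits = model(x_t, t.squeeze(1))                    # (B, L, VOCAB_SIZE)
    return F.cross_entropy(logits.reshape(-1, VOCAB_SIZE), x1.reshape(-1))
```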
posted an update 3 months ago
In Honour of This Year's NeurIPS Test of Time Paper Awardees
This year's NeurIPS Test of Time Paper Awards went to two groundbreaking papers:
1. Generative Adversarial Nets (Goodfellow et al.)
2. Sequence to Sequence Learning with Neural Networks (Sutskever et al.)
Let's explore how these papers helped pioneer breakthroughs in today's AI:

Full Article: https://huggingface.co/blog/Jaward/nip
posted an update 3 months ago
Lightweight implementation of the seminal paper "Sequence to Sequence Learning with Neural Networks"

Built, trained and evaluated a 2-layer-deep seq2seq LSTM-based model (~10M params) on the German-English corpus of the Multi30k dataset, in honor of Ilya Sutskever et al. winning this year's NeurIPS Test of Time paper award 🫡

Code: https://github.com/Jaykef/ai-algorithms/blob/main/seq2seq.ipynb
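A minimal sketch of that encoder-decoder setup (hypothetical vocab sizes and dims; see the notebook for the real config): the encoder compresses the source into a final state that seeds the decoder.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # 2-layer LSTM encoder summarizes the German source into its final (h, c) state,
    # which initializes a 2-layer LSTM decoder that emits the English translation.
    def __init__(self, src_vocab, trg_vocab, emb=256, hid=512, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.trg_emb = nn.Embedding(trg_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, layers, batch_first=True)
        self.out = nn.Linear(hid, trg_vocab)

    def forward(self, src, trg):
        _, state = self.encoder(self.src_emb(src))            # summarize the source
        dec_out, _ = self.decoder(self.trg_emb(trg), state)   # teacher forcing on target
        return self.out(dec_out)                              # logits over target vocab

model = Seq2Seq(src_vocab=8000, trg_vocab=6000)
logits = model(torch.randint(0, 8000, (2, 12)), torch.randint(0, 6000, (2, 10)))
```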
posted an update 3 months ago
Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation

As a young researcher, I've often pondered the limitations of backpropagation, especially when compared with how learning occurs in the human brain. While backpropagation has been the workhorse of deep learning, it isn't without flaws. In this post, I aim to share some thoughts on these shortcomings from first principles.

Full Article: https://huggingface.co/blog/Jaward/rethinking-backpropagation
posted an update 4 months ago
Implements the compute-efficient DeepPCR algorithm, which parallelizes sequential operations, speeding up inference and training of neural networks. DeepPCR can significantly reduce the time complexity of operations such as denoising in latent diffusion space from O(L) to O(log2 L) (toy illustration below).

Code: https://github.com/Jaykef/ai-algorithms/blob/main/deep_pcr.ipynb
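DeepPCR itself recasts the sequential steps as a joint system solved with Parallel Cyclic Reduction; as a hedged toy illustration of how L sequential steps can collapse to ~log2(L) parallel passes, here is a log-depth scan for the linear recurrence h_t = a_t * h_(t-1) + b_t (my own example, not the paper's code).

```python
import torch

def sequential_recurrence(a, b):
    # naive O(L) loop: h_t = a_t * h_(t-1) + b_t, with h_(-1) = 0
    h, prev = torch.zeros_like(b), torch.tensor(0.0)
    for t in range(len(b)):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h

def parallel_recurrence(a, b):
    # log-depth scan: compose the affine maps (a_t, b_t) pairwise with doubling
    # strides, resolving the whole sequence in ~log2(L) passes instead of L steps
    a, b = a.clone(), b.clone()
    shift = 1
    while shift < len(b):
        a_prev = torch.cat([torch.ones(shift), a[:-shift]])   # identity map as padding
        b_prev = torch.cat([torch.zeros(shift), b[:-shift]])
        a, b = a * a_prev, a * b_prev + b                     # compose with earlier map
        shift *= 2
    return b

a, b = torch.rand(16), torch.rand(16)
assert torch.allclose(sequential_recurrence(a, b), parallel_recurrence(a, b), atol=1e-4)
```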
posted an update 4 months ago