Jaward Sesay's picture

Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

Organizations

MLX Community's profile picture

Jaward's activity

posted an update 3 days ago
view post
Post
1604
This is the most exciting of this week’s release for me: Gemini Robotics - A SOTA generalist Vision-Language-Action model that brings intelligence to the physical world. It comes with a verifiable real-world knowledge Embodied Reasoning QA benchmark. Cool part is that the model can be specialized with fast adaptation to new tasks and have such adaptations transferred to new robot embodiment like humanoids. Looking forward to the model and data on hf, it’s about time I go full physical:)
Technical Report: https://storage.googleapis.com/deepmind-media/gemini-robotics/gemini_robotics_report.pdf
posted an update 5 days ago
view post
Post
1957
Super Interesting Paper!
Proposes neural networks (CRNNs) that can learn to produce traveling waves in their hidden state in response to visual stimuli, thus enabling the transfer and integration of spatial information across neural connections. In other words they showed that neural networks have wave-like properties that blends and processes visual information over time, cool seeing a union of AI and physics in this way.
Paper: https://arxiv.org/pdf/2502.06034
Code: https://github.com/KempnerInstitute/traveling-waves-integrate
posted an update 7 days ago
posted an update 16 days ago
replied to their post 25 days ago
view reply

bro if you had read the repo you would see that this implementation is for educational purpose, it's not done because it's easy. Not to mention unsloth is using trl's GRPO trainer which is super slow on cpu and does not scale for models under 500M params, I tried it both on cpu and gpu. This custom implementation cuts most of the heavy lifting allowing you to train and scale faster even on cpu, plus a bunch of custom configs with a simplified GRPO trainer in under 500 lines of code. There's a lot one can learn from it.

posted an update 27 days ago
view post
Post
3870
Finally here it is: a faster, custom, scalable GRPO trainer for smaller models with < 500M params, can train on 8gb ram cpu, also supports gpu for sanity sake (includes support for vllm + flash attention). Using smolLM2-135M/360M-instructs as ref & base models. Experience your own “aha” moment 🐳 on 8gb ram.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb
  • 2 replies
·
posted an update about 1 month ago
view post
Post
3470
ByteDance drops OmniHuman🔥
This is peak SOTA performance - flawless natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA level talking-heads only this time with hands and body motion.
Project: https://omnihuman-lab.github.io/
·