Hugging Face
subaru kimura
joddy
0 followers · 14 following
AI & ML interests
None yet
Recent Activity
liked a model 12 days ago
Qwen/Qwen3-0.6B
liked a dataset about 1 month ago
Muennighoff/flores200
reacted to m-ric's post with 👍 2 months ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years on a single GPU. 👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons"

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months. This required parallelizing across 4 dimensions: data, tensor, context, pipeline. And it is infamously hard to do, making for bloated code repos that hold together only by magic.

🤏 But now we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. One team has built Nanotron, already widely used in industry. And now a team releases Picotron, a radical approach that codes 4D parallelism in just a few hundred lines of code, a real engineering feat, making it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful: Counting in MFU (Model FLOPs Utilization, the fraction of the hardware's peak compute that the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this.)

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
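The arithmetic in the post above can be checked with a short Python sketch. It converts GPU-hours into single-GPU years and into wall-clock days on a cluster, and it shows one common way to estimate MFU. Several things here are assumptions that do not come from the post: perfect linear scaling across GPUs, the widely used approximation of 6 FLOPs per parameter per token for training (which ignores attention FLOPs), a parameter count of 1.7e9 read off the SmolLM-1.7B name, and an H100 peak of roughly 989 TFLOPS for dense BF16. The function names are hypothetical and are not part of Picotron.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years

# Assumption: approximate dense BF16 peak of one H100 SXM, in FLOP/s.
H100_PEAK_BF16_FLOPS = 989e12


def gpu_hours_to_years(gpu_hours: float) -> float:
    """Years a single GPU would need to deliver this many GPU-hours."""
    return gpu_hours / HOURS_PER_YEAR


def wall_clock_days(gpu_hours: float, n_gpus: int) -> float:
    """Wall-clock days on n_gpus, assuming perfect linear scaling."""
    return gpu_hours / n_gpus / 24


def mfu(tokens_per_second: float, n_params: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization under the 6 * N FLOPs-per-token approximation."""
    achieved_flops = 6 * n_params * tokens_per_second
    return achieved_flops / (n_gpus * peak_flops_per_gpu)


def tokens_per_second_for_mfu(target_mfu: float, n_params: float,
                              n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Inverse of mfu(): throughput implied by a given utilization."""
    return target_mfu * n_gpus * peak_flops_per_gpu / (6 * n_params)


if __name__ == "__main__":
    print(round(gpu_hours_to_years(39e6)))          # 4452
    print(round(wall_clock_days(39e6, 24_000), 1))  # 67.7
    # Throughput that ~50% MFU would imply for a 1.7e9-parameter model on
    # 8 GPUs. This is an illustration, not a measured Picotron result.
    print(tokens_per_second_for_mfu(0.5, 1.7e9, 8, H100_PEAK_BF16_FLOPS))
```

Under these assumptions, 39 million GPU-hours works out to about 4.5 thousand single-GPU years and a little over two months on 24k GPUs, which matches the orders of magnitude quoted in the post. Real runs take longer than the ideal figure because scaling is never perfectly linear.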
Organizations
joddy's activity
liked a model 12 days ago
Qwen/Qwen3-0.6B • Text Generation • Updated 13 days ago • 337k • 226
liked a dataset about 1 month ago
Muennighoff/flores200 • Updated Jan 7, 2024 • 1.45k • 15
liked a model 4 months ago
microsoft/phi-4 • Text Generation • Updated Feb 24 • 547k • 2.03k
liked 7 models 6 months ago
Qwen/Qwen2-VL-2B-Instruct • Image-Text-to-Text • Updated Jan 12 • 734k • 421
allenai/Molmo-7B-D-0924 • Image-Text-to-Text • Updated Apr 4 • 31.4k • 525
nvidia/NVLM-D-72B • Image-Text-to-Text • Updated Jan 14 • 15.5k • 770
google/gemma-2-9b • Text Generation • Updated Aug 7, 2024 • 61.7k • 654
microsoft/Florence-2-large • Image-Text-to-Text • Updated Dec 8, 2024 • 422k • 1.54k
gpt-omni/mini-omni2 • Any-to-Any • Updated Oct 24, 2024 • 178 • 271
vikhyatk/moondream1 • Text Generation • Updated Feb 7, 2024 • 72.7k • 486
liked a model 7 months ago
foduucom/web-form-ui-field-detection • Object Detection • Updated Sep 8, 2023 • 48
liked a model 8 months ago
Qwen/Qwen2.5-72B-Instruct • Text Generation • Updated Jan 12 • 152k • 816
liked a model 10 months ago
lmms-lab/LLaVA-NeXT-Video-7B-DPO • Video-Text-to-Text • Updated Feb 21 • 1.43k • 27
liked a model 11 months ago
Unbabel/XCOMET-XXL • Translation • Updated Apr 7 • 33
liked a dataset 12 months ago
HuggingFaceFW/fineweb • Viewer • Updated Jan 31 • 25B • 855k • 2.14k
liked 2 datasets about 1 year ago
Anthropic/hh-rlhf • Viewer • Updated May 26, 2023 • 169k • 15k • 1.33k
allenai/real-toxicity-prompts • Viewer • Updated Sep 30, 2022 • 99.4k • 2.76k • 83
liked a model over 1 year ago
google-t5/t5-11b • Translation • Updated Jan 2, 2023 • 200k • 62
liked a Space almost 2 years ago
Controlnet Segment Anything 😻 • Runtime error • 190
liked a model almost 2 years ago
facebook/sam-vit-huge • Feature Extraction • Updated Jan 11, 2024 • 474k • 165