subaru kimura

joddy

AI & ML interests

None yet

Recent Activity

liked a model 12 days ago
Qwen/Qwen3-0.6B
liked a dataset about 1 month ago
Muennighoff/flores200
reacted to m-ric's post with 👍 2 months ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

👴🏻 If they had needed all that time, we would have GPU stories from the days of the Pharaohs 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates; this shall delay the building of your computing temple by many moons."

🛠️ But instead, they parallelized the training across 24k H100s, which brought it down to just a few months. This required parallelizing across 4 dimensions: data, tensor, context, and pipeline. It is infamously hard to do, making for bloated code repos that hold together only by magic.

🤏 But now we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues went in the other direction, towards tiny 4D-parallelism libs. One team built Nanotron, already widely used in industry. And now a team releases Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code, a real feat of engineering that makes it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful: measured in MFU (Model FLOPs Utilization, how much of the available compute the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what the huge libs reach. (Caution: the team is running further benchmarks to verify this.)

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
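To make the post's numbers concrete, here is a minimal back-of-the-envelope sketch in plain Python. The GPU-hour and GPU-count figures come from the post itself; the 4D mesh factorization at the end is purely hypothetical (not taken from Picotron's code) and only illustrates how the data/tensor/context/pipeline dimensions multiply out to a total GPU count.

```python
# Back-of-the-envelope check of the post's numbers (illustrative only).

GPU_HOURS = 39_000_000     # GPU-hours quoted for Llama-3.1-405B
NUM_GPUS = 24_000          # H100 count quoted in the post
HOURS_PER_YEAR = 24 * 365

# On a single GPU, training would take roughly 4.5 thousand years.
single_gpu_years = GPU_HOURS / HOURS_PER_YEAR
print(f"single GPU: ~{single_gpu_years:,.0f} years")   # ~4,452 years

# Spread over 24k GPUs (assuming perfect scaling), it drops to a few months.
wall_clock_days = GPU_HOURS / NUM_GPUS / 24
print(f"24k GPUs:   ~{wall_clock_days:,.0f} days")      # ~68 days

# A 4D-parallel job is laid out as data x tensor x context x pipeline.
# The sizes below are hypothetical, chosen only to show that the four
# dimensions multiply out to (roughly) the total number of GPUs.
dp, tp, cp, pp = 192, 8, 1, 16
print(f"mesh size:  {dp * tp * cp * pp} GPUs")          # 24,576 GPUs
```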

Organizations

skim
