
rwightman's activity

I've got my hands on an AMD Instinct MI100. It's about the same price used as a V100, but on paper it has more TFLOPS (V100 14 vs MI100 23), and its HBM has a faster clock, so memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).
For LoRA training, in this quick test I could not make the bnb config work, so I'm running the FT on the full-size model.
I'll share all the install, setup, and settings I've learned in a blog post, together with the cooling shroud 3D design.
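For reference, a typical bnb 4-bit setup for LoRA looks something like the sketch below. The exact config from this test isn't shown, so the model id and LoRA hyperparameters here are placeholders; bitsandbytes has long been CUDA-first, so ROCm support may well be the sticking point.

```python
# Minimal QLoRA-style setup (a sketch, not the config used in the test above).
# Assumes transformers, peft, and bitsandbytes are installed and working on
# this platform; the model id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # MI100 has native bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters train
```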

Yeah, it's 112 for the PCIe V100 and 125 for the SXM, I think. One thing about the MI100 and other MIxx chip specs I was never clear on: whether their float16 'matrix' numbers are float16 matrix multiply with float32 accumulate (which is what you'd want). The datacenter NVIDIA chips' 'tensor core' FLOPS are usually float32 accumulate (unless it's a gamer card, in which case that rate is halved).
The MI100 does have native bfloat16, which is a big win over the V100.
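A quick probe from PyTorch can at least confirm what the build reports for the device; it can't see inside the matrix units, so the accumulate question itself stays open. A sketch:

```python
# Probe device capabilities as reported by this PyTorch build (works the same
# on ROCm builds, which also expose the torch.cuda.* namespace). This does NOT
# reveal the accumulation precision of the matrix units.
import torch

print(torch.cuda.get_device_name(0))
print(torch.cuda.is_bf16_supported())  # native bfloat16 on MI100

# Whether fp16 matmuls are *allowed* to accumulate in reduced precision
# (on NVIDIA tensor cores the default is float32 accumulate):
print(torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction)

# An fp16 matmul under autocast; the output dtype is fp16 either way, the
# accumulation precision is internal to the hardware/library.
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b
print(c.dtype)
```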
I do feel, though, that you're getting good TFLOPS/$ here because AMD hasn't been that successful at competing with NVIDIA on the full system offering (chips + driver/software). I've really, really wanted this to change, but AMD keeps frustrating... how are you finding it so far in terms of issues / crashes / head banging? :) Hopefully things have been improving.

FWIW, the MI100 was released after the A100, three years after the V100... that says something :) Also, it's the matrix / tensor core mixed- or reduced-precision FLOPS that are of interest, not the float32 FLOPS, which is where the 14 & 23 numbers come from.
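To see which number matters in practice, a rough throughput probe like the one below (illustrative only, not a careful benchmark; it counts 2·N³ FLOPs per N×N matmul) should show the fp16/bf16 matrix rates sitting well above the float32 peak on any card with matrix/tensor units:

```python
# Rough matmul throughput probe; warmup + timed loop, counting a multiply-add
# as 2 FLOPs. Results are indicative only.
import time
import torch

def matmul_tflops(dtype, n=8192, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(3):  # warmup
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / iters
    return 2 * n**3 / dt / 1e12

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    print(dtype, f"{matmul_tflops(dtype):.1f} TFLOPS")
```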


timm/ViT-gopt-16-SigLIP2-384
timm/ViT-gopt-16-SigLIP2-256
timm/ViT-SO400M-16-SigLIP2-512
timm/ViT-SO400M-16-SigLIP2-384
timm/ViT-SO400M-16-SigLIP2-256
timm/ViT-SO400M-14-SigLIP2-378