
rwightman's activity

I've got my hands on an AMD Instinct MI100. It's about the same price used as a V100, but on paper it has more TFLOPS (V100 14 vs MI100 23), and its HBM has a faster clock, so memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).
For LoRA training, in this quick test I could not make the bnb config work, so I'm running the FT on the full-size model.
I'll share all the install, setup, and settings I've learned in a blog post, together with the cooling shroud 3D design.
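For reference, a typical bnb 4-bit setup for LoRA looks something like the sketch below. The exact config from this test isn't shown, so the model id and LoRA hyperparameters here are placeholders; bitsandbytes has long been CUDA-first, so ROCm support may well be the sticking point.

```python
# Minimal QLoRA-style setup (a sketch, not the config used in the test above).
# Assumes transformers, peft, and bitsandbytes are installed and working on
# this platform; the model id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # MI100 has native bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters train
```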

Yeah, it's 112 for the PCIe V100 and 125 for the SXM, I think. One thing about the MI100 and other MIxx chip specs I was never clear on: whether their float16 'matrix' numbers are float16 matrix multiply with float32 accumulate (which is what you'd want). The datacenter NVIDIA chips' 'tensor core' FLOPS are usually float32 accumulate (unless it's a gamer card, in which case that rate is halved).
The MI100 does have native bfloat16, which is a big win over the V100.
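A quick probe from PyTorch can at least confirm what the build reports for the device; it can't see inside the matrix units, so the accumulate question itself stays open. A sketch:

```python
# Probe device capabilities as reported by this PyTorch build (works the same
# on ROCm builds, which also expose the torch.cuda.* namespace). This does NOT
# reveal the accumulation precision of the matrix units.
import torch

print(torch.cuda.get_device_name(0))
print(torch.cuda.is_bf16_supported())  # native bfloat16 on MI100

# Whether fp16 matmuls are *allowed* to accumulate in reduced precision
# (on NVIDIA tensor cores the default is float32 accumulate):
print(torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction)

# An fp16 matmul under autocast; the output dtype is fp16 either way, the
# accumulation precision is internal to the hardware/library.
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b
print(c.dtype)
```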
I do feel, though, that you're getting good TFLOPS/$ here because AMD hasn't been that successful at competing with NVIDIA on the full system offering (chips + driver/software). I've really, really wanted this to change, but AMD keeps frustrating... how are you finding it so far in terms of issues / crashes / head banging? :) Hopefully things have been improving.

FWIW, the MI100 was released after the A100, three years after the V100... that says something :) Also, it's the matrix / tensor core mixed- or reduced-precision FLOPS that are of interest, not the float32 FLOPS, which is where the 14 & 23 numbers come from.
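To see which number matters in practice, a rough throughput probe like the one below (illustrative only, not a careful benchmark; it counts 2·N³ FLOPs per N×N matmul) should show the fp16/bf16 matrix rates sitting well above the float32 peak on any card with matrix/tensor units:

```python
# Rough matmul throughput probe; warmup + timed loop, counting a multiply-add
# as 2 FLOPs. Results are indicative only.
import time
import torch

def matmul_tflops(dtype, n=8192, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(3):  # warmup
        a @ b
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / iters
    return 2 * n**3 / dt / 1e12

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    print(dtype, f"{matmul_tflops(dtype):.1f} TFLOPS")
```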


timm/ViT-gopt-16-SigLIP2-384
timm/ViT-gopt-16-SigLIP2-256
timm/ViT-SO400M-16-SigLIP2-512
timm/ViT-SO400M-16-SigLIP2-384
timm/ViT-SO400M-16-SigLIP2-256
timm/ViT-SO400M-14-SigLIP2-378