
Ivan Fioravanti PRO

ivanfioravanti

AI & ML interests

None yet

Recent Activity

updated a model 4 days ago
mlx-community/Qwen3-8B-Base-bf16
published a model 4 days ago
mlx-community/Qwen3-8B-Base-bf16
liked a model 6 days ago
deepseek-ai/DeepSeek-Prover-V2-7B

Organizations

CoreView · MLX Vision · MLX Community · Social Post Explorers · Cognitive Computations

ivanfioravanti's activity

reacted to stefan-it's post with 👍 2 months ago
After running some 3DMark and FurMark benchmarks on Windows to make sure that my new 5090 is not melting its cables [1], and taking some nice shots with a thermal camera (I don't think that's too much), fine-tuning experiments with my favorite Flair & Transformers libraries turn out to be very easy to run.

Important steps:

It's a good idea to start with a fresh Ubuntu 24.04 installation with the latest CUDA 12.8 and the open NVIDIA driver; follow the advice in [2]:

sudo apt -y install cuda-toolkit-12-8 nvidia-open

I tried updating from an existing Ubuntu installation with an older CUDA and driver version, and it resulted in an unbootable system.

If you are using PyTorch 2.6, which was built against CUDA 12.6, you will get:

NVIDIA Graphics Device with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.

But no worries! For PyTorch you just need a nightly 2.7 version built with CUDA 12.8. This can easily be done via:

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

After that the latest Flair version can be installed and fine-tuning will work!
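
A quick sanity check after the install (a minimal sketch; the exact version strings will vary per nightly):

import torch

# Expect a 2.7 nightly, CUDA 12.8, and capability (12, 0) for sm_120
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.get_device_capability(0))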

References:

[1]: https://www.reddit.com/r/nvidia/comments/1inpox7/rtx_50_series_12vhpwr_megathread/
[2]: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network
reacted to nyuuzyou's post with 🔥 4 months ago
CS2 Highlights Video Dataset - nyuuzyou/cs2-highlights

A collection of 4,857 high-quality Counter-Strike 2 gameplay highlights featuring:

- Professional and competitive gameplay recordings at 1080p resolution
- Complete metadata including Steam IDs and clip titles
- Preview thumbnails for all videos
- Both 60 FPS (842 clips) and 120 FPS (4,015 clips) content
- Gameplay from Faceit and official competitive modes

This extensive highlights collection provides a valuable resource for developing and evaluating video-based AI applications, especially in esports and competitive gaming contexts. Released under Creative Commons Zero (CC0) license.
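
For reference, a minimal sketch of browsing it with 🤗 Datasets (streaming, so the full video collection isn't downloaded up front; the split name is an assumption):

from datasets import load_dataset

ds = load_dataset("nyuuzyou/cs2-highlights", split="train", streaming=True)
print(next(iter(ds)))  # inspect one record's fields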
reacted to onekq's post with 🚀 4 months ago
reacted to Jaward's post with 👀 4 months ago
nanoBLT: Simplified lightweight implementation of a character-level Byte Latent Transformer model (under 500 lines of code). The model is 2x4x2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder), trained on ~1M bytes of tiny Shakespeare with a patch size of 4.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb
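
A hypothetical config sketch of that 2x4x2 shape (field names are illustrative, not copied from the notebook):

from dataclasses import dataclass

@dataclass
class BLTConfig:
    n_layers_encoder: int = 2   # local byte encoder
    n_layers_latent: int = 4    # global latent transformer
    n_layers_decoder: int = 2   # local byte decoder
    patch_size: int = 4         # bytes grouped per latent patch
    vocab_size: int = 256       # byte-level vocabulary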
reacted to nyuuzyou's post with 🔥 4 months ago
🎨 KLING AI Dataset - nyuuzyou/klingai

A collection of 12,782 AI-generated media items featuring:
- High-quality image and video generations at various resolutions
- Complete metadata including user IDs, prompts, and generation parameters
- Content generated using text-to-image, text-to-video, and image-to-video modalities
- Full generation settings and technical parameters
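
As with any large media dataset, streaming is the practical way to browse it (a minimal sketch; the split and field names are assumptions):

from datasets import load_dataset

ds = load_dataset("nyuuzyou/klingai", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # check which metadata fields (prompt, parameters, ...) exist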
posted an update 4 months ago
Probably most of you already know this trick, but just in case:
🤔 Unable to connect to Hugging Face Spaces Dev Mode through local Cursor? 💡 Don't worry there is an easy trick!

- right-click "Connect with VS Code" and copy the link
- it will look like this:
- vscode://vscode-remote/...
- replace vscode with cursor and go:
- cursor://vscode-remote/...
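
A tiny Python sketch of the same scheme swap (the trailing path is whatever you copied):

url = "vscode://vscode-remote/..."               # the link you copied
print(url.replace("vscode://", "cursor://", 1))  # open this URL to launch Cursor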
reacted to AdinaY's post with 🔥 4 months ago
QvQ-72B-Preview 🎄 an open-weight model for visual reasoning, just released by the Alibaba Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
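
A minimal loading sketch, assuming QvQ loads like other Qwen2-VL checkpoints (generation details omitted):

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/QVQ-72B-Preview", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/QVQ-72B-Preview")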
reacted to victor's post with 🔥 5 months ago
Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: victor/qwq-misguided-attention
reacted to m-ric's post with ❤️ 5 months ago
reacted to KnutJaegersberg's post with 👀 5 months ago
reacted to elliesleightholm's post with 🤗 6 months ago
reacted to cfahlgren1's post with ❤️ 6 months ago
observers 🔭 - automatically log all OpenAI compatible requests to a dataset💽

• supports any OpenAI compatible endpoint 💪
• supports DuckDB, Hugging Face Datasets, and Argilla as stores

> pip install observers

No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!

Here's an example dataset that was logged to Hugging Face from Ollama: cfahlgren1/llama-3.1-awesome-chatgpt-prompts
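
A minimal sketch of the wrapping pattern (the `wrap_openai` entry point follows the project README; treat the exact import path as an assumption):

from openai import OpenAI
from observers.observers import wrap_openai

# every request/response through this client is logged to the default store
client = wrap_openai(OpenAI())
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)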
reacted to merve's post with ❤️ 6 months ago
Apple released AIMv2 🍏, a family of state-of-the-art open-set vision encoders
apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but add a decoder and train on autoregression 🤯
> 19 open models come in 300M, 600M, 1.2B, 2.7B with resolutions of 224, 336, 448
> Load and use with 🤗 transformers
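
A minimal loading sketch (the checkpoint name is an assumption based on the release naming; the models may require `trust_remote_code=True`):

from transformers import AutoModel, AutoImageProcessor

model = AutoModel.from_pretrained("apple/aimv2-large-patch14-224", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("apple/aimv2-large-patch14-224", trust_remote_code=True)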
replied to m-ric's post about 1 year ago
reacted to m-ric's post with ❤️ about 1 year ago
[𝐍𝐞𝐰 𝐏𝐚𝐩𝐞𝐫] 𝐀𝐥𝐥 𝐭𝐨𝐤𝐞𝐧𝐬 𝐬𝐡𝐨𝐮𝐥𝐝 𝐧𝐨𝐭 𝐫𝐞𝐪𝐮𝐢𝐫𝐞 𝐭𝐡𝐞 𝐬𝐚𝐦𝐞 𝐞𝐟𝐟𝐨𝐫𝐭 𝐭𝐨 𝐜𝐨𝐦𝐩𝐮𝐭𝐞! ⇒ 𝐌𝐢𝐱𝐭𝐮𝐫𝐞 𝐨𝐟 𝐝𝐞𝐩𝐭𝐡𝐬 🫧🐠

Google Researchers were unhappy with the way current decoding generally works: all tokens go through the same layers, thus requiring exactly the same effort to compute.

Whereas in reality, completing the answer to a difficult math problem for instance should be more computationally intense than completing the text of the Declaration of Independence: 𝗻𝗼𝘁 𝗮𝗹𝗹 𝘁𝗼𝗸𝗲𝗻𝘀 𝗮𝗿𝗲 𝗰𝗿𝗲𝗮𝘁𝗲𝗱 𝗲𝗾𝘂𝗮𝗹!

➡️ 𝗧𝗵𝗲𝘆 𝗵𝗮𝗱 𝘁𝗵𝗶𝘀 𝗴𝗲𝗻𝗶𝘂𝘀 𝗶𝗱𝗲𝗮: 💡 𝗵𝗮𝘃𝗶𝗻𝗴 𝗮 𝘁𝗼𝗸𝗲𝗻 𝗴𝗼 𝘁𝗵𝗿𝗼𝘂𝗴𝗵 𝗮 𝗯𝗹𝗼𝗰𝗸 𝘀𝗵𝗼𝘂𝗹𝗱 𝗯𝗲 𝗼𝗽𝘁𝗶𝗼𝗻𝗮𝗹. The token can go through the block (thus undergoing expensive self-attention computation) or avoid it through a skip connection.
The routing decision is taken on the block level: each block selects from the total sequence the top-k tokens that will go through it, and the other tokens will skip it. 𝘛𝘩𝘪𝘴 𝘢𝘭𝘭𝘰𝘸𝘴 𝘵𝘰 𝘤𝘩𝘰𝘰𝘴𝘦 𝘵𝘩𝘦 𝘦𝘹𝘢𝘤𝘵 𝙘𝙖𝙥𝙖𝙘𝙞𝙩𝙮 𝘰𝘧 𝘢 𝘣𝘭𝘰𝘤𝘬, 𝘪.𝘦. 𝘵𝘩𝘦 𝘱𝘳𝘰𝘱𝘰𝘳𝘵𝘪𝘰𝘯 𝘰𝘧 𝘵𝘰𝘬𝘦𝘯𝘴 𝘵𝘩𝘢𝘵 𝘨𝘰 𝘵𝘩𝘳𝘰𝘶𝘨𝘩 𝘪𝘵, 𝘸𝘩𝘪𝘤𝘩 𝘥𝘪𝘳𝘦𝘤𝘵𝘭𝘺 𝘪𝘯𝘧𝘭𝘶𝘦𝘯𝘤𝘦𝘴 𝘵𝘩𝘦 𝘤𝘰𝘮𝘱𝘶𝘵𝘢𝘵𝘪𝘰𝘯𝘢𝘭 𝘪𝘯𝘵𝘦𝘯𝘴𝘪𝘵𝘺 𝘰𝘧 𝘵𝘩𝘦 𝘧𝘰𝘳𝘸𝘢𝘳𝘥 𝘱𝘢𝘴𝘴.

This yields Mixture-of-Depths (MoD), with spectacular results.

✨ 𝗥𝗲𝘀𝘂𝗹𝘁𝘀:
🎚️ 𝗖𝗮𝗽𝗮𝗰𝗶𝘁𝘆 𝗰𝗮𝗻 𝗯𝗲 𝘁𝘂𝗻𝗲𝗱 𝗮𝗹𝗹 𝘁𝗵𝗲 𝘄𝗮𝘆 𝗱𝗼𝘄𝗻 𝘁𝗼 𝟭𝟮.𝟱% for every second block: thus 87.5% of tokens just skip the block!
🚀 For the same training time and performance, >𝟲𝟬% 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝘀𝗽𝗲𝗲𝗱!
🤝 𝗖𝗮𝗻 𝗯𝗲 𝗰𝗼𝗺𝗯𝗶𝗻𝗲𝗱 𝘄𝗶𝘁𝗵 𝗠𝗶𝘅𝘁𝘂𝗿𝗲-𝗼𝗳-𝗘𝘅𝗽𝗲𝗿𝘁𝘀 for further improvements.

📄 𝗣𝗮𝗽𝗲𝗿 𝗵𝗲𝗿𝗲 👉 Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (2404.02258)
📚 I added it to my paper collection 👉 m-ric/spinning-up-in-llms-659e698f9dd5a71bd3f579a7
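
To make the routing concrete, a toy sketch of such a block (an illustration of the idea, not the paper's implementation):

import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    # Toy Mixture-of-Depths block: a router scores every token, only the
    # top-k take the expensive path, the rest keep the skip connection.
    def __init__(self, dim, n_heads=8, capacity=0.125):
        super().__init__()
        self.router = nn.Linear(dim, 1)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.capacity = capacity  # fraction of tokens that take the block

    def forward(self, x):  # x: (batch, seq, dim)
        b, s, d = x.shape
        k = max(1, int(s * self.capacity))
        scores = self.router(x).squeeze(-1)         # (batch, seq)
        top = scores.topk(k, dim=-1).indices        # routed positions
        idx = top.unsqueeze(-1).expand(-1, -1, d)
        routed = x.gather(1, idx)                   # (batch, k, dim)
        out, _ = self.attn(routed, routed, routed)  # expensive path
        gate = torch.sigmoid(scores.gather(1, top)).unsqueeze(-1)
        # routed tokens receive the gated block output; all others skip it
        return x.scatter(1, idx, routed + gate * out)

x = torch.randn(2, 16, 64)
print(MoDBlock(64)(x).shape)  # torch.Size([2, 16, 64])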
reacted to Sentdex's post with ❤️ about 1 year ago
Hi, welcome to my first post here!

I am slowly wrangling about 5 years of reddit comments (2015-2020). It's a total of billions of samples that can be filtered as comment-reply pairs or chains of discussion, and filtered by subreddit, up/down votes, controversy, sentiment, and more.

Any requests or ideas for curated datasets from here? I'll also tinker with uploading the entire dataset potentially in chunks or something, but it's quite a few terabytes in total, so I'll need to break it up still. I have some ideas for datasets I personally want too, but curious if anyone has something they'd really like to see that sounds interesting too.
reacted to osanseviero's post with 👍 about 1 year ago
Mixture of experts: beware 🛡️⚔️

New paper by DeepMind: Buffer Overflow in Mixture of Experts (2402.05526)

The paper shows an adversarial attack strategy in which a user sends malicious queries that can affect the output of other user queries from the same batch.

So if the same batch contains
- User A's benign query
- User B's malicious query
the response for A might be altered! 😱

How is this possible?
One approach is to fill the token buffers with adversarial data, forcing the gating to route to non-ideal experts or to drop the benign tokens entirely (in the case of a finite buffer size).

This assumes that the adversary can use the model as a black box, observe the logit outputs, and ensure that their data is always grouped in the same batch.

How to mitigate this?
- Randomize batch order (and even run twice if some queries are very sensitive)
- Use a large capacity slack
- Sample from gate weights instead of top-k (not great IMO, as that requires more memory for inference)

Very cool paper!!
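
A toy sketch of the buffer-overflow mechanism under top-1 gating with a hard per-expert capacity (all sizes are illustrative):

import torch

n_experts, capacity, dim = 4, 4, 8
router = torch.nn.Linear(dim, n_experts)

benign = torch.randn(4, dim)              # user A's tokens
malicious = torch.randn(16, dim)          # user B floods the shared batch
batch = torch.cat([malicious, benign])    # B's tokens are processed first

choice = router(batch).argmax(-1)         # top-1 expert per token
slots = torch.zeros(n_experts, dtype=torch.long)
dropped = []
for i, e in enumerate(choice.tolist()):
    if slots[e] >= capacity:
        dropped.append(i)                 # buffer full: token is dropped
    else:
        slots[e] += 1
print("dropped token indices:", dropped)  # trailing benign tokens are at risk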
posted an update about 1 year ago
reacted to alielfilali01's post with 🤗 about 1 year ago
Hi friends, I'm happy to share with you all a tool I built a week or so ago: the "LLM Training Cost Calculator", now available on Hugging Face Spaces! This interactive Gradio app provides an easy-to-use interface for estimating the training costs of large language models (LLMs).

(I was asked to provide a report on the cost of fine-tuning each model, etc., so I did the lazy job and built a tool for it; the Prof can later choose whatever config he likes 😆)

🔍 But why is this important?
As LLMs continue to grow in size and complexity, understanding the computational and financial requirements is crucial for planning and managing AI projects. I believe this tool simplifies this process, giving you insights into potential expenses based on the number of parameters and tokens in your dataset.

🌟 Features:
- Input the number of parameters (in billions) and tokens (in trillions).
- Adjust for GPU utilization rates and overhead costs.
- Get an instant estimate of your training costs.
- Choose your GPU (A100 80GB PCIe, A100 80GB SXM, V100, H100 SXM, H100 PCIe)
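
For intuition, a back-of-the-envelope sketch of the kind of estimate such a calculator makes, using the common 6·N·D FLOPs rule (the GPU throughput, utilization, and price below are illustrative assumptions, not the app's actual numbers):

def training_cost_usd(params_b, tokens_t, gpu_flops=989e12,
                      utilization=0.4, usd_per_gpu_hour=2.5):
    # 6 * N * D total training FLOPs (forward + backward)
    total_flops = 6 * (params_b * 1e9) * (tokens_t * 1e12)
    gpu_hours = total_flops / (gpu_flops * utilization) / 3600
    return gpu_hours * usd_per_gpu_hour

# e.g. a 7B-parameter model trained on 2T tokens
print(f"${training_cost_usd(7, 2):,.0f}")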

📈 Coming Soon:
Plans are in place to expand the calculator's capabilities to include fine-tuning costs for models using LoRA or QLoRA. You'll be able to input a model ID from the Hugging Face Hub, select your fine-tuning strategy, and specify quantization details if using QLoRA.

I believe this tool will be a valuable asset to the AI community, helping to plan and allocate resources more effectively 🤗.

Should you have any suggestions or feedback, please don't hesitate to contribute your thoughts in the comments below. Together, we can refine and enhance this resource for all.

🔗 Try it here : https://huggingface.co/spaces/Ali-C137/LLM-Training-Cost-Calculator

PS: All thanks to Gradio, Hugging Face and the community ofc 🔥 😉
reacted to awni's post with 👍 about 1 year ago
First HF social post:

pip install -U mlx
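
And a hello-world sketch once it's installed (a minimal example; MLX arrays are evaluated lazily):

import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = mx.softmax(a)  # built lazily, materialized when printed
print(b)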