Jim Lai

grimjim

AI & ML interests

Experimenting primarily with 7B-12B parameter text completion models. Not all models are intended for direct use; some are meant for research and/or educational purposes.

Recent Activity

updated a model about 8 hours ago
grimjim/MagTie-v1-12B-GGUF
published a model about 9 hours ago
grimjim/MagTie-v1-12B-GGUF
updated a model 1 day ago
grimjim/MagTie-v1-12B

Organizations

Social Post Explorers, Hugging Face Discord Community, Debased AI, Anthracite, Anthracite Core

Posts 22

I have recently been looking at the paper "Why Warmup the Learning Rate? Underlying Mechanisms and Improvements" by Dayal Singh Kalra and Maissam Barkeshli, and was struck by how "warmup" is analogous to simulated annealing.
https://arxiv.org/abs/2406.09405
Taking the physical analogy further, the "warmup" is a stochastic process that knocks the system out of its current local minimum, allowing an easier transition toward new minima. It works because it reduces "fit" and therefore "friction".

Articles 1

Exploring SLERP Abliteration