For Inference Providers that have built support for our Billing API (currently Fal, Novita, and HF-Inference, with more coming soon), we've started enabling Pay-as-you-go (PAYG).
This means you can keep using those Inference Providers beyond the free included credits, with the extra usage charged to your HF account.
You can see it on this view: any provider without a "Billing disabled" badge is PAYG-compatible.
🚀 Excited to share our technical report on the Southeast Asian multilingual model Sailor2 and its latest updates!
Our 49-page report details Sailor2's development journey, including multilingual data cleaning, small-model data mixture simulations, multi-stage continual pre-training, multi-stage post-training, and multicultural, multilingual evaluations. Sailor2 aims to make multilingual model pre-training more efficient for the community.
🧭 We highlight Sailor2's impressive performance in low-resource language translation scenarios and its cultural understanding advantages in Southeast Asia, promoting practical applications for regional languages.
Model updates include:
💡 More precise outputs: Reduced redundancy in model outputs through refined post-training data and optimization techniques.
🌈 Handling longer texts: Expanded to handle up to 128K context length in Southeast Asian languages through long-text training.
⚡️ Faster inference: Achieved 2.5x faster inference speed with speculative decoding.
🌪️ More model sizes: Introduced new sizes of 3B and 14B through model pruning.
🌟 All models are Apache-licensed for commercial use; development tools (code, resources) are open-source.
I couldn't help but notice that our productivity has room for improvement. To address this, we will be engaging in a company-wide morale-building activity designed to boost teamwork, enthusiasm, and *most importantly* results.
I know you're all as excited as I am for this fun and absolutely required initiative. Participation is not just encouraged, it's mandatory. Think of it as a team-bonding experience you never signed up for but will absolutely tolerate.
More details to follow, but for now, mark your calendars and prepare for an engaging experience that will definitely make us all better, stronger, and more synchronized, or at least give us something to talk about later.
I am presenting the Decoder-Only Transformer (DOT) Policy, a simple Behavioral Control policy that outperforms SOTA models on two simple benchmark tasks:
✅ PushT (pushing an object to a goal) – 84% success on keypoints, 74% on images (previous best: 75% / 69%)
✅ ALOHA Insert (precise bimanual insertion) – 30% success (previous best: ~21%)
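To put the figures above in perspective, here is a small sketch that computes the absolute and relative gains over the previous best. It only uses the success rates quoted in this post; the `gains` helper and the `results` layout are illustrative names, not part of the DOT code, and the ALOHA baseline is approximate (~21%), so its relative gain is a rough estimate.

```python
# Success rates (percent) quoted in the post: (DOT, previous best).
results = {
    "PushT keypoints": (84, 75),
    "PushT images": (74, 69),
    "ALOHA Insert": (30, 21),  # previous best is approximate (~21%)
}


def gains(new, old):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return new - old, 100 * (new - old) / old


# Map each task to its (absolute, relative) gain.
summary = {task: gains(new, old) for task, (new, old) in results.items()}

for task, (abs_gain, rel_gain) in summary.items():
    print(f"{task}: +{abs_gain} points ({rel_gain:.0f}% relative)")
```

Reporting both numbers matters here: the same 9-point gain is a much larger relative jump on ALOHA Insert, where the baseline is low, than on PushT keypoints.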
The best part? DOT is much smaller than previous SOTA models (sometimes with 100 times fewer parameters), trains faster, and avoids complexity:
🚫 No generative models (Diffusion, VAE, GANs)
🚫 No discretization/tokenization of actions
🚫 No reinforcement learning or multi-stage training
✅ Just learns from human demos, plain and simple
This is still early: more complex real-life tasks need testing, and there are no guarantees it will actually work well there, but I think it's interesting to share. Sometimes, simpler approaches can be just as effective as complex ones, or even better.