Announcing NorthStar-0.1 β: A Reinforcement-Learning Pipeline for Instructional Control Using Gemma-3
April 29, 2025
Tzafon introduces NorthStar-0.1, a small-scale proof of concept in our family of multi-agent models for web-based decision-making. It uses the new Gemma-3 model as a high-capacity supervisor and is fine-tuned for goal specification and instrumental agent control.
While most LLMs focus on single-agent instruction following, we continue to explore scalable training for multi-agent reasoning. NorthStar-0.1 is a deliberately minimal version of our larger NorthStar-1 setup: a single supervisor agent (Gemma-3) paired with a minimal instrumental actor, used to validate improvements in goal-setting, reward modeling, and delegation under partial observability.
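To make the supervisor/actor split concrete, here is a toy sketch of the interaction pattern described above. It is not the NorthStar-0.1 implementation: the `Supervisor`, `Actor`, `Goal`, and `run_episode` names, the one-dimensional environment, and the distance-based reward are all assumptions chosen for illustration, and the rule-based supervisor merely stands in for a model such as Gemma-3.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    # Hypothetical goal format: a target position on a number line.
    target: int


class Supervisor:
    """Stand-in for the supervisor agent; sees the full state (assumption)."""

    def set_goal(self, full_state: dict) -> Goal:
        # Goal specification: turn the full state into an instruction.
        return Goal(target=full_state["target"])

    def reward(self, goal: Goal, full_state: dict) -> int:
        # Toy reward model: negative distance to the goal.
        return -abs(goal.target - full_state["position"])


class Actor:
    """Minimal instrumental actor; sees only its own position."""

    def act(self, observation: int, goal: Goal) -> int:
        # Move one step toward the delegated goal, or stay put.
        if observation < goal.target:
            return 1
        if observation > goal.target:
            return -1
        return 0


def run_episode(start: int, target: int, max_steps: int = 10) -> list:
    state = {"position": start, "target": target}
    supervisor, actor = Supervisor(), Actor()
    goal = supervisor.set_goal(state)  # delegation happens once, up front
    rewards = []
    for _ in range(max_steps):
        observation = state["position"]  # partial view: target is hidden
        state["position"] += actor.act(observation, goal)
        rewards.append(supervisor.reward(goal, state))
        if state["position"] == goal.target:
            break
    return rewards


if __name__ == "__main__":
    print(run_episode(start=0, target=3))
```

In this sketch the actor never reads the target from the environment; it only learns it through the goal the supervisor hands over, which is the delegation-under-partial-observability pattern in miniature. In a reinforcement-learning setting, the per-step rewards returned here would be the signal used to update the actor.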
We believe even smaller models can benefit from strong supervision signals if the architecture is structured to exploit multi-agent patterns. Below, we outline the pipeline and the decisions that went into building NorthStar-0.1, which acts as a stepping stone toward full NorthStar deployments.