Announcing NorthStar-0.1 — A Reinforcement-Learning Pipeline for Instructional Control using Gemma-3

April 29, 2025

Tzafon introduces NorthStar-0.1, a small-scale proof of concept in our family of multi-agent models for web-based decision-making. It uses the new Gemma-3 model as a high-capacity supervisor and is fine-tuned for goal specification and instrumental agent control.

While most LLMs focus on single-agent instruction following, we continue to explore scalable training for multi-agent reasoning. NorthStar-0.1 represents a deliberately minimal version of our larger Northstar-1 setup, using just one supervisor agent (Gemma-3) paired with a minimal instrumental actor to validate improvements in goal-setting, reward modeling, and delegation under partial observability.
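To make the supervisor/actor split concrete, here is a toy sketch of the interaction loop. It is not the NorthStar-0.1 implementation: the `Supervisor`, `Actor`, `Goal`, and `run_episode` names, the one-dimensional task, and the distance-based reward are all assumptions chosen for illustration, and a hand-written rule stands in for Gemma-3. The only point is the shape of the loop: the supervisor sees the full state, sets a goal, and scores progress, while the actor acts on a partial observation plus the delegated goal.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    # Hypothetical goal format: the position the actor should reach.
    target: int


class Supervisor:
    """Stand-in for the supervisor agent (a plain rule here, not Gemma-3)."""

    def set_goal(self, full_state: dict) -> Goal:
        # The supervisor has access to the full state and turns it into a goal.
        return Goal(target=full_state["target"])

    def reward(self, goal: Goal, position: int) -> float:
        # Assumed reward: negative distance to the goal (0.0 when reached).
        return float(-abs(goal.target - position))


class Actor:
    """Minimal instrumental actor: sees only its own position and the goal."""

    def act(self, observation: int, goal: Goal) -> int:
        if observation < goal.target:
            return 1
        if observation > goal.target:
            return -1
        return 0


def run_episode(full_state: dict, max_steps: int = 10):
    supervisor, actor = Supervisor(), Actor()
    goal = supervisor.set_goal(full_state)   # goal-setting
    position = full_state["start"]
    rewards = []
    for _ in range(max_steps):
        action = actor.act(position, goal)   # delegation under partial observability
        position += action
        rewards.append(supervisor.reward(goal, position))  # supervision signal
        if position == goal.target:
            break
    return position, rewards


if __name__ == "__main__":
    final_position, rewards = run_episode({"start": 0, "target": 3})
    print(final_position, rewards)
```

In a real pipeline the per-step rewards would feed a reinforcement-learning update for the actor; this sketch only collects them.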

We believe even smaller models can benefit from strong supervision signals if the architecture is structured to exploit multi-agent patterns. Below, we outline the pipeline and decisions that went into building NorthStar-0.1, which acts as a stepping stone toward full Northstar deployments.
