State SAE (64 -> 8192, top-k 16): one SAE per head, trained to reconstruct S @ ones(64, 1)
Residual SAE (768 -> 32768, top-k 64): one SAE on the residual stream after every time-mix and channel-mix block
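A minimal numpy sketch of the top-k SAE shape described above, assuming "pick k" means a top-k activation (keep the k largest pre-activations, zero the rest). All weight names (`W_enc`, `W_dec`, `b_enc`, `b_dec`) and the random initialization are illustrative, not the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_sae, k = 64, 8192, 16  # state SAE dims; residual SAE uses 768 -> 32768, k=64

# Illustrative random weights (the real SAE loads trained parameters).
W_enc = rng.normal(scale=0.02, size=(d_in, d_sae))
W_dec = rng.normal(scale=0.02, size=(d_sae, d_in))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_in)

def encode(x):
    # Pre-activations, then keep only the k largest and zero the rest.
    pre = (x - b_dec) @ W_enc + b_enc
    thresh = np.partition(pre, -k)[-k]  # value of the k-th largest latent
    return np.where(pre >= thresh, pre, 0.0)

def decode(z):
    return z @ W_dec + b_dec

x = rng.normal(size=d_in)  # stands in for S @ ones(64, 1) for one head
z = encode(x)              # sparse latents: at most k nonzero
x_hat = decode(z)          # reconstruction of the state readout
```

With continuous random inputs the top-k mask keeps exactly k latents; the reconstruction `x_hat` has the same shape as the input.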
Training and reported losses are normalized MSE, as defined in "Scaling and evaluating sparse autoencoders" (OpenAI).
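For reference, normalized MSE in that paper is the reconstruction MSE divided by the MSE of a baseline that always predicts the mean activation. A small sketch (function name and array layout are assumptions; inputs are batches of activations with shape `(N, d)`):

```python
import numpy as np

def normalized_mse(acts, recons):
    """Reconstruction MSE divided by the MSE of always predicting the
    per-dimension mean activation: 0.0 = perfect reconstruction,
    1.0 = no better than predicting the mean."""
    mse = np.mean((acts - recons) ** 2)
    baseline = np.mean((acts - acts.mean(axis=0)) ** 2)
    return mse / baseline
```

A perfect reconstruction scores 0, and predicting the dataset mean for every sample scores 1, which makes losses comparable across layers with different activation scales.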
training code: https://github.com/fffffgggg54/StateSAE
Base model: SmerkyG/RWKV7-Goose-0.1B-World2.8-HF