state sae (64 |-> 8192, top-k = 16): one sae per head, trained to reconstruct S @ ones(64, 1)
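A minimal numpy sketch of what this means: the 64x64 per-head state S is summarized as S @ ones(64, 1) (a row-sum, giving a 64-vector), and a top-k sparse autoencoder with the dims above reconstructs it. The weights and init here are illustrative placeholders, not the trained model's.

```python
import numpy as np

d_in, d_hidden, k = 64, 8192, 16            # state SAE dims from this card

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((d_in, d_hidden)) * 0.02   # placeholder weights
b_enc = np.zeros(d_hidden)
W_dec = rng.standard_normal((d_hidden, d_in)) * 0.02
b_dec = np.zeros(d_in)

def topk_sae(x):
    # encode, then keep only the k largest pre-activations per sample
    h = x @ W_enc + b_enc
    idx = np.argpartition(h, -k, axis=-1)[..., -k:]
    mask = np.zeros_like(h)
    np.put_along_axis(mask, idx, 1.0, axis=-1)
    z = np.maximum(h, 0.0) * mask           # top-k + ReLU: at most k active latents
    return z @ W_dec + b_dec, z

S = rng.standard_normal((64, 64))           # stand-in for one head's state
x = (S @ np.ones((64, 1))).ravel()          # the SAE's 64-dim input
x_hat, z = topk_sae(x[None, :])
```

The reconstruction `x_hat` has the same 64-dim shape as the input, and `z` has at most 16 nonzero entries per sample.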


residual sae (768 |-> 32768, top-k = 64): sae trained on the residual stream after every time-mix and channel-mix block
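A sketch of where those activations come from, assuming the standard pre-norm RWKV7 block layout (the sublayers here are trivial stand-ins; real module names live in the training repo):

```python
import numpy as np

d_model, n_blocks, seq = 768, 2, 4          # 768 matches the card; block count illustrative
rng = np.random.default_rng(0)

# hypothetical stand-ins for the real time-mix / channel-mix sublayers
def time_mix(h):    return 0.1 * h
def channel_mix(h): return 0.1 * h

h = rng.standard_normal((seq, d_model))
sae_inputs = []
for _ in range(n_blocks):
    h = h + time_mix(h)
    sae_inputs.append(h.copy())             # hook point: residual after time-mix
    h = h + channel_mix(h)
    sae_inputs.append(h.copy())             # hook point: residual after channel-mix
```

Each block therefore yields two hook points, so the SAE sees 2 x n_blocks residual-stream snapshots per forward pass.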


training and reported losses are the normalized MSE defined in "Scaling and evaluating sparse autoencoders" (OpenAI)
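For reference, that metric divides the reconstruction MSE by the MSE of a baseline that always predicts the mean activation; a minimal numpy sketch:

```python
import numpy as np

def normalized_mse(x, x_hat):
    # reconstruction MSE over the MSE of always predicting
    # the dataset-mean activation (the paper's baseline)
    num = ((x - x_hat) ** 2).mean()
    den = ((x - x.mean(axis=0)) ** 2).mean()
    return num / den

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 768))        # fake activations, d = 768
```

A perfect reconstruction scores 0, and predicting the mean scores 1, so values well below 1 indicate the SAE is capturing real structure.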

training code: https://github.com/fffffgggg54/StateSAE

