This model is decensored using a technique I developed called DeLMAT: Decensoring Language Models through Activation Tuning. It's similar to the ablation / "abliteration" scripts that are out there, but works by training a LoRA adapter with a loss computed from two distances: the distance from the mean refusal activation and the distance to the mean acceptance activation.
The training script is released under the MIT license: https://github.com/nkpz/DeLMAT
Rather than simply attempting to cancel out the refusal direction, DeLMAT guides the model toward acceptance. In other words, instead of merely forgetting how to refuse requests, the model learns to emphatically accept them.
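As a rough illustration, a DeLMAT-style objective could be written like the PyTorch sketch below. This is not the code from the repo; the tensor names (`hidden`, `mu_refuse`, `mu_accept`), the hinge margin, and the use of Euclidean distance are assumptions, with the mean activations presumed to be precomputed at a chosen layer from prompts the base model refused versus accepted.

```python
# Illustrative sketch only -- see the DeLMAT repo for the actual loss.
import torch
import torch.nn.functional as F

def delmat_style_loss(hidden: torch.Tensor,
                      mu_refuse: torch.Tensor,
                      mu_accept: torch.Tensor,
                      margin: float = 4.0) -> torch.Tensor:
    # hidden:    (batch, d_model) activations captured at one hooked layer
    # mu_refuse: (d_model,) mean activation over prompts the model refused
    # mu_accept: (d_model,) mean activation over prompts the model accepted
    d_accept = (hidden - mu_accept).norm(dim=-1)  # pull toward acceptance
    d_refuse = (hidden - mu_refuse).norm(dim=-1)  # push away from refusal
    # Hinge the repulsion term so pushing away from the refusal mean
    # cannot dominate the objective without bound.
    return (d_accept + F.relu(margin - d_refuse)).mean()
```

In a full training loop, `hidden` would presumably come from a forward hook on the LoRA-adapted model, with gradients flowing only into the adapter parameters so the base weights stay frozen.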