This model is decensored using a technique I developed called DeLMAT: Decensoring Language Models through Activation Tuning. It's similar to the ablation / "abliteration" scripts that are out there, but works by training a LoRA adapter with a loss computed from two distances: the distance from the mean refusal activation and the distance to the mean acceptance activation.
The training script is released under the MIT license: https://github.com/nkpz/DeLMAT
Rather than simply attempting to cancel out the refusal direction, DeLMAT guides the model toward acceptance. In other words, instead of merely forgetting how to refuse requests, the model learns to emphatically accept them.
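As a rough illustration, a DeLMAT-style objective could be written like the PyTorch sketch below. This is not the code from the repo; the tensor names (`hidden`, `mu_refuse`, `mu_accept`), the hinge margin, and the use of Euclidean distance are assumptions, with the mean activations presumed to be precomputed at a chosen layer from prompts the base model refused versus accepted.

```python
# Illustrative sketch only -- see the DeLMAT repo for the actual loss.
import torch
import torch.nn.functional as F

def delmat_style_loss(hidden: torch.Tensor,
                      mu_refuse: torch.Tensor,
                      mu_accept: torch.Tensor,
                      margin: float = 4.0) -> torch.Tensor:
    # hidden:    (batch, d_model) activations captured at one hooked layer
    # mu_refuse: (d_model,) mean activation over prompts the model refused
    # mu_accept: (d_model,) mean activation over prompts the model accepted
    d_accept = (hidden - mu_accept).norm(dim=-1)  # pull toward acceptance
    d_refuse = (hidden - mu_refuse).norm(dim=-1)  # push away from refusal
    # Hinge the repulsion term so pushing away from the refusal mean
    # cannot dominate the objective without bound.
    return (d_accept + F.relu(margin - d_refuse)).mean()
```

In a full training loop, `hidden` would presumably come from a forward hook on the LoRA-adapted model, with gradients flowing only into the adapter parameters so the base weights stay frozen.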