---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
---

# DeepSeek-R1-Distill-Llama-8B-ENK-Aligned

## Overview

**DeepSeek-R1-Distill-Llama-8B-ENK-Aligned** is a safety-aligned version of [`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B). It has been aligned using the **Enkrypt AI Safety Alignment dataset**, which was generated with the **SAGE** process:

> **SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming**
> Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi (2024)
> [arXiv:2408.11851](https://arxiv.org/abs/2408.11851)

This alignment significantly **reduces toxicity, harmfulness, and jailbreak vulnerabilities** across various safety topics while **maintaining model performance**.

## Red Team Results

![Safety Comparison](assets/safety_comparison.png)

## Performance Results

| Model | MMLU-Pro Score |
|--------|----------------|
| DeepSeek-R1-Distill-Llama-8B (Base) | **44.71** |
| DeepSeek-R1-Distill-Llama-8B-ENK-Aligned | **46.43** |

## Training Configuration

The model was trained using the **SimPO (Simple Preference Optimization)** approach with the following hyperparameters (a minimal sketch of the SimPO objective appears at the end of this card):

```yaml
cpo_config:
  loss_type: 'simpo'
  max_prompt_length: 1800
  max_length: 3600
  per_device_train_batch_size: 8
  gradient_accumulation_steps: 1
  learning_rate: 1.8e-6
  optim: 'adamw_torch'
  lr_scheduler_type: 'cosine'
  gradient_checkpointing: True
  beta: 5
  num_train_epochs: 1
  bf16: False
  simpo_gamma: 0.8
  warmup_ratio: 0.1
  cpo_alpha: 0.0
```

## Key Improvements

- **Enhanced Safety**: Significant reduction in harmful or toxic outputs.
- **Improved Robustness**: Stronger resistance to adversarial jailbreak prompts.
- **Minimal Performance Tradeoff**: A slight improvement in MMLU-Pro despite the additional alignment constraints.

## Use Cases

This model is ideal for applications requiring **safe, aligned, and high-performance language generation**, including:

- **Conversational AI**: Ensuring responsible and aligned assistant behavior.
- **Content Moderation**: Filtering harmful content while maintaining contextual understanding.
- **Education & Research**: Deploying AI in sensitive environments with reduced risks.

A short inference sketch is included at the end of this card.

---

For questions or contributions, reach out to the **Enkrypt AI** team!
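## Appendix: SimPO Objective Sketch

As referenced in the Training Configuration section, below is a minimal, illustrative sketch of the SimPO preference loss under the `beta: 5` and `simpo_gamma: 0.8` settings above. The function name and tensor layout are assumptions for illustration only; this is not Enkrypt AI's training code.

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_avg_logps: torch.Tensor,
               rejected_avg_logps: torch.Tensor,
               beta: float = 5.0,
               gamma: float = 0.8) -> torch.Tensor:
    """Illustrative SimPO loss (reference-model-free preference optimization).

    chosen_avg_logps / rejected_avg_logps: per-sequence, length-averaged
    log-probabilities of the preferred and dispreferred responses under
    the policy, each of shape (batch,).
    """
    # SimPO's implicit reward is the length-normalized log-likelihood scaled
    # by beta; gamma is the target reward margin between the chosen and
    # rejected responses.
    margin = beta * (chosen_avg_logps - rejected_avg_logps) - gamma
    # Negative log-sigmoid of the margin, averaged over the batch.
    return -F.logsigmoid(margin).mean()
```

With `cpo_alpha: 0.0`, no auxiliary NLL term is mixed in, so this preference term constitutes the whole training objective.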
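## Appendix: Example Inference

A minimal inference sketch with Hugging Face `transformers`, as referenced in the Use Cases section. The repository id below is a placeholder assumption; substitute the actual Hugging Face repo id for this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with the actual repository for this model.
model_id = "enkryptai/DeepSeek-R1-Distill-Llama-8B-ENK-Aligned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place layers on available GPU(s)/CPU
)

# DeepSeek-R1 distills ship with a chat template, so apply_chat_template
# builds the expected prompt format.
messages = [
    {"role": "user", "content": "What does safety alignment change about a model's behavior?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```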