DeepSeek-R1-Distill-Qwen-7B-News-Classifier
Model Description
DeepSeek-R1-Distill-Qwen-7B-News-Classifier is a fine-tuned version of DeepSeek-R1-Distill-Qwen-7B, optimized for news classification. The base model is distilled from DeepSeek-R1 with Qwen2.5-Math-7B as its foundation.
Training Details
Training Data
The model was fine-tuned on a custom dataset of 300 news-classification examples in ShareGPT format. Each example contains:
- A news headline prefixed with a classification request (e.g., "新闻分类:", i.e., "News classification:")
- The expected category, preceded by a reasoning chain
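A record in this layout can be sketched as follows. The headline, reasoning, and category shown here are hypothetical, not taken from the actual training set; only the ShareGPT `conversations`/`from`/`value` structure is assumed.

```python
import json

# Illustrative ShareGPT-format record (hypothetical content).
record = {
    "conversations": [
        {
            # User turn: classification prefix plus a news headline.
            "from": "human",
            "value": "新闻分类:某科技公司发布新款智能手机",
        },
        {
            # Assistant turn: reasoning chain followed by the category.
            "from": "gpt",
            "value": "<think>这条新闻涉及消费电子产品发布……</think>科技",
        },
    ]
}

print(json.dumps(record, ensure_ascii=False, indent=2))
```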
Training Procedure
- Framework: LLaMA Factory
- Fine-tuning Method: LoRA with the LoRA+ optimizer
- LoRA Parameters:
  - LoRA+ learning rate ratio: 16
  - Target modules: all linear layers
- Base learning rate: 5e-6
- Gradient accumulation steps: 2
- Training epochs: 3
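In LLaMA Factory these hyperparameters map onto a training YAML roughly like the sketch below. Key names follow LLaMA Factory's config conventions; the model path, dataset name, and output directory are placeholders, and any option not listed above is an assumption.

```yaml
# Sketch of a LLaMA Factory LoRA+ fine-tuning config (placeholders marked).
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
stage: sft
do_train: true
finetuning_type: lora
lora_target: all                # all linear layers
loraplus_lr_ratio: 16           # LoRA+ learning rate ratio
dataset: news_classification    # placeholder dataset name
template: deepseek3             # assumption: chat template for the base model
learning_rate: 5.0e-6
gradient_accumulation_steps: 2
num_train_epochs: 3.0
output_dir: saves/news-classifier  # placeholder
```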
Evaluation Results
The model was evaluated on a test set and achieved the following metrics:
- BLEU-4: 29.67
- ROUGE-1: 56.56
- ROUGE-2: 31.31
- ROUGE-L: 39.86
These scores indicate strong performance for the news classification task, with close overlap between the model's outputs (reasoning chain plus category) and the reference answers.
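For reference, the ROUGE-L score above is an F-measure over the longest common subsequence of output and reference tokens (reported ×100). A minimal self-contained sketch of the computation, assuming whitespace tokenization:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence, classic DP."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two whitespace-tokenized strings."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Identical output and reference give an F1 of 1.0; multiplying by 100 yields scores on the scale reported above.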
Citation
If you use this model in your research, please cite:
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
  title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
  author={DeepSeek-AI},
  year={2025},
  eprint={2501.12948},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.12948},
}
Acknowledgements
This model was fine-tuned using the LLaMA Factory framework. We thank the DeepSeek-AI team for releasing the original distilled model.