Add pipeline tag, license, link to code, and chain-of-thought tag
#1 opened by nielsr (HF Staff)
README.md CHANGED

@@ -1,16 +1,22 @@
 ---
+license: mit
 library_name: transformers
-
+pipeline_tag: text-generation
+tags:
+- chain-of-thought
 ---
+
 **Repository for:**
 
 **ThinkEdit-deepseek-llama3-8b**
 
 (We also release ThinkEdit versions for ThinkEdit-deepseek-qwen-1.5b and ThinkEdit-deepseek-qwen-14b.)
 
-**Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng
+**Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng
 **Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)
 
+Code: https://github.com/Trustworthy-ML-Lab/ThinkEdit
+
 ---
 
 ## Introduction

@@ -19,8 +25,8 @@ Reasoning-augmented models sometimes fail by generating **overly short**, abstra
 
 **ThinkEdit** is a lightweight weight-editing method that:
 
-- Identifies
-- Edits only
+- Identifies ~2% of "short reasoning" attention heads
+- Edits only ~0.1% of total parameters
 - Removes the "short reasoning" direction from their output
 - Boosts performance, especially on cases with short reasoning traces
 

@@ -75,12 +81,12 @@ The usage of ThinkEdit models is exactly the same as the original deepseek-disti
 
 ```bibtex
 @misc{sun2025thinkedit,
-title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
+title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
 author={Chung-En Sun and Ge Yan and Tsui-Wei Weng},
 year={2025},
 eprint={2503.22048},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
-url={https://arxiv.org/abs/2503.22048},
+url={https://arxiv.org/abs/2503.22048},
 }
 ```
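For readers of the README bullet "Removes the "short reasoning" direction from their output", the following is a minimal sketch of what projecting a direction out of a head's output can look like. It is an illustration under stated assumptions, not the authors' implementation (see the linked code repository for that): it assumes the edit is an orthogonal projection applied to a head's output-projection matrix, and the names `remove_direction`, `head_output`, `w_out`, and `direction` are hypothetical.

```python
import math


def unit(v):
    """Return v scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_direction(w_out, direction):
    """Return (I - v v^T) @ w_out, where v is the unit-normalised direction.

    ASSUMPTION: w_out is a (d_model x d_head) matrix stored as a list of rows,
    mapping one head's activations into the residual stream. After the edit,
    nothing this head writes has a component along `direction`.
    """
    v = unit(direction)
    d_model, d_head = len(w_out), len(w_out[0])
    # coeff[j] is the component of column j of w_out along v.
    coeff = [sum(v[i] * w_out[i][j] for i in range(d_model)) for j in range(d_head)]
    # Subtract that component from every column.
    return [[w_out[i][j] - v[i] * coeff[j] for j in range(d_head)]
            for i in range(d_model)]


def head_output(w_out, z):
    """Compute w_out @ z for a head activation vector z of length d_head."""
    return [dot(row, z) for row in w_out]


# Toy numbers, purely illustrative (d_model = 3, d_head = 2).
w_out = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
direction = [1.0, 1.0, 0.0]
edited = remove_direction(w_out, direction)
out = head_output(edited, [0.7, -1.3])
leftover = dot(out, unit(direction))  # numerically zero after the edit
```

Only the selected heads' matrices would be modified in such a scheme, which is consistent with the README's statement that a small fraction of parameters is edited; how the heads and the direction are chosen is outside the scope of this sketch.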