Add pipeline tag, license and link to the code
#1 by nielsr (HF Staff)

README.md CHANGED
@@ -1,7 +1,10 @@
 ---
 library_name: transformers
 tags: []
+pipeline_tag: text-generation
+license: other
 ---
+
 **Repository for:**

 **ThinkEdit-deepseek-qwen-14b**

@@ -9,7 +12,8 @@ tags: []
 (We also release ThinkEdit versions for ThinkEdit-deepseek-qwen-1.5b and ThinkEdit-deepseek-llama3-8b.)

 **Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng\
-**Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)
+**Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)\
+**Code:** https://github.com/Trustworthy-ML-Lab/ThinkEdit

 ---

@@ -19,8 +23,8 @@ Reasoning-augmented models sometimes fail by generating **overly short**, abstra

 **ThinkEdit** is a lightweight weight-editing method that:

-- Identifies
-- Edits only
+- Identifies ~2% of "short reasoning" attention heads
+- Edits only ~0.1% of total parameters
 - Removes the "short reasoning" direction from their output
 - Boosts performance, especially on cases with short reasoning traces

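
The README bullet "Removes the "short reasoning" direction from their output" can be illustrated with a generic projection. The sketch below is not the authors' implementation (that lives in the linked code repository). It assumes a head's output projection is stored as a `d_head x d_model` matrix whose rows live in the model's hidden space, and that a "short reasoning" direction vector is already known. The function names `normalize`, `remove_direction`, and `head_output` are hypothetical, and the numbers are toy values.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def remove_direction(weight, direction):
    """Project every row of `weight` onto the subspace orthogonal to `direction`.

    Equivalent to W' = W (I - u u^T) with u the unit-length direction,
    so anything the head writes through W' has no component along u.
    """
    u = normalize(direction)
    edited = []
    for row in weight:
        coeff = sum(r * d for r, d in zip(row, u))  # component of this row along u
        edited.append([r - coeff * d for r, d in zip(row, u)])
    return edited


def head_output(z, weight):
    """Compute z @ weight for a head activation z of length d_head."""
    d_model = len(weight[0])
    return [sum(z[i] * weight[i][j] for i in range(len(z))) for j in range(d_model)]


# Toy example (assumed values): d_head = 2, d_model = 3.
W_o = [[1.0, 2.0, 3.0],
       [0.5, -1.0, 4.0]]
short_direction = [0.0, 0.0, 2.0]  # hypothetical "short reasoning" direction

W_edited = remove_direction(W_o, short_direction)
out = head_output([1.0, 1.0], W_edited)
residual = sum(o * d for o, d in zip(out, short_direction))  # 0 up to rounding
```

Editing only the output projections of a small set of selected heads is what would keep such a change confined to a small fraction of the parameters, in line with the "~0.1% of total parameters" figure in the README. How the heads and the direction are chosen is not covered by this sketch.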