Add text-generation pipeline tag, link to code and license (#1)
- Add text-generation pipeline tag, link to code and license (99cff5c69181be68896d35ddb1a4f8aae7fba44b)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,7 +1,10 @@
 ---
 library_name: transformers
 tags: []
+pipeline_tag: text-generation
+license: mit
 ---
+
 **Repository for:**
 
 **ThinkEdit-deepseek-qwen-1.5b**
@@ -11,6 +14,8 @@ tags: []
 **Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng\
 **Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)
 
+Github: https://github.com/Trustworthy-ML-Lab/ThinkEdit
+
 ---
 
 ## Introduction
@@ -19,8 +24,8 @@ Reasoning-augmented models sometimes fail by generating **overly short**, abstra
 
 **ThinkEdit** is a lightweight weight-editing method that:
 
-- Identifies
-- Edits only
+- Identifies ~2% of "short reasoning" attention heads
+- Edits only ~0.1% of total parameters
 - Removes the "short reasoning" direction from their output
 - Boosts performance, especially on cases with short reasoning traces
 
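The README's method summary says ThinkEdit removes the "short reasoning" direction from the output of selected attention heads. As a rough illustration only, and not the authors' implementation, that kind of edit can be sketched as an orthogonal projection applied to a head's output weights. The sketch assumes a column-vector convention (`output = W @ h`), a toy 3x2 matrix, and a made-up direction; the helper names `normalize`, `matvec`, and `remove_direction` are hypothetical and do not come from the ThinkEdit repository.

```python
import math


def normalize(direction):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in direction]


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def remove_direction(W, direction):
    """Return (I - v v^T) W for the unit vector v along `direction`.

    Assumption: outputs are computed as W @ h, so after the edit every
    output has zero component along v, whatever the input h is.
    """
    v = normalize(direction)
    n_rows, n_cols = len(W), len(W[0])
    # v^T W: how strongly each column of W points along v.
    coeffs = [sum(v[i] * W[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Subtract that component from every column.
    return [
        [W[i][j] - v[i] * coeffs[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    # Toy stand-in for one head's output weights (3 outputs, 2 inputs).
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    short_direction = [1.0, 1.0, 0.0]  # made-up "short reasoning" direction
    edited = remove_direction(W, short_direction)
    out = matvec(edited, [0.5, -1.0])
    v = normalize(short_direction)
    # Component of the edited output along the removed direction.
    print(sum(a * b for a, b in zip(out, v)))
```

Because the projection touches only the weights of the chosen heads, an edit of this shape changes a small share of parameters, which fits the README's description of a lightweight method. How the direction is estimated and which heads are selected are covered in the paper and the linked repository, not in this sketch.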