Add pipeline tag, license, link to code, and chain-of-thought tag

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +12 -6
README.md CHANGED
@@ -1,16 +1,22 @@
  ---
+ license: mit
  library_name: transformers
- tags: []
+ pipeline_tag: text-generation
+ tags:
+ - chain-of-thought
  ---
+
  **Repository for:**

  **ThinkEdit-deepseek-llama3-8b**

  (We also release ThinkEdit versions for ThinkEdit-deepseek-qwen-1.5b and ThinkEdit-deepseek-qwen-14b.)

- **Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng\
+ **Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng
  **Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)

+ Code: https://github.com/Trustworthy-ML-Lab/ThinkEdit
+
  ---

  ## Introduction
@@ -19,8 +25,8 @@ Reasoning-augmented models sometimes fail by generating **overly short**, abstra

  **ThinkEdit** is a lightweight weight-editing method that:

- - Identifies \~2% of "short reasoning" attention heads
- - Edits only \~0.1% of total parameters
+ - Identifies ~2% of "short reasoning" attention heads
+ - Edits only ~0.1% of total parameters
  - Removes the "short reasoning" direction from their output
  - Boosts performance, especially on cases with short reasoning traces

@@ -75,12 +81,12 @@ The usage of ThinkEdit models is exactly the same as the original deepseek-disti

  ```bibtex
  @misc{sun2025thinkedit,
- title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
+ title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
  author={Chung-En Sun and Ge Yan and Tsui-Wei Weng},
  year={2025},
  eprint={2503.22048},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
- url={https://arxiv.org/abs/2503.22048},
+ url={https://arxiv.org/abs/2503.22048},
  }
  ```
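
Note on the README bullet 'Removes the "short reasoning" direction from their output': one way to read it is as an orthogonal projection applied to the output weights of the selected attention heads. The sketch below illustrates that reading only. It assumes a row-vector convention (head output = activations times weight matrix), and `remove_direction`, the toy weights, and the direction vector are hypothetical names and values, not taken from the ThinkEdit code.

```python
import math


def remove_direction(weight, direction):
    """Return a copy of `weight` whose rows are orthogonal to `direction`.

    `weight` is a list of rows (each row lives in the output space), and
    `direction` is the vector to project out. Illustrative helper only;
    this is an assumption about the edit, not the authors' implementation.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [x / norm for x in direction]

    edited = []
    for row in weight:
        # Component of this row along the unit direction.
        coeff = sum(r * u for r, u in zip(row, unit))
        # Subtract that component so the row becomes orthogonal to `unit`.
        edited.append([r - coeff * u for r, u in zip(row, unit)])
    return edited


# Toy example: a 2x2 "output projection" and a made-up direction.
toy_weight = [[1.0, 2.0], [3.0, 4.0]]
toy_direction = [0.0, 2.0]
edited_weight = remove_direction(toy_weight, toy_direction)
```

Because every row of the edited matrix is orthogonal to the direction, any linear combination of rows (that is, any output the head can produce under this convention) has no component along it, while the rest of the matrix is left unchanged.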