Melvin56 committed on
Commit 33b9e9f · verified · 1 Parent(s): 45be305

Update README.md

Files changed (1)
  1. README.md +62 -225
README.md CHANGED
@@ -1,229 +1,66 @@
- ---
- library_name: transformers
- license: apache-2.0
- license_link: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE
- pipeline_tag: text-generation
- base_model:
- - Qwen/Qwen3-1.7B
- tags:
- - chat
- - abliterated
- - uncensored
- extra_gated_prompt: >-
-   **Usage Warnings**
-
-   “**Risk of Sensitive or Controversial Outputs**”: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
-
-   “**Not Suitable for All Audiences**”: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.
-
-   “**Legal and Ethical Responsibilities**”: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
-
-   “**Research and Experimental Use**”: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
-
-   “**Monitoring and Review Recommendations**”: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
-
-   “**No Default Safety Guarantees**”: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
 
 ---
 
- # huihui-ai/Qwen3-1.7B-abliterated
-
- This is an uncensored version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about it).
- This is a crude, proof-of-concept implementation for removing refusals from an LLM without using TransformerLens.
-
- Ablation was performed using a new, faster method that yields better results.
-
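For background, the core of the technique can be sketched in a few lines of PyTorch. The sketch below is illustrative only: it shows the general idea (estimate a "refusal direction" from paired harmful/harmless prompts, then project it out of weights that write into the residual stream), not the new, faster method used for this model; the function names, the layer choice, and the prompt sets are all assumptions.

```python
# Illustrative sketch of directional ablation ("abliteration").
# Not the actual implementation used for this model; see the linked repo.
import torch

@torch.no_grad()
def estimate_refusal_direction(model, tokenizer, harmful_prompts, harmless_prompts, layer=-1):
    # Refusal direction = normalized difference between the mean last-token
    # hidden states on harmful vs. harmless prompts at the chosen layer.
    def mean_hidden_state(prompts):
        states = []
        for prompt in prompts:
            ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
            out = model(ids, output_hidden_states=True)
            states.append(out.hidden_states[layer][0, -1])
        return torch.stack(states).mean(dim=0)

    direction = mean_hidden_state(harmful_prompts) - mean_hidden_state(harmless_prompts)
    return direction / direction.norm()

@torch.no_grad()
def ablate_direction(weight, direction):
    # Project the refusal direction out of a weight matrix that writes into
    # the residual stream: W <- (I - v v^T) W, i.e. W -= outer(v, v @ W).
    v = direction.to(weight.device, weight.dtype)
    weight -= torch.outer(v, v @ weight)
```

Applied to, for example, each layer's attention output and MLP down projections, this removes the model's ability to move activations along the estimated refusal direction.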
- ## ollama
-
- You can use [huihui_ai/qwen3-abliterated:1.7b](https://ollama.com/huihui_ai/qwen3-abliterated:1.7b) directly:
- ```
- ollama run huihui_ai/qwen3-abliterated:1.7b
- ```
-
- ## Usage
-
- You can use this model in your applications by loading it with Hugging Face's `transformers` library:
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
- import torch
- import os
- import signal
-
- # Pin BLAS/OpenMP threading to half the available cores
- cpu_count = os.cpu_count()
- print(f"Number of CPU cores in the system: {cpu_count}")
- half_cpu_count = cpu_count // 2
- os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
- os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
- torch.set_num_threads(half_cpu_count)
-
- print(f"PyTorch threads: {torch.get_num_threads()}")
- print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
- print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")
-
- # Load the model and tokenizer
- NEW_MODEL_ID = "huihui-ai/Qwen3-1.7B-abliterated"
- print(f"Load Model {NEW_MODEL_ID} ... ")
- # Optional 4-bit quantization config (enable via quantization_config below)
- quant_config_4 = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_compute_dtype=torch.bfloat16,
-     bnb_4bit_use_double_quant=True,
-     llm_int8_enable_fp32_cpu_offload=True,
- )
-
- model = AutoModelForCausalLM.from_pretrained(
-     NEW_MODEL_ID,
-     device_map="auto",
-     trust_remote_code=True,
-     # quantization_config=quant_config_4,
-     torch_dtype=torch.bfloat16
- )
- tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
- if tokenizer.pad_token is None:
-     tokenizer.pad_token = tokenizer.eos_token
-     tokenizer.pad_token_id = tokenizer.eos_token_id
-
- initial_messages = [{"role": "system", "content": "You are a helpful assistant."}]
- messages = initial_messages.copy()
- enable_thinking = True
- skip_prompt = True
- skip_special_tokens = True
-
- # Streamer that accumulates the generated text and supports interruption
- class CustomTextStreamer(TextStreamer):
-     def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
-         super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
-         self.generated_text = ""
-         self.stop_flag = False
-
-     def on_finalized_text(self, text: str, stream_end: bool = False):
-         self.generated_text += text
-         print(text, end="", flush=True)
-         if self.stop_flag:
-             raise StopIteration
-
-     def stop_generation(self):
-         self.stop_flag = True
-
- def generate_stream(model, tokenizer, messages, enable_thinking, skip_prompt, skip_special_tokens, max_new_tokens):
-     input_ids = tokenizer.apply_chat_template(
-         messages,
-         tokenize=True,
-         enable_thinking=enable_thinking,
-         add_generation_prompt=True,
-         return_tensors="pt"
-     )
-     attention_mask = torch.ones_like(input_ids, dtype=torch.long)
-     tokens = input_ids.to(model.device)
-     attention_mask = attention_mask.to(model.device)
-
-     streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
-
-     # Let Ctrl+C stop generation cleanly instead of killing the process
-     def signal_handler(sig, frame):
-         streamer.stop_generation()
-         print("\n[Generation stopped by user with Ctrl+C]")
-
-     signal.signal(signal.SIGINT, signal_handler)
-
-     print("Response: ", end="", flush=True)
-     try:
-         generated_ids = model.generate(
-             tokens,
-             attention_mask=attention_mask,
-             use_cache=False,
-             max_new_tokens=max_new_tokens,
-             do_sample=True,
-             pad_token_id=tokenizer.pad_token_id,
-             streamer=streamer
-         )
-         del generated_ids
-     except StopIteration:
-         print("\n[Stopped by user]")
-
-     del input_ids, attention_mask
-     torch.cuda.empty_cache()
-     signal.signal(signal.SIGINT, signal.SIG_DFL)
-
-     return streamer.generated_text, streamer.stop_flag
-
- # Simple REPL with /exit, /clear, /no_think, /skip_prompt, /skip_special_tokens commands
- while True:
-     user_input = input("User: ").strip()
-     if user_input.lower() == "/exit":
-         print("Exiting chat.")
-         break
-     if user_input.lower() == "/clear":
-         messages = initial_messages.copy()
-         print("Chat history cleared. Starting a new conversation.")
-         continue
-     if user_input.lower() == "/no_think":
-         enable_thinking = not enable_thinking
-         print(f"Thinking = {enable_thinking}.")
-         continue
-     if user_input.lower() == "/skip_prompt":
-         skip_prompt = not skip_prompt
-         print(f"skip_prompt = {skip_prompt}.")
-         continue
-     if user_input.lower() == "/skip_special_tokens":
-         skip_special_tokens = not skip_special_tokens
-         print(f"skip_special_tokens = {skip_special_tokens}.")
-         continue
-     if not user_input:
-         print("Input cannot be empty. Please enter something.")
-         continue
-     messages.append({"role": "user", "content": user_input})
-     response, stop_flag = generate_stream(model, tokenizer, messages, enable_thinking, skip_prompt, skip_special_tokens, 8192)
-     print("", flush=True)
-     if stop_flag:
-         continue
-     messages.append({"role": "assistant", "content": response})
- ```
-
- ## Pass Rate Description
-
- The pass rate is the proportion of harmful instructions that did not trigger the test condition (TestPassed=False) out of the total number of instructions processed: the number of triggered instructions (triggered_total) is subtracted from the total number of instructions (total), and the result is divided by the total, i.e. (total - triggered_total) / total. The pass rate is reported both as a decimal (rounded to two places) and as a percentage (rounded to one place).
-
- The test set comes from [huihui-ai/harmbench_behaviors](https://huggingface.co/datasets/huihui-ai/harmbench_behaviors); the test code is [TestPassed.py](https://huggingface.co/huihui-ai/Qwen3-1.7B-abliterated/blob/main/TestPassed.py).
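As a quick, illustrative restatement of that formula (TestPassed.py above remains the authoritative script):

```python
def pass_rate(total: int, triggered_total: int) -> tuple[float, str]:
    # (total - triggered_total) / total, reported as a decimal and a percentage
    ratio = (total - triggered_total) / total
    return round(ratio, 2), f"{ratio * 100:.1f}%"

print(pass_rate(320, 0))  # -> (1.0, '100.0%')
```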
-
- The test result is [100.00%](https://huggingface.co/huihui-ai/Qwen3-1.7B-abliterated/blob/main/TestPassed-abliterated.jsonl).
- ```
- python TestPassed.py
- Load Model huihui-ai/Qwen3-1.7B-abliterated ...
- Processing harmful instructions: 100%|███████████████████████████████████████████████████████████████████████████████| 320/320 [00:51<00:00, 6.22it/s]
- Passed total: 320/320, Passed ratio: 1.00 (100.00%)
- ```
-
- Below is the pass rate for harmful instructions.
- This test is only a rough, automated judgment and does not reflect the final output; you can increase the `max_new_tokens` value to obtain complete responses.
-
- | Model                  | Passed total | Passed ratio |
- |------------------------|--------------|--------------|
- | Qwen3-1.7B             | 246/320      | 76.88%       |
- | Qwen3-1.7B-abliterated | **320/320**  | **100.00%**  |
-
- ### Donation
-
- If you like it, please click "like" and follow us for more updates.
- You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.
-
- ##### Your donation helps us continue further development and improvement; even a cup of coffee helps.
- - bitcoin (BTC):
- ```
- bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
 ```
+ ---
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE
+ pipeline_tag: text-generation
+ base_model:
+ - huihui-ai/Qwen3-1.7B-abliterated
+ tags:
+ - chat
+ - abliterated
+ - uncensored
+ extra_gated_prompt: >-
+   **Usage Warnings**
+
+   “**Risk of Sensitive or Controversial Outputs**”: This model’s safety
+   filtering has been significantly reduced, potentially generating sensitive,
+   controversial, or inappropriate content. Users should exercise caution and
+   rigorously review generated outputs.
+
+   “**Not Suitable for All Audiences**”: Due to limited content filtering, the
+   model’s outputs may be inappropriate for public settings, underage users, or
+   applications requiring high security.
+
+   “**Legal and Ethical Responsibilities**”: Users must ensure their usage
+   complies with local laws and ethical standards. Generated content may carry
+   legal or ethical risks, and users are solely responsible for any consequences.
+
+   “**Research and Experimental Use**”: It is recommended to use this model for
+   research, testing, or controlled environments, avoiding direct use in
+   production or public-facing commercial applications.
+
+   “**Monitoring and Review Recommendations**”: Users are strongly advised to
+   monitor model outputs in real time and conduct manual reviews when necessary
+   to prevent the dissemination of inappropriate content.
+
+   “**No Default Safety Guarantees**”: Unlike standard models, this model has
+   not undergone rigorous safety optimization. huihui.ai bears no responsibility
+   for any consequences arising from its use.
+ ---
+
+ # Melvin56/Qwen3-1.7B-abliterated-GGUF
+
+ Original model: [huihui-ai/Qwen3-1.7B-abliterated](https://huggingface.co/huihui-ai/Qwen3-1.7B-abliterated)
+
+ Llama.cpp build: 0208355 (5342)
+
+ I used an importance matrix (imatrix) computed from this [Dataset](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c/#file-calibration_data_v5_rc-txt) to create all these quants.
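For reference, a typical llama.cpp imatrix quantization workflow looks roughly like the sketch below. This is an illustration under assumptions (current llama.cpp tool names, a local F16 conversion, IQ4_XS as an example target), not the exact commands used for these files:

```
# Convert a local copy of the HF model to an F16 GGUF (illustrative paths)
python convert_hf_to_gguf.py ./Qwen3-1.7B-abliterated --outfile Qwen3-1.7B-abliterated-F16.gguf
# Compute the importance matrix over the calibration dataset
./llama-imatrix -m Qwen3-1.7B-abliterated-F16.gguf -f calibration_data_v5_rc.txt -o imatrix.dat
# Quantize using the imatrix (IQ4_XS shown as an example)
./llama-quantize --imatrix imatrix.dat Qwen3-1.7B-abliterated-F16.gguf Qwen3-1.7B-abliterated-IQ4_XS.gguf IQ4_XS
```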
 
 ---
 
+ |          | CPU (AVX2) | CPU (ARM NEON) | Metal  | cuBLAS | rocBLAS | SYCL     | CLBlast | Vulkan | Kompute |
+ | :------- | :--------: | :------------: | :----: | :----: | :-----: | :------: | :-----: | :----: | :-----: |
+ | K-quants | ✅         | ✅             | ✅     | ✅     | ✅      | ✅       | ✅ 🐢⁵  | ✅ 🐢⁵ | ❌      |
+ | I-quants | ✅ 🐢⁴     | ✅ 🐢⁴         | ✅ 🐢⁴ | ✅     | ✅      | Partial¹ | ❌      | ❌     | ❌      |
 ```
+ ✅: feature works
+ 🚫: feature does not work
+ ❓: unknown, please contribute if you can test it yourself
+ 🐢: feature is slow
+ ¹: IQ3_S and IQ1_S, see #5886
+ ²: Only with -ngl 0
+ ³: Inference is 50% slower
+ ⁴: Slower than K-quants of comparable size
+ ⁵: Slower than cuBLAS/rocBLAS on similar cards
+ ⁶: Only q8_0 and iq4_nl
+ ```