Transformers · Keras · English
nicholasKluge committed (verified)
Commit 7713f55 · Parent: 78b1f35

Update README.md

Files changed (1): README.md (+31 -66)
README.md CHANGED
@@ -6,87 +6,52 @@ datasets:
  - AiresPucrs/toxic-comments
 library_name: transformers
 ---
- # Toxicity-classifier

- ## Model Overview

- The toxicity classifier is used to differentiate between non-toxic and toxic comments.

- The model was trained with a dataset composed of toxic and non-toxic comments extracted from web forums.

- ## Details
- - **Size:** 4,689,681 parameters
- - **Model type:** Transformer
- - **Number of Epochs:** 20
- - **Batch Size:** 16
- - **Optimizer:** Adam
- - **Learning Rate:** 0.001
- - **Hardware:** Tesla V4
- - **Emissions:** Not measured
- - **Total Energy Consumption:** Not measured

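The list above gives the training configuration but not the architecture or training loop. A minimal sketch of how those hyperparameters could fit together in Keras follows; the layer sizes and overall structure are illustrative assumptions (the card only states the parameter count), and `train_features`/`train_labels` are placeholders, not data provided by this card.

```python
import tensorflow as tf

# Illustrative architecture only: the card reports 4,689,681 parameters
# but does not describe the layers. Vocabulary size and sequence length
# follow the TextVectorization settings used later in the card.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20000, output_dim=128),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs P(not toxic), as in the print loop below
])

# Optimizer and learning rate from the "Details" list above.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Epochs and batch size from the "Details" list above; train_features and
# train_labels are placeholders for vectorized toxic-comments data.
# model.fit(train_features, train_labels, epochs=20, batch_size=16)
```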
 
- ## How to Use

- ⚠️ THE EXAMPLES BELOW CONTAIN TOXIC/OFFENSIVE LANGUAGE ⚠️

- ```python
- import tensorflow as tf

- toxicity_model = tf.keras.models.load_model('toxicity_model.keras')

- with open('toxic_vocabulary.txt', encoding='utf-8') as fp:
-     vocabulary = [line.strip() for line in fp]
-     fp.close()

  vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=20000,
-                                          output_mode="int",
-                                          output_sequence_length=100,
-                                          vocabulary=vocabulary)

  strings = [
-     'I think you should shut up your big mouth',
-     'I do not agree with you'
  ]

  preds = toxicity_model.predict(vectorization_layer(strings),verbose=0)

  for i, string in enumerate(strings):
-     print(f'{string}\n')
-     print(f'Toxic 🤬 {round((1 - preds[i][0]) * 100, 2)}% | Not toxic 😊 {round(preds[i][0] * 100, 2)}\n')
-     print("_" * 50)

- ```

- This will output the following:
- ```
- I think you should shut up your big mouth

- Toxic 🤬 95.73% | Not toxic 😊 4.27
- __________________________________________________
- I do not agree with you

- Toxic 🤬 0.99% | Not toxic 😊 99.01
- __________________________________________________
- ```

- ## Training Data

- - **Dataset:** [Toxic Comment Classification Challenge Dataset](https://huggingface.co/datasets/AiresPucrs/toxic-comments)

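The dataset itself can be pulled from the Hub with the `datasets` library; a minimal sketch using the repo id linked above (split and column names are not documented in this card, so inspect the loaded object before relying on a particular schema):

```python
from datasets import load_dataset

# Repo id taken from the "Training Data" link above; print the object
# to inspect its splits and columns, which the card does not document.
toxic_comments = load_dataset("AiresPucrs/toxic-comments")
print(toxic_comments)
```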
- ## Cite as

- ```latex
- @misc{teenytinycastle,
-   doi = {10.5281/zenodo.7112065},
-   url = {https://github.com/Nkluge-correa/teeny-tiny_castle},
-   author = {Nicholas Kluge Corr{\^e}a},
-   title = {Teeny-Tiny Castle},
-   year = {2024},
-   publisher = {GitHub},
-   journal = {GitHub repository},
- }
- ```

- ## License

- This model is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
 
  - AiresPucrs/toxic-comments
 library_name: transformers
 ---
+ # Toxicity Classifier (Teeny-Tiny Castle)

+ This model is part of a tutorial tied to the [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository containing educational tools for AI Ethics and Safety research.

+ ## How to Use

+ ```python
+ import tensorflow as tf
+ from huggingface_hub import hf_hub_download

+ # Download the model (this will be the target of our attack)
+ hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
+                 filename="toxicity-classifier/toxicity-model.keras",
+                 local_dir="./",
+                 repo_type="model"
+ )

+ # Download the tokenizer file
+ hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
+                 filename="toxic-vocabulary.txt",
+                 local_dir="./",
+                 repo_type="model"
+ )
 
+ toxicity_model = tf.keras.models.load_model('./toxicity-classifier/toxicity-model.keras')

+ # If you cloned the model repo, the path is toxicity_model/toxic_vocabulary.txt
+ with open('toxic-vocabulary.txt', encoding='utf-8') as fp:
+     vocabulary = [line.strip() for line in fp]
+     fp.close()

  vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=20000,
+                                          output_mode="int",
+                                          output_sequence_length=100,
+                                          vocabulary=vocabulary)

  strings = [
+     'I think you should shut up your big mouth',
+     'I do not agree with you'
  ]

  preds = toxicity_model.predict(vectorization_layer(strings),verbose=0)

  for i, string in enumerate(strings):
+     print(f'{string}\n')
+     print(f'Toxic 🤬 {(1 - preds[i][0]) * 100:.2f}% | Not toxic 😊 {preds[i][0] * 100:.2f}\n')
+     print("_" * 50)
+ ```
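For reference, `preds[i][0]` is the model's probability for the non-toxic class, which is why the toxic score is computed as `1 - preds[i][0]`. With the two example strings above, the previous revision of this card reported output along these lines (exact percentages may vary):

```
I think you should shut up your big mouth

Toxic 🤬 95.73% | Not toxic 😊 4.27
__________________________________________________
I do not agree with you

Toxic 🤬 0.99% | Not toxic 😊 99.01
__________________________________________________
```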