---
license: apache-2.0
language:
- en
datasets:
- AiresPucrs/toxic-comments
library_name: transformers
---

# Toxicity Classifier (Teeny-Tiny Castle)

This model is part of a tutorial tied to the [Teeny-Tiny Castle](https://github.com/Nkluge-correa/TeenyTinyCastle), an open-source repository containing educational tools for AI Ethics and Safety research.

## How to Use

```python
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model (this will be the target of our attack)
hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
                filename="toxicity-classifier/toxicity-model.keras",
                local_dir="./",
                repo_type="model"
)

# Download the tokenizer file
hf_hub_download(repo_id="AiresPucrs/toxicity-classifier",
                filename="toxic-vocabulary.txt",
                local_dir="./",
                repo_type="model"
)

# Load the trained classifier
toxicity_model = tf.keras.models.load_model('./toxicity-classifier/toxicity-model.keras')

# If you cloned the model repo, the path is toxicity_model/toxic_vocabulary.txt
with open('toxic-vocabulary.txt', encoding='utf-8') as fp:
    vocabulary = [line.strip() for line in fp]

# Rebuild the text-vectorization layer with the downloaded vocabulary
vectorization_layer = tf.keras.layers.TextVectorization(max_tokens=20000,
                                                        output_mode="int",
                                                        output_sequence_length=100,
                                                        vocabulary=vocabulary)

strings = [
    'I think you should shut up your big mouth',
    'I do not agree with you'
]

preds = toxicity_model.predict(vectorization_layer(strings), verbose=0)

for i, string in enumerate(strings):
    print(f'{string}\n')
    print(f'Toxic 🤬 {(1 - preds[i][0]) * 100:.2f}% | Not toxic 😊 {preds[i][0] * 100:.2f}%\n')
    print("_" * 50)
```
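If you prefer to feed raw strings straight into `predict`, the vectorization layer and the classifier can be wrapped into a single Keras model. This is a minimal sketch, not part of the original tutorial: it reuses the `vectorization_layer` and `toxicity_model` objects defined above, and the name `end_to_end_model` is just illustrative.

```python
import tensorflow as tf

# Assumes `vectorization_layer` and `toxicity_model` from the snippet above.
# A string input of shape (batch, 1) is vectorized to token ids, then classified.
inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = vectorization_layer(inputs)          # strings -> integer token ids (length 100)
outputs = toxicity_model(x)              # token ids -> "not toxic" probability
end_to_end_model = tf.keras.Model(inputs, outputs)

# Now raw text can be scored directly, without calling the vectorizer yourself
preds = end_to_end_model.predict(tf.constant([['I do not agree with you']]), verbose=0)
print(preds)
```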