cgus commited on
Commit
4570d21
·
verified ·
1 Parent(s): 8497bf8

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +27 -2
README.md CHANGED
@@ -1,10 +1,35 @@
1
  ---
2
  license: apache-2.0
3
  pipeline_tag: text-generation
4
- library_name: transformers
 
 
5
  ---
6
  # Granite Guardian 3.2 5B
7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
  ## Model Summary
9
 
10
  **Granite Guardian 3.2 5B** is a thinned down version of Granite Guardian 3.1 8B designed to detect risks in prompts and responses.
@@ -251,4 +276,4 @@ The model performance is evaluated on sample conversations taken from the [DICES
251
  primaryClass={cs.CL},
252
  url={https://arxiv.org/abs/2412.07724},
253
  }
254
- ```
 
1
  ---
2
  license: apache-2.0
3
  pipeline_tag: text-generation
4
+ library_name: exllamav2
5
+ base_model:
6
+ - ibm-granite/granite-guardian-3.2-5b
7
  ---
8
  # Granite Guardian 3.2 5B
9
 
10
+ ## Quants
11
+ [4bpw h6 (main)](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/main)
12
+ [4.5bpw h6](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/4.5bpw-h6)
13
+ [5bpw h6](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/5bpw-h6)
14
+ [6bpw h6](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/6bpw-h6)
15
+ [8bpw h8](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/8bpw-h8)
16
+
17
+ ## Quantization notes
18
+ Made with Exllamav2 0.2.8 with the default dataset. Granite3 models require Exllamav2 0.2.7 or newer.
19
+ Exl2 models don't support native RAM offloading, so the model has to fully fit into GPU VRAM.
20
+ It's also required to use Nvidia RTX on Windows or Nvidia RTX/AMD ROCm on Linux.
21
+
22
+ Just in case if you downloaded the model and it answers only Yes/No, it's [intended behavior](https://github.com/ibm-granite/granite-guardian/tree/main#scope-of-use).
23
+ It's hardcoded in the model's Jinja2 template that can be viewed in tokenizer_config.json file.
24
+ By default in chat mode it evaluates if user's or assistant's message is harmful in general sense according to the model's risk definitions.
25
+ But it allows to choose a different predefined option, to set custom harm definitions or detect risks in RAG or function calling pipelines.
26
+ If you're using TabbyAPI you can either set risk_name or risk_definition via [template variables](https://github.com/theroyallab/tabbyAPI/wiki/04.-Chat-Completions#template-variables).
27
+ For example, you can switch to violence detection by adding: ``"template_vars": {"guardian_config": {"risk_name": "violence"}}`` to v1/chat/completions request.
28
+ For more information refer to Granite Guardian [documentation](https://github.com/ibm-granite/granite-guardian) and its Jinja2 template.
29
+
30
+ # Original model card
31
+ # Granite Guardian 3.2 5B
32
+
33
  ## Model Summary
34
 
35
  **Granite Guardian 3.2 5B** is a thinned down version of Granite Guardian 3.1 8B designed to detect risks in prompts and responses.
 
276
  primaryClass={cs.CL},
277
  url={https://arxiv.org/abs/2412.07724},
278
  }
279
+ ```