---
license: apache-2.0
pipeline_tag: text-generation
library_name: exllamav2
base_model:
- ibm-granite/granite-guardian-3.2-5b
---
# Granite Guardian 3.2 5B

## Quants
[4bpw h6 (main)](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/main)
[4.5bpw h6](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/4.5bpw-h6)
[5bpw h6](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/5bpw-h6)
[6bpw h6](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/6bpw-h6)
[8bpw h8](https://huggingface.co/cgus/granite-guardian-3.2-5b-exl2/tree/8bpw-h8)

## Quantization notes
Made with Exllamav2 0.2.8 with the default dataset. Granite3 models require Exllamav2 0.2.7 or newer.
Exl2 models don't support native RAM offloading, so the model has to fit fully into GPU VRAM.
You also need an Nvidia RTX GPU on Windows, or an Nvidia RTX or AMD ROCm GPU on Linux.

If you download the model and it answers only Yes/No, that is [intended behavior](https://github.com/ibm-granite/granite-guardian/tree/main#scope-of-use).
It's hardcoded in the model's Jinja2 chat template, which can be viewed in the tokenizer_config.json file.
By default, in chat mode the model evaluates whether the user's or assistant's message is harmful in a general sense, according to the model's risk definitions.
It also lets you choose a different predefined risk, set custom harm definitions, or detect risks in RAG or function-calling pipelines.
If you're using TabbyAPI, you can set either risk_name or risk_definition via [template variables](https://github.com/theroyallab/tabbyAPI/wiki/04.-Chat-Completions#template-variables).
For example, you can switch to violence detection by adding ``"template_vars": {"guardian_config": {"risk_name": "violence"}}`` to the v1/chat/completions request.
For more information, refer to the Granite Guardian [documentation](https://github.com/ibm-granite/granite-guardian) and its Jinja2 template.

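As an illustration, a full request body for such a call might be built like this. This is a minimal sketch: only the `template_vars`/`guardian_config` structure comes from the TabbyAPI wiki linked above; the endpoint URL, port, and example message are placeholders for your own setup.

```python
import json

# Placeholder endpoint; adjust host/port to wherever your TabbyAPI instance runs.
url = "http://localhost:5000/v1/chat/completions"

# template_vars.guardian_config switches Granite Guardian to a
# specific predefined risk (here: "violence") for this request.
payload = {
    "messages": [
        {"role": "user", "content": "Example user message to screen."}
    ],
    "template_vars": {"guardian_config": {"risk_name": "violence"}},
}

body = json.dumps(payload)
print(body)

# Send with any HTTP client, e.g.:
# import urllib.request
# req = urllib.request.Request(
#     url, data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
```

The model then replies Yes or No depending on whether the screened message matches the selected risk definition.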
# Original model card
# Granite Guardian 3.2 5B

## Model Summary

**Granite Guardian 3.2 5B** is a thinned down version of Granite Guardian 3.1 8B designed to detect risks in prompts and responses.
...

```
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.07724},
}
```
|