RichardErkhov committed fdaf721 (verified, parent d151b26): uploaded readme

Files changed (1): README.md (+84 lines, new file)
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

diablo-italian-base-1.3b - bnb 4bits
- Model creator: https://huggingface.co/osiria/
- Original model: https://huggingface.co/osiria/diablo-italian-base-1.3b/

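This repository hosts a bitsandbytes 4-bit quantization of the model. As a rough guide, the sketch below shows how a bnb 4-bit load typically looks with transformers; since the card does not spell out the exact quantization settings or the repo id of this upload, the original model id and the NF4/float16 values below are illustrative assumptions, not the settings used for these weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes; these settings are illustrative
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "osiria/diablo-italian-base-1.3b"  # original checkpoint, quantized on the fly
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # requires a CUDA GPU and the bitsandbytes package
)
```
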

Original model description:
---
license: mit
language:
- it
pipeline_tag: text-generation
---
--------------------------------------------------------------------------------------------------

<body>
<span class="vertical-text" style="background-color:lightgreen;border-radius: 3px;padding: 3px;"> </span>
<br>
<span class="vertical-text" style="background-color:orange;border-radius: 3px;padding: 3px;">  </span>
<br>
<span class="vertical-text" style="background-color:lightblue;border-radius: 3px;padding: 3px;">    Model: DIABLO 1.3B 🔥</span>
<br>
<span class="vertical-text" style="background-color:tomato;border-radius: 3px;padding: 3px;">    Lang: IT</span>
<br>
<span class="vertical-text" style="background-color:lightgrey;border-radius: 3px;padding: 3px;">  </span>
<br>
<span class="vertical-text" style="background-color:#CF9FFF;border-radius: 3px;padding: 3px;"> </span>
</body>

--------------------------------------------------------------------------------------------------

<h3>Model description</h3>

This model is a <b>causal</b> language model for the <b>Italian</b> language, based on a GPT-like <b>[1]</b> architecture (more specifically, it was obtained by modifying Meta's XGLM architecture <b>[2]</b> and starting from its 1.7B checkpoint).

The model has ~1.3B parameters and a vocabulary of 50,335 tokens. It is a foundation model, pre-trained only for causal language modeling, so it is mainly suited to open-ended text generation; for more specific downstream tasks it should first be fine-tuned (a minimal sketch follows).
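
The snippet below is only a minimal sketch of such a causal-LM fine-tuning run with the Hugging Face Trainer; the dataset file, sequence length and hyperparameters are placeholders chosen for illustration, not values recommended by the original authors.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "osiria/diablo-italian-base-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical plain-text Italian corpus; replace with your own task data
dataset = load_dataset("text", data_files={"train": "my_italian_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Causal LM objective: labels are a copy of the input ids (the model shifts them internally)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="diablo-italian-finetuned",  # placeholder output path
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=5e-5,
    fp16=True,
)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
```
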

<h3>Quick usage</h3>

To run the model for inference on GPU, the following pipeline can be used:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the tokenizer and the model weights in half precision
tokenizer = AutoTokenizer.from_pretrained("osiria/diablo-italian-base-1.3b")
model = AutoModelForCausalLM.from_pretrained("osiria/diablo-italian-base-1.3b", torch_dtype=torch.float16)

# Move the model to the GPU
device = torch.device("cuda")
model = model.to(device)

# Build a text-generation pipeline on GPU device 0
pipeline_nlg = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

# Prompt: "Hi, my name is Marco Rossi and"
pipeline_nlg("Ciao, mi chiamo Marco Rossi e")

# [{'generated_text': 'Ciao, mi chiamo Marco Rossi e sono un blogger italiano.'}]
```
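
By default the pipeline simply continues the prompt with the model's own generation settings; longer or more varied outputs can be obtained by passing standard `generate` arguments through the call. The values below are purely illustrative, not settings recommended in the original card:

```python
# Sampling-based generation; all values are illustrative
pipeline_nlg(
    "Ciao, mi chiamo Marco Rossi e",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)
```
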

<h3>Limitations</h3>

The model might behave erratically when presented with prompts that are far outside its pre-training distribution and, because of the probabilistic nature of its generation, it might occasionally produce biased or offensive content with respect to gender, race, ideologies, and political or religious beliefs.
These limitations imply that the model and its outputs should be used with caution, and should not be relied on in situations that require the generated text to be fair or factually accurate.

<h3>References</h3>

[1] https://arxiv.org/abs/2005.14165

[2] https://arxiv.org/abs/2112.10668

<h3>License</h3>

The model is released under the <b>MIT</b> license.