+---
+license: cc-by-nc-sa-4.0
+language:
+- multilingual
+- fa
+- en
+library_name: transformers
+tags:
+- text-generation-inference
+inference: false
+metrics:
+- bleu
+- comet
+- accuracy
+- perplexity
+- spearmanr
+pipeline_tag: text-generation
+co2_eq_emissions:
+  emissions: 232380
+  source: "PersianMind: A Cross-Lingual Persian-English Large Language Model. https://arxiv.org/abs/2401.06466"
+  training_type: "fine-tuning"
+  hardware_used: "4 RTX3090 24GB GPUs"
+  geographical_location: "Tehran, Iran"
+---
+[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+# QuantFactory/PersianMind-v1.0-GGUF
+This is a quantized version of [universitytehran/PersianMind-v1.0](https://huggingface.co/universitytehran/PersianMind-v1.0), created using llama.cpp.
+# Original Model Card
+<p align="center">
+  <img src="PersianMind.jpg" alt="PersianMind logo" width=200/>
+</p>
+# <span style="font-variant:small-caps;">PersianMind</span>
+<span style="font-variant:small-caps;">PersianMind</span> is a cross-lingual Persian-English large language model.
+The model achieves state-of-the-art results on the Persian subset of the [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) benchmark
+and the [ParsiNLU multiple-choice QA](https://github.com/persiannlp/parsinlu) task.
+It also attains performance comparable to GPT-3.5-turbo in a Persian reading comprehension task.
+## Model Description
+- **Developed by:** Pedram Rostami, Ali Salemi, and Mohammad Javad Dousti
+- **Model type:** Language model
+- **Languages:** English and Persian
+- **License:** [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) (non-commercial use only)
+## How to Get Started with the Model
+Use the code below to get started with the model.
+Note that you need to install the <code><b>sentencepiece</b></code> and <code><b>accelerate</b></code> libraries along with <code><b>PyTorch</b></code> and <code><b>🤗Transformers</b></code> to run this code.
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = AutoModelForCausalLM.from_pretrained(
+    "universitytehran/PersianMind-v1.0",
+    torch_dtype=torch.bfloat16,
+    low_cpu_mem_usage=True,
+    device_map={"": device},
+)
+tokenizer = AutoTokenizer.from_pretrained(
+    "universitytehran/PersianMind-v1.0",
+)
+TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "
+CONTEXT = "This is a conversation with PersianMind. It is an artificial intelligence model designed by a team of " \
+    "NLP experts at the University of Tehran to help you with various tasks such as answering questions, " \
+    "providing recommendations, and helping with decision making. You can ask it anything you want and " \
+    "it will do its best to give you accurate and relevant information."
+PROMPT = "در مورد هوش مصنوعی توضیح بده."
+model_input = TEMPLATE.format(context=CONTEXT, prompt=PROMPT)
+input_tokens = tokenizer(model_input, return_tensors="pt")
+input_tokens = input_tokens.to(device)
+generate_ids = model.generate(**input_tokens, max_new_tokens=512, do_sample=False, repetition_penalty=1.1)
+model_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+print(model_output[len(model_input):])
+```
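The prompt handling in the example above can be factored into two small helpers. This is a minimal sketch: `build_prompt` and `extract_reply` are hypothetical names, not part of the model's API, and the fallback on the `PersianMind: ` marker is an assumption for cases where the decoded text does not start with the exact prompt string.

```python
# Single-turn template copied from the example above.
TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "
REPLY_MARKER = "PersianMind: "


def build_prompt(context: str, prompt: str) -> str:
    """Fill the template with a system-style context and one user prompt."""
    return TEMPLATE.format(context=context, prompt=prompt)


def extract_reply(model_input: str, model_output: str) -> str:
    """Return only the generated continuation from the decoded output.

    The decoded text normally begins with the prompt itself. If it does not
    (assumption: decoding altered some characters), fall back to the text
    after the last reply marker, or to the whole output if no marker exists.
    """
    if model_output.startswith(model_input):
        return model_output[len(model_input):].strip()
    _, found, tail = model_output.rpartition(REPLY_MARKER)
    return tail.strip() if found else model_output.strip()
```

With these helpers, the last line of the example could become `print(extract_reply(model_input, model_output))`, which behaves like the slice above when the output starts with the prompt.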
+### How to Quantize the Model
+Quantized models can be run on resource-constrained devices.
+To quantize the model, you should install the <code><b>bitsandbytes</b></code> library.
+In order to quantize the model in 8-bit (`INT8`), use the code below.
+```python
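+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+# Load the weights in 8-bit (INT8) through bitsandbytes.
+# Passing load_in_8bit=True directly is deprecated in recent Transformers releases.
+model = AutoModelForCausalLM.from_pretrained(
+    "universitytehran/PersianMind-v1.0",
+    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
+    device_map="auto",
+    low_cpu_mem_usage=True,
+)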
+```
+Alternatively, you can quantize the model in 4-bit (`NormalFloat4`) with the following code.
+```python
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+)
+model = AutoModelForCausalLM.from_pretrained(
+    "universitytehran/PersianMind-v1.0",
+    quantization_config=quantization_config,
+    device_map="auto"
+)
+```
+### Evaluating Quantized Models
+| Model                                                              | <span style="font-variant:small-caps;">Belebele</span> (Persian) | Fa→En Translation<br>(<span style="font-variant:small-caps;">Comet</span>) | En→Fa Translation<br>(<span style="font-variant:small-caps;">Comet</span>) | Model Size | Tokens/sec |
+| :----------------------------------------------------------------: | :--------------------------------------------------------------: | :------------------------------------------------------------------------: | :------------------------------------------------------------------------: | :--------: | :--------: |
+| <span style="font-variant:small-caps;">PersianMind</span> (`BF16`) |        73.9                                                      |                                   83.61                                    |                                     79.44                                  |   13.7G    |   25.35    |
+| <span style="font-variant:small-caps;">PersianMind</span> (`INT8`) |        73.7                                                      |                                   82.32                                    |                                     78.61                                  |    7.2G    |   11.36    |
+| <span style="font-variant:small-caps;">PersianMind</span> (`NF4`) |        70.2                                                      |                                   82.07                                    |                                     80.36                                  |    3.9G    |   24.36    |
+We evaluated quantized models in various tasks against the original model.
+Specifically, we evaluated all models using the reading comprehension multiple-choice
+question-answering benchmark of [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) (Persian subset) and reported the accuracy of each model.
+Additionally, we evaluated our models for Persian-to-English and English-to-Persian translation tasks.
+For this, we utilized the Persian-English subset of the [<span style="font-variant:small-caps;">Flores</span>-200](https://github.com/facebookresearch/flores/tree/main/flores200) dataset and
+reported our results using the <span style="font-variant:small-caps;">Comet</span> metric.
+Furthermore, we calculated the average number of tokens each model generated per second while running the translation tasks.
+To assess resource efficiency, we measured the memory usage of each model with the `get_memory_footprint()` method.
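To make the trade-offs in the table easier to compare, the short sketch below expresses each quantized variant relative to the `BF16` baseline. The numbers are copied from the table above; the `relative_to_baseline` helper and the dictionary keys are hypothetical names introduced only for this illustration.

```python
# Figures copied from the evaluation table above.
RESULTS = {
    "BF16": {"belebele": 73.9, "size_g": 13.7, "tokens_per_sec": 25.35},
    "INT8": {"belebele": 73.7, "size_g": 7.2, "tokens_per_sec": 11.36},
    "NF4": {"belebele": 70.2, "size_g": 3.9, "tokens_per_sec": 24.36},
}


def relative_to_baseline(results, baseline="BF16"):
    """Express size and speed as ratios, and accuracy as a point difference."""
    base = results[baseline]
    summary = {}
    for name, row in results.items():
        summary[name] = {
            "size_ratio": row["size_g"] / base["size_g"],
            "speed_ratio": row["tokens_per_sec"] / base["tokens_per_sec"],
            "accuracy_delta": row["belebele"] - base["belebele"],
        }
    return summary


for name, s in relative_to_baseline(RESULTS).items():
    print(
        f"{name}: size x{s['size_ratio']:.2f}, "
        f"speed x{s['speed_ratio']:.2f}, "
        f"Belebele {s['accuracy_delta']:+.1f}"
    )
```

On these figures, `INT8` takes roughly 53% of the `BF16` memory at about 45% of its throughput, while `NF4` takes roughly 28% of the memory at about 96% of the throughput, with a 3.7-point drop in Belebele accuracy.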
+## License
+<span style="font-variant:small-caps;">PersianMind</span> is subject to Meta's [Llama 2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE).
+It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.
+Commercial use of this model requires a written agreement from the copyright holders, who are listed as developers on this page.
+If you suspect any violations, please reach out to us.
+## Citation
+If you find this model helpful, please cite the following paper.
+**BibTeX:**
+```bibtex
+@misc{persianmind,
+  title={{PersianMind: A Cross-Lingual Persian-English Large Language Model}},
+  author={Rostami, Pedram and Salemi, Ali and Dousti, Mohammad Javad},
+  year={2024},
+  eprint={2401.06466},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```