avemio-digital committed · verified
Commit 2f30029 · 1 Parent(s): dba633a

Update README.md

Files changed (1): README.md (+21 −21)
@@ -1,7 +1,7 @@
 ---
 license: llama3.1
 datasets:
- - avemio/German_RAG-CPT-HESSIAN-AI
 language:
 - en
 - de
@@ -18,25 +18,25 @@ tags:
 ---


- <img src="https://www.German_RAG.ai/wp-content/uploads/2024/12/German_RAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="German_RAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


- # German_RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI

 <!-- Provide a quick summary of what the model is/does. -->

- **German_RAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025

- Our German_RAG-LLAMA-CPT model are trained on this **[German_RAG-CPT](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) dataset.**

 ## Model Details

 The core models released in this batch are the following:
 | Size | Training Tokens |
 |------|--------|
- | [German_RAG-LLAMA-CPT](https://huggingface.co/avemio/German_RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI) | 507.47 million |
- | [German_RAG-LLAMA-SFT](https://huggingface.co/avemio/German_RAG-LLAMA-3.1-8B-SFT-HESSIAN-AI) | 2.03 billion |
- | [German_RAG-LLAMA-ORPO](https://huggingface.co/avemio/German_RAG-LLAMA-3.1-8B-ORPO-HESSIAN-AI) | 2.0577 billion |
 ### Model Description

 <!-- Provide a longer summary of what this model is. -->
@@ -46,19 +46,19 @@ The core models released in this batch are the following:
 - **Model type:** a Transformer-style autoregressive language model.
 - **Language(s) (NLP):** German, English
 - **License:** The code and model are released under Apache 2.0.
- - **Contact:** [German_RAG@avemio.digital](mailto:German_RAG@avemio.digital)


 ### Model Sources

 <!-- Provide the basic links for the model. -->

- - **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/German_RAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
 - **Repositories:**
 - Training: [Colab-Notebook](https://colab.research.google.com/drive/18SH_aYLCnw1K7cRGOTTZ80y98V5Kquxb?usp=sharing)
 - Evaluation code:
- - [German_RAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-HARD-BENCHMARK.git)
- - [German_RAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-EASY-BENCHMARK.git)

 - **Technical blog post:**
 <!-- - **Press release:** TODO -->
@@ -73,7 +73,7 @@ Now, proceed as usual with HuggingFace:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer

- model_name = "avemio/German_RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI"

 tokenizer = AutoTokenizer.from_pretrained(model_name)

@@ -93,7 +93,7 @@ We are providing a comprehensive Google Colab notebook to guide users through th
 ## Model Details

 ### Data
- For training data details, please see the [German_RAG-CPT-Dataset](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) documentation.

 #### Description
 CPT – Continued Pre-Training
@@ -109,7 +109,7 @@ The summarization task teaches models to distill complex information into clear,
 ### Architecture


- | Parameter | German_RAG-LLAMA-CPT |
 |-----------------------|-----------------------------------------------------------------------------------------------|
 | **d_model** | 4096 |
 | **num heads** | 32 |
@@ -127,7 +127,7 @@ The summarization task teaches models to distill complex information into clear,
 ### Hyperparameters


- | Parameter | German_RAG-LLAMA-CPT |
 |---------------------------|--------------------|
 | **warmup steps** | 50 |
 | **peak LR** | 5.0E-07 |
@@ -138,19 +138,19 @@ The summarization task teaches models to distill complex information into clear,

 ## Environmental Impact

- German_RAG-LLAMA-CPT, running on NVIDIA A100 with 40 GPUs for 3 days, has an approximate power consumption as follows:

 It's important to note that the actual power consumption may vary depending on the specific workload and operational conditions. For accurate power consumption measurements, using dedicated power monitoring tools is recommended.

 | Model | GPU Type | Power Consumption From GPUs |
 |----------------|---------------------|-----------------------------|
- | German_RAG-LLAMA-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.00858 MWh |
 ## Bias, Risks, and Limitations

 Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
 Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.

- Otherwise, many facts from German_RAG-LLAMA-CPT or any LLM will often not be true, so they should be checked.


@@ -158,9 +158,9 @@ Otherwise, many facts from German_RAG-LLAMA-CPT or any LLM will often not be true
 ## Model Card Contact


- For errors in this model card, please contact ([German_RAG@avemio.digital](mailto:German_RAG@avemio.digital)).

- ## The German_RAG AI Team
 [Marcel Rosiak](https://de.linkedin.com/in/marcel-rosiak)
 [Soumya Paul](https://de.linkedin.com/in/soumya-paul-1636a68a)
 [Siavash Mollaebrahim](https://de.linkedin.com/in/siavash-mollaebrahim-4084b5153?trk=people-guest_people_search-card)
 
 ---
 license: llama3.1
 datasets:
+ - avemio/German-RAG-CPT-HESSIAN-AI
 language:
 - en
 - de
 
 ---


+ <img src="https://www.German-RAG.ai/wp-content/uploads/2024/12/German-RAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="German-RAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


+ # German-RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI

 <!-- Provide a quick summary of what the model is/does. -->

+ **German-RAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025.

+ Our German-RAG-LLAMA-CPT model is trained on the **[German-RAG-CPT](https://huggingface.co/datasets/avemio/German-RAG-CPT-HESSIAN-AI) dataset.**

 ## Model Details

 The core models released in this batch are the following:
 | Size | Training Tokens |
 |------|--------|
+ | [German-RAG-LLAMA-CPT](https://huggingface.co/avemio/German-RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI) | 507.47 million |
+ | [German-RAG-LLAMA-SFT](https://huggingface.co/avemio/German-RAG-LLAMA-3.1-8B-SFT-HESSIAN-AI) | 2.03 billion |
+ | [German-RAG-LLAMA-ORPO](https://huggingface.co/avemio/German-RAG-LLAMA-3.1-8B-ORPO-HESSIAN-AI) | 2.0577 billion |
 ### Model Description

 <!-- Provide a longer summary of what this model is. -->
 
 - **Model type:** a Transformer-style autoregressive language model.
 - **Language(s) (NLP):** German, English
 - **License:** The code and model are released under Apache 2.0.
+ - **Contact:** [German-RAG@avemio.digital](mailto:German-RAG@avemio.digital)


 ### Model Sources

 <!-- Provide the basic links for the model. -->

+ - **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/German-RAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
 - **Repositories:**
 - Training: [Colab-Notebook](https://colab.research.google.com/drive/18SH_aYLCnw1K7cRGOTTZ80y98V5Kquxb?usp=sharing)
 - Evaluation code:
+ - [German-RAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/German-RAG-LLM-HARD-BENCHMARK.git)
+ - [German-RAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/German-RAG-LLM-EASY-BENCHMARK.git)

 - **Technical blog post:**
 <!-- - **Press release:** TODO -->
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer

+ model_name = "avemio/German-RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI"

 tokenizer = AutoTokenizer.from_pretrained(model_name)

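The hunk above only shows the lines that changed. For readers assembling the snippet outside the diff, a minimal end-to-end sketch could look as follows; `generate_completion` and its `max_new_tokens` default are illustrative helpers of ours, not part of the model card, and calling the function downloads the multi-gigabyte checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID from the diff above.
MODEL_NAME = "avemio/German-RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI"

def generate_completion(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the CPT checkpoint and complete `prompt` as plain text.

    CPT checkpoints are base models, so free-form completion (rather than a
    chat template) is the natural interface.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Usage would then be a single call, e.g. `generate_completion("Retrieval Augmented Generation kombiniert")`.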
 
 ## Model Details

 ### Data
+ For training data details, please see the [German-RAG-CPT-Dataset](https://huggingface.co/datasets/avemio/German-RAG-CPT-HESSIAN-AI) documentation.

 #### Description
 CPT – Continued Pre-Training
 
 ### Architecture


+ | Parameter | German-RAG-LLAMA-CPT |
 |-----------------------|-----------------------------------------------------------------------------------------------|
 | **d_model** | 4096 |
 | **num heads** | 32 |
 
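The head geometry implied by the table can be checked with one line of arithmetic; the even per-head split is the standard multi-head-attention convention, assumed here rather than stated in the diff:

```python
# Values from the architecture table above.
D_MODEL = 4096
NUM_HEADS = 32

# Standard multi-head attention splits d_model evenly across heads.
head_dim = D_MODEL // NUM_HEADS
print(head_dim)  # 128
```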
 ### Hyperparameters


+ | Parameter | German-RAG-LLAMA-CPT |
 |---------------------------|--------------------|
 | **warmup steps** | 50 |
 | **peak LR** | 5.0E-07 |
 
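As a sketch of how the two hyperparameters above interact, the warmup phase can be written as a tiny schedule function. The linear ramp and the constant rate after warmup are our assumptions for illustration; the hunk does not show the decay shape, and `lr_at` is a hypothetical helper:

```python
# Hyperparameters from the table above.
WARMUP_STEPS = 50
PEAK_LR = 5.0e-07

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (1-indexed).

    Linear warmup to PEAK_LR over WARMUP_STEPS, then held constant
    (the post-warmup decay is not specified in the diff).
    """
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR

print(lr_at(25))  # halfway through warmup -> 2.5e-07
print(lr_at(50))  # peak -> 5e-07
```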

 ## Environmental Impact

+ German-RAG-LLAMA-CPT was trained on 40 NVIDIA A100 GPUs for 3 days; its approximate power consumption is as follows:

 It's important to note that the actual power consumption may vary depending on the specific workload and operational conditions. For accurate power consumption measurements, using dedicated power monitoring tools is recommended.

 | Model | GPU Type | Power Consumption From GPUs |
 |----------------|---------------------|-----------------------------|
+ | German-RAG-LLAMA-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.00858 MWh |
 ## Bias, Risks, and Limitations

 Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
 Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.

+ As with any LLM, statements generated by German-RAG-LLAMA-CPT may be factually incorrect, so outputs should be verified.


 
 ## Model Card Contact


+ For errors in this model card, please contact [German-RAG@avemio.digital](mailto:German-RAG@avemio.digital).

+ ## The German-RAG AI Team
 [Marcel Rosiak](https://de.linkedin.com/in/marcel-rosiak)
 [Soumya Paul](https://de.linkedin.com/in/soumya-paul-1636a68a)
 [Siavash Mollaebrahim](https://de.linkedin.com/in/siavash-mollaebrahim-4084b5153?trk=people-guest_people_search-card)