avemio-digital committed · verified
Commit 2f30029 · 1 Parent(s): dba633a

Update README.md

Files changed (1): README.md (+21 −21)
@@ -1,7 +1,7 @@
 ---
 license: llama3.1
 datasets:
- - avemio/German_RAG-CPT-HESSIAN-AI
 language:
 - en
 - de
@@ -18,25 +18,25 @@ tags:
 ---


- <img src="https://www.German_RAG.ai/wp-content/uploads/2024/12/German_RAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="German_RAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


- # German_RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI

 <!-- Provide a quick summary of what the model is/does. -->

- **German_RAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025

- Our German_RAG-LLAMA-CPT model are trained on this **[German_RAG-CPT](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) dataset.**

 ## Model Details

 The core models released in this batch are the following:
 | Size | Training Tokens |
 |------|--------|
- | [German_RAG-LLAMA-CPT](https://huggingface.co/avemio/German_RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI) | 507.47 million |
- | [German_RAG-LLAMA-SFT](https://huggingface.co/avemio/German_RAG-LLAMA-3.1-8B-SFT-HESSIAN-AI) | 2.03 billion |
- | [German_RAG-LLAMA-ORPO](https://huggingface.co/avemio/German_RAG-LLAMA-3.1-8B-ORPO-HESSIAN-AI) | 2.0577 billion |
 ### Model Description

 <!-- Provide a longer summary of what this model is. -->
@@ -46,19 +46,19 @@ The core models released in this batch are the following:
 - **Model type:** a Transformer-style autoregressive language model.
 - **Language(s) (NLP):** German, English
 - **License:** The code and model are released under Apache 2.0.
- - **Contact:** [German_RAG@avemio.digital](mailto:German_RAG@avemio.digital)


 ### Model Sources

 <!-- Provide the basic links for the model. -->

- - **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/German_RAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
 - **Repositories:**
 - Training: [Colab-Notebook](https://colab.research.google.com/drive/18SH_aYLCnw1K7cRGOTTZ80y98V5Kquxb?usp=sharing)
 - Evaluation code:
- - [German_RAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-HARD-BENCHMARK.git)
- - [German_RAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-EASY-BENCHMARK.git)

 - **Technical blog post:**
 <!-- - **Press release:** TODO -->
@@ -73,7 +73,7 @@ Now, proceed as usual with HuggingFace:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer

- model_name = "avemio/German_RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI"

 tokenizer = AutoTokenizer.from_pretrained(model_name)

@@ -93,7 +93,7 @@ We are providing a comprehensive Google Colab notebook to guide users through th
 ## Model Details

 ### Data
- For training data details, please see the [German_RAG-CPT-Dataset](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) documentation.

 #### Description
 CPT – Continued Pre-Training
@@ -109,7 +109,7 @@ The summarization task teaches models to distill complex information into clear,
 ### Architecture


- | Parameter | German_RAG-LLAMA-CPT |
 |-----------------------|-----------------------------------------------------------------------------------------------|
 | **d_model** | 4096 |
 | **num heads** | 32 |
@@ -127,7 +127,7 @@ The summarization task teaches models to distill complex information into clear,
 ### Hyperparameters


- | Parameter | German_RAG-LLAMA-CPT |
 |---------------------------|--------------------|
 | **warmup steps** | 50 |
 | **peak LR** | 5.0E-07 |
@@ -138,19 +138,19 @@ The summarization task teaches models to distill complex information into clear,

 ## Environmental Impact

- German_RAG-LLAMA-CPT, running on NVIDIA A100 with 40 GPUs for 3 days, has an approximate power consumption as follows:

 It's important to note that the actual power consumption may vary depending on the specific workload and operational conditions. For accurate power consumption measurements, using dedicated power monitoring tools is recommended.

 | Model | GPU Type | Power Consumption From GPUs |
 |----------------|---------------------|-----------------------------|
- | German_RAG-LLAMA-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.00858 MWh |
 ## Bias, Risks, and Limitations

 Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
 Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.

- Otherwise, many facts from German_RAG-LLAMA-CPT or any LLM will often not be true, so they should be checked.


@@ -158,9 +158,9 @@ Otherwise, many facts from German_RAG-LLAMA-CPT or any LLM will often not be true
 ## Model Card Contact


- For errors in this model card, please contact ([German_RAG@avemio.digital](mailto:German_RAG@avemio.digital)).

- ## The German_RAG AI Team
 [Marcel Rosiak](https://de.linkedin.com/in/marcel-rosiak)
 [Soumya Paul](https://de.linkedin.com/in/soumya-paul-1636a68a)
 [Siavash Mollaebrahim](https://de.linkedin.com/in/siavash-mollaebrahim-4084b5153?trk=people-guest_people_search-card)
 
 ---
 license: llama3.1
 datasets:
+ - avemio/German-RAG-CPT-HESSIAN-AI
 language:
 - en
 - de
 
 ---


+ <img src="https://www.German-RAG.ai/wp-content/uploads/2024/12/German-RAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="German-RAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>


+ # German-RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI

 <!-- Provide a quick summary of what the model is/does. -->

+ **German-RAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025.

+ Our German-RAG-LLAMA-CPT model is trained on the **[German-RAG-CPT](https://huggingface.co/datasets/avemio/German-RAG-CPT-HESSIAN-AI) dataset.**

 ## Model Details

 The core models released in this batch are the following:
 | Size | Training Tokens |
 |------|--------|
+ | [German-RAG-LLAMA-CPT](https://huggingface.co/avemio/German-RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI) | 507.47 million |
+ | [German-RAG-LLAMA-SFT](https://huggingface.co/avemio/German-RAG-LLAMA-3.1-8B-SFT-HESSIAN-AI) | 2.03 billion |
+ | [German-RAG-LLAMA-ORPO](https://huggingface.co/avemio/German-RAG-LLAMA-3.1-8B-ORPO-HESSIAN-AI) | 2.0577 billion |
 ### Model Description

 <!-- Provide a longer summary of what this model is. -->
 
 - **Model type:** a Transformer-style autoregressive language model.
 - **Language(s) (NLP):** German, English
 - **License:** The code and model are released under Apache 2.0.
+ - **Contact:** [German-RAG@avemio.digital](mailto:German-RAG@avemio.digital)


 ### Model Sources

 <!-- Provide the basic links for the model. -->

+ - **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/German-RAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
 - **Repositories:**
 - Training: [Colab-Notebook](https://colab.research.google.com/drive/18SH_aYLCnw1K7cRGOTTZ80y98V5Kquxb?usp=sharing)
 - Evaluation code:
+ - [German-RAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/German-RAG-LLM-HARD-BENCHMARK.git)
+ - [German-RAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/German-RAG-LLM-EASY-BENCHMARK.git)

 - **Technical blog post:**
 <!-- - **Press release:** TODO -->
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer

+ model_name = "avemio/German-RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI"

 tokenizer = AutoTokenizer.from_pretrained(model_name)

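The hunk above only shows the lines that changed. For readers assembling the snippet outside the diff, a minimal end-to-end sketch could look as follows; `generate_completion` and its `max_new_tokens` default are illustrative helpers of ours, not part of the model card, and calling the function downloads the multi-gigabyte checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID from the diff above.
MODEL_NAME = "avemio/German-RAG-LLAMA-3.1-8B-CPT-HESSIAN-AI"

def generate_completion(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the CPT checkpoint and complete `prompt` as plain text.

    CPT checkpoints are base models, so free-form completion (rather than a
    chat template) is the natural interface.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Usage would then be a single call, e.g. `generate_completion("Retrieval Augmented Generation kombiniert")`.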
 
 ## Model Details

 ### Data
+ For training data details, please see the [German-RAG-CPT-Dataset](https://huggingface.co/datasets/avemio/German-RAG-CPT-HESSIAN-AI) documentation.

 #### Description
 CPT – Continued Pre-Training
 
 ### Architecture


+ | Parameter | German-RAG-LLAMA-CPT |
 |-----------------------|-----------------------------------------------------------------------------------------------|
 | **d_model** | 4096 |
 | **num heads** | 32 |
 
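The head geometry implied by the table can be checked with one line of arithmetic; the even per-head split is the standard multi-head-attention convention, assumed here rather than stated in the diff:

```python
# Values from the architecture table above.
D_MODEL = 4096
NUM_HEADS = 32

# Standard multi-head attention splits d_model evenly across heads.
head_dim = D_MODEL // NUM_HEADS
print(head_dim)  # 128
```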
 ### Hyperparameters


+ | Parameter | German-RAG-LLAMA-CPT |
 |---------------------------|--------------------|
 | **warmup steps** | 50 |
 | **peak LR** | 5.0E-07 |
 
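As a sketch of how the two hyperparameters above interact, the warmup phase can be written as a tiny schedule function. The linear ramp and the constant rate after warmup are our assumptions for illustration; the hunk does not show the decay shape, and `lr_at` is a hypothetical helper:

```python
# Hyperparameters from the table above.
WARMUP_STEPS = 50
PEAK_LR = 5.0e-07

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (1-indexed).

    Linear warmup to PEAK_LR over WARMUP_STEPS, then held constant
    (the post-warmup decay is not specified in the diff).
    """
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR

print(lr_at(25))  # halfway through warmup -> 2.5e-07
print(lr_at(50))  # peak -> 5e-07
```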

 ## Environmental Impact

+ German-RAG-LLAMA-CPT was trained on 40 NVIDIA A100 GPUs for 3 days; its approximate power consumption is as follows:

 It's important to note that the actual power consumption may vary depending on the specific workload and operational conditions. For accurate power consumption measurements, using dedicated power monitoring tools is recommended.

 | Model | GPU Type | Power Consumption From GPUs |
 |----------------|---------------------|-----------------------------|
+ | German-RAG-LLAMA-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.00858 MWh |
 ## Bias, Risks, and Limitations

 Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
 Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.

+ As with any LLM, statements generated by German-RAG-LLAMA-CPT may be factually incorrect, so outputs should be verified.


 
 ## Model Card Contact


+ For errors in this model card, please contact [German-RAG@avemio.digital](mailto:German-RAG@avemio.digital).

+ ## The German-RAG AI Team
 [Marcel Rosiak](https://de.linkedin.com/in/marcel-rosiak)
 [Soumya Paul](https://de.linkedin.com/in/soumya-paul-1636a68a)
 [Siavash Mollaebrahim](https://de.linkedin.com/in/siavash-mollaebrahim-4084b5153?trk=people-guest_people_search-card)