|
--- |
|
library_name: transformers |
|
tags: |
|
- climate |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Model Card for SPARK-mini-base
|
|
|
SPARK-mini-base is a 3.8B-parameter, domain-specific language model trained on an extensive dataset curated from documents generated by the nuclear power industry.
|
|
|
The model was developed by continued pretraining of Microsoft's Phi-3-mini-4k-instruct on over 35B tokens of high-quality data curated from millions of public documents originating within the nuclear power domain.
|
SPARK-mini-base was trained by Nuclearn AI and is released as a research artifact, demonstration tool, and domain-specific base LLM for further fine-tuning by downstream practitioners working within or tangential to the nuclear industry.
|
|
|
SPARK-mini-base is trained with a next-token prediction objective and no alignment; it requires multi-shot prompting to respond properly. An instruction-tuned version is available at [SPARK-mini-instruct](https://huggingface.co/NuclearnAI/SPARK-mini-instruct).
|
|
|
## Uses |
|
|
|
SPARK-mini-base is a base LLM with no alignment process (SFT, RLHF, etc.) applied; like other base models, it must be multi-shot prompted for adequate performance. For a model with instruction-based alignment, please see [SPARK-mini-instruct](https://huggingface.co/NuclearnAI/SPARK-mini-instruct).
|
|
|
Nuclearn targets a few specific use cases with this open-source model release: |
|
|
|
1. Accelerating the work of technical staff at national research labs or regulatory agencies by providing a domain-specific language model from which further use cases can be fine-tuned.
|
2. Improving the performance of systems deployed in the nuclear industry that currently use language models as feature extractors or model trunks in predictive AI systems (see the sketch after this list).
|
3. Accessibility for practitioners without hardware accelerators or cloud connectivity.
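
As a minimal, illustrative sketch of the feature-extractor use case above (the pooling strategy and example text are assumptions, not Nuclearn's deployed approach), hidden states from the base model can be pooled into fixed-length embeddings for a downstream predictive system:

```python
# Minimal feature-extraction sketch: pool the base model's hidden states
# into fixed-length embeddings for a downstream predictive system.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "nuclearnai/SPARK-mini-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
model.eval()

# Hypothetical input text; any plant document or log entry would do.
texts = ["Reactor coolant pump seal leakage identified during operator rounds."]
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(texts, return_tensors="pt", padding=True).to("cuda")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Mean-pool the final hidden layer over non-padding tokens (one common choice).
last_hidden = outputs.hidden_states[-1]        # (batch, seq_len, hidden_dim)
mask = inputs["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)
embeddings = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)
```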
|
|
|
### Direct Use |
|
|
|
SPARK-mini-base is a base model without alignment, so multi-shot prompting is required. Prompting should follow the techniques applicable to other unaligned base language models; see the Hugging Face prompting [docs](https://huggingface.co/docs/transformers/main/en/tasks/prompting#base-vs-instructchat-models).
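
As a minimal sketch, reusing the `model` and `tokenizer` loaded in the sketch above (the example definitions here are invented for illustration, not drawn from the training data), a multi-shot prompt places worked examples ahead of the query so the base model continues the pattern:

```python
# Multi-shot prompt for the base model: worked examples come first,
# and the model continues the pattern for the final query.
prompt = """Term: LOCA
Definition: Loss-of-coolant accident; an accident involving a breach of the reactor coolant system and loss of coolant inventory.

Term: SCRAM
Definition: A rapid shutdown of the reactor by full insertion of the control rods.

Term: ECCS
Definition:"""

input_ids = tokenizer.encode(
    prompt, return_tensors="pt", add_special_tokens=False
).to("cuda")
output = model.generate(
    input_ids=input_ids,
    do_sample=True,
    min_p=0.2,
    temperature=0.7,
    max_new_tokens=60,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```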
|
|
|
SPARK-mini-base is trained with 'prompt pre-training', as demonstrated in [*Galactica: A Large Language Model for Science*](https://arxiv.org/pdf/2211.09085), for steerability along dimensions important to end users.
|
|
|
### License |
|
|
|
License: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en), with the exceptions listed below for unrestricted use.
|
|
|
The license permits free use by a limited set of commercial and institutional entities, including:
|
|
|
1. Operating nuclear utilities
|
2. Regulatory Bodies (Commercial or Government) |
|
3. Research labs and research-focused groups (e.g., national laboratories and electric-power-specific research groups)
|
|
|
## Bias, Risks, and Limitations |
|
|
|
- This model has been trained extensively on nuclear-power-related information but, like every language model, still makes factual and logical mistakes.
|
- The model should not be used for production use cases without further training or applicable guardrails.
|
- Intentional bias has been trained into the model for steerability.
|
- The base model is trained without text formatting; further fine-tuning is needed for formatted responses (see SPARK-mini-instruct).
|
|
|
## How to Get Started with the Model |
|
|
|
```python
# Requires transformers >= 4.41 for Phi-3 compatibility
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "nuclearnai/SPARK-mini-base"
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Note that no chat template is used for the base model
prompt = """The ECCS is"""

input_ids = tokenizer.encode(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to("cuda")

# Generate using min_p sampling
output = model.generate(
    input_ids=input_ids,
    min_p=0.2,
    temperature=0.7,
    do_sample=True,
    max_new_tokens=100,
)

print(tokenizer.decode(output[0], skip_special_tokens=False))
```
|
|
|
Output: |
|
```
The ECCS is designed to cool the reactor core and to provide additional shutdown capability following initiation of the following accident conditions: 1. Loss-of-coolant accident (LOCA) including a pipe break or a spurious relief or safety valve opening in the RCS which would result in a discharge larger than that which could be made up by the normal make-up system. 2. Loss-of-secondary-coolant accident including a pipe
```
|
## Training Details |
|
|
|
### Training Data |
|
|
|
All training data for SPARK-mini-base was obtained from publicly available sources, but the curated dataset is not being released.
|
|
|
Specific details on the training data, or access to it, may be made available on a case-by-case basis by contacting Nuclearn at [email protected]
|
|
|
### Training Procedure |
|
|
|
The training procedure follows best practices for continued pretraining of base LLMs.
|
|
|
The model was trained in bf16 using DeepSpeed ZeRO-3 on a private, multi-node A100 server cluster.
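
As a rough, illustrative sketch only (the actual SPARK training configuration has not been released, and every value below is an assumption), a DeepSpeed ZeRO-3 bf16 run of this kind is typically driven by a config along these lines:

```python
# Illustrative DeepSpeed ZeRO-3 + bf16 config sketch; the actual SPARK
# training configuration is not public.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer states across GPUs
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",  # resolved by the HF Trainer integration
    "train_micro_batch_size_per_gpu": "auto",
}
```

With the Hugging Face Trainer integration, the `"auto"` values are filled in from the corresponding `TrainingArguments`.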
|
|
|
## Evaluation |
|
|
|
SPARK-mini-base was evaluated on a set of private benchmarks created specifically to test nuclear industry knowledge.
|
|
|
#### Completions (HellaSwag for Nuclear)
|
- Modeled after the HellaSwag benchmark
|
- Various completions of complex nuclear plant operational scenarios and fact passages.
|
|
|
#### Multiple Choice QA (MMLU for Nuclear) |
|
- Modeled after the MMLU benchmark |
|
- Multiple-choice question answering on nuclear plant operations, systems, engineering, etc.
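
The benchmark items themselves are private; as a generic illustration of how MMLU-style multiple-choice questions are commonly scored (the question below is invented, and `model` and `tokenizer` are assumed loaded as in the Getting Started section above), one can compare the model's next-token log-probabilities across the answer letters:

```python
# Generic MMLU-style scoring sketch: pick the answer letter with the
# highest next-token log-probability. The example question is invented.
import torch

question = (
    "Which system injects borated water into the reactor coolant system "
    "during a loss-of-coolant accident?\n"
    "A. Main feedwater system\n"
    "B. Emergency core cooling system\n"
    "C. Circulating water system\n"
    "D. Instrument air system\n"
    "Answer:"
)

input_ids = tokenizer.encode(question, return_tensors="pt", add_special_tokens=False).to("cuda")
with torch.no_grad():
    next_token_logits = model(input_ids=input_ids).logits[0, -1]

# Score each answer letter and choose the most likely one.
choices = ["A", "B", "C", "D"]
choice_ids = [tokenizer.encode(f" {c}", add_special_tokens=False)[0] for c in choices]
log_probs = torch.log_softmax(next_token_logits, dim=-1)
print(choices[int(log_probs[choice_ids].argmax())])
```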
|
|
|
## Environmental Impact |
|
|
|
- **Hardware Type:** A100-80GB SXM4 |
|
- **Cloud Provider:** Nuclearn Training Cluster |
|
|
|
### Model Architecture and Objective |
|
|
|
SPARK-mini-base is based on the Phi-3 architecture and is trained with a next-token prediction objective.
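
The inherited architecture hyperparameters can be inspected directly from the model config, for example:

```python
# Inspect the Phi-3 architecture hyperparameters inherited from the base model.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("nuclearnai/SPARK-mini-base")
print(config.model_type)  # expected: "phi3"
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```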
|
|
|
### Compute Infrastructure |
|
|
|
SPARK-mini-base was trained on the Nuclearn Training Cluster, an A100-80GB server cluster with 800 Gb/s InfiniBand connectivity.
|
|
|
## Model Card Authors |
|
|
|
- Bradley Fox, Nuclearn Inc
- Jerrold Vincent, Nuclearn Inc
- Nate Irby, Nuclearn Inc