---
license: llama3.1
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
- SNUH-HARI/KorMedLawQA
language:
- en
- ko
metrics:
- accuracy
- perplexity
base_model:
- UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B
library_name: transformers
tags:
- medical
- unsloth
- trl
- sft
---

# SNUH-HARI/DeepSeek-llama3.1-HARI-8B

## Model Description
**SNUH-HARI/DeepSeek-llama3.1-HARI-8B** is a fine-tuned version of **DeepSeek-llama3.1-Bllossom-8B** with **8 billion parameters**, optimized for
**healthcare applications**. Developed by the **Healthcare AI Research Institute (HARI) at Seoul National University Hospital (SNUH)**,
this model is fine-tuned on **open medical datasets (including synthesized data) and pseudonymized clinical notes** to improve **patient safety** and support responsible AI in medicine.

- **Architecture:** Transformer-based large language model (LLM)
- **Languages:** English, Korean
- **Primary Domains:** Healthcare, General NLP
- **Use Cases:** Medical question answering, clinical decision support, patient safety applications

## Training Details
- **Base Model:** [UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B](https://huggingface.co/UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B)
- **Fine-Tuned Datasets:**
  - **SNUH pseudonymized clinical notes** for real-world medical knowledge
  - **MedicalLawQA** (curated from [Korea Legislation Research Institute](https://elaw.klri.re.kr/eng_service/main.do) data using GPT-4o-mini)
  - **Medical reasoning dataset** from [FreedomIntelligence/medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT)

- **Optimization:** Mixed precision (FP16) for efficiency
- **Compute Resources:** High-performance GPUs (e.g., NVIDIA H100 clusters)
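
Each record in the medical reasoning dataset pairs a question with an explicit chain of thought and a final response. The exact prompt template used to fine-tune this model is not given in this card, so the following is only a sketch of how such a record might be flattened into one supervised fine-tuning string. It assumes the `Question`, `Complex_CoT`, and `Response` field names of the FreedomIntelligence dataset; the `SFT_TEMPLATE` layout, the `<think>` tags, and the `format_reasoning_example` helper are hypothetical.

```python
from typing import Dict

# Hypothetical template: the actual training template for this model is not documented.
SFT_TEMPLATE = (
    "### Question:\n{question}\n\n"
    "### Response:\n<think>\n{cot}\n</think>\n{answer}"
)


def format_reasoning_example(record: Dict[str, str], eos_token: str) -> str:
    """Flatten one medical-o1-reasoning-SFT style record into a training string.

    Assumes the record has 'Question', 'Complex_CoT', and 'Response' keys.
    """
    text = SFT_TEMPLATE.format(
        question=record["Question"].strip(),
        cot=record["Complex_CoT"].strip(),
        answer=record["Response"].strip(),
    )
    # Appending the EOS token marks where a completion should stop.
    return text + eos_token


# Toy record for illustration only; it is not taken from the dataset.
example = {
    "Question": "A patient on warfarin starts a new antibiotic. What should be monitored?",
    "Complex_CoT": "Some antibiotics potentiate warfarin, so the INR may rise.",
    "Response": "Monitor the INR closely and adjust the warfarin dose as needed.",
}
print(format_reasoning_example(example, eos_token="<EOS>"))
```

In practice, `eos_token` would come from the tokenizer (`tokenizer.eos_token`) rather than the `"<EOS>"` placeholder used here, since the correct end-of-sequence string depends on the base model's vocabulary.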

## Intended Use
This model is designed for **research, healthcare AI, and legal AI applications**. It is particularly suitable for:
- **Medical question answering**
- **Clinical decision-making support**
- **Healthcare policy and compliance**

## Limitations & Ethical Considerations
- **Not a replacement for medical professionals:** Outputs should be validated by experts.
- **Potential biases:** Legal and medical knowledge are jurisdiction-specific; users should verify regional applicability.
- **Privacy compliance:** No personally identifiable information was used in training.

## Evaluation & Benchmarks

This model was evaluated on 100 medical-law questions drawn from the Korean Medical Licensing Examination (KMLE), 2019–2023.

| Model | Accuracy (%) |
|-------------------------------|--------------|
| **DeepSeek-llama3.1-Bllossom-8B** | 34 |
| **DeepSeek-llama3.1-HARI-8B** (ours) | TBD |
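
The card does not state how model outputs were scored. One common approach for multiple-choice exams is to extract the final stated choice from each generation and report the percentage that matches the answer key. The sketch below assumes options are labeled A through E and that the model states its choice as "Answer: X" or "the answer is X"; the `extract_choice` and `accuracy_percent` helpers are hypothetical, not the evaluation code behind the table.

```python
import re
from typing import List, Optional

# Assumes options are labeled A through E and the model writes "Answer: X" or
# "the answer is X"; both conventions are assumptions for this sketch.
ANSWER_PATTERN = re.compile(r"[Aa]nswer\s*(?:is\s*|:\s*)\(?([A-E])\b")


def extract_choice(generation: str) -> Optional[str]:
    """Return the last answer letter stated in a generation, or None if absent."""
    matches = ANSWER_PATTERN.findall(generation)
    return matches[-1] if matches else None


def accuracy_percent(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Percentage of predictions that exactly match the answer key."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(pred == ans for pred, ans in zip(predictions, gold))
    return 100.0 * correct / len(gold)


generations = [
    "Reasoning... The answer is (B).",
    "Answer: C",
    "I cannot determine this from the information given.",
]
gold = ["B", "D", "A"]
predictions = [extract_choice(g) for g in generations]
print(predictions)  # ['B', 'C', None]
print(round(accuracy_percent(predictions, gold), 1))  # 33.3
```

Unparseable generations count as incorrect here, which is a conservative choice; with 100 questions, each correct answer adds exactly one percentage point.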


## How to Use
You can use the model via **Hugging Face Transformers**:

```python
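import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SNUH-HARI/DeepSeek-llama3.1-HARI-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load in half precision to reduce memory use; device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

input_text = "What are the legal requirements for prescribing narcotics in South Korea?"
# Keep the attention mask and move the inputs to the model's device.
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# max_new_tokens limits only the generated continuation, not the prompt.
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0], skip_special_tokens=True))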
```

## License
This model is released under the **Llama 3.1 Community License**.

## Citation
If you use this model in your research, please cite:

```
@misc{SNUH-HARI-DeepSeek-llama3.1-HARI-8B,
  title={SNUH-HARI/DeepSeek-llama3.1-HARI-8B},
  author={Hyeonhoon Lee},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/SNUH-HARI/DeepSeek-llama3.1-HARI-8B}
}
```