---
license: mit
datasets:
  - SNUH-HARI/MedicalLawQA
language:
  - en
  - ko
metrics:
  - accuracy
  - perplexity
base_model:
  - UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B
library_name: transformers
tags:
  - medical
  - unsloth
  - trl
  - sft
---

# SNUH-HARI/DeepSeek-llama3.1-HARI-8B

## Model Description

SNUH-HARI/DeepSeek-llama3.1-HARI-8B is a fine-tuned version of UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B, an 8-billion-parameter model, optimized for healthcare, legal, and multilingual applications. Developed by the Healthcare AI Research Institute (HARI) at Seoul National University Hospital (SNUH), it integrates medical-law and pseudonymized clinical data to support patient safety and responsible AI in medicine.

- **Architecture:** Transformer-based large language model (LLM)
- **Languages:** English, Korean
- **Primary Domains:** Healthcare, legal, and general NLP
- **Use Cases:** Medical and legal question answering, clinical decision support, patient safety applications

## Training Details

- **Base Model:** [UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B](https://huggingface.co/UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B)
- **Fine-Tuning Dataset:** [SNUH-HARI/MedicalLawQA](https://huggingface.co/datasets/SNUH-HARI/MedicalLawQA)
- **Method:** Supervised fine-tuning (SFT) with TRL and Unsloth (a minimal sketch follows this list)
- **Optimization:** Mixed precision (FP16) for efficiency
- **Compute Resources:** High-performance GPUs (e.g., NVIDIA H100 clusters)
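
The exact training script is not published; the sketch below is only an illustration of what a TRL-based SFT run consistent with this card's tags might look like. The hyperparameters and the `text` column name are assumptions, not the values used for this model.

```python
# Illustrative SFT sketch, not the authors' actual training script.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumption: the MedicalLawQA examples expose a "text" column.
train_dataset = load_dataset("SNUH-HARI/MedicalLawQA", split="train")

args = SFTConfig(
    output_dir="deepseek-llama3.1-hari-8b",
    dataset_text_field="text",       # assumption: adjust to the dataset's real schema
    per_device_train_batch_size=2,   # placeholder hyperparameters
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    fp16=True,                       # mixed-precision training, as noted above
)

trainer = SFTTrainer(
    model="UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B",  # base model from the metadata
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```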

## Intended Use

This model is designed for research, healthcare AI, and legal AI applications. It is particularly suitable for:

- Medical and legal question answering
- Clinical decision-making support
- Healthcare policy and compliance

## Limitations & Ethical Considerations

- **Not a replacement for medical professionals:** Outputs should be validated by experts.
- **Potential biases:** Legal and medical knowledge is jurisdiction-specific; users should verify regional applicability.
- **Privacy compliance:** No personally identifiable information was used in training.

## Evaluation & Benchmarks

- Perplexity score: TBD (see the sketch after this list for how it can be computed)
- Accuracy on legal-medical QA tasks: TBD
- Comparison with baseline models (e.g., Med-PaLM, GPT-4-turbo): TBD
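
Since perplexity is listed as a metric, a minimal way to compute it with Transformers is the exponential of the mean token-level cross-entropy on held-out text. The snippet below is illustrative only; the evaluation text is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SNUH-HARI/DeepSeek-llama3.1-HARI-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

eval_text = "..."  # placeholder: substitute held-out evaluation text
enc = tokenizer(eval_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Passing input_ids as labels returns the mean cross-entropy over the
    # sequence; perplexity is its exponential.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```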

## How to Use

You can use the model via Hugging Face Transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SNUH-HARI/DeepSeek-llama3.1-HARI-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Half precision with automatic device placement; a GPU is recommended for an 8B model.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

input_text = "What are the legal requirements for prescribing narcotics in South Korea?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# max_new_tokens bounds the generated continuation (max_length would count the prompt too).
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
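
DeepSeek-llama3.1 derivatives are typically chat-tuned; if this tokenizer ships a chat template (an assumption worth verifying), prompts can be formatted with `apply_chat_template`, reusing the `model` and `tokenizer` from above:

```python
# Assumption: the tokenizer defines a chat template; verify before relying on it.
messages = [
    {"role": "user",
     "content": "What are the legal requirements for prescribing narcotics in South Korea?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```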

## License

This model is released under the MIT License.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{SNUH-HARI-DeepSeek-llama3.1-HARI-8B,
  title={SNUH-HARI/DeepSeek-llama3.1-HARI-8B},
  author={Healthcare AI Research Institute (HARI) at Seoul National University Hospital (SNUH)},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/SNUH-HARI/DeepSeek-llama3.1-HARI-8B}
}
```