Unified_Clinical_PII_NER

Overview

Unified_Clinical_PII_NER is a fine-tuned version of Bio_ClinicalBERT adapted for Named Entity Recognition (NER) of both clinical entities and personally identifiable information (PII). The model is designed for healthcare NLP tasks, extracting clinical entities (e.g., diseases, medications) as well as key PII attributes (e.g., first name, last name, DOB, email) from clinical texts.
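For illustration, here is a hypothetical clinical note and the kinds of spans the model is meant to surface. The note, the span values, and the clinical label names (DISEASE, MEDICATION) are invented for this sketch, not actual model output:

```python
# Hypothetical clinical note (invented for illustration).
note = ("John Smith, DOB 04/12/1980, presented with hypertension "
        "and was prescribed lisinopril.")

# The kinds of entities the model is trained to recognize.
# PII labels (FIRSTNAME, LASTNAME, DOB) are from the model card;
# the clinical labels shown here are assumptions.
expected_entities = {
    "FIRSTNAME": "John",
    "LASTNAME": "Smith",
    "DOB": "04/12/1980",
    "DISEASE": "hypertension",     # assumed label name
    "MEDICATION": "lisinopril",    # assumed label name
}
```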

Model Details

  • Base Model: Bio_ClinicalBERT (emilyalsentzer/Bio_ClinicalBERT)
  • Task: Token Classification (NER) using the BIO scheme
  • Entities Extracted: Clinical entities and PII, such as:
    • FIRSTNAME, MIDDLENAME, LASTNAME, DOB, EMAIL, PHONENUMBER, GENDER, JOBTITLE, JOBTYPE, AGE, CITY, STATE, STREET, SECONDARYADDRESS, etc.
  • Evaluation Metrics:
    • Evaluation Loss: 0.1623
    • Precision: 83.84%
    • Recall: 85.47%
    • F1 Score: 84.64%
    • Token-level Accuracy: 94.79%
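As a sanity check, F1 is the harmonic mean of precision and recall, so the three figures above should be mutually consistent. A quick computation with the reported values:

```python
# Reported evaluation metrics (in percent).
precision = 83.84
recall = 85.47

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # ~84.65, matching the reported 84.64 up to rounding
```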

Usage

To load the model and tokenizer using the Hugging Face Transformers library, use the following code:

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Download the tokenizer and token-classification model from the Hugging Face Hub.
model_name = "ku1ithdev/unified-clinical-pii-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
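Since the model uses the BIO scheme, per-token predictions must be merged into entity spans before use. A minimal decoding sketch, assuming label strings of the form B-FIRSTNAME / I-FIRSTNAME (the exact label names in the model's config may differ):

```python
def bio_to_spans(tokens, labels):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            # A B- tag starts a new entity, closing any open one.
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = lab[2:], [tok]
        elif lab.startswith("I-") and current_type == lab[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(tok)
        else:
            # An O tag (or mismatched I- tag) closes the open entity.
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

# Illustrative tokens and labels (not actual model output).
tokens = ["John", "Smith", "was", "seen", "on", "04/12/1980"]
labels = ["B-FIRSTNAME", "B-LASTNAME", "O", "O", "O", "B-DOB"]
print(bio_to_spans(tokens, labels))
# [('FIRSTNAME', 'John'), ('LASTNAME', 'Smith'), ('DOB', '04/12/1980')]
```

In practice the Transformers token-classification pipeline with an aggregation strategy can perform this merging for you; the sketch above just makes the BIO decoding explicit.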
Model Size

  • 108M parameters (F32 weights, stored in Safetensors format)