# Healthcare Interoperability Domain-Adapted PubMedBERT

This model is a fine-tuned version of `pritamdeka/S-PubMedBert-MS-MARCO`, domain-adapted for healthcare interoperability concepts and terminology.
## Model Details

- Base model: `pritamdeka/S-PubMedBert-MS-MARCO`
- Task: Domain adaptation via masked language modeling (MLM)
- Training data: Healthcare interoperability literature corpus
- Focus areas: FHIR, HL7, medical terminologies, health information exchange
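
The training procedure itself is not documented in this card. As a rough illustration of what MLM domain adaptation looks like with `transformers`, here is a minimal sketch; the corpus file name, epoch count, and batch size below are assumptions, not the actual training configuration:

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Start from the base checkpoint named above
tokenizer = AutoTokenizer.from_pretrained("pritamdeka/S-PubMedBert-MS-MARCO")
model = AutoModelForMaskedLM.from_pretrained("pritamdeka/S-PubMedBert-MS-MARCO")

# Hypothetical corpus: one document per line in a plain-text file
dataset = load_dataset("text", data_files={"train": "interop_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, the standard BERT MLM objective
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="pubmedbert-interop-mlm",
    num_train_epochs=3,               # assumed value
    per_device_train_batch_size=16,   # assumed value
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
```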
## Domain-Specific Knowledge

This model has been fine-tuned to better understand healthcare interoperability concepts, including:
- Health data exchange standards (FHIR, HL7, DICOM, CCD)
- Terminology systems (SNOMED CT, LOINC, ICD, RxNorm)
- Interoperability frameworks and architectures
- Electronic Health Record (EHR) systems and implementations
## Evaluation Results

Our domain-adapted model shows improved handling of healthcare interoperability terminology, as illustrated by these fill-mask examples:
| Prompt | Top Predictions |
|---|---|
| "FHIR is a standard for health [MASK] exchange." | "information" (91.3%), "data" (5.6%) |
| "Health information [MASK] is critical for coordinated care." | "exchange" (58.7%), "sharing" (4.6%) |
| "Blockchain technology can improve data [MASK] in healthcare." | "exchange" (21.2%), "quality" (6.0%) |
| "Electronic Health Records require [MASK] standards for sharing information." | "common" (10.1%), "international" (8.0%) |
## Usage

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("adamleeit/pubmedbert-interop-mlm")
model = AutoModelForMaskedLM.from_pretrained("adamleeit/pubmedbert-interop-mlm")

# Example usage
text = "FHIR enables healthcare [MASK] exchange between systems."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Locate the [MASK] position and turn its logits into probabilities
mask_index = inputs.input_ids[0].tolist().index(tokenizer.mask_token_id)
probs = outputs.logits[0, mask_index].softmax(dim=-1)

# Print the five most likely replacement tokens
top_tokens = torch.topk(probs, k=5)
for i, (score, token_id) in enumerate(zip(top_tokens.values, top_tokens.indices)):
    print(f"{i+1}. {tokenizer.decode([token_id])}: {score.item():.3f}")
```
## For Embeddings/Sentence Transformers

For generating document embeddings (e.g., for topic modeling), use our sentence-transformer version:

```python
from sentence_transformers import SentenceTransformer

# Load the sentence-transformer variant of this model
model = SentenceTransformer("adamleeit/pubmedbert-interop-sentence")

# Generate embeddings
embeddings = model.encode(["FHIR enables healthcare information exchange between systems."])
```
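
The resulting embeddings can be compared directly, for example with cosine similarity; here is a short sketch (the example sentences are illustrative only):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("adamleeit/pubmedbert-interop-sentence")

sentences = [
    "FHIR enables healthcare information exchange between systems.",
    "HL7 standards support interoperability between EHR systems.",
]
embeddings = model.encode(sentences)

# Cosine similarity between the two sentence embeddings
score = util.cos_sim(embeddings[0], embeddings[1])
print(f"cosine similarity: {score.item():.3f}")
```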
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{pubmedbert-interop-mlm,
  author       = {Lee AM},
  title        = {Healthcare Interoperability Domain-Adapted PubMedBERT},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/adamleeit/pubmedbert-interop-mlm}}
}
```