ModernBERT Medical Relevance Classifier

The ModernBERT Medical Relevance Classifier is a transformer-based language model that scores how closely a text pertains to medical or biological content. Built on the ModernBERT architecture, it predicts a continuous score (trained against a 1-to-5 “Scope of Medical Relevance” target) rather than a hard class label. The model is particularly suited to identifying documents that are highly relevant to medical topics, which supports tasks such as corpus filtering, data triage, and domain-specific retrieval pipelines.

Model Details

  • Developed by: TheBlueScrubs
  • Model Type: Transformer-based language model (for regression/classification)
  • Language: English
  • License: Apache-2.0
  • Base Model: answerdotai/ModernBERT-base

ModernBERT adopts recent innovations such as Rotary Positional Embeddings, local–global alternating attention, and Flash Attention, which enable both extended context windows (up to 8,192 tokens) and more efficient inference.

Intended Uses & Limitations

Intended Uses

  • Biomedical Document Filtering: Identifying which texts are more relevant to medical or biological research.
  • Data Preprocessing: Screening large corpora to retain only highly relevant medical content for subsequent tasks (e.g., entity extraction, summarization).

Limitations

  • Domain Shift: Trained primarily on biomedical texts, particularly those related to cancer and general medical literature. Relevance scores for out-of-domain texts (e.g., chemistry or physics) may be inaccurate.
  • Score Interpretation: The raw output is a continuous score, which may need thresholding or binarization depending on your application.

How to Use

Use the Hugging Face Transformers library to load and run this model:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("TheBlueScrubs/ModernBERT-base-TBS-MedicalRelevance")
model = AutoModelForSequenceClassification.from_pretrained("TheBlueScrubs/ModernBERT-base-TBS-MedicalRelevance")

# Example text
text = "This study discusses the efficacy of a new monoclonal antibody for metastatic breast cancer."

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Get model predictions (inference mode: disable dropout and gradient tracking)
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits

# Interpret predictions: a single continuous relevance score
relevance_score = predictions.squeeze().item()
print(f"Relevance Score: {relevance_score:.3f}")

Training Data

A balanced subset of The Blue Scrubs dataset was created to ensure coverage across different relevance levels. Each text entry is paired with a “Scope of Medical Relevance” score, which served as the regression target. The data preparation steps included:

  • Scanning a large corpus of medical documents for valid rows (removing parse/NaN/out-of-range entries).
  • Retaining rows with relevance scores spanning 1 (least relevant) to 5 (most relevant).
  • Randomly sampling to balance coverage across low- and high-relevance texts.
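
The preparation steps above can be sketched in plain Python. This is an illustration under assumptions, not the pipeline actually used: the function names, the empty-text check, and the choice to balance by rounded-score bucket are all hypothetical, since the card does not say how the sampling was implemented.

```python
import math
import random
from collections import defaultdict


def clean_rows(rows, low=1.0, high=5.0):
    """Keep (text, score) pairs whose score parses to a finite number in [low, high]."""
    cleaned = []
    for text, raw_score in rows:
        try:
            score = float(raw_score)
        except (TypeError, ValueError):
            continue  # unparseable score
        if math.isnan(score) or not (low <= score <= high):
            continue  # NaN or out-of-range score
        if not isinstance(text, str) or not text.strip():
            continue  # assumption: rows without text are also invalid
        cleaned.append((text, score))
    return cleaned


def balance_by_score(rows, per_bucket, seed=0):
    """Randomly sample up to `per_bucket` rows from each rounded-score bucket."""
    buckets = defaultdict(list)
    for text, score in rows:
        buckets[round(score)].append((text, score))
    rng = random.Random(seed)
    balanced = []
    for key in sorted(buckets):
        group = buckets[key]
        balanced.extend(rng.sample(group, min(per_bucket, len(group))))
    rng.shuffle(balanced)
    return balanced
```

Capping each bucket at the same size keeps low- and high-relevance texts from being swamped by whichever score is most common in the raw corpus.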

Training Procedure

Preprocessing

  • Tokenizer: ModernBERT tokenizer, max sequence length = 4,096.
  • No Additional Filtering: Data was considered reliable following the basic cleaning steps.

Training Hyperparameters

  • Learning Rate: 2e-5
  • Number of Epochs: 3
  • Batch Size: 16 (per device)
  • Gradient Accumulation Steps: 1
  • Optimizer: AdamW
  • Weight Decay: 0.01
  • FP16 Training: Enabled
  • Total Training Duration: ~3 epochs over the balanced set

The above settings reflect a typical fine-tuning setup with the Hugging Face Trainer API. Training ran on multiple GPUs in a distributed data-parallel configuration, adapted to the constraints of the HPC environment.
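
As a rough sketch, the listed hyperparameters map onto `transformers.TrainingArguments` parameters as shown below. The `optim` value, the helper names, and the dataset size and GPU count passed to `total_optimizer_steps` are assumptions for illustration; the card does not report them.

```python
import math

# Hyperparameters as listed above, keyed by the matching
# transformers.TrainingArguments parameter names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "weight_decay": 0.01,
    "fp16": True,
    "optim": "adamw_torch",  # assumption: the PyTorch AdamW implementation
}


def build_training_arguments(output_dir):
    """Build TrainingArguments; the import is deferred so the rest runs without transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def total_optimizer_steps(num_examples, num_gpus):
    """Estimate optimizer steps for data-parallel training (hypothetical sizes)."""
    effective_batch = (
        TRAINING_KWARGS["per_device_train_batch_size"]
        * TRAINING_KWARGS["gradient_accumulation_steps"]
        * num_gpus
    )
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * TRAINING_KWARGS["num_train_epochs"]
```

For example, with a hypothetical 10,000 training examples on 4 GPUs, the effective batch size is 64, giving 157 steps per epoch and 471 optimizer steps over 3 epochs.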

Evaluation

Testing Data

The final model was evaluated on an out-of-sample test set containing documents not seen during training or validation. This test set covers a variety of biomedical topics to ensure generalization.

Metrics

  • Accuracy (where applicable, after binarizing or thresholding scores)
  • R-Squared (R²): Evaluates how well the predictions track the true variability in relevance
  • Mean Squared Error (MSE): Quantifies the average squared difference between predicted and true relevance scores
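
The three metrics can be computed as in the following minimal sketch, written in plain Python for clarity. The function names are illustrative, and the threshold passed to `thresholded_accuracy` is a free parameter: the card does not state which cutoff was used for the reported accuracy.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def thresholded_accuracy(y_true, y_pred, threshold):
    """Share of examples where truth and prediction fall on the same side of the cutoff."""
    hits = sum((t >= threshold) == (p >= threshold) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Note that `r_squared` is undefined when all true scores are identical, because the total sum of squares is then zero.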

Results

  • MSE: ~0.373 on the test set
  • Accuracy: 0.9573

These results suggest that the model reliably assigns a relevance score consistent with the ground-truth annotations.

Bias, Risks, and Limitations

  • Data Composition: Certain subdomains may be underrepresented; the model may be less accurate for rare specialties.
  • Overinterpretation: A single numeric score is not a substitute for rigorous clinical validation. Always verify with domain experts.
  • Shifting Standards: Medical fields evolve quickly, so re-training or updating data may be necessary to maintain relevance accuracy.

Recommendations

  • Domain-Specific Check: If you specialize in a particular area (e.g., pediatrics), consider additional fine-tuning or custom calibration.
  • Thresholding Strategy: For binary classification (e.g., “Relevant” vs. “Not relevant”), select an optimal cutoff based on your dataset and tolerance for false positives/negatives.
  • Continuous Monitoring: Periodically evaluate the model on new data to ensure it remains valid as the medical literature grows.
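
One way to pick a cutoff for the thresholding strategy above is to sweep candidate values over a small labeled validation set. The sketch below is an assumption about how this could be done, not a documented procedure for this model: it selects the candidate with the highest F1, and `choose_threshold` and its inputs are hypothetical names.

```python
def choose_threshold(scores, labels, candidates):
    """Return the candidate cutoff with the highest F1 on labeled validation data.

    scores: model relevance scores; labels: True for relevant documents.
    """
    best_threshold, best_f1 = None, -1.0
    for threshold in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # ties keep the earliest candidate
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1
```

F1 weighs false positives and false negatives equally. If one kind of error is costlier in your pipeline, a precision- or recall-weighted criterion may be a better fit.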

Citation

If you utilize this model in your research or applications, please cite it as follows:

@misc{thebluescrubs2025modernbert,
  author = {TheBlueScrubs},
  title = {ModernBERT Medical Relevance Classifier},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TheBlueScrubs/ModernBERT-base-TBS-MedicalRelevance}
}

Model Card Authors

  • TheBlueScrubs Team
