---
base_model: meta-llama/Llama-3.3-70B-Instruct
library_name: peft
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- me
---

# Model Card for sdoh-llama-3-3-70b

Social determinants of health (SDoHs) are the economic, social, and personal circumstances that affect or influence an individual's health status. This model performs multilabel classification at the sentence level and was supervised fine-tuned on the Amended dataset from the paper "Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction" ([arXiv]). The model is intended to be applied to clinical text to assign zero or more SDoH labels to each sentence. Typical users of this model are clinical informatics physicians or biomedical NLP researchers.

## Model Details

The model was trained on the training and validation splits of the combined MIMIC-III and synthetic datasets provided by [Guevara et al.].

- **Developed by:** Paul Landes
- **Funded by [optional]:** Center for Health Equity using Machine Learning & Artificial Intelligence (CHEMA) postdoctoral funding award.
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model [optional]:** Llama 3.3 70B Instruct

## Usage

The model is used by prompting it with the supervised fine-tuning prompt and then parsing the generated output. The example below creates a text-generation pipeline, uses it to generate output from the supervised fine-tuned model, and then parses that output into SDoH labels.

```python
import re
import torch
import transformers


# parse the LLM response into a sorted list of SDOH labels
def parse_response(text):
    # patterns that capture the label span of the model's completion
    res_regs = (re.compile(r'(?:.*?`([a-z,` ]{3,}`))', re.DOTALL),
                re.compile(r'.*?[`#-]([a-z, \t\n\r]{3,}?)[`-].*', re.DOTALL))
    matched: str = ''
    for pat in res_regs:
        m: re.Match = pat.match(text)
        if m is not None:
            matched = m.group(1)
            break
    # keep only the known output classes that appear in the matched span
    return sorted(set(filter(lambda s: matched.find(s) > -1, labels)))


# the prompt and role used to supervised fine-tune the model (kept verbatim)
_PROMPT: str = """\
Classify sentences for social determinants of health (SDOH). Definitions SDOHs are given with labels in back ticks:

* `housing`: The status of a patient’s housing is a critical SDOH, known to affect the outcome of treatment.
* `transportation`: This SDOH pertains to a patient’s inability to get to/from their healthcare visits.
* `relationship`: Whether or not a patient is in a partnered relationship is an abundant SDOH in the clinical notes.
* `parent`: This SDOH should be used for descriptions of a patient being a parent to at least one child who is a minor (under the age of 18 years old).
* `employment`: This SDOH pertains to expressions of a patient’s employment status. A sentence should be annotated as an Employment Status SDOH if it expresses if the patient is employed (a paid job), unemployed, retired, or a current student.
* `support`: This SDOH is a sentence describes a patient that is actively receiving care support, such as emotional, health, financial support. This support comes from family and friends but not health care professionals.
* `-`: If no SDOH is found.

Classify sentences for social determinants of health (SDOH) as a list labels in three back ticks. The sentence can be a member of multiple classes so output the labels that are mostly likely to be present.

### Sentence:
{sent}

### SDOH labels:"""
role = 'You are a social determinants of health (SDOH) classifier.'
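
# NOTE (illustrative; the exact completion format can vary): per the prompt
# above, the fine-tuned model is expected to answer with the label names
# inside back ticks (e.g. ``` housing, transportation ```) or `-` when no
# SDOH is present; parse_response() scans the matched span for known labels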
# output classes
labels = 'transportation housing relationship employment support parent'.split()
# example sentence
sent = 'Pt is homeless and has no car and has no parents or support'

# create a pipeline for inferencing
pipeline = transformers.pipeline(
    'text-generation',
    model='plandes/sdoh-llama-3-3-70b',
    model_kwargs={'torch_dtype': torch.bfloat16},
    device_map='auto')

# messages used by the chat template
messages = [
    {'role': 'system', 'content': role},
    {'role': 'user', 'content': _PROMPT.format(sent=sent)}]

# inference the LLM
outputs = pipeline(
    messages,
    max_new_tokens=512,
    eos_token_id=[
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids('<|eot_id|>'),
    ],
    pad_token_id=pipeline.tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.01)

# print the textual LLM output
output = outputs[0]['generated_text'][-1]['content']
print('model response:', output)

# print the parsed labels from the LLM output
print('labels:', parse_response(output))
```

## Citation [optional]

**BibTeX:**

[More Information Needed]

[Guevara et al.]: https://www.nature.com/articles/s41746-023-00970-0
[arXiv]: https://arxiv.org/pdf/2505.04655
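
## Loading with PEFT

Because the card's metadata lists `peft` as the library and `meta-llama/Llama-3.3-70B-Instruct` as the base model, the fine-tuned weights can presumably also be loaded as a PEFT adapter rather than through a pipeline. The following is a minimal sketch, not a confirmed API of this repository: it assumes `plandes/sdoh-llama-3-3-70b` is a standard PEFT adapter repository that also ships tokenizer files, and it reuses `role`, `_PROMPT`, `sent`, and `parse_response` from the example above.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# load the tokenizer and the base model with the SDOH adapter applied
# (assumes the repository is a standard PEFT adapter with tokenizer files)
tokenizer = AutoTokenizer.from_pretrained('plandes/sdoh-llama-3-3-70b')
model = AutoPeftModelForCausalLM.from_pretrained(
    'plandes/sdoh-llama-3-3-70b',
    torch_dtype=torch.bfloat16,
    device_map='auto')

# build the chat-formatted prompt, reusing role, _PROMPT and sent from above
inputs = tokenizer.apply_chat_template(
    [{'role': 'system', 'content': role},
     {'role': 'user', 'content': _PROMPT.format(sent=sent)}],
    add_generation_prompt=True,
    return_tensors='pt').to(model.device)

# generate and decode only the newly generated tokens
with torch.no_grad():
    output_ids = model.generate(inputs, max_new_tokens=512)
text = tokenizer.decode(output_ids[0][inputs.shape[-1]:],
                        skip_special_tokens=True)
print('labels:', parse_response(text))
```

As with the pipeline example, `device_map='auto'` shards the 70B base model across the available GPUs; loading in `bfloat16` still requires multiple high-memory GPUs.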