---
base_model: meta-llama/Llama-3.3-70B-Instruct
library_name: peft
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- me
---

# Model Card for plandes/sdoh-llama-3-3-70b

Social determinants of health (SDoHs) are economic, social, and personal
circumstances that affect or influence an individual's health status. This
model performs multilabel classification at the sentence level. It was
supervised fine-tuned on the Amended dataset from the paper "Integration of
Large Language Models and Traditional Deep Learning for Social Determinants of
Health Prediction" ([arXiv]).

The model is intended for clinical text, where it assigns zero or more SDoH
labels to each sentence. Typical users are clinical informatics physicians and
biomedical NLP researchers.
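
Because a sentence can carry zero or more labels, predictions are often stored as a multi-hot vector when scoring a multilabel classifier. The following is a minimal sketch of that representation, not part of the model's own code. The label names are the six output classes listed in the Usage section; the constant `LABELS`, its ordering, and the helper `to_multi_hot` are assumptions made for illustration.

```python
# Label names come from the Usage section; the ordering is an assumption
# made only for this sketch.
LABELS = ['transportation', 'housing', 'relationship',
          'employment', 'support', 'parent']


def to_multi_hot(predicted):
    """Return a 0/1 list with one position per entry of ``LABELS``."""
    unknown = set(predicted) - set(LABELS)
    if unknown:
        raise ValueError(f'unknown labels: {sorted(unknown)}')
    return [1 if label in predicted else 0 for label in LABELS]


# a sentence tagged with two SDoH labels
print(to_multi_hot(['housing', 'transportation']))  # [1, 1, 0, 0, 0, 0]
# a sentence with no SDoH label
print(to_multi_hot([]))                             # [0, 0, 0, 0, 0, 0]
```

A sentence with no SDoH maps to the all-zero vector, which corresponds to the `-` output described in the prompt.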


## Model Details

The model was trained on the training and validation splits of the combined
MIMIC-III and synthetic datasets provided by [Guevara et al.].

- **Developed by:** Paul Landes
- **Funded by:** Center for Health Equity using Machine Learning &
  Artificial Intelligence (CHEMA) postdoctoral funding award.
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** Llama 3.3 70B Instruct


## Usage

To use the model, run inference with the prompt that was used for supervised
fine-tuning, then parse the output. The example below creates a
text-generation pipeline, generates a response from the fine-tuned model, and
parses that response into SDoH labels.

```python
import re
import torch
import transformers


# parse the LLM response
def parse_response(text):
    res_regs = (re.compile(r'(?:.*?`([a-z,` ]{3,}`))', re.DOTALL),
                re.compile(r'.*?[`#-]([a-z, \t\n\r]{3,}?)[`-].*', re.DOTALL))
    matched: str = ''
    for pat in res_regs:
        m: re.Match = pat.match(text)
        if m is not None:
            matched = m.group(1)
            break
    return sorted(set(filter(lambda s: matched.find(s) > -1, labels)))


# the prompt and role used for supervised fine-tuning of the model
_PROMPT: str = """\
Classify sentences for social determinants of health (SDOH).

Definitions SDOHs are given with labels in back ticks:

* `housing`: The status of a patient’s housing is a critical SDOH, known to affect the outcome of treatment.

* `transportation`: This SDOH pertains to a patient’s inability to get to/from their healthcare visits.

* `relationship`: Whether or not a patient is in a partnered relationship is an abundant SDOH in the clinical notes.

* `parent`: This SDOH should be used for descriptions of a patient being a parent to at least one child who is a minor (under the age of 18 years old).

* `employment`: This SDOH pertains to expressions of a patient’s employment status. A sentence should be annotated as an Employment Status SDOH if it expresses if the patient is employed (a paid job), unemployed, retired, or a current student.

* `support`: This SDOH is a sentence describes a patient that is actively receiving care support, such as emotional, health, financial support.  This support comes from family and friends but not health care professionals.

* `-`: If no SDOH is found.

Classify sentences for social determinants of health (SDOH) as a list labels in three back ticks. The sentence can be a member of multiple classes so output the labels that are mostly likely to be present.

### Sentence: {sent}
### SDOH labels:"""
role = 'You are a social determinants of health (SDOH) classifier.'

# output classes
labels = 'transportation housing relationship employment support parent'.split()

# example sentence
sent = 'Pt is homeless and has no car and has no parents or support'

# create a pipeline for inferencing
pipeline = transformers.pipeline(
    'text-generation',
    model='plandes/sdoh-llama-3-3-70b',
    model_kwargs={'torch_dtype': torch.bfloat16},
    device_map='auto')

# prompt used by the chat template
messages = [
    {'role': 'system', 'content': role},
    {'role': 'user', 'content': _PROMPT.format(sent=sent)}]

# inference the LLM
outputs = pipeline(
    messages,
    max_new_tokens=512,
    eos_token_id=[
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids('<|eot_id|>'),
    ],
    pad_token_id=pipeline.tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.01)

# print the textual LLM output
output = outputs[0]['generated_text'][-1]['content']
print('model response:', output)

# print the parsed labels from the LLM output
print('labels:', parse_response(output))
```
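
The parsing step can be checked without loading the 70B model. The sketch below reuses the two regular expressions from `parse_response` above and applies them to hand-written response strings. The strings in `examples` are hypothetical and are not real model output, and the names `PATTERNS` and `parse_labels` are introduced only for this illustration.

```python
import re

# output classes, as in the example above
LABELS = 'transportation housing relationship employment support parent'.split()

# the same two patterns used by parse_response in the example above
PATTERNS = (re.compile(r'(?:.*?`([a-z,` ]{3,}`))', re.DOTALL),
            re.compile(r'.*?[`#-]([a-z, \t\n\r]{3,}?)[`-].*', re.DOTALL))


def parse_labels(text, labels=LABELS):
    """Return the sorted labels that occur in the text captured by the
    first pattern that matches, or an empty list if neither matches."""
    matched = ''
    for pat in PATTERNS:
        m = pat.match(text)
        if m is not None:
            matched = m.group(1)
            break
    return sorted(label for label in set(labels) if label in matched)


# hypothetical responses, written by hand for this sketch
examples = [
    '```housing, transportation```',
    '`-`',
    'The labels are `employment`.',
]
for response in examples:
    print(repr(response), '->', parse_labels(response))
```

Labels are found by substring search within the captured span, so the check is insensitive to the separators the model emits between labels. A response of `-` matches neither pattern and yields an empty list.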

## Citation

**BibTeX:**

[More Information Needed]


<!-- links -->
[Guevara et al.]: https://www.nature.com/articles/s41746-023-00970-0
[arXiv]: https://arxiv.org/pdf/2505.04655