---
base_model: meta-llama/Llama-3.3-70B-Instruct
library_name: peft
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- me
---

# Model Card for sdoh-llama-3-3-70b

Social determinants of health (SDoHs) are the economic, social, and personal circumstances that affect or influence an individual's health status. This model performs multilabel classification at the sentence level and was supervised fine-tuned on the Amended dataset from the paper "Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction" ([arXiv]). The model is intended to be applied to clinical text to assign zero or more SDoH labels to each sentence. Typical users of this model are clinical informatics physicians or biomedical NLP researchers.

## Model Details

The model was trained on the training and validation splits of the combined MIMIC-III and synthetic datasets provided by [Guevara et al.].

- **Developed by:** Paul Landes
- **Funded by [optional]:** Center for Health Equity using Machine Learning & Artificial Intelligence (CHEMA) postdoctoral funding award.
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model [optional]:** Llama 3.3 70B Instruct

## Usage

The model is used by prompting it with the supervised fine-tuning prompt and then parsing the generated output. The example below creates a text-generation pipeline, uses it to generate output from the supervised fine-tuned model, and then parses that output into SDoH labels.

```python
import re
import torch
import transformers


# parse the LLM response into a sorted list of SDOH labels
def parse_response(text):
    # patterns that capture the label span of the model's completion
    res_regs = (re.compile(r'(?:.*?`([a-z,` ]{3,}`))', re.DOTALL),
                re.compile(r'.*?[`#-]([a-z, \t\n\r]{3,}?)[`-].*', re.DOTALL))
    matched: str = ''
    for pat in res_regs:
        m: re.Match = pat.match(text)
        if m is not None:
            matched = m.group(1)
            break
    # keep only the known output classes that appear in the matched span
    return sorted(set(filter(lambda s: matched.find(s) > -1, labels)))


# the prompt and role used to supervised fine-tune the model (kept verbatim)
_PROMPT: str = """\
Classify sentences for social determinants of health (SDOH). Definitions SDOHs are given with labels in back ticks:

* `housing`: The status of a patient’s housing is a critical SDOH, known to affect the outcome of treatment.
* `transportation`: This SDOH pertains to a patient’s inability to get to/from their healthcare visits.
* `relationship`: Whether or not a patient is in a partnered relationship is an abundant SDOH in the clinical notes.
* `parent`: This SDOH should be used for descriptions of a patient being a parent to at least one child who is a minor (under the age of 18 years old).
* `employment`: This SDOH pertains to expressions of a patient’s employment status. A sentence should be annotated as an Employment Status SDOH if it expresses if the patient is employed (a paid job), unemployed, retired, or a current student.
* `support`: This SDOH is a sentence describes a patient that is actively receiving care support, such as emotional, health, financial support. This support comes from family and friends but not health care professionals.
* `-`: If no SDOH is found.

Classify sentences for social determinants of health (SDOH) as a list labels in three back ticks. The sentence can be a member of multiple classes so output the labels that are mostly likely to be present.

### Sentence:
{sent}

### SDOH labels:"""
role = 'You are a social determinants of health (SDOH) classifier.'
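
# NOTE (illustrative; the exact completion format can vary): per the prompt
# above, the fine-tuned model is expected to answer with the label names
# inside back ticks (e.g. ``` housing, transportation ```) or `-` when no
# SDOH is present; parse_response() scans the matched span for known labels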
# output classes
labels = 'transportation housing relationship employment support parent'.split()
# example sentence
sent = 'Pt is homeless and has no car and has no parents or support'

# create a pipeline for inferencing
pipeline = transformers.pipeline(
    'text-generation',
    model='plandes/sdoh-llama-3-3-70b',
    model_kwargs={'torch_dtype': torch.bfloat16},
    device_map='auto')

# messages used by the chat template
messages = [
    {'role': 'system', 'content': role},
    {'role': 'user', 'content': _PROMPT.format(sent=sent)}]

# inference the LLM
outputs = pipeline(
    messages,
    max_new_tokens=512,
    eos_token_id=[
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids('<|eot_id|>'),
    ],
    pad_token_id=pipeline.tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.01)

# print the textual LLM output
output = outputs[0]['generated_text'][-1]['content']
print('model response:', output)

# print the parsed labels from the LLM output
print('labels:', parse_response(output))
```

## Citation [optional]

**BibTeX:**

[More Information Needed]

[Guevara et al.]: https://www.nature.com/articles/s41746-023-00970-0
[arXiv]: https://arxiv.org/pdf/2505.04655
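
## Loading with PEFT

Because the card's metadata lists `peft` as the library and `meta-llama/Llama-3.3-70B-Instruct` as the base model, the fine-tuned weights can presumably also be loaded as a PEFT adapter rather than through a pipeline. The following is a minimal sketch, not a confirmed API of this repository: it assumes `plandes/sdoh-llama-3-3-70b` is a standard PEFT adapter repository that also ships tokenizer files, and it reuses `role`, `_PROMPT`, `sent`, and `parse_response` from the example above.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# load the tokenizer and the base model with the SDOH adapter applied
# (assumes the repository is a standard PEFT adapter with tokenizer files)
tokenizer = AutoTokenizer.from_pretrained('plandes/sdoh-llama-3-3-70b')
model = AutoPeftModelForCausalLM.from_pretrained(
    'plandes/sdoh-llama-3-3-70b',
    torch_dtype=torch.bfloat16,
    device_map='auto')

# build the chat-formatted prompt, reusing role, _PROMPT and sent from above
inputs = tokenizer.apply_chat_template(
    [{'role': 'system', 'content': role},
     {'role': 'user', 'content': _PROMPT.format(sent=sent)}],
    add_generation_prompt=True,
    return_tensors='pt').to(model.device)

# generate and decode only the newly generated tokens
with torch.no_grad():
    output_ids = model.generate(inputs, max_new_tokens=512)
text = tokenizer.decode(output_ids[0][inputs.shape[-1]:],
                        skip_special_tokens=True)
print('labels:', parse_response(text))
```

As with the pipeline example, `device_map='auto'` shards the 70B base model across the available GPUs; loading in `bfloat16` still requires multiple high-memory GPUs.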