Sifal committed
Commit a7e544d · verified · 1 Parent(s): 41587e8

Much cleaner API + FR paper + ack

Files changed (1)
  1. README.md +19 -23
README.md CHANGED
@@ -38,11 +38,11 @@ Clinical Mosaic is a transformer-based language model built on the Mosaic BERT a
 
 ## Model Details
 - **Developed by:** Sifal Klioui, Sana Sellami, and Youssef Trardi (Aix-Marseille Univ, LIS, CNRS, Marseille, France)
-- **Funded by:** PICOMALE project (AMIDEX)
+- **Funded by:** PICOMALE project (AMIDEX), under the direction of CEDRE
 - **Base Model:** Mosaic BERT
 - **License:** MIMIC Data Use Agreement (requires compliance with original DUA)
 - **Repository:** [PatientTrajectoryForecasting](https://github.com/MostHumble/PatientTrajectoryForecasting)
-- **Paper:** *Patient Trajectory Prediction: Integrating Clinical Notes with Transformers* ([PDF](insert-link))
+- **Paper:** *Patient Trajectory Prediction: Integrating Clinical Notes with Transformers* [[FR](https://editions-rnti.fr/?inprocid=1002990), [EN: to be added]()]
 
 ## Uses
 
@@ -78,18 +78,11 @@ Install the Hugging Face Transformers library and load the model as follows:
 ### For embeddings generation:
 
 ```python
-from transformers import AutoModel, BertTokenizer, BertConfig
-
-tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') # MosaicBERT uses the standard BERT tokenizer
-config = BertConfig.from_pretrained('Sifal/ClinicalMosaic') # the config needs to be passed in
-
-ClincalMosaic = AutoModel.from_pretrained(
-    'Sifal/ClinicalMosaic',
-    config=config,
-    torch_dtype='auto',
-    trust_remote_code=True,
-    device_map="auto"
-)
+# Load model directly
+from transformers import AutoTokenizer, AutoModel
+
+tokenizer = AutoTokenizer.from_pretrained("Sifal/ClinicalMosaic", trust_remote_code=True)
+ClincalMosaic = AutoModel.from_pretrained("Sifal/ClinicalMosaic", trust_remote_code=True)
 
 # Example usage
 clinical_text = "..."
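
For reference, a minimal end-to-end sketch of the rewritten embeddings block. The example text and tokenizer arguments are illustrative assumptions; `output_all_encoded_layers=False` mirrors the example usage already in the README.

```python
# A minimal sketch of the new embeddings workflow. The example text and
# tokenizer arguments are illustrative; output_all_encoded_layers=False
# mirrors the example usage already in the README.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Sifal/ClinicalMosaic", trust_remote_code=True)
ClincalMosaic = AutoModel.from_pretrained("Sifal/ClinicalMosaic", trust_remote_code=True)

clinical_text = "Patient admitted with shortness of breath; started on IV diuretics."
inputs = tokenizer(clinical_text, return_tensors="pt", truncation=True)

# Keep only the final encoder layer's hidden states: (batch, seq_len, hidden_size)
last_layer_embeddings = ClincalMosaic(**inputs, output_all_encoded_layers=False)
```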
@@ -101,18 +94,12 @@ last_layer_embeddings = ClincalMosaic(**inputs, output_all_encoded_layers=False)
 ### For sequence classification:
 
 ```python
-from transformers import AutoModelForSequenceClassification, BertTokenizer, BertConfig
-
-tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') # ClincalMosaic uses the standard BERT tokenizer
-config = BertConfig.from_pretrained('Sifal/ClinicalMosaic') # the config needs to be passed in
-
-# Set the hidden size and number of labels:
-config.num_labels = 4
-config.hidden_size = 768
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained('Sifal/ClinicalMosaic')
 
 ClassifierClincalMosaic = AutoModelForSequenceClassification.from_pretrained(
     'Sifal/ClinicalMosaic',
-    config=config,
     torch_dtype='auto',
     trust_remote_code=True,
     device_map="auto"
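
For reference, a sketch of classification inference under the new API, assuming the label count is now read from the hosted config rather than set manually, and that the remote classification head returns standard `logits`; the placeholder text and argmax decoding are illustrative.

```python
# A sketch of classification inference (assumptions: num_labels is read from
# the hosted config, and the remote head returns standard logits).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Sifal/ClinicalMosaic')
ClassifierClincalMosaic = AutoModelForSequenceClassification.from_pretrained(
    'Sifal/ClinicalMosaic',
    torch_dtype='auto',
    trust_remote_code=True,
    device_map="auto"
)

clinical_text = "..."  # placeholder, as in the README example
inputs = tokenizer(clinical_text, return_tensors="pt").to(ClassifierClincalMosaic.device)
with torch.no_grad():
    logits = ClassifierClincalMosaic(**inputs).logits  # shape: (1, num_labels)
predicted_class = logits.argmax(dim=-1).item()
```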
@@ -185,13 +172,22 @@ The model demonstrates robust performance on clinical natural language inference
 
 ## Acknowledgments
 
-We would like to thank the **LIS laboratory** for providing the GPU resources necessary for pretraining and conducting extensive experiments. Additionally, we acknowledge **CEDRE** for supporting early-stage experiments and hosting part of the computational infrastructure.
+We would like to thank **LIS** (Laboratoire d'Informatique et Systèmes, Aix-Marseille University) for providing the GPU resources necessary for pretraining and conducting extensive experiments. Additionally, we acknowledge **CEDRE** (CEntre de formation et de soutien aux Données de la REcherche, Programme 2 of the France 2030 IDeAL project) for supporting early-stage experiments and hosting part of the computational infrastructure.
 
 ## Citation
 
 **BibTeX:**
 
-To be added
+```bibtex
+@article{RNTI/papers/1002990,
+  author  = {Sifal Klioui and Sana Sellami and Youssef Trardi},
+  title   = {Prédiction de la trajectoire du patient : Intégration des notes cliniques aux transformers},
+  journal = {Revue des Nouvelles Technologies de l'Information},
+  volume  = {Extraction et Gestion des Connaissances, RNTI-E-41},
+  year    = {2025},
+  pages   = {135-146}
+}
+```
 
 ## More Information
 
 