Sifal committed
Commit a7e544d · verified · 1 Parent(s): 41587e8

Much cleaner API + FR paper + ack

Files changed (1)
  1. README.md +19 -23
README.md CHANGED
@@ -38,11 +38,11 @@ Clinical Mosaic is a transformer-based language model built on the Mosaic BERT a
 
 ## Model Details
 - **Developed by:** Sifal Klioui, Sana Sellami, and Youssef Trardi (Aix-Marseille Univ, LIS, CNRS, Marseille, France)
-- **Funded by:** PICOMALE project (AMIDEX)
+- **Funded by:** PICOMALE project (AMIDEX), under the direction of CEDRE
 - **Base Model:** Mosaic BERT
 - **License:** MIMIC Data Use Agreement (requires compliance with original DUA)
 - **Repository:** [PatientTrajectoryForecasting](https://github.com/MostHumble/PatientTrajectoryForecasting)
-- **Paper:** *Patient Trajectory Prediction: Integrating Clinical Notes with Transformers* ([PDF](insert-link))
+- **Paper:** *Patient Trajectory Prediction: Integrating Clinical Notes with Transformers* [[FR](https://editions-rnti.fr/?inprocid=1002990), [EN: to be added]()]
 
 ## Uses
 
@@ -78,18 +78,11 @@ Install the Hugging Face Transformers library and load the model as follows:
 ### For embeddings generation:
 
 ```python
-from transformers import AutoModel, BertTokenizer, BertConfig
-
-tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') # MosaicBERT uses the standard BERT tokenizer
-config = BertConfig.from_pretrained('Sifal/ClinicalMosaic') # the config needs to be passed in
-
-ClincalMosaic = AutoModel.from_pretrained(
-    'Sifal/ClinicalMosaic',
-    config=config,
-    torch_dtype='auto',
-    trust_remote_code=True,
-    device_map="auto"
-)
+# Load model directly
+from transformers import AutoTokenizer, AutoModel
+
+tokenizer = AutoTokenizer.from_pretrained("Sifal/ClinicalMosaic", trust_remote_code=True)
+ClincalMosaic = AutoModel.from_pretrained("Sifal/ClinicalMosaic", trust_remote_code=True)
 
 # Example usage
 clinical_text = "..."
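
For reference, a minimal end-to-end sketch of the rewritten embeddings block. The example text and tokenizer arguments are illustrative assumptions; `output_all_encoded_layers=False` mirrors the example usage already in the README.

```python
# A minimal sketch of the new embeddings workflow. The example text and
# tokenizer arguments are illustrative; output_all_encoded_layers=False
# mirrors the example usage already in the README.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Sifal/ClinicalMosaic", trust_remote_code=True)
ClincalMosaic = AutoModel.from_pretrained("Sifal/ClinicalMosaic", trust_remote_code=True)

clinical_text = "Patient admitted with shortness of breath; started on IV diuretics."
inputs = tokenizer(clinical_text, return_tensors="pt", truncation=True)

# Keep only the final encoder layer's hidden states: (batch, seq_len, hidden_size)
last_layer_embeddings = ClincalMosaic(**inputs, output_all_encoded_layers=False)
```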
@@ -101,18 +94,12 @@ last_layer_embeddings = ClincalMosaic(**inputs, output_all_encoded_layers=False)
 ### For sequence classification:
 
 ```python
-from transformers import AutoModelForSequenceClassification, BertTokenizer, BertConfig
-
-tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') # ClincalMosaic uses the standard BERT tokenizer
-config = BertConfig.from_pretrained('Sifal/ClinicalMosaic') # the config needs to be passed in
-
-# Set the hidden size and number of labels:
-config.num_labels = 4
-config.hidden_size = 768
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained('Sifal/ClinicalMosaic')
 
 ClassifierClincalMosaic = AutoModelForSequenceClassification.from_pretrained(
     'Sifal/ClinicalMosaic',
-    config=config,
     torch_dtype='auto',
     trust_remote_code=True,
     device_map="auto"
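
For reference, a sketch of classification inference under the new API, assuming the label count is now read from the hosted config rather than set manually, and that the remote classification head returns standard `logits`; the placeholder text and argmax decoding are illustrative.

```python
# A sketch of classification inference (assumptions: num_labels is read from
# the hosted config, and the remote head returns standard logits).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Sifal/ClinicalMosaic')
ClassifierClincalMosaic = AutoModelForSequenceClassification.from_pretrained(
    'Sifal/ClinicalMosaic',
    torch_dtype='auto',
    trust_remote_code=True,
    device_map="auto"
)

clinical_text = "..."  # placeholder, as in the README example
inputs = tokenizer(clinical_text, return_tensors="pt").to(ClassifierClincalMosaic.device)
with torch.no_grad():
    logits = ClassifierClincalMosaic(**inputs).logits  # shape: (1, num_labels)
predicted_class = logits.argmax(dim=-1).item()
```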
@@ -185,13 +172,22 @@ The model demonstrates robust performance on clinical natural language inference
 
 ## Acknowledgments
 
-We would like to thank the **LIS laboratory** for providing the GPU resources necessary for pretraining and conducting extensive experiments. Additionally, we acknowledge **CEDRE** for supporting early-stage experiments and hosting part of the computational infrastructure.
+We would like to thank **LIS** (Laboratoire d'Informatique et Systèmes, Aix-Marseille University) for providing the GPU resources necessary for pretraining and conducting extensive experiments. Additionally, we acknowledge **CEDRE** (CEntre de formation et de soutien aux Données de la REcherche, Programme 2 of the France 2030 IDeAL project) for supporting early-stage experiments and hosting part of the computational infrastructure.
 
 ## Citation
 
 **BibTeX:**
 
-To be added
+```bibtex
+@article{RNTI/papers/1002990,
+  author  = {Sifal Klioui and Sana Sellami and Youssef Trardi},
+  title   = {Prédiction de la trajectoire du patient : Intégration des notes cliniques aux transformers},
+  journal = {Revue des Nouvelles Technologies de l'Information},
+  volume  = {Extraction et Gestion des Connaissances, RNTI-E-41},
+  year    = {2025},
+  pages   = {135-146}
+}
+```
 
 ## More Information
 
 