---
library_name: transformers
license: mit
base_model: almanach/moderncamembert-cv2-base
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: Moderncamembert-4entities
results: []
datasets:
- CATIE-AQ/frenchNER_4entities
language:
- fr
widget:
- text: "Le dévoilement du logo officiel des JO s'est déroulé le 21 octobre 2019 au Grand Rex. Ce nouvel emblème et cette nouvelle typographie ont été conçus par le designer Sylvain Boyer avec les agences Royalties & Ecobranding. Rond, il rassemble trois symboles : une médaille d'or, la flamme olympique et Marianne, symbolisée par un visage de femme mais privée de son bonnet phrygien caractéristique. La typographie dessinée fait référence à l'Art déco, mouvement artistique des années 1920, décennie pendant laquelle ont eu lieu pour la dernière fois les Jeux olympiques à Paris en 1924. Pour la première fois, ce logo sera unique pour les Jeux olympiques et les Jeux paralympiques."
pipeline_tag: token-classification
co2_eq_emissions: 22
---
# Moderncamembert-4entities
## Model Description
We present **Moderncamembert-4entities**, a [Moderncamembert-cv2-base](https://huggingface.co/almanach/moderncamembert-cv2-base) model fine-tuned for Named Entity Recognition in French on four French NER datasets covering 4 entities (LOC, PER, ORG, MISC).
All these datasets were concatenated and cleaned into a single dataset that we named [frenchNER_4entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_4entities).
There are a total of **384,773** rows, of which **328,757** are for training, **24,131** for validation and **31,885** for testing.
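The three splits add up exactly to the reported total. A quick arithmetic check of the resulting train/validation/test proportions (the percentages are derived here, not stated by the dataset card):

```python
# Split sizes of frenchNER_4entities, as reported above
splits = {"train": 328_757, "validation": 24_131, "test": 31_885}

total = sum(splits.values())
print(f"total rows: {total:,}")
for name, rows in splits.items():
    # Share of the full dataset held by each split
    print(f"{name}: {rows:,} rows ({100 * rows / total:.1f}%)")
```

This gives roughly an 85 / 6 / 8 percent partition.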
## Evaluation results
The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
### frenchNER_4entities
For space reasons, the table shows only the F1 score of each model per entity. The full results for this model are given below the table.
<table>
<thead>
<tr>
<th><br>Model</th>
<th><br>Parameters</th>
<th><br>Context</th>
<th><br>PER</th>
<th><br>LOC</th>
<th><br>ORG</th>
<th><br>MISC</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
<td><br>110M</td>
<td><br>512 tokens</td>
<td><br>0.971</td>
<td><br>0.947</td>
<td><br>0.902</td>
<td><br>0.663</td>
</tr>
<tr>
<td rowspan="1"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
<td><br>67.5M</td>
<td><br>512 tokens</td>
<td><br>0.974</td>
<td><br>0.948</td>
<td><br>0.892</td>
<td><br>0.658</td>
</tr>
<tr>
<td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
<td><br>110M</td>
<td><br>512 tokens</td>
<td><br>0.978</td>
<td><br>0.958</td>
<td><br>0.903</td>
<td><br>0.814</td>
</tr>
<tr>
<td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert2-4entities">NERmembert2-4entities</a></td>
<td><br>111M</td>
<td><br>1024 tokens</td>
<td><br>0.978</td>
<td><br>0.958</td>
<td><br>0.901</td>
<td><br>0.806</td>
</tr>
<tr>
<td rowspan="1"><br><a href="https://huggingface.co/CATIE-AQ/NERmemberta-4entities">NERmemberta-4entities</a></td>
<td><br>111M</td>
<td><br>1024 tokens</td>
<td><br>0.979</td>
<td><br>0.961</td>
<td><br>0.915</td>
<td><br>0.812</td>
</tr>
<tr>
<td rowspan="1"><br>Moderncamembert-4entities (this model)</td>
<td><br>136M</td>
<td><br>8192 tokens</td>
<td><br>0.981</td>
<td><br>0.960</td>
<td><br>0.913</td>
<td><br>0.811</td>
</tr>
<tr>
<td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-large-4entities">NERmembert-large-4entities</a></td>
<td><br>336M</td>
<td><br>512 tokens</td>
<td><br><b>0.982</b></td>
<td><br><b>0.964</b></td>
<td><br><b>0.919</b></td>
<td><br><b>0.834</b></td>
</tr>
</tbody>
</table>
<details>
<summary>Full results</summary>
<code>
{'LOC': {'precision': 0.9565485362095532,<br>
'recall': 0.9639751552795031,<br>
'f1': 0.9602474864655839,<br>
'number': 54740},<br>
'MISC': {'precision': 0.8599987367357251,<br>
'recall': 0.7680873268834796,<br>
'f1': 0.8114486642728371,<br>
'number': 35453},<br>
'O': {'precision': 0.9908647492910065,<br>
'recall': 0.9941133167897094,<br>
'f1': 0.9924863747765278,<br>
'number': 805547},<br>
'ORG': {'precision': 0.9089921444091593,<br>
'recall': 0.9175031632222691,<br>
'f1': 0.913227824188741,<br>
'number': 11855},<br>
'PER': {'precision': 0.97616260010303,<br>
'recall': 0.9855785143505603,<br>
'f1': 0.9808479600959955,<br>
'number': 63447},<br>
'overall_precision': 0.9826691327460604,<br>
'overall_recall': 0.9826691327460604,<br>
'overall_f1': 0.9826691327460604,<br>
'overall_accuracy': 0.9826691327460604}
</code>
</details>
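The full results are internally consistent, which the sketch below checks (it is not part of the original evaluation code). Each per-class F1 is the harmonic mean of its precision and recall. The identical `overall_*` values are what micro-averaging over every token, `O` included, would produce: each token gets exactly one predicted label, so micro precision, micro recall and accuracy all reduce to correctly labelled tokens divided by total tokens. That reading of the overall scores is our assumption; `recall * number` is used to recover each class's true-positive count.

```python
# Per-class (precision, recall, f1, number) copied from the "Full results" block
results = {
    "LOC": (0.9565485362095532, 0.9639751552795031, 0.9602474864655839, 54740),
    "MISC": (0.8599987367357251, 0.7680873268834796, 0.8114486642728371, 35453),
    "O": (0.9908647492910065, 0.9941133167897094, 0.9924863747765278, 805547),
    "ORG": (0.9089921444091593, 0.9175031632222691, 0.913227824188741, 11855),
    "PER": (0.97616260010303, 0.9855785143505603, 0.9808479600959955, 63447),
}


def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for label, (p, r, reported_f1, _) in results.items():
    print(f"{label}: recomputed F1 = {f1(p, r):.4f} (reported {reported_f1:.4f})")

# recall = TP / number, so recall * number recovers each class's true positives
true_positives = sum(round(r * n) for (_, r, _, n) in results.values())
total_tokens = sum(n for (_, _, _, n) in results.values())
micro_accuracy = true_positives / total_tokens
print(f"micro-averaged accuracy = {micro_accuracy:.4f}")
```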
## Usage
```python
from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
ner = pipeline(
    "token-classification",
    model="CATIE-AQ/Moderncamembert_4entities",
    tokenizer="CATIE-AQ/Moderncamembert_4entities",
    aggregation_strategy="simple",
)
result = ner(
"Le dévoilement du logo officiel des JO s'est déroulé le 21 octobre 2019 au Grand Rex. Ce nouvel emblème et cette nouvelle typographie ont été conçus par le designer Sylvain Boyer avec les agences Royalties & Ecobranding. Rond, il rassemble trois symboles : une médaille d'or, la flamme olympique et Marianne, symbolisée par un visage de femme mais privée de son bonnet phrygien caractéristique. La typographie dessinée fait référence à l'Art déco, mouvement artistique des années 1920, décennie pendant laquelle ont eu lieu pour la dernière fois les Jeux olympiques à Paris en 1924. Pour la première fois, ce logo sera unique pour les Jeux olympiques et les Jeux paralympiques."
)
# A list of dicts, one per detected entity
print(result)
```
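With `aggregation_strategy="simple"`, the pipeline returns one dict per entity with `entity_group`, `score` and `word` keys (plus `start`/`end` character offsets, omitted below). A minimal post-processing sketch that groups entity strings by label and drops low-confidence spans; `group_entities`, its `min_score` threshold and the `sample` list are hypothetical illustrations, and the sample values are hand-written, not real predictions of this model:

```python
from collections import defaultdict


def group_entities(result, min_score=0.0):
    """Hypothetical helper: collect entity strings per label from the
    output of a token-classification pipeline with aggregation enabled."""
    grouped = defaultdict(list)
    for entity in result:
        # Keep only spans whose confidence reaches the threshold
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample in the pipeline's output shape (NOT actual model output)
sample = [
    {"entity_group": "LOC", "score": 0.98, "word": "Grand Rex"},
    {"entity_group": "PER", "score": 0.99, "word": "Sylvain Boyer"},
    {"entity_group": "ORG", "score": 0.41, "word": "Royalties & Ecobranding"},
]
print(group_entities(sample, min_score=0.5))
```

On real predictions, pass the pipeline output instead, e.g. `group_entities(result, min_score=0.5)`.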
## Environmental Impact
*Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware, runtime, cloud provider, and compute region were used to estimate the carbon impact.*
- **Hardware Type:** A100 PCIe 40/80GB
- **Hours used:** 2h48min
- **Cloud Provider:** Private Infrastructure
- **Carbon Efficiency (kg/kWh):** 0.032 (estimated from [electricitymaps](https://app.electricitymaps.com/zone/FR) for the day of April 15, 2025.)
- **Carbon Emitted** *(Power consumption x Time x Carbon produced based on location of power grid)*: 0.022 kg eq. CO2
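The emission figure follows the formula in the last bullet. A worked sketch of that arithmetic: the power draw is not given in this card, so the 250 W below is an assumption (the rated TDP of a 40 GB A100 PCIe card), chosen because it reproduces the reported 0.022 kg.

```python
def carbon_emitted_kg(power_kw: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Power consumption x time x carbon intensity of the local grid."""
    return power_kw * hours * kg_co2_per_kwh


hours = 2 + 48 / 60   # 2h48min of training
power_kw = 0.25       # ASSUMPTION: ~250 W average draw for one A100 PCIe
intensity = 0.032     # kg CO2 eq. per kWh, French grid on April 15, 2025
print(f"{carbon_emitted_kg(power_kw, hours, intensity):.3f} kg eq. CO2")
```

A higher assumed draw (e.g. 300 W for an 80 GB PCIe card) would raise the estimate proportionally.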
## Citations
### Moderncamembert-4entities
```
@misc {Moderncamembert2025,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { Moderncamembert-4entities},
year = 2025,
url = { https://huggingface.co/CATIE-AQ/Moderncamembert-4entities },
doi = { 10.57967/hf/5202 },
publisher = { Hugging Face }
}
```
### Moderncamembert-cv2-base
```
@misc{antoun2025modernbertdebertav3examiningarchitecture,
title={ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance},
author={Wissam Antoun and Benoît Sagot and Djamé Seddah},
year={2025},
eprint={2504.08716},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.08716},
}
```
### NERmemBERTa-4entities
```
@misc {NERmemberta2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { NERmemberta-4entities},
year = 2024,
url = { https://huggingface.co/CATIE-AQ/NERmemberta-4entities },
doi = { 10.57967/hf/3640 },
publisher = { Hugging Face }
}
```
### CamemBERT 2.0
```
@misc{antoun2024camembert20smarterfrench,
title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
year={2024},
eprint={2411.08868},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.08868},
}
```
### NERmemBERT
```
@misc {NERmembert2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { NERmembert-base-3entities },
year = 2024,
url = { https://huggingface.co/CATIE-AQ/NERmembert-base-4entities },
doi = { 10.57967/hf/1752 },
publisher = { Hugging Face }
}
```
### CamemBERT
```
@inproceedings{martin2020camembert,
title={CamemBERT: a Tasty French Language Model},
author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}}
```
### frenchNER_4entities
```
@misc {frenchNER2024,
author = { {BOURDOIS, Loïck} },
organization = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { frenchNER_4entities },
year = 2024,
url = { https://huggingface.co/datasets/CATIE-AQ/frenchNER_4entities },
doi = { 10.57967/hf/1751 },
publisher = { Hugging Face }
}
```
## License
MIT