metadata
datasets:
- samirmsallem/wiki_def_de_multitask
language:
- de
base_model:
- distilbert/distilbert-base-multilingual-cased
library_name: transformers
tags:
- science
- ner
- def_extraction
- definitions
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: checkpoints
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: samirmsallem/wiki_def_de_multitask
      type: samirmsallem/wiki_def_de_multitask
    metrics:
    - name: F1
      type: f1
      value: 0.812455003599712
    - name: Precision
      type: precision
      value: 0.8076097328244275
    - name: Recall
      type: recall
      value: 0.8173587638821825
    - name: Loss
      type: loss
      value: 0.329479843378067
NER model for definition component recognition in German scientific texts
distilbert-base-multilingual-cased-definitions_ner is a named entity recognition (NER) model for token classification in German scientific text, fine-tuned from distilbert-base-multilingual-cased. It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples, consisting of definition and non-definition sentences from German Wikipedia articles.
The model is specifically designed to recognize and classify components of definitions, using the following entity labels:
- DF: Definiendum (the term being defined)
- VF: Definitor (the verb or phrase introducing the definition)
- GF: Definiens (the explanation or meaning)
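As a minimal usage sketch, the labels above can be collected per sentence with the `transformers` token-classification pipeline. The repository id is an assumption inferred from the model name in this card, and the helper names `load_tagger` and `group_components` are hypothetical. The sketch also assumes the model emits the labels `DF`, `VF` and `GF` either bare or with `B-`/`I-` prefixes, which `aggregation_strategy="simple"` merges into one `entity_group` per span.

```python
from collections import defaultdict

# Assumption: repository id inferred from the model name; adjust if it differs.
MODEL_ID = "samirmsallem/distilbert-base-multilingual-cased-definitions_ner"

# Entity labels as listed in this card.
LABEL_NAMES = {"DF": "definiendum", "VF": "definitor", "GF": "definiens"}


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so group_components can be used without transformers.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into labelled spans
    )


def group_components(entities):
    """Group aggregated pipeline output by definition component."""
    grouped = defaultdict(list)
    for ent in entities:
        label = ent["entity_group"]
        if label in LABEL_NAMES:  # ignore any label not described in the card
            grouped[LABEL_NAMES[label]].append(ent["word"])
    return dict(grouped)
```

A typical call would be `group_components(load_tagger()("Die Photosynthese ist ein biochemischer Prozess."))`, which returns a dictionary mapping each recognized component to the text spans the model assigned to it.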
Training was conducted using a standard NER objective. The model achieves an F1 score of approximately 81% on the evaluation set.
Here are the overall final metrics on the test dataset after 5 epochs of training:
- f1: 0.812455003599712
- precision: 0.8076097328244275
- recall: 0.8173587638821825
- loss: 0.329479843378067
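The reported F1 can be sanity-checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is a small consistency check on the numbers listed above, not a re-evaluation of the model; the function name `f1_from` is illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics listed in this card.
precision = 0.8076097328244275
recall = 0.8173587638821825
reported_f1 = 0.812455003599712

computed_f1 = f1_from(precision, recall)
print(f"computed F1: {computed_f1:.4f}, reported F1: {reported_f1:.4f}")
```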
Model Performance Comparison on wiki_definitions_de_multitask:
Model | Precision (%) | Recall (%) | F1 Score (%) | Eval Samples per Second | Epoch |
---|---|---|---|---|---|
distilbert-base-multilingual-cased-definitions_ner | 80.76 | 81.74 | 81.25 | 457.53 | 5.0 |
scibert_scivocab_cased-definitions_ner | 80.54 | 82.11 | 81.32 | 236.61 | 4.0 |
GottBERT_base_best-definitions_ner | 82.98 | 82.81 | 82.90 | 272.26 | 5.0 |
xlm-roberta-base-definitions_ner | 81.90 | 83.35 | 82.62 | 241.21 | 5.0 |
gbert-base-definitions_ner | 82.73 | 83.56 | 83.14 | 278.87 | 5.0 |
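To read the trade-off in this table, the sketch below compares one model against the highest-F1 entry, reporting the F1 gap in percentage points and the relative evaluation throughput. The numbers are copied from the table; `compare_to_best` is an illustrative helper, and throughput depends on evaluation hardware that this card does not specify.

```python
# (precision, recall, f1, eval samples per second), copied from the table.
RESULTS = {
    "distilbert-base-multilingual-cased-definitions_ner": (80.76, 81.74, 81.25, 457.53),
    "scibert_scivocab_cased-definitions_ner": (80.54, 82.11, 81.32, 236.61),
    "GottBERT_base_best-definitions_ner": (82.98, 82.81, 82.90, 272.26),
    "xlm-roberta-base-definitions_ner": (81.90, 83.35, 82.62, 241.21),
    "gbert-base-definitions_ner": (82.73, 83.56, 83.14, 278.87),
}


def compare_to_best(name, results=RESULTS):
    """Return (best model by F1, F1 gap in points, throughput ratio vs. best)."""
    best = max(results, key=lambda model: results[model][2])
    f1_gap = results[best][2] - results[name][2]
    throughput_ratio = results[name][3] / results[best][3]
    return best, round(f1_gap, 2), round(throughput_ratio, 2)


best, gap, ratio = compare_to_best(
    "distilbert-base-multilingual-cased-definitions_ner"
)
print(f"best: {best}, F1 gap: {gap} points, throughput ratio: {ratio}x")
```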