superbean/distilbert-base-cased-finer-finetuned

Token Classification

Inference Endpoints


superbean committed 4 days ago

Commit a6faddb · verified · 1 Parent(s): 3141de7

Updates details in subset selection

Files changed (1)

README.md +3 -2

@@ -43,11 +43,12 @@ The following steps have been taken for getting the subset:
     - DebtInstrumentBasisSpreadOnVariableRate1
     - DebtInstrumentFaceAmount
+    These 4 entities above are picked up because they are the most common ones from the original dataset.
     Any other entity from the original dataset will be considered as "O".
-2. Any record in the dataset with more than 200 tokens(words) are removed. (What is left is already covering majority of the cases.)
-3. Any record without any entity in it is removed.
+3. Any record in the dataset with more than 200 tokens(words) are removed. (What is left is already covering majority of the cases.)
+4. Any record without any entity in it is removed.
 All the three steps haven been executed with both "train" and "validation" part of the finer-139 dataset. For the "test" set, however, step 3 is not run because we still want to see how the fine-tuned model can cope with more generalized cases.
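
The subset selection described in this hunk could be sketched roughly as follows. This is a minimal illustration, not the author's actual preprocessing code. It assumes each record is a dict with a `tokens` list and a `ner_tags` list of string tags in IOB2 form (for example `B-DebtInstrumentFaceAmount`). The names `KEPT_ENTITIES`, `relabel`, and `build_subset` are hypothetical, and only the two entity names visible in the hunk are listed, since the other two are not shown here. It also assumes that "without any entity" is checked after relabeling.

```python
from typing import Dict, List

# Assumption: only the two entity names visible in this diff hunk are listed;
# the README keeps four in total, and the other two are not shown here.
KEPT_ENTITIES = {
    "DebtInstrumentBasisSpreadOnVariableRate1",
    "DebtInstrumentFaceAmount",
}
MAX_TOKENS = 200  # records longer than this are removed

Record = Dict[str, List[str]]


def relabel(tags: List[str]) -> List[str]:
    """Map every tag whose entity is not kept to "O" (assumes IOB2 string tags)."""
    relabeled = []
    for tag in tags:
        # "B-Foo" -> "Foo"; a plain "O" has no prefix and stays "O".
        entity = tag.split("-", 1)[1] if "-" in tag else tag
        relabeled.append(tag if entity in KEPT_ENTITIES else "O")
    return relabeled


def build_subset(records: List[Record], drop_entity_free: bool = True) -> List[Record]:
    """Apply the relabeling, length filter, and (optionally) entity-free filter."""
    subset = []
    for record in records:
        # Remove records with more than MAX_TOKENS tokens.
        if len(record["tokens"]) > MAX_TOKENS:
            continue
        tags = relabel(record["ner_tags"])
        # Remove records left with no kept entity, unless disabled.
        if drop_entity_free and all(tag == "O" for tag in tags):
            continue
        subset.append({"tokens": record["tokens"], "ner_tags": tags})
    return subset


if __name__ == "__main__":
    sample = [
        {"tokens": ["face", "amount"], "ner_tags": ["O", "B-DebtInstrumentFaceAmount"]},
        {"tokens": ["revenue"], "ner_tags": ["B-Revenues"]},
    ]
    print(build_subset(sample))
    print(build_subset(sample, drop_entity_free=False))
```

Under these assumptions, "train" and "validation" would go through `build_subset(records)`, while the "test" split would use `drop_entity_free=False`, matching the README's note that records without entities are kept there.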