superbean committed on
Commit a6faddb · verified · 1 Parent(s): 3141de7

Updates details in subset selection

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -43,11 +43,12 @@ The following steps have been taken for getting the subset:
   - DebtInstrumentBasisSpreadOnVariableRate1
   - DebtInstrumentFaceAmount
 
+ These 4 entities above are picked up because they are the most common ones from the original dataset.
  Any other entity from the original dataset will be considered as "O".
 
- 2. Any record in the dataset with more than 200 tokens(words) are removed. (What is left is already covering majority of the cases.)
+ 3. Any record in the dataset with more than 200 tokens(words) are removed. (What is left is already covering majority of the cases.)
 
- 3. Any record without any entity in it is removed.
+ 4. Any record without any entity in it is removed.
 
  All the three steps haven been executed with both "train" and "validation" part of the finer-139 dataset. For the "test" set, however, step 3 is not run because we still want to see how the fine-tuned model can cope with more generalized cases.
 
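
The selection steps described in this README hunk can be sketched roughly as below. This is a minimal illustration, not the author's actual script. The record layout (`tokens` and `ner_tags` lists), the `B-`/`I-` tag prefixes, the function names, and the choice to check for entities after relabeling are all assumptions. Only the two entity names visible in this hunk are used in the example; the other two selected entities are outside the shown lines.

```python
from typing import Dict, List, Set

MAX_TOKENS = 200  # length threshold stated in the README


def relabel(tags: List[str], kept: Set[str]) -> List[str]:
    """Map every tag whose entity type is not in `kept` to "O"."""
    out = []
    for tag in tags:
        # Assumed tag format: "B-<Entity>" / "I-<Entity>" / "O".
        entity = tag.split("-", 1)[1] if "-" in tag else tag
        out.append(tag if entity in kept else "O")
    return out


def select_subset(
    records: List[Dict[str, List[str]]],
    kept: Set[str],
    drop_entity_free: bool = True,
) -> List[Dict[str, List[str]]]:
    """Apply the relabel, length, and entity-presence filters in order."""
    subset = []
    for rec in records:
        # Drop records with more than MAX_TOKENS tokens.
        if len(rec["tokens"]) > MAX_TOKENS:
            continue
        tags = relabel(rec["ner_tags"], kept)
        # Optionally drop records left without any kept entity.
        if drop_entity_free and all(t == "O" for t in tags):
            continue
        subset.append({"tokens": rec["tokens"], "ner_tags": tags})
    return subset


# Hypothetical toy records, for illustration only.
records = [
    {"tokens": ["face", "amount", "of", "$", "500"],
     "ner_tags": ["O", "O", "O", "O", "B-DebtInstrumentFaceAmount"]},
    {"tokens": ["revenue", "was", "10"],
     "ner_tags": ["O", "O", "B-Revenues"]},
    {"tokens": ["w"] * 201,
     "ner_tags": ["B-DebtInstrumentFaceAmount"] + ["O"] * 200},
]
kept = {"DebtInstrumentBasisSpreadOnVariableRate1", "DebtInstrumentFaceAmount"}

train_like = select_subset(records, kept)
test_like = select_subset(records, kept, drop_entity_free=False)
```

Passing `drop_entity_free=False` mirrors the README's statement that the entity-presence step is skipped for the "test" split, so records containing only "O" tags remain there.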