Updates details in subset selection
README.md
CHANGED
@@ -43,11 +43,12 @@ The following steps have been taken for getting the subset:
 - DebtInstrumentBasisSpreadOnVariableRate1
 - DebtInstrumentFaceAmount
 
+These 4 entities above were picked because they are the most common ones in the original dataset.
 Any other entity from the original dataset will be considered as "O".
 
-
+3. Any record in the dataset with more than 200 tokens (words) is removed. (What is left still covers the majority of the cases.)
 
-
+4. Any record without any entity in it is removed.
 
 All three steps have been executed on both the "train" and "validation" splits of the finer-139 dataset. For the "test" set, however, step 3 is not run because we still want to see how the fine-tuned model copes with more general cases.
 
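The subset steps described in this README can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code. It assumes each record is a dict with parallel `tokens` and `ner_tags` lists holding IOB2-style string tags (e.g. `B-DebtInstrumentFaceAmount`). The names `KEPT_ENTITIES`, `map_to_kept` and `build_subset` are hypothetical. Only the two entity types visible in this hunk are listed. The other two kept entities appear earlier in the README and would need to be added. The token count is taken as `len(tokens)`, following the README's "tokens (words)".

```python
from typing import Dict, List

# Hypothetical set: only the two entity types visible in this hunk.
# The README lists two more kept entities above line 43; add them before use.
KEPT_ENTITIES = {
    "DebtInstrumentBasisSpreadOnVariableRate1",
    "DebtInstrumentFaceAmount",
}
MAX_TOKENS = 200  # step 3 threshold stated in the README

Record = Dict[str, List[str]]


def map_to_kept(tag: str) -> str:
    """Keep B-/I- tags of the kept entities; turn every other tag into "O"."""
    if tag == "O":
        return "O"
    # Assumed IOB2 format: "B-DebtInstrumentFaceAmount" -> "DebtInstrumentFaceAmount"
    _, _, entity = tag.partition("-")
    return tag if entity in KEPT_ENTITIES else "O"


def build_subset(records: List[Record], split: str) -> List[Record]:
    """Apply the entity mapping and the record filters to one split."""
    subset = []
    for rec in records:
        tags = [map_to_kept(t) for t in rec["ner_tags"]]
        # Step 3: drop records longer than MAX_TOKENS, except in "test".
        if split != "test" and len(rec["tokens"]) > MAX_TOKENS:
            continue
        # Step 4: drop records with no remaining entity (all tags are "O").
        if all(t == "O" for t in tags):
            continue
        subset.append({"tokens": rec["tokens"], "ner_tags": tags})
    return subset


# Toy records for illustration only (not taken from finer-139).
records = [
    {"tokens": ["Face", "amount", "$", "500", "million"],
     "ner_tags": ["O", "O", "O", "B-DebtInstrumentFaceAmount", "O"]},
    {"tokens": ["Revenue", "was", "10"],
     "ner_tags": ["O", "O", "B-Revenues"]},
    {"tokens": ["x"] * 250,
     "ner_tags": ["B-DebtInstrumentFaceAmount"] + ["O"] * 249},
]
print(len(build_subset(records, "train")))  # 1
print(len(build_subset(records, "test")))   # 2
```

In this sketch the tags are mapped before the step 4 check. Under that reading, a record whose only entities are among the dropped types becomes all "O" and is removed. The long third record survives only in the "test" split because step 3 is skipped there.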