# Multi-Label Classification Model (Veritable Records of the Ming/Qing Dynasties)
This model is built on the pretrained weights of Jihuai/bert-ancient-chinese and is designed for multi-label classification inference on texts from the Ming Shilu and Qing Shilu.
## Training Corpus
- Source: The Veritable Records of the Joseon Dynasty
- Training size: about 300,000 samples
- Number of Labels: 194
- Text Language: Traditional Chinese (also supports Simplified Chinese)
## Evaluation Metrics

| Metric | Sample-based | Label-based (Micro) |
|---|---|---|
| F1 Score | 0.7095 | 0.6714 |
| Precision | 0.7692 | 0.7441 |
| Recall | 0.7000 | 0.6116 |
| Hamming Loss | 0.0068 | — |
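The two columns reflect different averaging schemes: sample-based metrics compute a score per document and average over documents, while label-based micro metrics pool true/false positives across every (document, label) cell. The sketch below illustrates the difference with NumPy; the function names are illustrative, and the convention that an empty true and predicted label set scores F1 = 1 is an assumption, not something specified by this model card.

```python
import numpy as np

def sample_f1(y_true, y_pred):
    """Sample-based F1: compute F1 per document, then average over documents."""
    scores = []
    for t, p in zip(y_true, y_pred):
        denom = t.sum() + p.sum()
        # Assumed convention: empty true and predicted sets count as a perfect match.
        scores.append(2 * np.logical_and(t, p).sum() / denom if denom else 1.0)
    return float(np.mean(scores))

def micro_f1(y_true, y_pred):
    """Label-based micro F1: pool TP/FP/FN over all (document, label) cells."""
    tp = np.logical_and(y_true == 1, y_pred == 1).sum()
    fp = np.logical_and(y_true == 0, y_pred == 1).sum()
    fn = np.logical_and(y_true == 1, y_pred == 0).sum()
    return float(2 * tp / (2 * tp + fp + fn))

def hamming_loss(y_true, y_pred):
    """Fraction of (document, label) cells predicted incorrectly."""
    return float(np.mean(y_true != y_pred))
```

Because micro averaging pools counts globally, rare labels barely move it, whereas a document with all-wrong predictions drags the sample-based average down directly.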
Note: Although this model can broadly distinguish most labels, its performance on classes with very few positive examples remains suboptimal.
## Future Work

A working paper is coming soon!
## Citation

If you wish to cite this work, please cite this page for now; the model is shared for study and research purposes, and feedback is welcome.

Model link: https://huggingface.co/bztxb/shiluBERT
## How to use

For an online demo, see the Hugging Face Space bztxb/InferShilu.
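Programmatic use would follow the standard multi-label pattern: encode the text, take the 194 logits, apply an independent sigmoid per label, and keep the labels whose probability clears a threshold. The snippet below is a minimal sketch of that decoding step only; the 0.5 threshold and the function name are illustrative choices, not part of the released model.

```python
import numpy as np

def decode_multilabel(logits, threshold=0.5):
    """Apply an independent sigmoid to each logit and return the indices
    of labels whose probability exceeds the threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [int(i) for i, p in enumerate(probs) if p > threshold]

# With the released checkpoint this would be used roughly as follows
# (assuming a standard transformers sequence-classification head):
#   from transformers import AutoTokenizer, AutoModelForSequenceClassification
#   tok = AutoTokenizer.from_pretrained("bztxb/shiluBERT")
#   model = AutoModelForSequenceClassification.from_pretrained("bztxb/shiluBERT")
#   logits = model(**tok("召見軍機大臣", return_tensors="pt")).logits[0].tolist()
#   decode_multilabel(logits)
```

Unlike single-label classification, no softmax is applied across the 194 classes: each label is scored independently, so a passage can receive zero, one, or many labels.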