Chinese Version

Multi-Label Classification Model (The Veritable Records of the Ming/Qing Dynasties)

This model is built on the pretrained weights of Jihuai/bert-ancient-chinese and can be used for multi-label classification inference on texts from the Ming Shilu and Qing Shilu.

Training Corpus

  • Source: The Veritable Records of the Joseon Dynasty (朝鲜王朝实录)
  • Training samples: approximately 300,000
  • Number of label classes: 194
  • Text language: Traditional Chinese (Simplified Chinese is also supported)
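With 194 label classes, each training sample is naturally represented as a multi-hot vector. The sketch below illustrates this encoding with scikit-learn's MultiLabelBinarizer; the records and label names are hypothetical placeholders, not the model's actual label set or data.

```python
# Hedged sketch: multi-hot label encoding for multi-label training data.
# The records and label names are placeholders for illustration only.
from sklearn.preprocessing import MultiLabelBinarizer

records = [
    ("example passage 1", ["label_a", "label_c"]),
    ("example passage 2", ["label_b"]),
]

# In practice the binarizer would be fit on all 194 label names.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform([labels for _, labels in records])

print(mlb.classes_)  # label vocabulary, e.g. ['label_a' 'label_b' 'label_c']
print(y)             # one multi-hot row per record, e.g. [[1 0 1] [0 1 0]]
```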

Evaluation Metrics

| Metric | Sample-based | Label-based (Micro) |
| --- | --- | --- |
| F1 | 0.7095 | 0.6714 |
| Precision | 0.7692 | 0.7441 |
| Recall | 0.7000 | 0.6116 |

Hamming Loss: 0.0068

Note: Although the model can broadly distinguish most labels, its performance on labels with very few positive examples remains suboptimal.

Future Work

A related paper on this work is coming soon!

Citation

If you need to cite this work, please cite this page for now; the model is provided for study and research exchange only. Model link: https://huggingface.co/bztxb/shiluBERT

Online Usage Example

See the Hugging Face Space: bztxb/InferShilu

English Version

Multi-Label Classification Model (The Veritable Records of the Ming/Qing Dynasties)

This model is built on the pretrained weights of Jihuai/bert-ancient-chinese and is designed for multi-label classification inference on texts from the Ming Shilu and Qing Shilu.

Training Corpus

  • Source: The Veritable Records of the Joseon Dynasty (朝鲜王朝实录)
  • Training samples: approximately 300,000
  • Number of label classes: 194
  • Text language: Traditional Chinese (Simplified Chinese is also supported)

Evaluation Metrics

| Metric | Sample-based | Label-based (Micro) |
| --- | --- | --- |
| F1 Score | 0.7095 | 0.6714 |
| Precision | 0.7692 | 0.7441 |
| Recall | 0.7000 | 0.6116 |

Hamming Loss: 0.0068

Note: Although this model can broadly distinguish most labels, its performance on classes with very few positive examples remains suboptimal.
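For reference, sample-based and micro-averaged (label-based) metrics of this kind can be computed with scikit-learn, given binary ground-truth and prediction matrices of shape (n_samples, 194). The sketch below uses a tiny toy matrix purely to show the calls; the numbers are unrelated to the reported scores.

```python
# Hedged sketch: computing sample-based vs. micro-averaged multi-label metrics.
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, hamming_loss

# Toy matrices with 3 labels; the real evaluation would use 194 columns.
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])

print("sample-based F1:", f1_score(y_true, y_pred, average="samples"))
print("micro F1:       ", f1_score(y_true, y_pred, average="micro"))
print("micro precision:", precision_score(y_true, y_pred, average="micro"))
print("micro recall:   ", recall_score(y_true, y_pred, average="micro"))
print("Hamming loss:   ", hamming_loss(y_true, y_pred))
```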

Future Work

Working paper coming soon!

Citation

If you wish to cite this work, please refer to this page for now; any feedback is welcome.
Model link: https://huggingface.co/bztxb/shiluBERT

How to use

See the Hugging Face Space: bztxb/InferShilu
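Beyond the Space, the checkpoint can in principle be run locally with the transformers library. The sketch below assumes the checkpoint loads via AutoModelForSequenceClassification with a multi-label (sigmoid) head and that a 0.5 decision threshold is reasonable; neither detail is confirmed by this card, so adjust as needed.

```python
# Hedged inference sketch: multi-label prediction with a 0.5 threshold (assumed).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "bztxb/shiluBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Example passage from the Ming Shilu or Qing Shilu."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, num_labels)

probs = torch.sigmoid(logits)[0]             # independent probability per label
predicted = (probs > 0.5).nonzero(as_tuple=True)[0].tolist()
labels = [model.config.id2label[i] for i in predicted]
print(labels)
```

Lowering the threshold may help recover rarely occurring labels, at the cost of precision.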
