Chinese Version

Multi-Label Classification Model (The Veritable Records of the Ming/Qing Dynasties)

This model is built on the pretrained weights of Jihuai/bert-ancient-chinese and can be used for multi-label classification inference on texts from the Ming Shilu and Qing Shilu.

Training Corpus

  • Source: The Veritable Records of the Joseon Dynasty (朝鲜王朝实录)
  • Training samples: approximately 300,000
  • Number of label classes: 194
  • Text language: Traditional Chinese (Simplified Chinese is also supported)
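With 194 label classes, each training sample is naturally represented as a multi-hot vector. The sketch below illustrates this encoding with scikit-learn's MultiLabelBinarizer; the records and label names are hypothetical placeholders, not the model's actual label set or data.

```python
# Hedged sketch: multi-hot label encoding for multi-label training data.
# The records and label names are placeholders for illustration only.
from sklearn.preprocessing import MultiLabelBinarizer

records = [
    ("example passage 1", ["label_a", "label_c"]),
    ("example passage 2", ["label_b"]),
]

# In practice the binarizer would be fit on all 194 label names.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform([labels for _, labels in records])

print(mlb.classes_)  # label vocabulary, e.g. ['label_a' 'label_b' 'label_c']
print(y)             # one multi-hot row per record, e.g. [[1 0 1] [0 1 0]]
```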

Evaluation Metrics

| Metric | Sample-based | Label-based (Micro) |
| --- | --- | --- |
| F1 | 0.7095 | 0.6714 |
| Precision | 0.7692 | 0.7441 |
| Recall | 0.7000 | 0.6116 |

Hamming Loss: 0.0068

Note: Although the model can broadly distinguish most labels, its performance on labels with very few positive examples remains suboptimal.

Future Work

A related paper on this work is coming soon!

Citation

If you need to cite this work, please cite this page for now; the model is provided for study and research exchange only. Model link: https://huggingface.co/bztxb/shiluBERT

Online Usage Example

See the Hugging Face Space: bztxb/InferShilu

English Version

Multi-Label Classification Model (The Veritable Records of the Ming/Qing Dynasties)

This model is built on the pretrained weights of Jihuai/bert-ancient-chinese and is designed for multi-label classification inference on texts from the Ming Shilu and Qing Shilu.

Training Corpus

  • Source: The Veritable Records of the Joseon Dynasty (朝鲜王朝实录)
  • Training samples: approximately 300,000
  • Number of label classes: 194
  • Text language: Traditional Chinese (Simplified Chinese is also supported)

Evaluation Metrics

| Metric | Sample-based | Label-based (Micro) |
| --- | --- | --- |
| F1 Score | 0.7095 | 0.6714 |
| Precision | 0.7692 | 0.7441 |
| Recall | 0.7000 | 0.6116 |

Hamming Loss: 0.0068

Note: Although this model can broadly distinguish most labels, its performance on classes with very few positive examples remains suboptimal.
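For reference, sample-based and micro-averaged (label-based) metrics of this kind can be computed with scikit-learn, given binary ground-truth and prediction matrices of shape (n_samples, 194). The sketch below uses a tiny toy matrix purely to show the calls; the numbers are unrelated to the reported scores.

```python
# Hedged sketch: computing sample-based vs. micro-averaged multi-label metrics.
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, hamming_loss

# Toy matrices with 3 labels; the real evaluation would use 194 columns.
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])

print("sample-based F1:", f1_score(y_true, y_pred, average="samples"))
print("micro F1:       ", f1_score(y_true, y_pred, average="micro"))
print("micro precision:", precision_score(y_true, y_pred, average="micro"))
print("micro recall:   ", recall_score(y_true, y_pred, average="micro"))
print("Hamming loss:   ", hamming_loss(y_true, y_pred))
```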

Future Work

Working paper coming soon!

Citation

If you wish to cite this work, please refer to this page for now; any feedback is welcome.
Model link: https://huggingface.co/bztxb/shiluBERT

How to use

See the Hugging Face Space: bztxb/InferShilu
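Beyond the Space, the checkpoint can in principle be run locally with the transformers library. The sketch below assumes the checkpoint loads via AutoModelForSequenceClassification with a multi-label (sigmoid) head and that a 0.5 decision threshold is reasonable; neither detail is confirmed by this card, so adjust as needed.

```python
# Hedged inference sketch: multi-label prediction with a 0.5 threshold (assumed).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "bztxb/shiluBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Example passage from the Ming Shilu or Qing Shilu."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, num_labels)

probs = torch.sigmoid(logits)[0]             # independent probability per label
predicted = (probs > 0.5).nonzero(as_tuple=True)[0].tolist()
labels = [model.config.id2label[i] for i in predicted]
print(labels)
```

Lowering the threshold may help recover rarely occurring labels, at the cost of precision.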
