# NamBert-for-csc
Official model for the paper "Unveiling the Impact of Multimodal Features on Chinese Spelling Correction: From Analysis to Design".
Github: https://github.com/iioSnail/NamBert
The sentence-level performance of the model on the SIGHAN datasets is as follows:
Dataset | Detect-Acc | Detect-Precision | Detect-Recall | Detect-F1 | Correct-Acc | Correct-Precision | Correct-Recall | Correct-F1 |
---|---|---|---|---|---|---|---|---|
Sighan2013 | 82.70 | 87.72 | 82.39 | 84.97 | 81.60 | 86.51 | 81.26 | 83.80 |
Sighan2014 | 79.76 | 69.03 | 75.00 | 71.89 | 79.10 | 67.79 | 73.65 | 70.60 |
Sighan2015 | 86.18 | 77.52 | 85.40 | 81.27 | 85.73 | 76.68 | 84.47 | 80.39 |
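As a quick sanity check on the table above, each F1 value is the harmonic mean of the corresponding precision and recall; a minimal sketch (the `f1` helper name is ours, illustrated with the Sighan2013 detection numbers):

```python
def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Sighan2013 detection: precision 87.72, recall 82.39 (from the table above)
print(round(f1(87.72, 82.39), 2))  # → 84.97
```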
## Usage
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("iioSnail/NamBert-for-csc", trust_remote_code=True)
model = AutoModel.from_pretrained("iioSnail/NamBert-for-csc", trust_remote_code=True)

inputs = tokenizer("我喜换吃平果,逆呢?", return_tensors='pt')
logits = model(**inputs).logits

# Pick the most likely character at each position, then restore tokens
# that should be copied from the input unchanged.
target_ids = logits.argmax(-1)
target_ids = tokenizer.restore_ids(target_ids, inputs['input_ids'])

# Drop the [CLS]/[SEP] positions and join the corrected characters.
print(''.join(tokenizer.convert_ids_to_tokens(target_ids[0, 1:-1])))
```
Or, using the model's built-in `predict` method:
```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("iioSnail/NamBert-for-csc", trust_remote_code=True)
model = AutoModel.from_pretrained("iioSnail/NamBert-for-csc", trust_remote_code=True)

# Move the model to a GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model = model.eval()
model.set_tokenizer(tokenizer)

# Correct a single sentence or a batch of sentences.
model.predict("我是炼习时长两念半的个人练习生菜徐坤")
model.predict(["我是炼习时长两念半的个人练习生菜徐坤", "喜欢场跳rap篮球!!"])
```