Update README.md

This model is based on the [modernBERT-base](https://arxiv.org/abs/2412.13663) architecture with the [llm-jp-tokenizer](https://github.com/llm-jp/llm-jp-tokenizer).
It was trained on the Japanese subset (3.4 TB) of the llm-jp-corpus v4 and supports a maximum sequence length of 8192 tokens.

For details on the training methods, evaluation, and analysis results, please see [llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length](https://arxiv.org/abs/2504.15544).

## Usage
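A minimal fill-mask sketch using the Hugging Face `transformers` API is shown below. The hub ID `llm-jp/llm-jp-modernbert-base` is assumed from the evaluation table further down, and the repository's own example may differ in details; ModernBERT support requires `transformers` v4.48.0 or later.

```python
# Minimal fill-mask example; the hub ID below is assumed from the evaluation table.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Build an input containing the tokenizer's own mask token.
text = f"日本の首都は{tokenizer.mask_token}です。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the masked position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_positions].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_id)
print("Predicted token:", predicted_token)
```
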
This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.

Training code can be found at https://github.com/llm-jp/llm-jp-modernbert

| Model | stage 1 | stage 2 |
|:------------------|----------------:|----------------:|

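Stage 2 extends the usable context to 8192 tokens. As a quick, assumed sanity check of the released checkpoint (same hub ID as above; `max_position_embeddings` is the attribute name used by ModernBERT's config):

```python
# Rough sanity check of the 8192-token context; long_text is only a placeholder document.
from transformers import AutoConfig, AutoTokenizer

model_id = "llm-jp/llm-jp-modernbert-base"  # assumed hub ID
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

long_text = "長い日本語の文書です。" * 3000  # stand-in for a real long document
ids = tokenizer(long_text, truncation=True, max_length=config.max_position_embeddings)["input_ids"]
print(config.max_position_embeddings, len(ids))  # expect 8192, and len(ids) <= 8192
```
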
## Evaluation

JSTS, JNLI, and JCoLA from [JGLUE](https://aclanthology.org/2022.lrec-1.317/) were used.
Evaluation code can be found at https://github.com/llm-jp/llm-jp-modernbert

| Model                             | JSTS (pearson) | JNLI (accuracy) | JCoLA (accuracy) | Avg       |
|-----------------------------------|----------------|-----------------|------------------|-----------|
| tohoku-nlp/bert-base-japanese-v3  | 0.920          | 0.912           | 0.880            | 0.904     |
| sbintuitions/modernbert-ja-130m   | 0.916          | 0.927           | 0.868            | 0.904     |
| sbintuitions/modernbert-ja-310m   | **0.932**      | **0.933**       | **0.883**        | **0.916** |
| **llm-jp/llm-jp-modernbert-base** | 0.918          | 0.913           | 0.844            | 0.892     |

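For a rough idea of how scores like those above are produced, the sketch below fine-tunes a model on JSTS as sentence-pair regression and reports the Pearson correlation. The dataset loader (`shunk031/JGLUE`), column names, and hyperparameters are illustrative assumptions, not necessarily what the linked evaluation code does.

```python
# Hypothetical JSTS fine-tuning sketch: sentence-pair regression + Pearson correlation.
# Dataset ID, column names, and hyperparameters are assumptions, not the authors' exact setup.
from datasets import load_dataset
from scipy.stats import pearsonr
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# num_labels=1 turns the classification head into a regression head (MSE loss).
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=1, problem_type="regression")

# JSTS: sentence pairs annotated with a similarity score in [0, 5].
dataset = load_dataset("shunk031/JGLUE", name="JSTS", trust_remote_code=True)

def preprocess(batch):
    enc = tokenizer(batch["sentence1"], batch["sentence2"],
                    truncation=True, max_length=512)
    enc["labels"] = [float(x) for x in batch["label"]]
    return enc

encoded = dataset.map(preprocess, batched=True,
                      remove_columns=dataset["train"].column_names)

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    return {"pearson": pearsonr(preds.squeeze(), labels)[0]}

args = TrainingArguments(output_dir="jsts-out", learning_rate=2e-5,
                         num_train_epochs=3, per_device_train_batch_size=32,
                         eval_strategy="epoch")
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  processing_class=tokenizer,
                  compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())
```

JNLI and JCoLA follow the same pattern with a standard classification head (`num_labels=3` and `2`, respectively) and accuracy as the metric.
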
## LICENSE

## Citation

```
@misc{sugiura2025llmjpmodernbertmodernbertmodeltrained,
    title={llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length},
    author={Issa Sugiura and Kouta Nakayama and Yusuke Oda},
    year={2025},
    eprint={2504.15544},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2504.15544},
}
```