speed committed · Commit 3a6d756 · verified · 1 Parent(s): bd4e27a

Update README.md

Files changed (1): README.md (+15 -5)
README.md CHANGED
@@ -10,7 +10,7 @@ language:
  This model is based on the [modernBERT-base](https://arxiv.org/abs/2412.13663) architecture with [llm-jp-tokenizer](https://github.com/llm-jp/llm-jp-tokenizer).
  It was trained using the Japanese subset (3.4TB) of the llm-jp-corpus v4 and supports a max sequence length of 8192.

- For detailed information on the training methods, evaluation, and analysis results, please visit at [TODO]()
+ For detailed information on the training methods, evaluation, and analysis results, please see [llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length](https://arxiv.org/abs/2504.15544).

  ## Usage
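
The `## Usage` section referenced in this hunk covers masked-token prediction. A minimal sketch of that workflow is shown below, assuming the repository ID `llm-jp/llm-jp-modernbert-base` from the evaluation table further down and the standard Transformers masked-LM API; it may differ in detail from the README's own example.

```python
# Minimal fill-mask sketch (assumed repository ID; details may differ from the README's example).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = f"日本の首都は{tokenizer.mask_token}です。"  # "The capital of Japan is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring token at the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_token = tokenizer.decode(logits[0, mask_pos].argmax(dim=-1))
print("Predicted token:", predicted_token)
```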
 
@@ -50,7 +50,7 @@ print("Predicted token:", predicted_token)

  This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.

- Training code can be found at https://github.com/llm-jp/bert-ja
+ Training code can be found at https://github.com/llm-jp/llm-jp-modernbert

  | Model | stage 1 | stage 2 |
  |:------------------ |----------------:|----------------:|
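
Because stage 2 extends max_seq_len to 8192, callers tokenizing long documents should pass an explicit length cap rather than relying on a shorter default. A small sketch under the same model-ID assumption as above:

```python
# Hedged sketch: tokenizing a long document up to the 8192-token context length
# reached in stage 2 (model ID assumed as in the usage sketch above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-modernbert-base")

long_text = "長い日本語の文書。" * 4000  # deliberately longer than 8192 tokens
encoded = tokenizer(long_text, truncation=True, max_length=8192, return_tensors="pt")
print(encoded.input_ids.shape)  # at most (1, 8192)
```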
@@ -77,14 +77,14 @@ For reference, [ModernBERT](https://arxiv.org/abs/2412.13663) uses 1.72T tokens
  ## Evaluation

  JSTS, JNLI, and JCoLA from [JGLUE](https://aclanthology.org/2022.lrec-1.317/) were used.
- Evaluation code can be found at https://github.com/speed1313/bert-eval
+ Evaluation code can be found at https://github.com/llm-jp/llm-jp-modernbert

  | Model | JSTS (pearson) | JNLI (accuracy) | JCoLA (accuracy) | Avg |
  |-------------------------------------------------------|--------|--------|---------|--------------|
  | tohoku-nlp/bert-base-japanese-v3 | 0.920 | 0.912 | 0.880 | 0.904 |
  | sbintuitions/modernbert-ja-130m | 0.916 | 0.927 | 0.868 | 0.904 |
  | sbintuitions/modernbert-ja-310m | **0.932** | **0.933** | **0.883** | **0.916** |
- | **speed/llm-jp-modernbert-base** | 0.918 | 0.913 | 0.844 | 0.892 |
+ | **llm-jp/llm-jp-modernbert-base** | 0.918 | 0.913 | 0.844 | 0.892 |

  ## LICENSE
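
The Avg column in the table above is consistent with an unweighted mean of the three per-task scores. A quick check, with the values copied from the llm-jp/llm-jp-modernbert-base row:

```python
# The Avg column matches the unweighted mean of the three task scores
# (values copied from the llm-jp/llm-jp-modernbert-base row above).
jsts_pearson = 0.918   # JSTS: Pearson correlation
jnli_acc = 0.913       # JNLI: accuracy
jcola_acc = 0.844      # JCoLA: accuracy
avg = (jsts_pearson + jnli_acc + jcola_acc) / 3
print(f"{avg:.3f}")    # -> 0.892, as reported in the Avg column
```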
 
@@ -92,4 +92,14 @@ Evaluation code can be found at https://github.com/speed1313/bert-eval

  ## Citation

- TODO:
+ ```
+ @misc{sugiura2025llmjpmodernbertmodernbertmodeltrained,
+       title={llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length},
+       author={Issa Sugiura and Kouta Nakayama and Yusuke Oda},
+       year={2025},
+       eprint={2504.15544},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2504.15544},
+ }
+ ```