speed committed · Commit 3a6d756 · verified · 1 Parent(s): bd4e27a

Update README.md

Files changed (1): README.md (+15 -5)
README.md CHANGED
@@ -10,7 +10,7 @@ language:
  This model is based on the [modernBERT-base](https://arxiv.org/abs/2412.13663) architecture with [llm-jp-tokenizer](https://github.com/llm-jp/llm-jp-tokenizer).
  It was trained using the Japanese subset (3.4TB) of the llm-jp-corpus v4 and supports a max sequence length of 8192.

- For detailed information on the training methods, evaluation, and analysis results, please visit at [TODO]()
+ For detailed information on the training methods, evaluation, and analysis results, please see [llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length](https://arxiv.org/abs/2504.15544).

  ## Usage
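
The `## Usage` section referenced in this hunk covers masked-token prediction. A minimal sketch of that workflow is shown below, assuming the repository ID `llm-jp/llm-jp-modernbert-base` from the evaluation table further down and the standard Transformers masked-LM API; it may differ in detail from the README's own example.

```python
# Minimal fill-mask sketch (assumed repository ID; details may differ from the README's example).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = f"日本の首都は{tokenizer.mask_token}です。"  # "The capital of Japan is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring token at the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_token = tokenizer.decode(logits[0, mask_pos].argmax(dim=-1))
print("Predicted token:", predicted_token)
```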
 
@@ -50,7 +50,7 @@ print("Predicted token:", predicted_token)

  This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.

- Training code can be found at https://github.com/llm-jp/bert-ja
+ Training code can be found at https://github.com/llm-jp/llm-jp-modernbert

  | Model | stage 1 | stage 2 |
  |:------------------ |----------------:|----------------:|
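
Because stage 2 extends max_seq_len to 8192, callers tokenizing long documents should pass an explicit length cap rather than relying on a shorter default. A small sketch under the same model-ID assumption as above:

```python
# Hedged sketch: tokenizing a long document up to the 8192-token context length
# reached in stage 2 (model ID assumed as in the usage sketch above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-modernbert-base")

long_text = "長い日本語の文書。" * 4000  # deliberately longer than 8192 tokens
encoded = tokenizer(long_text, truncation=True, max_length=8192, return_tensors="pt")
print(encoded.input_ids.shape)  # at most (1, 8192)
```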
@@ -77,14 +77,14 @@ For reference, [ModernBERT](https://arxiv.org/abs/2412.13663) uses 1.72T tokens
  ## Evaluation

  JSTS, JNLI, and JCoLA from [JGLUE](https://aclanthology.org/2022.lrec-1.317/) were used.
- Evaluation code can be found at https://github.com/speed1313/bert-eval
+ Evaluation code can be found at https://github.com/llm-jp/llm-jp-modernbert

  | Model | JSTS (pearson) | JNLI (accuracy) | JCoLA (accuracy) | Avg |
  |-------------------------------------------------------|--------|--------|---------|--------------|
  | tohoku-nlp/bert-base-japanese-v3 | 0.920 | 0.912 | 0.880 | 0.904 |
  | sbintuitions/modernbert-ja-130m | 0.916 | 0.927 | 0.868 | 0.904 |
  | sbintuitions/modernbert-ja-310m | **0.932** | **0.933** | **0.883** | **0.916** |
- | **speed/llm-jp-modernbert-base** | 0.918 | 0.913 | 0.844 | 0.892 |
+ | **llm-jp/llm-jp-modernbert-base** | 0.918 | 0.913 | 0.844 | 0.892 |

  ## LICENSE
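
The Avg column in the table above is consistent with an unweighted mean of the three per-task scores. A quick check, with the values copied from the llm-jp/llm-jp-modernbert-base row:

```python
# The Avg column matches the unweighted mean of the three task scores
# (values copied from the llm-jp/llm-jp-modernbert-base row above).
jsts_pearson = 0.918   # JSTS: Pearson correlation
jnli_acc = 0.913       # JNLI: accuracy
jcola_acc = 0.844      # JCoLA: accuracy
avg = (jsts_pearson + jnli_acc + jcola_acc) / 3
print(f"{avg:.3f}")    # -> 0.892, as reported in the Avg column
```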
 
@@ -92,4 +92,14 @@ Evaluation code can be found at https://github.com/speed1313/bert-eval

  ## Citation

- TODO:
+ ```
+ @misc{sugiura2025llmjpmodernbertmodernbertmodeltrained,
+       title={llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length},
+       author={Issa Sugiura and Kouta Nakayama and Yusuke Oda},
+       year={2025},
+       eprint={2504.15544},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2504.15544},
+ }
+ ```