Update README.md
README.md
CHANGED
@@ -52,7 +52,7 @@ All pre-training is done on the [Cultura-X](https://huggingface.co/datasets/uonl
 We extended the vocabulary of the base llama model from 32,000 tokens to 57,000 tokens by adding up to 25,000 non-overlapping tokens from the new language.
 
 ## Evaluation Results
-| sambanovasystems/SambaLingo-Russian-Base | IlyaGusev/saiga_mistral_7b_merged | ai-forever/rugpt3large_based_on_gpt2 | bigscience/bloom-7b1 | facebook/xglm-7.5B | ai-forever/mGPT-13B | |
+| | sambanovasystems/SambaLingo-Russian-Base | IlyaGusev/saiga_mistral_7b_merged | ai-forever/rugpt3large_based_on_gpt2 | bigscience/bloom-7b1 | facebook/xglm-7.5B | ai-forever/mGPT-13B | |
 |------------------------------------------|-----------------------------------|--------------------------------------|----------------------|--------------------|---------------------|--------|
 | Holdout Perplexity (Lower is better) | 1.444 | 1.556 | 1.611 | 1.797 | 1.504 | 1.806 |
 | FLORES en->ru (8 shot, CHRF) | 47.19% | 42.46% | 31.90% | 20.42% | 26.26% | 21.12% |
@@ -65,6 +65,7 @@ We extended the vocabulary of the base llama model from 32,000 tokens to 57,000
 | XStoryCloze (0 shot) | 71.67% | 68.96% | 60.75% | 52.68% | 63.40% | 59.43% |
 | XWinograd (0 shot) | 69.21% | 66.67% | 60.63% | 57.14% | 63.17% | 60.00% |
 
+
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 