sambanovasystems/SambaLingo-Russian-Base

@@ -52,7 +52,7 @@ All pre-training is done on the [Cultura-X](https://huggingface.co/datasets/uonl
 We extended the vocabulary of the base llama model from 32,000 tokens to 57,000 tokens by adding up to 25,000 non-overlapping tokens from the new language.
 ## Evaluation Results
-| sambanovasystems/SambaLingo-Russian-Base | IlyaGusev/saiga_mistral_7b_merged | ai-forever/rugpt3large_based_on_gpt2 | bigscience/bloom-7b1 | facebook/xglm-7.5B | ai-forever/mGPT-13B |        |
+| | sambanovasystems/SambaLingo-Russian-Base | IlyaGusev/saiga_mistral_7b_merged | ai-forever/rugpt3large_based_on_gpt2 | bigscience/bloom-7b1 | facebook/xglm-7.5B | ai-forever/mGPT-13B |        |
 |------------------------------------------|-----------------------------------|--------------------------------------|----------------------|--------------------|---------------------|--------|
 | Holdout Perplexity (Lower is better)     | 1.444                             | 1.556                                | 1.611                | 1.797              | 1.504               | 1.806  |
 | FLORES en->ru (8 shot, CHRF)             | 47.19%                            | 42.46%                               | 31.90%               | 20.42%             | 26.26%              | 21.12% |
@@ -65,6 +65,7 @@ We extended the vocabulary of the base llama model from 32,000 tokens to 57,000
 | XStoryCloze (0 shot)                     | 71.67%                            | 68.96%                               | 60.75%               | 52.68%             | 63.40%              | 59.43% |
 | XWinograd (0 shot)                       | 69.21%                            | 66.67%                               | 60.63%               | 57.14%             | 63.17%              | 60.00% |
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
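
The context line in this diff describes extending the base vocabulary from 32,000 to 57,000 tokens by adding up to 25,000 non-overlapping tokens. The model card does not say how those tokens were chosen or merged, so the following is only a minimal sketch of the "non-overlapping, capped" idea in plain Python. The function name `extend_vocab`, its parameters, and the toy token lists are hypothetical and are not taken from the SambaLingo code.

```python
def extend_vocab(base_vocab, candidate_tokens, max_new=25_000):
    """Return base_vocab followed by up to max_new candidates not already present.

    Hypothetical helper for illustration only; not the authors' procedure.
    """
    seen = set(base_vocab)
    extended = list(base_vocab)
    added = 0
    for tok in candidate_tokens:
        if added >= max_new:
            break  # cap on the number of new tokens reached
        if tok in seen:
            continue  # overlapping token: already in the vocabulary
        seen.add(tok)
        extended.append(tok)
        added += 1
    return extended


if __name__ == "__main__":
    # Toy data (assumed): three base tokens, five candidates with overlaps.
    base = ["the", "and", "a"]
    candidates = ["при", "the", "вет", "при", "мир"]
    print(extend_vocab(base, candidates, max_new=2))
```

With a 32,000-token base and at least 25,000 distinct new candidates, this sketch yields 57,000 entries, which matches the sizes quoted above. In a real setup the embedding matrix would also have to be resized to the new vocabulary size.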