This is a MicroBERT model for Indonesian.

  • Its suffix is -m, meaning it was pretrained using supervision from masked language modeling only.
  • The unlabeled Indonesian data was taken from a June 2022 dump of Indonesian Wikipedia, downsampled to 1,439,772 tokens.
  • The UD treebank UD_Indonesian-GSD, v2.10, totaling 122,021 tokens, was used for labeled data.
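As a rough illustration of the masked language modeling objective mentioned above, the sketch below builds BERT-style MLM inputs: about 15% of positions are selected, and of those, 80% are replaced with a `[MASK]` token, 10% with a random token, and 10% left unchanged. The token ids, mask rate, and vocabulary size here are illustrative placeholders, not this model's actual configuration.

```python
import random

MASK_ID = 4          # hypothetical [MASK] token id
VOCAB_SIZE = 30_000  # hypothetical vocabulary size

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Return (inputs, labels): ~mask_prob of positions are selected for
    prediction; unselected positions get label -100 (ignored by the loss)."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)           # model must predict the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK_ID)   # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                inputs.append(tok)       # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(-100)
    return inputs, labels

inputs, labels = mask_tokens(list(range(100, 200)))
```

During pretraining, the model sees `inputs` and is trained to recover the original tokens at the positions where `labels` is not -100.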

Please see the repository and the paper for more details.
