---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:967831
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
widget:
- source_sentence: 'Penghasilan rata-rata pelaku usaha mandiri: Analisis berdasarkan
    lokasi dan jenjang pendidikan, 2023'
  sentences:
  - Rata-rata Pendapatan bersih Berusaha Sendiri menurut Provinsi dan Pendidikan yang
    Ditamatkan, 2023
  - Rata-Rata Pengeluaran per Kapita Sebulan Menurut Kelompok Barang (rupiah), 2013-2021
  - Ringkasan Neraca Arus Dana, Triwulan III, 2006, (Miliar Rupiah)
- source_sentence: Bagaimana traffic penerbangan internasional di Indonesia pada 2008?
  sentences:
  - Tingkat Inflasi Harga Konsumen Nasional Bulanan (M-to-M) 1 (2022=100)
  - Balita (0-59 Bulan) Menurut Status Gizi, Tahun 1998-2005
  - Lalu Lintas Penerbangan Luar Negeri Indonesia Tahun 2003-2022
- source_sentence: Data indeks daya penyebaran dan derajat kepekaan sektor ekonomi,
    ambil contoh tahun 2005
  sentences:
  - Indeks Daya Penyebaran dan Indeks Derajat Kepekaan Menurut Sektor Ekonomi, 1995,
    2000, 2005, dan 2010
  - Ekspor Kopi Menurut Negara Tujuan Utama, 2000-2023
  - Anggaran Kesehatan dari Direktorat Penyusunan APBN - Direktorat Jenderal Anggaran,
    Kementerian Keuangan
- source_sentence: Data aktivitas penduduk 15 tahun ke atas berdasarkan kelompok umur,
    satu minggu ke belakang (periode 2002)
  sentences:
  - Ekspor Lada Putih menurut Negara Tujuan Utama, 2012-2023
  - Rata-rata Konsumsi dan Pengeluaran Perkapita Seminggu Menurut Komoditi Makanan
    dan Golongan Pengeluaran per Kapita Seminggu di Provinsi Sulawesi Selatan, 2018-2023
  - Penduduk Berumur 15 Tahun Ke Atas Menurut Golongan Umur dan Jenis Kegiatan Selama
    Seminggu yang Lalu, 1997 - 2007
- source_sentence: Laporan singkat arus kas Q2 2005, dalam miliar
  sentences:
  - Ringkasan Neraca Arus Dana, Triwulan Kedua, 2005, (Miliar Rupiah)
  - Indikator Pendidikan, 1994-2023
  - Rata-rata Upah/Gaji Bersih sebulan Buruh/Karyawan Pegawai Menurut Pendidikan Tertinggi
    dan Jumlah Jam Kerja Utama, 2020
datasets:
- yahyaabd/statictable-triplets-all
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@1
- cosine_ndcg@5
- cosine_ndcg@10
- cosine_mrr@1
- cosine_mrr@5
- cosine_mrr@10
- cosine_map@1
- cosine_map@5
- cosine_map@10
model-index:
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: bps statictable ir
      type: bps-statictable-ir
    metrics:
    - type: cosine_accuracy@1
      value: 0.8990228013029316
      name: Cosine Accuracy@1
    - type: cosine_accuracy@5
      value: 0.9837133550488599
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8990228013029316
      name: Cosine Precision@1
    - type: cosine_precision@5
      value: 0.21889250814332245
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.12605863192182412
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7029638149674847
      name: Cosine Recall@1
    - type: cosine_recall@5
      value: 0.789022126091837
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8116078533769628
      name: Cosine Recall@10
    - type: cosine_ndcg@1
      value: 0.8990228013029316
      name: Cosine Ndcg@1
    - type: cosine_ndcg@5
      value: 0.8178579787978988
      name: Cosine Ndcg@5
    - type: cosine_ndcg@10
      value: 0.8156444177517035
      name: Cosine Ndcg@10
    - type: cosine_mrr@1
      value: 0.8990228013029316
      name: Cosine Mrr@1
    - type: cosine_mrr@5
      value: 0.9347991313789358
      name: Cosine Mrr@5
    - type: cosine_mrr@10
      value: 0.9368827878599865
      name: Cosine Mrr@10
    - type: cosine_map@1
      value: 0.8990228013029316
      name: Cosine Map@1
    - type: cosine_map@5
      value: 0.772128121606949
      name: Cosine Map@5
    - type: cosine_map@10
      value: 0.7635855701310564
      name: Cosine Map@10
---
# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) on the [statictable-triplets-all](https://huggingface.co/datasets/yahyaabd/statictable-triplets-all) dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [statictable-triplets-all](https://huggingface.co/datasets/yahyaabd/statictable-triplets-all)
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
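
The `Pooling` module above has `pooling_mode_mean_tokens: True`, so each sentence vector is the average of the token embeddings produced by `BertModel`, with padding positions excluded through the attention mask. The pure-Python sketch below illustrates that masked averaging on toy numbers. The helper name `mean_pool`, the 2-dimensional vectors, and the fallback for an all-padding input are illustrative assumptions, not the library's code (the real module operates on 384-dimensional PyTorch tensors).

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1.

    Hypothetical helper illustrating masked mean pooling: padding tokens
    (mask == 0) do not contribute to the sentence embedding.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            for j in range(dim):
                summed[j] += vector[j]
            count += 1
    # Avoid division by zero for an all-padding input (assumed fallback).
    count = max(count, 1)
    return [value / count for value in summed]


# Two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The large padding vector has no effect on the result, which is why the attention mask is passed to the pooling step.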
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("yahyaabd/paraphrase-multilingual-miniLM-L12-v2-mnrl-beir-2")
# Run inference
sentences = [
    'Laporan singkat arus kas Q2 2005, dalam miliar',
    'Ringkasan Neraca Arus Dana, Triwulan Kedua, 2005, (Miliar Rupiah)',
    'Rata-rata Upah/Gaji Bersih sebulan Buruh/Karyawan Pegawai Menurut Pendidikan Tertinggi dan Jumlah Jam Kerja Utama, 2020',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
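
Because the model's similarity function is cosine similarity, `model.similarity(embeddings, embeddings)` returns a matrix of cosine scores, and finding the statistical table that matches a query amounts to sorting candidate titles by that score. The sketch below shows only the ranking step on small hand-made vectors; in real use the vectors would come from `model.encode(...)`. The helper names `cosine` and `top_k` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query: Sequence[float],
          corpus: Sequence[Sequence[float]],
          k: int) -> List[Tuple[int, float]]:
    """Return (corpus_index, score) pairs for the k most similar corpus vectors."""
    scored = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for encoded table titles (real embeddings are 384-dimensional).
query_vec = [1.0, 0.0, 1.0]
corpus_vecs = [
    [0.0, 1.0, 0.0],  # unrelated title
    [1.0, 0.1, 0.9],  # near-paraphrase of the query
    [0.5, 0.5, 0.0],  # partial match
]
print(top_k(query_vec, corpus_vecs, k=2))
```

For a large corpus of table titles, the same idea applies with batched matrix operations instead of a Python loop.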
## Evaluation
### Metrics
#### Information Retrieval
* Dataset: `bps-statictable-ir`
* Evaluated with [`InformationRetrievalEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.899 |
| cosine_accuracy@5 | 0.9837 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.899 |
| cosine_precision@5 | 0.2189 |
| cosine_precision@10 | 0.1261 |
| cosine_recall@1 | 0.703 |
| cosine_recall@5 | 0.789 |
| cosine_recall@10 | 0.8116 |
| cosine_ndcg@1 | 0.899 |
| cosine_ndcg@5 | 0.8179 |
| **cosine_ndcg@10** | **0.8156** |
| cosine_mrr@1 | 0.899 |
| cosine_mrr@5 | 0.9348 |
| cosine_mrr@10 | 0.9369 |
| cosine_map@1 | 0.899 |
| cosine_map@5 | 0.7721 |
| cosine_map@10 | 0.7636 |
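
The table mixes metrics that reward different things. The gap between accuracy@1 (0.899) and recall@1 (0.703) indicates that at least some queries have more than one relevant table: accuracy@k only requires one relevant hit in the top k, while recall@k counts how many of the relevant tables were found. The sketch below computes the per-query versions with binary relevance, following the usual definitions as we understand `InformationRetrievalEvaluator` to apply them; the function name `ir_metrics_at_k` and the toy ranking are hypothetical. Averaging each value over all queries gives scores comparable to those reported above.

```python
import math
from typing import Dict, Sequence, Set


def ir_metrics_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> Dict[str, float]:
    """Per-query retrieval metrics at cutoff k with binary relevance (sketch)."""
    top = list(ranked_ids[:k])
    hits = [1 if doc_id in relevant else 0 for doc_id in top]
    num_correct = sum(hits)

    # accuracy@k: 1 if at least one relevant document appears in the top k.
    accuracy = 1.0 if num_correct > 0 else 0.0
    # precision@k divides by k; recall@k divides by the number of relevant documents.
    precision = num_correct / k
    recall = num_correct / len(relevant)

    # MRR@k: reciprocal rank of the first relevant hit (0 if none in the top k).
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # NDCG@k: discounted gain of this ranking divided by that of an ideal ranking.
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    ideal_hits = [1] * min(len(relevant), k)
    idcg = sum(h / math.log2(i + 2) for i, h in enumerate(ideal_hits))
    ndcg = dcg / idcg

    # MAP@k: sum of precision at each relevant position, normalised by min(k, |relevant|).
    correct_so_far = 0
    precision_sum = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            correct_so_far += 1
            precision_sum += correct_so_far / rank
    average_precision = precision_sum / min(k, len(relevant))

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mrr": mrr, "ndcg": ndcg, "map": average_precision}


# One query with two relevant tables; the first-ranked result is a miss.
ranking = ["t3", "t1", "t7", "t2", "t9"]
print(ir_metrics_at_k(ranking, relevant={"t1", "t2"}, k=5))
```

In this toy case both relevant tables are retrieved (recall@5 = 1.0), yet precision@5 is only 0.4 because it is divided by k. The same effect keeps the reported precision@5 and precision@10 low even though accuracy@10 is 1.0.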
## Training Details
### Training Dataset
#### statictable-triplets-all
* Dataset: [statictable-triplets-all](https://huggingface.co/datasets/yahyaabd/statictable-triplets-all) at [24979b4](https://huggingface.co/datasets/yahyaabd/statictable-triplets-all/tree/24979b4f0d8269377aca975e20d52e69c3b5a030)
* Size: 967,831 training samples
* Columns: `query`, `pos`, and `neg`
* Approximate statistics based on the first 1000 samples:
  |      | query  | pos    | neg    |
  |:-----|:-------|:-------|:-------|
  | type | string | string | string |
* Samples:
  | query | pos | neg |
  |:------|:----|:----|
  | Indeks harga petani (diterima & dibayar) dan NTP per provinsi, 2012 | Indeks Harga yang Diterima Petani (It), Indeks Harga yang Dibayar Petani (Ib), dan Nilai Tukar Petani (NTP) Menurut Provinsi, 2008-2016 | Persentase Rumah Tangga Menurut Provinsi dan KebiasaanMemanfaatkan Air Bekas untuk Keperluan Lain, 2013, 2014, 2017, 2021 |
  | Data rumah tangga perikanan budidaya Indonesia, detail per provinsi dan jenis budidaya, di tahun 2008 | Jumlah Rumah Tangga Perikanan Budidaya Menurut Provinsi dan Jenis Budidaya, 2000-2016 | Ringkasan Neraca Arus Dana, 2005, (Miliar Rupiah) |
  | Lapangan pekerjaan vs pendidikan pekerja (15 tahun ke atas), 1986 hingga 1996 | Penduduk Berumur 15 Tahun Ke Atas yang Bekerja Selama Seminggu yang Lalu Menurut Lapangan Pekerjaan Utama dan Pendidikan Tertinggi yang Ditamatkan, 1986 -1996 | Tabel Input-Output Indonesia Transaksi Domestik Atas Dasar Harga Produsen (17 Lapangan Usaha), 2016 (Juta Rupiah) |
* Loss: [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
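
With `(query, pos, neg)` triplets, `MultipleNegativesRankingLoss` treats each row as a classification problem. For every query in a batch, the candidates are all `pos` texts plus all `neg` texts in that batch; the scores are cosine similarities multiplied by `scale` (20.0 here), and the loss is cross-entropy with the query's own `pos` as the correct class. The pure-Python sketch below reproduces that standard formulation on precomputed embeddings. The function names and toy vectors are assumptions for illustration, not the library's PyTorch implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cos_sim(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnrl_loss(queries: Sequence[Vector], positives: Sequence[Vector],
              negatives: Sequence[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy where query i must pick positives[i] among all batch candidates."""
    # Candidate pool shared by every query: all in-batch positives, then all hard negatives.
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the correct candidate (index i).
        total += log_norm - logits[i]
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]    # aligned with the queries
negatives = [[-1.0, 0.0], [0.0, -1.0]]  # hard negatives point the other way
print(mnrl_loss(queries, positives, negatives))        # close to 0
print(mnrl_loss(queries, positives[::-1], negatives))  # about 20: positives swapped
```

The `scale` factor sharpens the softmax: with `scale=0.0` every candidate gets equal probability and the loss becomes `log(number of candidates)`.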
### Evaluation Dataset
#### statictable-triplets-all
* Dataset: [statictable-triplets-all](https://huggingface.co/datasets/yahyaabd/statictable-triplets-all) at [24979b4](https://huggingface.co/datasets/yahyaabd/statictable-triplets-all/tree/24979b4f0d8269377aca975e20d52e69c3b5a030)
* Size: 967,831 evaluation samples
* Columns: `query`, `pos`, and `neg`
* Approximate statistics based on the first 1000 samples:
  |      | query  | pos    | neg    |
  |:-----|:-------|:-------|:-------|
  | type | string | string | string |
* Samples:
  | query | pos | neg |
  |:------|:----|:----|
  | Bagaimana hubungan IHK dan rata-rata upah buruh industri (bukan supervisor) bulanan tahun 2010, acuan 1996? | IHK dan Rata-rata Upah per Bulan Buruh Industri di Bawah Mandor (Supervisor), 1996-2014 (1996=100) | Rata-rata Harga Valuta Asing Terpilih menurut Provinsi, 2014 |
  | Berapa rata-rata gaji bulanan pekerja Indonesia berdasarkan ijazah terakhir dan sektor pekerjaannya (2017)? | Rata-rata Upah/Gaji Bersih Sebulan Buruh/Karyawan/Pegawai Menurut Pendidikan Tertinggi yang Ditamatkan dan Lapangan Pekerjaan Utama di 9 Sektor (rupiah), 2017 | Rata-Rata Pengeluaran per Kapita Sebulan Menurut Kelompok Barang (rupiah), 2013-2021 |
  | Data luas lahan (hektar) yang dipakai untuk jenis budidaya perikanan di tiap provinsi tahun 2009 | Luas Area Usaha Budidaya Perikanan Menurut Provinsi dan Jenis Budidaya (ha), 2005-2016 | Ringkasan Neraca Arus Dana, Triwulan I, 2008, (Miliar Rupiah) |
* Loss: [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `weight_decay`: 0.01
- `num_train_epochs`: 2
- `lr_scheduler_type`: reduce_lr_on_plateau
- `lr_scheduler_kwargs`: {'factor': 0.5, 'patience': 2}
- `warmup_steps`: 10000
- `save_on_each_node`: True
- `fp16`: True
- `dataloader_num_workers`: 2
- `load_best_model_at_end`: True
- `eval_on_start`: True
- `batch_sampler`: no_duplicates
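
The `no_duplicates` batch sampler matters for this loss. Every other row's `pos` and `neg` act as negatives for a query, so if the same text appears twice in one batch, the query is pushed away from a text that is actually its match. The sampler therefore avoids placing duplicate texts in the same batch. Below is a simplified, hypothetical sketch of that idea: a row is deferred to a later batch when any of its texts is already present. It is not the library's `NoDuplicatesBatchSampler`, which also handles details such as shuffling and dropping incomplete batches. Here, incomplete final batches are kept as a simplifying assumption.

```python
from typing import List, Sequence, Set, Tuple

Row = Tuple[str, str, str]  # (query, pos, neg)


def no_duplicate_batches(rows: Sequence[Row], batch_size: int) -> List[List[int]]:
    """Greedily group row indices so that no text appears twice within a batch."""
    remaining = list(range(len(rows)))
    batches: List[List[int]] = []
    while remaining:
        batch: List[int] = []
        seen: Set[str] = set()
        deferred: List[int] = []
        for idx in remaining:
            texts = set(rows[idx])
            # Accept the row only if the batch has room and none of its texts are already used.
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)  # always non-empty: the first remaining row is accepted
        remaining = deferred
    return batches


rows = [
    ("q1", "Ringkasan Neraca Arus Dana, 2005, (Miliar Rupiah)", "neg-a"),
    ("q2", "Ringkasan Neraca Arus Dana, 2005, (Miliar Rupiah)", "neg-b"),  # shares pos with row 0
    ("q3", "pos-c", "neg-c"),
]
print(no_duplicate_batches(rows, batch_size=2))  # [[0, 2], [1]]
```

Row 1 is moved to a separate batch, so row 0's positive is never used as an in-batch negative for row 1's query.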
#### All Hyperparameters