Regulatory Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hshashank06/regulatory-model")
# Run inference
sentences = [
    ' Home Depot\'s stock closed at $135.39 while being above a "golden cross" on January 19, 2017.',
    ' In the given text passage, when did Home Depot\'s stock close at $135.39 while being above a "golden cross"? \n',
    ' According to Maley, where might the funds from potentially declining sectors like FANGs be directed towards? \n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
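
Because the model was trained with MatryoshkaLoss at output sizes 768, 512, 256, 128, and 64, the embeddings can also be truncated to any of these sizes with only a modest drop in retrieval quality (see the evaluation table below). A minimal sketch using the truncate_dim argument available in recent Sentence Transformers releases; the example sentences are illustrative:

from sentence_transformers import SentenceTransformer

# Load the same model, but truncate every embedding to 256 dimensions
model = SentenceTransformer("hshashank06/regulatory-model", truncate_dim=256)

sentences = [
    "What is the BVPS and how is it calculated?",
    "BVPS is a company's common equity divided by its shares outstanding.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [2, 256]

# Cosine similarities still work on the truncated embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [2, 2]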

Evaluation

Metrics

Information Retrieval

Metric dim_768 dim_512 dim_256 dim_128 dim_64
cosine_accuracy@1 0.6027 0.5989 0.5881 0.5719 0.5443
cosine_accuracy@3 0.7349 0.7302 0.7223 0.7058 0.6759
cosine_accuracy@5 0.7676 0.7651 0.7568 0.7406 0.7139
cosine_accuracy@10 0.8058 0.8034 0.7947 0.7819 0.7583
cosine_precision@1 0.6027 0.5989 0.5881 0.5719 0.5443
cosine_precision@3 0.245 0.2434 0.2408 0.2353 0.2253
cosine_precision@5 0.1535 0.153 0.1514 0.1481 0.1428
cosine_precision@10 0.0806 0.0803 0.0795 0.0782 0.0758
cosine_recall@1 0.6027 0.5989 0.5881 0.5719 0.5443
cosine_recall@3 0.7349 0.7302 0.7223 0.7058 0.6759
cosine_recall@5 0.7676 0.7651 0.7568 0.7406 0.7139
cosine_recall@10 0.8058 0.8034 0.7947 0.7819 0.7583
cosine_ndcg@10 0.7073 0.704 0.6945 0.6793 0.6523
cosine_mrr@10 0.6755 0.6719 0.6622 0.6463 0.6182
cosine_map@100 0.6797 0.6761 0.6666 0.6508 0.6229
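
The per-dimension figures above are the kind of numbers produced by running an InformationRetrievalEvaluator once per Matryoshka dimension. A hedged sketch of that setup, assuming the truncate_dim option of recent Sentence Transformers releases; the tiny query/corpus below is illustrative, not the actual evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator, SequentialEvaluator

model = SentenceTransformer("hshashank06/regulatory-model")

# Illustrative data: queries, corpus, and the relevant corpus ids per query
queries = {"q1": "What is the BVPS and how is it calculated?"}
corpus = {"d1": "BVPS is a company's common equity divided by its shares outstanding."}
relevant_docs = {"q1": {"d1"}}

# One evaluator per Matryoshka dimension, each truncating the embeddings
evaluators = [
    InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        truncate_dim=dim,
        name=f"dim_{dim}",
    )
    for dim in [768, 512, 256, 128, 64]
]
results = SequentialEvaluator(evaluators)(model)
print(results)  # keys such as "dim_768_cosine_ndcg@10", "dim_64_cosine_ndcg@10", ...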

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 185,814 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min: 3 tokens, mean: 43.18 tokens, max: 200 tokens
    • anchor: string; min: 10 tokens, mean: 23.08 tokens, max: 63 tokens
  • Samples:
    • positive: The BVPS (Book Value Per Share) is calculated by dividing a company's common equity value by its total number of shares outstanding. In the given example, if a company has a common equity value of $100 million and 10 million shares outstanding, its BVPS would be $10 ($100 million / 10 million). You can calculate a company's BVPS using Microsoft Excel by entering the values of common stock, retained earnings, and additional paid-in capital into cells A1 through A3.
      anchor: What is the BVPS and how is it calculated?
    • positive: They facilitate commodities trading using their resources, can take delivery of commodities if needed, provide advisory services for clients, and act as market makers by buying and selling futures contracts to add liquidity to the marketplace. The passage uses the example of a commercial baking firm to demonstrate how their impact can be seen in the market.
      anchor: What role do eligible commercial entities play in commodities trading and market liquidity?
    • positive: Naive diversification is a type of diversification strategy where an investor randomly selects different securities, hoping to lower the risk of the portfolio due to the varied nature of the chosen securities. It is less sophisticated than diversification methods using statistical modeling, but when guided by experience, careful security examination, and common sense, it remains an effective strategy for reducing portfolio risk.
      anchor: What is the concept of naive diversification in investing and how does it compare to more sophisticated diversification methods?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
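
Put together, the loss configuration above amounts to wrapping an in-batch-negatives ranking loss in MatryoshkaLoss. A minimal sketch of that construction, with the base model and dimensions taken from this card:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# In-batch negatives over (anchor, positive) pairs
inner_loss = MultipleNegativesRankingLoss(model)

# The same ranking loss applied at every Matryoshka dimension, equally weighted
loss = MatryoshkaLoss(
    model,
    loss=inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)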
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
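
For reference, these non-default values map onto SentenceTransformerTrainingArguments roughly as sketched below; output_dir and save_strategy are assumptions, since neither is listed in the card:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="regulatory-model",            # assumption: not stated in the card
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",                    # assumption: required for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)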

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0276 10 43.573 - - - - -
0.0551 20 42.1758 - - - - -
0.0827 30 37.6368 - - - - -
0.1102 40 34.5743 - - - - -
0.1378 50 29.5956 - - - - -
0.1653 60 23.4468 - - - - -
0.1929 70 19.7425 - - - - -
0.2204 80 16.9744 - - - - -
0.2480 90 15.2437 - - - - -
0.2755 100 13.9444 - - - - -
0.3031 110 12.067 - - - - -
0.3306 120 11.1149 - - - - -
0.3582 130 10.4083 - - - - -
0.3857 140 8.915 - - - - -
0.4133 150 9.4964 - - - - -
0.4408 160 8.0434 - - - - -
0.4684 170 8.1963 - - - - -
0.4960 180 8.5704 - - - - -
0.5235 190 7.711 - - - - -
0.5511 200 7.6676 - - - - -
0.5786 210 6.9899 - - - - -
0.6062 220 7.6195 - - - - -
0.6337 230 7.0456 - - - - -
0.6613 240 7.5541 - - - - -
0.6888 250 6.6543 - - - - -
0.7164 260 6.8849 - - - - -
0.7439 270 7.6635 - - - - -
0.7715 280 7.2155 - - - - -
0.7990 290 6.3284 - - - - -
0.8266 300 6.577 - - - - -
0.8541 310 5.0835 - - - - -
0.8817 320 6.1866 - - - - -
0.9092 330 5.9467 - - - - -
0.9368 340 5.663 - - - - -
0.9644 350 5.417 - - - - -
0.9919 360 6.0331 - - - - -
0.9974 362 - 0.6940 0.6900 0.6791 0.6603 0.6273
1.0220 370 5.5374 - - - - -
1.0496 380 4.5917 - - - - -
1.0771 390 4.6483 - - - - -
1.1047 400 4.96 - - - - -
1.1323 410 4.6808 - - - - -
1.1598 420 5.2396 - - - - -
1.1874 430 4.651 - - - - -
1.2149 440 4.4875 - - - - -
1.2425 450 4.6877 - - - - -
1.2700 460 4.2209 - - - - -
1.2976 470 4.678 - - - - -
1.3251 480 4.6774 - - - - -
1.3527 490 4.4409 - - - - -
1.3802 500 4.4464 - - - - -
1.4078 510 4.2724 - - - - -
1.4353 520 4.5017 - - - - -
1.4629 530 4.3469 - - - - -
1.4904 540 4.4925 - - - - -
1.5180 550 3.922 - - - - -
1.5455 560 4.6949 - - - - -
1.5731 570 4.0364 - - - - -
1.6007 580 4.3846 - - - - -
1.6282 590 3.7526 - - - - -
1.6558 600 4.0508 - - - - -
1.6833 610 4.6315 - - - - -
1.7109 620 3.7683 - - - - -
1.7384 630 4.6994 - - - - -
1.7660 640 4.1994 - - - - -
1.7935 650 4.3915 - - - - -
1.8211 660 4.2947 - - - - -
1.8486 670 4.6972 - - - - -
1.8762 680 4.1664 - - - - -
1.9037 690 4.1861 - - - - -
1.9313 700 3.6879 - - - - -
1.9588 710 4.3767 - - - - -
1.9864 720 4.48 - - - - -
1.9974 724 - 0.7013 0.6971 0.6885 0.6716 0.6414
2.0165 730 3.6164 - - - - -
2.0441 740 3.3361 - - - - -
2.0716 750 3.4175 - - - - -
2.0992 760 3.9006 - - - - -
2.1267 770 3.0823 - - - - -
2.1543 780 3.029 - - - - -
2.1818 790 3.8081 - - - - -
2.2094 800 3.4486 - - - - -
2.2370 810 3.6064 - - - - -
2.2645 820 3.0896 - - - - -
2.2921 830 3.3233 - - - - -
2.3196 840 2.9528 - - - - -
2.3472 850 3.0482 - - - - -
2.3747 860 3.2795 - - - - -
2.4023 870 2.9218 - - - - -
2.4298 880 3.4518 - - - - -
2.4574 890 3.6095 - - - - -
2.4849 900 3.2002 - - - - -
2.5125 910 3.368 - - - - -
2.5400 920 3.0623 - - - - -
2.5676 930 3.3495 - - - - -
2.5951 940 3.7123 - - - - -
2.6227 950 3.7795 - - - - -
2.6502 960 3.5567 - - - - -
2.6778 970 3.3498 - - - - -
2.7054 980 3.3141 - - - - -
2.7329 990 2.9425 - - - - -
2.7605 1000 2.9978 - - - - -
2.7880 1010 3.2468 - - - - -
2.8156 1020 2.5252 - - - - -
2.8431 1030 3.3108 - - - - -
2.8707 1040 3.195 - - - - -
2.8982 1050 3.1019 - - - - -
2.9258 1060 3.7059 - - - - -
2.9533 1070 3.1952 - - - - -
2.9809 1080 3.2454 - - - - -
2.9974 1086 - 0.7056 0.7030 0.6939 0.6779 0.6505
3.0110 1090 3.3788 - - - - -
3.0386 1100 2.9617 - - - - -
3.0661 1110 3.4313 - - - - -
3.0937 1120 2.5883 - - - - -
3.1212 1130 2.8836 - - - - -
3.1488 1140 2.3895 - - - - -
3.1763 1150 2.5155 - - - - -
3.2039 1160 3.3168 - - - - -
3.2314 1170 3.0286 - - - - -
3.2590 1180 3.1494 - - - - -
3.2866 1190 2.87 - - - - -
3.3141 1200 2.591 - - - - -
3.3417 1210 2.8437 - - - - -
3.3692 1220 3.0344 - - - - -
3.3968 1230 3.0685 - - - - -
3.4243 1240 3.4623 - - - - -
3.4519 1250 3.4256 - - - - -
3.4794 1260 2.7349 - - - - -
3.5070 1270 2.8587 - - - - -
3.5345 1280 2.729 - - - - -
3.5621 1290 3.0288 - - - - -
3.5896 1300 2.6599 - - - - -
3.6172 1310 2.4755 - - - - -
3.6447 1320 3.0501 - - - - -
3.6723 1330 2.545 - - - - -
3.6998 1340 2.5919 - - - - -
3.7274 1350 2.9026 - - - - -
3.7550 1360 2.7362 - - - - -
3.7825 1370 3.3311 - - - - -
3.8101 1380 2.8415 - - - - -
3.8376 1390 3.2033 - - - - -
3.8652 1400 2.7483 - - - - -
3.8927 1410 3.0403 - - - - -
3.9203 1420 3.0724 - - - - -
3.9478 1430 2.9797 - - - - -
3.9754 1440 2.6779 - - - - -
3.9974 1448 - 0.7073 0.704 0.6945 0.6793 0.6523
  • The saved checkpoint corresponds to the final row (epoch 3.9974, step 1448), whose metrics match the evaluation table above.

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.4.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}