ModernBERT-base trained on GooAQ

This is a Cross Encoder model finetuned from answerdotai/ModernBERT-base using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Number of Output Labels: 1 label
  • Language: en
  • License: apache-2.0

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce-0margin-3min-100max-5top")
# Get scores for pairs of texts
pairs = [
    ['what is baking powder bicarbonate soda?', 'Baking soda and bicarbonate of soda are actually different names for the same thing. ... Both bicarbonate of soda and baking powder are leavening (raising) agents. When included in a batter, the leavening agent creates air bubbles that expand when cooked, and cause it to rise.'],
    ['what is baking powder bicarbonate soda?', "What is baking soda? Baking soda is a leavening agent used in baked goods like cakes, muffins, and cookies. Formally known as sodium bicarbonate, it's a white crystalline powder that is naturally alkaline, or basic (1). Baking soda becomes activated when it's combined with both an acidic ingredient and a liquid."],
    ['what is baking powder bicarbonate soda?', 'The chemical name for baking powder is sodium hydrogencarbonate. You may see it called bicarbonate of soda in the supermarket. This is the old name for the same stuff. It has the chemical formula NaHCO3.'],
    ['what is baking powder bicarbonate soda?', "Substituting baking soda for baking powder What's more, baking soda has much stronger leavening power than baking powder. As a rule of thumb, about 1 teaspoon of baking powder is equivalent to 1/4 teaspoon of baking soda."],
    ['what is baking powder bicarbonate soda?', "Baking soda is a leavening agent used in baked goods like cakes, muffins, and cookies. Formally known as sodium bicarbonate, it's a white crystalline powder that is naturally alkaline, or basic (1). Baking soda becomes activated when it's combined with both an acidic ingredient and a liquid."],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what is baking powder bicarbonate soda?',
    [
        'Baking soda and bicarbonate of soda are actually different names for the same thing. ... Both bicarbonate of soda and baking powder are leavening (raising) agents. When included in a batter, the leavening agent creates air bubbles that expand when cooked, and cause it to rise.',
        "What is baking soda? Baking soda is a leavening agent used in baked goods like cakes, muffins, and cookies. Formally known as sodium bicarbonate, it's a white crystalline powder that is naturally alkaline, or basic (1). Baking soda becomes activated when it's combined with both an acidic ingredient and a liquid.",
        'The chemical name for baking powder is sodium hydrogencarbonate. You may see it called bicarbonate of soda in the supermarket. This is the old name for the same stuff. It has the chemical formula NaHCO3.',
        "Substituting baking soda for baking powder What's more, baking soda has much stronger leavening power than baking powder. As a rule of thumb, about 1 teaspoon of baking powder is equivalent to 1/4 teaspoon of baking soda.",
        "Baking soda is a leavening agent used in baked goods like cakes, muffins, and cookies. Formally known as sodium bicarbonate, it's a white crystalline powder that is naturally alkaline, or basic (1). Baking soda becomes activated when it's combined with both an acidic ingredient and a liquid.",
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
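
The list returned by model.rank has one entry per candidate text, each carrying a corpus_id and a score. As a rough illustration of that shape, the sketch below sorts a list of scores from highest to lowest in plain Python. The helper name rank_by_score and the example scores are hypothetical and chosen only for illustration; they are not real model outputs.

```python
from typing import Dict, List, Sequence, Union


def rank_by_score(scores: Sequence[float]) -> List[Dict[str, Union[int, float]]]:
    """Return corpus indices sorted by descending score (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Made-up scores for five candidate answers, not real model outputs.
example_scores = [0.91, 0.12, 0.40, 0.05, 0.33]
ranking = rank_by_score(example_scores)
print(ranking[0])
```

Here corpus_id is the position of the text in the input list, so the original passage can be looked up again after sorting.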

Evaluation

Metrics

Cross Encoder Reranking

  • Dataset: gooaq-dev (the ndcg@10 value matches the gooaq-dev_ndcg@10 column in the training logs)

Metric    Value
map       0.7234 (+0.1923)
mrr@10    0.7223 (+0.1984)
ndcg@10   0.7676 (+0.1764)

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric    NanoMSMARCO_R100    NanoNFCorpus_R100    NanoNQ_R100
map       0.4711 (-0.0185)    0.3601 (+0.0992)     0.6047 (+0.1851)
mrr@10    0.4565 (-0.0210)    0.5969 (+0.0971)     0.6064 (+0.1797)
ndcg@10   0.5342 (-0.0062)    0.4250 (+0.0999)     0.6652 (+0.1646)
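
For readers unfamiliar with the metrics in these tables, the sketch below computes them for a single query from a ranked list of binary relevance labels (1 = relevant, 0 = not relevant). It uses the standard textbook definitions and is not taken from CrossEncoderRerankingEvaluator, whose implementation may differ in details. The function names and the example relevance list are illustrative assumptions.

```python
import math
from typing import Sequence


def mrr_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item in the top k, or 0 if none."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """DCG of the top k divided by the DCG of an ideal ordering."""
    def dcg(rels: Sequence[int]) -> float:
        return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Illustrative ranking: relevant answers at positions 2 and 4.
ranked_relevance = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked_relevance))
print(ndcg_at_k(ranked_relevance))
print(average_precision(ranked_relevance))
```

Averaging each per-query value over all queries in a dataset gives a dataset-level figure of the kind reported above (map is the mean of the per-query average precision).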

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric    Value
map       0.4786 (+0.0886)
mrr@10    0.5533 (+0.0853)
ndcg@10   0.5415 (+0.0861)
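
The NanoBEIR_R100_mean values appear to be the plain arithmetic mean of the three per-dataset results reported for NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100. The short check below recomputes them from the rounded per-dataset figures; the variable names are illustrative.

```python
# Per-dataset values in the order: NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100.
per_dataset = {
    "map": [0.4711, 0.3601, 0.6047],
    "mrr@10": [0.4565, 0.5969, 0.6064],
    "ndcg@10": [0.5342, 0.4250, 0.6652],
}

means = {name: round(sum(vals) / len(vals), 4) for name, vals in per_dataset.items()}
print(means)
# {'map': 0.4786, 'mrr@10': 0.5533, 'ndcg@10': 0.5415}
```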

Training Details

Training Dataset

Unnamed Dataset

  • Size: 577,957 training samples
  • Columns: question, answer, and label
  • Approximate statistics based on the first 1000 samples:
                question                   answer                      label
    type        string                     string                      int
    details     • min: 21 characters       • min: 54 characters        • 0: ~83.00%
                • mean: 42.64 characters   • mean: 250.97 characters   • 1: ~17.00%
                • max: 76 characters       • max: 376 characters
  • Samples:
    question | answer | label
    what is baking powder bicarbonate soda? | Baking soda and bicarbonate of soda are actually different names for the same thing. ... Both bicarbonate of soda and baking powder are leavening (raising) agents. When included in a batter, the leavening agent creates air bubbles that expand when cooked, and cause it to rise. | 1
    what is baking powder bicarbonate soda? | What is baking soda? Baking soda is a leavening agent used in baked goods like cakes, muffins, and cookies. Formally known as sodium bicarbonate, it's a white crystalline powder that is naturally alkaline, or basic (1). Baking soda becomes activated when it's combined with both an acidic ingredient and a liquid. | 0
    what is baking powder bicarbonate soda? | The chemical name for baking powder is sodium hydrogencarbonate. You may see it called bicarbonate of soda in the supermarket. This is the old name for the same stuff. It has the chemical formula NaHCO3. | 0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fct": "torch.nn.modules.linear.Identity",
        "pos_weight": 5
    }
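
The Identity activation suggests the loss is applied to the raw logit, and pos_weight scales the contribution of positive pairs, which make up roughly 17% of the sampled labels (about five negatives per positive). The sketch below shows a weighted binary cross-entropy of this kind in plain Python, assuming it follows the usual formulation used by PyTorch's BCEWithLogitsLoss. It is an illustration of the idea, not the library's code, and the function name is hypothetical.

```python
import math


def weighted_bce_with_logits(logit: float, label: int, pos_weight: float = 5.0) -> float:
    """Binary cross-entropy on a raw logit, with the positive term scaled by pos_weight."""

    def log_sigmoid(x: float) -> float:
        # Numerically stable log(sigmoid(x)).
        return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

    positive_term = pos_weight * label * log_sigmoid(logit)
    negative_term = (1 - label) * log_sigmoid(-logit)
    return -(positive_term + negative_term)


# At logit 0 the model is undecided; a positive pair then costs 5x a negative pair.
print(weighted_bce_with_logits(0.0, 1))
print(weighted_bce_with_logits(0.0, 0))
```

With this weighting, a misjudged positive pair is penalised five times as heavily as a misjudged negative one, which offsets the class imbalance in the training data.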
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 4
  • load_best_model_at_end: True
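
To make the interaction of learning_rate, warmup_ratio and the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step under linear warmup followed by linear decay. It assumes a single device and a total of about 9,031 optimizer steps (577,957 samples at batch size 64, rounded up). The function name and that step count are assumptions for illustration and are not read from the trainer.

```python
import math


def linear_warmup_decay_lr(step: int, total_steps: int,
                           base_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given step under linear warmup then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: about 9,031 optimizer steps for one epoch (see lead-in).
TOTAL_STEPS = 9031
for step in (0, 452, 904, 5000, 9031):
    print(step, f"{linear_warmup_decay_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the peak rate of 2e-05 is reached after roughly 900 steps, which is just before the first evaluation at step 1000.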

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss gooaq-dev_ndcg@10 NanoMSMARCO_R100_ndcg@10 NanoNFCorpus_R100_ndcg@10 NanoNQ_R100_ndcg@10 NanoBEIR_R100_mean_ndcg@10
-1 -1 - 0.1293 (-0.4619) 0.0284 (-0.5121) 0.2145 (-0.1105) 0.0134 (-0.4872) 0.0854 (-0.3699)
0.0001 1 1.2576 - - - - -
0.0221 200 1.2027 - - - - -
0.0443 400 1.1352 - - - - -
0.0664 600 0.7686 - - - - -
0.0886 800 0.6163 - - - - -
0.1107 1000 0.5764 0.7162 (+0.1250) 0.4924 (-0.0480) 0.3647 (+0.0396) 0.6409 (+0.1403) 0.4993 (+0.0440)
0.1329 1200 0.5488 - - - - -
0.1550 1400 0.525 - - - - -
0.1772 1600 0.4987 - - - - -
0.1993 1800 0.4943 - - - - -
0.2215 2000 0.4777 0.7508 (+0.1596) 0.5672 (+0.0268) 0.3969 (+0.0718) 0.6236 (+0.1230) 0.5292 (+0.0739)
0.2436 2200 0.4487 - - - - -
0.2658 2400 0.4582 - - - - -
0.2879 2600 0.4473 - - - - -
0.3100 2800 0.4266 - - - - -
0.3322 3000 0.4374 0.7478 (+0.1565) 0.5851 (+0.0446) 0.3863 (+0.0613) 0.6684 (+0.1678) 0.5466 (+0.0912)
0.3543 3200 0.421 - - - - -
0.3765 3400 0.4317 - - - - -
0.3986 3600 0.4206 - - - - -
0.4208 3800 0.417 - - - - -
0.4429 4000 0.4113 0.7577 (+0.1665) 0.5611 (+0.0207) 0.3973 (+0.0722) 0.6564 (+0.1557) 0.5382 (+0.0829)
0.4651 4200 0.4008 - - - - -
0.4872 4400 0.3884 - - - - -
0.5094 4600 0.4136 - - - - -
0.5315 4800 0.389 - - - - -
0.5536 5000 0.3877 0.7609 (+0.1697) 0.5509 (+0.0104) 0.3878 (+0.0627) 0.6807 (+0.1800) 0.5398 (+0.0844)
0.5758 5200 0.3901 - - - - -
0.5979 5400 0.389 - - - - -
0.6201 5600 0.3999 - - - - -
0.6422 5800 0.3703 - - - - -
0.6644 6000 0.3854 0.7620 (+0.1708) 0.5444 (+0.0039) 0.4040 (+0.0790) 0.6917 (+0.1911) 0.5467 (+0.0913)
0.6865 6200 0.3685 - - - - -
0.7087 6400 0.3751 - - - - -
0.7308 6600 0.3709 - - - - -
0.7530 6800 0.3788 - - - - -
0.7751 7000 0.3734 0.7672 (+0.1760) 0.5404 (+0.0000) 0.4075 (+0.0824) 0.6638 (+0.1632) 0.5372 (+0.0819)
0.7973 7200 0.3629 - - - - -
0.8194 7400 0.3547 - - - - -
0.8415 7600 0.3639 - - - - -
0.8637 7800 0.3597 - - - - -
0.8858 8000 0.3522 0.7676 (+0.1764) 0.5342 (-0.0062) 0.4250 (+0.0999) 0.6652 (+0.1646) 0.5415 (+0.0861)
0.9080 8200 0.327 - - - - -
0.9301 8400 0.344 - - - - -
0.9523 8600 0.3578 - - - - -
0.9744 8800 0.3547 - - - - -
0.9966 9000 0.3491 0.7675 (+0.1763) 0.5423 (+0.0019) 0.4188 (+0.0937) 0.6621 (+0.1614) 0.5411 (+0.0857)
-1 -1 - 0.7676 (+0.1764) 0.5342 (-0.0062) 0.4250 (+0.0999) 0.6652 (+0.1646) 0.5415 (+0.0861)
  • The saved checkpoint is the row at step 8000 (shown in bold in the original table); its metrics are identical to the final evaluation row.
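
The Epoch column can be reproduced from the Step column, the training set size and the batch size. The check below assumes one optimizer step per batch of 64 on a single device (gradient_accumulation_steps is 1) with the last partial batch kept (dataloader_drop_last is False); the constant and function names are illustrative.

```python
import math

TRAIN_SAMPLES = 577_957  # size of the training dataset
BATCH_SIZE = 64          # per_device_train_batch_size

# Number of optimizer steps in one epoch, counting the final partial batch.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)


def epoch_fraction(step: int) -> float:
    """Fraction of the epoch completed after the given step, rounded as in the log."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)       # 9031
print(epoch_fraction(1000))  # 0.1107
print(epoch_fraction(8000))  # 0.8858
```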

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.49.0
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.2.0
  • Datasets: 2.21.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}