ModernBERT-base trained on GooAQ

This is a Cross Encoder model finetuned from answerdotai/ModernBERT-base using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: answerdotai/ModernBERT-base
  • Model Size: ~150M parameters (F32 safetensors)
  • Maximum Sequence Length: 8192 tokens
  • Number of Output Labels: 1 label
  • Language: en
  • License: apache-2.0

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("ayushexel/reranker-ModernBERT-base-gooaq-bce-495000")
# Get scores for pairs of texts
pairs = [
    ['how much coffee is in a mocha frappe from starbucks?', 'The Mocha Frappuccino Light Blended Beverage has slightly more caffeine: Tall - 60 mg. Grande - 95 mg. Venti Iced - 120 mg.'],
    ['how much coffee is in a mocha frappe from starbucks?', "Typically Starbucks Vanilla Bean Frappuccino has no coffee in it at all. I believe it's made with sweetened condensed milk, vanilla bean powder and ice and topped with whipped cream of course! According to the Starbucks menu online their frappuccino has 57 grams of sugar and 59 grams of carbs in one 16 ounce serving!"],
    ['how much coffee is in a mocha frappe from starbucks?', 'It also has 5g of protein and 75mg of caffeine. The Mocha Cookie Crumble Frappucino pours a blend of coffee, milk and ice atop whipped cream and chocolate cookie crumble. The whole thing is topped with a blend of rich mocha sauce and Frappuccino chips.'],
    ['how much coffee is in a mocha frappe from starbucks?', 'There are 460 calories in 1 serving of Starbucks Java Chip Frappuccino Blended Coffee with Whipped Cream (Grande).'],
    ['how much coffee is in a mocha frappe from starbucks?', 'Each 14 fl. oz bottle contains 250 calories, 4.5 grams of fat, 38 grams of sugar, and 150 milligrams of caffeine. So, while certainly not the healthiest option, they are convenient for anyone who loves iced Starbucks lattes. The drink is inspired by the Salted Caramel Mocha served seasonally at Starbucks stores.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'how much coffee is in a mocha frappe from starbucks?',
    [
        'The Mocha Frappuccino Light Blended Beverage has slightly more caffeine: Tall - 60 mg. Grande - 95 mg. Venti Iced - 120 mg.',
        "Typically Starbucks Vanilla Bean Frappuccino has no coffee in it at all. I believe it's made with sweetened condensed milk, vanilla bean powder and ice and topped with whipped cream of course! According to the Starbucks menu online their frappuccino has 57 grams of sugar and 59 grams of carbs in one 16 ounce serving!",
        'It also has 5g of protein and 75mg of caffeine. The Mocha Cookie Crumble Frappucino pours a blend of coffee, milk and ice atop whipped cream and chocolate cookie crumble. The whole thing is topped with a blend of rich mocha sauce and Frappuccino chips.',
        'There are 460 calories in 1 serving of Starbucks Java Chip Frappuccino Blended Coffee with Whipped Cream (Grande).',
        'Each 14 fl. oz bottle contains 250 calories, 4.5 grams of fat, 38 grams of sugar, and 150 milligrams of caffeine. So, while certainly not the healthiest option, they are convenient for anyone who loves iced Starbucks lattes. The drink is inspired by the Salted Caramel Mocha served seasonally at Starbucks stores.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
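
Cross-encoders like this one are typically used as the second stage of a retrieve-and-rerank pipeline: a fast bi-encoder retrieves candidates and the cross-encoder rescores them. The sketch below illustrates that pattern; the bi-encoder (sentence-transformers/all-MiniLM-L6-v2), the toy corpus, and top_k=10 are illustrative assumptions, not part of this model's training or evaluation setup.

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Illustrative corpus and query; in practice this would be your own document collection.
corpus = [
    "The Mocha Frappuccino Light Blended Beverage has slightly more caffeine: Tall - 60 mg. Grande - 95 mg. Venti Iced - 120 mg.",
    "There are 460 calories in 1 serving of Starbucks Java Chip Frappuccino Blended Coffee with Whipped Cream (Grande).",
]
query = "how much coffee is in a mocha frappe from starbucks?"

retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed first-stage bi-encoder
reranker = CrossEncoder("ayushexel/reranker-ModernBERT-base-gooaq-bce-495000")

# First stage: embed the corpus and retrieve candidates by cosine similarity.
corpus_embeddings = retriever.encode(corpus, convert_to_tensor=True)
query_embedding = retriever.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=10)[0]

# Second stage: rescore the retrieved candidates with this cross-encoder.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)
for score, hit in sorted(zip(scores, hits), key=lambda x: x[0], reverse=True):
    print(f"{score:.4f}\t{corpus[hit['corpus_id']]}")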

Evaluation

Metrics

Cross Encoder Reranking

  • Dataset: gooaq-dev

| Metric  | Value            |
|:--------|:-----------------|
| map     | 0.5977 (+0.2173) |
| mrr@10  | 0.5967 (+0.2262) |
| ndcg@10 | 0.6431 (+0.2104) |

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
| Metric  | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100      |
|:--------|:-----------------|:------------------|:-----------------|
| map     | 0.3512 (-0.1384) | 0.3739 (+0.1129)  | 0.3826 (-0.0370) |
| mrr@10  | 0.3360 (-0.1415) | 0.5320 (+0.0322)  | 0.3942 (-0.0325) |
| ndcg@10 | 0.4135 (-0.1269) | 0.4074 (+0.0823)  | 0.4417 (-0.0590) |

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
| Metric  | Value            |
|:--------|:-----------------|
| map     | 0.3692 (-0.0208) |
| mrr@10  | 0.4207 (-0.0473) |
| ndcg@10 | 0.4208 (-0.0345) |
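
The NanoBEIR numbers above can in principle be reproduced with the evaluator and parameters listed. Below is a minimal sketch, assuming the CrossEncoderNanoBEIREvaluator import path of Sentence Transformers 4.x:

from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

model = CrossEncoder("ayushexel/reranker-ModernBERT-base-gooaq-bce-495000")

# Mirror the parameters reported above.
evaluator = CrossEncoderNanoBEIREvaluator(
    dataset_names=["msmarco", "nfcorpus", "nq"],
    rerank_k=100,
    at_k=10,
    always_rerank_positives=True,
)
results = evaluator(model)
print(results)  # per-dataset map / mrr@10 / ndcg@10 plus their mean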

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,749,365 training samples
  • Columns: question, answer, and label
  • Approximate statistics based on the first 1000 samples:

    |         | question                                                       | answer                                                           | label                  |
    |:--------|:---------------------------------------------------------------|:------------------------------------------------------------------|:-----------------------|
    | type    | string                                                         | string                                                           | int                    |
    | details | min: 19 characters, mean: 42.48 characters, max: 86 characters | min: 52 characters, mean: 249.98 characters, max: 382 characters | 0: ~81.60%, 1: ~18.40% |
  • Samples:

    | question | answer | label |
    |:---------|:-------|:------|
    | how much coffee is in a mocha frappe from starbucks? | The Mocha Frappuccino Light Blended Beverage has slightly more caffeine: Tall - 60 mg. Grande - 95 mg. Venti Iced - 120 mg. | 1 |
    | how much coffee is in a mocha frappe from starbucks? | Typically Starbucks Vanilla Bean Frappuccino has no coffee in it at all. I believe it's made with sweetened condensed milk, vanilla bean powder and ice and topped with whipped cream of course! According to the Starbucks menu online their frappuccino has 57 grams of sugar and 59 grams of carbs in one 16 ounce serving! | 0 |
    | how much coffee is in a mocha frappe from starbucks? | It also has 5g of protein and 75mg of caffeine. The Mocha Cookie Crumble Frappucino pours a blend of coffee, milk and ice atop whipped cream and chocolate cookie crumble. The whole thing is topped with a blend of rich mocha sauce and Frappuccino chips. | 0 |
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": 5
    }
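
For reference, the loss configuration above corresponds roughly to the setup sketched below, assuming the Sentence Transformers 4.x cross-encoder loss API. The Identity activation means the loss operates on raw logits, and pos_weight=5 up-weights the roughly 18% positive pairs against the ~82% negatives.

import torch
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

model = CrossEncoder("answerdotai/ModernBERT-base", num_labels=1)

# BCE-with-logits over (question, answer) pairs; pos_weight compensates for the class imbalance.
loss = BinaryCrossEntropyLoss(
    model,
    activation_fn=torch.nn.Identity(),
    pos_weight=torch.tensor(5.0),
)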
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 12
  • load_best_model_at_end: True
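
For context, the sketch below shows how these non-default values map onto Sentence Transformers 4.x cross-encoder training arguments. The output directory is an illustrative placeholder, and a bf16-capable GPU is assumed.

from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder import CrossEncoderTrainer, CrossEncoderTrainingArguments

model = CrossEncoder("answerdotai/ModernBERT-base", num_labels=1)

# Non-default hyperparameters from the list above, expressed as training arguments.
args = CrossEncoderTrainingArguments(
    output_dir="reranker-ModernBERT-base-gooaq-bce",  # illustrative output path
    eval_strategy="steps",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    dataloader_num_workers=12,
    load_best_model_at_end=True,
)

# These arguments would then be passed to CrossEncoderTrainer together with the
# training dataset, the BinaryCrossEntropyLoss shown earlier, and an evaluator, e.g.:
#   trainer = CrossEncoderTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
#   trainer.train()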

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 12
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch  | Step  | Training Loss | gooaq-dev_ndcg@10 | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:------:|:-----:|:-------------:|:-----------------:|:------------------------:|:-------------------------:|:-------------------:|:--------------------------:|
| -1     | -1    | -             | 0.1284 (-0.3043)  | 0.0244 (-0.5160)         | 0.2350 (-0.0900)          | 0.0158 (-0.4849)    | 0.0917 (-0.3636)           |
| 0.0001 | 1     | 1.2609        | -                 | -                        | -                         | -                   | -                          |
| 0.0186 | 200   | 1.2031        | -                 | -                        | -                         | -                   | -                          |
| 0.0372 | 400   | 1.1489        | -                 | -                        | -                         | -                   | -                          |
| 0.0559 | 600   | 0.8525        | -                 | -                        | -                         | -                   | -                          |
| 0.0745 | 800   | 0.7113        | -                 | -                        | -                         | -                   | -                          |
| 0.0931 | 1000  | 0.676         | -                 | -                        | -                         | -                   | -                          |
| 0.1117 | 1200  | 0.6486        | -                 | -                        | -                         | -                   | -                          |
| 0.1304 | 1400  | 0.6133        | -                 | -                        | -                         | -                   | -                          |
| 0.1490 | 1600  | 0.6005        | -                 | -                        | -                         | -                   | -                          |
| 0.1676 | 1800  | 0.5815        | -                 | -                        | -                         | -                   | -                          |
| 0.1862 | 2000  | 0.5711        | -                 | -                        | -                         | -                   | -                          |
| 0.2048 | 2200  | 0.5572        | -                 | -                        | -                         | -                   | -                          |
| 0.2235 | 2400  | 0.5572        | -                 | -                        | -                         | -                   | -                          |
| 0.2421 | 2600  | 0.5449        | -                 | -                        | -                         | -                   | -                          |
| 0.2607 | 2800  | 0.5342        | -                 | -                        | -                         | -                   | -                          |
| 0.2793 | 3000  | 0.5325        | -                 | -                        | -                         | -                   | -                          |
| 0.2980 | 3200  | 0.5321        | -                 | -                        | -                         | -                   | -                          |
| 0.3166 | 3400  | 0.5182        | -                 | -                        | -                         | -                   | -                          |
| 0.3352 | 3600  | 0.5245        | -                 | -                        | -                         | -                   | -                          |
| 0.3538 | 3800  | 0.5302        | -                 | -                        | -                         | -                   | -                          |
| 0.3724 | 4000  | 0.5095        | -                 | -                        | -                         | -                   | -                          |
| 0.3911 | 4200  | 0.5178        | -                 | -                        | -                         | -                   | -                          |
| 0.4097 | 4400  | 0.4962        | -                 | -                        | -                         | -                   | -                          |
| 0.4283 | 4600  | 0.4988        | -                 | -                        | -                         | -                   | -                          |
| 0.4469 | 4800  | 0.4983        | -                 | -                        | -                         | -                   | -                          |
| 0.4655 | 5000  | 0.4973        | -                 | -                        | -                         | -                   | -                          |
| 0.4842 | 5200  | 0.4876        | -                 | -                        | -                         | -                   | -                          |
| 0.5028 | 5400  | 0.4807        | -                 | -                        | -                         | -                   | -                          |
| 0.5214 | 5600  | 0.4862        | -                 | -                        | -                         | -                   | -                          |
| 0.5400 | 5800  | 0.4784        | -                 | -                        | -                         | -                   | -                          |
| 0.5587 | 6000  | 0.4811        | -                 | -                        | -                         | -                   | -                          |
| 0.5773 | 6200  | 0.4817        | -                 | -                        | -                         | -                   | -                          |
| 0.5959 | 6400  | 0.4706        | -                 | -                        | -                         | -                   | -                          |
| 0.6145 | 6600  | 0.4659        | -                 | -                        | -                         | -                   | -                          |
| 0.6331 | 6800  | 0.4644        | -                 | -                        | -                         | -                   | -                          |
| 0.6518 | 7000  | 0.4764        | -                 | -                        | -                         | -                   | -                          |
| 0.6704 | 7200  | 0.4753        | -                 | -                        | -                         | -                   | -                          |
| 0.6890 | 7400  | 0.4727        | -                 | -                        | -                         | -                   | -                          |
| 0.7076 | 7600  | 0.4693        | -                 | -                        | -                         | -                   | -                          |
| 0.7263 | 7800  | 0.4621        | -                 | -                        | -                         | -                   | -                          |
| 0.7449 | 8000  | 0.4514        | -                 | -                        | -                         | -                   | -                          |
| 0.7635 | 8200  | 0.4561        | -                 | -                        | -                         | -                   | -                          |
| 0.7821 | 8400  | 0.4574        | -                 | -                        | -                         | -                   | -                          |
| 0.8007 | 8600  | 0.4579        | -                 | -                        | -                         | -                   | -                          |
| 0.8194 | 8800  | 0.4478        | -                 | -                        | -                         | -                   | -                          |
| 0.8380 | 9000  | 0.4481        | -                 | -                        | -                         | -                   | -                          |
| 0.8566 | 9200  | 0.4568        | -                 | -                        | -                         | -                   | -                          |
| 0.8752 | 9400  | 0.4455        | -                 | -                        | -                         | -                   | -                          |
| 0.8939 | 9600  | 0.4614        | -                 | -                        | -                         | -                   | -                          |
| 0.9125 | 9800  | 0.4436        | -                 | -                        | -                         | -                   | -                          |
| 0.9311 | 10000 | 0.4482        | -                 | -                        | -                         | -                   | -                          |
| 0.9497 | 10200 | 0.4455        | -                 | -                        | -                         | -                   | -                          |
| 0.9683 | 10400 | 0.4422        | -                 | -                        | -                         | -                   | -                          |
| 0.9870 | 10600 | 0.4506        | -                 | -                        | -                         | -                   | -                          |
| -1     | -1    | -             | 0.6431 (+0.2104)  | 0.4135 (-0.1269)         | 0.4074 (+0.0823)          | 0.4417 (-0.0590)    | 0.4208 (-0.0345)           |

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1
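
To approximate this environment, the versions listed above can be pinned directly (the PyTorch build used here was a CUDA 12.4 wheel; adjust the torch install for your platform):

pip install sentence-transformers==4.0.1 transformers==4.50.3 torch==2.6.0 accelerate==1.5.2 datasets==3.5.0 tokenizers==0.21.1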

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}