CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listmle")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
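
The entries returned by `rank` refer to documents only by their index (`corpus_id`). The following is a minimal sketch of mapping those indices back to the original texts. The helper name `attach_documents` is hypothetical, and the scores in `example_ranks` are placeholders chosen for illustration, not output from this model.

```python
def attach_documents(ranks, documents):
    """Pair each rank entry with the text it refers to (hypothetical helper)."""
    return [
        {
            "corpus_id": entry["corpus_id"],
            "score": float(entry["score"]),
            "text": documents[entry["corpus_id"]],
        }
        for entry in ranks
    ]


documents = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]

# Placeholder scores for illustration only; real values come from model.rank(...)
example_ranks = [
    {"corpus_id": 0, "score": 0.9},
    {"corpus_id": 2, "score": 0.5},
    {"corpus_id": 1, "score": 0.1},
]

for entry in attach_documents(example_ranks, documents):
    print(f"{entry['score']:.2f}\t{entry['text']}")
```

Depending on the installed Sentence Transformers version, `rank` may also accept `return_documents=True` to include the texts directly; check the API reference for your release before relying on it.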

Evaluation

Metrics

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric    NanoMSMARCO_R100   NanoNFCorpus_R100   NanoNQ_R100
map       0.3712 (-0.1184)   0.2849 (+0.0239)    0.4117 (-0.0079)
mrr@10    0.3590 (-0.1185)   0.4289 (-0.0709)    0.4104 (-0.0163)
ndcg@10   0.4330 (-0.1074)   0.2706 (-0.0545)    0.4660 (-0.0347)
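
For readers unfamiliar with the metrics in this table, here is a small sketch of the textbook definitions of MRR@k and NDCG@k, computed from the relevance labels of documents in reranked order. It is not the code used by CrossEncoderRerankingEvaluator, whose implementation may differ in details; the function names and the toy label list are assumptions made for illustration.

```python
import math


def mrr_at_k(labels, k=10):
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, label in enumerate(labels[:k], start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(labels, k=10):
    """NDCG@k with linear gain and a log2 position discount."""

    def dcg(ordered):
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ordered[:k]))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


# Toy example: relevance labels in the order a reranker returned the documents
reranked_labels = [0, 1, 0, 0, 1]
print(f"mrr@10  = {mrr_at_k(reranked_labels):.4f}")
print(f"ndcg@10 = {ndcg_at_k(reranked_labels):.4f}")
```

In the toy list the first relevant document sits at position 2, so the reciprocal rank is 1/2; NDCG additionally rewards the second relevant document at position 5.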

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric    Value
map       0.3559 (-0.0341)
mrr@10    0.3994 (-0.0686)
ndcg@10   0.3898 (-0.0655)
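
The name NanoBEIR_R100_mean suggests that these values are the arithmetic mean over the three datasets. The short check below, which assumes exactly that, recomputes the means from the rounded per-dataset values reported under Cross Encoder Reranking and compares them with the figures in this table.

```python
# Per-dataset values in the order NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100
per_dataset = {
    "map": [0.3712, 0.2849, 0.4117],
    "mrr@10": [0.3590, 0.4289, 0.4104],
    "ndcg@10": [0.4330, 0.2706, 0.4660],
}
reported_mean = {"map": 0.3559, "mrr@10": 0.3994, "ndcg@10": 0.3898}

means = {name: sum(values) / len(values) for name, values in per_dataset.items()}
for name, value in means.items():
    diff = abs(value - reported_mean[name])
    print(f"{name}: recomputed {value:.4f}, reported {reported_mean[name]:.4f}, diff {diff:.5f}")
```

The recomputed means agree with the reported ones to within the rounding of the four-decimal inputs, which is consistent with a plain average.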

Training Details

Training Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 78,704 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    Column   Type     Min             Mean               Max
    query    string   11 characters   33.89 characters   101 characters
    docs     list     3 elements      6.50 elements      10 elements
    labels   list     3 elements      6.50 elements      10 elements
  • Samples:
    query docs labels
    elysia meaning origin ['Meaning of Elysia. Latin-American name. In Latin-American, the name Elysia means-the blessed home.The name Elysia originated as an Latin-American name. The name Elysia is most often used as a girl name or female name. Latin-American Name Meaning-the blessed home. Origin-Latin-America. ', 'The Greek name Elysia means-sweet; blissful. Mythology: Elysium was the dwelling place of happy souls. ', 'Here are pictures of people with the name Elysia. Help us put a face to the name by uploading your pictures to BabyNames.com! ', 'The meaning of Elyssa has more than one different etymologies. It has same or different meanings in other countries and languages. The different meanings of the name Elyssa are: 1 Hebrew meaning: My God is a vow. 2 Greek meaning: My God is a vow. 3 English meaning: My God is a vow.', 'Elysia is a rare given name for women. Elysia is an equally unique last name for all people. (2000 U.S. Census). Displayed below is an analysis of the popularity of the girl name Ely... [1, 0, 0, 0, 0, ...]
    what zone is highgate station ["For the station known from 1907 to 1939 as Highgate, see Archway tube station. Highgate is a London Underground station and former railway station in Archway Road, in the London Borough of Haringey in north London. The station takes its name from nearby Highgate Village. It is on the High Barnet branch of the Northern line, between Archway and East Finchley stations and is in Travelcard Zone 3. The station was originally opened in 1867 as part of the Great Northern Railway 's line between Finsbury Park and Edgware stations. Highgate station was originally constructed by the Edgware, Highgate and London Railway in the 1860s on its line from Finsbury Park station to Edgware station.", "At the time of the station's construction the first cable car in Europe operated non-stop up Highgate Hill to the village from outside the Archway Tavern, and this name was also considered for the station. It is located underneath the Archway Tower, at the intersection of Holloway Road, Highgate Hill, Ju... [1, 0, 0, 0, 0, ...]
    how much does thyroid surgery cost ['1 The price of thyroidectomy depends on the location where the surgery will be performed. 2 The cost can also differ depending on the experience and skill of the physician that will perform the surgery. 3 This is due to the boost in their reputation for the surgeries that they have performed. 1 On average, this procedure can cost anywhere from $16,000 to as much as $65,000 without any type of health insurance. 2 SurgeryCosts.net offers information to people who want to know more about', '1 For example, a one-month supply of the generic anti-thyroid drug methimazole costs about $30-$120, depending on the dose -- or, about $360-$1,440 a year. 2 And a one-month supply of the brand-name drug Tapazole costs about $90-$150 or more, depending on the dose -- or, about $1,080-$1,800 per year. 1 After the thyroid is destroyed by a radioactive iodine treatment or surgically removed, the patient typically needs to take thyroid hormone replacement such as levothyroxine, which typically costs ... [1, 0, 0, 0, 0, ...]
  • Loss: ListMLELoss with these parameters:
    {
        "lambda_weight": null,
        "activation_fct": "torch.nn.modules.linear.Identity",
        "mini_batch_size": 16,
        "respect_input_order": true
    }
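
To give an idea of what a ListMLE objective computes, here is a plain-Python sketch of the negative Plackett-Luce log-likelihood of a target ordering. It is an illustration under stated assumptions, not the ListMLELoss implementation from Sentence Transformers: the function name is made up, `lambda_weight`, the activation function and mini-batching are ignored, and `respect_input_order` is assumed to mean that the documents are already listed from most to least relevant.

```python
import math


def listmle_loss(scores, labels, respect_input_order=True):
    """Negative Plackett-Luce log-likelihood of the target ordering (sketch)."""
    if respect_input_order:
        # Assumption: documents are already listed best-first
        order = list(range(len(scores)))
    else:
        # Otherwise derive the target ordering from the labels, highest first
        order = sorted(range(len(scores)), key=lambda i: labels[i], reverse=True)

    ordered = [scores[i] for i in order]
    loss = 0.0
    for i in range(len(ordered)):
        tail = ordered[i:]
        # Numerically stable log-sum-exp over the documents not yet placed
        peak = max(tail)
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in tail))
        loss += log_sum_exp - ordered[i]
    return loss


labels = [1, 0, 0]
print(listmle_loss([2.0, 1.0, 0.0], labels))  # scores agree with the target order
print(listmle_loss([0.0, 1.0, 2.0], labels))  # scores contradict the target order
```

The loss is smaller when the predicted scores decrease along the target ordering. With identical scores for n documents, every ordering is equally likely and the loss equals log(n!).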
    

Evaluation Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 1,000 evaluation samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    Column   Type     Min            Mean               Max
    query    string   9 characters   33.94 characters   99 characters
    docs     list     2 elements     6.00 elements      10 elements
    labels   list     2 elements     6.00 elements      10 elements
  • Samples:
    query docs labels
    what are some facts penguin enemies ['Penguins are social birds. Many species feed, swim and nest in groups. During the breeding season, some species form large groups, or “rookeries”, that include thousands of penguins. Each penguin has a distinct call, allowing individuals to find their mate and their chicks even in large groups. ', 'Breeding Gentoo Penguin Facts. Gentoo penguins are commonly found to breed across sub-Antarctic islands. Some of the notable colonies include Kerguelen islands, Falkland islands, and South Georgia, with fewer numbers also inhabit in the Heard Islands, Macquarie Islands, Antarctic Peninsula, and South Shetland Islands. How about summarizing some of the most interesting and rarely known gentoo penguin facts such as gentoo penguins habitat, diet, breeding, and predators. The gentoo penguins are simply characterized by the broad white stripe extending like a bonnet across the top of its head', 'Predators
    oral surface definition zoology ['oral. adj. 1. spoken or verbal: an oral agreement. 2. (Medicine) relating to, affecting, or for use in the mouth: an oral thermometer. 3. (Zoology) of or relating to the surface of an animal, such as a jellyfish, on which the mouth is situated. 4. (Medicine) denoting a drug to be taken by mouth Compare parenteral: an oral contraceptive.', 'In a medusa, the oral surface and tentacles face downward. The body of a medusa is typically bell-shaped or umbrella-shaped, and medusae are free-swimming. In a typical medusa, the margins of the bell extend to form a shelf called the velum, which partially closes the open side of the bell.', "Definition of ORAL ARM. : one of the prolongations of the distal end of the manubrium of a jellyfish. ADVERTISEMENT. This word doesn't usually appear in our free dictionary, but the definition from our premium Unabridged Dictionary is offered here on a limited basis.", 'See also occlusal surface. labial surface the vestibular surface of the incisors and canin... [1, 0, 0, 0, 0, ...]
    what year was the protect act enacted ["The PROTECT Act of 2003 (Pub.L. 108–21, 117 Stat. 650, S. 151, enacted April 30, 2003) is a United States law with the stated intent of preventing child abuse. PROTECT is a backronym which stands for P rosecutorial R emedies and O ther T ools to end the E xploitation of C hildren T oday. The Department of Justice appealed the Eleventh Circuit's ruling to the U.S. Supreme Court. The Supreme Court reversed the Eleventh Circuit's ruling in May 2008 and upheld this portion of the act.", 'Copyright Renewal Act of 1992, title I of the Copyright Amendments Act of 1992, Pub. L. No. 102-307, 106 Stat. 264 (amending chapter 3, title 17 of the United States Code, by providing for automatic renewal of copyright for works copyrighted between January 1, 1964, and December 31, 1977), enacted June 26, 1992. [Amendments to the Semiconductor Chip Protection Act of 1984], Pub. L. No. 100-159, 101 Stat. 899 (amending chapter 9, title 17, United States Code, regarding protection extended to semiconducto... [1, 0, 0, 0, 0, ...]
  • Loss: ListMLELoss with these parameters:
    {
        "lambda_weight": null,
        "activation_fct": "torch.nn.modules.linear.Identity",
        "mini_batch_size": 16,
        "respect_input_order": true
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • load_best_model_at_end: True
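
As a rough worked example of what these settings imply, the sketch below derives the number of optimizer steps and warmup steps from the training set size (78,704 samples) and evaluates a linear warmup and decay schedule. It assumes a single GPU, no gradient accumulation, and that warmup steps are rounded up; the helper `linear_lr` is a hypothetical stand-in for the scheduler, not the trainer's own code.

```python
import math

train_samples = 78_704
batch_size = 16          # per_device_train_batch_size
epochs = 1               # num_train_epochs
warmup_ratio = 0.1
peak_lr = 2e-5           # learning_rate

total_steps = math.ceil(train_samples / batch_size) * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up


def linear_lr(step):
    """Linear warmup to peak_lr, then linear decay to zero (sketch)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print("total steps:", total_steps)
print("warmup steps:", warmup_steps)
print("epoch fraction at step 250:", round(250 / total_steps, 4))
print("lr at step 250:", linear_lr(250))
```

Under these assumptions one epoch is 4,919 steps, and step 250 corresponds to an epoch fraction of about 0.0508, matching the Epoch column in the training logs.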

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss  Validation Loss  NanoMSMARCO_R100_ndcg@10  NanoNFCorpus_R100_ndcg@10  NanoNQ_R100_ndcg@10  NanoBEIR_R100_mean_ndcg@10
-1      -1     -              -                0.0536 (-0.4868)          0.3415 (+0.0165)           0.0633 (-0.4373)     0.1528 (-0.3026)
0.0002  1      15.0387        -                -                         -                          -                    -
0.0508  250    13.8424        -                -                         -                          -                    -
0.1016  500    12.2432        12.1961          0.0338 (-0.5066)          0.3357 (+0.0107)           0.0687 (-0.4319)     0.1461 (-0.3093)
0.1525  750    12.2166        -                -                         -                          -                    -
0.2033  1000   12.1697        12.1567          0.0286 (-0.5118)          0.3049 (-0.0202)           0.0311 (-0.4696)     0.1215 (-0.3339)
0.2541  1250   12.1288        -                -                         -                          -                    -
0.3049  1500   12.1364        12.1497          0.0389 (-0.5015)          0.2523 (-0.0727)           0.0284 (-0.4722)     0.1065 (-0.3488)
0.3558  1750   12.1556        -                -                         -                          -                    -
0.4066  2000   12.134         12.1342          0.1969 (-0.3435)          0.2295 (-0.0955)           0.2666 (-0.2340)     0.2310 (-0.2244)
0.4574  2250   12.1346        -                -                         -                          -                    -
0.5082  2500   12.0789        12.1369          0.2381 (-0.3023)          0.2086 (-0.1164)           0.3112 (-0.1895)     0.2526 (-0.2027)
0.5591  2750   12.1796        -                -                         -                          -                    -
0.6099  3000   12.122         12.1233          0.2978 (-0.2426)          0.2211 (-0.1039)           0.3967 (-0.1039)     0.3052 (-0.1501)
0.6607  3250   12.1834        -                -                         -                          -                    -
0.7115  3500   12.11          12.1241          0.3919 (-0.1486)          0.2391 (-0.0860)           0.4388 (-0.0619)     0.3566 (-0.0988)
0.7624  3750   12.1394        -                -                         -                          -                    -
0.8132  4000*  12.0582        12.1232          0.4330 (-0.1074)          0.2706 (-0.0545)           0.4660 (-0.0347)     0.3898 (-0.0655)
0.8640  4250   12.152         -                -                         -                          -                    -
0.9148  4500   12.0818        12.1178          0.4173 (-0.1232)          0.2749 (-0.0502)           0.4767 (-0.0240)     0.3896 (-0.0658)
0.9656  4750   12.1172        -                -                         -                          -                    -
-1      -1     -              -                0.4330 (-0.1074)          0.2706 (-0.0545)           0.4660 (-0.0347)     0.3898 (-0.0655)
  • The row marked with * (step 4000) denotes the saved checkpoint; its metrics match the final evaluation row.
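
The final evaluation row repeats the metrics from step 4000, which is what one would expect from `load_best_model_at_end: True` if checkpoints are compared on the mean ndcg@10. The card does not state the selection metric, so the sketch below treats that as an assumption and simply picks the evaluation step with the highest NanoBEIR_R100_mean_ndcg@10 from the log.

```python
# (step, NanoBEIR_R100_mean_ndcg@10) for each mid-training evaluation in the log
evals = [
    (500, 0.1461),
    (1000, 0.1215),
    (1500, 0.1065),
    (2000, 0.2310),
    (2500, 0.2526),
    (3000, 0.3052),
    (3500, 0.3566),
    (4000, 0.3898),
    (4500, 0.3896),
]

# Assumption: a higher mean ndcg@10 is better and is the selection criterion
best_step, best_score = max(evals, key=lambda item: item[1])
print(f"best step: {best_step} (mean ndcg@10 = {best_score:.4f})")
```

Note that the validation loss is lowest at step 4500 (12.1178), so selecting on loss would have picked a different checkpoint than the one whose metrics appear in the final row.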

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.222 kWh
  • Carbon Emitted: 0.086 kg of CO2
  • Hours Used: 0.721 hours
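
Two derived quantities can be read off these figures with simple arithmetic: the average power draw during training and the carbon intensity implied by the measurement. The snippet below only divides the numbers reported above; the variable names are illustrative.

```python
energy_kwh = 0.222   # Energy Consumed
co2_kg = 0.086       # Carbon Emitted
hours = 0.721        # Hours Used

avg_power_kw = energy_kwh / hours          # average draw over the run
carbon_intensity = co2_kg / energy_kwh     # kg of CO2 per kWh consumed

print(f"average power: {avg_power_kw:.3f} kW")
print(f"carbon intensity: {carbon_intensity:.3f} kg CO2 per kWh")
```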

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.1
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListMLELoss

@inproceedings{lan2013position,
    title={Position-aware ListMLE: a sequential learning process for ranking},
    author={Lan, Yanyan and Guo, Jiafeng and Cheng, Xueqi and Liu, Tie-Yan},
    booktitle={Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence},
    pages={333--342},
    year={2013}
}