CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: microsoft/MiniLM-L12-H384-uncased
- Maximum Sequence Length: 512 tokens
- Number of Output Labels: 1 label
- Training Dataset: ms_marco
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Cross Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Cross Encoders on Hugging Face
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listmle")

# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
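The relationship between the raw scores and the ranked output can be illustrated without loading the model. The following is a minimal sketch, not the library's implementation: `rank_from_scores` is a hypothetical helper, and the scores are made-up values standing in for the output of `model.predict(pairs)`.

```python
from typing import Dict, List, Sequence, Union


def rank_from_scores(scores: Sequence[float]) -> List[Dict[str, Union[int, float]]]:
    """Hypothetical helper: order document indices by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Made-up scores, one per candidate document (assumption, not real model output).
example_scores = [7.2, -1.5, 3.1]
ranking = rank_from_scores(example_scores)
print(ranking)
```

Each entry keeps the index of the document in the original list, so the reranked texts can be recovered by indexing back into the candidate list.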
Evaluation
Metrics
Cross Encoder Reranking
- Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
- Evaluated with CrossEncoderRerankingEvaluator with these parameters: { "at_k": 10, "always_rerank_positives": true }
Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
---|---|---|---|
map | 0.3712 (-0.1184) | 0.2849 (+0.0239) | 0.4117 (-0.0079) |
mrr@10 | 0.3590 (-0.1185) | 0.4289 (-0.0709) | 0.4104 (-0.0163) |
ndcg@10 | 0.4330 (-0.1074) | 0.2706 (-0.0545) | 0.4660 (-0.0347) |
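For readers unfamiliar with the three metrics in the table, the sketch below computes them for a single query from the relevance labels of its documents, listed in reranked order. It uses the textbook definitions with binary labels and a linear gain; the evaluator's exact conventions (tie handling, gain function, averaging over queries) may differ, and the function names are illustrative only.

```python
import math
from typing import Sequence


def mrr_at_k(labels: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, label in enumerate(labels[:k], start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def average_precision(labels: Sequence[int]) -> float:
    """Mean of precision values taken at each relevant document's rank."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg_at_k(labels: Sequence[int], k: int = 10) -> float:
    """DCG of the top k divided by the DCG of the ideal ordering."""
    def dcg(ordered: Sequence[int]) -> float:
        return sum(l / math.log2(r + 1) for r, l in enumerate(ordered, start=1))

    ideal = dcg(sorted(labels, reverse=True)[:k])
    return dcg(labels[:k]) / ideal if ideal > 0 else 0.0


# One relevant document, ranked second out of three.
labels_in_ranked_order = [0, 1, 0]
print(mrr_at_k(labels_in_ranked_order))
print(average_precision(labels_in_ranked_order))
print(ndcg_at_k(labels_in_ranked_order))
```

The reported `map` corresponds to the average of the per-query average precision over all queries in a dataset.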
Cross Encoder Nano BEIR
- Dataset: NanoBEIR_R100_mean
- Evaluated with CrossEncoderNanoBEIREvaluator with these parameters: { "dataset_names": [ "msmarco", "nfcorpus", "nq" ], "rerank_k": 100, "at_k": 10, "always_rerank_positives": true }
Metric | Value |
---|---|
map | 0.3559 (-0.0341) |
mrr@10 | 0.3994 (-0.0686) |
ndcg@10 | 0.3898 (-0.0655) |
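The NanoBEIR_R100_mean values appear to be the unweighted mean of the three per-dataset results in the Cross Encoder Reranking table. The check below, written under that assumption, recomputes the means from the rounded per-dataset numbers; small differences in the last digit come from that rounding.

```python
# Per-dataset values copied from the Cross Encoder Reranking table.
per_dataset = {
    "map": {"NanoMSMARCO_R100": 0.3712, "NanoNFCorpus_R100": 0.2849, "NanoNQ_R100": 0.4117},
    "mrr@10": {"NanoMSMARCO_R100": 0.3590, "NanoNFCorpus_R100": 0.4289, "NanoNQ_R100": 0.4104},
    "ndcg@10": {"NanoMSMARCO_R100": 0.4330, "NanoNFCorpus_R100": 0.2706, "NanoNQ_R100": 0.4660},
}

# Values copied from the NanoBEIR_R100_mean table.
reported_mean = {"map": 0.3559, "mrr@10": 0.3994, "ndcg@10": 0.3898}

# Assumption: the mean is a simple average with equal weight per dataset.
means = {metric: sum(vals.values()) / len(vals) for metric, vals in per_dataset.items()}

for metric, value in means.items():
    print(f"{metric}: recomputed {value:.4f}, reported {reported_mean[metric]:.4f}")
```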
Training Details
Training Dataset
ms_marco
- Dataset: ms_marco at a47ee7a
- Size: 78,704 training samples
- Columns: query, docs, and labels
- Approximate statistics based on the first 1000 samples:
Property | query | docs | labels |
---|---|---|---|
type | string | list | list |
min | 11 characters | 3 elements | 3 elements |
mean | 33.89 characters | 6.50 elements | 6.50 elements |
max | 101 characters | 10 elements | 10 elements |
- Samples:
Each sample below lists its query, docs, and labels in that order.
elysia meaning origin
['Meaning of Elysia. Latin-American name. In Latin-American, the name Elysia means-the blessed home.The name Elysia originated as an Latin-American name. The name Elysia is most often used as a girl name or female name. Latin-American Name Meaning-the blessed home. Origin-Latin-America. ', 'The Greek name Elysia means-sweet; blissful. Mythology: Elysium was the dwelling place of happy souls. ', 'Here are pictures of people with the name Elysia. Help us put a face to the name by uploading your pictures to BabyNames.com! ', 'The meaning of Elyssa has more than one different etymologies. It has same or different meanings in other countries and languages. The different meanings of the name Elyssa are: 1 Hebrew meaning: My God is a vow. 2 Greek meaning: My God is a vow. 3 English meaning: My God is a vow.', 'Elysia is a rare given name for women. Elysia is an equally unique last name for all people. (2000 U.S. Census). Displayed below is an analysis of the popularity of the girl name Ely...
[1, 0, 0, 0, 0, ...]
what zone is highgate station
["For the station known from 1907 to 1939 as Highgate, see Archway tube station. Highgate is a London Underground station and former railway station in Archway Road, in the London Borough of Haringey in north London. The station takes its name from nearby Highgate Village. It is on the High Barnet branch of the Northern line, between Archway and East Finchley stations and is in Travelcard Zone 3. The station was originally opened in 1867 as part of the Great Northern Railway 's line between Finsbury Park and Edgware stations. Highgate station was originally constructed by the Edgware, Highgate and London Railway in the 1860s on its line from Finsbury Park station to Edgware station.", "At the time of the station's construction the first cable car in Europe operated non-stop up Highgate Hill to the village from outside the Archway Tavern, and this name was also considered for the station. It is located underneath the Archway Tower, at the intersection of Holloway Road, Highgate Hill, Ju...
[1, 0, 0, 0, 0, ...]
how much does thyroid surgery cost
['1 The price of thyroidectomy depends on the location where the surgery will be performed. 2 The cost can also differ depending on the experience and skill of the physician that will perform the surgery. 3 This is due to the boost in their reputation for the surgeries that they have performed. 1 On average, this procedure can cost anywhere from $16,000 to as much as $65,000 without any type of health insurance. 2 SurgeryCosts.net offers information to people who want to know more about', '1 For example, a one-month supply of the generic anti-thyroid drug methimazole costs about $30-$120, depending on the dose -- or, about $360-$1,440 a year. 2 And a one-month supply of the brand-name drug Tapazole costs about $90-$150 or more, depending on the dose -- or, about $1,080-$1,800 per year. 1 After the thyroid is destroyed by a radioactive iodine treatment or surgically removed, the patient typically needs to take thyroid hormone replacement such as levothyroxine, which typically costs ...
[1, 0, 0, 0, 0, ...]
- Loss: ListMLELoss with these parameters: { "lambda_weight": null, "activation_fct": "torch.nn.modules.linear.Identity", "mini_batch_size": 16, "respect_input_order": true }
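To make the loss concrete, here is a minimal pure-Python sketch of the plain ListMLE objective: the negative log-likelihood of the ideal document ordering under a Plackett-Luce model. It is an illustration under stated assumptions, not the sentence-transformers implementation. It assumes the scores are already listed from most to least relevant (which is how `respect_input_order: true` appears to treat the input) and applies no position weighting (which is what `lambda_weight: null` appears to select). The function name is hypothetical.

```python
import math
from typing import Sequence


def listmle_loss(scores_in_ideal_order: Sequence[float]) -> float:
    """Plain ListMLE for one query.

    At each position i, the document that should come next competes against
    all documents not yet placed; the loss is the summed negative log of its
    softmax probability among them.
    """
    loss = 0.0
    for i in range(len(scores_in_ideal_order)):
        remaining = scores_in_ideal_order[i:]
        peak = max(remaining)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in remaining))
        loss += log_denominator - scores_in_ideal_order[i]
    return loss


# Scores that agree with the ideal order give a lower loss than reversed ones.
print(listmle_loss([2.0, 1.0, 0.0]))
print(listmle_loss([0.0, 1.0, 2.0]))
```

With identical scores for n documents the loss equals log(n!), since every ordering is equally likely under the model.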
Evaluation Dataset
ms_marco
- Dataset: ms_marco at a47ee7a
- Size: 1,000 evaluation samples
- Columns: query, docs, and labels
- Approximate statistics based on the first 1000 samples:
Property | query | docs | labels |
---|---|---|---|
type | string | list | list |
min | 9 characters | 2 elements | 2 elements |
mean | 33.94 characters | 6.00 elements | 6.00 elements |
max | 99 characters | 10 elements | 10 elements |
- Samples:
Each sample below lists its query, docs, and labels in that order.
what are some facts penguin enemies
['Penguins are social birds. Many species feed, swim and nest in groups. During the breeding season, some species form large groups, or “rookeries”, that include thousands of penguins. Each penguin has a distinct call, allowing individuals to find their mate and their chicks even in large groups. ', 'Breeding
Gentoo Penguin Facts. Gentoo penguins are commonly found to breed across sub-Antarctic islands. Some of the notable colonies include Kerguelen islands, Falkland islands, and South Georgia, with fewer numbers also inhabit in the Heard Islands, Macquarie Islands, Antarctic Peninsula, and South Shetland Islands. How about summarizing some of the most interesting and rarely known gentoo penguin facts such as gentoo penguins habitat, diet, breeding, and predators. The gentoo penguins are simply characterized by the broad white stripe extending like a bonnet across the top of its head', 'Predators
oral surface definition zoology
['oral. adj. 1. spoken or verbal: an oral agreement. 2. (Medicine) relating to, affecting, or for use in the mouth: an oral thermometer. 3. (Zoology) of or relating to the surface of an animal, such as a jellyfish, on which the mouth is situated. 4. (Medicine) denoting a drug to be taken by mouth Compare parenteral: an oral contraceptive.', 'In a medusa, the oral surface and tentacles face downward. The body of a medusa is typically bell-shaped or umbrella-shaped, and medusae are free-swimming. In a typical medusa, the margins of the bell extend to form a shelf called the velum, which partially closes the open side of the bell.', "Definition of ORAL ARM. : one of the prolongations of the distal end of the manubrium of a jellyfish. ADVERTISEMENT. This word doesn't usually appear in our free dictionary, but the definition from our premium Unabridged Dictionary is offered here on a limited basis.", 'See also occlusal surface. labial surface the vestibular surface of the incisors and canin...
[1, 0, 0, 0, 0, ...]
what year was the protect act enacted
["The PROTECT Act of 2003 (Pub.L. 108–21, 117 Stat. 650, S. 151, enacted April 30, 2003) is a United States law with the stated intent of preventing child abuse. PROTECT is a backronym which stands for P rosecutorial R emedies and O ther T ools to end the E xploitation of C hildren T oday. The Department of Justice appealed the Eleventh Circuit's ruling to the U.S. Supreme Court. The Supreme Court reversed the Eleventh Circuit's ruling in May 2008 and upheld this portion of the act.", 'Copyright Renewal Act of 1992, title I of the Copyright Amendments Act of 1992, Pub. L. No. 102-307, 106 Stat. 264 (amending chapter 3, title 17 of the United States Code, by providing for automatic renewal of copyright for works copyrighted between January 1, 1964, and December 31, 1977), enacted June 26, 1992. [Amendments to the Semiconductor Chip Protection Act of 1984], Pub. L. No. 100-159, 101 Stat. 899 (amending chapter 9, title 17, United States Code, regarding protection extended to semiconducto...
[1, 0, 0, 0, 0, ...]
- Loss: ListMLELoss with these parameters: { "lambda_weight": null, "activation_fct": "torch.nn.modules.linear.Identity", "mini_batch_size": 16, "respect_input_order": true }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- seed: 12
- bf16: True
- load_best_model_at_end: True
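These settings, together with the training set size, determine the step counts that show up in the training logs. The arithmetic below is a rough sketch that assumes a single GPU, no gradient accumulation, and a warmup length rounded up from `warmup_ratio` times the total number of steps; the variable and function names are illustrative.

```python
import math

train_samples = 78_704   # size of the training dataset
batch_size = 16          # per_device_train_batch_size
num_epochs = 1           # num_train_epochs
warmup_ratio = 0.1       # warmup_ratio

# Assumption: one device and gradient_accumulation_steps = 1.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

# Assumption: warmup steps are the ratio of total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def epoch_at_step(step: int) -> float:
    """Fraction of an epoch completed after the given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, warmup_steps)
print(epoch_at_step(250), epoch_at_step(4000))
```

Under these assumptions the epoch fractions agree with the Epoch column of the training logs, for example 0.0508 at step 250 and 0.8132 at step 4000.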
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 12
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
---|---|---|---|---|---|---|---|
-1 | -1 | - | - | 0.0536 (-0.4868) | 0.3415 (+0.0165) | 0.0633 (-0.4373) | 0.1528 (-0.3026) |
0.0002 | 1 | 15.0387 | - | - | - | - | - |
0.0508 | 250 | 13.8424 | - | - | - | - | - |
0.1016 | 500 | 12.2432 | 12.1961 | 0.0338 (-0.5066) | 0.3357 (+0.0107) | 0.0687 (-0.4319) | 0.1461 (-0.3093) |
0.1525 | 750 | 12.2166 | - | - | - | - | - |
0.2033 | 1000 | 12.1697 | 12.1567 | 0.0286 (-0.5118) | 0.3049 (-0.0202) | 0.0311 (-0.4696) | 0.1215 (-0.3339) |
0.2541 | 1250 | 12.1288 | - | - | - | - | - |
0.3049 | 1500 | 12.1364 | 12.1497 | 0.0389 (-0.5015) | 0.2523 (-0.0727) | 0.0284 (-0.4722) | 0.1065 (-0.3488) |
0.3558 | 1750 | 12.1556 | - | - | - | - | - |
0.4066 | 2000 | 12.134 | 12.1342 | 0.1969 (-0.3435) | 0.2295 (-0.0955) | 0.2666 (-0.2340) | 0.2310 (-0.2244) |
0.4574 | 2250 | 12.1346 | - | - | - | - | - |
0.5082 | 2500 | 12.0789 | 12.1369 | 0.2381 (-0.3023) | 0.2086 (-0.1164) | 0.3112 (-0.1895) | 0.2526 (-0.2027) |
0.5591 | 2750 | 12.1796 | - | - | - | - | - |
0.6099 | 3000 | 12.122 | 12.1233 | 0.2978 (-0.2426) | 0.2211 (-0.1039) | 0.3967 (-0.1039) | 0.3052 (-0.1501) |
0.6607 | 3250 | 12.1834 | - | - | - | - | - |
0.7115 | 3500 | 12.11 | 12.1241 | 0.3919 (-0.1486) | 0.2391 (-0.0860) | 0.4388 (-0.0619) | 0.3566 (-0.0988) |
0.7624 | 3750 | 12.1394 | - | - | - | - | - |
**0.8132** | **4000** | **12.0582** | **12.1232** | **0.4330 (-0.1074)** | **0.2706 (-0.0545)** | **0.4660 (-0.0347)** | **0.3898 (-0.0655)** |
0.8640 | 4250 | 12.152 | - | - | - | - | - |
0.9148 | 4500 | 12.0818 | 12.1178 | 0.4173 (-0.1232) | 0.2749 (-0.0502) | 0.4767 (-0.0240) | 0.3896 (-0.0658) |
0.9656 | 4750 | 12.1172 | - | - | - | - | - |
-1 | -1 | - | - | 0.4330 (-0.1074) | 0.2706 (-0.0545) | 0.4660 (-0.0347) | 0.3898 (-0.0655) |
- The bold row denotes the saved checkpoint.
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.222 kWh
- Carbon Emitted: 0.086 kg of CO2
- Hours Used: 0.721 hours
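Two derived quantities follow from these figures by simple division: the average power draw during training and the implied carbon intensity of the electricity. The snippet below is only back-of-the-envelope arithmetic on the reported numbers; neither derived value is reported by CodeCarbon here.

```python
energy_kwh = 0.222   # Energy Consumed
co2_kg = 0.086       # Carbon Emitted
hours = 0.721        # Hours Used

# Average power over the run, in kilowatts.
avg_power_kw = energy_kwh / hours

# Implied emissions per unit of energy, in kg of CO2 per kWh.
carbon_intensity = co2_kg / energy_kwh

print(f"average power: {avg_power_kw * 1000:.0f} W")
print(f"carbon intensity: {carbon_intensity:.3f} kg CO2/kWh")
```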
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.1
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
ListMLELoss
@inproceedings{lan2013position,
title={Position-aware ListMLE: a sequential learning process for ranking},
author={Lan, Yanyan and Guo, Jiafeng and Cheng, Xueqi and Liu, Tie-Yan},
booktitle={Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence},
pages={333--342},
year={2013}
}