Fine-tuned with Transcripts + Documents v1
This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: nomic-ai/modernbert-embed-base
- Maximum Sequence Length: 1024 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
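For reference, the stack above (mean pooling over ModernBERT token embeddings followed by L2 normalization) can be assembled explicitly from sentence-transformers modules. This is a minimal sketch for illustration only; the module arguments are taken from the listing above, and loading the fine-tuned checkpoint directly (as in the Usage section) remains the intended path.

from sentence_transformers import SentenceTransformer, models

# Sketch only: rebuild an equivalent module stack from the base model.
# The fine-tuned weights live in the checkpoint loaded in the Usage section.
transformer = models.Transformer("nomic-ai/modernbert-embed-base", max_seq_length=1024)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
normalize = models.Normalize()

model = SentenceTransformer(modules=[transformer, pooling, normalize])
print(model)  # mirrors the architecture listing above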
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("schaitanya/modernbert-embed-base-transcript-documents-v1")
# Run inference
sentences = [
'What is the recommendation for a water birth if a woman has COVID-19?',
'- Specific medications and antibody treatment.\nIf you are very unwell, your healthcare team may advise that your baby needs to be born early to help with your own treatment and recovery. How and when this may happen will depend on your individual situation.\nIf I have COVID-19, will this affect where I give birth and my choice of pain relief in labour?\nIf you have symptoms and have tested positive for COVID-19 at the time of birth:\nIt is recommended that you give birth in a consultant led maternity unit where you and your baby can be monitored more closely during labour.\nIt is safe for you to have a vaginal birth, and if you and your baby are both well you do not need to have a planned caesarean birth. Your birth choices should be respected and followed as closely as possible.\nA caesarean birth may be recommended if you or your baby are unwell or there are other complications. However, your chance of needing an emergency caesarean birth may be higher than usual.\nAll the usual options for pain relief for labour and birth are available to you, however a water birth is not recommended. This is because it is harder to monitor and give you any treatments needed.\nIf I have COVID-19, will this affect care of my baby after birth?\nIf your baby is well and does not require care in the neonatal unit, you will stay together after you have given birth. Skin-to-skin contact is encouraged.\nHow you feed your baby is dependent on your own circumstances and preferences, and your choices will be supported. Breastfeeding may help pass protection from infections (including COVID-19) to your baby. There is no strong evidence to show that COVID-19 can be passed on in breast milk.',
"(0:00 - 4:09)\nSo what is the color of the cake? Pink or blue? What is the color of the nursery? Pink or blue? And what about the baby's clothes? Are they frilly skirts or the soccer shirts? Have you ever wondered what is it which decides the sex of the baby inside? How that little pea-shaped embryo grows into that little baby girl or a baby boy? Since ages it was the mother who was held responsible for the sex of the baby. But now we know it is the father who decides whether it is going to be a pretty little daughter or a handsome baby boy. So hello everyone, this is Dr. Anjali Kumar once again bringing you greetings from Maitri.\n\nMaitri is a space where we talk anything and everything about women's health. So today we are starting our pregnancy series season 2 with this very question which every parent wants to know. So the baby inherits its genes from both the parents.\n\nThe genes are present in the DNA and the DNA is present in the chromosomes and the chromosomes are present in the nucleus of the cell. Every human cell has 23 pairs of chromosomes. So total 46 chromosomes.\n\nEach pair inherited from one parent. 22 of these pairs are called autosomes. They look the same in both males and females.\n\nThe 23rd pair, the sex chromosomes, that's the special one. It differs between males and females. Females have two copies of X chromosomes which makes it XX, while males have one X and one Y which makes it XY.\n\nSo at the time of fertilization, the father's sperm and the mother's egg each contributes one sex chromosomes to the baby. The mother can contribute only X since it has two copies of X chromosomes only, while the father can contribute either X or Y chromosomes. So the baby's biological or the genetic sex which is male or female is determined by the chromosome which the father contributes.\n\nSo if the father contributes his Y chromosome, it will be a male baby which is XY, while if he contributes the X chromosome, it will be a female baby XX. Baby's sex is determined at the time of fertilization or the conception when the sperm fertilizes the egg. Now this typically happens around day 14 to maybe day 17 in women who have regular cycles.\n\nNow this is the time when you don't even know that you are pregnant. You might not be even expecting a pregnancy. This is the time when the baby's sex is decided.\n\nAfter that, nobody and nothing can ever change the genetic sex of the baby. No medicine, no food, no kada, no jariputi, no exercise can change the sex of the baby afterwards.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
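For retrieval-style use, the same model can rank candidate passages against a query. Because the final Normalize module L2-normalizes the embeddings, cosine similarity and dot product give the same ranking. The query and passages below are illustrative placeholders only.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("schaitanya/modernbert-embed-base-transcript-documents-v1")

query = "When does pre-eclampsia usually occur during pregnancy?"
passages = [
    "Pre-eclampsia is a condition that usually happens after 20 weeks of pregnancy.",
    "Skin-to-skin contact is encouraged after you have given birth.",
]

# Encode the query and candidate passages, then rank by cosine similarity
query_embedding = model.encode([query])
passage_embeddings = model.encode(passages)
scores = model.similarity(query_embedding, passage_embeddings)  # shape [1, 2]
best = int(scores.argmax())
print(passages[best], float(scores[0, best]))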
Evaluation
Metrics
Information Retrieval
- Evaluated with InformationRetrievalEvaluator (a minimal usage sketch follows the table below)
Metric | Value |
---|---|
cosine_accuracy@1 | 0.5191 |
cosine_accuracy@3 | 0.7441 |
cosine_accuracy@5 | 0.8 |
cosine_accuracy@10 | 0.8912 |
cosine_precision@1 | 0.5191 |
cosine_precision@3 | 0.248 |
cosine_precision@5 | 0.16 |
cosine_precision@10 | 0.0891 |
cosine_recall@1 | 0.5191 |
cosine_recall@3 | 0.7441 |
cosine_recall@5 | 0.8 |
cosine_recall@10 | 0.8912 |
cosine_ndcg@10 | 0.7051 |
cosine_mrr@10 | 0.6457 |
cosine_map@100 | 0.6503 |
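These numbers are produced by sentence-transformers' InformationRetrievalEvaluator. The snippet below is a hedged sketch of how such an evaluation is wired up; the queries, corpus, and relevance judgments are tiny placeholders, not the actual held-out split behind the table above.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("schaitanya/modernbert-embed-base-transcript-documents-v1")

# Placeholder data: query id -> text, doc id -> text, query id -> set of relevant doc ids
queries = {"q1": "When does pre-eclampsia usually occur during pregnancy?"}
corpus = {
    "d1": "Pre-eclampsia is a condition that usually happens after 20 weeks of pregnancy.",
    "d2": "Skin-to-skin contact is encouraged after you have given birth.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="example")
results = evaluator(model)
print(results)  # includes cosine_accuracy@k, cosine_precision@k, cosine_recall@k, cosine_ndcg@10, ...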
Training Details
Training Dataset
Unnamed Dataset
- Size: 6,116 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

 | anchor | positive |
---|---|---|
type | string | string |
details | min: 7 tokens, mean: 15.08 tokens, max: 33 tokens | min: 38 tokens, mean: 389.33 tokens, max: 683 tokens |
- Samples (anchor → positive):

  anchor: What lifestyle changes are recommended before pregnancy?
  positive:
  (0:02 - 0:50)
  Are you excited to be a father? How do you think you can help your wife or partner in this journey? Would you want to help your wife or partner during labour? Do you know how does the delivery occurs? Have you read something about the delivery in the baby care? So a lot has been written about women and pregnancy, but we do not talk much about the fathers. Is their role only up to providing the sperm to fertilize the egg? Is it all about the moms? So hello everyone, this is Dr. Anjali Kumar, once again bringing greetings from Maitri. Maitri is a space where we talk anything and everything about women's health.
  (0:50 - 1:06)
  But this time in this episode, we will talk about the fathers. We were not sure when to plan our family. She wanted a baby early and I wanted to wait for a few years.
  (1:07 - 1:28)
  Plan and talk when you want to plan the pregnancy. Plan well the career, the finances, visit a doctor together for the pre-conceptional checks, tests and the contraceptive ...

  anchor: Does the absence of symptoms indicate an absence of infection?
  positive:
  (0:00 - 0:21)
  Very important point. Some people with STDs may not actually have any symptoms. Now this means that the person is a carrier of infection but she is absolutely capable of transmitting the infection to the other person.
  So remember absence of symptoms does not mean absence of infection.

  anchor: When does pre-eclampsia usually occur during pregnancy?
  positive:
  What is pre-eclampsia?
  Pre-eclampsia is a condition that usually happens after 20 weeks of pregnancy. The exact cause of pre-eclampsia is not understood. It is usually a combination of:
  raised blood pressure (hypertension)
  protein in your urine (proteinuria).
  Sometimes pre-eclampsia can affect your liver, kidneys and blood clotting without protein in your urine.
  Pre-eclampsia is common, affecting between 1–5 in 100 women during pregnancy. It is usually mild but in a small number of cases, it can develop into a more serious illness. Around one in 200 women develop severe pre-eclampsia, which can be life-threatening for both you and your baby.
  How will I know if I have pre-eclampsia?
  Often you will have no symptoms and pre-eclampsia may be diagnosed for the first time at your routine antenatal appointments or during labour when you have your blood pressure checked and your urine tested.
  If you do develop symptoms they usually happen towards the end of your pregnancy but can also happen f...

- Loss: MultipleNegativesRankingLoss with these parameters:
  { "scale": 20.0, "similarity_fct": "cos_sim" }
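For context, a minimal training sketch with this loss might look like the following. It is not the original training script: the two-row dataset stands in for the 6,116 anchor/positive pairs, and MultipleNegativesRankingLoss is constructed with the parameters listed above (its library defaults).

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("nomic-ai/modernbert-embed-base")

# Placeholder anchor/positive pairs; the real dataset has 6,116 rows
train_dataset = Dataset.from_dict({
    "anchor": [
        "When does pre-eclampsia usually occur during pregnancy?",
        "Does the absence of symptoms indicate an absence of infection?",
    ],
    "positive": [
        "Pre-eclampsia is a condition that usually happens after 20 weeks of pregnancy.",
        "Absence of symptoms does not mean absence of infection.",
    ],
})

# In-batch negatives: every other positive in the batch serves as a negative for a given anchor
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()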
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 4
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- log_level: debug
- bf16: True
- tf32: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: debug
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
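The non-default values above map onto SentenceTransformerTrainingArguments roughly as in the sketch below. The output_dir is a placeholder, and save_strategy="epoch" is an assumption added here so that load_best_model_at_end can restore the best checkpoint.

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

# Sketch of the listed non-default hyperparameters; not the original training script
args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-embed-base-transcript-documents-v1",  # placeholder path
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: needed alongside load_best_model_at_end
    per_device_train_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    log_level="debug",
    bf16=True,
    tf32=True,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)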
Training Logs
Epoch | Step | Training Loss | cosine_ndcg@10 |
---|---|---|---|
0.4178 | 10 | 5.9152 | - |
0.8355 | 20 | 2.7824 | - |
0.9608 | 23 | - | 0.6781 |
1.2924 | 30 | 1.9575 | - |
1.7102 | 40 | 1.5202 | - |
1.9608 | 46 | - | 0.6943 |
2.1671 | 50 | 1.4008 | - |
2.5849 | 60 | 1.1741 | - |
2.9608 | 69 | - | 0.7031 |
3.0418 | 70 | 1.0995 | - |
3.4595 | 80 | 1.0416 | - |
3.8773 | 90 | 1.1648 | - |
3.9608 | 92 | - | 0.7051 |
- The final row (epoch 3.9608, step 92) denotes the saved checkpoint; its cosine_ndcg@10 of 0.7051 matches the metrics reported in the Evaluation section.
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.3.0
- Datasets: 3.4.1
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Model tree for schaitanya/modernbert-embed-base-transcript-documents-v1
- Base model: answerdotai/ModernBERT-base
- Fine-tuned from: nomic-ai/modernbert-embed-base