SentenceTransformer based on BAAI/bge-m3
This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-m3
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
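The architecture above applies CLS-token pooling followed by L2 normalization. The sketch below walks through those two steps in plain Python. It uses made-up 4-dimensional token vectors in place of the model's 1024-dimensional outputs, and the helper names `cls_pool`, `l2_normalize`, and `dot` are illustrative, not library functions.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the CLS position); other tokens are ignored."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector to unit length, which is what a normalization layer does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings for two inputs (hypothetical values, 4 dims instead of 1024).
tokens_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
tokens_b = [[0.0, 4.0, 3.0, 0.0], [1.0, 2.0, 3.0, 4.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# For unit-length vectors, cosine similarity reduces to a plain dot product.
similarity = dot(emb_a, emb_b)
print(round(similarity, 4))
```

Because the embeddings are normalized, cosine similarity and dot product give the same ranking.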
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("mrhimanshu/finetuned-bge-m3")
# Run inference
sentences = [
'What are the features of TMLite?',
'##\u200cAbout this Document\u200c\n\nPurposeIntended AudienceSupplied Documentation\n\n\n\n##\u200cPurpose\u200c\n\nThe TMLite Features User Guide is designed for E500 release NLA 8.8.0 supporting TMLite, a TM500 platform and its features.\nNOTE: All beta and non-commercial releases should be uninstalled before installing this release.\n\n\n\n##\u200cIntended Audience\u200c\n\nThis document is intended to be used by Development Engineers and Project/ Engineering Management.\n\n\n\n##\u200cSupplied Documentation\u200c\n\nFor the complete list of supplied documentation, refer to the Supported Documents section in the E500 Release Notes NLA 8_8_0_45000-533.\n\n\n\n##\u200c1 Introduction\u200c\n\n1.1 Hardware Compatibility1.1.1 Supported Hardware Platforms\n\n\n\n##\u200c1.1 Hardware Compatibility\u200c\n\n1.1.1 Supported Hardware Platforms\n\n\n\u200c1.1.1 Supported Hardware Platforms\u200c\nFor TM500 benchtop systems:\n\n Benchtop Hardware Configuration/ Number of Radio Cards| Max Number of 2x2\nCells| Max Number of 4x4 Cells| Access Concentrator (ACE) Required \n---|---|---|--- \nTMLite (TK1071)| 1*| 1| No \nTMLite2 (HW721)| 1~| 1~| No \n*Also supports upto 2 cells, but with a maximum BW of 20 MHz per cell.^Also supports upto 4 4x4 cells when 4x4-HD and PRC Spectrum Sharing (Channelisation) are used.^^ Also supports upto 8 2x2 FDD 20 MHz cells when 2x2-DD and PRC Spectrum Sharing (Channelisation) are used.** Either one NR cell plus one LTE cell per server or upto two NR cells per server~ NR Cells (LB and MB) only.\n\n\n\n\n\n##\u200c2 TMLite (Single Server Systems)\u200c\n\n\n Released in| NLA_4_58_0 \n---|--- \nLicense required| TK1071 \nFML status| FML1 in NLA_4_58_0 \nSupported platform| HW704 that supports 1CC 2x2 and 1CC 4x4 NR SA under\nlicense bundle TK5510 NOTE: Interconnect to Mk3 TM500 benchtop is not\napplicable on HW704. \nDescription| TMLite, a new TM500 platform, utilizes a single server and each\nserver supports only one NR cell. 
The NR cell can support the following\naspects:\uf0b7 FR1 (Numerology 0 or 1) or FR2 (Numerology 3)\uf0b7 Upto 32 UEs\uf0b7 2x2 UL\nand DL MIMO\uf0b7 4x4 DL MIMO for Numerologies 0 and 1 only\uf0b7 K1 and K2 values:o For\n2x2 SA (NR only cell):\uf0d8 Numerology 0: K1 >= 1 and K2 >= 1\uf0d8 Numerology 1: K1 >=\n1 and K2 >= 2\uf0d8 Numerology 3: K1 >= 2 and K2 >= 3o For 4x4, SA and all NSA\npermutations using interconnect to Mk3 TM500 Benchtop (4x4 and 2x2):\uf0d8 Single\nUE testing supports the same K1 and K2 values as 2x2 SA above\uf0d8 32 UE testing\nrequires K1 >= 4 and K2 > = 4NOTE: TMLite can be connected to a Mk3 TM500\nbenchtop system to facilitate NSA (ENDC) testing and NSA 4x4 DL MIMO for\nNumerology 3 connected to a Mk3 TM500 benchtop is not supported. \nLimitations| This feature does not support the following aspects:\uf0b7 4x4 DL MIMO\nfor Numerology\uf0b7 UL DST-S-OFDM\uf0b7 Fast fading\uf0b7 Flexible PDCCH monitoring\n\n\n2.1 NSA on a TMLite2.2 2CC 2x2 NR Cells on TMLite2.3 5G O-RAN FH NR SA 1CC 4x4 on TMLite\n\n\n\n##\u200c2.1 NSA on a TMLite\u200c\n\n\n Description| TMLite supports 1 LTE cell (AoS Architecture) and 1 NR cell with\nthe following configurations:\uf0b7 Upto 20MHz bandwidth per carrier (i.e., max NR\ncarrier bandwidth is 20MHz)\uf0b7 Upto 2 layers in DL and 2 layers in UL\uf0b7 Upto 32\nUEs\uf0b7 Numerologies 0 (LB) and 1 (MB) on NR carrier\uf0b7 SA support on the single NR\ncell\uf0b7 NSA support using both carriers (LTE anchor provided by AoS cell) \n---|--- \n|\n\n\n\n Limitations| This feature does not support the following aspects:\uf0b7 4x4 DL MIMO\nfor Numerology\uf0b7 UL DST-S-OFDM\uf0b7 Fast fading\uf0b7 Flexible PDCCH monitoring \n---|---\n\n\n\n\n\n##\u200c2.2 2CC 2x2 NR Cells on TMLite\u200c\n\n\n Description| TMLite supports two NR cells with the following configurations:\uf0b7\nUpto 20MHz bandwidth per carrier (i.e., maximum NR carrier banwidth is 20MHz)\uf0b7\nUpto 2 layers in DL and 2 layers in UL\uf0b7 Upto 32 UEs\uf0b7 
2CC DL and UL CA\uf0b7\nNumerologies 0 (LB) and 1 (MB) on NR carrierNOTE: Both NR carriers must be the\nsame Numerology, either 2 Numerology 0 carriers or 2 Numerology 1 carriers. \n---|--- \nLimitations| This feature does not support the following aspects:\uf0b7 4x4 DL MIMO\nfor Numerology\uf0b7 UL DST-S-OFDM\uf0b7 Fast fading\uf0b7 Flexible PDCCH monitoring\n\n\n\n\n\n##\u200c2.3 5G O-RAN FH NR SA 1CC 4x4 on TMLite\u200c\n\n\n Released in| NLA_5_16_0 \n---|--- \nLicense required| TK5522 \nFML status| FML1 in NLA_5_16_0 \nSupported platform| E500-AS2, such as HW700 (O-RAN only) or HW704 (ORAN/RF) \nDescription| This feature provides support for O-RAN FH in following aspects:\uf0b7\nSA 1CC 2x2 as well as 4x4\uf0b7 NR FR1, TDD/FDD, and SU-MIMO \nLimitations| This feature does not support the following aspects:\uf0b7 NB restart\nwith ORAN\uf0b7 2CC or higher configurations\uf0b7 More than 32 UEs\n\n\n\n\n\n##\u200c3 Soft UE_TMLite2\u200c\n\n3.1 TMLite2 256 UE O-RAN Bundle (4CC – 256 UE)3.2 TM500 5G O-RAN FH OPT 7-2 – Dedicated Port Per RU Link3.3 Supported Soft UE Licenses\n\n\n\n##\u200c3.1 TMLite2 256 UE O-RAN Bundle (4CC – 256 UE)\u200c\n\n\n Released in| NLA_6_18_0 \n---|--- \nLicense required| TK5541 \nFML status| FML1 in NLA_6_18_0 \nSupported platform| E500-AS2 (HW721) platform \nDescription| This feature supports the following functionalities:\uf0b7 ORAN and\nRF\uf0b7 CONF SYSTEM SWUE ON is needed before STRT\uf0b7 For 2x2 MB and 4x4 MB (15kHz\nand 30kHz), 100MHz BW:o MCS 27o 256 QAMo 256 UEs per system and 256 UEs per\nCCo Maximum 2CC on one servero Maximum 4CC on two servers using InfiniBand \nLimitations| This feature has the following limitations:\uf0b7 Excludes MU-MIMO\uf0b7\nBenign PDCCH configuration (maximum number of decodes 2000)\uf0b7 The SE per slot\nis limited to 32, but only 8 PUSCH per slot per HWI.\uf0b7 minK values are one more\nthan AS2 single server solution\uf0b7 DL decoding capability is 3~4dB worse than\nAS2 single server\uf0b7 K1 and K2 
values:o For Numerology 0, minK1 = 2 and minK2 =\n2o For Numerology 1, minK1 = 3 and minK2 = 3\n\n\n\n\n\n##\u200c3.2 TM500 5G O-RAN FH OPT 7-2 – Dedicated Port Per RU Link\u200c\n\n\n Released in| NLA_8_4_0 \n---|--- \nLicense required| TK2222 \nFML status| FML1 in NLA_8_4_0 \nSupported platform| E500-AS2 (HW720/ HW721) platform \nDescription| This feature supports the following functionalities:\uf0b7 eCPRIlink\non dedicated physcial port per carrier/ RU for SoftUEo Dedicated NIC\n(Mellanox) for eCPRI with 2x port\uf0b7 Dedicated NIC (Mellanox) for RDA\n(RTRTK1167)\n\n\n\n | NOTE: SoftUE can support 2CCs with 2RUs on single server with each RU a\ndedicated port. \n---|---\n\n\n\n< Previous | Contents\n\n##\u200c3.3 Supported Soft UE Licenses\u200c',
'High Level Tech Spec\nAS2 spec can further scale with optimizations (for example beyond 36 CC, 24 MU-MIMO layers, More RU etc)\nData rates can be optionally extended to beyond 25Gbps with NIC UG\nviavisolutions.com\nCommon SW & Tools across deployments allows seamless transition\nviavisolutions.com\n© 2022 VIAVI Solutions Inc. 7\nE500-AS2 Development Themes & Examples\nFor discussion purposes only\nviavisolutions.com\nVIAVI//Restricted\n© 2022 VIAVI Solutions Inc.\nProprietary and Confidential\nSystem Throughput\nCA & MIMO\nDensity Increase\nO-RU\n16-24 Layers\nLatency Decrease\nE500 AS2\nURLLC\n# UEs\nPerforman ce\nAdditional FR2 UEs\n24 Layer MU-MIMO\nMulti Pcell per CA Group\nIncreased Tx\n/UEs per Slot\nSystem Level LTE/NR UEs\nIncreased # RedCap UEs\nTime Sensitive Logging\n36-48\nCarriers\n12 CC CA\n50Gbps\n25Gbps\nExtending the functionality and performance of the current systems\nDeployment Agnostic Software Solution\nOne Application Software across multiple deployments\nCommon SW deployed onto whatever platform is needed to achieve the test\nSame RANtoCoreTM Application Software: Features, Automation…\nTM500\nTM500 AS2\nRack\nRack\nx86\nSW architecture supports private cloud deployment\nAccess from anywhere, anytime, scale when needed etc\nSoft\nMk4.1, MK4.3 CLA/AS\nE500 Mk4.1, MK4.3 CLA/HYB/AS\nE500 AS2\nSoft UE\nUSEoft\nUSEoft\nUSEoft\nUCSEoonfttai\nnUeErs\nCloud Performance & Scale\nFuture-Proof Roadmap & Performance\nLeading roadmap commitment on all deployment models\nRel-16, Rel-17, Rel-18…\nPerformance optimizations on all deployments\nTest Features: e.g. 
channel models evolved to support Rel-16 indoors scenarios\nAS2 with evolved solution architecture pushes the performance envelope\nAt least double #CC, #Layers, for high process demanding numerologies\nHigher system throughput\nIncrease in supported aggregated bandwidth and combinations\nLower latency while still maintaining high #UE & #SE\nMore users (up to 12 users per cabinet)\nHigher spec servers in AS2 vs AS\nEnabling support for higher-end use cases in the future\nGreener\nEvolve E500 solution, AS2, has reduced power consumption/per bit /per throughput vs existing deployments\nAround 34% reduced power rating per carrier as compared to E500 Mk4.3 SVR(AS).\nAdditionally, we are working on developing software-oriented E500 power-saving solutions for all the existing install base.\nSaving is achieved when servers in the E500 partition are in an idle state.\nServers that are not selected into an E500 ASC group will go to (or remain in) a ‘sleep’ mode (low power states).\nInitial in-house measurement shows ~10% power saving at an E500 Mk4.3 SVR(AS) in an idle state after enabling this solution.\nQuestions\nWill the AS2 support more Blind Decodes than AS?\no How much more?\no Increased Tx /UEs per Slot\n► Blind Decodes should no longer be an issue on any of the 5G Systems. So there are no plans to increase this further unless absolutely necessary\n► The Tx\\UEs per Slot is already much higher than Ericsson requires but with AS2 we aim to increase this further without impacting K Values\nAS2 will support “Rel-16, Rel-17, Rel-18…”, what about AS?\n► Please refer to the main VIAVI message on the roadmap – Exec Summary slide from the main deck.\nHow do we explain “Future Proof” (AS was also “Future Proof”)?\n► The AS system still remains future proof. 3GPP features and roadmap is developed on the AS systems. 
Keeping in mind that AS technology is equal to MK4.3 this is already 6 years and keeps on delivering.\n► Same as above\nSize/footprint aspects\n► ?\nExec Summary\nVIAVI is committed to\nSupport the 3GPP feature roadmap on all the system variations.\nProvide common software across all the hardware deployments.\nPreserve and re-use existing investment by offering upgrade paths to uplift the performance further.\nProvide evolved solutions to support additional performance needs.\nMore clarification regarding “Secure product”\nWill we keep up with new BIOS versions from DELL\n► The current process is not different for AS2. So this is a general H/W question for all the available systems that have servers.\nUpgrade from AS to AS2?\n► Radio card can be re-used between the 2 systems. Different approaches will have to be explored as long as they are commercially viable.\nCan AS2 be rack mounted, i.e. no cabinet?\n► Yes. Currently not available but we are working to release the ability of having a rack mounted system\n36 CC 4x4\no Any limitation on BW when using 36 CC?\n► This is in feasibility and design phase. Most likely some of the Carriers will have to be reduced BW but currently TBD. Further communications to be follow once feasibility work is completed\nData rates, is it correct that if DL only can achieve 25 Gbps?\n► The current NIC installed in the system is 25Gbps. There is however an additional slot available to add an additional NIC taking the overall TP to 50Gbps.\nVIAVI//Restricted\nOther topics\nList of selling points and benefits with AS2 -> See next slide\no Success factors from other customers\nAS2 Configuration Options\no Can the old RDAs be reused -> Open for discussion\no What are the use cases / justifications for Standalone AS2 versus AS -> Covered through the slides\nDo the TK3500 include both RDA and Core-Em?\no How to sell with or without Core-Em -> The H/W is Core-Em ready without any additional cost to Ericsson. 
Enabling the feature is only a s/w license (optional TK)\nURLLC\no Time Sensitive logging -> As part of TSN the TM500 will have to improve the logging performance in order to provide higher granularity.\nSelling Points for AS2\nIncreased Capacity on a Cabinet Level\nOn a Cabinet we increase from 12CC to 24CC. This is possible due to the higher performance servers used (additional MIPS) and S/W extensions\nReduced Power Consumption per Carrier\nLook at power values in the Matrix\nAbility to support large configs under Single user\n12CC 4x4 Single User vs 3 Users on AS (2 users on AS requires additional H/W)\nScaling on Aggregated Layers for MU-MIMO and CA\nAS: Up to 8 Carriers or 16Layers (with some exceptions on the layers for FR1 only)\nAS2: 12 Carriers or 24 Layers for FR1 and FR2\nAdditional Users\n3 users on AS vs 12 users on AS2 -> Provides better mapping of the H/W to different types of deployments and use cases.\nWhat is High Density and Double Density?\nThe Radio Card available in the E500-AS system supports 200MHz of Carrier BW.\nE500 can utilise this additional BW by allowing 2 contiguous(in frequency) Carriers to be connected on the same Radio – also referred to as Channelization of Carriers\nThis allows to double the amount of Carriers connected to each Radio Card\nFor 4x4 Carriers all the antenna ports are shared between 2 Carriers.\nFor 2x2 there are 2 sets of 2 Carriers in each Radio -> 4 Carriers 2x2 per PRC\nFE SERVER\n20 MHz 20 MHz 5 MHz 10 MHz\n20 MHz\n20 MHz\nCentre Freq.\n20 MHz\n20 MHz\nFES\n200 MHz 200 MHz\nWhat is High Density and Double Density? 
(cont)\nIn order to get the maximum benefit of this Radio capability, the BaseBand requirements should also be condensed\nFor “High Density” a 4x4 Carrier can be hosted by a single BBS (instead of 2)\n4 BBS + 2 Radios can support 2 Carriers 4x4 in a standard config\nWith 4x4 HD the same H/W can support 4 Carriers 4x4\nWhile for “Double Density” 2 Carriers 2x2 (with reduced BW) can share the same BBS\n4 BBS + 2 Radios can support 4 Carriers 2x2 in a standard config\nWith DD the same H/W can support 8 Carriers 2x2\nTM500 Double Density (DD)\nIn the standard E500 architecture, one baseband server (BBS) provides for 1CC 2x2\nDouble Density provides improved hardware utilization, enabling less hardware for smaller systems, applicable primarily for NR\nDouble Density:\nprovides for 2CC 2x2 20MHz per BBS (up to 4CC per PRC)\nis applicable on a per-user basis – max 8CC 2x2 per user\nis applicable to NR\nExample benefits – improved hardware utilization:\n2 BBS can support 4CC\n4 BBS will support 8CC\nAdditional CC for existing hardware\nSpectrum sharing capability is a pre-requisite\nStandard Config: 1 HL Server, 4 BB Servers\n\n PRC 0| | PRC 1 \n---|---|---\n\n\n\n HL SERVER \n--- \nBB SERVER 0 \nBB SERVER 1 \nBB SERVER 2 \nBB SERVER 3 \nPRC 0 PRC 1FE SERVER 0',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
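In a retrieval setting, the first sentence above acts as the query and the remaining two as candidate passages. The sketch below ranks passages by cosine similarity. It uses small hand-written vectors in place of `model.encode` output so that it runs without downloading the model, and `rank_passages` is a hypothetical helper, not part of Sentence Transformers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_embedding, passage_embeddings):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_embedding, p)) for i, p in enumerate(passage_embeddings)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Stand-in vectors (assumed values); with the real model, the query would be
# embeddings[0] and the passages embeddings[1:].
query = [1.0, 0.0, 1.0]
passages = [[0.9, 0.1, 0.8], [0.0, 1.0, 0.1]]

ranking = rank_passages(query, passages)
best_index, best_score = ranking[0]
print(best_index, round(best_score, 3))
```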
Training Details
Training Dataset
Unnamed Dataset
- Size: 18,185 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
Column | Type | Min tokens | Mean tokens | Max tokens |
---|---|---|---|---|
sentence_0 | string | 6 | 18.97 | 75 |
sentence_1 | string | 189 | 1923.07 | 2730 |
- Samples:
sentence_0: What is the purpose of SETP L0_UL_CTRL_NR_PRACH_POWER_DENSITY_MODE?
sentence_1:
##A.19 RRC_NR_ACQUIRE_SIB1
This SETP is now obsolete.
This SETP enables/disables the acquisition of SIB1 in NR. To disable the SIB acquisition when required in case of ENDC tests, this command is used with value 0.
SETP RRC_NR_ACQUIRE_SIB1
N = 0 (disable) or 1 (enable). Default = 1
##A.20 L2_MAC_NR_ENABLE_HARQ_CONTENTION_CHECK
This SETP enables/disables the HARQ contention resolution check for MSG4 in NR SA.
SETP L2_MAC_NR_ENABLE_HARQ_CONTENTION_CHECK
N = 0 (disable) or 1 (enable). Default = 1
This SETP command is recommended to be used with value 0 if the K1 value for MSG4 in NR SA is set less than the default value of 4. This would configure the system to bypass the HARQ contention resolution check for MSG4.
##A.21 L0_UL_CTRL_NR_ENABLE_UL_POWER_CONTROL
This SETP turns on the 3GPP based Open Loop Power Control functionality.
SETP L0_UL_CTRL_NR_ENABLE_UL_POWER_CONTROL
N = 0 (disable) or 1 (enable). Default = 1
##A.22 NR_UL_SRP_SRS_PWR_SCALE
This SETP ...
sentence_0: How does the compressor generate IRs and non-IRs in the combined radio context?
sentence_1:
Parameter name
sentence_0: What are the safety measures for handling and disposing of the E500 42U 5G Network Tester at the end of its life cycle?
sentence_1:
The E500 42U 5G Network Tester is designed to meet the product safety standard BS/EN
61010-1:2010, and is engineered with ease of use and safety as prime considerations. When installing and operating the E500 42U 5G Network Tester you must follow the warnings, cautions and recommendations shown in this document.
How safety information is shownWarningsCautionsHazard and information symbolsGeneral conditions of useWarnings and precautionsHeavy InstrumentElectrical hazards (AC supply voltage)InstallationLaser radiationInitial visual inspectionGrounding the systemTransportationStabilityToxic hazardsRF hazardNoise hazardSuitability for useDamaged CablesIsolating EquipmentRemoving and installing components in a serverAvoiding damage to the equipmentLoose cablesMechanical protectionGeneral RF connector careConnecting and torqueing SMA connectorsConnecting and torqueing N connectorsElectrostatic discharge (ESD)VentilationPowering down the cabinetCleaningRoutine safety testing and inspectionR...
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
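MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair in a batch as a positive and every other sentence_1 in the same batch as a negative. The sketch below is a rough version of that computation with the parameters listed above (scale 20.0, cosine similarity), written in plain Python on toy vectors. It illustrates the idea and is not the library's implementation; `mnr_loss` is a hypothetical name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over the batch, where row i's correct class is column i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of scaled similarities.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of 4 pairs (vectors are made up for illustration).
anchors = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
aligned = [[0.9, 0.1, 0, 0], [0.1, 0.9, 0, 0], [0, 0, 0.9, 0.1], [0, 0, 0.1, 0.9]]
shuffled = aligned[::-1]  # each anchor is now paired with the wrong passage

print(mnr_loss(anchors, aligned) < mnr_loss(anchors, shuffled))
```

The loss is near zero when each anchor is most similar to its own passage and grows when the pairing is wrong. A larger batch supplies more in-batch negatives per anchor.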
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
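With lr_scheduler_type linear, warmup_steps 0, and learning_rate 5e-05, the learning rate decays linearly from its initial value to zero over the run. The sketch below models that schedule, assuming 4,547 total optimizer steps (18,185 samples at batch size 4 on a single device with no gradient accumulation). The device count is an assumption that this card does not state, and `linear_lr` is a hypothetical helper.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at a given step under linear warmup followed by linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 4547  # assumed: ceil(18185 / 4) on one device

# Print the learning rate at the start, two intermediate points, and the end.
for step in (0, 1000, 2500, 4547):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```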
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.1100 | 500 | 0.236 |
0.2199 | 1000 | 0.1539 |
0.3299 | 1500 | 0.1362 |
0.4399 | 2000 | 0.1212 |
0.5498 | 2500 | 0.1094 |
0.6598 | 3000 | 0.0963 |
0.7697 | 3500 | 0.0963 |
0.8797 | 4000 | 0.126 |
0.9897 | 4500 | 0.1049 |
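The Epoch column follows from the dataset size and batch size: 18,185 samples at 4 per batch give 4,547 steps per epoch, so step 500 corresponds to 500 / 4547 ≈ 0.1100 epochs. The check below reproduces several logged values. It assumes a single device, no gradient accumulation, and a final partial batch that is kept (dataloader_drop_last is False).

```python
import math

NUM_SAMPLES = 18185
BATCH_SIZE = 4

# 18185 / 4 = 4546.25, rounded up because the last partial batch is kept.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)

# Step -> epoch value as it appears in the training log table.
logged = {500: "0.1100", 1000: "0.2199", 2500: "0.5498", 4500: "0.9897"}
for step, epoch_in_log in logged.items():
    computed = f"{step / steps_per_epoch:.4f}"
    print(step, computed, computed == epoch_in_log)
```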
Framework Versions
- Python: 3.11.7
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.1.1+cu121
- Accelerate: 1.5.2
- Datasets: 2.14.5
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}