SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
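Because the final Normalize() module rescales every embedding to unit length, cosine similarity coincides with a plain dot product. A quick sanity check, as a minimal sketch assuming this repository's model id (truong1301/bi-encode-HG-DOCS):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("truong1301/bi-encode-HG-DOCS")
emb = model.encode(["unit-norm sanity check"])  # numpy array of shape (1, 384)
# Normalize() guarantees unit-length rows, so the norm prints ~1.0
print(np.linalg.norm(emb[0]))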
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("truong1301/bi-encode-HG-DOCS")
# Run inference
sentences = [
'How can I use Shiny for Python to build and deploy a Hugging Face Space application?',
'Shiny for Python is a pure Python implementation of Shiny. This gives you access to all of the great features of Shiny like reactivity, complex layouts, and modules without needing to use R. Shiny for Python is ideal for Hugging Face applications because it integrates smoothly with other Hugging Face tools. To get started deploying a Space, click this button to select your hardware and specify if you want a public or private Space. The Space template will populate a few files to get your app started. app.py This file defines your app’s logic. To learn more about how to modify this file, see the Shiny for Python documentation. As your app gets more complex, it’s a good idea to break your application logic up into modules. Dockerfile The Dockerfile for a Shiny for Python app is very minimal because the library doesn’t have many system dependencies, but you may need to modify this file if your application has additional system dependencies. The one essential feature of this file is that it exposes and runs the app on the port specified in the space README file (which is 7860 by default). requirements.txt The Space will automatically install dependencies listed in the requirements.txt file. Note that you must include shiny in this file.',
'Performs a real-valued forward FFT on the given input buffer and stores the result in the given output buffer. The input buffer must contain real values only, while the output buffer will contain complex values. The input and output buffers must be different. Kind: instance method of P2FFT. Throws: The output buffer. The input buffer containing real values.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Training Details
Training Dataset
Unnamed Dataset
- Size: 6,300 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 8 tokens<br>mean: 26.77 tokens<br>max: 189 tokens | min: 5 tokens<br>mean: 116.82 tokens<br>max: 256 tokens |
- Samples:
| anchor | positive |
|---|---|
| How can I configure the TextEncoderOnnxConfig class for optimal ONNX export of a text encoder model intended for information retrieval? | (config: PretrainedConfig, task: str = 'feature-extraction', preprocessors: typing.Optional[typing.List[typing.Any]] = None, int_dtype: str = 'int64', float_dtype: str = 'fp32', legacy: bool = False) Handles encoder-based text architectures. |
| How does PyTorch's shared tensor mechanism handle loading and saving, and what are its limitations? | The design is rather simple. We’re going to look for all shared tensors, then look for all tensors covering the entire buffer (there can be multiple such tensors). That gives us multiple names which can be saved; we simply choose the first one. During load_model, we load a bit like load_state_dict does, except we look into the model itself to check for shared buffers, ignoring the “missed keys” which were actually covered by virtue of buffer sharing (they were properly loaded since there was a buffer that loaded under the hood). Every other error is raised as-is. Caveat: this means we’re dropping some keys within the file, so if you check the keys saved on disk, or if you use load_state_dict, you will see some “missing tensors”. Unless we start supporting shared tensors directly in the format, there’s no real way around it. |
| How can I manage access tokens to secure my organization's resources? | Tokens Management enables organization administrators to oversee access tokens within their organization, ensuring secure access to organization resources. |
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 1024 }
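The two-column layout and the loss parameters above are enough to sketch how the data and loss could be wired together. A minimal sketch; the example pair is copied from the samples table, and similarity_fct already defaults to cos_sim in the library:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Two-column (anchor, positive) dataset mirroring the layout described above
train_dataset = Dataset.from_dict({
    "anchor": ["How can I manage access tokens to secure my organization's resources?"],
    "positive": ["Tokens Management enables organization administrators to oversee access tokens within their organization, ensuring secure access to organization resources."],
})

# Loss with the parameters reported in this card
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=1024)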
Evaluation Dataset
Unnamed Dataset
- Size: 700 evaluation samples
- Columns: anchor and positive
- Approximate statistics based on the first 700 samples:
| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 8 tokens<br>mean: 26.76 tokens<br>max: 67 tokens | min: 3 tokens<br>mean: 115.51 tokens<br>max: 256 tokens |
- Samples:
| anchor | positive |
|---|---|
| How can I configure a DecoderSequence object for optimal information retrieval using a list of decoders and a configuration object? | Creates a new instance of DecoderSequence. The configuration object. The list of decoders to apply. |
| How can the generation/logits_process.NoBadWordsLogitsProcessor static class be effectively integrated into a retrieval model to improve filtering of inappropriate content? | Kind: static class of generation/logits_process |
| How can I fine-tune the OpenVINO Sequence Classification model for improved information retrieval performance? | (model=None, config=None, **kwargs) Parameters: OpenVINO Model with a SequenceClassifierOutput for sequence classification tasks. This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving). (input_ids: typing.Union[torch.Tensor, numpy.ndarray], attention_mask: typing.Union[torch.Tensor, numpy.ndarray], token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, **kwargs) Parameters: The OVModelForSequenceClassification forward method overrides the __call__ special method. Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them. Example of sequence classification using transformers.pipeline: |
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 1024 }
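The evaluation split shares the same two-column layout, so it can be built the same way; continuing the sketch above with one pair from the samples table:

# Evaluation split with the same (anchor, positive) layout, used for the validation losses below
eval_dataset = Dataset.from_dict({
    "anchor": ["How can I configure a DecoderSequence object for optimal information retrieval using a list of decoders and a configuration object?"],
    "positive": ["Creates a new instance of DecoderSequence. The configuration object. The list of decoders to apply."],
})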
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- learning_rate: 2e-05
- weight_decay: 0.01
- num_train_epochs: 5
- warmup_ratio: 0.1
- warmup_steps: 50
- fp16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates
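Continuing the sketch above, these non-default values map directly onto SentenceTransformerTrainingArguments; output_dir is a hypothetical placeholder, and the exact original run setup is not confirmed by this card:

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # hypothetical placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=5,
    warmup_ratio=0.1,
    warmup_steps=50,
    fp16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,                  # base model from the loss sketch above
    args=args,
    train_dataset=train_dataset,  # (anchor, positive) pairs
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()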
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 50
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch  | Step | Training Loss | Validation Loss |
|--------|------|---------------|-----------------|
| 0.5076 | 100  | 0.308         | -               |
| 1.0152 | 200  | 0.179         | -               |
| 1.5228 | 300  | 0.127         | 0.0739          |
| 2.0305 | 400  | 0.0828        | -               |
| 2.5381 | 500  | 0.0528        | -               |
| 3.0457 | 600  | 0.0576        | 0.0436          |
| 3.5533 | 700  | 0.0396        | -               |
| 1.0152 | 200  | 0.0262        | 0.0379          |
| 2.0305 | 400  | 0.0159        | 0.0360          |
| 3.0457 | 600  | 0.0082        | 0.0340          |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 4.0.1
- Transformers: 4.47.0
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.3.1
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}