---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:6300
  - loss:CachedMultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: >-
      How can AnimateDiff, a motion adapter for pretrained diffusion models, be
      used to generate videos from images?
    sentences:
      - >-
        Performs a single real input radix-2 transformation on the provided data
        Kind: instance method of P2FFT The input data array The output data array
        The output offset The input offset The step
      - >-
        AnimateDiff is an adapter model that inserts a motion module into a
        pretrained diffusion model to animate an image. The adapter is trained
        on video clips to learn motion which is used to condition the generation
        process to create a video. It is faster and easier to only train the
        adapter and it can be loaded into most diffusion models, effectively
        turning them into “video models”. Start by loading a MotionAdapter. Then
        load a finetuned Stable Diffusion model with the AnimateDiffPipeline.
        Create a prompt and generate the video.
      - >-
        Utility class to handle streaming of tokens generated by whisper
        speech-to-text models. Callback functions are invoked when each of the
        following events occur: Kind: static class of generation/streamers
  - source_sentence: >-
      How to configure DeepSpeed, including ZeRO-2 and bf16 precision, for
      optimal performance with Intel Gaudi HPUs?
    sentences:
      - >-
        The DeepSpeed configuration to use is passed through a JSON file and
        enables you to choose the optimizations to apply. Here is an example for
        applying ZeRO-2 optimizations and bf16 precision: The special value
        "auto" enables you to automatically get the correct or most efficient
        value. You can also specify the values yourself but, if you do so, you
        should be careful not to have conflicting values with your training
        arguments. It is strongly advised to read this section in the Transformers
        documentation to completely understand how this works. Other examples of
        configurations for HPUs are proposed here by Intel. The Transformers
        documentation explains how to write a configuration from scratch very
        well. A more complete description of all configuration possibilities is
        available here.
      - >-
        Creates a new instance of TokenizerModel. The configuration object for
        the TokenizerModel.
      - >-
        Most Spaces should run out of the box after a GPU upgrade, but sometimes
        you’ll need to install CUDA versions of the machine learning frameworks
        you use. Please, follow this guide to ensure your Space takes advantage
        of the improved hardware.
  - source_sentence: >-
      Can DeBERTa's question-answering model be fine-tuned for improved
      information retrieval?
    sentences:
      - >-
        RegNetX is a convolutional network design space with simple, regular
        models with parameters: depth $d$, initial width $w_{0} > 0$, and slope
        $w_{a} > 0$, and generates a different block width $u_{j}$ for each
        block $j < d$. The key restriction for the RegNet types of model is that
        there is a linear parameterisation of block widths (the design space
        only contains models with this linear structure):
        $u_{j} = w_{0} + w_{a} \cdot j$. For RegNetX we have additional
        restrictions: we set $b = 1$ (the bottleneck ratio), $12 \leq d \leq 28$,
        and $w_{m} \geq 2$ (the width multiplier).
      - >-
        DeBERTa Model with a span classification head on top for extractive
        question-answering tasks like SQuAD (a linear layer on top of the
        hidden-states output to compute span start logits and span end logits).
        Kind: static class of models
      - >-
        The minimum length of the sequence to be generated. Corresponds to the
        length of the input prompt + min_new_tokens. Its effect is overridden
        by min_new_tokens, if also set. Kind: instance property of
        GenerationConfig. Default: 0
  - source_sentence: >-
      How can I efficiently upload models from supported libraries like
      Transformers to the Hugging Face Hub for improved information retrieval?
    sentences:
      - >-
        🤗 Diffusers is compatible with Habana Gaudi through 🤗 Optimum. Follow
        the installation guide to install the SynapseAI and Gaudi drivers, and
        then install Optimum Habana: To generate images with Stable Diffusion 1
        and 2 on Gaudi, you need to instantiate two instances: When you
        initialize the pipeline, you have to specify use_habana=True to deploy
        it on HPUs and to get the fastest possible generation, you should enable
        HPU graphs with use_hpu_graphs=True. Finally, specify a GaudiConfig
        which can be downloaded from the Habana organization on the Hub. Now you
        can call the pipeline to generate images by batches from one or several
        prompts: For more information, check out 🤗 Optimum Habana’s
        documentation and the example provided in the official GitHub repository.
      - 'While training and evaluating we record the following reward metrics:'
      - >-
        First check if your model is from a library that has built-in support to
        push to/load from the Hub, like Transformers, Diffusers, Timm, Asteroid,
        etc.: https://huggingface.co/docs/hub/models-libraries. Below we’ll show
        how easy this is for a library like Transformers: Some libraries, like
        Transformers, support loading code from the Hub. This is a way to make
        your model work with Transformers using the trust_remote_code=True flag.
        You may want to consider this option instead of a full-fledged library
        integration.
  - source_sentence: >-
      How can I use Shiny for Python to build and deploy a Hugging Face Space
      application?
    sentences:
      - >-
        Shiny for Python is a pure Python implementation of Shiny. This gives you
        access to all of the great features of Shiny like reactivity, complex
        layouts, and modules without needing to use R. Shiny for Python is ideal
        for Hugging Face applications because it integrates smoothly with other
        Hugging Face tools. To get started deploying a Space, click this button
        to select your hardware and specify if you want a public or private
        Space. The Space template will populate a few files to get your app
        started. app.py This file defines your app’s logic. To learn more about
        how to modify this file, see the Shiny for Python documentation. As your
        app gets more complex, it’s a good idea to break your application logic
        up into modules. Dockerfile The Dockerfile for a Shiny for Python app is
        very minimal because the library doesn’t have many system dependencies,
        but you may need to modify this file if your application has additional
        system dependencies. The one essential feature of this file is that it
        exposes and runs the app on the port specified in the space README file
        (which is 7860 by default). requirements.txt The Space will
        automatically install dependencies listed in the requirements.txt file.
        Note that you must include shiny in this file.
      - >-
        (**kwargs) A context manager that will add each keyword argument passed
        to os.environ and remove them when exiting. Will convert the values
        in kwargs to strings and upper-case all the keys. () A context manager
        that will temporarily clear environment variables. When this context
        exits, the previous environment variables will be back.
        (mixed_precision= 'no' save_location: str =
        '/github/home/.cache/huggingface/accelerate/default_config.yaml' use_xpu:
        bool = False) Parameters Creates and saves a basic cluster config to be
        used on a local machine with potentially multiple GPUs. Will also set
        CPU if it is a CPU-only machine. When setting up 🤗 Accelerate for the
        first time, rather than running accelerate config,
        [~utils.write_basic_config] can be used as an alternative for
        quick configuration. (local_process_index: int verbose:
        typing.Optional[bool] = None) Parameters Assigns the current process to
        a specific NUMA node. Ideally most efficient when having at least 2 CPUs
        per node. This result is cached between calls. If you want to override
        it, please use accelerate.utils.environment.override_numa_affinity.
        (local_process_index: int verbose: typing.Optional[bool] = None)
        Parameters Overrides whatever NUMA affinity is set for the current
        process. This is very taxing and requires recalculating the affinity to
        set; ideally you should use utils.environment.set_numa_affinity instead.
        (func_or_cls) Decorator to clean up accelerate environment variables set
        by the decorated class or function. In some circumstances, calling
        certain classes or functions can result in accelerate env vars being set
        and not being cleaned up afterwards. As an example, when calling:
        TrainingArguments(fp16=True, …) The following env var will be set:
        ACCELERATE_MIXED_PRECISION=fp16 This can affect subsequent code, since
        the env var takes precedence over TrainingArguments(fp16=False). This is
        especially relevant for unit testing, where we want to avoid the
        individual tests to have side effects on one another. Decorate the unit
        test function or whole class with this decorator to ensure that after
        each test, the env vars are cleaned up. This works for both
        unittest.TestCase and normal classes (pytest); it also works when
        decorating the parent class.
      - >-
        Performs a real-valued forward FFT on the given input buffer and stores
        the result in the given output buffer. The input buffer must contain
        real values only, while the output buffer will contain complex values.
        The input and output buffers must be different. Kind: instance method
        of P2FFT. Throws: The output buffer. The input buffer containing real
        values.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
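
Because the final Normalize() module L2-normalizes the pooled embeddings, cosine similarity and dot product give the same ranking. The snippet below is a minimal sketch of that property, assuming the placeholder model id from the usage example below is replaced with this model's Hub repo id.

import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder id; replace with this model's Hub repo id
model = SentenceTransformer("sentence_transformers_model_id")

emb = model.encode(["normalization check"])
# The Normalize() module makes every embedding unit-length,
# so cosine similarity reduces to a plain dot product.
print(np.linalg.norm(emb, axis=1))  # ~ [1.0]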

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace "sentence_transformers_model_id" with this model's repo id)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'How can I use Shiny for Python to build and deploy a Hugging Face Space application?',
    'Shiny for Python is a pure Python implementation of Shiny. This gives you access to all of the great features of Shiny like reactivity, complex layouts, and modules without needing to use R. Shiny for Python is ideal for Hugging Face applications because it integrates smoothly with other Hugging Face tools. To get started deploying a Space, click this button to select your hardware and specify if you want a public or private Space. The Space template will populate a few files to get your app started. app.py This file defines your app’s logic. To learn more about how to modify this file, see the Shiny for Python documentation. As your app gets more complex, it’s a good idea to break your application logic up into modules. Dockerfile The Dockerfile for a Shiny for Python app is very minimal because the library doesn’t have many system dependencies, but you may need to modify this file if your application has additional system dependencies. The one essential feature of this file is that it exposes and runs the app on the port specified in the space README file (which is 7860 by default). requirements.txt The Space will automatically install dependencies listed in the requirements.txt file. Note that you must include shiny in this file.',
    'Performs a real-valued forward FFT on the given input buffer and stores the result in the given output buffer. The input buffer must contain real values only, while the output buffer will contain complex values. The input and output buffers must be different. Kind: instance method of P2FFT. Throws: The output buffer. The input buffer containing real values.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
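
Because this bi-encoder was trained on (question, documentation passage) pairs, a typical use is retrieval: encode a query and a set of passages separately, then rank the passages by cosine similarity. The sketch below illustrates that pattern; the query and passages are hypothetical examples, not data from this card.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder repo id

# Hypothetical query and candidate passages, used only to illustrate retrieval
query = "How do I run inference with this model?"
passages = [
    "Install Sentence Transformers, load the model, and call model.encode on your sentences.",
    "AnimateDiff inserts a motion module into a pretrained diffusion model to animate an image.",
]

query_emb = model.encode(query)        # shape: (384,)
passage_embs = model.encode(passages)  # shape: (2, 384)

# Cosine similarity between the query and each passage (embeddings are L2-normalized)
scores = model.similarity(query_emb, passage_embs)  # shape: (1, 2)
best = int(scores.argmax())
print(passages[best])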

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    anchor: string (min: 8 tokens, mean: 26.77 tokens, max: 189 tokens)
    positive: string (min: 5 tokens, mean: 116.82 tokens, max: 256 tokens)
  • Samples:
    anchor: How can I configure the TextEncoderOnnxConfig class for optimal ONNX export of a text encoder model intended for information retrieval?
    positive: (config: PretrainedConfig task: str = 'feature-extraction' preprocessors: typing.Optional[typing.List[typing.Any]] = None int_dtype: str = 'int64' float_dtype: str = 'fp32' legacy: bool = False) Handles encoder-based text architectures.
    anchor: How does PyTorch's shared tensor mechanism handle loading and saving, and what are its limitations?
    positive: The design is rather simple. We’re going to look for all shared tensors, then looking for all tensors covering the entire buffer (there can be multiple such tensors). That gives us multiple names which can be saved, we simply choose the first one. During load_model, we are loading a bit like load_state_dict does, except we’re looking into the model itself, to check for shared buffers, and ignoring the “missed keys” which were actually covered by virtue of buffer sharing (they were properly loaded since there was a buffer that loaded under the hood). Every other error is raised as-is. Caveat: this means we’re dropping some keys within the file, meaning if you’re checking for the keys saved on disk, you will see some “missing tensors” or if you’re using load_state_dict. Unless we start supporting shared tensors directly in the format, there’s no real way around it.
    anchor: How can I manage access tokens to secure my organization's resources?
    positive: Tokens Management enables organization administrators to oversee access tokens within their organization, ensuring secure access to organization resources.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 1024
    }
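
A loss configured with these parameters could be instantiated as in the minimal sketch below; the base model id is taken from this card, and everything not listed above follows the library defaults.

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Matches the parameters reported above: positives from other pairs in the batch act as
# in-batch negatives, and the batch is processed in cached mini-batches of 1024 embeddings
# so that large effective batch sizes fit in memory.
loss = CachedMultipleNegativesRankingLoss(
    model=model,
    scale=20.0,
    mini_batch_size=1024,
)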
    

Evaluation Dataset

Unnamed Dataset

  • Size: 700 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 700 samples:
    anchor: string (min: 8 tokens, mean: 26.76 tokens, max: 67 tokens)
    positive: string (min: 3 tokens, mean: 115.51 tokens, max: 256 tokens)
  • Samples:
    anchor: How can I configure a DecoderSequence object for optimal information retrieval using a list of decoders and a configuration object?
    positive: Creates a new instance of DecoderSequence. The configuration object. The list of decoders to apply.
    anchor: How can the generation/logits_process.NoBadWordsLogitsProcessor static class be effectively integrated into a retrieval model to improve filtering of inappropriate content?
    positive: Kind: static class of generation/logits_process
    anchor: How can I fine-tune the OpenVINO Sequence Classification model for improved information retrieval performance?
    positive: (model = None config = None kwargs) Parameters OpenVINO Model with a SequenceClassifierOutput for sequence classification tasks. This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving). (input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None kwargs) Parameters The OVModelForSequenceClassification forward method overrides the __call__ special method. Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them. Example of sequence classification using transformers.pipeline:
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 1024
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • warmup_steps: 50
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
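
These non-default values map directly onto the Sentence Transformers trainer API. The sketch below shows how such a run might be wired together; it is a minimal reconstruction rather than the exact training script, and the toy datasets and output path are placeholders.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=1024)

# Tiny placeholder datasets with the same "anchor"/"positive" columns as this card's data
train_dataset = Dataset.from_dict({
    "anchor": ["How do I load this model?", "Which loss was used for training?"],
    "positive": ["Load it with SentenceTransformer(...).", "CachedMultipleNegativesRankingLoss."],
})
eval_dataset = train_dataset

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    warmup_steps=50,
    fp16=True,  # as reported above; requires a CUDA GPU
    eval_strategy="steps",
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()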

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 50
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.5076 100 0.308 -
1.0152 200 0.179 -
1.5228 300 0.127 0.0739
2.0305 400 0.0828 -
2.5381 500 0.0528 -
3.0457 600 0.0576 0.0436
3.5533 700 0.0396 -
1.0152 200 0.0262 0.0379
2.0305 400 0.0159 0.0360
3.0457 600 0.0082 0.0340

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 4.0.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.3.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}