---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:CachedMultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: >-
How can AnimateDiff, a motion adapter for pretrained diffusion models, be
used to generate videos from images?
sentences:
- >-
Performs a single real input radix-2 transformation on the provided data
Kind: instance method of P2FFT. The input data array The output data array
The output offset The input offset The step
- >-
AnimateDiff is an adapter model that inserts a motion module into a
pretrained diffusion model to animate an image. The adapter is trained
on video clips to learn motion, which is used to condition the generation
process to create a video. It is faster and easier to only train the
adapter, and it can be loaded into most diffusion models, effectively
turning them into “video models”. Start by loading a MotionAdapter. Then
load a finetuned Stable Diffusion model with the AnimateDiffPipeline.
Create a prompt and generate the video.
- >-
Utility class to handle streaming of tokens generated by whisper
speech-to-text models. Callback functions are invoked when each of the
following events occur: Kind: static class of generation/streamers
- source_sentence: >-
How to configure DeepSpeed, including ZeRO-2 and bf16 precision, for
optimal performance with Intel Gaudi HPUs?
sentences:
- >-
The DeepSpeed configuration to use is passed through a JSON file and
enables you to choose the optimizations to apply. Here is an example for
applying ZeRO-2 optimizations and bf16 precision: The special
value "auto" enables you to automatically get the correct or most efficient
value. You can also specify the values yourself but, if you do so, you
should be careful not to have conflicting values with your training
arguments. It is strongly advised to read this section in the Transformers
documentation to completely understand how this works. Other examples of
configurations for HPUs are proposed here by Intel. The Transformers
documentation explains how to write a configuration from scratch very
well. A more complete description of all configuration possibilities is
available here.
- >-
Creates a new instance of TokenizerModel. The configuration object for
the TokenizerModel.
- >-
Most Spaces should run out of the box after a GPU upgrade, but sometimes
you’ll need to install CUDA versions of the machine learning frameworks
you use. Please, follow this guide to ensure your Space takes advantage
of the improved hardware.
- source_sentence: >-
Can DeBERTa's question-answering model be fine-tuned for improved
information retrieval?
sentences:
- >-
RegNetX is a convolutional network design space of simple, regular
models with parameters: depth $d$, initial width $w_0 > 0$, and
slope $w_a > 0$, and it generates a different block width $u_j$ for each
block $j < d$. The key restriction for the RegNet types of model is that
there is a linear parameterisation of block widths (the design space only
contains models with this linear structure): $u_j = w_0 + w_a \cdot j$.
For RegNetX we have additional restrictions: we set $b = 1$ (the
bottleneck ratio), $12 \leq d \leq 28$, and $w_m \geq 2$ (the width
multiplier).
- >-
DeBERTa Model with a span classification head on top for extractive
question-answering tasks like SQuAD (linear layers on top of the
hidden-states output to compute span start logits and span end logits).
Kind: static class of models
- >-
The minimum length of the sequence to be generated. Corresponds to the
length of the input prompt + min_new_tokens. Its effect is overridden
by min_new_tokens, if also set. Kind: instance property
of GenerationConfig. Default: 0
- source_sentence: >-
How can I efficiently upload models from supported libraries like
Transformers to the Hugging Face Hub for improved information retrieval?
sentences:
- >-
🤗 Diffusers is compatible with Habana Gaudi through 🤗 Optimum. Follow
the installation guide to install the SynapseAI and Gaudi drivers, and
then install Optimum Habana: To generate images with Stable Diffusion 1
and 2 on Gaudi, you need to instantiate two instances: When you
initialize the pipeline, you have to specify use_habana=True to deploy it
on HPUs and, to get the fastest possible generation, you should enable HPU
graphs with use_hpu_graphs=True. Finally, specify a GaudiConfig which can be
downloaded from the Habana organization on the Hub. Now you can call the
pipeline to generate images by batches from one or several prompts: For
more information, check out 🤗 Optimum Habana’s documentation and
the example provided in the official GitHub repository.
- 'While training and evaluating we record the following reward metrics:'
- >-
First check if your model is from a library that has built-in support to
push to/load from the Hub, like Transformers, Diffusers, Timm, Asteroid,
etc.: https://huggingface.co/docs/hub/models-libraries. Below we’ll show
how easy this is for a library like Transformers: Some libraries, like
Transformers, support loading code from the Hub. This is a way to make
your model work with Transformers using the trust_remote_code=True flag.
You may want to consider this option instead of a full-fledged library
integration.
- source_sentence: >-
How can I use Shiny for Python to build and deploy a Hugging Face Space
application?
sentences:
- >-
Shiny for Python is a pure Python implementation of Shiny. This gives you
access to all of the great features of Shiny like reactivity, complex
layouts, and modules without needing to use R. Shiny for Python is ideal
for Hugging Face applications because it integrates smoothly with other
Hugging Face tools. To get started deploying a Space, click this button
to select your hardware and specify if you want a public or private
Space. The Space template will populate a few files to get your app
started. app.py This file defines your app’s logic. To learn more about
how to modify this file, see the Shiny for Python documentation. As your
app gets more complex, it’s a good idea to break your application logic
up into modules. Dockerfile The Dockerfile for a Shiny for Python app is
very minimal because the library doesn’t have many system dependencies,
but you may need to modify this file if your application has additional
system dependencies. The one essential feature of this file is that it
exposes and runs the app on the port specified in the space README file
(which is 7860 by default). requirements.txt The Space will
automatically install dependencies listed in the requirements.txt file.
Note that you must include shiny in this file.
- >-
(**kwargs) A context manager that will add each keyword argument passed
to os.environ and remove them when exiting. Will convert the values
in kwargs to strings and upper-case all the keys. () A context manager
that will temporarily clear environment variables. When this context
exits, the previous environment variables will be back.
(mixed_precision= 'no', save_location: str =
'/github/home/.cache/huggingface/accelerate/default_config.yaml', use_xpu:
bool = False) Parameters Creates and saves a basic cluster config to be
used on a local machine with potentially multiple GPUs. Will also set
CPU if it is a CPU-only machine. When setting up 🤗 Accelerate for the
first time, rather than running accelerate config,
[~utils.write_basic_config] can be used as an alternative for
quick configuration. (local_process_index: int, verbose:
typing.Optional[bool] = None) Parameters Assigns the current process to
a specific NUMA node. Ideally most efficient when having at least 2 cpus
per node. This result is cached between calls. If you want to override
it, please use accelerate.utils.environment.override_numa_affinity.
(local_process_index: int, verbose: typing.Optional[bool] = None)
Parameters Overrides whatever NUMA affinity is set for the current
process. This is very taxing and requires recalculating the affinity to
set; ideally you should use utils.environment.set_numa_affinity instead.
(func_or_cls) Decorator to clean up accelerate environment variables set
by the decorated class or function. In some circumstances, calling
certain classes or functions can result in accelerate env vars being set
and not being cleaned up afterwards. As an example, when calling:
TrainingArguments(fp16=True, …) The following env var will be set:
ACCELERATE_MIXED_PRECISION=fp16 This can affect subsequent code, since
the env var takes precedence over TrainingArguments(fp16=False). This is
especially relevant for unit testing, where we want to avoid the
individual tests having side effects on one another. Decorate the unit
test function or whole class with this decorator to ensure that after
each test, the env vars are cleaned up. This works for both
unittest.TestCase and normal classes (pytest); it also works when
decorating the parent class.
- >-
Performs a real-valued forward FFT on the given input buffer and stores
the result in the given output buffer. The input buffer must contain
real values only, while the output buffer will contain complex values.
The input and output buffers must be different. Kind: instance method
of P2FFT. Throws: The output buffer. The input buffer containing real
values.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
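For reference, the same three-module stack can be assembled by hand with the sentence_transformers.models API. This is an illustrative sketch only, since loading the published checkpoint (see Usage below) already restores these modules and their settings:

```python
from sentence_transformers import SentenceTransformer, models

# Illustrative reconstruction of the architecture listed above.
word_embedding = models.Transformer(
    "sentence-transformers/all-MiniLM-L6-v2", max_seq_length=256, do_lower_case=False
)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 384
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding, pooling, models.Normalize()])
print(model.get_sentence_embedding_dimension())  # 384
```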
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'How can I use Shiny for Python to build and deploy a Hugging Face Space application?',
    'Shiny for Python is a pure Python implementation of Shiny. This gives you access to all of the great features of Shiny like reactivity, complex layouts, and modules without needing to use R. Shiny for Python is ideal for Hugging Face applications because it integrates smoothly with other Hugging Face tools. To get started deploying a Space, click this button to select your hardware and specify if you want a public or private Space. The Space template will populate a few files to get your app started. app.py This file defines your app’s logic. To learn more about how to modify this file, see the Shiny for Python documentation. As your app gets more complex, it’s a good idea to break your application logic up into modules. Dockerfile The Dockerfile for a Shiny for Python app is very minimal because the library doesn’t have many system dependencies, but you may need to modify this file if your application has additional system dependencies. The one essential feature of this file is that it exposes and runs the app on the port specified in the space README file (which is 7860 by default). requirements.txt The Space will automatically install dependencies listed in the requirements.txt file. Note that you must include shiny in this file.',
    'Performs a real-valued forward FFT on the given input buffer and stores the result in the given output buffer. The input buffer must contain real values only, while the output buffer will contain complex values. The input and output buffers must be different. Kind: instance method of P2FFT. Throws: The output buffer. The input buffer containing real values.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
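For retrieval-style use, the same objects can rank candidate passages against a query. The following is a minimal sketch; the query and passages are shortened versions of the example sentences above and are illustrative only:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")

# Rank candidate passages against a query by cosine similarity.
query_embeddings = model.encode([
    "How can I use Shiny for Python to build and deploy a Hugging Face Space application?"
])
passage_embeddings = model.encode([
    "Shiny for Python is a pure Python implementation of Shiny. It integrates smoothly with other Hugging Face tools.",
    "Performs a real-valued forward FFT on the given input buffer and stores the result in the given output buffer.",
])
scores = model.similarity(query_embeddings, passage_embeddings)  # shape [1, 2]
print(scores.argmax(dim=1))  # index of the best-matching passage for the query
```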
Training Details
Training Dataset
Unnamed Dataset
- Size: 6,300 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

  |  | anchor | positive |
  |---|---|---|
  | type | string | string |
  | details | min: 8 tokens, mean: 26.77 tokens, max: 189 tokens | min: 5 tokens, mean: 116.82 tokens, max: 256 tokens |
- Samples:

  | anchor | positive |
  |---|---|
  | How can I configure the TextEncoderOnnxConfig class for optimal ONNX export of a text encoder model intended for information retrieval? | (config: PretrainedConfig, task: str = 'feature-extraction', preprocessors: typing.Optional[typing.List[typing.Any]] = None, int_dtype: str = 'int64', float_dtype: str = 'fp32', legacy: bool = False) Handles encoder-based text architectures. |
  | How does PyTorch's shared tensor mechanism handle loading and saving, and what are its limitations? | The design is rather simple. We’re going to look for all shared tensors, then looking for all tensors covering the entire buffer (there can be multiple such tensors). That gives us multiple names which can be saved; we simply choose the first one. During load_model, we are loading a bit like load_state_dict does, except we’re looking into the model itself, to check for shared buffers, and ignoring the “missed keys” which were actually covered by virtue of buffer sharing (they were properly loaded since there was a buffer that loaded under the hood). Every other error is raised as-is. Caveat: This means we’re dropping some keys within the file, meaning if you’re checking for the keys saved on disk, or if you’re using load_state_dict, you will see some “missing tensors”. Unless we start supporting shared tensors directly in the format there’s no real way around it. |
  | How can I manage access tokens to secure my organization's resources? | Tokens Management enables organization administrators to oversee access tokens within their organization, ensuring secure access to organization resources. |
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 1024 }
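The loss above maps directly onto the Sentence Transformers API. A minimal construction sketch, assuming the base checkpoint as the model being trained:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Loss configured with the parameters listed above; gradient caching lets the
# in-batch negatives ranking loss run with a large mini-batch without keeping
# all activations in memory at once.
loss = CachedMultipleNegativesRankingLoss(
    model,
    scale=20.0,
    similarity_fct=cos_sim,
    mini_batch_size=1024,
)
```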
Evaluation Dataset
Unnamed Dataset
- Size: 700 evaluation samples
- Columns: anchor and positive
- Approximate statistics based on the first 700 samples:

  |  | anchor | positive |
  |---|---|---|
  | type | string | string |
  | details | min: 8 tokens, mean: 26.76 tokens, max: 67 tokens | min: 3 tokens, mean: 115.51 tokens, max: 256 tokens |
- Samples:

  | anchor | positive |
  |---|---|
  | How can I configure a DecoderSequence object for optimal information retrieval using a list of decoders and a configuration object? | Creates a new instance of DecoderSequence. The configuration object. The list of decoders to apply. |
  | How can the generation/logits_process.NoBadWordsLogitsProcessor static class be effectively integrated into a retrieval model to improve filtering of inappropriate content? | Kind: static class of generation/logits_process |
  | How can I fine-tune the OpenVINO Sequence Classification model for improved information retrieval performance? | (model=None, config=None, kwargs) Parameters OpenVINO Model with a SequenceClassifierOutput for sequence classification tasks. This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving). (input_ids: typing.Union[torch.Tensor, numpy.ndarray], attention_mask: typing.Union[torch.Tensor, numpy.ndarray], token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, kwargs) Parameters The OVModelForSequenceClassification forward method overrides the __call__ special method. Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre and post processing steps while the latter silently ignores them. Example of sequence classification using transformers.pipeline: |
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 1024 }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- learning_rate: 2e-05
- weight_decay: 0.01
- num_train_epochs: 5
- warmup_ratio: 0.1
- warmup_steps: 50
- fp16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates
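These non-default values map onto SentenceTransformerTrainingArguments roughly as follows; this is a sketch in which output_dir is a placeholder and every argument not shown keeps the default listed in the full list below:

```python
from sentence_transformers.training_args import (
    BatchSamplers,
    SentenceTransformerTrainingArguments,
)

# Placeholder output_dir; the remaining values are the non-default hyperparameters above.
args = SentenceTransformerTrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=5,
    warmup_ratio=0.1,
    warmup_steps=50,
    fp16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```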
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 50
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
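Putting the pieces together, a run with this setup would look roughly like the sketch below. The one-pair in-memory dataset and the output directory are placeholders for the unnamed 6,300-pair training dataset and the real output path; the remaining arguments are the hyperparameters listed above.

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

# Placeholder anchor/positive pairs standing in for the unnamed training dataset above.
train_dataset = Dataset.from_dict({
    "anchor": ["How can I manage access tokens to secure my organization's resources?"],
    "positive": ["Tokens Management enables organization administrators to oversee access tokens."],
})

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=1024)
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder; the remaining hyperparameters are listed above
    num_train_epochs=5,
    learning_rate=2e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```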
Training Logs
Epoch | Step | Training Loss | Validation Loss |
---|---|---|---|
0.5076 | 100 | 0.308 | - |
1.0152 | 200 | 0.179 | - |
1.5228 | 300 | 0.127 | 0.0739 |
2.0305 | 400 | 0.0828 | - |
2.5381 | 500 | 0.0528 | - |
3.0457 | 600 | 0.0576 | 0.0436 |
3.5533 | 700 | 0.0396 | - |
1.0152 | 200 | 0.0262 | 0.0379 |
2.0305 | 400 | 0.0159 | 0.0360 |
3.0457 | 600 | 0.0082 | 0.0340 |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 4.0.1
- Transformers: 4.47.0
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.3.1
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```