---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:6300
  - loss:CachedMultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: >-
      How can AnimateDiff, a motion adapter for pretrained diffusion models, be
      used to generate videos from images?
    sentences:
      - >-
        Performs a single real input radix-2 transformation on the provided data
        Kind: instance method of P2FFT The input data array The output data array
        The output offset The input offset The step
      - >-
        AnimateDiff is an adapter model that inserts a motion module into a
        pretrained diffusion model to animate an image. The adapter is trained
        on video clips to learn motion which is used to condition the generation
        process to create a video. It is faster and easier to only train the
        adapter and it can be loaded into most diffusion models, effectively
        turning them into “video models”. Start by loading a MotionAdapter. Then
        load a finetuned Stable Diffusion model with the AnimateDiffPipeline.
        Create a prompt and generate the video.
      - >-
        Utility class to handle streaming of tokens generated by whisper
        speech-to-text models. Callback functions are invoked when each of the
        following events occur: Kind: static class of generation/streamers
  - source_sentence: >-
      How to configure DeepSpeed, including ZeRO-2 and bf16 precision, for
      optimal performance with Intel Gaudi HPUs?
    sentences:
      - >-
        The DeepSpeed configuration to use is passed through a JSON file and
        enables you to choose the optimizations to apply. Here is an example for
        applying ZeRO-2 optimizations and bf16 precision: The special value
        "auto" enables you to automatically get the correct or most efficient
        value. You can also specify the values yourself but, if you do so, you
        should be careful not to have conflicting values with your training
        arguments. It is strongly advised to read this section in the Transformers
        documentation to completely understand how this works. Other examples of
        configurations for HPUs are proposed here by Intel. The Transformers
        documentation explains how to write a configuration from scratch very
        well. A more complete description of all configuration possibilities is
        available here.
      - >-
        Creates a new instance of TokenizerModel. The configuration object for
        the TokenizerModel.
      - >-
        Most Spaces should run out of the box after a GPU upgrade, but sometimes
        you’ll need to install CUDA versions of the machine learning frameworks
        you use. Please, follow this guide to ensure your Space takes advantage
        of the improved hardware.
  - source_sentence: >-
      Can DeBERTa's question-answering model be fine-tuned for improved
      information retrieval?
    sentences:
      - >-
        RegNetX is a convolutional network design space with simple, regular
        models with parameters: depth $d$, initial width $w_{0} > 0$, and slope
        $w_{a} > 0$, and generates a different block width $u_{j}$ for each
        block $j < d$. The key restriction for the RegNet types of model is that
        there is a linear parameterisation of block widths (the design space
        only contains models with this linear structure):
        $u_{j} = w_{0} + w_{a} \cdot j$. For RegNetX we have additional
        restrictions: we set $b = 1$ (the bottleneck ratio), $12 \leq d \leq 28$,
        and $w_{m} \geq 2$ (the width multiplier).
      - >-
        DeBERTa Model with a span classification head on top for extractive
        question-answering tasks like SQuAD (a linear layer on top of the
        hidden-states output to compute span start logits and span end logits).
        Kind: static class of models
      - >-
        The minimum length of the sequence to be generated. Corresponds to the
        length of the input prompt + min_new_tokens. Its effect is overridden
        by min_new_tokens, if also set. Kind: instance property of
        GenerationConfig. Default: 0
  - source_sentence: >-
      How can I efficiently upload models from supported libraries like
      Transformers to the Hugging Face Hub for improved information retrieval?
    sentences:
      - >-
        🤗 Diffusers is compatible with Habana Gaudi through 🤗 Optimum. Follow
        the installation guide to install the SynapseAI and Gaudi drivers, and
        then install Optimum Habana: To generate images with Stable Diffusion 1
        and 2 on Gaudi, you need to instantiate two instances: When you
        initialize the pipeline, you have to specify use_habana=True to deploy
        it on HPUs and to get the fastest possible generation, you should enable
        HPU graphs with use_hpu_graphs=True. Finally, specify a GaudiConfig
        which can be downloaded from the Habana organization on the Hub. Now you
        can call the pipeline to generate images by batches from one or several
        prompts: For more information, check out 🤗 Optimum Habana’s
        documentation and the example provided in the official GitHub repository.
      - 'While training and evaluating we record the following reward metrics:'
      - >-
        First check if your model is from a library that has built-in support to
        push to/load from the Hub, like Transformers, Diffusers, Timm, Asteroid,
        etc.: https://huggingface.co/docs/hub/models-libraries. Below we’ll show
        how easy this is for a library like Transformers: Some libraries, like
        Transformers, support loading code from the Hub. This is a way to make
        your model work with Transformers using the trust_remote_code=True flag.
        You may want to consider this option instead of a full-fledged library
        integration.
  - source_sentence: >-
      How can I use Shiny for Python to build and deploy a Hugging Face Space
      application?
    sentences:
      - >-
        Shiny for Python is a pure Python implementation of Shiny. This gives you
        access to all of the great features of Shiny like reactivity, complex
        layouts, and modules without needing to use R. Shiny for Python is ideal
        for Hugging Face applications because it integrates smoothly with other
        Hugging Face tools. To get started deploying a Space, click this button
        to select your hardware and specify if you want a public or private
        Space. The Space template will populate a few files to get your app
        started. app.py This file defines your app’s logic. To learn more about
        how to modify this file, see the Shiny for Python documentation. As your
        app gets more complex, it’s a good idea to break your application logic
        up into modules. Dockerfile The Dockerfile for a Shiny for Python app is
        very minimal because the library doesn’t have many system dependencies,
        but you may need to modify this file if your application has additional
        system dependencies. The one essential feature of this file is that it
        exposes and runs the app on the port specified in the space README file
        (which is 7860 by default). requirements.txt The Space will
        automatically install dependencies listed in the requirements.txt file.
        Note that you must include shiny in this file.
      - >-
        (**kwargs) A context manager that will add each keyword argument passed
        to os.environ and remove them when exiting. Will convert the values
        in kwargs to strings and upper-case all the keys. () A context manager
        that will temporarily clear environment variables. When this context
        exits, the previous environment variables will be back.
        (mixed_precision= 'no' save_location: str =
        '/github/home/.cache/huggingface/accelerate/default_config.yaml' use_xpu:
        bool = False) Parameters Creates and saves a basic cluster config to be
        used on a local machine with potentially multiple GPUs. Will also set
        CPU if it is a CPU-only machine. When setting up 🤗 Accelerate for the
        first time, rather than running accelerate config,
        [~utils.write_basic_config] can be used as an alternative for
        quick configuration. (local_process_index: int verbose:
        typing.Optional[bool] = None) Parameters Assigns the current process to
        a specific NUMA node. Ideally most efficient when having at least 2 CPUs
        per node. This result is cached between calls. If you want to override
        it, please use accelerate.utils.environment.override_numa_affinity.
        (local_process_index: int verbose: typing.Optional[bool] = None)
        Parameters Overrides whatever NUMA affinity is set for the current
        process. This is very taxing and requires recalculating the affinity to
        set; ideally you should use utils.environment.set_numa_affinity instead.
        (func_or_cls) Decorator to clean up accelerate environment variables set
        by the decorated class or function. In some circumstances, calling
        certain classes or functions can result in accelerate env vars being set
        and not being cleaned up afterwards. As an example, when calling:
        TrainingArguments(fp16=True, …) The following env var will be set:
        ACCELERATE_MIXED_PRECISION=fp16 This can affect subsequent code, since
        the env var takes precedence over TrainingArguments(fp16=False). This is
        especially relevant for unit testing, where we want to avoid the
        individual tests to have side effects on one another. Decorate the unit
        test function or whole class with this decorator to ensure that after
        each test, the env vars are cleaned up. This works for both
        unittest.TestCase and normal classes (pytest); it also works when
        decorating the parent class.
      - >-
        Performs a real-valued forward FFT on the given input buffer and stores
        the result in the given output buffer. The input buffer must contain
        real values only, while the output buffer will contain complex values.
        The input and output buffers must be different. Kind: instance method
        of P2FFT. Throws: The output buffer. The input buffer containing real
        values.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
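
Because the final Normalize() module L2-normalizes the pooled embeddings, cosine similarity and dot product give the same ranking. The snippet below is a minimal sketch of that property, assuming the placeholder model id from the usage example below is replaced with this model's Hub repo id.

import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder id; replace with this model's Hub repo id
model = SentenceTransformer("sentence_transformers_model_id")

emb = model.encode(["normalization check"])
# The Normalize() module makes every embedding unit-length,
# so cosine similarity reduces to a plain dot product.
print(np.linalg.norm(emb, axis=1))  # ~ [1.0]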

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace "sentence_transformers_model_id" with this model's repo id)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'How can I use Shiny for Python to build and deploy a Hugging Face Space application?',
    'Shiny for Python is a pure Python implementation of Shiny. This gives you access to all of the great features of Shiny like reactivity, complex layouts, and modules without needing to use R. Shiny for Python is ideal for Hugging Face applications because it integrates smoothly with other Hugging Face tools. To get started deploying a Space, click this button to select your hardware and specify if you want a public or private Space. The Space template will populate a few files to get your app started. app.py This file defines your app’s logic. To learn more about how to modify this file, see the Shiny for Python documentation. As your app gets more complex, it’s a good idea to break your application logic up into modules. Dockerfile The Dockerfile for a Shiny for Python app is very minimal because the library doesn’t have many system dependencies, but you may need to modify this file if your application has additional system dependencies. The one essential feature of this file is that it exposes and runs the app on the port specified in the space README file (which is 7860 by default). requirements.txt The Space will automatically install dependencies listed in the requirements.txt file. Note that you must include shiny in this file.',
    'Performs a real-valued forward FFT on the given input buffer and stores the result in the given output buffer. The input buffer must contain real values only, while the output buffer will contain complex values. The input and output buffers must be different. Kind: instance method of P2FFT. Throws: The output buffer. The input buffer containing real values.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
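
Because this bi-encoder was trained on (question, documentation passage) pairs, a typical use is retrieval: encode a query and a set of passages separately, then rank the passages by cosine similarity. The sketch below illustrates that pattern; the query and passages are hypothetical examples, not data from this card.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder repo id

# Hypothetical query and candidate passages, used only to illustrate retrieval
query = "How do I run inference with this model?"
passages = [
    "Install Sentence Transformers, load the model, and call model.encode on your sentences.",
    "AnimateDiff inserts a motion module into a pretrained diffusion model to animate an image.",
]

query_emb = model.encode(query)        # shape: (384,)
passage_embs = model.encode(passages)  # shape: (2, 384)

# Cosine similarity between the query and each passage (embeddings are L2-normalized)
scores = model.similarity(query_emb, passage_embs)  # shape: (1, 2)
best = int(scores.argmax())
print(passages[best])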

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    anchor: string (min: 8 tokens, mean: 26.77 tokens, max: 189 tokens)
    positive: string (min: 5 tokens, mean: 116.82 tokens, max: 256 tokens)
  • Samples:
    anchor: How can I configure the TextEncoderOnnxConfig class for optimal ONNX export of a text encoder model intended for information retrieval?
    positive: (config: PretrainedConfig task: str = 'feature-extraction' preprocessors: typing.Optional[typing.List[typing.Any]] = None int_dtype: str = 'int64' float_dtype: str = 'fp32' legacy: bool = False) Handles encoder-based text architectures.
    anchor: How does PyTorch's shared tensor mechanism handle loading and saving, and what are its limitations?
    positive: The design is rather simple. We’re going to look for all shared tensors, then looking for all tensors covering the entire buffer (there can be multiple such tensors). That gives us multiple names which can be saved, we simply choose the first one. During load_model, we are loading a bit like load_state_dict does, except we’re looking into the model itself, to check for shared buffers, and ignoring the “missed keys” which were actually covered by virtue of buffer sharing (they were properly loaded since there was a buffer that loaded under the hood). Every other error is raised as-is. Caveat: this means we’re dropping some keys within the file, meaning if you’re checking for the keys saved on disk, you will see some “missing tensors” or if you’re using load_state_dict. Unless we start supporting shared tensors directly in the format, there’s no real way around it.
    anchor: How can I manage access tokens to secure my organization's resources?
    positive: Tokens Management enables organization administrators to oversee access tokens within their organization, ensuring secure access to organization resources.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 1024
    }
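
A loss configured with these parameters could be instantiated as in the minimal sketch below; the base model id is taken from this card, and everything not listed above follows the library defaults.

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Matches the parameters reported above: positives from other pairs in the batch act as
# in-batch negatives, and the batch is processed in cached mini-batches of 1024 embeddings
# so that large effective batch sizes fit in memory.
loss = CachedMultipleNegativesRankingLoss(
    model=model,
    scale=20.0,
    mini_batch_size=1024,
)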
    

Evaluation Dataset

Unnamed Dataset

  • Size: 700 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 700 samples:
    anchor: string (min: 8 tokens, mean: 26.76 tokens, max: 67 tokens)
    positive: string (min: 3 tokens, mean: 115.51 tokens, max: 256 tokens)
  • Samples:
    anchor: How can I configure a DecoderSequence object for optimal information retrieval using a list of decoders and a configuration object?
    positive: Creates a new instance of DecoderSequence. The configuration object. The list of decoders to apply.
    anchor: How can the generation/logits_process.NoBadWordsLogitsProcessor static class be effectively integrated into a retrieval model to improve filtering of inappropriate content?
    positive: Kind: static class of generation/logits_process
    anchor: How can I fine-tune the OpenVINO Sequence Classification model for improved information retrieval performance?
    positive: (model = None config = None kwargs) Parameters OpenVINO Model with a SequenceClassifierOutput for sequence classification tasks. This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving). (input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None kwargs) Parameters The OVModelForSequenceClassification forward method overrides the __call__ special method. Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them. Example of sequence classification using transformers.pipeline:
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 1024
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • warmup_steps: 50
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
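
These non-default values map directly onto the Sentence Transformers trainer API. The sketch below shows how such a run might be wired together; it is a minimal reconstruction rather than the exact training script, and the toy datasets and output path are placeholders.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=1024)

# Tiny placeholder datasets with the same "anchor"/"positive" columns as this card's data
train_dataset = Dataset.from_dict({
    "anchor": ["How do I load this model?", "Which loss was used for training?"],
    "positive": ["Load it with SentenceTransformer(...).", "CachedMultipleNegativesRankingLoss."],
})
eval_dataset = train_dataset

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    warmup_steps=50,
    fp16=True,  # as reported above; requires a CUDA GPU
    eval_strategy="steps",
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()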

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 50
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.5076 100 0.308 -
1.0152 200 0.179 -
1.5228 300 0.127 0.0739
2.0305 400 0.0828 -
2.5381 500 0.0528 -
3.0457 600 0.0576 0.0436
3.5533 700 0.0396 -
1.0152 200 0.0262 0.0379
2.0305 400 0.0159 0.0360
3.0457 600 0.0082 0.0340

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 4.0.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.3.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}