--- tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:13734 - loss:MultipleNegativesRankingLoss base_model: intfloat/e5-small-v2 widget: - source_sentence: predict sentences: - " def _compute_score_samples(self, X, subsample_features):\n \"\"\"\n\ \ Compute the score of each samples in X going through the extra trees.\n\ \n Parameters\n ----------\n X : array-like or sparse matrix\n\ \ Data matrix.\n\n subsample_features : bool\n Whether\ \ features should be subsampled.\n\n Returns\n -------\n \ \ scores : ndarray of shape (n_samples,)\n The score of each sample\ \ in X.\n \"\"\"\n n_samples = X.shape[0]\n\n depths = np.zeros(n_samples,\ \ order=\"f\")\n\n average_path_length_max_samples = _average_path_length([self._max_samples])\n\ \n # Note: we use default n_jobs value, i.e. sequential computation, which\n\ \ # we expect to be more performant that parallelizing for small number\n\ \ # of samples, e.g. < 1k samples. Default n_jobs value can be overridden\n\ \ # by using joblib.parallel_backend context manager around\n #\ \ ._compute_score_samples. Using a higher n_jobs may speed up the\n # computation\ \ of the scores, e.g. for > 1k samples. See\n # https://github.com/scikit-learn/scikit-learn/pull/28622\ \ for more\n # details.\n lock = threading.Lock()\n Parallel(\n\ \ verbose=self.verbose,\n require=\"sharedmem\",\n \ \ )(\n delayed(_parallel_compute_tree_depths)(\n tree,\n\ \ X,\n features if subsample_features else None,\n\ \ self._decision_path_lengths[tree_idx],\n self._average_path_length_per_tree[tree_idx],\n\ \ depths,\n lock,\n )\n for\ \ tree_idx, (tree, features) in enumerate(\n zip(self.estimators_,\ \ self.estimators_features_)\n )\n )\n\n denominator\ \ = len(self.estimators_) * average_path_length_max_samples\n scores =\ \ 2 ** (\n # For a single training sample, denominator and depth are\ \ 0.\n # Therefore, we set the score manually to 1.\n -np.divide(\n\ \ depths, denominator, out=np.ones_like(depths), where=denominator\ \ != 0\n )\n )\n return scores" - " def predict(self, X):\n return np.zeros(X.shape[0])" - "def test_dist_threshold_invalid_parameters():\n X = [[0], [1]]\n with pytest.raises(ValueError,\ \ match=\"Exactly one of \"):\n AgglomerativeClustering(n_clusters=None,\ \ distance_threshold=None).fit(X)\n\n with pytest.raises(ValueError, match=\"\ Exactly one of \"):\n AgglomerativeClustering(n_clusters=2, distance_threshold=1).fit(X)\n\ \n X = [[0], [1]]\n with pytest.raises(ValueError, match=\"compute_full_tree\ \ must be True if\"):\n AgglomerativeClustering(\n n_clusters=None,\ \ distance_threshold=1, compute_full_tree=False\n ).fit(X)" - source_sentence: sklearn tags sentences: - " def __sklearn_tags__(self):\n tags = super().__sklearn_tags__()\n\ \ tags.input_tags.sparse = True\n return tags" - "class SelectFdr(_BaseFilter):\n \"\"\"Filter: Select the p-values for an estimated\ \ false discovery rate.\n\n This uses the Benjamini-Hochberg procedure. ``alpha``\ \ is an upper bound\n on the expected false discovery rate.\n\n Read more\ \ in the :ref:`User Guide `.\n\n Parameters\n\ \ ----------\n score_func : callable, default=f_classif\n Function\ \ taking two arrays X and y, and returning a pair of arrays\n (scores,\ \ pvalues).\n Default is f_classif (see below \"See Also\"). 
The default\ \ function only\n works with classification tasks.\n\n alpha : float,\ \ default=5e-2\n The highest uncorrected p-value for features to keep.\n\ \n Attributes\n ----------\n scores_ : array-like of shape (n_features,)\n\ \ Scores of features.\n\n pvalues_ : array-like of shape (n_features,)\n\ \ p-values of feature scores.\n\n n_features_in_ : int\n Number\ \ of features seen during :term:`fit`.\n\n .. versionadded:: 0.24\n\n \ \ feature_names_in_ : ndarray of shape (`n_features_in_`,)\n Names of\ \ features seen during :term:`fit`. Defined only when `X`\n has feature\ \ names that are all strings.\n\n .. versionadded:: 1.0\n\n See Also\n\ \ --------\n f_classif : ANOVA F-value between label/feature for classification\ \ tasks.\n mutual_info_classif : Mutual information for a discrete target.\n\ \ chi2 : Chi-squared stats of non-negative features for classification tasks.\n\ \ f_regression : F-value between label/feature for regression tasks.\n mutual_info_regression\ \ : Mutual information for a continuous target.\n SelectPercentile : Select\ \ features based on percentile of the highest\n scores.\n SelectKBest\ \ : Select features based on the k highest scores.\n SelectFpr : Select features\ \ based on a false positive rate test.\n SelectFwe : Select features based\ \ on family-wise error rate.\n GenericUnivariateSelect : Univariate feature\ \ selector with configurable\n mode.\n\n References\n ----------\n\ \ https://en.wikipedia.org/wiki/False_discovery_rate\n\n Examples\n --------\n\ \ >>> from sklearn.datasets import load_breast_cancer\n >>> from sklearn.feature_selection\ \ import SelectFdr, chi2\n >>> X, y = load_breast_cancer(return_X_y=True)\n\ \ >>> X.shape\n (569, 30)\n >>> X_new = SelectFdr(chi2, alpha=0.01).fit_transform(X,\ \ y)\n >>> X_new.shape\n (569, 16)\n \"\"\"\n\n _parameter_constraints:\ \ dict = {\n **_BaseFilter._parameter_constraints,\n \"alpha\":\ \ [Interval(Real, 0, 1, closed=\"both\")],\n }\n\n def __init__(self, score_func=f_classif,\ \ *, alpha=5e-2):\n super().__init__(score_func=score_func)\n self.alpha\ \ = alpha\n\n def _get_support_mask(self):\n check_is_fitted(self)\n\ \n n_features = len(self.pvalues_)\n sv = np.sort(self.pvalues_)\n\ \ selected = sv[\n sv <= float(self.alpha) / n_features * np.arange(1,\ \ n_features + 1)\n ]\n if selected.size == 0:\n return\ \ np.zeros_like(self.pvalues_, dtype=bool)\n return self.pvalues_ <= selected.max()" - "def test_absolute_error():\n # For coverage only.\n X, y = make_regression(n_samples=500,\ \ random_state=0)\n gbdt = HistGradientBoostingRegressor(loss=\"absolute_error\"\ , random_state=0)\n gbdt.fit(X, y)\n assert gbdt.score(X, y) > 0.9" - source_sentence: test lsvc intercept scaling zero sentences: - "class BaggingClassifier(ClassifierMixin, BaseBagging):\n \"\"\"A Bagging classifier.\n\ \n A Bagging classifier is an ensemble meta-estimator that fits base\n classifiers\ \ each on random subsets of the original dataset and then\n aggregate their\ \ individual predictions (either by voting or by averaging)\n to form a final\ \ prediction. Such a meta-estimator can typically be used as\n a way to reduce\ \ the variance of a black-box estimator (e.g., a decision\n tree), by introducing\ \ randomization into its construction procedure and\n then making an ensemble\ \ out of it.\n\n This algorithm encompasses several works from the literature.\ \ When random\n subsets of the dataset are drawn as random subsets of the samples,\ \ then\n this algorithm is known as Pasting [1]_. 
If samples are drawn with\n\ \ replacement, then the method is known as Bagging [2]_. When random subsets\n\ \ of the dataset are drawn as random subsets of the features, then the method\n\ \ is known as Random Subspaces [3]_. Finally, when base estimators are built\n\ \ on subsets of both samples and features, then the method is known as\n \ \ Random Patches [4]_.\n\n Read more in the :ref:`User Guide `.\n\ \n .. versionadded:: 0.15\n\n Parameters\n ----------\n estimator\ \ : object, default=None\n The base estimator to fit on random subsets\ \ of the dataset.\n If None, then the base estimator is a\n :class:`~sklearn.tree.DecisionTreeClassifier`.\n\ \n .. versionadded:: 1.2\n `base_estimator` was renamed to `estimator`.\n\ \n n_estimators : int, default=10\n The number of base estimators in\ \ the ensemble.\n\n max_samples : int or float, default=1.0\n The number\ \ of samples to draw from X to train each base estimator (with\n replacement\ \ by default, see `bootstrap` for more details).\n\n - If int, then draw\ \ `max_samples` samples.\n - If float, then draw `max_samples * X.shape[0]`\ \ samples.\n\n max_features : int or float, default=1.0\n The number\ \ of features to draw from X to train each base estimator (\n without replacement\ \ by default, see `bootstrap_features` for more\n details).\n\n \ \ - If int, then draw `max_features` features.\n - If float, then draw\ \ `max(1, int(max_features * n_features_in_))` features.\n\n bootstrap : bool,\ \ default=True\n Whether samples are drawn with replacement. If False,\ \ sampling\n without replacement is performed.\n\n bootstrap_features\ \ : bool, default=False\n Whether features are drawn with replacement.\n\ \n oob_score : bool, default=False\n Whether to use out-of-bag samples\ \ to estimate\n the generalization error. Only available if bootstrap=True.\n\ \n warm_start : bool, default=False\n When set to True, reuse the solution\ \ of the previous call to fit\n and add more estimators to the ensemble,\ \ otherwise, just fit\n a whole new ensemble. See :term:`the Glossary `.\n\ \n .. versionadded:: 0.17\n *warm_start* constructor parameter.\n\ \n n_jobs : int, default=None\n The number of jobs to run in parallel\ \ for both :meth:`fit` and\n :meth:`predict`. ``None`` means 1 unless in\ \ a\n :obj:`joblib.parallel_backend` context. ``-1`` means using all\n\ \ processors. See :term:`Glossary ` for more details.\n\n random_state\ \ : int, RandomState instance or None, default=None\n Controls the random\ \ resampling of the original dataset\n (sample wise and feature wise).\n\ \ If the base estimator accepts a `random_state` attribute, a different\n\ \ seed is generated for each instance in the ensemble.\n Pass an\ \ int for reproducible output across multiple function calls.\n See :term:`Glossary\ \ `.\n\n verbose : int, default=0\n Controls the verbosity\ \ when fitting and predicting.\n\n Attributes\n ----------\n estimator_\ \ : estimator\n The base estimator from which the ensemble is grown.\n\n\ \ .. versionadded:: 1.2\n `base_estimator_` was renamed to `estimator_`.\n\ \n n_features_in_ : int\n Number of features seen during :term:`fit`.\n\ \n .. versionadded:: 0.24\n\n feature_names_in_ : ndarray of shape (`n_features_in_`,)\n\ \ Names of features seen during :term:`fit`. Defined only when `X`\n \ \ has feature names that are all strings.\n\n .. 
versionadded:: 1.0\n\ \n estimators_ : list of estimators\n The collection of fitted base\ \ estimators.\n\n estimators_samples_ : list of arrays\n The subset\ \ of drawn samples (i.e., the in-bag samples) for each base\n estimator.\ \ Each subset is defined by an array of the indices selected.\n\n estimators_features_\ \ : list of arrays\n The subset of drawn features for each base estimator.\n\ \n classes_ : ndarray of shape (n_classes,)\n The classes labels.\n\n\ \ n_classes_ : int or list\n The number of classes.\n\n oob_score_\ \ : float\n Score of the training dataset obtained using an out-of-bag\ \ estimate.\n This attribute exists only when ``oob_score`` is True.\n\n\ \ oob_decision_function_ : ndarray of shape (n_samples, n_classes)\n \ \ Decision function computed with out-of-bag estimate on the training\n \ \ set. If n_estimators is small it might be possible that a data point\n \ \ was never left out during the bootstrap. In this case,\n `oob_decision_function_`\ \ might contain NaN. This attribute exists\n only when ``oob_score`` is\ \ True.\n\n See Also\n --------\n BaggingRegressor : A Bagging regressor.\n\ \n References\n ----------\n\n .. [1] L. Breiman, \"Pasting small votes\ \ for classification in large\n databases and on-line\", Machine Learning,\ \ 36(1), 85-103, 1999.\n\n .. [2] L. Breiman, \"Bagging predictors\", Machine\ \ Learning, 24(2), 123-140,\n 1996.\n\n .. [3] T. Ho, \"The random\ \ subspace method for constructing decision\n forests\", Pattern Analysis\ \ and Machine Intelligence, 20(8), 832-844,\n 1998.\n\n .. [4] G.\ \ Louppe and P. Geurts, \"Ensembles on Random Patches\", Machine\n Learning\ \ and Knowledge Discovery in Databases, 346-361, 2012.\n\n Examples\n --------\n\ \ >>> from sklearn.svm import SVC\n >>> from sklearn.ensemble import BaggingClassifier\n\ \ >>> from sklearn.datasets import make_classification\n >>> X, y = make_classification(n_samples=100,\ \ n_features=4,\n ... n_informative=2, n_redundant=0,\n\ \ ... random_state=0, shuffle=False)\n >>> clf\ \ = BaggingClassifier(estimator=SVC(),\n ... 
n_estimators=10,\ \ random_state=0).fit(X, y)\n >>> clf.predict([[0, 0, 0, 0]])\n array([1])\n\ \ \"\"\"\n\n def __init__(\n self,\n estimator=None,\n \ \ n_estimators=10,\n *,\n max_samples=1.0,\n max_features=1.0,\n\ \ bootstrap=True,\n bootstrap_features=False,\n oob_score=False,\n\ \ warm_start=False,\n n_jobs=None,\n random_state=None,\n\ \ verbose=0,\n ):\n super().__init__(\n estimator=estimator,\n\ \ n_estimators=n_estimators,\n max_samples=max_samples,\n\ \ max_features=max_features,\n bootstrap=bootstrap,\n \ \ bootstrap_features=bootstrap_features,\n oob_score=oob_score,\n\ \ warm_start=warm_start,\n n_jobs=n_jobs,\n random_state=random_state,\n\ \ verbose=verbose,\n )\n\n def _get_estimator(self):\n \ \ \"\"\"Resolve which estimator to return (default is DecisionTreeClassifier)\"\ \"\"\n if self.estimator is None:\n return DecisionTreeClassifier()\n\ \ return self.estimator\n\n def _set_oob_score(self, X, y):\n \ \ n_samples = y.shape[0]\n n_classes_ = self.n_classes_\n\n predictions\ \ = np.zeros((n_samples, n_classes_))\n\n for estimator, samples, features\ \ in zip(\n self.estimators_, self.estimators_samples_, self.estimators_features_\n\ \ ):\n # Create mask for OOB samples\n mask = ~indices_to_mask(samples,\ \ n_samples)\n\n if hasattr(estimator, \"predict_proba\"):\n \ \ predictions[mask, :] += estimator.predict_proba(\n \ \ (X[mask, :])[:, features]\n )\n\n else:\n \ \ p = estimator.predict((X[mask, :])[:, features])\n \ \ j = 0\n\n for i in range(n_samples):\n if\ \ mask[i]:\n predictions[i, p[j]] += 1\n \ \ j += 1\n\n if (predictions.sum(axis=1) == 0).any():\n \ \ warn(\n \"Some inputs do not have OOB scores. \"\n \ \ \"This probably means too few estimators were used \"\n \ \ \"to compute any reliable oob estimates.\"\n )\n\n oob_decision_function\ \ = predictions / predictions.sum(axis=1)[:, np.newaxis]\n oob_score =\ \ accuracy_score(y, np.argmax(predictions, axis=1))\n\n self.oob_decision_function_\ \ = oob_decision_function\n self.oob_score_ = oob_score\n\n def _validate_y(self,\ \ y):\n y = column_or_1d(y, warn=True)\n check_classification_targets(y)\n\ \ self.classes_, y = np.unique(y, return_inverse=True)\n self.n_classes_\ \ = len(self.classes_)\n\n return y\n\n def predict(self, X, **params):\n\ \ \"\"\"Predict class for X.\n\n The predicted class of an input\ \ sample is computed as the class with\n the highest mean predicted probability.\ \ If base estimators do not\n implement a ``predict_proba`` method, then\ \ it resorts to voting.\n\n Parameters\n ----------\n X :\ \ {array-like, sparse matrix} of shape (n_samples, n_features)\n The\ \ training input samples. Sparse matrices are accepted only if\n they\ \ are supported by the base estimator.\n\n **params : dict\n \ \ Parameters routed to the `predict_proba` (if available) or the `predict`\n\ \ method (otherwise) of the sub-estimators via the metadata routing\ \ API.\n\n .. 
versionadded:: 1.7\n\n Only available\ \ if\n `sklearn.set_config(enable_metadata_routing=True)` is set.\ \ See\n :ref:`Metadata Routing User Guide ` for\ \ more\n details.\n\n Returns\n -------\n \ \ y : ndarray of shape (n_samples,)\n The predicted classes.\n \ \ \"\"\"\n _raise_for_params(params, self, \"predict\")\n\n predicted_probabilitiy\ \ = self.predict_proba(X, **params)\n return self.classes_.take((np.argmax(predicted_probabilitiy,\ \ axis=1)), axis=0)\n\n def predict_proba(self, X, **params):\n \"\"\ \"Predict class probabilities for X.\n\n The predicted class probabilities\ \ of an input sample is computed as\n the mean predicted class probabilities\ \ of the base estimators in the\n ensemble. If base estimators do not implement\ \ a ``predict_proba``\n method, then it resorts to voting and the predicted\ \ class probabilities\n of an input sample represents the proportion of\ \ estimators predicting\n each class.\n\n Parameters\n ----------\n\ \ X : {array-like, sparse matrix} of shape (n_samples, n_features)\n \ \ The training input samples. Sparse matrices are accepted only if\n\ \ they are supported by the base estimator.\n\n **params : dict\n\ \ Parameters routed to the `predict_proba` (if available) or the `predict`\n\ \ method (otherwise) of the sub-estimators via the metadata routing\ \ API.\n\n .. versionadded:: 1.7\n\n Only available\ \ if\n `sklearn.set_config(enable_metadata_routing=True)` is set.\ \ See\n :ref:`Metadata Routing User Guide ` for\ \ more\n details.\n\n Returns\n -------\n \ \ p : ndarray of shape (n_samples, n_classes)\n The class probabilities\ \ of the input samples. The order of the\n classes corresponds to that\ \ in the attribute :term:`classes_`.\n \"\"\"\n _raise_for_params(params,\ \ self, \"predict_proba\")\n\n check_is_fitted(self)\n # Check data\n\ \ X = validate_data(\n self,\n X,\n accept_sparse=[\"\ csr\", \"csc\"],\n dtype=None,\n ensure_all_finite=False,\n\ \ reset=False,\n )\n\n if _routing_enabled():\n \ \ routed_params = process_routing(self, \"predict_proba\", **params)\n \ \ else:\n routed_params = Bunch()\n routed_params.estimator\ \ = Bunch(predict_proba=Bunch())\n\n # Parallel loop\n n_jobs, _,\ \ starts = _partition_estimators(self.n_estimators, self.n_jobs)\n\n all_proba\ \ = Parallel(\n n_jobs=n_jobs, verbose=self.verbose, **self._parallel_args()\n\ \ )(\n delayed(_parallel_predict_proba)(\n self.estimators_[starts[i]\ \ : starts[i + 1]],\n self.estimators_features_[starts[i] : starts[i\ \ + 1]],\n X,\n self.n_classes_,\n \ \ predict_params=routed_params.estimator.get(\"predict\", None),\n \ \ predict_proba_params=routed_params.estimator.get(\"predict_proba\", None),\n\ \ )\n for i in range(n_jobs)\n )\n\n # Reduce\n\ \ proba = sum(all_proba) / self.n_estimators\n\n return proba\n\n\ \ def predict_log_proba(self, X, **params):\n \"\"\"Predict class log-probabilities\ \ for X.\n\n The predicted class log-probabilities of an input sample is\ \ computed as\n the log of the mean predicted class probabilities of the\ \ base\n estimators in the ensemble.\n\n Parameters\n ----------\n\ \ X : {array-like, sparse matrix} of shape (n_samples, n_features)\n \ \ The training input samples. Sparse matrices are accepted only if\n\ \ they are supported by the base estimator.\n\n **params : dict\n\ \ Parameters routed to the `predict_log_proba`, the `predict_proba`\ \ or the\n `proba` method of the sub-estimators via the metadata routing\ \ API. 
The\n routing is tried in the mentioned order depending on whether\ \ this method is\n available on the sub-estimator.\n\n ..\ \ versionadded:: 1.7\n\n Only available if\n `sklearn.set_config(enable_metadata_routing=True)`\ \ is set. See\n :ref:`Metadata Routing User Guide `\ \ for more\n details.\n\n Returns\n -------\n \ \ p : ndarray of shape (n_samples, n_classes)\n The class log-probabilities\ \ of the input samples. The order of the\n classes corresponds to that\ \ in the attribute :term:`classes_`.\n \"\"\"\n _raise_for_params(params,\ \ self, \"predict_log_proba\")\n\n check_is_fitted(self)\n\n if\ \ hasattr(self.estimator_, \"predict_log_proba\"):\n # Check data\n\ \ X = validate_data(\n self,\n X,\n \ \ accept_sparse=[\"csr\", \"csc\"],\n dtype=None,\n\ \ ensure_all_finite=False,\n reset=False,\n \ \ )\n\n if _routing_enabled():\n routed_params\ \ = process_routing(self, \"predict_log_proba\", **params)\n else:\n\ \ routed_params = Bunch()\n routed_params.estimator\ \ = Bunch(predict_log_proba=Bunch())\n\n # Parallel loop\n \ \ n_jobs, _, starts = _partition_estimators(self.n_estimators, self.n_jobs)\n\ \n all_log_proba = Parallel(n_jobs=n_jobs, verbose=self.verbose)(\n\ \ delayed(_parallel_predict_log_proba)(\n self.estimators_[starts[i]\ \ : starts[i + 1]],\n self.estimators_features_[starts[i] :\ \ starts[i + 1]],\n X,\n self.n_classes_,\n\ \ params=routed_params.estimator.predict_log_proba,\n \ \ )\n for i in range(n_jobs)\n )\n\n \ \ # Reduce\n log_proba = all_log_proba[0]\n\n for\ \ j in range(1, len(all_log_proba)):\n log_proba = np.logaddexp(log_proba,\ \ all_log_proba[j])\n\n log_proba -= np.log(self.n_estimators)\n\n\ \ else:\n log_proba = np.log(self.predict_proba(X, **params))\n\ \n return log_proba\n\n @available_if(\n _estimator_has(\"decision_function\"\ , delegates=(\"estimators_\", \"estimator\"))\n )\n def decision_function(self,\ \ X, **params):\n \"\"\"Average of the decision functions of the base classifiers.\n\ \n Parameters\n ----------\n X : {array-like, sparse matrix}\ \ of shape (n_samples, n_features)\n The training input samples. Sparse\ \ matrices are accepted only if\n they are supported by the base estimator.\n\ \n **params : dict\n Parameters routed to the `decision_function`\ \ method of the sub-estimators\n via the metadata routing API.\n\n\ \ .. versionadded:: 1.7\n\n Only available if\n \ \ `sklearn.set_config(enable_metadata_routing=True)` is set. See\n\ \ :ref:`Metadata Routing User Guide ` for more\n\ \ details.\n\n Returns\n -------\n score :\ \ ndarray of shape (n_samples, k)\n The decision function of the input\ \ samples. The columns correspond\n to the classes in sorted order,\ \ as they appear in the attribute\n ``classes_``. 
Regression and binary\ \ classification are special\n cases with ``k == 1``, otherwise ``k==n_classes``.\n\ \ \"\"\"\n _raise_for_params(params, self, \"decision_function\"\ )\n\n check_is_fitted(self)\n\n # Check data\n X = validate_data(\n\ \ self,\n X,\n accept_sparse=[\"csr\", \"csc\"\ ],\n dtype=None,\n ensure_all_finite=False,\n \ \ reset=False,\n )\n\n if _routing_enabled():\n routed_params\ \ = process_routing(self, \"decision_function\", **params)\n else:\n \ \ routed_params = Bunch()\n routed_params.estimator = Bunch(decision_function=Bunch())\n\ \n # Parallel loop\n n_jobs, _, starts = _partition_estimators(self.n_estimators,\ \ self.n_jobs)\n\n all_decisions = Parallel(n_jobs=n_jobs, verbose=self.verbose)(\n\ \ delayed(_parallel_decision_function)(\n self.estimators_[starts[i]\ \ : starts[i + 1]],\n self.estimators_features_[starts[i] : starts[i\ \ + 1]],\n X,\n params=routed_params.estimator.decision_function,\n\ \ )\n for i in range(n_jobs)\n )\n\n # Reduce\n\ \ decisions = sum(all_decisions) / self.n_estimators\n\n return\ \ decisions" - " def get_n_splits(self, X=None, y=None, groups=None):\n return self.n_splits" - "def test_lsvc_intercept_scaling_zero():\n # Test that intercept_scaling is\ \ ignored when fit_intercept is False\n\n lsvc = svm.LinearSVC(fit_intercept=False)\n\ \ lsvc.fit(X, Y)\n assert lsvc.intercept_ == 0.0" - source_sentence: test power transformer 1d sentences: - "def test_power_transformer_1d():\n X = np.abs(X_1col)\n\n for standardize\ \ in [True, False]:\n pt = PowerTransformer(method=\"box-cox\", standardize=standardize)\n\ \n X_trans = pt.fit_transform(X)\n X_trans_func = power_transform(X,\ \ method=\"box-cox\", standardize=standardize)\n\n X_expected, lambda_expected\ \ = stats.boxcox(X.flatten())\n\n if standardize:\n X_expected\ \ = scale(X_expected)\n\n assert_almost_equal(X_expected.reshape(-1, 1),\ \ X_trans)\n assert_almost_equal(X_expected.reshape(-1, 1), X_trans_func)\n\ \n assert_almost_equal(X, pt.inverse_transform(X_trans))\n assert_almost_equal(lambda_expected,\ \ pt.lambdas_[0])\n\n assert len(pt.lambdas_) == X.shape[1]\n assert\ \ isinstance(pt.lambdas_, np.ndarray)" - "def test_hdbscan_feature_array():\n \"\"\"\n Tests that HDBSCAN works with\ \ feature array, including an arbitrary\n goodness of fit check. Note that\ \ the check is a simple heuristic.\n \"\"\"\n labels = HDBSCAN().fit_predict(X)\n\ \n # Check that clustering is arbitrarily good\n # This is a heuristic to\ \ guard against regression\n check_label_quality(labels)" - "def test_pca_initialization_not_compatible_with_sparse_input(csr_container):\n\ \ # Sparse input matrices cannot use PCA initialization.\n tsne = TSNE(init=\"\ pca\", learning_rate=100.0, perplexity=1)\n with pytest.raises(TypeError, match=\"\ PCA initialization.*\"):\n tsne.fit_transform(csr_container([[0, 5], [5,\ \ 0]]))" - source_sentence: Evaluate predicted target values for X relative to y_true sentences: - "def test_hdbscan_usable_inputs(X, kwargs):\n \"\"\"\n Tests that HDBSCAN\ \ works correctly for array-likes and precomputed inputs\n with non-finite\ \ points.\n \"\"\"\n HDBSCAN(min_samples=1, **kwargs).fit(X)" - " def __call__(self, estimator, X, y_true, sample_weight=None, **kwargs):\n\ \ \"\"\"Evaluate predicted target values for X relative to y_true.\n\n\ \ Parameters\n ----------\n estimator : object\n \ \ Trained estimator to use for scoring. 
Must have a predict_proba\n \ \ method; the output of that is used to compute the score.\n\n X :\ \ {array-like, sparse matrix}\n Test data that will be fed to estimator.predict.\n\ \n y_true : array-like\n Gold standard target values for X.\n\ \n sample_weight : array-like of shape (n_samples,), default=None\n \ \ Sample weights.\n\n **kwargs : dict\n Other parameters\ \ passed to the scorer. Refer to\n :func:`set_score_request` for more\ \ details.\n\n Only available if `enable_metadata_routing=True`. See\ \ the\n :ref:`User Guide `.\n\n .. versionadded::\ \ 1.3\n\n Returns\n -------\n score : float\n \ \ Score function applied to prediction of estimator on X.\n \"\"\"\n \ \ # TODO (1.8): remove in 1.8 (scoring=\"max_error\" has been deprecated\ \ in 1.6)\n if self._deprecation_msg is not None:\n warnings.warn(\n\ \ self._deprecation_msg, category=DeprecationWarning, stacklevel=2\n\ \ )\n\n _raise_for_params(kwargs, self, None)\n\n _kwargs\ \ = copy.deepcopy(kwargs)\n if sample_weight is not None:\n \ \ _kwargs[\"sample_weight\"] = sample_weight\n\n return self._score(partial(_cached_call,\ \ None), estimator, X, y_true, **_kwargs)" - ' def set_inverse_transform_request(self, **kwargs): pass' pipeline_tag: sentence-similarity library_name: sentence-transformers --- # SentenceTransformer based on intfloat/e5-small-v2 This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-small-v2](https://huggingface.co/intfloat/e5-small-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [intfloat/e5-small-v2](https://huggingface.co/intfloat/e5-small-v2) - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 384 dimensions - **Similarity Function:** Cosine Similarity ### Model Sources - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) ### Full Model Architecture ``` SentenceTransformer( (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: PeftModelForFeatureExtraction (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) (2): Normalize() ) ``` ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: ```bash pip install -U sentence-transformers ``` Then you can load this model and run inference. ```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("sentence_transformers_model_id") # Run inference sentences = [ 'Evaluate predicted target values for X relative to y_true', ' def __call__(self, estimator, X, y_true, sample_weight=None, **kwargs):\n """Evaluate predicted target values for X relative to y_true.\n\n Parameters\n ----------\n estimator : object\n Trained estimator to use for scoring. 
Must have a predict_proba\n method; the output of that is used to compute the score.\n\n X : {array-like, sparse matrix}\n Test data that will be fed to estimator.predict.\n\n y_true : array-like\n Gold standard target values for X.\n\n sample_weight : array-like of shape (n_samples,), default=None\n Sample weights.\n\n **kwargs : dict\n Other parameters passed to the scorer. Refer to\n :func:`set_score_request` for more details.\n\n Only available if `enable_metadata_routing=True`. See the\n :ref:`User Guide `.\n\n .. versionadded:: 1.3\n\n Returns\n -------\n score : float\n Score function applied to prediction of estimator on X.\n """\n # TODO (1.8): remove in 1.8 (scoring="max_error" has been deprecated in 1.6)\n if self._deprecation_msg is not None:\n warnings.warn(\n self._deprecation_msg, category=DeprecationWarning, stacklevel=2\n )\n\n _raise_for_params(kwargs, self, None)\n\n _kwargs = copy.deepcopy(kwargs)\n if sample_weight is not None:\n _kwargs["sample_weight"] = sample_weight\n\n return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)', 'def test_hdbscan_usable_inputs(X, kwargs):\n """\n Tests that HDBSCAN works correctly for array-likes and precomputed inputs\n with non-finite points.\n """\n HDBSCAN(min_samples=1, **kwargs).fit(X)', ] embeddings = model.encode(sentences) print(embeddings.shape) # [3, 384] # Get the similarity scores for the embeddings similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] ``` ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 13,734 training samples * Columns: sentence_0 and sentence_1 * Approximate statistics based on the first 1000 samples: | | sentence_0 | sentence_1 | |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | type | string | string | | details |
min: 3 tokens, mean: 8.78 tokens, max: 63 tokens | min: 9 tokens, mean: 233.15 tokens, max: 512 tokens
| * Samples: | sentence_0 | sentence_1 | |:-----------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Get the estimator | def _get_estimator(self):
"""Get the estimator.

Returns
-------
estimator_ : estimator object
The cloned estimator object.
"""
# TODO(1.8): remove and only keep clone(self.estimator)
if self.estimator is None and self.base_estimator != "deprecated":
estimator_ = clone(self.base_estimator)

warn(
(
"`base_estimator` has been deprecated in 1.6 and will be removed"
" in 1.8. Please use `estimator` instead."
),
FutureWarning,
)
# TODO(1.8) remove
elif self.estimator is None and self.base_estimator == "deprecated":
raise ValueError(
"You must pass an estimator to SelfTrainingClassifier. Use `estimator`."
)
elif self.estimator is not None and self.base_estimator != "deprecated":
raise ValueError(
"You must p...
| | Gaussian Naive Bayes (GaussianNB) | class GaussianNB(_BaseNB):
    """
    Gaussian Naive Bayes (GaussianNB).

    Can perform online updates to model parameters via :meth:`partial_fit`.
    For details on algorithm used to update feature means and variance online,
    see `Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque
    `_.

    Read more in the :ref:`User Guide `.

    Parameters
    ----------
    priors : array-like of shape (n_classes,), default=None
        Prior probabilities of the classes. If specified, the priors are not
        adjusted according to the data.

    var_smoothing : float, default=1e-9
        Portion of the largest variance of all features that is added to
        variances for calculation stability.

        .. versionadded:: 0.20

    Attributes
    ----------
    class_count_ : ndarray of shape (n_classes,)
        number of training samples observed in each class.

    class_pri...
| | test rfe cv n jobs | def test_rfe_cv_n_jobs(global_random_seed):
    generator = check_random_state(global_random_seed)
    iris = load_iris()
    X = np.c_[iris.data, generator.normal(size=(len(iris.data), 6))]
    y = iris.target

    rfecv = RFECV(estimator=SVC(kernel="linear"))
    rfecv.fit(X, y)
    rfecv_ranking = rfecv.ranking_

    rfecv_cv_results_ = rfecv.cv_results_

    rfecv.set_params(n_jobs=2)
    rfecv.fit(X, y)
    assert_array_almost_equal(rfecv.ranking_, rfecv_ranking)

    assert rfecv_cv_results_.keys() == rfecv.cv_results_.keys()
    for key in rfecv_cv_results_.keys():
        assert rfecv_cv_results_[key] == pytest.approx(rfecv.cv_results_[key])
| * Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters: ```json { "scale": 20.0, "similarity_fct": "cos_sim" } ``` ### Training Hyperparameters #### Non-Default Hyperparameters - `per_device_train_batch_size`: 16 - `per_device_eval_batch_size`: 16 - `num_train_epochs`: 1 - `fp16`: True - `multi_dataset_batch_sampler`: round_robin #### All Hyperparameters
Click to expand - `overwrite_output_dir`: False - `do_predict`: False - `eval_strategy`: no - `prediction_loss_only`: True - `per_device_train_batch_size`: 16 - `per_device_eval_batch_size`: 16 - `per_gpu_train_batch_size`: None - `per_gpu_eval_batch_size`: None - `gradient_accumulation_steps`: 1 - `eval_accumulation_steps`: None - `torch_empty_cache_steps`: None - `learning_rate`: 5e-05 - `weight_decay`: 0.0 - `adam_beta1`: 0.9 - `adam_beta2`: 0.999 - `adam_epsilon`: 1e-08 - `max_grad_norm`: 1 - `num_train_epochs`: 1 - `max_steps`: -1 - `lr_scheduler_type`: linear - `lr_scheduler_kwargs`: {} - `warmup_ratio`: 0.0 - `warmup_steps`: 0 - `log_level`: passive - `log_level_replica`: warning - `log_on_each_node`: True - `logging_nan_inf_filter`: True - `save_safetensors`: True - `save_on_each_node`: False - `save_only_model`: False - `restore_callback_states_from_checkpoint`: False - `no_cuda`: False - `use_cpu`: False - `use_mps_device`: False - `seed`: 42 - `data_seed`: None - `jit_mode_eval`: False - `use_ipex`: False - `bf16`: False - `fp16`: True - `fp16_opt_level`: O1 - `half_precision_backend`: auto - `bf16_full_eval`: False - `fp16_full_eval`: False - `tf32`: None - `local_rank`: 0 - `ddp_backend`: None - `tpu_num_cores`: None - `tpu_metrics_debug`: False - `debug`: [] - `dataloader_drop_last`: False - `dataloader_num_workers`: 0 - `dataloader_prefetch_factor`: None - `past_index`: -1 - `disable_tqdm`: False - `remove_unused_columns`: True - `label_names`: None - `load_best_model_at_end`: False - `ignore_data_skip`: False - `fsdp`: [] - `fsdp_min_num_params`: 0 - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} - `tp_size`: 0 - `fsdp_transformer_layer_cls_to_wrap`: None - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} - `deepspeed`: None - `label_smoothing_factor`: 0.0 - `optim`: adamw_torch - `optim_args`: None - `adafactor`: False - `group_by_length`: False - `length_column_name`: length - `ddp_find_unused_parameters`: None - `ddp_bucket_cap_mb`: None - `ddp_broadcast_buffers`: False - `dataloader_pin_memory`: True - `dataloader_persistent_workers`: False - `skip_memory_metrics`: True - `use_legacy_prediction_loop`: False - `push_to_hub`: False - `resume_from_checkpoint`: None - `hub_model_id`: None - `hub_strategy`: every_save - `hub_private_repo`: None - `hub_always_push`: False - `gradient_checkpointing`: False - `gradient_checkpointing_kwargs`: None - `include_inputs_for_metrics`: False - `include_for_metrics`: [] - `eval_do_concat_batches`: True - `fp16_backend`: auto - `push_to_hub_model_id`: None - `push_to_hub_organization`: None - `mp_parameters`: - `auto_find_batch_size`: False - `full_determinism`: False - `torchdynamo`: None - `ray_scope`: last - `ddp_timeout`: 1800 - `torch_compile`: False - `torch_compile_backend`: None - `torch_compile_mode`: None - `include_tokens_per_second`: False - `include_num_input_tokens_seen`: False - `neftune_noise_alpha`: None - `optim_target_modules`: None - `batch_eval_metrics`: False - `eval_on_start`: False - `use_liger_kernel`: False - `eval_use_gather_object`: False - `average_tokens_across_devices`: False - `prompts`: None - `batch_sampler`: batch_sampler - `multi_dataset_batch_sampler`: round_robin
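The non-default hyperparameters and loss configuration above map directly onto the Sentence Transformers v3 training API. Below is a minimal, illustrative sketch of how such a fine-tune could be reproduced; the example pairs, the `output_dir` name, and the omission of the PEFT/LoRA adapter shown in the model architecture are assumptions, not the exact training script.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Illustrative (query, code) pairs; the actual dataset contains 13,734 such rows.
train_dataset = Dataset.from_dict(
    {
        "sentence_0": ["predict", "sklearn tags"],
        "sentence_1": [
            "def predict(self, X):\n    return np.zeros(X.shape[0])",
            "def __sklearn_tags__(self):\n    tags = super().__sklearn_tags__()\n    return tags",
        ],
    }
)

model = SentenceTransformer("intfloat/e5-small-v2")

# scale=20.0 with cosine similarity matches the loss parameters listed above.
loss = MultipleNegativesRankingLoss(model, scale=20.0)

# Only the non-default hyperparameters from above are set explicitly.
args = SentenceTransformerTrainingArguments(
    output_dir="e5-small-v2-code-retrieval",  # assumed output directory
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```

Because MultipleNegativesRankingLoss uses in-batch negatives, every other `sentence_1` in a batch of 16 acts as a negative for a given query, which is why the training pairs need no explicit negative column.
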
### Training Logs | Epoch | Step | Training Loss | |:------:|:----:|:-------------:| | 0.5821 | 500 | 0.6129 | ### Framework Versions - Python: 3.11.12 - Sentence Transformers: 3.4.1 - Transformers: 4.51.3 - PyTorch: 2.6.0+cu124 - Accelerate: 1.6.0 - Datasets: 3.5.1 - Tokenizers: 0.21.1 ## Citation ### BibTeX #### Sentence Transformers ```bibtex @inproceedings{reimers-2019-sentence-bert, title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", author = "Reimers, Nils and Gurevych, Iryna", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", month = "11", year = "2019", publisher = "Association for Computational Linguistics", url = "https://arxiv.org/abs/1908.10084", } ``` #### MultipleNegativesRankingLoss ```bibtex @misc{henderson2017efficient, title={Efficient Natural Language Response Suggestion for Smart Reply}, author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, year={2017}, eprint={1705.00652}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```