gte-modernbert-base / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:46338
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: Alibaba-NLP/gte-modernbert-base
widget:
  - source_sentence: >-
      What are the specific points and subparagraphs mentioned in the context of
      Article 4(3) that relate to the introductory wording and how do they
      connect to the provisions outlined in Article 3(1)?
    sentences:
      - >-
        51 - Article 2, points 52, 53,54, 55 and 56 - Article 3 - Article 4(1)
        Article 3(1), first subparagraph Article 4(2), first subparagraph
        Article 4(2), second subparagraph Article 3(1), second subparagraph,
        introductory wording Article 4(3), first subparagraph, introductory
        wording Article 3(1), second subparagraph, points (a) and (b) Article
        4(3), first subparagraph, points (a) and (b) Article 3(1), second
        subparagraph, point (c) - Article 3(1), second subparagraph, point (d)
        Article 4(3), first subparagraph, point (c) Article 3(1), third
        subparagraph, introductory wording - - Article 4(3), first subparagraph,
        point (d), introductory wording - Article 4(3), first subparagraph,
        points (d)(i), (ii) and (iii) Article 3(1), third subparagraph, point
        (a) Article 4(3), first subparagraph, point (d)(iv) - Article 4(3),
        first subparagraph, point (e), introductory wording Article 3(1), third
        subparagraph, point (b) Article 4(3), first subparagraph, point (e)(i)
        Article 3(1), third subparagraph, point (c) Article 4(3), first
        subparagraph, point (e)(ii) Article 3(1), third subparagraph, point (d)
        Article 4(3), first subparagraph, point (e)(iii) Article 3(1), third
        subparagraph, point (e) - - Article 4(3), first subparagraph, point
        (e)(iv) Article 3(2) and (3) - Article 3(4) Article 35(6) Article 3(5)
        and (6) - - Article 4(4) - Article 4(5) Article 4(6) Article 4(7) -
        Article 5 Article 5(1), first subparagraph Article 6(1), first
        subparagraph Article 5(1), second subparagraph Article 6(1), fifth
        subparagraph - Article 6(1), second and third subparagraph Article 5(1),
        third subparagraph Article 6(1), fourth subparagraph Article 5(1),
        fourth and fifth subparagraph - Article 5(2) - Article 6(2) Article
        6(2), second subparagraph Article 5(3) Article 6(3) Article 5(4) Article
        6(4) Article 5(5) Article 6(5) Article 5(5), first subparagraph, point
        (b) Article 6(5), second subparagraph, point (c) - Article 6(5), second
        subparagraph, point (b) Article 5(6) Article 6(6) - Article 6(6), second
        subparagraph, point (a) Article 5(6), second subparagraph Article 6(6),
        second subparagraph, point (b) Article 5(6), third subparagraph Article
        6(6), third subparagraph Article 5(7) - Article 6(1), first subparagraph
        Article 7(1), first
      - >-
        ii.


        measures to protect against retaliation its own workers who are
        whistleblowers in accordance with the applicable law transposing
        Directive (EU) 2019/1937 of the European Parliament and of the Council (
        121 );


        (d)


        where the undertaking has no policies on the protection of
        whistle-blowers ( 122 ), it shall state this and whether it has plans to
        implement them and the timetable for implementation;


        (e)


        beyond the procedures to follow-up on reports by whistleblowers in
        accordance with the applicable law transposing Directive (EU) 2019/1937,
        whether the undertaking has procedures to investigate business conduct
        incidents , including incidents of corruption and bribery , promptly,
        independently and objectively;


        (f)


        where applicable, whether the undertaking has in place policies with
        respect to animal welfare;


        (g)


        the undertaking’s policy for training within the organisation on
        business conduct, including target audience, frequency and depth of
        coverage; and


        (h)


        the functions within the undertaking that are most at risk in respect of
        corruption and bribery .


        Undertakings that are subject to legal requirements under national law
        transposing Directive (EU) 2019/1937, or to equivalent legal
        requirements with regard to the protection of whistle-blowers, may
        comply with the disclosure specified in paragraph 10 (d) by stating that
        they are subject to those legal requirements.


        Disclosure Requirement G1-2  Management of relationships with suppliers


        The undertaking shall provide information about the management of its
        relationships with its suppliers and its impacts on its supply chain.


        The objective of this Disclosure Requirement is to provide an
        understanding of the undertaking’s management of its procurement process
        including fair behaviour with suppliers .


        The undertaking shall provide a description of its policy to prevent
        late payments, specifically to SMEs.


        The disclosure required under paragraph 12 shall include the following
        information:


        (a)


        the undertaking’s approach to its relationships with its suppliers ,
        taking account of risks to the undertaking related to its supply chain
        and of impacts on sustainability matters ; and


        (b)


        whether and how it takes into account social and environmental criteria
        for the selection of its suppliers.


        Disclosure Requirement G1-3  Prevention and detection of corruption and
        bribery


        The undertaking shall provide information about its system to prevent
        and detect, investigate, and respond to allegations or incidents
        relating to corruption and bribery including the related training.


        The objective of this Disclosure Requirement is to provide transparency
        on the key procedures of the undertaking to prevent, detect, and address
        allegations about corruption and bribery . This includes the training
        provided to own workers and/or information provided internally or to
        suppliers .


        The disclosure required under paragraph 16 shall include the following
        information:


        (a)


        a description of the procedures in place to prevent, detect, and address
        allegations or incidents of corruption and bribery ;


        (b)


        whether the investigators or investigating committee are separate from
        the chain of management involved in the matter; and


        (c)


        the process, if any, to report outcomes to the administrative,
        management and supervisory bodies .


        Where the undertaking has no such procedures in place, it shall disclose
        this fact and, where applicable, its plans to adopt them.


        The disclosures required by paragraph 16 shall include information about
        how the undertaking communicates its policies to those for whom they are
        relevant to ensure that the policy is accessible and that they
        understand its implications.


        The disclosure required by paragraph 16 shall include information about
        the following with respect to training:


        (a)


        the nature, scope and depth of anti- corruption and anti- bribery
        training programmes offered or required by the undertaking;


        (b)


        the percentage of functions-at-risk covered by training programmes; and


        (c)


        the extent to which training is given to members of the administrative,
        management and supervisory bodies.


        Metrics and targets


        Disclosure Requirement G1-4  Incidents of corruption or bribery


        The undertaking shall provide information on incidents of corruption or
        bribery during the reporting period.
      - >-
        (39)


        ‘algorithmic trading’ means trading in financial instruments where a
        computer algorithm automatically determines individual parameters of
        orders such as whether to initiate the order, the timing, price or
        quantity of the order or how to manage the order after its submission,
        with limited or no human intervention, and does not include any system
        that is only used for the purpose of routing orders to one or more
        trading venues or for the processing of orders involving no
        determination of any trading parameters or for the confirmation of
        orders or the post-trade processing of executed transactions;


        (40)


        ‘high-frequency algorithmic trading technique’ means an algorithmic
        trading technique characterised by:


        (a)
  - source_sentence: >-
      What action does the Commission take if the scheme owner fails to address
      the deficiencies and the scheme no longer meets the criteria in Annex IV?
    sentences:
      - >-
        2.


        Implementing partners shall fill out the Scoreboard for their proposals
        for financing and investment operations.


        3.


        The Scoreboard shall cover the following elements:


        (a)


        a description of the proposed financing or investment operation;


        (b)


        how the proposed financing or investment operation contributes to EU
        policy objectives;


        (c)


        a description of additionality;


        (d)


        a description of the market failure or suboptimal investment situation;


        (e)


        the financial and technical contribution by the implementing partner;


        (f)


        the impact of the investment;


        (g)


        the financial profile of the financing or investment operation;


        (h)


        complementary indicators.


        4.


        The Commission is empowered to adopt delegated acts in accordance with
        Article 34 in order to supplement this Regulation by establishing
        additional elements of the Scoreboard, including detailed rules for the
        Scoreboard to be used by the implementing partners.


        Article 23


        Policy check


        1.


        The Commission shall conduct a check to confirm that the financing and
        investment operations proposed by the implementing partners other than
        the EIB comply with Union law and policies.


        2.


        EIB financing and investment operations that fall within the scope of
        this Regulation shall not be covered by the EU guarantee where the
        Commission delivers an unfavourable opinion within the framework of the
        procedure provided for in Article 19 of the EIB Statute.


        ▼M1


        3.


        In the context of the procedures referred to in paragraphs 1 and 2 of
        this Article, the Commission shall take into account any Sovereignty
        Seal awarded in accordance with Article 4 of Regulation (EU) 2024/795 to
        a project.


        ▼B


        Article 24


        Investment Committee


        1.


        A fully independent investment committee shall be established for the
        InvestEU Fund (the ‘Investment Committee’). The Investment Committee
        shall:


        (a)


        examine the proposals for financing and investment operations submitted
        by implementing partners for coverage under the EU guarantee that have
        passed the policy check referred to in Article 23(1) of this Regulation
        or that have received a favourable opinion within the framework of the
        procedure provided for in Article 19 of the EIB Statute;


        (b)
      - >-
        (6) |  The maritime transport sector is subject to strong international
        competition. Major differences in regulatory burdens across flag states
        have often led to unwanted practices such as the reflagging of ships.
        The sector’s intrinsic global character underlines the importance of a
        flag-neutral approach and of a favourable regulatory environment, which
        would help to attract new investment and safeguard the competitiveness
        of Union ports, shipowners and ship operators.
      - >-
        8.


        Where the scheme owner fails or refuses to take the necessary remedial
        action and where the Commission has determined that the deficiencies
        referred to in paragraph 6 of this Article mean that the scheme no
        longer fulfils the criteria laid down in Annex IV, or of the recognised
        subset of those criteria, the Commission shall withdraw the recognition
        of the scheme by means of implementing acts. Those implementing acts
        shall be adopted in accordance with the examination procedure referred
        to in Article 39(3).


        9.
  - source_sentence: >-
      What roles do upstream and downstream business partners play in the
      overall production and distribution process as described?
    sentences:
      - >-
        (25) The chain of activities should cover activities of a company’s
        upstream business partners related to the production of goods or the
        provision of services by the company, including the design, extraction,
        sourcing, manufacture, transport, storage and supply of raw materials,
        products or parts of the products and development of the product or the
        service, and activities of a company’s downstream business partners
        related to the distribution, transport and storage of the product, where
        the business partners carry out those activities for the company or on
        behalf of the company. This Directive should not cover the disposal of
        the product. In addition, under this Directive the chain of activities
        should not encompass the distribution,
      - >-
        7.


        Any actor in the supply chain who is required to prepare a chemical
        safety report according to Articles 14 or 37 shall place the relevant
        exposure scenarios (including use and exposure categories where
        appropriate) in an annex to the safety data sheet covering identified
        uses and including specific conditions resulting from the application of
        Section 3 of Annex XI.


        Any downstream user shall include relevant exposure scenarios, and use
        other relevant information, from the safety data sheet supplied to him
        when compiling his own safety data sheet for identified uses.
      - >-
        8.


        Authorisations shall be subject to a time-limited review without
        prejudice to any decision on a future review period and shall normally
        be subject to conditions, including monitoring. The duration of the
        time-limited review for any authorisation shall be determined on a
        case-by-case basis taking into account all relevant information
        including the elements listed in paragraph 4(a) to (d), as appropriate.


        9.


        The authorisation shall specify:


        (a)


        the person(s) to whom the authorisation is granted;


        (b)


        the identity of the substance(s);


        (c)


        the use(s) for which the authorisation is granted;


        (d)


        any conditions under which the authorisation is granted;


        (e)


        the time-limited review period;


        (f)


        any monitoring arrangement.


        10.
  - source_sentence: >-
      What conditions must be met for the stability study in organic solvents to
      be deemed unnecessary for a substance?
    sentences:
      - >-
        AR 23. When disclosing information required under paragraph 29 for the
        purpose of setting targets the undertaking shall consider the need for
        an informed and willing consent of local and indigenous peoples , the
        need for appropriate consultations and the need to respect the decisions
        of these communities.


        AR 24. The targets related to material impacts may be presented in a
        table as illustrated below:


        Type of target according to mitigation hierarchy Baseline value and base
        year Target value and geographical scope Connected policy or legislation
        if relevant 2025 2030 Up to 2050 Avoidance Minimisation Rehabilitation
        and restoration Compensation or offsets
      - >-
        1.


        Member States shall, in accordance with paragraph 2, draw up a register
        of producers, including producers supplying EEE by means of distance
        communication. That register shall serve to monitor compliance with the
        requirements of this Directive.


        Producers supplying EEE by means of distance communication as defined in
        Article 3(1)(f)(iv) shall be registered in the Member State that they
        sell to. Where such producers are not registered in the Member State
        that they are selling to, they shall be registered through their
        authorised representatives as referred to in Article 17(2).


        2.


        Member States shall ensure that:


        (a)


        each producer, or each authorised representative where appointed under
        Article 17, is registered as required and has the possibility of
        entering online in their national register all relevant information
        reflecting that producer’s activities in that Member State;


        (b)


        upon registering, each producer, or each authorised representative where
        appointed under Article 17, provides the information set out in Annex X,
        Part A, undertaking to update it as appropriate;


        (c)


        each producer, or each authorised representative where appointed under
        Article 17, provides the information set out in Annex X, Part B;


        (d)


        national registers provide links to other national registers on their
        website to facilitate, in all Member States, registration of producers
        or, where appointed under Article 17, authorised representatives.


        3.


        In order to ensure uniform conditions for the implementation of this
        Article, the Commission shall adopt implementing acts establishing the
        format for registration and reporting and the frequency of reporting to
        the register. Those implementing acts shall be adopted in accordance
        with the examination procedure referred to in Article 21(2).


        4.


        Member States shall collect information, including substantiated
        estimates, on an annual basis, on the quantities and categories of EEE
        placed on their markets, collected through all routes, prepared for
        re-use, recycled and recovered within the Member State, and on
        separately collected WEEE exported, by weight.


        ▼M1 —————


        ▼M1


        6.
      - >-
        COLUMN 1 STANDARD INFORMATION REQUIRED COLUMN 2 SPECIFIC RULES FOR
        ADAPTATION FROM COLUMN 1 7.15. Stability in organic solvents and
        identity of relevant degradation products Only required if stability of
        the substance is considered to be critical. 7.15. The study does not
        need to be conducted if the substance is inorganic. 7.16. Dissociation
        constant 7.16. The study does not need to be conducted if: — the
        substance is hydrolytically unstable (half-life less than 12 hours) or
        is readily oxidisable in water, or ►M70 ◄ ►M64 — or based on the
        structure, the substance does not have any chemical group that can
        dissociate. ◄ 7.17. Viscosity ►M64 For hydrocarbon substances the
        kinematic viscosity shall be determined at 40 °C. ◄
  - source_sentence: >-
      How is 'associated undertaking' defined, and what criteria determine the
      significant influence of one undertaking over another in terms of voting
      rights?
    sentences:
      - >-
        ▼B


        (6)


        ‘purchase price’ means the price payable and any incidental expenses
        minus any incidental reductions in the cost of acquisition;


        (7)


        ‘production cost’ means the purchase price of raw materials, consumables
        and other costs directly attributable to the item in question. Member
        States shall permit or require the inclusion of a reasonable proportion
        of fixed or variable overhead costs indirectly attributable to the item
        in question, to the extent that they relate to the period of production.
        Distribution costs shall not be included;


        (8)


        ‘value adjustment’ means the adjustments intended to take account of
        changes in the values of individual assets established at the balance
        sheet date, whether the change is final or not;


        (9)


        ‘parent undertaking’ means an undertaking which controls one or more
        subsidiary undertakings;


        (10)


        ‘subsidiary undertaking’ means an undertaking controlled by a parent
        undertaking, including any subsidiary undertaking of an ultimate parent
        undertaking;


        (11)


        ‘group’ means a parent undertaking and all its subsidiary undertakings;


        (12)


        ‘affiliated undertakings’ means any two or more undertakings within a
        group;


        (13)


        ‘associated undertaking’ means an undertaking in which another
        undertaking has a participating interest, and over whose operating and
        financial policies that other undertaking exercises significant
        influence. An undertaking is presumed to exercise a significant
        influence over another undertaking where it has 20 % or more of the
        shareholders' or members' voting rights in that other undertaking;


        (14)


        ‘investment undertakings’ means:


        (a)


        undertakings the sole object of which is to invest their funds in
        various securities, real property and other assets, with the sole aim of
        spreading investment risks and giving their shareholders the benefit of
        the results of the management of their assets,


        (b)


        undertakings associated with investment undertakings with fixed capital,
        if the sole object of those associated undertakings is to acquire fully
        paid shares issued by those investment undertakings without prejudice to
        point (h) of Article 22(1) of Directive 2012/30/EU;


        (15)
      - >-
        and non-European non-financial corporations not subject to the
        disclosure obligations laid down in Directive 2013/34/EU. That
        information may be disclosed only once, based on counterparties’
        turnover alignment for the general-purpose lending loans, as in the case
        of the GAR. The first disclosure reference date of this template is as
        of 31 December 2024. Institutions are not required to disclose this
        information before 1 January 2025. ---|---|---
      - >-
        ANNEX II


        Due diligence statement


        Information to be contained in the due diligence statement in accordance
        with Article 4(2):


        1.


        Operator’s name, address and, in the event of relevant commodities and
        relevant products entering or leaving the market, the Economic Operators
        Registration and Identification (EORI) number in accordance with Article
        9 of Regulation (EU) No 952/2013.


        2.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: SentenceTransformer based on Alibaba-NLP/gte-modernbert-base
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.6910063870188158
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9109269808389435
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9461418953909891
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9742793026065941
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6910063870188158
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.30364232694631454
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.18922837907819778
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09742793026065939
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6910063870188158
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9109269808389435
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9461418953909891
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9742793026065941
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8471731447814336
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.804833419644399
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8061197699360279
            name: Cosine Map@100

SentenceTransformer based on Alibaba-NLP/gte-modernbert-base

This is a sentence-transformers model fine-tuned from Alibaba-NLP/gte-modernbert-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-modernbert-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
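
In the Pooling module above, only pooling_mode_cls_token is enabled, so the sentence embedding is the hidden state of the first token of each sequence. A minimal NumPy sketch of that step on made-up token embeddings follows; the helper name cls_pool and the toy array are illustrative and not part of the library.

```python
import numpy as np

def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Return the first-token ([CLS]) vector for each sequence.

    token_embeddings has shape (batch, seq_len, hidden).
    """
    return token_embeddings[:, 0, :]

# Toy batch: 2 sequences, 4 tokens each, hidden size 3 (the real model uses 768).
toy = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
pooled = cls_pool(toy)
print(pooled.shape)  # (2, 3)
```

The remaining token vectors are discarded at this step, which is the difference from the mean and max pooling modes that are switched off in the configuration.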

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "How is 'associated undertaking' defined, and what criteria determine the significant influence of one undertaking over another in terms of voting rights?",
    "▼B\n\n(6)\n\n‘purchase price’ means the price payable and any incidental expenses minus any incidental reductions in the cost of acquisition;\n\n(7)\n\n‘production cost’ means the purchase price of raw materials, consumables and other costs directly attributable to the item in question. Member States shall permit or require the inclusion of a reasonable proportion of fixed or variable overhead costs indirectly attributable to the item in question, to the extent that they relate to the period of production. Distribution costs shall not be included;\n\n(8)\n\n‘value adjustment’ means the adjustments intended to take account of changes in the values of individual assets established at the balance sheet date, whether the change is final or not;\n\n(9)\n\n‘parent undertaking’ means an undertaking which controls one or more subsidiary undertakings;\n\n(10)\n\n‘subsidiary undertaking’ means an undertaking controlled by a parent undertaking, including any subsidiary undertaking of an ultimate parent undertaking;\n\n(11)\n\n‘group’ means a parent undertaking and all its subsidiary undertakings;\n\n(12)\n\n‘affiliated undertakings’ means any two or more undertakings within a group;\n\n(13)\n\n‘associated undertaking’ means an undertaking in which another undertaking has a participating interest, and over whose operating and financial policies that other undertaking exercises significant influence. An undertaking is presumed to exercise a significant influence over another undertaking where it has 20 % or more of the shareholders' or members' voting rights in that other undertaking;\n\n(14)\n\n‘investment undertakings’ means:\n\n(a)\n\nundertakings the sole object of which is to invest their funds in various securities, real property and other assets, with the sole aim of spreading investment risks and giving their shareholders the benefit of the results of the management of their assets,\n\n(b)\n\nundertakings associated with investment undertakings with fixed capital, if the sole object of those associated undertakings is to acquire fully paid shares issued by those investment undertakings without prejudice to point (h) of Article 22(1) of Directive 2012/30/EU;\n\n(15)",
    'and non-European non-financial corporations not subject to the disclosure obligations laid down in Directive 2013/34/EU. That information may be disclosed only once, based on counterparties’ turnover alignment for the general-purpose lending loans, as in the case of the GAR. The first disclosure reference date of this template is as of 31 December 2024. Institutions are not required to disclose this information before 1 January 2025. ---|---|---',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
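
Because the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128 and 64, its embeddings can plausibly be shortened to one of those sizes to save storage and speed up search. The sketch below uses random vectors as a stand-in for the output of model.encode so that it runs offline; truncate_and_normalize is a hypothetical helper, not a library function.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768))  # stand-in for model.encode(sentences)

small = truncate_and_normalize(embeddings, 256)
similarities = small @ small.T  # cosine similarity, since rows are unit length
print(small.shape)         # (3, 256)
print(similarities.shape)  # (3, 3)
```

Recent versions of Sentence Transformers also accept a truncate_dim argument when constructing SentenceTransformer, which should perform the same truncation at encode time without manual slicing.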

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.691
cosine_accuracy@3 0.9109
cosine_accuracy@5 0.9461
cosine_accuracy@10 0.9743
cosine_precision@1 0.691
cosine_precision@3 0.3036
cosine_precision@5 0.1892
cosine_precision@10 0.0974
cosine_recall@1 0.691
cosine_recall@3 0.9109
cosine_recall@5 0.9461
cosine_recall@10 0.9743
cosine_ndcg@10 0.8472
cosine_mrr@10 0.8048
cosine_map@100 0.8061
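
In the table, recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (for example 0.9109 / 3 ≈ 0.3036). That pattern is what one would expect if every query has exactly one relevant passage; this is an inference from the numbers, not something the card states. A small sketch of how the metrics relate under that assumption, with made-up ranks and an illustrative function name:

```python
def retrieval_metrics(ranks, k):
    """Metrics for queries with exactly one relevant document each.

    ranks[i] is the 1-based rank of the relevant document for query i,
    or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r is not None and r <= k)
    accuracy = hits / n         # fraction of queries with a hit in the top k
    precision = hits / (n * k)  # at most one relevant item among k retrieved
    recall = hits / n           # one relevant item per query, so same as accuracy
    mrr = sum(1.0 / r for r in ranks if r is not None and r <= k) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mrr": mrr}

# Hypothetical ranks of the relevant passage for four queries.
ranks_of_relevant = [1, 2, 1, 5]
m = retrieval_metrics(ranks_of_relevant, k=3)
print(m)
```

With more than one relevant passage per query, recall and precision would no longer be simple functions of accuracy, so the shortcut above would not apply.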

Training Details

Training Dataset

Unnamed Dataset

  • Size: 46,338 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    statistic  sentence_0     sentence_1
    type       string         string
    min        13 tokens      7 tokens
    mean       34.18 tokens   231.33 tokens
    max        251 tokens     2146 tokens
  • Samples:
    Sample 1
    sentence_0: How is 'energy efficiency' defined in the context of Directive (EU) 2018/2001?
    sentence_1: of Directive (EU) 2018/2001; --- --- (8) ‘energy efficiency’ means the ratio of output of performance, service, goods or energy to input of energy; --- --- (9) ‘energy savings’ means an amount of saved energy determined by measuring or estimating consumption, or both,, before and after the implementation of an energy efficiency improvement measure, whilst ensuring normalisation for external conditions that affect energy consumption; --- --- (10) ‘energy efficiency improvement’ means an increase in energy efficiency as a result of any technological, behavioural or economic changes; --- --- (11) ‘energy service’ means the physical benefit, utility or good derived from a combination of energy with energy-efficient technology or with action,

    Sample 2
    sentence_0: What are the sources of information that the external experts will use to create the list of conflict-affected and high-risk areas?
    sentence_1: 2.

    The Commission shall call upon external expertise that will provide an indicative, non-exhaustive, regularly updated list of conflict-affected and high-risk areas. That list shall be based on the external experts' analysis of the handbook referred to in paragraph 1 and existing information from, inter alia, academics and supply chain due diligence schemes. Union importers sourcing from areas which are not mentioned on that list shall also maintain their responsibility to comply with the due diligence obligations under this Regulation.

    Article 15

    Committee procedure

    1.

    The Commission shall be assisted by a committee. That committee shall be a committee within the meaning of Regulation (EU) No 182/2011.

    2.

    Sample 3
    sentence_0: What is the maximum time frame for completing the undertaking according to the technical specifications set out in Annexes II and III after the Directive enters into force?
    sentence_1: is undertaken according to the technical specifications set out in Annexes II and III and that it is completed at the latest four years after the date of entry into force of this Directive.

    2. The analyses and reviews mentioned under paragraph 1 shall be reviewed, and if necessary updated at the latest 13 years after the date of entry into force of this Directive and every six years thereafter.

    Article 6

    Register of protected areas
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
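
The configuration above applies MultipleNegativesRankingLoss to each listed prefix of the embedding and sums the results with equal weights. The following NumPy sketch illustrates that idea and is not the library implementation. The function names are hypothetical, the scale of 20.0 is an assumption (it does not appear in the parameters shown), and the random vectors stand in for encoded `sentence_0` / `sentence_1` pairs.

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives ranking loss: cross-entropy over scaled cosine similarities."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # matching pairs sit on the diagonal

def matryoshka_loss(anchors, positives,
                    dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the base loss over truncated prefixes of the embeddings."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 768))                      # batch of 4, as in per_device_train_batch_size
positives = anchors + 0.1 * rng.normal(size=(4, 768))    # noisy copies act as the paired passages
loss = matryoshka_loss(anchors, positives)
print(float(loss))
```

Because every other passage in the batch acts as a negative, the batch size also controls how many negatives each query sees per step.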
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • num_train_epochs: 4
  • multi_dataset_batch_sampler: round_robin
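
These settings determine the number of optimizer steps. The short calculation below assumes a single device (an assumption, though it is consistent with the step count of 11585 reported at epoch 1.0 in the training logs) and uses the dataset size stated in this card.

```python
import math

train_samples = 46_338  # "Size" in the Training Dataset section
batch_size = 4          # per_device_train_batch_size
epochs = 4              # num_train_epochs
devices = 1             # assumption: training ran on a single device

# dataloader_drop_last is False, so the final partial batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / (batch_size * devices))
total_steps = steps_per_epoch * epochs
print(steps_per_epoch)  # 11585
print(total_steps)      # 46340
```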

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
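
With `lr_scheduler_type: linear`, `learning_rate: 5e-05` and `warmup_steps: 0`, the learning rate is expected to fall linearly from 5e-05 to zero over the run. The sketch below illustrates that schedule. `linear_lr` is a hypothetical helper, and the total of 46,340 steps is an assumption derived from 11,585 steps per epoch over 4 epochs.

```python
def linear_lr(step, base_lr=5e-05, total_steps=46_340, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start, after epochs 1 and 2, and at the end of training.
for step in (0, 11_585, 23_170, 46_340):
    print(step, linear_lr(step))
```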

Training Logs

Epoch Step Training Loss cosine_ndcg@10
0.0432 500 0.358 -
0.0863 1000 0.1048 -
0.1295 1500 0.0827 -
0.1726 2000 0.067 0.7969
0.2158 2500 0.0491 -
0.2590 3000 0.0831 -
0.3021 3500 0.062 -
0.3453 4000 0.0657 0.8050
0.3884 4500 0.0522 -
0.4316 5000 0.049 -
0.4748 5500 0.0426 -
0.5179 6000 0.0708 0.8215
0.5611 6500 0.0236 -
0.6042 7000 0.024 -
0.6474 7500 0.0256 -
0.6905 8000 0.041 0.8105
0.7337 8500 0.0285 -
0.7769 9000 0.0249 -
0.8200 9500 0.0368 -
0.8632 10000 0.0588 0.8118
0.9063 10500 0.0386 -
0.9495 11000 0.0456 -
0.9927 11500 0.0399 -
1.0 11585 - 0.8184
1.0358 12000 0.0424 0.8239
1.0790 12500 0.0107 -
1.1221 13000 0.0279 -
1.1653 13500 0.0236 -
1.2085 14000 0.024 0.8193
1.2516 14500 0.0143 -
1.2948 15000 0.0118 -
1.3379 15500 0.0078 -
1.3811 16000 0.023 0.8217
1.4243 16500 0.0239 -
1.4674 17000 0.0335 -
1.5106 17500 0.0119 -
1.5537 18000 0.0411 0.8292
1.5969 18500 0.0168 -
1.6401 19000 0.0059 -
1.6832 19500 0.0234 -
1.7264 20000 0.0184 0.8366
1.7695 20500 0.0128 -
1.8127 21000 0.0166 -
1.8558 21500 0.0181 -
1.8990 22000 0.0148 0.8353
1.9422 22500 0.0225 -
1.9853 23000 0.0158 -
2.0 23170 - 0.8360
2.0285 23500 0.0123 -
2.0716 24000 0.0173 0.8329
2.1148 24500 0.0167 -
2.1580 25000 0.0125 -
2.2011 25500 0.013 -
2.2443 26000 0.0079 0.8338
2.2874 26500 0.007 -
2.3306 27000 0.0171 -
2.3738 27500 0.0058 -
2.4169 28000 0.0048 0.8405
2.4601 28500 0.005 -
2.5032 29000 0.0141 -
2.5464 29500 0.0132 -
2.5896 30000 0.006 0.8461
2.6327 30500 0.0095 -
2.6759 31000 0.0061 -
2.7190 31500 0.0107 -
2.7622 32000 0.0157 0.8451
2.8054 32500 0.005 -
2.8485 33000 0.0087 -
2.8917 33500 0.0064 -
2.9348 34000 0.005 0.8449
2.9780 34500 0.0115 -
3.0 34755 - 0.8451
3.0211 35000 0.0079 -
3.0643 35500 0.0045 -
3.1075 36000 0.0029 0.8443
3.1506 36500 0.0161 -
3.1938 37000 0.0144 -
3.2369 37500 0.0076 -
3.2801 38000 0.0157 0.8500
3.3233 38500 0.0039 -
3.3664 39000 0.0045 -
3.4096 39500 0.0033 -
3.4527 40000 0.0064 0.8434
3.4959 40500 0.0054 -
3.5391 41000 0.0061 -
3.5822 41500 0.0051 -
3.6254 42000 0.0019 0.8472
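
The Epoch column follows from the step count: each value is the step divided by the 11,585 steps in one epoch. The sketch below reproduces that conversion and picks the evaluated step with the highest cosine_ndcg@10 from a handful of rows copied from the table. `epoch_of` is a hypothetical helper.

```python
STEPS_PER_EPOCH = 11_585  # step count reported at epoch 1.0 in the table above

# (step, cosine_ndcg@10) for a few of the evaluated rows, copied from the table
evaluations = [
    (2000, 0.7969),
    (12000, 0.8239),
    (30000, 0.8461),
    (38000, 0.8500),
    (42000, 0.8472),
]

def epoch_of(step):
    """Fractional epoch for a given optimizer step, rounded as in the log."""
    return round(step / STEPS_PER_EPOCH, 4)

best_step, best_ndcg = max(evaluations, key=lambda row: row[1])
print(epoch_of(best_step), best_step, best_ndcg)
```

Note that `load_best_model_at_end` is False in the hyperparameters, so the highest-scoring step in the log is not necessarily the checkpoint that was published.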

Framework Versions

  • Python: 3.10.15
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cu126
  • Accelerate: 1.5.2
  • Datasets: 3.4.1
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}