---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:46338
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Alibaba-NLP/gte-modernbert-base
widget:
- source_sentence: What are the specific points and subparagraphs mentioned in the
context of Article 4(3) that relate to the introductory wording and how do they
connect to the provisions outlined in Article 3(1)?
sentences:
- 51 - Article 2, points 52, 53,54, 55 and 56 - Article 3 - Article 4(1) Article
3(1), first subparagraph Article 4(2), first subparagraph Article 4(2), second
subparagraph Article 3(1), second subparagraph, introductory wording Article 4(3),
first subparagraph, introductory wording Article 3(1), second subparagraph, points
(a) and (b) Article 4(3), first subparagraph, points (a) and (b) Article 3(1),
second subparagraph, point (c) - Article 3(1), second subparagraph, point (d)
Article 4(3), first subparagraph, point (c) Article 3(1), third subparagraph,
introductory wording - - Article 4(3), first subparagraph, point (d), introductory
wording - Article 4(3), first subparagraph, points (d)(i), (ii) and (iii) Article
3(1), third subparagraph, point (a) Article 4(3), first subparagraph, point (d)(iv)
- Article 4(3), first subparagraph, point (e), introductory wording Article 3(1),
third subparagraph, point (b) Article 4(3), first subparagraph, point (e)(i) Article
3(1), third subparagraph, point (c) Article 4(3), first subparagraph, point (e)(ii)
Article 3(1), third subparagraph, point (d) Article 4(3), first subparagraph,
point (e)(iii) Article 3(1), third subparagraph, point (e) - - Article 4(3), first
subparagraph, point (e)(iv) Article 3(2) and (3) - Article 3(4) Article 35(6)
Article 3(5) and (6) - - Article 4(4) - Article 4(5) Article 4(6) Article 4(7)
- Article 5 Article 5(1), first subparagraph Article 6(1), first subparagraph
Article 5(1), second subparagraph Article 6(1), fifth subparagraph - Article 6(1),
second and third subparagraph Article 5(1), third subparagraph Article 6(1), fourth
subparagraph Article 5(1), fourth and fifth subparagraph - Article 5(2) - Article
6(2) Article 6(2), second subparagraph Article 5(3) Article 6(3) Article 5(4)
Article 6(4) Article 5(5) Article 6(5) Article 5(5), first subparagraph, point
(b) Article 6(5), second subparagraph, point (c) - Article 6(5), second subparagraph,
point (b) Article 5(6) Article 6(6) - Article 6(6), second subparagraph, point
(a) Article 5(6), second subparagraph Article 6(6), second subparagraph, point
(b) Article 5(6), third subparagraph Article 6(6), third subparagraph Article
5(7) - Article 6(1), first subparagraph Article 7(1), first
- 'ii.
measures to protect against retaliation its own workers who are whistleblowers
in accordance with the applicable law transposing Directive (EU) 2019/1937 of
the European Parliament and of the Council ( 121 );
(d)
where the undertaking has no policies on the protection of whistle-blowers ( 122
), it shall state this and whether it has plans to implement them and the timetable
for implementation;
(e)
beyond the procedures to follow-up on reports by whistleblowers in accordance
with the applicable law transposing Directive (EU) 2019/1937, whether the undertaking
has procedures to investigate business conduct incidents , including incidents
of corruption and bribery , promptly, independently and objectively;
(f)
where applicable, whether the undertaking has in place policies with respect to
animal welfare;
(g)
the undertaking’s policy for training within the organisation on business conduct,
including target audience, frequency and depth of coverage; and
(h)
the functions within the undertaking that are most at risk in respect of corruption
and bribery .
Undertakings that are subject to legal requirements under national law transposing
Directive (EU) 2019/1937, or to equivalent legal requirements with regard to the
protection of whistle-blowers, may comply with the disclosure specified in paragraph
10 (d) by stating that they are subject to those legal requirements.
Disclosure Requirement G1-2 – Management of relationships with suppliers
The undertaking shall provide information about the management of its relationships
with its suppliers and its impacts on its supply chain.
The objective of this Disclosure Requirement is to provide an understanding of
the undertaking’s management of its procurement process including fair behaviour
with suppliers .
The undertaking shall provide a description of its policy to prevent late payments,
specifically to SMEs.
The disclosure required under paragraph 12 shall include the following information:
(a)
the undertaking’s approach to its relationships with its suppliers , taking account
of risks to the undertaking related to its supply chain and of impacts on sustainability
matters ; and
(b)
whether and how it takes into account social and environmental criteria for the
selection of its suppliers.
Disclosure Requirement G1-3 – Prevention and detection of corruption and bribery
The undertaking shall provide information about its system to prevent and detect,
investigate, and respond to allegations or incidents relating to corruption and
bribery including the related training.
The objective of this Disclosure Requirement is to provide transparency on the
key procedures of the undertaking to prevent, detect, and address allegations
about corruption and bribery . This includes the training provided to own workers
and/or information provided internally or to suppliers .
The disclosure required under paragraph 16 shall include the following information:
(a)
a description of the procedures in place to prevent, detect, and address allegations
or incidents of corruption and bribery ;
(b)
whether the investigators or investigating committee are separate from the chain
of management involved in the matter; and
(c)
the process, if any, to report outcomes to the administrative, management and
supervisory bodies .
Where the undertaking has no such procedures in place, it shall disclose this
fact and, where applicable, its plans to adopt them.
The disclosures required by paragraph 16 shall include information about how the
undertaking communicates its policies to those for whom they are relevant to ensure
that the policy is accessible and that they understand its implications.
The disclosure required by paragraph 16 shall include information about the following
with respect to training:
(a)
the nature, scope and depth of anti- corruption and anti- bribery training programmes
offered or required by the undertaking;
(b)
the percentage of functions-at-risk covered by training programmes; and
(c)
the extent to which training is given to members of the administrative, management
and supervisory bodies.
Metrics and targets
Disclosure Requirement G1-4 – Incidents of corruption or bribery
The undertaking shall provide information on incidents of corruption or bribery
during the reporting period.'
- '(39)
‘algorithmic trading’ means trading in financial instruments where a computer
algorithm automatically determines individual parameters of orders such as whether
to initiate the order, the timing, price or quantity of the order or how to manage
the order after its submission, with limited or no human intervention, and does
not include any system that is only used for the purpose of routing orders to
one or more trading venues or for the processing of orders involving no determination
of any trading parameters or for the confirmation of orders or the post-trade
processing of executed transactions;
(40)
‘high-frequency algorithmic trading technique’ means an algorithmic trading technique
characterised by:
(a)'
- source_sentence: What action does the Commission take if the scheme owner fails
to address the deficiencies and the scheme no longer meets the criteria in Annex
IV?
sentences:
- '2.
Implementing partners shall fill out the Scoreboard for their proposals for financing
and investment operations.
3.
The Scoreboard shall cover the following elements:
(a)
a description of the proposed financing or investment operation;
(b)
how the proposed financing or investment operation contributes to EU policy objectives;
(c)
a description of additionality;
(d)
a description of the market failure or suboptimal investment situation;
(e)
the financial and technical contribution by the implementing partner;
(f)
the impact of the investment;
(g)
the financial profile of the financing or investment operation;
(h)
complementary indicators.
4.
The Commission is empowered to adopt delegated acts in accordance with Article
34 in order to supplement this Regulation by establishing additional elements
of the Scoreboard, including detailed rules for the Scoreboard to be used by the
implementing partners.
Article 23
Policy check
1.
The Commission shall conduct a check to confirm that the financing and investment
operations proposed by the implementing partners other than the EIB comply with
Union law and policies.
2.
EIB financing and investment operations that fall within the scope of this Regulation
shall not be covered by the EU guarantee where the Commission delivers an unfavourable
opinion within the framework of the procedure provided for in Article 19 of the
EIB Statute.
▼M1
3.
In the context of the procedures referred to in paragraphs 1 and 2 of this Article,
the Commission shall take into account any Sovereignty Seal awarded in accordance
with Article 4 of Regulation (EU) 2024/795 to a project.
▼B
Article 24
Investment Committee
1.
A fully independent investment committee shall be established for the InvestEU
Fund (the ‘Investment Committee’). The Investment Committee shall:
(a)
examine the proposals for financing and investment operations submitted by implementing
partners for coverage under the EU guarantee that have passed the policy check
referred to in Article 23(1) of this Regulation or that have received a favourable
opinion within the framework of the procedure provided for in Article 19 of the
EIB Statute;
(b)'
- (6) | The maritime transport sector is subject to strong international competition.
Major differences in regulatory burdens across flag states have often led to unwanted
practices such as the reflagging of ships. The sector’s intrinsic global character
underlines the importance of a flag-neutral approach and of a favourable regulatory
environment, which would help to attract new investment and safeguard the competitiveness
of Union ports, shipowners and ship operators.
- '8.
Where the scheme owner fails or refuses to take the necessary remedial action
and where the Commission has determined that the deficiencies referred to in paragraph
6 of this Article mean that the scheme no longer fulfils the criteria laid down
in Annex IV, or of the recognised subset of those criteria, the Commission shall
withdraw the recognition of the scheme by means of implementing acts. Those implementing
acts shall be adopted in accordance with the examination procedure referred to
in Article 39(3).
9.'
- source_sentence: What roles do upstream and downstream business partners play in
the overall production and distribution process as described?
sentences:
- (25) The chain of activities should cover activities of a company’s upstream business
partners related to the production of goods or the provision of services by the
company, including the design, extraction, sourcing, manufacture, transport, storage
and supply of raw materials, products or parts of the products and development
of the product or the service, and activities of a company’s downstream business
partners related to the distribution, transport and storage of the product, where
the business partners carry out those activities for the company or on behalf
of the company. This Directive should not cover the disposal of the product. In
addition, under this Directive the chain of activities should not encompass the
distribution,
- '7.
Any actor in the supply chain who is required to prepare a chemical safety report
according to Articles 14 or 37 shall place the relevant exposure scenarios (including
use and exposure categories where appropriate) in an annex to the safety data
sheet covering identified uses and including specific conditions resulting from
the application of Section 3 of Annex XI.
Any downstream user shall include relevant exposure scenarios, and use other relevant
information, from the safety data sheet supplied to him when compiling his own
safety data sheet for identified uses.'
- '8.
Authorisations shall be subject to a time-limited review without prejudice to
any decision on a future review period and shall normally be subject to conditions,
including monitoring. The duration of the time-limited review for any authorisation
shall be determined on a case-by-case basis taking into account all relevant information
including the elements listed in paragraph 4(a) to (d), as appropriate.
9.
The authorisation shall specify:
(a)
the person(s) to whom the authorisation is granted;
(b)
the identity of the substance(s);
(c)
the use(s) for which the authorisation is granted;
(d)
any conditions under which the authorisation is granted;
(e)
the time-limited review period;
(f)
any monitoring arrangement.
10.'
- source_sentence: What conditions must be met for the stability study in organic
solvents to be deemed unnecessary for a substance?
sentences:
- 'AR 23. When disclosing information required under paragraph 29 for the purpose
of setting targets the undertaking shall consider the need for an informed and
willing consent of local and indigenous peoples , the need for appropriate consultations
and the need to respect the decisions of these communities.
AR 24. The targets related to material impacts may be presented in a table as
illustrated below:
Type of target according to mitigation hierarchy Baseline value and base year
Target value and geographical scope Connected policy or legislation if relevant
2025 2030 Up to 2050 Avoidance Minimisation Rehabilitation and restoration Compensation
or offsets'
- '1.
Member States shall, in accordance with paragraph 2, draw up a register of producers,
including producers supplying EEE by means of distance communication. That register
shall serve to monitor compliance with the requirements of this Directive.
Producers supplying EEE by means of distance communication as defined in Article
3(1)(f)(iv) shall be registered in the Member State that they sell to. Where such
producers are not registered in the Member State that they are selling to, they
shall be registered through their authorised representatives as referred to in
Article 17(2).
2.
Member States shall ensure that:
(a)
each producer, or each authorised representative where appointed under Article
17, is registered as required and has the possibility of entering online in their
national register all relevant information reflecting that producer’s activities
in that Member State;
(b)
upon registering, each producer, or each authorised representative where appointed
under Article 17, provides the information set out in Annex X, Part A, undertaking
to update it as appropriate;
(c)
each producer, or each authorised representative where appointed under Article
17, provides the information set out in Annex X, Part B;
(d)
national registers provide links to other national registers on their website
to facilitate, in all Member States, registration of producers or, where appointed
under Article 17, authorised representatives.
3.
In order to ensure uniform conditions for the implementation of this Article,
the Commission shall adopt implementing acts establishing the format for registration
and reporting and the frequency of reporting to the register. Those implementing
acts shall be adopted in accordance with the examination procedure referred to
in Article 21(2).
4.
Member States shall collect information, including substantiated estimates, on
an annual basis, on the quantities and categories of EEE placed on their markets,
collected through all routes, prepared for re-use, recycled and recovered within
the Member State, and on separately collected WEEE exported, by weight.
▼M1 —————
▼M1
6.'
- 'COLUMN 1 STANDARD INFORMATION REQUIRED COLUMN 2 SPECIFIC RULES FOR ADAPTATION
FROM COLUMN 1 7.15. Stability in organic solvents and identity of relevant degradation
products Only required if stability of the substance is considered to be critical.
7.15. The study does not need to be conducted if the substance is inorganic. 7.16.
Dissociation constant 7.16. The study does not need to be conducted if: — the
substance is hydrolytically unstable (half-life less than 12 hours) or is readily
oxidisable in water, or ►M70 ◄ ►M64 — or based on the structure, the substance
does not have any chemical group that can dissociate. ◄ 7.17. Viscosity ►M64 For
hydrocarbon substances the kinematic viscosity shall be determined at 40 °C. ◄'
- source_sentence: How is 'associated undertaking' defined, and what criteria determine
the significant influence of one undertaking over another in terms of voting rights?
sentences:
- '▼B
(6)
‘purchase price’ means the price payable and any incidental expenses minus any
incidental reductions in the cost of acquisition;
(7)
‘production cost’ means the purchase price of raw materials, consumables and other
costs directly attributable to the item in question. Member States shall permit
or require the inclusion of a reasonable proportion of fixed or variable overhead
costs indirectly attributable to the item in question, to the extent that they
relate to the period of production. Distribution costs shall not be included;
(8)
‘value adjustment’ means the adjustments intended to take account of changes in
the values of individual assets established at the balance sheet date, whether
the change is final or not;
(9)
‘parent undertaking’ means an undertaking which controls one or more subsidiary
undertakings;
(10)
‘subsidiary undertaking’ means an undertaking controlled by a parent undertaking,
including any subsidiary undertaking of an ultimate parent undertaking;
(11)
‘group’ means a parent undertaking and all its subsidiary undertakings;
(12)
‘affiliated undertakings’ means any two or more undertakings within a group;
(13)
‘associated undertaking’ means an undertaking in which another undertaking has
a participating interest, and over whose operating and financial policies that
other undertaking exercises significant influence. An undertaking is presumed
to exercise a significant influence over another undertaking where it has 20 %
or more of the shareholders'' or members'' voting rights in that other undertaking;
(14)
‘investment undertakings’ means:
(a)
undertakings the sole object of which is to invest their funds in various securities,
real property and other assets, with the sole aim of spreading investment risks
and giving their shareholders the benefit of the results of the management of
their assets,
(b)
undertakings associated with investment undertakings with fixed capital, if the
sole object of those associated undertakings is to acquire fully paid shares issued
by those investment undertakings without prejudice to point (h) of Article 22(1)
of Directive 2012/30/EU;
(15)'
- and non-European non-financial corporations not subject to the disclosure obligations
laid down in Directive 2013/34/EU. That information may be disclosed only once,
based on counterparties’ turnover alignment for the general-purpose lending loans,
as in the case of the GAR. The first disclosure reference date of this template
is as of 31 December 2024. Institutions are not required to disclose this information
before 1 January 2025. ---|---|---
- 'ANNEX II
Due diligence statement
Information to be contained in the due diligence statement in accordance with
Article 4(2):
1.
Operator’s name, address and, in the event of relevant commodities and relevant
products entering or leaving the market, the Economic Operators Registration and
Identification (EORI) number in accordance with Article 9 of Regulation (EU) No
952/2013.
2.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Alibaba-NLP/gte-modernbert-base
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.6910063870188158
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.9109269808389435
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9461418953909891
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9742793026065941
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6910063870188158
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.30364232694631454
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.18922837907819778
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09742793026065939
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6910063870188158
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9109269808389435
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9461418953909891
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9742793026065941
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8471731447814336
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.804833419644399
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8061197699360279
name: Cosine Map@100
---
# SentenceTransformer based on Alibaba-NLP/gte-modernbert-base
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) <!-- at revision bc02f0a92d1b6dd82108036f6cb4b7b423fb7434 -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
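The pooling configuration above enables only `pooling_mode_cls_token`, so the sentence embedding is the hidden state of the first token rather than an average over all tokens. The sketch below contrasts the two on toy 4-dimensional vectors; the function names and values are illustrative assumptions, not part of the library.

```python
from typing import List, Sequence


def cls_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """CLS pooling: use the first token's vector as the sentence embedding."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Mean pooling (disabled in this model), shown only for contrast."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    count = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / count for i in range(dim)]


# Hypothetical per-token hidden states for a 3-token input (4 dims, not 768).
toy_tokens = [
    [0.1, 0.2, 0.3, 0.4],  # first ([CLS]) token
    [0.9, 0.8, 0.7, 0.6],
    [0.5, 0.5, 0.5, 0.5],
]

print(cls_pool(toy_tokens))
print(mean_pool(toy_tokens))
```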
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
"How is 'associated undertaking' defined, and what criteria determine the significant influence of one undertaking over another in terms of voting rights?",
"▼B\n\n(6)\n\n‘purchase price’ means the price payable and any incidental expenses minus any incidental reductions in the cost of acquisition;\n\n(7)\n\n‘production cost’ means the purchase price of raw materials, consumables and other costs directly attributable to the item in question. Member States shall permit or require the inclusion of a reasonable proportion of fixed or variable overhead costs indirectly attributable to the item in question, to the extent that they relate to the period of production. Distribution costs shall not be included;\n\n(8)\n\n‘value adjustment’ means the adjustments intended to take account of changes in the values of individual assets established at the balance sheet date, whether the change is final or not;\n\n(9)\n\n‘parent undertaking’ means an undertaking which controls one or more subsidiary undertakings;\n\n(10)\n\n‘subsidiary undertaking’ means an undertaking controlled by a parent undertaking, including any subsidiary undertaking of an ultimate parent undertaking;\n\n(11)\n\n‘group’ means a parent undertaking and all its subsidiary undertakings;\n\n(12)\n\n‘affiliated undertakings’ means any two or more undertakings within a group;\n\n(13)\n\n‘associated undertaking’ means an undertaking in which another undertaking has a participating interest, and over whose operating and financial policies that other undertaking exercises significant influence. An undertaking is presumed to exercise a significant influence over another undertaking where it has 20 % or more of the shareholders' or members' voting rights in that other undertaking;\n\n(14)\n\n‘investment undertakings’ means:\n\n(a)\n\nundertakings the sole object of which is to invest their funds in various securities, real property and other assets, with the sole aim of spreading investment risks and giving their shareholders the benefit of the results of the management of their assets,\n\n(b)\n\nundertakings associated with investment undertakings with fixed capital, if the sole object of those associated undertakings is to acquire fully paid shares issued by those investment undertakings without prejudice to point (h) of Article 22(1) of Directive 2012/30/EU;\n\n(15)",
'and non-European non-financial corporations not subject to the disclosure obligations laid down in Directive 2013/34/EU. That information may be disclosed only once, based on counterparties’ turnover alignment for the general-purpose lending loans, as in the case of the GAR. The first disclosure reference date of this template is as of 31 December 2024. Institutions are not required to disclose this information before 1 January 2025. ---|---|---',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
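The model's similarity function is cosine similarity, and its tags list `MatryoshkaLoss`, which usually means the leading components of an embedding are trained to remain useful on their own. The sketch below shows, on toy vectors, what a cosine similarity matrix computes and how a truncated embedding would be re-normalised before comparison. It is a minimal illustration under assumptions: the helper names and the 6-dimensional values are hypothetical, and the Matryoshka dimensions this model was trained with are not stated in this section.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two unit-length vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and re-normalise (Matryoshka-style)."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities, one row and one column per embedding."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy 6-dimensional "embeddings" (hypothetical values, not model output).
toy_embeddings = [
    [0.9, 0.1, 0.3, 0.0, 0.2, 0.1],
    [0.8, 0.2, 0.4, 0.1, 0.1, 0.0],
    [0.0, 0.9, 0.1, 0.7, 0.0, 0.3],
]

full = similarity_matrix(toy_embeddings)
short = similarity_matrix([truncate_embedding(e, 3) for e in toy_embeddings])
print(full)
print(short)
```

With the real model, the `SentenceTransformer` constructor accepts a `truncate_dim` argument that performs the truncation at encode time; which sizes are sensible here depends on the dimensions used during training.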
## Evaluation
### Metrics
#### Information Retrieval
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.691 |
| cosine_accuracy@3 | 0.9109 |
| cosine_accuracy@5 | 0.9461 |
| cosine_accuracy@10 | 0.9743 |
| cosine_precision@1 | 0.691 |
| cosine_precision@3 | 0.3036 |
| cosine_precision@5 | 0.1892 |
| cosine_precision@10 | 0.0974 |
| cosine_recall@1 | 0.691 |
| cosine_recall@3 | 0.9109 |
| cosine_recall@5 | 0.9461 |
| cosine_recall@10 | 0.9743 |
| **cosine_ndcg@10** | **0.8472** |
| cosine_mrr@10 | 0.8048 |
| cosine_map@100 | 0.8061 |
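
The values in this table are consistent with each query having exactly one relevant passage: recall@k equals accuracy@k, and precision@k is accuracy@k divided by k (for example, 0.9109 / 3 ≈ 0.3036). Under that assumption, the per-query metrics reduce to the sketch below. The function name `retrieval_metrics` and the passage ids are hypothetical; the reported figures would be averages of such per-query values over the evaluation set.

```python
import math


def retrieval_metrics(ranked_ids, relevant_id, k):
    """Per-query metrics at cutoff k, assuming a single relevant passage."""
    top_k = ranked_ids[:k]
    if relevant_id not in top_k:
        return {"accuracy": 0.0, "precision": 0.0, "recall": 0.0,
                "mrr": 0.0, "ndcg": 0.0}
    rank = top_k.index(relevant_id) + 1  # 1-based position of the hit
    return {
        "accuracy": 1.0,                    # the hit is inside the top k
        "precision": 1.0 / k,               # one relevant item among k results
        "recall": 1.0,                      # the only relevant item was found
        "mrr": 1.0 / rank,                  # reciprocal rank of the hit
        "ndcg": 1.0 / math.log2(rank + 1),  # ideal DCG is 1 for a single hit
    }


# Hypothetical ranking: the relevant passage "p2" is returned second.
metrics = retrieval_metrics(["p7", "p2", "p9"], relevant_id="p2", k=3)
print(metrics)
```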
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 46,338 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 |
|:--------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
| type | string | string |
| details | <ul><li>min: 13 tokens</li><li>mean: 34.18 tokens</li><li>max: 251 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 231.33 tokens</li><li>max: 2146 tokens</li></ul> |
* Samples:
| sentence_0 | sentence_1 |
|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>How is 'energy efficiency' defined in the context of Directive (EU) 2018/2001?</code> | <code>of Directive (EU) 2018/2001; --- --- (8) ‘energy efficiency’ means the ratio of output of performance, service, goods or energy to input of energy; --- --- (9) ‘energy savings’ means an amount of saved energy determined by measuring or estimating consumption, or both,, before and after the implementation of an energy efficiency improvement measure, whilst ensuring normalisation for external conditions that affect energy consumption; --- --- (10) ‘energy efficiency improvement’ means an increase in energy efficiency as a result of any technological, behavioural or economic changes; --- --- (11) ‘energy service’ means the physical benefit, utility or good derived from a combination of energy with energy-efficient technology or with action,</code> |
| <code>What are the sources of information that the external experts will use to create the list of conflict-affected and high-risk areas?</code> | <code>2.<br><br>The Commission shall call upon external expertise that will provide an indicative, non-exhaustive, regularly updated list of conflict-affected and high-risk areas. That list shall be based on the external experts' analysis of the handbook referred to in paragraph 1 and existing information from, inter alia, academics and supply chain due diligence schemes. Union importers sourcing from areas which are not mentioned on that list shall also maintain their responsibility to comply with the due diligence obligations under this Regulation.<br><br>Article 15<br><br>Committee procedure<br><br>1.<br><br>The Commission shall be assisted by a committee. That committee shall be a committee within the meaning of Regulation (EU) No 182/2011.<br><br>2.</code> |
| <code>What is the maximum time frame for completing the undertaking according to the technical specifications set out in Annexes II and III after the Directive enters into force?</code> | <code>is undertaken according to the technical specifications set out in Annexes II and III and that it is completed at the latest four years after the date of entry into force of this Directive.<br><br>2. The analyses and reviews mentioned under paragraph 1 shall be reviewed, and if necessary updated at the latest 13 years after the date of entry into force of this Directive and every six years thereafter.<br><br>Article 6<br><br>Register of protected areas</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
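
The configuration above can be read as: compute the inner loss on the embeddings truncated to each listed dimension, then add the results with the given weights. The following is a simplified pure-Python sketch of that idea, not the library implementation. The helper names, the 4-dimensional toy vectors, and the shortened dimension list `[4, 2]` are hypothetical, and `scale=20.0` is assumed to match the library default for `MultipleNegativesRankingLoss`.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over in-batch candidates: positives[i] is the target for
    anchors[i]; every other positive in the batch acts as a negative."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(a_i, p_j)) for p_j in p]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over truncated embedding prefixes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * mnr_loss([v[:dim] for v in anchors],
                                   [v[:dim] for v in positives])
    return total


# Hypothetical toy batch of two (question, passage) embedding pairs.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

matched = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(matched < swapped)
```

Because every prefix contributes to the loss, embeddings trained this way are intended to remain usable when truncated to one of the listed sizes, typically followed by re-normalization.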
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `num_train_epochs`: 4
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
</details>
### Training Logs
| Epoch | Step | Training Loss | cosine_ndcg@10 |
|:------:|:-----:|:-------------:|:--------------:|
| 0.0432 | 500 | 0.358 | - |
| 0.0863 | 1000 | 0.1048 | - |
| 0.1295 | 1500 | 0.0827 | - |
| 0.1726 | 2000 | 0.067 | 0.7969 |
| 0.2158 | 2500 | 0.0491 | - |
| 0.2590 | 3000 | 0.0831 | - |
| 0.3021 | 3500 | 0.062 | - |
| 0.3453 | 4000 | 0.0657 | 0.8050 |
| 0.3884 | 4500 | 0.0522 | - |
| 0.4316 | 5000 | 0.049 | - |
| 0.4748 | 5500 | 0.0426 | - |
| 0.5179 | 6000 | 0.0708 | 0.8215 |
| 0.5611 | 6500 | 0.0236 | - |
| 0.6042 | 7000 | 0.024 | - |
| 0.6474 | 7500 | 0.0256 | - |
| 0.6905 | 8000 | 0.041 | 0.8105 |
| 0.7337 | 8500 | 0.0285 | - |
| 0.7769 | 9000 | 0.0249 | - |
| 0.8200 | 9500 | 0.0368 | - |
| 0.8632 | 10000 | 0.0588 | 0.8118 |
| 0.9063 | 10500 | 0.0386 | - |
| 0.9495 | 11000 | 0.0456 | - |
| 0.9927 | 11500 | 0.0399 | - |
| 1.0 | 11585 | - | 0.8184 |
| 1.0358 | 12000 | 0.0424 | 0.8239 |
| 1.0790 | 12500 | 0.0107 | - |
| 1.1221 | 13000 | 0.0279 | - |
| 1.1653 | 13500 | 0.0236 | - |
| 1.2085 | 14000 | 0.024 | 0.8193 |
| 1.2516 | 14500 | 0.0143 | - |
| 1.2948 | 15000 | 0.0118 | - |
| 1.3379 | 15500 | 0.0078 | - |
| 1.3811 | 16000 | 0.023 | 0.8217 |
| 1.4243 | 16500 | 0.0239 | - |
| 1.4674 | 17000 | 0.0335 | - |
| 1.5106 | 17500 | 0.0119 | - |
| 1.5537 | 18000 | 0.0411 | 0.8292 |
| 1.5969 | 18500 | 0.0168 | - |
| 1.6401 | 19000 | 0.0059 | - |
| 1.6832 | 19500 | 0.0234 | - |
| 1.7264 | 20000 | 0.0184 | 0.8366 |
| 1.7695 | 20500 | 0.0128 | - |
| 1.8127 | 21000 | 0.0166 | - |
| 1.8558 | 21500 | 0.0181 | - |
| 1.8990 | 22000 | 0.0148 | 0.8353 |
| 1.9422 | 22500 | 0.0225 | - |
| 1.9853 | 23000 | 0.0158 | - |
| 2.0 | 23170 | - | 0.8360 |
| 2.0285 | 23500 | 0.0123 | - |
| 2.0716 | 24000 | 0.0173 | 0.8329 |
| 2.1148 | 24500 | 0.0167 | - |
| 2.1580 | 25000 | 0.0125 | - |
| 2.2011 | 25500 | 0.013 | - |
| 2.2443 | 26000 | 0.0079 | 0.8338 |
| 2.2874 | 26500 | 0.007 | - |
| 2.3306 | 27000 | 0.0171 | - |
| 2.3738 | 27500 | 0.0058 | - |
| 2.4169 | 28000 | 0.0048 | 0.8405 |
| 2.4601 | 28500 | 0.005 | - |
| 2.5032 | 29000 | 0.0141 | - |
| 2.5464 | 29500 | 0.0132 | - |
| 2.5896 | 30000 | 0.006 | 0.8461 |
| 2.6327 | 30500 | 0.0095 | - |
| 2.6759 | 31000 | 0.0061 | - |
| 2.7190 | 31500 | 0.0107 | - |
| 2.7622 | 32000 | 0.0157 | 0.8451 |
| 2.8054 | 32500 | 0.005 | - |
| 2.8485 | 33000 | 0.0087 | - |
| 2.8917 | 33500 | 0.0064 | - |
| 2.9348 | 34000 | 0.005 | 0.8449 |
| 2.9780 | 34500 | 0.0115 | - |
| 3.0 | 34755 | - | 0.8451 |
| 3.0211 | 35000 | 0.0079 | - |
| 3.0643 | 35500 | 0.0045 | - |
| 3.1075 | 36000 | 0.0029 | 0.8443 |
| 3.1506 | 36500 | 0.0161 | - |
| 3.1938 | 37000 | 0.0144 | - |
| 3.2369 | 37500 | 0.0076 | - |
| 3.2801 | 38000 | 0.0157 | 0.8500 |
| 3.3233 | 38500 | 0.0039 | - |
| 3.3664 | 39000 | 0.0045 | - |
| 3.4096 | 39500 | 0.0033 | - |
| 3.4527 | 40000 | 0.0064 | 0.8434 |
| 3.4959 | 40500 | 0.0054 | - |
| 3.5391 | 41000 | 0.0061 | - |
| 3.5822 | 41500 | 0.0051 | - |
| 3.6254 | 42000 | 0.0019 | 0.8472 |
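
The Epoch column can be reproduced from the step count and the settings listed earlier in this card. The sketch below assumes a single training device, together with `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` as listed under All Hyperparameters; the helper name `epoch_at` is hypothetical.

```python
import math

train_samples = 46_338  # "Size" under Training Dataset
batch_size = 4          # per_device_train_batch_size
num_epochs = 4          # num_train_epochs

# The last partial batch is kept (dataloader_drop_last is False), hence ceil.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # 11585, the step logged at epoch 1.0
print(total_steps)      # 46340
print(epoch_at(42000))  # 3.6254, the last row of the table
```

Under these assumptions the table ends at step 42,000, short of the 46,340 steps that four full epochs would take. The `cosine_ndcg@10` value at that step (0.8472) matches the figure reported in the Evaluation section.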
### Framework Versions
- Python: 3.10.15
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu126
- Accelerate: 1.5.2
- Datasets: 3.4.1
- Tokenizers: 0.21.1
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->