SentenceTransformer based on seongil-dn/unsupervised_20m_3800

This is a sentence-transformers model finetuned from seongil-dn/unsupervised_20m_3800. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: seongil-dn/unsupervised_20m_3800
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 1024 dimensions
  • Number of Parameters: 568M (F32 safetensors)
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
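The architecture pools with the CLS token and ends in a Normalize() module, so every embedding is unit-length and cosine similarity reduces to a plain dot product. A minimal sketch of that property with random NumPy stand-ins (not real model outputs):

```python
import numpy as np

# Toy stand-ins for model outputs; real embeddings are 1024-dimensional too.
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 1024))

# The final Normalize() module L2-normalizes each embedding.
emb = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# For unit vectors, cosine similarity is simply the dot product.
cos = emb @ emb.T
assert np.allclose(np.diag(cos), 1.0)  # each embedding matches itself exactly
```

In practice this means dot-product and cosine retrieval are interchangeable for this model, which can matter when picking the metric for a vector index.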

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("seongil-dn/bge-m3-3800_steps_v2_234")
# Run inference
sentences = [
    'The Battle of Northampton was fought in which century?',
    'Battle of Megiddo (15th century BC) Battle of Megiddo (15th century BC) The Battle of Megiddo (15th century BC) was fought between Egyptian forces under the command of Pharaoh Thutmose III and a large rebellious coalition of Canaanite vassal states led by the king of Kadesh. It is the first battle to have been recorded in what is accepted as relatively reliable detail. Megiddo is also the first recorded use of the composite bow and the first body count. All details of the battle come from Egyptian sources—primarily the hieroglyphic writings on the Hall of Annals in the Temple of Amun-Re at Karnak, Thebes (now Luxor),',
    'Northampton Sand Northampton Sand The Northampton Sand, sometimes called the Northamptonshire Sand is a geological formation of Jurassic age found in the East Midlands of England. Particularly in the twentieth century, it has been of economic importance as a source of iron ore, but is now worked much less. The Northampton Sand Formation constitutes the lowest part of the Inferior Oolite Series and lies on the upper Lias clay. It attains a maximum thickness of up to to the north and west of Northampton where it lies in a subterranean basin. In the south, it fades out around Towcester. Northward from the',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
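For semantic search, the same embeddings rank passages against a query by similarity. A minimal sketch using hypothetical, already-normalized 3-dimensional vectors in place of `model.encode()` output (row 0 plays the query, rows 1-2 the candidate passages):

```python
import numpy as np

# Hypothetical unit-length embeddings standing in for model.encode() output.
embeddings = np.array([
    [0.8, 0.6, 0.0],   # query
    [0.6, 0.8, 0.0],   # passage A (related to the query)
    [0.0, 0.0, 1.0],   # passage B (unrelated)
])

query, docs = embeddings[0], embeddings[1:]
scores = docs @ query          # cosine similarity (vectors are unit-length)
ranking = np.argsort(-scores)  # best match first
print(ranking)                 # [0 1]: passage A is the closest
```

With real output from the snippet above, the question embedding would rank the two Wikipedia passages the same way.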

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,921,571 training samples
  • Columns: anchor, positive, negative, negative_2, negative_3, negative_4, and negative_5
  • Approximate statistics based on the first 1000 samples:
    column      type    min         mean           max
    anchor      string  9 tokens    22.32 tokens   119 tokens
    positive    string  127 tokens  157.21 tokens  276 tokens
    negative    string  122 tokens  154.65 tokens  212 tokens
    negative_2  string  122 tokens  155.52 tokens  218 tokens
    negative_3  string  122 tokens  156.04 tokens  284 tokens
    negative_4  string  124 tokens  156.3 tokens   268 tokens
    negative_5  string  121 tokens  156.15 tokens  249 tokens
  • Samples (the first three rows, one field per line):

    Sample 1
    • anchor: What African country is projected to pass the United States in population by the year 2055?
    • positive: African immigration to the United States officially 40,000 African immigrants, although it has been estimated that the population is actually four times this number when considering undocumented immigrants. The majority of these immigrants were born in Ethiopia, Egypt, Nigeria, and South Africa. African immigrants like many other immigrant groups are likely to establish and find success in small businesses. Many Africans that have seen the social and economic stability that comes from ethnic enclaves such as Chinatowns have recently been establishing ethnic enclaves of their own at much higher rates to reap the benefits of such communities. Such examples include Little Ethiopia in Los Angeles and
    • negative: What Will Happen to the Gang Next Year? watching television at the time of the broadcast. This made it the lowest-rated episode in "30 Rock"'s history. and a decrease from the previous episode "The Return of Avery Jessup" (2.92 million) What Will Happen to the Gang Next Year? "What Will Happen to the Gang Next Year?" is the twenty-second and final episode of the sixth season of the American television comedy series "30 Rock", and the 125th overall episode of the series. It was directed by Michael Engler, and written by Matt Hubbard. The episode originally aired on the National Broadcasting Company (NBC) network in the United States
    • negative_2: Christianity in the United States Christ is the fifth-largest denomination, the largest Pentecostal church, and the largest traditionally African-American denomination in the nation. Among Eastern Christian denominations, there are several Eastern Orthodox and Oriental Orthodox churches, with just below 1 million adherents in the US, or 0.4% of the total population. Christianity was introduced to the Americas as it was first colonized by Europeans beginning in the 16th and 17th centuries. Going forward from its foundation, the United States has been called a Protestant nation by a variety of sources. Immigration further increased Christian numbers. Today most Christian churches in the United States are either
    • negative_3: What Will Happen to the Gang Next Year? What Will Happen to the Gang Next Year? "What Will Happen to the Gang Next Year?" is the twenty-second and final episode of the sixth season of the American television comedy series "30 Rock", and the 125th overall episode of the series. It was directed by Michael Engler, and written by Matt Hubbard. The episode originally aired on the National Broadcasting Company (NBC) network in the United States on May 17, 2012. In the episode, Jack (Alec Baldwin) and Avery (Elizabeth Banks) seek to renew their vows; Criss (James Marsden) sets out to show Liz (Tina Fey) he can pay
    • negative_4: History of the Jews in the United States Representatives by Rep. Samuel Dickstein (D; New York). This also failed to pass. During the Holocaust, fewer than 30,000 Jews a year reached the United States, and some were turned away due to immigration policies. The U.S. did not change its immigration policies until 1948. Currently, laws requiring teaching of the Holocaust are on the books in five states. The Holocaust had a profound impact on the community in the United States, especially after 1960, as Jews tried to comprehend what had happened, and especially to commemorate and grapple with it when looking to the future. Abraham Joshua Heschel summarized
    • negative_5: Public holidays in the United States will have very few customers that day. The labor force in the United States comprises about 62% (as of 2014) of the general population. In the United States, 97% of the private sector businesses determine what days this sector of the population gets paid time off, according to a study by the Society for Human Resource Management. The following holidays are observed by the majority of US businesses with paid time off: This list of holidays is based off the official list of federal holidays by year from the US Government. The holidays however are at the discretion of employers

    Sample 2
    • anchor: Which is the largest species of the turtle family?
    • positive: Turtle tortoise, "testudo". "Terrapin" comes from an Algonquian word for turtle. Some languages do not have this distinction, as all of these are referred to by the same name. For example, in Spanish, the word "tortuga" is used for turtles, tortoises, and terrapins. A sea-dwelling turtle is "tortuga marina", a freshwater species "tortuga de río", and a tortoise "tortuga terrestre". The largest living chelonian is the leatherback sea turtle ("Dermochelys coriacea"), which reaches a shell length of and can reach a weight of over . Freshwater turtles are generally smaller, but with the largest species, the Asian softshell turtle "Pelochelys cantorii",
    • negative: Convention on the Conservation of Migratory Species of Wild Animals take joint action. At May 2018, there were 126 Parties to the Convention. The CMS Family covers a great diversity of migratory species. The Appendices of CMS include many mammals, including land mammals, marine mammals and bats; birds; fish; reptiles and one insect. Among the instruments, AEWA covers 254 species of birds that are ecologically dependent on wetlands for at least part of their annual cycle. EUROBATS covers 52 species of bat, the Memorandum of Understanding on the Conservation of Migratory Sharks seven species of shark, the IOSEA Marine Turtle MOU six species of marine turtle and the Raptors MoU
    • negative_2: Razor-backed musk turtle Razor-backed musk turtle The razor-backed musk turtle ("Sternotherus carinatus") is a species of turtle in the family Kinosternidae. The species is native to the southern United States. There are no subspecies that are recognized as being valid. "S. carinatus" is found in the states of Alabama, Arkansas, Louisiana, Mississippi, Oklahoma, and Texas. The razor-backed musk turtle grows to a straight carapace length of about . It has a brown-colored carapace, with black markings at the edges of each scute. The carapace has a distinct, sharp keel down the center of its length, giving the species its common name. The body
    • negative_3: African helmeted turtle African helmeted turtle The African helmeted turtle ("Pelomedusa subrufa"), also known commonly as the marsh terrapin, the crocodile turtle, or in the pet trade as the African side-necked turtle, is a species of omnivorous side-necked terrapin in the family Pelomedusidae. The species naturally occurs in fresh and stagnant water bodies throughout much of Sub-Saharan Africa, and in southern Yemen. The marsh terrapin is typically a rather small turtle, with most individuals being less than in straight carapace length, but one has been recorded with a length of . It has a black or brown carapace. The top of the tail
    • negative_4: Box turtle Box turtle Box turtles are North American turtles of the genus Terrapene. Although box turtles are superficially similar to tortoises in terrestrial habits and overall appearance, they are actually members of the American pond turtle family (Emydidae). The twelve taxa which are distinguished in the genus are distributed over four species. They are largely characterized by having a domed shell, which is hinged at the bottom, allowing the animal to close its shell tightly to escape predators. The genus name "Terrapene" was coined by Merrem in 1820 as a genus separate from "Emys" for those species which had a sternum
    • negative_5: Vallarta mud turtle Vallarta mud turtle The Vallarta mud turtle ("Kinosternon vogti") is a recently identified species of mud turtle in the family Kinosternidae. While formerly considered conspecific with the Jalisco mud turtle, further studies indicated that it was a separate species. It can be identified by a combination of the number of plastron and carapace scutes, body size, and the distinctive yellow rostral shield in males. It is endemic to Mexican state of Jalisco. It is only known from a few human-created or human-affected habitats (such as small streams and ponds) found around Puerto Vallarta. It is one of only 3 species

    Sample 3
    • anchor: How many gallons of beer are in an English barrel?
    • positive: Low-alcohol beer Prohibition in the United States. Near beer could not legally be labeled as "beer" and was officially classified as a "cereal beverage". The public, however, almost universally called it "near beer". The most popular "near beer" was Bevo, brewed by the Anheuser-Busch company. The Pabst company brewed "Pablo", Miller brewed "Vivo", and Schlitz brewed "Famo". Many local and regional breweries stayed in business by marketing their own near-beers. By 1921 production of near beer had reached over 300 million US gallons (1 billion L) a year (36 L/s). A popular illegal practice was to add alcohol to near beer. The
    • negative: Keg terms "half-barrel" and "quarter-barrel" are derived from the U.S. beer barrel, legally defined as being equal to 31 U.S. gallons (this is not the same volume as some other units also known as "barrels"). A 15.5 U.S. gallon keg is also equal to: However, beer kegs can come in many sizes: In European countries the most common keg size is 50 liters. This includes the UK, which uses a non-metric standard keg of 11 imperial gallons, which is coincidentally equal to . The German DIN 6647-1 and DIN 6647-2 have also defined kegs in the sizes of 30 and 20
    • negative_2: Beer in Chile craft beers. They are generally low or very low volume producers. In Chile there are more than 150 craft beer producers distributed along the 15 Chilean Regions. The list below includes: Beer in Chile The primary beer brewed and consumed in Chile is pale lager, though the country also has a tradition of brewing corn beer, known as chicha. Chile's beer history has a strong German influence – some of the bigger beer producers are from the country's southern lake district, a region populated by a great number of German immigrants during the 19th century. Chile also produces English ale-style
    • negative_3: Barrel variation. In modern times, produce barrels for all dry goods, excepting cranberries, contain 7,056 cubic inches, about 115.627 L. Barrel A barrel, cask, or tun is a hollow cylindrical container, traditionally made of wooden staves bound by wooden or metal hoops. Traditionally, the barrel was a standard size of measure referring to a set capacity or weight of a given commodity. For example, in the UK a barrel of beer refers to a quantity of . Wine was shipped in barrels of . Modern wooden barrels for wine-making are either made of French common oak ("Quercus robur") and white oak
    • negative_4: The Rare Barrel The Rare Barrel The Rare Barrel is a brewery and brewpub in Berkeley, California, United States, that exclusively produces sour beers. Founders Jay Goodwin and Alex Wallash met while attending UCSB. They started home-brewing in their apartment and decided that they would one day start a brewery together. Goodwin started working at The Bruery, where he worked his way from a production assistant to brewer, eventually becoming the head of their barrel aging program. The Rare Barrel brewed its first batch of beer in February 2013, and opened its tasting room on December 27, 2013. The Rare Barrel was named
    • negative_5: Barrel (unit) Barrel (unit) A barrel is one of several units of volume applied in various contexts; there are dry barrels, fluid barrels (such as the UK beer barrel and US beer barrel), oil barrels and so on. For historical reasons the volumes of some barrel units are roughly double the volumes of others; volumes in common usage range from about . In many connections the term "drum" is used almost interchangeably with "barrel". Since medieval times the term barrel as a unit of measure has had various meanings throughout Europe, ranging from about 100 litres to 1000 litres. The name was
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
      (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01}
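CachedGISTEmbedLoss follows the GISTEmbed idea: an InfoNCE-style contrastive loss over in-batch candidates, where the frozen guide model listed above filters out negatives that it scores at least as high as the true positive (likely false negatives), and the low temperature (0.01) sharpens the softmax. The following is a toy NumPy sketch of that idea only, with invented 2x2 similarity matrices; the actual sentence-transformers implementation differs (it also caches embedding gradients, which is what makes the 2048 batch size feasible):

```python
import numpy as np

def gist_style_loss(sim, guide_sim, temperature=0.01):
    """Toy InfoNCE over in-batch candidates, with guide-based masking.

    sim[i, j]:       similarity of anchor i to candidate j under the trained model
    guide_sim[i, j]: the same pairs scored by the frozen guide model
    Candidate j is dropped as a negative for anchor i when the guide scores it
    at least as high as the true positive (a likely false negative).
    """
    n = sim.shape[0]
    positives = np.diag(guide_sim)
    # Mask candidates the guide ranks at or above the positive (off-diagonal only).
    false_neg = (guide_sim >= positives[:, None]) & ~np.eye(n, dtype=bool)
    logits = sim / temperature
    logits[false_neg] = -np.inf  # removed from the softmax denominator
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Invented scores: candidate 1 looks suspiciously relevant to anchor 0.
sim = np.array([[0.9, 0.95], [0.1, 0.8]])
guide = np.array([[0.95, 0.97], [0.2, 0.85]])
loss = gist_style_loss(sim, guide)
```

Here the guide ranks candidate 1 above anchor 0's positive (0.97 >= 0.95), so that pair is excluded from anchor 0's denominator instead of being punished as a negative.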
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 2048
  • learning_rate: 3e-05
  • weight_decay: 0.01
  • num_train_epochs: 1
  • warmup_ratio: 0.05
  • bf16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 2048
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.4145 97 0.3308
0.4188 98 0.3059
0.4231 99 0.3524
0.4274 100 0.3363
0.4316 101 0.3255
0.4359 102 0.3182
0.4402 103 0.3246
0.4444 104 0.3361
0.4487 105 0.3289
0.4530 106 0.3138
0.4573 107 0.3189
0.4615 108 0.316
0.4658 109 0.3267
0.4701 110 0.3395
0.4744 111 0.3028
0.4786 112 0.319
0.4829 113 0.3126
0.4872 114 0.3101
0.4915 115 0.3064
0.4957 116 0.3016
0.5 117 0.3324
0.5043 118 0.3182
0.5085 119 0.294
0.5128 120 0.3008
0.5171 121 0.312
0.5214 122 0.2975
0.5256 123 0.3023
0.5299 124 0.3251
0.5342 125 0.3043
0.5385 126 0.3201
0.5427 127 0.3097
0.5470 128 0.3132
0.5513 129 0.2934
0.5556 130 0.3266
0.5598 131 0.2935
0.5641 132 0.3052
0.5684 133 0.2859
0.5726 134 0.3079
0.5769 135 0.295
0.5812 136 0.2996
0.5855 137 0.3045
0.5897 138 0.2977
0.5940 139 0.3009
0.5983 140 0.2953
0.6026 141 0.3007
0.6068 142 0.3187
0.6111 143 0.3015
0.6154 144 0.3064
0.6197 145 0.2843
0.6239 146 0.3063
0.6282 147 0.304
0.6325 148 0.2998
0.6368 149 0.3077
0.6410 150 0.2975
0.6453 151 0.3165
0.6496 152 0.2961
0.6538 153 0.2939
0.6581 154 0.2963
0.6624 155 0.3109
0.6667 156 0.2873
0.6709 157 0.3028
0.6752 158 0.2937
0.6795 159 0.2839
0.6838 160 0.294
0.6880 161 0.3066
0.6923 162 0.2859
0.6966 163 0.3017
0.7009 164 0.2947
0.7051 165 0.2884
0.7094 166 0.3055
0.7137 167 0.2744
0.7179 168 0.2789
0.7222 169 0.2838
0.7265 170 0.2759
0.7308 171 0.2908
0.7350 172 0.2984
0.7393 173 0.2932
0.7436 174 0.3061
0.7479 175 0.2862
0.7521 176 0.2795
0.7564 177 0.2826
0.7607 178 0.2962
0.7650 179 0.281
0.7692 180 0.2853
0.7735 181 0.2794
0.7778 182 0.2822
0.7821 183 0.2969
0.7863 184 0.2773
0.7906 185 0.2834
0.7949 186 0.2826
0.7991 187 0.2832
0.8034 188 0.3031
0.8077 189 0.2873
0.8120 190 0.2987
0.8162 191 0.2791
0.8205 192 0.2773
0.8248 193 0.2699
0.8291 194 0.2991
0.8333 195 0.275
0.8376 196 0.2985
0.8419 197 0.2927
0.8462 198 0.2685
0.8504 199 0.2941
0.8547 200 0.3009
0.8590 201 0.3009
0.8632 202 0.2899
0.8675 203 0.3024
0.8718 204 0.2975
0.8761 205 0.2754
0.8803 206 0.2834
0.8846 207 0.2839
0.8889 208 0.2804
0.8932 209 0.2882
0.8974 210 0.2848
0.9017 211 0.2723
0.9060 212 0.2877
0.9103 213 0.2998
0.9145 214 0.3007
0.9188 215 0.2825
0.9231 216 0.2794
0.9274 217 0.2786
0.9316 218 0.2671
0.9359 219 0.2743
0.9402 220 0.2859
0.9444 221 0.2804
0.9487 222 0.2797
0.9530 223 0.2818
0.9573 224 0.2758
0.9615 225 0.2798
0.9658 226 0.2805
0.9701 227 0.2649
0.9744 228 0.2854
0.9786 229 0.2791
0.9829 230 0.2729
0.9872 231 0.2817
0.9915 232 0.2796
0.9957 233 0.305
1.0 234 0.2922

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.4.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}