tomaarsen (HF staff) committed
Commit 3b7f8ce · verified · 1 parent: 2387344

Add new CrossEncoder model

Files changed (7)
  1. README.md +465 -0
  2. config.json +39 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +51 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +58 -0
  7. vocab.txt +0 -0
README.md ADDED
---
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:100000
- loss:LambdaLoss
base_model: almanach/camembertv2-base
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 264.3941099722424
  energy_consumed: 0.6801974519612515
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 1.921
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on almanach/camembertv2-base
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: swim ir dev
      type: swim_ir_dev
    metrics:
    - type: map
      value: 0.6059
      name: Map
    - type: mrr@10
      value: 0.6052
      name: Mrr@10
    - type: ndcg@10
      value: 0.6217
      name: Ndcg@10
---

# CrossEncoder based on almanach/camembertv2-base

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [almanach/camembertv2-base](https://huggingface.co/almanach/camembertv2-base) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [almanach/camembertv2-base](https://huggingface.co/almanach/camembertv2-base) <!-- at revision 704f48f2c01dcf3d6ca5992133d59078bdaac26a -->
- **Maximum Sequence Length:** 1024 tokens
- **Number of Output Labels:** 1 label
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-camembertv2-base-fr-lambda")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
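
The list returned by `model.rank` is sorted by score in descending order, so the first entry corresponds to the passage the model considers most relevant to the query.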

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Dataset: `swim_ir_dev`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": false
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.6059 (+0.1333)     |
| mrr@10      | 0.6052 (+0.1371)     |
| **ndcg@10** | **0.6217 (+0.1206)** |
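
As a reference point, the snippet below is a minimal sketch of how such an evaluation can be run. It assumes the evaluator's documented sample format (dicts with `query`, `positive`, and `documents` keys); the sample itself is a hypothetical stand-in for the actual SWIM-IR dev data.

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

model = CrossEncoder("tomaarsen/reranker-camembertv2-base-fr-lambda")

# Hypothetical reranking sample: a query, its known positive passage(s),
# and the candidate pool that should be reranked.
samples = [
    {
        "query": "Combien de calories dans un oeuf ?",
        "positive": ["Un oeuf contient en moyenne entre 55 et 80 calories."],
        "documents": [
            "La tour Eiffel mesure environ 330 mètres de haut.",
            "Un oeuf contient en moyenne entre 55 et 80 calories.",
            "Le blanc d'oeuf est pauvre en calories et riche en protéines.",
        ],
    },
]

evaluator = CrossEncoderRerankingEvaluator(
    samples=samples,
    at_k=10,
    always_rerank_positives=False,
    name="swim_ir_dev",
)
results = evaluator(model)
print(results)
# e.g. {'swim_ir_dev_map': ..., 'swim_ir_dev_mrr@10': ..., 'swim_ir_dev_ndcg@10': ...}
```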

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 100,000 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                          | docs                               | labels                             |
  |:--------|:-----------------------------------------------------------------------------------------------|:-----------------------------------|:-----------------------------------|
  | type    | string                                                                                           | list                                | list                                |
  | details | <ul><li>min: 0 characters</li><li>mean: 37.74 characters</li><li>max: 157 characters</li></ul>   | <ul><li>size: 6 elements</li></ul>  | <ul><li>size: 6 elements</li></ul>  |
* Samples:
  | query         | docs | labels |
  |:--------------|:-----|:-------|
  | <code></code> | <code>['L\'ambitus exigé par le rôle-titre est plus problématique : la plus haute note est un "si" aigu, ce qui n\'est pas anormal pour une soprano ou une mezzo-soprano, alors que la plus basse est un "sol" bémol grave dans le registre alto (et normalement au-dessous du registre d\'une mezzo-soprano "standard"). Compte tenu d\'une telle tessiture, qui ressemble à celle de nombreux rôles de mezzo comme Carmen et Amneris, on pourrait croire qu\'un soprano aigu n\'est pas essentiel à la pièce, mais c\'est bien le contraire ; la plupart des sopranos graves qui ont abordé ce rôle ont imposé un tel effort à leur voix tout au long de l\'opéra, qu\'elles se retrouvaient épuisées au moment de la scène finale (la partie la plus éprouvante pour le rôle-titre). Ce rôle est l\'exemple classique de la différence qui existe entre tessiture et ambitus : tandis que des mezzos peuvent exécuter une note aigüe (comme dans "Carmen"), ou même soutenir temporairement une tessiture tendue, il est impossible pour un...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code></code> | <code>["Les saisons 2 à 6 sont produites par Télé-Vision V Inc., filiale de Groupe Télé-Vision Inc. Lors de la saison d'hiver 2006, l'émission était animée par Isabelle Maréchal et Virginie Coossa. Pour les saisons 3,4,5 Marie Plourde a remplacé Isabelle Maréchal, alors que Virginie Coossa est demeurée coanimatrice. Lors des saisons 5 et 6, Kim Rusk, la gagnante de la saison 3, était la coanimatrice. La saison 6 sera animée par Pierre-Yves Lord.", ',{"type": "ExternalData", "service":"geoshape","ids": "Q40","properties": {"fill":"#FF0000","stroke-width":0,"description": "Autriche"}}]', '! scope=col width="10%" | Pages ! scope=col width="25%" | Auteur(s) ! scope=col width="65%" | Titre ! scope=col width="2%" |', '~CH-CH-CH-CH-CH-CH-CH’~ → ~CH-CH-CH=CH + CH-CH-CH’~ ou ~CH’~ + CH=CH-CH-CH-CH-CH', '! style="text-align:center; background: #aabccc;"|Modèle ! style="text-align:center; background: #aabccc;"|Image ! style="text-align:center; background: #aabccc;"|Origine ! style="text-align:center; b...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code></code> | <code>['En 1963, Bernard et Françoise Moitessier quittent le port de Marseille, pour un voyage de noces. Ils prennent le détroit de Gibraltar et se dirigent vers les îles Canaries où il retrouve Pierre Deshumeurs, le compagnon du "Snark". Les enfants de Françoise les rejoignent le temps des vacances scolaires. Les Moitessier poursuivent ensuite vers les Antilles, puis le canal de Panama, avant de s\'arrêter longuement dans l\'archipel des Galápagos, où certaines îles reculées de toutes civilisations accueillent une faune et une flore exceptionnelles qui retiennent l\'attention du couple. Ils rejoignent ensuite la Polynésie française où ils restent plusieurs mois.', ',{"type": "ExternalData", "service":"geoshape","ids": "Q40","properties": {"fill":"#FF0000","stroke-width":0,"description": "Autriche"}}]', '! scope=col width="10%" | Pages ! scope=col width="25%" | Auteur(s) ! scope=col width="65%" | Titre ! scope=col width="2%" |', '~CH-CH-CH-CH-CH-CH-CH’~ → ~CH-CH-CH=CH + CH-CH-CH’~ ou ~CH’~ +...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>LambdaLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#lambdaloss) with these parameters:
  ```json
  {
      "weighting_scheme": "sentence_transformers.cross_encoder.losses.LambdaLoss.NDCGLoss2PPScheme",
      "k": null,
      "sigma": 1.0,
      "eps": 1e-10,
      "reduction_log": "binary",
      "activation_fct": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 8
  }
  ```
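
For illustration, these parameters correspond roughly to constructing the loss as in the sketch below. This is not the exact training script, and the base-model loading arguments are assumptions.

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.losses import LambdaLoss, NDCGLoss2PPScheme

# Assumed loading of the base model as a single-label cross-encoder
model = CrossEncoder("almanach/camembertv2-base", num_labels=1)

# Mirrors the JSON above: NDCGLoss2++ weighting, k=None, sigma=1.0, eps=1e-10,
# binary reduction log, and mini-batches of 8. The default Identity activation
# matches the "activation_fct" entry.
loss = LambdaLoss(
    model=model,
    weighting_scheme=NDCGLoss2PPScheme(),
    k=None,
    sigma=1.0,
    eps=1e-10,
    reduction_log="binary",
    mini_batch_size=8,
)
```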
194
+
195
+ ### Training Hyperparameters
196
+ #### Non-Default Hyperparameters
197
+
198
+ - `eval_strategy`: steps
199
+ - `per_device_train_batch_size`: 16
200
+ - `per_device_eval_batch_size`: 16
201
+ - `learning_rate`: 2e-05
202
+ - `num_train_epochs`: 1
203
+ - `warmup_ratio`: 0.1
204
+ - `seed`: 12
205
+ - `bf16`: True
206
+ - `load_best_model_at_end`: True
207
+
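These map one-to-one onto the training arguments class; a minimal sketch, assuming the `CrossEncoderTrainingArguments` API and with a hypothetical `output_dir`:

```python
from sentence_transformers.cross_encoder import CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="reranker-camembertv2-base-fr-lambda",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    load_best_model_at_end=True,
)
```
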
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch   | Step     | Training Loss | swim_ir_dev_ndcg@10  |
|:-------:|:--------:|:-------------:|:--------------------:|
| -1      | -1       | -             | 0.0784 (-0.4228)     |
| 0.0002  | 1        | 2.0475        | -                    |
| 0.016   | 100      | 2.065         | -                    |
| 0.032   | 200      | 1.9662        | -                    |
| 0.048   | 300      | 0.9965        | -                    |
| 0.064   | 400      | 0.7667        | -                    |
| 0.08    | 500      | 0.6547        | 0.5961 (+0.0950)     |
| 0.096   | 600      | 0.5899        | -                    |
| 0.112   | 700      | 0.5331        | -                    |
| 0.128   | 800      | 0.4637        | -                    |
| 0.144   | 900      | 0.4826        | -                    |
| 0.16    | 1000     | 0.4249        | 0.6012 (+0.1000)     |
| 0.176   | 1100     | 0.4271        | -                    |
| 0.192   | 1200     | 0.4071        | -                    |
| 0.208   | 1300     | 0.3594        | -                    |
| 0.224   | 1400     | 0.401         | -                    |
| 0.24    | 1500     | 0.4171        | 0.5900 (+0.0888)     |
| 0.256   | 1600     | 0.3728        | -                    |
| 0.272   | 1700     | 0.3242        | -                    |
| 0.288   | 1800     | 0.3665        | -                    |
| 0.304   | 1900     | 0.3367        | -                    |
| 0.32    | 2000     | 0.3259        | 0.6134 (+0.1122)     |
| 0.336   | 2100     | 0.381         | -                    |
| 0.352   | 2200     | 0.3289        | -                    |
| 0.368   | 2300     | 0.3234        | -                    |
| 0.384   | 2400     | 0.3794        | -                    |
| 0.4     | 2500     | 0.3322        | 0.6070 (+0.1058)     |
| 0.416   | 2600     | 0.3139        | -                    |
| 0.432   | 2700     | 0.3427        | -                    |
| 0.448   | 2800     | 0.3162        | -                    |
| 0.464   | 2900     | 0.2899        | -                    |
| 0.48    | 3000     | 0.3571        | 0.6166 (+0.1155)     |
| 0.496   | 3100     | 0.3312        | -                    |
| 0.512   | 3200     | 0.3082        | -                    |
| 0.528   | 3300     | 0.2839        | -                    |
| 0.544   | 3400     | 0.3649        | -                    |
| 0.56    | 3500     | 0.325         | 0.6108 (+0.1097)     |
| 0.576   | 3600     | 0.3042        | -                    |
| 0.592   | 3700     | 0.2785        | -                    |
| 0.608   | 3800     | 0.3095        | -                    |
| 0.624   | 3900     | 0.3053        | -                    |
| 0.64    | 4000     | 0.293         | 0.6131 (+0.1119)     |
| 0.656   | 4100     | 0.2987        | -                    |
| 0.672   | 4200     | 0.2675        | -                    |
| 0.688   | 4300     | 0.2977        | -                    |
| 0.704   | 4400     | 0.2881        | -                    |
| 0.72    | 4500     | 0.2862        | 0.6186 (+0.1174)     |
| 0.736   | 4600     | 0.2996        | -                    |
| 0.752   | 4700     | 0.2724        | -                    |
| 0.768   | 4800     | 0.2442        | -                    |
| 0.784   | 4900     | 0.2923        | -                    |
| **0.8** | **5000** | **0.2691**    | **0.6217 (+0.1206)** |
| 0.816   | 5100     | 0.3042        | -                    |
| 0.832   | 5200     | 0.2654        | -                    |
| 0.848   | 5300     | 0.3059        | -                    |
| 0.864   | 5400     | 0.2571        | -                    |
| 0.88    | 5500     | 0.2741        | 0.6174 (+0.1162)     |
| 0.896   | 5600     | 0.3009        | -                    |
| 0.912   | 5700     | 0.2669        | -                    |
| 0.928   | 5800     | 0.2272        | -                    |
| 0.944   | 5900     | 0.2673        | -                    |
| 0.96    | 6000     | 0.2674        | 0.6194 (+0.1182)     |
| 0.976   | 6100     | 0.2551        | -                    |
| 0.992   | 6200     | 0.2981        | -                    |
| -1      | -1       | -             | 0.6217 (+0.1206)     |

* The bold row denotes the saved checkpoint.

### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.680 kWh
- **Carbon Emitted**: 0.264 kg of CO2
- **Hours Used**: 1.921 hours

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### LambdaLoss
```bibtex
@inproceedings{wang2018lambdaloss,
  title={The lambdaloss framework for ranking metric optimization},
  author={Wang, Xuanhui and Li, Cheng and Golbandi, Nadav and Bendersky, Michael and Najork, Marc},
  booktitle={Proceedings of the 27th ACM international conference on information and knowledge management},
  pages={1313--1322},
  year={2018}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
{
  "_name_or_path": "almanach/camembertv2-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 1,
  "classifier_dropout": null,
  "embedding_size": 768,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 1025,
  "model_name": "camembertv2-base",
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_biased_input": true,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.activation.Sigmoid"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 32768
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:e3c2fd3bdf02aeb792131d2ee2b3ae0ab07b68ec9a7edd54bf9e8293270bf9f0
size 446428756
special_tokens_map.json ADDED
{
  "bos_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "[CLS]",
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "eos_token": "[SEP]",
  "errors": "replace",
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 1024,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff