tonyshaw committed
Commit 6e1dd60 · verified · 1 Parent(s): 9ce44d0

Push model using huggingface_hub.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
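This pooling config enables `pooling_mode_mean_tokens`: the 768-dimensional sentence embedding is the attention-masked mean of the token embeddings. A minimal PyTorch sketch of what the Pooling module computes (the function name is illustrative, not part of the repo):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Masked mean over the sequence axis: (batch, seq, 768) -> (batch, 768)."""
    mask = attention_mask.unsqueeze(-1).float()   # zero out padding positions
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)      # guard against division by zero
    return summed / counts
```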
README.md ADDED
@@ -0,0 +1,244 @@
+ ---
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: Is a residential portion of a building that sells alcohol considered "licensed
+     premises" in indiana
+ - text: In Michigan, what criteria do courts consider in granting grandparent visitation
+     rights?
+ - text: Ohio aggravated arson cases
+ - text: In Texas, what protections exist for whistleblowers?
+ - text: What did you have for breakfast?
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ library_name: setfit
+ inference: true
+ base_model: nomic-ai/nomic-embed-text-v1.5
+ ---
+
+ # SetFit with nomic-ai/nomic-embed-text-v1.5
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model for text classification. It uses [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) as the Sentence Transformer embedding model and a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance as the classification head.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
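+ A minimal sketch of these two steps with the `setfit` trainer (the toy dataset and the exact kwargs are illustrative, not the actual training script):
+
+ ```python
+ from datasets import Dataset
+ from setfit import SetFitModel, Trainer, TrainingArguments
+
+ # Toy few-shot data; the real run used the 7 classes listed under "Model Labels".
+ train_dataset = Dataset.from_dict({
+     "text": [
+         "Does Michigan have a statute of repose?",
+         "What did you have for breakfast?",
+     ],
+     "label": ["Identify Current Law", "Out of Scope"],
+ })
+
+ model = SetFitModel.from_pretrained(
+     "nomic-ai/nomic-embed-text-v1.5",
+     trust_remote_code=True,  # the Nomic body ships custom modeling code
+ )
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(batch_size=16, num_epochs=1),
+     train_dataset=train_dataset,
+ )
+ trainer.train()  # step 1: contrastive fine-tuning; step 2: fit the head
+ ```
+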
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Number of Classes:** 7 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | Term of Art Interpretations & Application | <ul><li>'How do courts in Illinois define "constructive eviction"?'</li><li>'How do Pennsylvania courts define "reasonable suspicion" in DUI cases?'</li><li>'definition of ex parte'</li></ul> |
+ | Out of Scope | <ul><li>'Has Capt. Ashley Heiberger ever testified as an expert witness?'</li><li>'Have you recently attended any weddings or special celebrations?'</li><li>'Have you seen any good movies lately?'</li></ul> |
+ | SDR | <ul><li>'Gonzalez et al. v. Mexico'</li><li>'2021 U.S. Dist. LEXIS 14890'</li><li>'Elizabeth Holmes Theranos ORDER DENYING MOTION FOR RELEASE PENDING APPEAL'</li></ul> |
+ | Identify Current Law | <ul><li>'Does Michigan have a statute of repose?'</li><li>'Mississippi law concerning challenges to changes made in updated HOA regulations'</li><li>'cases on nurse liability for making medication dosage mistake in kentucky'</li></ul> |
+ | Agent decision | <ul><li>'Search for USPTO Patent Decisions: BPAI and PTAB discussing the integration of a judicial exception into practical applications'</li><li>'Are there any EPA Environmental Appeals Board Decisions regarding the guidelines for establishing a "critical habitat" for wildlife?'</li><li>'Find Merit Systems Protection Board decisions regarding when the plain language of a statute must be treated as controlling'</li></ul> |
+ | Q&A - Complex | <ul><li>'Are bloodhounds considered reliable for establishing probable cause in Idaho?'</li><li>'What are the requirements to file a class action lawsuit in Florida?'</li><li>'Can a corporation be held liable for damages caused by an employee driving under the influence of alcohol in New York?'</li></ul> |
+ | Practical Guidance | <ul><li>'What does an "Election of Remedy" clause involve in an indemnity agreement? T'</li><li>'Where is Private Company Corporate Governance Board Resolutions Resource Kit T'</li><li>'If I start a law firm in Michigan, what types of employee leave do I need to provide compared to my current firm in Ohio? T'</li></ul> |
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("tonyshaw/setfit_pg_70h_nomic-v1.5")
+ # Run inference
+ preds = model("Ohio aggravated arson cases")
+ ```
+
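+ The call returns a string label for each input, drawn from the classes listed in `config_setfit.json`. Because the Nomic body ships custom modeling code, `from_pretrained` may need `trust_remote_code=True` in some environments. Per-class probabilities are also available; a minimal sketch, reusing the model above:
+
+ ```python
+ # Probabilities follow the order of model.labels
+ probs = model.predict_proba(["Ohio aggravated arson cases"])
+ for label, p in zip(model.labels, probs[0]):
+     print(f"{label}: {float(p):.3f}")
+ ```
+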
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 1 | 11.2193 | 98 |
+
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | Agent decision | 130 |
+ | Identify Current Law | 500 |
+ | Out of Scope | 100 |
+ | Practical Guidance | 41 |
+ | Q&A - Complex | 500 |
+ | SDR | 500 |
+ | Term of Art Interpretations & Application | 500 |
+
+ ### Training Hyperparameters
+ - batch_size: (16, 16)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 10
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - l2_weight: 0.01
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
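+ These map directly onto `setfit.TrainingArguments`; a sketch of reconstructing them (tuple values pair the embedding phase with the classifier phase):
+
+ ```python
+ from sentence_transformers.losses import CosineSimilarityLoss
+ from setfit import TrainingArguments
+
+ args = TrainingArguments(
+     batch_size=(16, 16),                # (embedding phase, classifier phase)
+     num_epochs=(1, 1),
+     num_iterations=10,                  # contrastive pairs generated per sample
+     body_learning_rate=(2e-05, 2e-05),
+     head_learning_rate=2e-05,
+     loss=CosineSimilarityLoss,
+     sampling_strategy="oversampling",
+     warmup_proportion=0.1,
+     l2_weight=0.01,
+     seed=42,
+ )
+ ```
+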
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0004 | 1 | 0.2703 | - |
+ | 0.0176 | 50 | 0.2289 | - |
+ | 0.0352 | 100 | 0.2032 | - |
+ | 0.0528 | 150 | 0.0951 | - |
+ | 0.0704 | 200 | 0.0434 | - |
+ | 0.0881 | 250 | 0.026 | - |
+ | 0.1057 | 300 | 0.0299 | - |
+ | 0.1233 | 350 | 0.02 | - |
+ | 0.1409 | 400 | 0.0136 | - |
+ | 0.1585 | 450 | 0.013 | - |
+ | 0.1761 | 500 | 0.0147 | - |
+ | 0.1937 | 550 | 0.0144 | - |
+ | 0.2113 | 600 | 0.0052 | - |
+ | 0.2290 | 650 | 0.0067 | - |
+ | 0.2466 | 700 | 0.0021 | - |
+ | 0.2642 | 750 | 0.0038 | - |
+ | 0.2818 | 800 | 0.006 | - |
+ | 0.2994 | 850 | 0.0039 | - |
+ | 0.3170 | 900 | 0.0007 | - |
+ | 0.3346 | 950 | 0.0003 | - |
+ | 0.3522 | 1000 | 0.0002 | - |
+ | 0.3698 | 1050 | 0.0026 | - |
+ | 0.3875 | 1100 | 0.0027 | - |
+ | 0.4051 | 1150 | 0.0003 | - |
+ | 0.4227 | 1200 | 0.0012 | - |
+ | 0.4403 | 1250 | 0.0022 | - |
+ | 0.4579 | 1300 | 0.0027 | - |
+ | 0.4755 | 1350 | 0.0014 | - |
+ | 0.4931 | 1400 | 0.0008 | - |
+ | 0.5107 | 1450 | 0.0001 | - |
+ | 0.5284 | 1500 | 0.0013 | - |
+ | 0.5460 | 1550 | 0.0001 | - |
+ | 0.5636 | 1600 | 0.0011 | - |
+ | 0.5812 | 1650 | 0.0 | - |
+ | 0.5988 | 1700 | 0.001 | - |
+ | 0.6164 | 1750 | 0.0001 | - |
+ | 0.6340 | 1800 | 0.0002 | - |
+ | 0.6516 | 1850 | 0.0 | - |
+ | 0.6692 | 1900 | 0.0 | - |
+ | 0.6869 | 1950 | 0.0 | - |
+ | 0.7045 | 2000 | 0.0 | - |
+ | 0.7221 | 2050 | 0.0 | - |
+ | 0.7397 | 2100 | 0.0 | - |
+ | 0.7573 | 2150 | 0.0 | - |
+ | 0.7749 | 2200 | 0.0 | - |
+ | 0.7925 | 2250 | 0.001 | - |
+ | 0.8101 | 2300 | 0.0 | - |
+ | 0.8278 | 2350 | 0.0 | - |
+ | 0.8454 | 2400 | 0.0013 | - |
+ | 0.8630 | 2450 | 0.0 | - |
+ | 0.8806 | 2500 | 0.0001 | - |
+ | 0.8982 | 2550 | 0.0004 | - |
+ | 0.9158 | 2600 | 0.0 | - |
+ | 0.9334 | 2650 | 0.0001 | - |
+ | 0.9510 | 2700 | 0.0 | - |
+ | 0.9687 | 2750 | 0.0 | - |
+ | 0.9863 | 2800 | 0.0 | - |
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - SetFit: 1.1.1
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.6.0+cu124
+ - Datasets: 3.4.1
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "_name_or_path": "nomic-ai/nomic-embed-text-v1.5",
+   "activation_function": "swiglu",
+   "architectures": [
+     "NomicBertModel"
+   ],
+   "attn_pdrop": 0.0,
+   "auto_map": {
+     "AutoConfig": "nomic-ai/nomic-bert-2048--configuration_hf_nomic_bert.NomicBertConfig",
+     "AutoModel": "nomic-ai/nomic-bert-2048--modeling_hf_nomic_bert.NomicBertModel",
+     "AutoModelForMaskedLM": "nomic-ai/nomic-bert-2048--modeling_hf_nomic_bert.NomicBertForPreTraining"
+   },
+   "bos_token_id": null,
+   "causal": false,
+   "dense_seq_output": true,
+   "embd_pdrop": 0.0,
+   "eos_token_id": null,
+   "fused_bias_fc": true,
+   "fused_dropout_add_ln": true,
+   "initializer_range": 0.02,
+   "layer_norm_epsilon": 1e-12,
+   "max_trained_positions": 2048,
+   "mlp_fc1_bias": false,
+   "mlp_fc2_bias": false,
+   "model_type": "nomic_bert",
+   "n_embd": 768,
+   "n_head": 12,
+   "n_inner": 3072,
+   "n_layer": 12,
+   "n_positions": 8192,
+   "pad_vocab_size_multiple": 64,
+   "parallel_block": false,
+   "parallel_block_tied_norm": false,
+   "prenorm": false,
+   "qkv_proj_bias": false,
+   "reorder_and_upcast_attn": false,
+   "resid_pdrop": 0.0,
+   "rotary_emb_base": 1000,
+   "rotary_emb_fraction": 1.0,
+   "rotary_emb_interleaved": false,
+   "rotary_emb_scale_base": null,
+   "rotary_scaling_factor": null,
+   "scale_attn_by_inverse_layer_idx": false,
+   "scale_attn_weights": true,
+   "summary_activation": null,
+   "summary_first_dropout": 0.0,
+   "summary_proj_to_labels": true,
+   "summary_type": "cls_index",
+   "summary_use_proj": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "use_flash_attn": true,
+   "use_rms_norm": false,
+   "use_xentropy": true,
+   "vocab_size": 30528
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.3",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
config_setfit.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "labels": [
+     "Agent decision",
+     "Identify Current Law",
+     "Out of Scope",
+     "Practical Guidance",
+     "Q&A - Complex",
+     "SDR",
+     "Term of Art Interpretations & Application"
+   ],
+   "normalize_embeddings": false
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bd6bca277fc50f599e6fceafac2107d02bf07f817a97667f977bfc054364a373
+ size 546938168
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f1d5858b381eeb77cb3bf0b5f080b658088d33119b45e5f36c35c7fd05cebf5
+ size 45055
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 8192,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff