All credits go to [(Rivera-Soto et al. 2021)](https://aclanthology.org/2021.emnlp-main.70/).

---

Author style representations using [LUAR](https://aclanthology.org/2021.emnlp-main.70.pdf). The model is finetuned from [sentence-transformers/paraphrase-distilroberta-base-v1](https://huggingface.co/sentence-transformers/paraphrase-distilroberta-base-v1) and maps sentences and paragraphs (up to 128 tokens) to a 512-dimensional dense vector space in which texts can be compared by writing style.

The LUAR training and evaluation repository can be found [here](https://github.com/llnl/luar).

This model was trained on the Reddit Million User Dataset (MUD), introduced [here](https://aclanthology.org/2021.naacl-main.415.pdf).
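
Since LUAR targets author-level representations, one simple heuristic (a sketch of ours, not prescribed by the original card or paper; the posts below are invented for illustration) is to average the embeddings of several texts by the same author and compare authors with cosine similarity:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gabrielloiseau/LUAR-MUD-sentence-transformers")

# Invented example data: a few posts per author
author_a_posts = [
    "ngl i never read the docs, i just poke at the code until it works",
    "tbh same with configs, i copy one that works and tweak it",
]
author_b_posts = [
    "In accordance with the documentation, the configuration must be reviewed.",
    "Please consult the manual before modifying any settings.",
]

# One embedding per post, mean-pooled into a single author vector
emb_a = model.encode(author_a_posts).mean(axis=0, keepdims=True)
emb_b = model.encode(author_b_posts).mean(axis=0, keepdims=True)

# Cosine similarity between the two author representations
score = model.similarity(emb_a, emb_b)
print(float(score))  # higher = more similar writing style
```
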
## Usage

You can load this model with the [Sentence Transformers](https://www.SBERT.net) library and run inference:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gabrielloiseau/LUAR-MUD-sentence-transformers")
# Run inference (example texts; substitute your own)
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 512]
```
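
Continuing from the snippet above, pairwise similarity between these embeddings can then be computed with the model's built-in `similarity` utility (cosine by default; requires Sentence Transformers ≥ 3.0):

```python
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```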

## Citation

If you find this model helpful, feel free to cite:
 
 
 
```bibtex
@inproceedings{uar-emnlp2021,
  author = {Rafael A. Rivera Soto and Olivia Miano and Juanita Ordonez and Barry Chen and Aleem Khan and Marcus Bishop and Nicholas Andrews},
  title = {Learning Universal Authorship Representations},
  booktitle = {EMNLP},
  year = {2021},
}
```

## License

LUAR is distributed under the terms of the Apache License (Version 2.0).

All new contributions must be made under the Apache 2.0 license.