---
language:
- code
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:94500
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: Primary CD8+ T cells from a subject identified as CL-MCRL, exposed
to the GPR epitope with a dpi (days post-infection) of 87.5.
sentences:
- Cancer cell line (CCL23) derived from a carcinoma patient.
- Primary CD34+ human cells in three-phase in vitro culture, isolated on day 13,
with GG1dd zf vector transduction.
- 23-year-old primary nonETP leukemic blasts from bone marrow.
- source_sentence: Hematopoietic cells with PI-AnnexinV-GFP+CD33+ phenotype from a
xenograft strain NRG-3GS.
sentences:
- H9 embryonic stem cells treated with recombinant Wnt3a for 8 hours in culture.
- iCell Hepatocytes that have been treated with 075\_OLBO\_10 in a study involving
BO class and dose 10.
- 48 hour treatment of colorectal carcinoma cell line HCT116 (colorectal cancer)
with control treatment.
- source_sentence: Memory B cells derived from a female thoracic lymph node, obtained
from a donor in their seventh decade.
sentences:
- Neuron cell type from the Pulvinar of thalamus, derived from a 42-year-old human
individual.
- Germinal center B cell derived from the tonsil tissue of a 3-year-old male with
recurrent tonsillitis.
- B cell sample from a 55-year old female Asian individual with managed systemic
lupus erythematosus (SLE). The cell was derived from peripheral blood mononuclear
cells (PBMCs).
- source_sentence: Pericyte cells, part of the smooth muscle lineage, extracted from
the transition zone of a 74-year-old human prostate.
sentences:
- A CD8-positive, alpha-beta memory T cell, CD45RO-positive, specifically identified
as Tem/Effector cytotoxic T cells, as determined by CellTypist prediction. The
cell was obtained from the lung tissue of a female individual in her eighth decade.
- CD4-positive, alpha-beta T cell sample taken from a 53-year old female Asian individual
with managed systemic lupus erythematosus (SLE).
- Natural killer cell from a 32-year old female of European descent with managed
systemic lupus erythematosus (SLE).
- source_sentence: Sample is a basal cell of prostate epithelium, taken from the transition
zone of the prostate gland in a 72-year old male. It belongs to the Epithelia
lineage and Population BE.
sentences:
- Neuron cell type from a 42-year old male cerebral cortex tissue, specifically
from the rostral gyrus dorsal division of MFC A32, classified as Deep-layer corticothalamic
and 6b.
- Dendritic cell from the transition zone of prostate of a 29-year-old male, specifically
from the EREG+ population.
- Neuron from the mediodorsal nucleus of thalamus, which is part of the medial nuclear
complex of thalamus (MNC) in the thalamic complex, taken from a 42-year-old male
human donor with European ethnicity. The neuron belongs to the Thalamic excitatory
supercluster.
datasets:
- jo-mengr/cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation
- jo-mengr/geo_70k_multiplets_natural_language_annotation
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer
results:
- task:
type: triplet
name: Triplet
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy
value: 0.9402857422828674
name: Cosine Accuracy
- type: cosine_accuracy
value: 0.9371428489685059
name: Cosine Accuracy
---
# SentenceTransformer
This is a [sentence-transformers](https://www.SBERT.net) model trained on the [cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation) and [geo_70k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/geo_70k_multiplets_natural_language_annotation) datasets. It maps sentences and paragraphs to a dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
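- **Maximum Sequence Length:** not reported (the text encoder's position embeddings in the architecture listing below cover 512 tokens)
- **Output Dimensionality:** not reported (the usage example below prints 1024-dimensional embeddings, while the adapter layers in the architecture listing end in 2048 features)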
- **Similarity Function:** Cosine Similarity
- **Training Datasets:**
- [cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation)
- [geo_70k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/geo_70k_multiplets_natural_language_annotation)
- **Language:** code
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): MMContextEncoder(
(text_encoder): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(28996, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0-11): 12 x BertLayer(
(attention): BertAttention(
(self): BertSdpaSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
(text_adapter): AdapterModule(
(net): Sequential(
(0): Linear(in_features=768, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=2048, bias=True)
(3): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(omics_adapter): AdapterModule(
(net): Sequential(
(0): Linear(in_features=64, out_features=512, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=512, out_features=2048, bias=True)
(3): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
)
```
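To get a feel for how small the two adapters are next to the BERT encoder, the following sketch counts their learnable parameters directly from the layer shapes printed above. It is plain arithmetic, not a call into the model: the helper names are hypothetical, and the only inputs are the `in_features`/`out_features` values from the listing.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def batchnorm_params(num_features: int, affine: bool = True) -> int:
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * num_features if affine else 0


def adapter_params(in_features: int, hidden: int = 512, out_features: int = 2048) -> int:
    """Linear -> ReLU -> Linear -> BatchNorm1d, the layer order printed for AdapterModule."""
    return (
        linear_params(in_features, hidden)
        + linear_params(hidden, out_features)
        + batchnorm_params(out_features)
    )


text_adapter = adapter_params(768)   # input width = BERT hidden size
omics_adapter = adapter_params(64)   # input width = 64 omics features
print(f"text_adapter:  {text_adapter:,} parameters")
print(f"omics_adapter: {omics_adapter:,} parameters")
```

Both adapters end in the same 2048-feature width, so the difference in size comes entirely from the first linear layer.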
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jo-mengr/mmcontext-100k-natural_language_annotation-pca-1024")
# Run inference
sentences = [
'Sample is a basal cell of prostate epithelium, taken from the transition zone of the prostate gland in a 72-year old male. It belongs to the Epithelia lineage and Population BE.',
'Neuron cell type from a 42-year old male cerebral cortex tissue, specifically from the rostral gyrus dorsal division of MFC A32, classified as Deep-layer corticothalamic and 6b.',
'Neuron from the mediodorsal nucleus of thalamus, which is part of the medial nuclear complex of thalamus (MNC) in the thalamic complex, taken from a 42-year-old male human donor with European ethnicity. The neuron belongs to the Thalamic excitatory supercluster.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
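For semantic search, the similarity scores are typically turned into a ranking of candidates for one query. The sketch below shows that step on toy vectors in pure Python, assuming cosine similarity as listed under Model Description. The function names and the 3-dimensional vectors are illustrative stand-ins, not part of the model or library.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, best match first."""
    scored = [(idx, cosine_similarity(query, vec)) for idx, vec in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 1.0]
candidate_vecs = [
    [0.9, 0.1, 1.1],     # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],     # orthogonal to the query
    [-1.0, 0.0, -1.0],   # opposite direction
]
ranking = rank_candidates(query_vec, candidate_vecs)
for idx, score in ranking:
    print(idx, round(score, 3))
```

With real data, each row returned by `model.encode` would take the place of one toy vector.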
## Evaluation
### Metrics
#### Triplet
* Evaluated with [`TripletEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| **cosine_accuracy** | **0.9403** |
#### Triplet
* Evaluated with [`TripletEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| **cosine_accuracy** | **0.9371** |
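As a rough illustration of what `cosine_accuracy` measures: a triplet counts as correct when the anchor is more cosine-similar to its positive than to its negative, and the metric is the fraction of correct triplets. The sketch below reproduces that idea on toy 2-dimensional vectors. It is not the evaluator's implementation, and the function names and data are made up for the example.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_cosine_accuracy(triplets: Iterable[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = 0
    total = 0
    for anchor, positive, negative in triplets:
        total += 1
        if cosine(anchor, positive) > cosine(anchor, negative):
            correct += 1
    return correct / total if total else 0.0


toy_triplets = [
    ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]),    # positive closer: correct
    ([0.0, 1.0], [0.2, 1.0], [1.0, 0.0]),    # positive closer: correct
    ([1.0, 1.0], [1.0, 0.9], [-1.0, -1.0]),  # positive closer: correct
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.2]),    # negative closer: wrong
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```

Read this way, a value of 0.94 means that roughly 94 out of every 100 evaluation triplets were ordered correctly.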
## Training Details
### Training Datasets
#### cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation
* Dataset: [cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation) at [a6241c4](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation/tree/a6241c46b7e108ff9106fd7a1838117096e2c3c6)
* Size: 31,500 training samples
* Columns: `anndata_ref`, `positive`, `negative_1`, and `negative_2`
* Approximate statistics based on the first 1000 samples:

| | anndata_ref | positive | negative_1 | negative_2 |
|:--------|:------------|:---------|:-----------|:-----------|
| type | dict | string | string | dict |

* Samples (each `anndata_ref` and `negative_2` value is a dict that differs only in `sample_id`, so the table lists the `sample_id` and the shared `file_record` follows it):

| anndata_ref (`sample_id`) | positive | negative_1 | negative_2 (`sample_id`) |
|:--------------------------|:---------|:-----------|:-------------------------|
| `census_1f1c5c14-5949-4c81-b28e-b272e271b672_570` | Stromal cell of ovary, specifically Stroma-2, from a human adult female individual, in S phase of the cell cycle. | Neuron cell type from a 50-year-old male human thalamic complex, specifically from the ventral anterior nucleus of thalamus within the lateral nuclear complex. | `census_1b9d8702-5af8-4142-85ed-020eb06ec4f6_19663` |
| `census_218acb0f-9f2f-4f76-b90b-15a4b7c7f629_34872` | CD8-positive, alpha-beta T cell sample from a 52-year old Asian female with managed systemic lupus erythematosus (SLE). | Mucosal invariant T cell derived from the spleen of a female in her seventies. | `census_74cff64f-9da9-4b2a-9b3b-8a04a1598040_4145` |
| `census_74cff64f-9da9-4b2a-9b3b-8a04a1598040_7321` | Hofbauer cell derived from the decidua basalis tissue of a female individual at 8 post conception week (8_PCW). The sample is a nucleus. | Regulatory T cell derived from a lymph node of a male individual with advanced non-small cell lung cancer (NSCLC), stage IV, who has a history of smoking. | `census_5a73f63f-18a2-49b5-b431-2c469c41a41b_163` |

* Shared `file_record` of all dicts above: `{'dataset_path': 'https://nxc-fredato.imbi.uni-freiburg.de/s/cZdKEMQFMKGHc6E/download', 'embeddings': {'X_geneformer': 'https://nxc-fredato.imbi.uni-freiburg.de/s/GDgf9MfckNmk2Bf/download', 'X_hvg': 'https://nxc-fredato.imbi.uni-freiburg.de/s/GWrtoRASdZAWdPa/download', 'X_pca': 'https://nxc-fredato.imbi.uni-freiburg.de/s/FAiRMKztdjLYG23/download', 'X_scvi': 'https://nxc-fredato.imbi.uni-freiburg.de/s/TDTo6seSi6qrGTq/download'}}`
* Loss: [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
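To make the two parameters concrete: with `similarity_fct` set to cosine similarity and `scale` set to 20, the loss can be read as a cross-entropy over scaled cosine scores, where each anchor should pick its own positive out of all candidates in the batch. The following is a minimal pure-Python sketch of that idea on toy vectors, not the library's implementation. The function names are hypothetical, and which columns play the roles of anchor, positive, and hard negative during training is not stated in this card.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multiple_negatives_ranking_loss(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    hard_negatives: Sequence[Vector] = (),
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where anchor i must select positive i among all candidates."""
    candidates = list(positives) + list(hard_negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


toy_anchors = [[1.0, 0.0], [0.0, 1.0]]
toy_positives = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = multiple_negatives_ranking_loss(toy_anchors, toy_positives)
shuffled_loss = multiple_negatives_ranking_loss(toy_anchors, toy_positives[::-1])
print(aligned_loss, shuffled_loss)
```

The larger the scale, the more sharply the loss penalizes an anchor whose best-scoring candidate is not its own positive, which is why the shuffled batch scores far worse than the aligned one.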
#### geo_70k_multiplets_natural_language_annotation
* Dataset: [geo_70k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/geo_70k_multiplets_natural_language_annotation) at [449eb79](https://huggingface.co/datasets/jo-mengr/geo_70k_multiplets_natural_language_annotation/tree/449eb79e41b05af4d3e32900144411963f626f8c)
* Size: 63,000 training samples
* Columns: `anndata_ref`, `positive`, `negative_1`, and `negative_2`
* Approximate statistics based on the first 1000 samples:

| | anndata_ref | positive | negative_1 | negative_2 |
|:--------|:------------|:---------|:-----------|:-----------|
| type | dict | string | string | dict |

* Samples (each `anndata_ref` and `negative_2` value is a dict that differs only in `sample_id`, so the table lists the `sample_id` and the shared `file_record` follows it):

| anndata_ref (`sample_id`) | positive | negative_1 | negative_2 (`sample_id`) |
|:--------------------------|:---------|:-----------|:-------------------------|
| `SRX3111576` | 198Z\_MSCB-067 sample contains primary cells that are neuronal progenitors from patient type WB\_1. | 31-year-old female Caucasian with ntm disease provided a whole blood sample on July 11, 2016. The baseline FEVPP was 89.74 and FVCpp was 129.41. | `SRX6591734` |
| `SRX7834244` | CD8+ T cells from a healthy skin sample, labeled C4, from plate rep1, well E6, sequencing batch b7, which passed QC, and clustered as 2\_Resid. | 6-week-old (PCW6) neuronal epithelium tissue from donor HSB325, cultured using C1-72 chip. | `SRX2440281` |
| `SRX3112138` | 201Z\_MSCB-083 is a sample of primary neuronal progenitor cells from patient MD1 with no reported treatment. | 48-hour sample from HPV-negative UPCI:SCC131 cell line, a head and neck squamous cell carcinoma (HNSCC) cell line, that has not been irradiated. | `SRX7448263` |

* Shared `file_record` of all dicts above: `{'dataset_path': 'https://nxc-fredato.imbi.uni-freiburg.de/s/mwyWK7cTL3j5ydA/download', 'embeddings': {'X_geneformer': 'https://nxc-fredato.imbi.uni-freiburg.de/s/Tg4TMSg8gDtxJ5x/download', 'X_hvg': 'https://nxc-fredato.imbi.uni-freiburg.de/s/QjSE4s5ZHamjwfi/download', 'X_pca': 'https://nxc-fredato.imbi.uni-freiburg.de/s/rYEATQXRJsx42Qr/download', 'X_scvi': 'https://nxc-fredato.imbi.uni-freiburg.de/s/cWgZaKPJLsgb5Zo/download'}}`
* Loss: [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
### Evaluation Datasets
#### cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation
* Dataset: [cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation) at [a6241c4](https://huggingface.co/datasets/jo-mengr/cellxgene_pseudo_bulk_35k_multiplets_natural_language_annotation/tree/a6241c46b7e108ff9106fd7a1838117096e2c3c6)
* Size: 3,500 evaluation samples
* Columns: `anndata_ref`, `positive`, `negative_1`, and `negative_2`
* Approximate statistics based on the first 1000 samples:

| | anndata_ref | positive | negative_1 | negative_2 |
|:--------|:------------|:---------|:-----------|:-----------|
| type | dict | string | string | dict |

* Samples (each `anndata_ref` and `negative_2` value is a dict that differs only in `sample_id`, so the table lists the `sample_id` and the shared `file_record` follows it):

| anndata_ref (`sample_id`) | positive | negative_1 | negative_2 (`sample_id`) |
|:--------------------------|:---------|:-----------|:-------------------------|
| `census_842c6f5d-4a94-4eef-8510-8c792d1124bc_6822` | Non-classical monocyte cell type, derived from a fresh breast tissue sample of an African American female donor with low breast density, obese BMI, and premenopausal status. The cell was obtained through resection procedure and analyzed using single-cell transcriptomics as part of the Human Breast Cell Atlas (HBCA) study. | Plasma cells derived from gingival tissue of a 39-year-old female. | `census_218acb0f-9f2f-4f76-b90b-15a4b7c7f629_23461` |
| `census_b46237d1-19c6-4af2-9335-9854634bad16_9825` | Enteric neuron cells derived from the ileum tissue at Carnegie stage 22. | Ciliated cell from the trachea of a 6-12 year-old European male with no SARS-CoV-2 infection, who is a non-smoker and healthy. | `census_2872f4b0-b171-46e2-abc6-befcf6de6306_2871` |
| `census_d7d7e89c-c93a-422d-8958-9b4a90b69558_4209` | Activated CD16-positive, CD56-dim natural killer cell taken from a 26-year-old male, activated with CD3, and found to be in G1 phase. | CD8-positive, alpha-beta thymocyte cell type derived from a 74-year-old male human with European self-reported ethnicity, located in the transition zone of the prostate. | `census_535e9336-2d8d-43c3-944d-bcbebe20df8a_18` |

* Shared `file_record` of all dicts above: `{'dataset_path': 'https://nxc-fredato.imbi.uni-freiburg.de/s/Zk4EtWao9WKAQKc/download', 'embeddings': {'X_geneformer': 'https://nxc-fredato.imbi.uni-freiburg.de/s/LET7EG7xi56RqMd/download', 'X_hvg': 'https://nxc-fredato.imbi.uni-freiburg.de/s/5qjxiEJwwdNHTBX/download', 'X_pca': 'https://nxc-fredato.imbi.uni-freiburg.de/s/z4TQkdxcP3ynBMn/download', 'X_scvi': 'https://nxc-fredato.imbi.uni-freiburg.de/s/6NZ94ZLkLKYyPcY/download'}}`
* Loss: [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
#### geo_70k_multiplets_natural_language_annotation
* Dataset: [geo_70k_multiplets_natural_language_annotation](https://huggingface.co/datasets/jo-mengr/geo_70k_multiplets_natural_language_annotation) at [449eb79](https://huggingface.co/datasets/jo-mengr/geo_70k_multiplets_natural_language_annotation/tree/449eb79e41b05af4d3e32900144411963f626f8c)
* Size: 7,000 evaluation samples
* Columns: `anndata_ref`, `positive`, `negative_1`, and `negative_2`
* Approximate statistics based on the first 1000 samples:

| | anndata_ref | positive | negative_1 | negative_2 |
|:--------|:------------|:---------|:-----------|:-----------|
| type | dict | string | string | dict |

* Samples (each `anndata_ref` and `negative_2` value is a dict that differs only in `sample_id`, so the table lists the `sample_id` and the shared `file_record` follows it):

| anndata_ref (`sample_id`) | positive | negative_1 | negative_2 (`sample_id`) |
|:--------------------------|:---------|:-----------|:-------------------------|
| `SRX16033546` | A549 lung adenocarcinoma cell line with ectopic expression of TPK1 p.G48C mutation. | 3 days after the 4th immunization, blood sample from donor 1033 with low antibody-dependent cellular phagocytosis (ADCP) category. | `SRX10356703` |
| `SRX8241199` | Human fibroblasts at the D7 time point during reprogramming into induced pluripotent stem cells (iPSCs) or hiPSCs. | CD14+ monocytes from a healthy control participant (ID 2015). | `SRX14140416` |
| `SRX17834359` | Whole blood sample from subject HRV15-017, collected at day 1 in the afternoon. | 59 year old male bronchial epithelial cells with 39 pack years of smoking history and imaging cluster 1. | `SRX5429074` |

* Shared `file_record` of all dicts above: `{'dataset_path': 'https://nxc-fredato.imbi.uni-freiburg.de/s/kfjX6LkLewqssdN/download', 'embeddings': {'X_geneformer': 'https://nxc-fredato.imbi.uni-freiburg.de/s/kxd2NqJjnMSArf6/download', 'X_hvg': 'https://nxc-fredato.imbi.uni-freiburg.de/s/zqPbdqn5nCgo7rb/download', 'X_pca': 'https://nxc-fredato.imbi.uni-freiburg.de/s/b7sANypKxGyYQ2J/download', 'X_scvi': 'https://nxc-fredato.imbi.uni-freiburg.de/s/TwFF6TWRp9sMxgc/download'}}`
* Loss: [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `num_train_epochs`: 8
- `warmup_ratio`: 0.1
- `fp16`: True
- `dataloader_num_workers`: 1
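To see what `warmup_ratio`, `learning_rate`, `num_train_epochs`, and the batch size imply together, the sketch below estimates the number of optimizer steps and the learning rate at a given step. Several things here are assumptions rather than facts from this card: a single device, no gradient accumulation, the 94,500 training samples treated as one pool, and a linear warmup followed by linear decay. The real step count can differ, for example because batches are drawn per dataset. The function names are hypothetical.

```python
import math


def training_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


def linear_warmup_lr(
    step: int,
    total_steps: int,
    peak_lr: float = 2e-5,
    warmup_ratio: float = 0.1,
) -> float:
    """Learning rate under an assumed linear warmup and linear decay schedule."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr to 0 over the remaining steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total = training_steps(num_samples=94_500, batch_size=128, epochs=8)
warmup = math.ceil(total * 0.1)
print("estimated total steps:", total)
print("estimated warmup steps:", warmup)
for step in (0, warmup // 2, warmup, total // 2, total):
    print(step, linear_warmup_lr(step, total))
```

Under these assumptions the learning rate peaks at `2e-05` once about a tenth of the run has passed and then falls back to zero by the final step.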
#### All Hyperparameters