---
license: apache-2.0
language:
- en
- fa
- ar
inference: true
base_model:
- jinaai/jina-embeddings-v3
pipeline_tag: feature-extraction
tags:
- Embedding
library_name: transformers
---

This model card and model are for testing purposes only.

## my-Jira-embedding-v3

This is a sentence embedding model based on [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3), fine-tuned to embed text related to Jira tickets.

This model is intended for use in tasks such as:
- Semantic search on Jira ticket descriptions and comments.
- Clustering of similar Jira tickets.
- Text similarity comparison for identifying duplicate or related issues.
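
As a rough illustration of the duplicate-detection use case above, the sketch below compares embeddings by cosine similarity and reports pairs above a threshold. The helper names, the toy 3-dimensional vectors, and the `0.9` threshold are assumptions made for the example; in practice the vectors would come from the model and the threshold would need tuning on real tickets.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicates(embeddings, threshold=0.9):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy vectors standing in for ticket embeddings (hypothetical values).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
duplicate_pairs = find_duplicates(toy_embeddings)
```

The pairwise loop is quadratic in the number of tickets, so it only suits small batches; a vector index would be the usual choice for larger collections.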

## Key Features:
- **Extended Sequence Length:** Supports up to 8192 tokens with RoPE.
- **Task-Specific Embedding:** Customize embeddings through the `task` argument with the following options:
    - `retrieval.query`: Used for query embeddings in asymmetric retrieval tasks
    - `retrieval.passage`: Used for passage embeddings in asymmetric retrieval tasks
    - `separation`: Used for embeddings in clustering and re-ranking applications
    - `classification`: Used for embeddings in classification tasks
    - `text-matching`: Used for embeddings in tasks that quantify similarity between two texts, such as STS or symmetric retrieval tasks
- **Matryoshka Embeddings**: Supports flexible embedding sizes (`32, 64, 128, 256, 512, 768, 1024`), allowing for truncating embeddings to fit your application.
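
Truncating a Matryoshka embedding amounts to keeping its leading components and rescaling the result to unit length. The following is a minimal sketch of that step in plain Python; `truncate_embedding` and the toy 8-dimensional vector are illustrative assumptions, not part of this model's API.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if dim <= 0 or dim > len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]


# Toy 8-dimensional vector standing in for a full 1024-dimensional embedding.
full = [0.5, -0.5, 0.5, -0.5, 0.1, 0.2, -0.1, 0.0]
small = truncate_embedding(full, 4)
```

Recent versions of `sentence-transformers` also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be more convenient than truncating by hand.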

## Example:
```python
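from sentence_transformers import SentenceTransformer

# Loads the base model; trust_remote_code is required because the model
# ships custom modeling code.
model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Pick one of the task names listed under Key Features.
task = "retrieval.query"
embeddings = model.encode(
    ["What is the weather like in Berlin today?"],
    task=task,         # selects the task-specific embedding
    prompt_name=task,  # applies the prompt registered under the same name
)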
```

## Limitations

[Discuss any known limitations, e.g., performance on out-of-domain text, potential biases from the training data.]

## Training Data

This model was fine-tuned on a dataset of [Describe your dataset, e.g., a collection of anonymized Jira tickets].

## How to Use

You can use this model with the `sentence-transformers` library:
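
The sketch below mirrors the example earlier in this card and wraps it in two small helpers. The repository ID `your-username/my-Jira-embedding-v3` is a placeholder, since this card does not state where the fine-tuned weights are published, and `encode_kwargs` and `embed` are hypothetical helper names. The `sentence_transformers` import is deferred so that the task validation can be used without the library installed.

```python
# Task names listed under Key Features.
VALID_TASKS = {
    "retrieval.query",
    "retrieval.passage",
    "separation",
    "classification",
    "text-matching",
}


def encode_kwargs(task):
    """Build the keyword arguments passed to model.encode for a given task."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return {"task": task, "prompt_name": task}


def embed(texts, task, model_id="your-username/my-Jira-embedding-v3"):
    """Embed texts with the fine-tuned model (model_id is a placeholder)."""
    # Imported here so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id, trust_remote_code=True)
    return model.encode(texts, **encode_kwargs(task))
```

For symmetric comparisons such as duplicate detection, `embed(ticket_texts, "text-matching")` would be a reasonable starting point, while search queries and indexed tickets would use `retrieval.query` and `retrieval.passage` respectively.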