---
license: apache-2.0
language:
- en
- fa
- ar
inference: true
base_model:
- jinaai/jina-embeddings-v3
pipeline_tag: feature-extraction
tags:
- Embedding
library_name: transformers
---

This is all just for testing purposes.

## my-Jira-embedding-v3

This is a sentence embedding model based on [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3), fine-tuned to embed text related to Jira tickets.

This model is intended for use in tasks such as:

- Semantic search on Jira ticket descriptions and comments.
- Clustering of similar Jira tickets.
- Text similarity comparison for identifying duplicate or related issues.
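
As an illustration of the duplicate-detection use case, here is a minimal sketch that flags ticket pairs whose embeddings have high cosine similarity. It assumes the embeddings have already been computed (for example with the `text-matching` task). The function names and the `0.9` threshold are hypothetical choices for this example, not part of the model, and the toy 3-dimensional vectors only stand in for real embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.9):
    """Return (i, j, score) for every pair whose similarity meets the threshold."""
    pairs = []
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if score >= threshold:
            pairs.append((i, j, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy vectors standing in for real ticket embeddings (hypothetical data).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
duplicates = find_duplicate_pairs(toy_embeddings, threshold=0.9)
```

The pairwise loop is quadratic in the number of tickets, so for a large backlog a vectorized library or an approximate nearest-neighbor index would likely be a better fit. A suitable threshold has to be tuned on your own data.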

## Key Features:

- **Extended Sequence Length:** Supports up to 8192 tokens with RoPE.
- **Task-Specific Embedding:** Customize embeddings through the `task` argument with the following options:
    - `retrieval.query`: Used for query embeddings in asymmetric retrieval tasks
    - `retrieval.passage`: Used for passage embeddings in asymmetric retrieval tasks
    - `separation`: Used for embeddings in clustering and re-ranking applications
    - `classification`: Used for embeddings in classification tasks
    - `text-matching`: Used for embeddings in tasks that quantify similarity between two texts, such as STS or symmetric retrieval tasks
- **Matryoshka Embeddings:** Supports flexible embedding sizes (`32, 64, 128, 256, 512, 768, 1024`), so embeddings can be truncated to fit your application.
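
To make the Matryoshka point concrete, the sketch below shows one common way to shrink an embedding: keep the first `dim` components, then rescale the result to unit length so that cosine and dot-product scores stay comparable. The helper name `truncate_embedding` is hypothetical, and the toy 8-dimensional vector only stands in for a full 1024-dimensional embedding.

```python
import math


def truncate_embedding(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the embedding length")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy vector standing in for a full-size embedding (hypothetical data).
full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1]
small = truncate_embedding(full, 4)
```

Smaller sizes reduce storage and speed up search, usually at some cost in accuracy, so it is worth measuring quality at the chosen size on your own tickets.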

## Example:

```python
from sentence_transformers import SentenceTransformer

# Loads the base model; trust_remote_code is required for its custom modeling code.
model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Pass the same value as both `task` and `prompt_name`.
task = "retrieval.query"
embeddings = model.encode(
    ["What is the weather like in Berlin today?"],
    task=task,
    prompt_name=task,
)
```
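
Building on the `encode` call shape above, the following sketch shows how the two asymmetric tasks might be combined for semantic search: the query is embedded with `retrieval.query`, the candidate texts with `retrieval.passage`, and passages are ranked by dot product. `rank_passages` and `StubModel` are hypothetical names. The stub only imitates the `encode` signature so the snippet runs without downloading weights; in real use you would pass the loaded `SentenceTransformer` instead. Dot products match cosine similarity only if the vectors are unit length, which is an assumption here.

```python
def rank_passages(model, query, passages, top_k=3):
    """Rank passages against a query using the asymmetric retrieval tasks."""
    query_vec = model.encode(
        [query], task="retrieval.query", prompt_name="retrieval.query"
    )[0]
    passage_vecs = model.encode(
        passages, task="retrieval.passage", prompt_name="retrieval.passage"
    )
    # Raw dot-product scores (assumes unit-length embeddings).
    scores = [
        sum(float(q) * float(p) for q, p in zip(query_vec, vec))
        for vec in passage_vecs
    ]
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


class StubModel:
    """Hypothetical stand-in for the real model, used only to run this sketch."""

    def encode(self, texts, task=None, prompt_name=None):
        # Toy 2-d vectors: first axis counts "login", second counts "export".
        return [
            [t.lower().count("login"), t.lower().count("export")] for t in texts
        ]


results = rank_passages(
    StubModel(),
    "login fails",
    ["Export to CSV is slow", "Login page returns 500", "Login token expires early"],
    top_k=2,
)
```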

## Limitations

[Discuss any known limitations, e.g., performance on out-of-domain text, potential biases from the training data.]

## Training Data

This model was fine-tuned on a dataset of [Describe your dataset, e.g., a collection of anonymized Jira tickets].

## How to Use

You can use this model with the `sentence-transformers` library: