metadata
license: apache-2.0
language:
- en
- fa
- ar
inference: true
base_model:
- jinaai/jina-embeddings-v3
pipeline_tag: feature-extraction
tags:
- Embedding
library_name: transformers
This is all just for testing purposes.
my-Jira-embedding-v3
This is a sentence embedding model based on jinaai/jina-embeddings-v3, fine-tuned to embed text from Jira tickets.
This model is intended for use in tasks such as:
- Semantic search on Jira ticket descriptions and comments.
- Clustering of similar Jira tickets.
- Text similarity comparison for identifying duplicate or related issues.
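The search and duplicate-detection use cases above both reduce to comparing embedding vectors. The following is a minimal sketch of that comparison in plain Python, assuming the embeddings have already been computed. The ticket keys, the two-dimensional toy vectors, and the helper names `cosine_similarity` and `rank_by_similarity` are illustrative assumptions, not part of this model or its libraries.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    ticket_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (ticket key, score) pairs, most similar ticket first."""
    scored = [
        (key, cosine_similarity(query_vec, vec))
        for key, vec in ticket_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for real embeddings (hypothetical ticket keys and values).
tickets = {
    "PROJ-1": [1.0, 0.0],
    "PROJ-2": [0.0, 1.0],
    "PROJ-3": [1.0, 1.0],
}
ranking = rank_by_similarity([1.0, 0.1], tickets)
print(ranking)
```

For duplicate detection, the same scores can be compared against a similarity threshold chosen on labelled examples from your own tracker.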
Key Features:
- Extended Sequence Length: Supports up to 8192 tokens with RoPE.
- Task-Specific Embedding: Customize embeddings through the task argument with the following options:
  - retrieval.query: Used for query embeddings in asymmetric retrieval tasks
  - retrieval.passage: Used for passage embeddings in asymmetric retrieval tasks
  - separation: Used for embeddings in clustering and re-ranking applications
  - classification: Used for embeddings in classification tasks
  - text-matching: Used for embeddings in tasks that quantify similarity between two texts, such as STS or symmetric retrieval tasks
- Matryoshka Embeddings: Supports flexible embedding sizes (32, 64, 128, 256, 512, 768, 1024), allowing for truncating embeddings to fit your application.
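Truncating a Matryoshka embedding amounts to keeping its leading components. The sketch below does this in plain Python and then rescales the result to unit length, which is a common convention when vectors are compared by cosine or dot-product similarity. The rescaling step and the helper name `truncate_embedding` are assumptions for illustration, not something this model card specifies.

```python
import math
from typing import List, Sequence

# Embedding sizes listed under Key Features.
SUPPORTED_DIMS = (32, 64, 128, 256, 512, 768, 1024)


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components of `vec` and rescale to unit length."""
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"dim must be one of {SUPPORTED_DIMS}, got {dim}")
    if dim > len(vec):
        raise ValueError(f"cannot truncate a {len(vec)}-d vector to {dim} dims")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be rescaled
    return [x / norm for x in head]


# Stand-in for a full-size embedding (made-up values, not model output).
full = [1.0] * 1024
small = truncate_embedding(full, 64)
print(len(small))
```

Smaller sizes reduce storage and speed up similarity search at some cost in accuracy, so the right size is best chosen by measuring retrieval quality on your own tickets.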
Example:
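from sentence_transformers import SentenceTransformer

# Note: this loads the base model; substitute this model's own repository ID
# to use the fine-tuned weights.
model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Select the task so the matching prompt is applied to the input.
task = "retrieval.query"
embeddings = model.encode(
    ["What is the weather like in Berlin today?"],
    task=task,
    prompt_name=task,
)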
Limitations
[Discuss any known limitations, e.g., performance on out-of-domain text, potential biases from the training data.]
Training Data
This model was fine-tuned on a dataset of [Describe your dataset, e.g., a collection of anonymized Jira tickets].
How to Use
You can use this model with the sentence-transformers library:
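A minimal sketch of that usage follows, mirroring the `encode` call from the example earlier in this card. The repository ID `your-namespace/my-Jira-embedding-v3` is a placeholder assumption, since the card does not give the model's Hub path, and `load_model` and `embed` are hypothetical helper names. It also assumes the fine-tuned model accepts the same `task` and `prompt_name` arguments as the base model.

```python
from functools import lru_cache
from typing import List

# Assumption: placeholder repository ID; replace with this model's real Hub path.
MODEL_ID = "your-namespace/my-Jira-embedding-v3"

# Task names listed under Key Features.
TASKS = (
    "retrieval.query",
    "retrieval.passage",
    "separation",
    "classification",
    "text-matching",
)


@lru_cache(maxsize=1)
def load_model(model_id: str = MODEL_ID):
    """Load the model once and reuse it on later calls."""
    # Imported lazily so this module imports even without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id, trust_remote_code=True)


def embed(texts: List[str], task: str = "retrieval.passage", model_id: str = MODEL_ID):
    """Embed `texts` for the given task; returns one vector per input text."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    model = load_model(model_id)
    return model.encode(list(texts), task=task, prompt_name=task)
```

Ticket descriptions and comments would typically be embedded with `retrieval.passage` and search queries with `retrieval.query`, so that both sides of an asymmetric search use the matching task.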