README.md · Sajjad313/my-Jira-embedding-v3 at main

license: apache-2.0
language:
  - en
  - fa
  - ar
inference: true
base_model:
  - jinaai/jina-embeddings-v3
pipeline_tag: feature-extraction
tags:
  - Embedding
library_name: transformers

This is all just for testing purposes.

my-Jira-embedding-v3

This is a sentence embedding model based on jinaai/jina-embeddings-v3, fine-tuned to embed text from Jira tickets.

This model is intended for use in tasks such as:

- Semantic search on Jira ticket descriptions and comments.
- Clustering of similar Jira tickets.
- Text similarity comparison for identifying duplicate or related issues.
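
As a rough illustration of the duplicate-detection use case, the sketch below compares precomputed ticket embeddings pairwise with cosine similarity. The function names and the 0.9 threshold are hypothetical choices made for this example, not values taken from the model or its training; a real threshold would need tuning on your own tickets.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off, tune on real data
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose similarity reaches the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(embeddings), 2):
        if cosine_similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs
```

The pairwise loop is quadratic in the number of tickets, so for large projects an approximate nearest-neighbour index would likely be a better fit.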

Key Features:

- Extended Sequence Length: Supports inputs of up to 8192 tokens, using rotary position embeddings (RoPE).
- Task-Specific Embedding: Customize embeddings through the task argument, which accepts the following options:
  - retrieval.query: Used for query embeddings in asymmetric retrieval tasks
  - retrieval.passage: Used for passage embeddings in asymmetric retrieval tasks
  - separation: Used for embeddings in clustering and re-ranking applications
  - classification: Used for embeddings in classification tasks
  - text-matching: Used for embeddings in tasks that quantify similarity between two texts, such as semantic textual similarity (STS) or symmetric retrieval tasks
- Matryoshka Embeddings: Supports flexible embedding sizes (32, 64, 128, 256, 512, 768, 1024), so embeddings can be truncated to fit your application.
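
To make the Matryoshka truncation concrete, here is a minimal sketch that keeps the leading dimensions of an embedding and L2-normalizes the result so that dot products remain comparable. The helper name `truncate_embedding` is hypothetical, and re-normalizing after truncation is an assumption about how you would typically want to use the shortened vectors.

```python
import math
from typing import List, Sequence

# Sizes listed in the feature description above.
SUPPORTED_DIMS = (32, 64, 128, 256, 512, 768, 1024)


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` values of `vec` and rescale them to unit length."""
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"dim must be one of {SUPPORTED_DIMS}, got {dim}")
    if dim > len(vec):
        raise ValueError("dim is larger than the embedding itself")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero vector unchanged rather than dividing by zero.
    return [x / norm for x in head] if norm > 0.0 else head
```

Smaller sizes reduce storage and speed up similarity search, usually at some cost in retrieval quality, so it is worth measuring the trade-off on your own tickets.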

Example:

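from sentence_transformers import SentenceTransformer

# Note: this example loads the base model.
# trust_remote_code=True is needed because the model ships custom modeling code.
model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Pick one of the task options listed above and pass the matching prompt name.
task = "retrieval.query"
embeddings = model.encode(
    ["What is the weather like in Berlin today?"],
    task=task,
    prompt_name=task,
)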

Limitations

[Discuss any known limitations, e.g., performance on out-of-domain text, potential biases from the training data.]

Training Data

This model was fine-tuned on a dataset of [Describe your dataset, e.g., a collection of anonymized Jira tickets].

How to Use

You can use this model with the sentence-transformers library:
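
The following is a sketch rather than a confirmed recipe. It assumes the repository id shown on this page (`Sajjad313/my-Jira-embedding-v3`) and assumes the fine-tuned model keeps the base model's `task` and `prompt_name` interface from the example above. The helper names `load_model`, `rank_tickets`, and `search` are hypothetical.

```python
from typing import List, Sequence

# Assumption: repository id taken from this page.
MODEL_ID = "Sajjad313/my-Jira-embedding-v3"


def load_model(model_id: str = MODEL_ID):
    """Load the model on demand so importing this file triggers no download."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id, trust_remote_code=True)


def rank_tickets(
    query_vec: Sequence[float], ticket_vecs: Sequence[Sequence[float]]
) -> List[int]:
    """Return ticket indices sorted by descending dot product with the query."""
    scores = [sum(q * t for q, t in zip(query_vec, vec)) for vec in ticket_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def search(model, query: str, tickets: List[str]) -> List[str]:
    """Embed the query and tickets with the asymmetric retrieval tasks, then rank."""
    query_vec = model.encode(
        [query],
        task="retrieval.query",
        prompt_name="retrieval.query",
        normalize_embeddings=True,  # unit-length vectors: dot product = cosine
    )[0]
    ticket_vecs = model.encode(
        tickets,
        task="retrieval.passage",
        prompt_name="retrieval.passage",
        normalize_embeddings=True,
    )
    order = rank_tickets(query_vec.tolist(), ticket_vecs.tolist())
    return [tickets[i] for i in order]
```

A call such as `search(load_model(), "login page returns 500", ticket_texts)` would then return the ticket texts ordered from most to least similar to the query, where `ticket_texts` is your own list of ticket descriptions.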