Spaces: philocifer/banner_flip_engine_prototype

philocifer committed 21 days ago

Commit f6a2023 (1 parent: 51ed0dc)

Added fine tuning evaluation

Files changed (3)

README.md CHANGED

@@ -59,3 +59,27 @@ Weaknesses:
 3. Noise Sensitivity (0.5952) - Vulnerable to irrelevant/conflicting information
 ### Fine-Tuning Open-Source Embeddings
 https://huggingface.co/philocifer/banner-flip-arctic-embed-l
+### Assessing Performance of Fine-Tuned Embeddings
+| Metric                      | Score   |
+|-----------------------------|---------|
+| Context Recall              | 0.9175  |
+| Faithfulness                | 0.8203  |
+| Factual Correctness         | 0.7225  |
+| Answer Relevancy            | 0.9669  |
+| Context Entity Recall       | 0.5711  |
+| Noise Sensitivity Relevant  | 0.0000  |
+#### Evaluation Comparison
+Significant Improvements
+- Factual Correctness surged 39% (0.52 → 0.72) - Substantially more reliable answers
+- Context Recall jumped 16% (0.79 → 0.92) - Better retrieval of relevant information
+- Answer Relevancy reached near-perfect 0.97 (+7%) - Sharper focus on query intent
+Trade-offs
+- Faithfulness dipped 6% (0.87 → 0.82) - Slightly less strict adherence to source context despite better facts
+Notable Changes
+- Noise Sensitivity dropped from 0.60 to 0.00 (-100%) - Taken at face value, complete immunity to irrelevant information; an exact zero may instead indicate an evaluation artifact and requires verification
+- Context Entity Recall improved 31% (0.44 → 0.57) - Remains a relative weakness in the system
+In the second half of the course, I will focus more on improving the SQL agent as it is much better at handling structured data in large volumes.
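
The percentages in the comparison above are relative changes between the base and fine-tuned scores. A minimal sketch of that arithmetic follows, assuming the rounded base scores quoted in the README; the names `BASE`, `FINETUNED`, `relative_change`, and `compare` are illustrative and not part of the repository. Answer Relevancy is left out because its base score is not stated here.

```python
# Base scores are the rounded values quoted in the README comparison;
# fine-tuned scores come from the results table. All names are illustrative.
BASE = {
    "Context Recall": 0.79,
    "Faithfulness": 0.87,
    "Factual Correctness": 0.52,
    "Context Entity Recall": 0.44,
    "Noise Sensitivity": 0.5952,
}
FINETUNED = {
    "Context Recall": 0.9175,
    "Faithfulness": 0.8203,
    "Factual Correctness": 0.7225,
    "Context Entity Recall": 0.5711,
    "Noise Sensitivity": 0.0,
}


def relative_change(base: float, new: float) -> float:
    """Return the change from base to new as a fraction (0.10 means +10%)."""
    if base == 0:
        raise ValueError("relative change is undefined for a base score of 0")
    return (new - base) / base


def compare(base: dict, finetuned: dict) -> dict:
    """Compute the relative change for every metric present in both dicts."""
    return {
        name: relative_change(base[name], finetuned[name])
        for name in base
        if name in finetuned
    }


if __name__ == "__main__":
    for name, change in compare(BASE, FINETUNED).items():
        print(f"{name}: {BASE[name]:.2f} -> {FINETUNED[name]:.2f} ({change:+.0%})")
```

Because the base scores here are rounded to two decimals, a result can differ from the README by about a point: Context Entity Recall comes out near 30% with a base of 0.44, while the README reports 31%, presumably from the unrounded score.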

finetune_eval.py ADDED

+from rag_agent import load_agent, rag_agent
+from ragas_eval import run_ragas_evaluation
+from synthetic_data_gen import generate_synthetic_data
+from langchain_huggingface import HuggingFaceEmbeddings
+from dotenv import load_dotenv
+import json
+# Load environment variables (for example API keys) from a local .env file.
+load_dotenv()
+# Build a RAG agent on top of the fine-tuned embedding model.
+print("Loading fine-tuned embeddings...")
+finetuned_embeddings = HuggingFaceEmbeddings(model_name="philocifer/banner-flip-arctic-embed-l")
+print("Loading fine-tuned RAG agent...")
+# embedding_dimension must match the vector size the embedding model produces.
+finetuned_rag_agent = load_agent(embeddings=finetuned_embeddings, embedding_dimension=1024)
+# Generate one synthetic test set and reuse it for both agents so the scores are comparable.
+print("Generating synthetic data...")
+dataset = generate_synthetic_data()
+# Evaluate the fine-tuned agent and save its scores.
+print("Running fine-tuned RAGAS evaluation...")
+finetuned_result = run_ragas_evaluation(finetuned_rag_agent, dataset)
+print(f"Fine-tuned RAGAS Evaluation Result: {finetuned_result}")
+print("Saving fine-tuned RAGAS evaluation result...")
+# json.dump needs a JSON-serializable result, and the ragas_eval/ directory must already exist.
+with open("ragas_eval/finetuned_result.json", "w") as f:
+    json.dump(finetuned_result, f)
+# Evaluate the base agent imported from rag_agent on the same dataset.
+print("Running base RAGAS evaluation...")
+base_result = run_ragas_evaluation(rag_agent, dataset)
+print(f"Base RAGAS Evaluation Result: {base_result}")
+print("Saving base RAGAS evaluation result...")
+with open("ragas_eval/base_result.json", "w") as f:
+    json.dump(base_result, f)
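
The script writes both results with a bare `open(...)` and `json.dump(...)`, which fails if the `ragas_eval/` directory is missing or if the result contains values `json` cannot encode. A hedged sketch of a more defensive helper is below; `save_result` is a hypothetical name, not a function in the repository, and it assumes the value returned by `run_ragas_evaluation` is a plain dict or something with a useful string form.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_result(result: Any, path: str) -> Path:
    """Write an evaluation result to `path` as JSON, creating parent directories."""
    out_path = Path(path)
    # Create ragas_eval/ (or any other parent directory) if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        # default=str is a fallback: values json cannot encode are written
        # as their string representation instead of raising TypeError.
        json.dump(result, f, indent=2, default=str)
    return out_path


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written into the project.
    with tempfile.TemporaryDirectory() as tmp:
        written = save_result({"faithfulness": 0.8203}, f"{tmp}/ragas_eval/finetuned_result.json")
        print(written.read_text(encoding="utf-8"))
```

The `default=str` fallback trades fidelity for robustness: a non-serializable object is stored as text, so it is worth converting the result to a plain dict first if the scores need to be read back as numbers.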

rag_agent.py CHANGED

@@ -14,7 +14,7 @@ from tqdm import tqdm
 load_dotenv()
-def load_agent(embeddings=None):
+def load_agent(embeddings=None, embedding_dimension=1536):
     if embeddings is None:
         embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
@@ -31,7 +31,7 @@ def load_agent(embeddings=None):
     client = QdrantClient(":memory:")
     client.create_collection(
         collection_name="competitor_stores",
-        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
+        vectors_config=VectorParams(size=embedding_dimension, distance=Distance.COSINE),
     )
     vector_store = QdrantVectorStore(
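
The change above makes the caller responsible for passing a dimension that matches the embedding model. One alternative, sketched here as an assumption rather than as the author's approach, is to measure the dimension by embedding a short probe string. It relies only on the `embed_query(text)` method that LangChain embedding classes expose; `infer_embedding_dimension` and `FakeEmbeddings` are hypothetical names used for illustration.

```python
from typing import List, Protocol


class SupportsEmbedQuery(Protocol):
    """Anything with a LangChain-style embed_query method."""

    def embed_query(self, text: str) -> List[float]: ...


def infer_embedding_dimension(
    embeddings: SupportsEmbedQuery, probe_text: str = "dimension probe"
) -> int:
    """Embed a short probe string and return the length of the resulting vector."""
    return len(embeddings.embed_query(probe_text))


class FakeEmbeddings:
    """Stand-in for a real embedding model so the sketch runs offline."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension

    def embed_query(self, text: str) -> List[float]:
        # Return a zero vector of the configured size; the values are irrelevant here.
        return [0.0] * self.dimension


if __name__ == "__main__":
    print(infer_embedding_dimension(FakeEmbeddings(1024)))
```

With a real model the probe costs one embedding call (an API request for hosted models), so an explicit `embedding_dimension` argument, as in the commit, remains the cheaper option when the size is known.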