# Eira-0.1 Fine-Tuned Medical Chatbot
A fine-tuned version of the "Eira-0.1" Causal Language Model, designed to answer questions and generate text based on a collection of medical PDFs. This model is optimized for question answering, summarization, and chatbot-style responses grounded in the specific PDF documents it was trained on.
## Model Summary
This Kaggle model is a fine-tuned version of the Eira-0.1 Causal Language Model (presumably transformer-based, loaded via `AutoModelForCausalLM`). It has been specifically adapted to understand and generate text based on content extracted from medical and domain-specific PDF documents.
The model is suited for interactive chat or question-answering tasks where the knowledge base is the document collection itself. It does not possess general world knowledge beyond these documents.
- Training Data: Text extracted page-by-page from PDFs in `/kaggle/input/dataset` using PyMuPDF (see the extraction sketch below)
- Fine-Tuning: Performed over 3 epochs with no validation split
- Architecture: `AutoModelForCausalLM` with the tokenizer inherited from the base model
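The exact extraction script is not included with the model; the following is a minimal sketch of the page-wise extraction and formatting described above, assuming a flat `/kaggle/input/dataset` layout (`extract_pages` is an illustrative name, not from the original pipeline):

```python
import os
import fitz  # PyMuPDF

def extract_pages(pdf_dir="/kaggle/input/dataset"):
    """Yield one training string per PDF page in the format used for fine-tuning."""
    for name in sorted(os.listdir(pdf_dir)):
        if not name.lower().endswith(".pdf"):
            continue
        doc = fitz.open(os.path.join(pdf_dir, name))
        for page_num, page in enumerate(doc, start=1):
            text = page.get_text().strip()  # basic whitespace trimming
            if text:
                # Prefix each segment with the source filename and page number
                yield f"{name} - Page {page_num}:\n{text}"
        doc.close()
```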
## Usage
This model can be used for:
- Question Answering: Based strictly on training PDFs
- Text Generation: Mimicking tone, structure, and style of the documents
- Summarization: Experimental, may require carefully structured prompts
### Input / Output
- Input: String prompt (e.g., `"What is the recommended dosage for drug X?"`)
- Output: Generated string response based on the model's learned knowledge
## Known Limitations
- Out-of-Domain Knowledge: Hallucinations likely when asked about topics outside the PDFs
- Specificity: Heavily reliant on clarity and structure of source PDFs
- Overfitting: No validation set used; generalization may be weak
- Repetition: May still repeat phrases in long responses (see the decoding sketch after this list)
- Prompt Sensitivity: Works best when phrasing is close to original document language
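Repetition can often be reduced at decoding time. A minimal sketch, assuming a model and tokenizer loaded as in the Example Inference section below; the parameter values are illustrative starting points, not tuned for this model:

```python
import torch

def generate_less_repetitive(model, tokenizer, prompt):
    """Decode with settings that typically curb repeated phrases."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=200,
            do_sample=True,          # temperature/top_p only apply when sampling
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,  # block verbatim 3-gram repeats
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```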
## System Requirements
### Hardware
- Training: GPU (used Kaggle GPUs like T4 or P100); CPU works but is much slower
- Inference: GPU highly recommended; CPU supported with higher latency
### Software
- Python 3.x
- PyTorch
- Hugging Face Transformers
- PyMuPDF (`fitz`)
- tqdm
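A quick import check can confirm the stack is available before training or inference (no version pins were specified for the original pipeline):

```python
import torch
import transformers
import fitz  # PyMuPDF
import tqdm

print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```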
## Implementation Details
- Epochs: 3
- Batch Size: 2
- Tokenizer: Inherited from base model
- Text Format: `"filename.pdf - Page X:\n[page content]"` (a training-loop sketch using these settings follows this list)
## Model Initialization
- Base Model: `Eira-0.1`
- Base Path: `/kaggle/input/eira0.1`
- Fine-tuned Output: `/kaggle/working/eira_2_finetuned`
(You should link the base model card on Hugging Face or the original source if publishing externally.)
## Model Stats
- Size / Weights / Layers: Inherited from base `Eira-0.1` [Details to be added if available]; a sketch for computing them follows this list
- Disk Size: Same as the base model plus minor weight updates
- Inference Latency: Varies by hardware, prompt length, and decoding parameters
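The missing figures can be read directly off the checkpoint; a small sketch, assuming the fine-tuned weights at the output path given above:

```python
import os
from transformers import AutoModelForCausalLM

model_path = "/kaggle/working/eira_2_finetuned"
model = AutoModelForCausalLM.from_pretrained(model_path)

# Total parameter count across all weight tensors
n_params = sum(p.numel() for p in model.parameters())
# Approximate on-disk size of the saved checkpoint files
disk_mb = sum(os.path.getsize(os.path.join(model_path, f))
              for f in os.listdir(model_path)
              if os.path.isfile(os.path.join(model_path, f))) / 1e6
print(f"Parameters: {n_params:,}")
print(f"Checkpoint size on disk: {disk_mb:.0f} MB")
```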
## Data Overview
### Training Data
- Source: PDF files from `/kaggle/input/dataset`
- Type: [Insert description, e.g., "clinical guidelines", "patient care manuals", etc.]
- Extraction Tool: PyMuPDF
- Structure: Page-wise extraction with basic formatting
- Size: Depends on total number and length of PDFs
### Pre-processing
- Whitespace trimming
- Prefixed each text segment with its source filename and page number
### Evaluation Data
- Split: None; the entire dataset was used for training
- Held-out Set: Not used in current pipeline
## Evaluation Results
- Internal Evaluation: None implemented in current training script
- Subgroup Performance: Not assessed
- Recommendation: Use a test set of similar PDFs and metrics like ROUGE, BLEU, or manual review
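One way to implement the recommendation, assuming the `evaluate` library and a hypothetical held-out set of prompt/reference pairs (`ask_eira` is defined in the Example Inference section below):

```python
import evaluate

rouge = evaluate.load("rouge")

# Hypothetical held-out pairs drawn from PDFs similar to the training set
held_out = [
    ("What is the recommended dosage for drug X?", "reference answer text..."),
]

predictions = [ask_eira(prompt) for prompt, _ in held_out]
references = [ref for _, ref in held_out]
print(rouge.compute(predictions=predictions, references=references))
```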
## Fairness & Ethics
### Fairness
- No fairness metrics or evaluations were conducted
- Model reflects potential biases in the training PDFs
### Ethics
- Misinformation Risk: May generate plausible but incorrect responses
- Privacy: Ensure no private/confidential info was in training PDFs
- Bias: Output may replicate bias present in source documents
- Usage Guidance:
  - Should not be used for real clinical advice without human validation
  - Use disclaimers and human oversight in production
## Usage Limitations
- Sensitive Use Cases: Not suitable for deployment in high-stakes domains (medical/legal) without review
- Prompt Engineering: Needed for best results
- Scope: Limited to PDF content; the model will not generalize beyond it
## Mitigation Strategies
- Curate and vet training data thoroughly
- Add post-processing or filters to generated output (see the sketch after this list)
- Inform users of limitations and training scope
- Include human-in-the-loop review for critical use cases
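For the post-processing point above, a keyword-gated disclaimer is one minimal option; the trigger terms and wording below are illustrative only, and a production filter would need to be far more thorough:

```python
DISCLAIMER = ("\n\n[Automated response generated from training PDFs only; "
              "verify with a qualified clinician before acting on it.]")

# Illustrative trigger list, not a vetted clinical vocabulary
SENSITIVE_TERMS = ("dosage", "dose", "treatment", "contraindication")

def postprocess(response):
    """Append a disclaimer when output appears to contain clinical guidance."""
    if any(term in response.lower() for term in SENSITIVE_TERMS):
        return response + DISCLAIMER
    return response
```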
## Example Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "/kaggle/working/eira_2_finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

def ask_eira(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,                # pass attention_mask along with input_ids
            max_length=200,
            do_sample=True,          # required for temperature/top_p to take effect
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.2,
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
response = ask_eira("What is the treatment protocol for asthma?")
print(response)
```