Fine-Tuned CLIP-GPT2 Model for Image Captioning
This is a fine-tuned version of CLIP-GPT2 for real-time image captioning to aid the visually impaired.
Model Details:
- Base Models: CLIP ViT-B/32 (image encoder) paired with GPT-2 (caption decoder)
- Fine-Tuned On: VizWiz dataset
- Format: SafeTensors
- Usage:
```python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

# Load the fine-tuned checkpoint and the matching CLIP processor
model = CLIPModel.from_pretrained("vidi-deshp/clip-gpt2-finetuned")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Preprocess an image and extract its CLIP image embedding
image = Image.open("sample.jpg")
inputs = processor(images=image, return_tensors="pt")
image_features = model.get_image_features(**inputs)
```
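The snippet above only extracts CLIP image features; the card does not show how the GPT-2 side of the model turns those features into a caption. A common pattern for CLIP-to-GPT-2 captioning (ClipCap-style) projects the image embedding into GPT-2's input embedding space and decodes from that prefix. The sketch below illustrates that pattern only: the `project` layer is hypothetical and randomly initialized here, whereas in the actual checkpoint the equivalent component would be trained, so treat this as a data-flow illustration rather than the repository's documented API.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Hypothetical ClipCap-style decoding: project the CLIP image embedding
# into GPT-2's embedding space and generate caption tokens from it.
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# 512-dim CLIP ViT-B/32 image feature -> 768-dim GPT-2 token embedding.
# In a trained captioner this projection is learned; here it is only a
# placeholder standing in for the model's real projection weights.
project = torch.nn.Linear(512, gpt2.config.n_embd)
prefix = project(image_features).unsqueeze(1)  # shape: (1, 1, 768)

# Generate a caption conditioned on the projected prefix embedding.
with torch.no_grad():
    generated = gpt2.generate(inputs_embeds=prefix, max_new_tokens=30)
caption = tokenizer.decode(generated[0], skip_special_tokens=True)
print(caption)
```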