adamchanadam/Test_Florence-2-FT-DocVQA

This model is fine-tuned from microsoft/Florence-2-base-ft for Document Visual Question Answering (DocVQA) tasks.

Model description

  • Fine-tuned to answer questions about images, with a focus on logo recognition and company information.
  • The model uses the <DocVQA> prompt to indicate the task type.
  • Number of unique images: 28
  • Number of epochs: 7
  • Learning rate: 1e-06
  • Optimizer: AdamW
  • Early stopping: Patience of 2 epochs, delta of 0.0001
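The early-stopping setting can be read as the rule below. This is a minimal sketch, not the author's training code. It assumes three things: an epoch counts as an improvement only when the validation loss falls more than `delta` below the best loss so far; training halts after `patience` consecutive epochs without improvement; and 7 is the maximum number of epochs. `EarlyStopping` and `stopping_epoch` are hypothetical names.

```python
from typing import Optional, Sequence


class EarlyStopping:
    """Signals a stop when validation loss stops improving (assumed semantics).

    An epoch is an improvement only if its loss is below best_loss - delta;
    training stops after `patience` consecutive non-improving epochs.
    """

    def __init__(self, patience: int = 2, delta: float = 1e-4) -> None:
        self.patience = patience
        self.delta = delta
        self.best_loss: Optional[float] = None
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if self.best_loss is None or val_loss < self.best_loss - self.delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def stopping_epoch(val_losses: Sequence[float], max_epochs: int = 7,
                   patience: int = 2, delta: float = 1e-4) -> int:
    """Return how many epochs (1-based) would run for a given loss history."""
    stopper = EarlyStopping(patience=patience, delta=delta)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

Take the illustrative (made-up) validation losses `[1.0, 0.9, 0.89995, 0.89998]`. The third and fourth epochs each improve on the best loss by less than 0.0001, so `stopping_epoch` stops after epoch 4.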

Dataset statistics

Total number of questions for fine-tuning: 560

  • logo_recognition: 49 (8.75%)
  • brand_identification: 48 (8.57%)
  • visual_elements: 65 (11.61%)
  • text_in_logo: 57 (10.18%)
  • industry_classification: 49 (8.75%)
  • product_service: 55 (9.82%)
  • company_details: 89 (15.89%)
  • negative_sample: 148 (26.43%)
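The percentages follow directly from the per-category counts. The short check below recomputes them, rounded to two decimals. It also computes the average number of questions per image (560 / 28 = 20). The dictionary copies the counts listed in this card; `category_shares` is a hypothetical helper.

```python
from typing import Dict

# Question counts per category, copied from the dataset statistics in this card.
CATEGORY_COUNTS: Dict[str, int] = {
    "logo_recognition": 49,
    "brand_identification": 48,
    "visual_elements": 65,
    "text_in_logo": 57,
    "industry_classification": 49,
    "product_service": 55,
    "company_details": 89,
    "negative_sample": 148,
}
NUM_IMAGES = 28


def category_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each category's share of all questions, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


total_questions = sum(CATEGORY_COUNTS.values())
shares = category_shares(CATEGORY_COUNTS)
avg_per_image = total_questions / NUM_IMAGES  # average only; per-image counts may vary

for name, pct in shares.items():
    print(f"{name}: {CATEGORY_COUNTS[name]} ({pct:.2f}%)")
print(f"total: {total_questions}, average per image: {avg_per_image:g}")
```

Negative samples make up the largest single category, at roughly a quarter of all questions.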

Intended use & limitations

  • Use for answering questions about logos and company information in images
  • Because the training set is small (28 images, 560 questions), performance may be limited for questions or image content not represented in it
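Below is a minimal inference sketch with Hugging Face `transformers`. It assumes the question is appended directly after the `<DocVQA>` task token, which is the convention in common Florence-2 DocVQA fine-tuning examples; the card does not state the exact format. Florence-2 ships custom modeling code, so `trust_remote_code=True` is needed. The processor is loaded from the base model `microsoft/Florence-2-base-ft`, on the assumption that the fine-tuned weights reuse its preprocessing. `build_prompt`, `answer_question`, and the generation settings (`max_new_tokens`, `num_beams`) are illustrative choices, not values from the card.

```python
MODEL_ID = "adamchanadam/Test_Florence-2-FT-DocVQA"
BASE_MODEL_ID = "microsoft/Florence-2-base-ft"  # assumed source of the processor
TASK_PROMPT = "<DocVQA>"


def build_prompt(question: str) -> str:
    """Prefix the question with the task token (assumed: direct concatenation)."""
    return TASK_PROMPT + question


def answer_question(image_path: str, question: str, device: str = "cpu") -> str:
    """Answer one question about one image with the fine-tuned model."""
    # Heavy dependencies are imported lazily so build_prompt() works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Florence-2 uses custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).to(device)
    processor = AutoProcessor.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=128,  # illustrative limit for short answers
            num_beams=3,         # illustrative beam width
        )
    # Drop special tokens so only the plain-text answer remains.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

For example, `answer_question("logo.png", "What company does this logo belong to?")` downloads the weights on first use and returns the decoded answer; `logo.png` is a placeholder path. The function reloads the model on every call for simplicity, so repeated queries would be faster with the model and processor loaded once outside it.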

Training procedure

  • Images were resized and normalized using Florence-2's standard preprocessing.
  • The <DocVQA> prompt was used during fine-tuning to indicate the task type.
  • Questions and answers were provided for each image in the training set.
  • Batch size: 4
  • Evaluation metric: Cross-entropy loss on a held-out validation set
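The training procedure above can be sketched as a plain PyTorch loop. This illustrates the listed settings under stated assumptions; it is not the author's script. It assumes samples are `(image, question, answer)` triples. It uses `torch.optim.AdamW` with the listed learning rate of 1e-06 and groups samples into batches of 4. The cross-entropy loss comes from passing the tokenized answers as `labels` to the Florence-2 model. Masking padding positions with -100 is a choice made here so that padding does not count toward the loss. All function names are hypothetical.

```python
from typing import Any, Iterator, List, Sequence, Tuple

TASK_PROMPT = "<DocVQA>"
BATCH_SIZE = 4
LEARNING_RATE = 1e-6

# One training example: (PIL image, question, answer). Assumed structure.
Sample = Tuple[Any, str, str]


def iter_batches(samples: Sequence[Sample], batch_size: int = BATCH_SIZE
                 ) -> Iterator[Tuple[List[Any], List[str], List[str]]]:
    """Yield (images, prompts, answers) lists, prefixing each question with the task token."""
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        images = [image for image, _, _ in chunk]
        prompts = [TASK_PROMPT + question for _, question, _ in chunk]
        answers = [answer for _, _, answer in chunk]
        yield images, prompts, answers


def batch_loss(model, processor, images, prompts, answers, device="cpu"):
    """Cross-entropy loss of the answer tokens for one batch."""
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True).to(device)
    labels = processor.tokenizer(
        text=answers, return_tensors="pt", padding=True, return_token_type_ids=False
    ).input_ids
    # Exclude padding from the loss (-100 is PyTorch's default ignore_index).
    labels[labels == processor.tokenizer.pad_token_id] = -100
    outputs = model(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        labels=labels.to(device),
    )
    return outputs.loss


def make_optimizer(model):
    import torch  # imported lazily so iter_batches() works without PyTorch
    return torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)


def train_one_epoch(model, processor, optimizer, train_samples, device="cpu") -> None:
    """One pass over the training set with gradient updates."""
    model.train()
    for images, prompts, answers in iter_batches(train_samples):
        loss = batch_loss(model, processor, images, prompts, answers, device)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()


def validation_loss(model, processor, val_samples, device="cpu") -> float:
    """Mean per-batch cross-entropy on the held-out validation set."""
    import torch
    model.eval()
    losses = []
    with torch.no_grad():
        for images, prompts, answers in iter_batches(val_samples):
            losses.append(batch_loss(model, processor, images, prompts, answers, device).item())
    return sum(losses) / len(losses) if losses else float("nan")
```

Each epoch would call `train_one_epoch` and then `validation_loss`, passing the result to an early-stopping check with the patience and delta given under Model description.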

For more information, please contact the model creators.

Safetensors

  • Model size: 271M params
  • Tensor type: F32