Fine-tuned TrOCR Model for Portuguese

This repository contains a TrOCR model fine-tuned for Optical Character Recognition (OCR) on Portuguese text. It is based on the microsoft/trocr-base-printed model and has been further trained on a dataset of Portuguese text images.

Model Description

The model is a VisionEncoderDecoderModel from the Hugging Face Transformers library. It combines a vision encoder (to process images) and a text decoder (to generate text) for OCR tasks.

  • Base Model: microsoft/trocr-base-printed
  • Fine-tuning Dataset: mazafard/portugues_ocr_dataset_full
  • Language: Portuguese

Intended Use

This model is intended for extracting text from images containing Portuguese text. It can be used for various applications, such as:

  • Digitizing Portuguese books and documents
  • Automating data entry from Portuguese forms and invoices
  • Extracting information from Portuguese screenshots or scanned images

How to Use

1. Install Dependencies:

```bash
pip install transformers datasets Pillow requests
```

2. Load the Model and Processor, then Run Inference:

```python
from transformers import VisionEncoderDecoderModel, TrOCRProcessor
from PIL import Image

model = VisionEncoderDecoderModel.from_pretrained("mazafard/trocr-finetuned_20250422_125947")
processor = TrOCRProcessor.from_pretrained("mazafard/trocr-finetuned_20250422_125947")

image = Image.open("path/to/your/image.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate prediction
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```

Limitations

  • The model may not perform well on handwritten text or text with unusual fonts or styles.
  • It may make mistakes on complex layouts or low-quality images.
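Low-quality inputs can sometimes be mitigated with simple preprocessing before calling the processor. The helper below is a hypothetical sketch (not part of this repository) using only Pillow, which the install step above already pulls in; the function name and the 2x upscale factor are illustrative choices.

```python
from PIL import Image, ImageOps

def preprocess_for_ocr(image: Image.Image, scale: int = 2) -> Image.Image:
    """Upscale and normalize contrast on a scanned image (illustrative helper)."""
    image = image.convert("L")                 # drop color noise
    image = ImageOps.autocontrast(image)       # stretch the contrast range
    w, h = image.size
    image = image.resize((w * scale, h * scale), Image.LANCZOS)
    return image.convert("RGB")                # TrOCR processors expect RGB input
```

Converting back to RGB at the end keeps the result compatible with the processor call shown above.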

Training Details

  • Dataset: mazafard/portugues_ocr_dataset_full
  • Training Parameters:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./trocr-finetuned",
    per_device_train_batch_size=56,
    num_train_epochs=3,
    save_steps=500,
    logging_steps=50,
    learning_rate=5e-5,
    gradient_accumulation_steps=2,
    fp16=True,
    save_total_limit=2,
    remove_unused_columns=False,
    dataloader_num_workers=2,
)
```
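With these arguments the effective batch size is the per-device batch size multiplied by the gradient-accumulation steps (assuming a single GPU):

```python
# Effective batch size implied by the TrainingArguments above
# (single-GPU assumption; multiply by the device count otherwise).
per_device_train_batch_size = 56
gradient_accumulation_steps = 2
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 112
```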

Evaluation

Acknowledgements

  • This model is based on the TrOCR model by Microsoft.

License
