Fine-tuned TrOCR Model for Portuguese
This repository contains a fine-tuned TrOCR model specifically trained for Optical Character Recognition (OCR) on Portuguese text. It's based on the microsoft/trocr-base-printed model and has been further trained on a dataset of Portuguese text images.
Model Description
The model is a VisionEncoderDecoderModel from the Hugging Face Transformers library. It combines a vision encoder (to process images) and a text decoder (to generate text) for OCR tasks.
- Base Model: microsoft/trocr-base-printed
- Fine-tuning Dataset: mazafard/portugues_ocr_dataset_full
- Language: Portuguese
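To make this encoder-decoder structure concrete, the short sketch below loads the checkpoint and inspects its two sub-modules. It is purely illustrative and not required for inference.

```python
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("mazafard/trocr-finetuned_20250422_125947")

# The vision encoder turns the input image into a sequence of patch embeddings;
# the text decoder then generates the recognized characters autoregressively.
print(type(model.encoder).__name__)  # a ViT-style image encoder (exact class depends on the checkpoint)
print(type(model.decoder).__name__)  # a TrOCR causal language-model decoder
```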
Intended Use
This model is intended for extracting text from images containing Portuguese text. It can be used for various applications, such as:
- Digitizing Portuguese books and documents
- Automating data entry from Portuguese forms and invoices
- Extracting information from Portuguese screenshots or scanned images
How to Use
1. Install Dependencies:
```bash
pip install transformers datasets Pillow requests torch
```
2. Load the Model and Processor:
```python
from transformers import VisionEncoderDecoderModel, TrOCRProcessor
from PIL import Image

model = VisionEncoderDecoderModel.from_pretrained("mazafard/trocr-finetuned_20250422_125947")
processor = TrOCRProcessor.from_pretrained("mazafard/trocr-finetuned_20250422_125947")
```
3. Run Inference on an Image:
```python
image = Image.open("path/to/your/image.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate prediction
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```
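Because `requests` is listed among the dependencies, the input image can also be fetched from a URL rather than a local file. The sketch below reuses the `model` and `processor` loaded in step 2; the URL is only a placeholder.

```python
from io import BytesIO

import requests
from PIL import Image

# Placeholder URL; replace it with an image that actually contains Portuguese text
url = "https://example.com/portuguese_text.png"
response = requests.get(url, timeout=10)
image = Image.open(BytesIO(response.content)).convert("RGB")

# `processor` and `model` are the objects loaded in step 2 above
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```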
Limitations
- The model may not perform well on handwritten text or text with unusual fonts or styles.
- It may make mistakes on complex layouts or low-quality images.
Training Details
- Dataset: mazafard/portugues_ocr_dataset_full (images of Portuguese text)
- Training Parameters:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./trocr-finetuned",
    per_device_train_batch_size=56,
    num_train_epochs=3,
    save_steps=500,
    logging_steps=50,
    learning_rate=5e-5,
    gradient_accumulation_steps=2,
    fp16=True,
    save_total_limit=2,
    remove_unused_columns=False,
    dataloader_num_workers=2,
)
```
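The arguments above only configure the run. Below is a minimal sketch of how they could be wired into a standard `Trainer`; the dataset column names (`image`, `text`), the preprocessing, and the token-id settings are assumptions for illustration, not the exact script used to produce this checkpoint.

```python
from datasets import load_dataset
from transformers import (
    Trainer,
    TrOCRProcessor,
    VisionEncoderDecoderModel,
    default_data_collator,
)

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

# Commonly set explicitly when fine-tuning TrOCR checkpoints
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id

dataset = load_dataset("mazafard/portugues_ocr_dataset_full", split="train")

def preprocess(example):
    # The "image" and "text" column names are assumptions about the dataset schema
    pixel_values = processor(example["image"].convert("RGB"),
                             return_tensors="pt").pixel_values[0]
    labels = processor.tokenizer(example["text"], padding="max_length",
                                 max_length=128, truncation=True).input_ids
    # Mask padding positions so they are ignored by the loss
    labels = [tok if tok != processor.tokenizer.pad_token_id else -100 for tok in labels]
    return {"pixel_values": pixel_values, "labels": labels}

train_dataset = dataset.map(preprocess, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=training_args,  # the TrainingArguments defined above
    train_dataset=train_dataset,
    data_collator=default_data_collator,
)
trainer.train()
```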
Evaluation
Self-reported results on mazafard/portugues_ocr_dataset_full:
- Character Error Rate (CER): 0.010
- Word Error Rate (WER): 0.050
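The sketch below shows one way such metrics can be computed with the `evaluate` library (requires `pip install evaluate jiwer`); it is illustrative and not necessarily the script used to produce the numbers above.

```python
import evaluate

cer_metric = evaluate.load("cer")
wer_metric = evaluate.load("wer")

# Replace these with the model's outputs and the ground-truth transcriptions
predictions = ["Olá, mundo"]
references = ["Olá, mundo!"]

print("CER:", cer_metric.compute(predictions=predictions, references=references))
print("WER:", wer_metric.compute(predictions=predictions, references=references))
```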
Acknowledgements
- This model is based on the TrOCR model by Microsoft.
License