---
license: cc-by-4.0
datasets:
- CATMuS/medieval-segmentation
pipeline_tag: object-detection
tags:
- medieval
- manuscript
---

# Florence 2 Medieval Zone Object Detection

This is Microsoft's Florence 2 model, fine-tuned for 10 epochs on the [CATMuS Medieval Segmentation dataset](https://huggingface.co/datasets/CATMuS/medieval-segmentation) with a learning rate of `1e-6`. This model would not be possible without the numerous annotators behind the various datasets available on HTR-United (see the dataset for details). Special thanks to [Thibault Clérice](https://huggingface.co/ponteineptique), who converted the original CATMuS dataset (for HTR) to a segmentation dataset.

# Model Details

- **Developed by**: [William J.B. Mattingly](https://huggingface.co/wjbmattingly)
- **License**: CC-BY 4.0
- **Finetuned from model**: [Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft)

## Labels

The following table lists the labels in the dataset. The Zone and Line columns mark the labels used for training, the count columns give the number of instances per split (an image can contain several), and each definition links to the original documentation.

| Label | Zone | Line | Train Count | Validation Count | Test Count | Definition |
|-------|------|------|-------------|------------------|------------|------------|
| DefaultLine | | ✓ | 81702 | 13554 | 12209 | [A line of text that is not distinguished by any particular features and is part of the main text flow.](https://segmonto.github.io/gd/gdL/DefaultLine/) |
| InterlinearLine | | ✓ | 2808 | 27 | 2234 | [A line of text written between two lines of main text, typically containing glosses, translations, or comments.](https://segmonto.github.io/gd/gdL/InterlinearLine/) |
| MainZone | ✓ | | 2314 | 365 | 275 | [The main textual zone of a page, usually containing the main body of text.](https://segmonto.github.io/gd/gdZ/MainZone/) |
| HeadingLine | | ✓ | 1381 | 701 | 135 | [A line of text that functions as a heading or title for a section of the main text.](https://segmonto.github.io/gd/gdL/HeadingLine/) |
| MarginTextZone | ✓ | | 916 | 146 | 199 | [A text zone in the margin of a page, often containing annotations, commentaries, or other secondary information.](https://segmonto.github.io/gd/gdZ/MarginTextZone/) |
| DropCapitalZone | ✓ | | 1566 | 102 | 124 | [A zone containing a large ornamental initial letter of a paragraph or section, typically extending below the first line of text.](https://segmonto.github.io/gd/gdZ/DropCapitalZone/) |
| NumberingZone | ✓ | | 632 | 102 | 94 | [A zone containing page numbers, folio numbers, or other numerical identifiers for the page.](https://segmonto.github.io/gd/gdZ/NumberingZone/) |
| TironianSignLine | | | 282 | 0 | 0 | [A line containing Tironian notes, an ancient system of shorthand.](https://segmonto.github.io/gd/gdL/TironianSignLine/) |
| DropCapitalLine | | | 1175 | 105 | 92 | [A line of text that begins with a drop capital.](https://segmonto.github.io/gd/gdL/DropCapitalLine/) |
| RunningTitleZone | ✓ | | 340 | 91 | 18 | [A zone containing a running title, typically located at the top of a page and repeating throughout a section or the entire document.](https://segmonto.github.io/gd/gdZ/RunningTitleZone/) |
| GraphicZone | ✓ | | 300 | 7 | 10 | [A zone containing non-textual elements such as images, drawings, or decorative elements.](https://segmonto.github.io/gd/gdZ/GraphicZone/) |
| DigitizationArtefactZone | | | 28 | 0 | 0 | [A zone containing artefacts from the digitization process, such as color bars or reference marks.](https://segmonto.github.io/gd/gdZ/DigitizationArtefactZone/) |
| QuireMarksZone | ✓ | | 86 | 9 | 8 | [A zone containing marks used to indicate the gathering or quire to which a leaf belongs, often found at the bottom of the page.](https://segmonto.github.io/gd/gdZ/QuireMarksZone/) |
| StampZone | ✓ | | 39 | 5 | 4 | [A zone containing a stamp, such as a library stamp or ownership mark.](https://segmonto.github.io/gd/gdZ/StampZone/) |
| DamageZone | ✓ | | 12 | 1 | 0 | [A zone indicating an area of the page that has been damaged or is otherwise illegible due to physical deterioration.](https://segmonto.github.io/gd/gdZ/DamageZone/) |
| MusicZone | ✓ | | 179 | 0 | 0 | [A zone containing musical notation.](https://segmonto.github.io/gd/gdZ/MusicZone/) |
| MusicLine | | | 167 | 0 | 0 | [A line containing musical notation.](https://segmonto.github.io/gd/gdL/MusicLine/) |
| TitlePageZone | ✓ | | 4 | 1 | 1 | [A zone encompassing the entire title page of a book or document.](https://segmonto.github.io/gd/gdZ/TitlePageZone/) |
| SealZone | ✓ | | 3 | 0 | 0 | [A zone containing a seal, typically used for authentication or closure of a document.](https://segmonto.github.io/gd/gdZ/SealZone/) |

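The label names and documentation links in the table follow a visible pattern: names end in `Zone` or `Line`, and the links point to the `gdZ` or `gdL` section of the SegmOnto site accordingly. The sketch below uses that pattern to classify a label and rebuild its documentation link. It is only an illustration inferred from the links in the table: `label_kind` and `segmonto_url` are hypothetical helper names, not part of the model, the dataset, or any SegmOnto tooling.

```python
# Base URL taken from the definition links in the label table.
SEGMONTO_BASE = "https://segmonto.github.io/gd"


def label_kind(label: str) -> str:
    """Return 'zone' or 'line' based on the label's suffix."""
    if label.endswith("Zone"):
        return "zone"
    if label.endswith("Line"):
        return "line"
    raise ValueError(f"Label does not end in 'Zone' or 'Line': {label!r}")


def segmonto_url(label: str) -> str:
    """Rebuild the documentation link using the pattern seen in the table."""
    section = "gdZ" if label_kind(label) == "zone" else "gdL"
    return f"{SEGMONTO_BASE}/{section}/{label}/"


for name in ["MainZone", "DefaultLine"]:
    print(name, label_kind(name), segmonto_url(name))
```

A helper like this can be handy for attaching a definition link to each predicted label when presenting detections to annotators.
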
# How to Get Started with the Model

Use the code below to get started with the model. All models are trained with float16.

```python
import os
from unittest.mock import patch

import matplotlib.patches as patches
import matplotlib.pyplot as plt
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.dynamic_module_utils import get_imports

MODEL_ID = "medieval-data/florence2-medieval-bbox-zone-detection"


# Mac solution => https://huggingface.co/microsoft/Florence-2-large-ft/discussions/4
def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    """Work around for https://huggingface.co/microsoft/phi-1_5/discussions/72."""
    imports = get_imports(filename)
    # Drop flash_attn only for the Florence-2 modeling file, and only if it is listed.
    if str(filename).endswith("/modeling_florence2.py") and "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports


# Load the model and processor with the import check patched.
with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)


def process_image(url):
    """Download an image and run object detection on it."""
    prompt = "<OD>"

    response = requests.get(url, stream=True)
    response.raise_for_status()  # Fail early on a bad download.
    image = Image.open(response.raw).convert("RGB")  # Ensure a 3-channel image.

    inputs = processor(text=prompt, images=image, return_tensors="pt")

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    # Convert the generated tokens into pixel-space boxes and labels.
    result = processor.post_process_generation(
        generated_text, task="<OD>", image_size=(image.width, image.height)
    )
    return result, image


url = "https://huggingface.co/datasets/CATMuS/medieval-segmentation/resolve/main/data/train/cambridge-corpus-christi-college-ms-111/page-002-of-003.jpg"

result, image = process_image(url)
fig, ax = plt.subplots(1, figsize=(15, 15))
ax.imshow(image)

# Add bounding boxes and labels to the plot
for bbox, label in zip(result["<OD>"]["bboxes"], result["<OD>"]["labels"]):
    # Boxes are [x1, y1, x2, y2]; matplotlib wants the corner plus width and height.
    x, y, width, height = bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]
    rect = patches.Rectangle((x, y), width, height, linewidth=2, edgecolor="r", facecolor="none")
    ax.add_patch(rect)
    ax.text(x, y, label, fontsize=12, bbox=dict(facecolor="yellow", alpha=0.5))

# Display the plot
plt.show()
```
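
The plotting loop above reads the detections as two parallel lists, `bboxes` (as `[x1, y1, x2, y2]`) and `labels`, under the `<OD>` key. For downstream work it can be easier to have one record per detection. The following is a minimal sketch of that conversion, assuming the same dictionary shape; `to_records` is a hypothetical helper and `sample_result` is made-up data used only so the snippet runs without the model.

```python
def to_records(result, task="<OD>"):
    """Turn parallel bbox/label lists into one dict per detection, top to bottom."""
    detections = result[task]
    records = []
    for (x1, y1, x2, y2), label in zip(detections["bboxes"], detections["labels"]):
        records.append(
            {"label": label, "x": x1, "y": y1, "width": x2 - x1, "height": y2 - y1}
        )
    # Sort by the top edge, then the left edge, as a rough page order.
    return sorted(records, key=lambda r: (r["y"], r["x"]))


# Made-up detections in the same shape as the model output (illustrative only).
sample_result = {
    "<OD>": {
        "bboxes": [[50.0, 720.0, 120.0, 760.0], [40.0, 100.0, 460.0, 700.0]],
        "labels": ["NumberingZone", "MainZone"],
    }
}

records = to_records(sample_result)
main_zones = [r for r in records if r["label"] == "MainZone"]
for record in records:
    print(record)
```

With the real output, `to_records(result)` can replace `sample_result`, and each record's `x`, `y`, `width`, and `height` can be passed directly to `patches.Rectangle`.
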