	---
	license: cc-by-4.0
	datasets:
	- CATMuS/medieval-segmentation
	pipeline_tag: object-detection
	tags:
	- medieval
	- manuscript
	---

	# Florence 2 Medieval Zone Object Detection

	This is Microsoft's Florence 2 model fine-tuned for 10 epochs on the [CATMuS Medieval Segmentation dataset](https://huggingface.co/datasets/CATMuS/medieval-segmentation) with a learning rate of `1e-6`. This model would not be possible without the numerous annotators behind the various datasets available on HTR-United (see the dataset for details). Special thanks to [Thibault Clérice](https://huggingface.co/ponteineptique), who converted the original CATMuS dataset (for HTR) to a segmentation dataset.

	# Model Details

	- Developed by: [William J.B. Mattingly](https://huggingface.co/wjbmattingly)
	- License: CC-BY 4.0
	- Finetuned from model: [Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft)

	## Labels

	The following table lists the labels, marks the ones used to train this model, gives the count of each label per split (an image can contain several instances of a label), and provides each label's definition with a link to the original documentation.

	| Label | Zone | Line | Train Count | Validation Count | Test Count | Definition |
	|-------|------|------|-------------|------------------|------------|------------|
	| DefaultLine | | ✓ | 81702 | 13554 | 12209 | [A line of text that is not distinguished by any particular features and is part of the main text flow.](https://segmonto.github.io/gd/gdL/DefaultLine/) |
	| InterlinearLine | | ✓ | 2808 | 27 | 2234 | [A line of text written between two lines of main text, typically containing glosses, translations, or comments.](https://segmonto.github.io/gd/gdL/InterlinearLine/) |
	| MainZone | ✓ | | 2314 | 365 | 275 | [The main textual zone of a page, usually containing the main body of text.](https://segmonto.github.io/gd/gdZ/MainZone/) |
	| HeadingLine | | ✓ | 1381 | 701 | 135 | [A line of text that functions as a heading or title for a section of the main text.](https://segmonto.github.io/gd/gdL/HeadingLine/) |
	| MarginTextZone | ✓ | | 916 | 146 | 199 | [A text zone in the margin of a page, often containing annotations, commentaries, or other secondary information.](https://segmonto.github.io/gd/gdZ/MarginTextZone/) |
	| DropCapitalZone | ✓ | | 1566 | 102 | 124 | [A zone containing a large ornamental initial letter of a paragraph or section, typically extending below the first line of text.](https://segmonto.github.io/gd/gdZ/DropCapitalZone/) |
	| NumberingZone | ✓ | | 632 | 102 | 94 | [A zone containing page numbers, folio numbers, or other numerical identifiers for the page.](https://segmonto.github.io/gd/gdZ/NumberingZone/) |
	| TironianSignLine | | | 282 | 0 | 0 | [A line containing Tironian notes, an ancient system of shorthand.](https://segmonto.github.io/gd/gdL/TironianSignLine/) |
	| DropCapitalLine | | | 1175 | 105 | 92 | [A line of text that begins with a drop capital.](https://segmonto.github.io/gd/gdL/DropCapitalLine/) |
	| RunningTitleZone | ✓ | | 340 | 91 | 18 | [A zone containing a running title, typically located at the top of a page and repeating throughout a section or the entire document.](https://segmonto.github.io/gd/gdZ/RunningTitleZone/) |
	| GraphicZone | ✓ | | 300 | 7 | 10 | [A zone containing non-textual elements such as images, drawings, or decorative elements.](https://segmonto.github.io/gd/gdZ/GraphicZone/) |
	| DigitizationArtefactZone | | | 28 | 0 | 0 | [A zone containing artefacts from the digitization process, such as color bars or reference marks.](https://segmonto.github.io/gd/gdZ/DigitizationArtefactZone/) |
	| QuireMarksZone | ✓ | | 86 | 9 | 8 | [A zone containing marks used to indicate the gathering or quire to which a leaf belongs, often found at the bottom of the page.](https://segmonto.github.io/gd/gdZ/QuireMarksZone/) |
	| StampZone | ✓ | | 39 | 5 | 4 | [A zone containing a stamp, such as a library stamp or ownership mark.](https://segmonto.github.io/gd/gdZ/StampZone/) |
	| DamageZone | ✓ | | 12 | 1 | 0 | [A zone indicating an area of the page that has been damaged or is otherwise illegible due to physical deterioration.](https://segmonto.github.io/gd/gdZ/DamageZone/) |
	| MusicZone | ✓ | | 179 | 0 | 0 | [A zone containing musical notation.](https://segmonto.github.io/gd/gdZ/MusicZone/) |
	| MusicLine | | | 167 | 0 | 0 | [A line containing musical notation.](https://segmonto.github.io/gd/gdL/MusicLine/) |
	| TitlePageZone | ✓ | | 4 | 1 | 1 | [A zone encompassing the entire title page of a book or document.](https://segmonto.github.io/gd/gdZ/TitlePageZone/) |
	| SealZone | ✓ | | 3 | 0 | 0 | [A zone containing a seal, typically used for authentication or closure of a document.](https://segmonto.github.io/gd/gdZ/SealZone/) |
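
The counts in the table are heavily skewed, and several labels have no validation or test examples at all. The following is a minimal sketch for checking that from the numbers above. The names `LABEL_COUNTS`, `labels_without_eval_data`, and `train_share` are illustrative and not part of the model or the dataset tooling; the counts are copied from the table.

```python
# Counts copied from the table above: label -> (train, validation, test).
LABEL_COUNTS = {
    "DefaultLine": (81702, 13554, 12209),
    "InterlinearLine": (2808, 27, 2234),
    "MainZone": (2314, 365, 275),
    "HeadingLine": (1381, 701, 135),
    "MarginTextZone": (916, 146, 199),
    "DropCapitalZone": (1566, 102, 124),
    "NumberingZone": (632, 102, 94),
    "TironianSignLine": (282, 0, 0),
    "DropCapitalLine": (1175, 105, 92),
    "RunningTitleZone": (340, 91, 18),
    "GraphicZone": (300, 7, 10),
    "DigitizationArtefactZone": (28, 0, 0),
    "QuireMarksZone": (86, 9, 8),
    "StampZone": (39, 5, 4),
    "DamageZone": (12, 1, 0),
    "MusicZone": (179, 0, 0),
    "MusicLine": (167, 0, 0),
    "TitlePageZone": (4, 1, 1),
    "SealZone": (3, 0, 0),
}


def labels_without_eval_data(counts):
    """Return, sorted by name, the labels with no validation and no test examples."""
    return sorted(
        label for label, (_, val, test) in counts.items() if val == 0 and test == 0
    )


def train_share(counts, label):
    """Return the fraction of all training annotations that carry this label."""
    total = sum(train for train, _, _ in counts.values())
    return counts[label][0] / total


if __name__ == "__main__":
    print(labels_without_eval_data(LABEL_COUNTS))
    print(f"DefaultLine share of training annotations: {train_share(LABEL_COUNTS, 'DefaultLine'):.1%}")
```

Labels returned by `labels_without_eval_data` cannot be evaluated on the validation or test split, so predictions for them are best treated with extra caution.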


	# How to Get Started with the Model

	Use the code below to get started with the model. All models are trained with float16.

	```python
	import os
	from unittest.mock import patch

	import matplotlib.patches as patches
	import matplotlib.pyplot as plt
	import requests
	from PIL import Image
	from transformers import AutoModelForCausalLM, AutoProcessor
	from transformers.dynamic_module_utils import get_imports

	MODEL_ID = "medieval-data/florence2-medieval-bbox-zone-detection"


	# Mac solution => https://huggingface.co/microsoft/Florence-2-large-ft/discussions/4
	def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
	    """Work around for https://huggingface.co/microsoft/phi-1_5/discussions/72."""
	    imports = get_imports(filename)
	    # Drop the flash_attn requirement, but only for the Florence-2 modeling file.
	    if str(filename).endswith("/modeling_florence2.py") and "flash_attn" in imports:
	        imports.remove("flash_attn")
	    return imports


	# Load the model and processor while the patched import check is active.
	with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
	    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
	    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)


	def process_image(url):
	    """Download the image at `url` and run object detection on it."""
	    prompt = "<OD>"

	    image = Image.open(requests.get(url, stream=True).raw)

	    inputs = processor(text=prompt, images=image, return_tensors="pt")

	    generated_ids = model.generate(
	        input_ids=inputs["input_ids"],
	        pixel_values=inputs["pixel_values"],
	        max_new_tokens=1024,
	        do_sample=False,
	        num_beams=3,
	    )
	    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

	    # Convert the generated tokens into pixel-space boxes and labels.
	    result = processor.post_process_generation(
	        generated_text, task="<OD>", image_size=(image.width, image.height)
	    )
	    return result, image


	url = "https://huggingface.co/datasets/CATMuS/medieval-segmentation/resolve/main/data/train/cambridge-corpus-christi-college-ms-111/page-002-of-003.jpg"

	result, image = process_image(url)
	fig, ax = plt.subplots(1, figsize=(15, 15))
	ax.imshow(image)

	# Add bounding boxes and labels to the plot; each box is [x1, y1, x2, y2].
	for bbox, label in zip(result["<OD>"]["bboxes"], result["<OD>"]["labels"]):
	    x, y, width, height = bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]
	    rect = patches.Rectangle((x, y), width, height, linewidth=2, edgecolor="r", facecolor="none")
	    ax.add_patch(rect)
	    ax.text(x, y, label, fontsize=12, bbox=dict(facecolor="yellow", alpha=0.5))

	# Display the plot
	plt.show()
	```
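
If you want to work with the detections rather than plot them, the result can be regrouped by label. The snippet below is a small sketch that assumes the structure used in the code above (`result["<OD>"]` holding parallel `bboxes` and `labels` lists, with boxes as `[x1, y1, x2, y2]`). The helper names `group_boxes_by_label` and `box_area` and the sample values in `example_result` are hypothetical and only there for illustration.

```python
from collections import defaultdict


def group_boxes_by_label(result, task="<OD>"):
    """Group [x1, y1, x2, y2] boxes from a post-processed result by their label."""
    grouped = defaultdict(list)
    detections = result[task]
    for bbox, label in zip(detections["bboxes"], detections["labels"]):
        grouped[label].append(tuple(bbox))
    return dict(grouped)


def box_area(bbox):
    """Return the area in pixels of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = bbox
    return max(0, x2 - x1) * max(0, y2 - y1)


# Hypothetical result with made-up coordinates, shaped like the output of process_image.
example_result = {
    "<OD>": {
        "bboxes": [[10, 20, 110, 220], [5, 5, 25, 15], [120, 20, 220, 220]],
        "labels": ["MainZone", "NumberingZone", "MainZone"],
    }
}

grouped = group_boxes_by_label(example_result)
for label, boxes in grouped.items():
    largest = max(boxes, key=box_area)
    print(label, len(boxes), largest)
```

Because the boxes are in `(left, upper, right, lower)` order, each tuple can be passed directly to PIL's `image.crop(...)` to cut a detected zone out of the page.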