Page to MD

Updated Dec 13, 2024

A collection of image-text pair datasets sourced from research papers on arXiv, where each image is rendered from a PDF page and paired with its corresponding OCR text.

v1v1d/Arxiv_MD_v2_2k
Viewer • Updated Jun 24, 2024 • 3.04k • 18

v1v1d/Arxiv_MD_v2
Viewer • Updated Jun 24, 2024 • 14.2k • 47

v1v1d/Arxiv_MD_v1_1k
Viewer • Updated Jun 23, 2024 • 1.14k • 16

v1v1d/Arxiv_MD_v1
Viewer • Updated Jun 18, 2024 • 9.96k • 26

ClimatePolicyRadar/all-document-text-data
Viewer • Updated 15 days ago • 34.2M • 54 • 16

nz/arxiv-ocr-v0.2
Viewer • Updated Sep 19, 2024 • 160k • 449 • 8
