v1v1d's Collections

Page to MD

Updated Dec 13, 2024

A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR

  • v1v1d/Arxiv_MD_v2_2k

    Viewer • Updated Jun 24, 2024 • 3.04k • 18

  • v1v1d/Arxiv_MD_v2

    Viewer • Updated Jun 24, 2024 • 14.2k • 47

  • v1v1d/Arxiv_MD_v1_1k

    Viewer • Updated Jun 23, 2024 • 1.14k • 16

  • v1v1d/Arxiv_MD_v1

    Viewer • Updated Jun 18, 2024 • 9.96k • 26

  • ClimatePolicyRadar/all-document-text-data

    Viewer • Updated 15 days ago • 34.2M • 54 • 16

  • nz/arxiv-ocr-v0.2

    Viewer • Updated Sep 19, 2024 • 160k • 449 • 8