📄 Smart Document Parser

A document parsing application that automatically extracts structured information (content, metadata, sections, and named entities) from PDF, DOCX, TXT, HTML, and Markdown files.

🚀 Features

  • Multiple Format Support: PDF, DOCX, TXT, HTML, and Markdown
  • Rich Information Extraction:
    • Document content with preserved formatting
    • Comprehensive metadata
    • Section breakdown
    • Named entity recognition
  • Smart Processing:
    • Automatic format detection
    • Confidence scoring
    • Error handling

🎯 How to Use

  1. Upload Document: Click the upload button or drag & drop your document
  2. Process: Click "Process Document"
  3. View Results: Explore the extracted information in different tabs:
    • πŸ“ Content: Main document text
    • πŸ“Š Metadata: Document properties
    • πŸ“‘ Sections: Document structure
    • 🏷️ Entities: Named entities

📋 Supported Formats

  • PDF Documents (*.pdf)
  • Word Documents (*.docx)
  • Text Files (*.txt)
  • HTML Files (*.html)
  • Markdown Files (*.md)
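
Automatic format detection could be approximated by mapping file extensions to the formats listed above. This is a minimal sketch, not the application's actual logic; `SUPPORTED_FORMATS` and `detect_format` are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to the formats listed above.
SUPPORTED_FORMATS = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".txt": "TXT",
    ".html": "HTML",
    ".md": "Markdown",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format label for a supported file, or None otherwise."""
    suffix = Path(filename).suffix.lower()  # match extensions case-insensitively
    return SUPPORTED_FORMATS.get(suffix)
```

Returning `None` for unknown extensions lets the caller decide how to report an unsupported upload instead of raising inside the helper.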

🛠️ Technical Details

Built with:

  • Docling: Advanced document processing
  • Gradio: Interactive web interface
  • Pydantic: Type-safe data handling
  • Hugging Face Spaces: Cloud deployment
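
To illustrate what type-safe result handling might look like, here is a rough sketch of a container for the four result tabs. It uses the standard library's `dataclasses` as a stand-in for Pydantic so that it runs without extra dependencies. The class name `ParsedDocument`, its field names, and the 0 to 1 confidence range are all assumptions, not the application's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedDocument:
    """Hypothetical container mirroring the Content, Metadata, Sections, and Entities tabs."""

    content: str
    metadata: Dict[str, str] = field(default_factory=dict)
    sections: List[str] = field(default_factory=list)
    entities: Dict[str, List[str]] = field(default_factory=dict)
    confidence: float = 0.0  # assumed to be a score between 0 and 1

    def __post_init__(self) -> None:
        # Dataclasses do not validate values, so check the assumed range by hand.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
```

A Pydantic model would declare the same fields and perform this kind of validation automatically; the manual check here only approximates that behavior.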

πŸ“ License

MIT License