Smart Document Parser
A document parsing application that automatically extracts structured information (content, metadata, sections, and named entities) from PDF, DOCX, TXT, HTML, and Markdown files.
Features
- Multiple Format Support: PDF, DOCX, TXT, HTML, and Markdown
- Rich Information Extraction:
  - Document content with preserved formatting
  - Comprehensive metadata
  - Section breakdown
  - Named entity recognition
- Smart Processing:
  - Automatic format detection
  - Confidence scoring
  - Error handling
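
As an illustration of what a section breakdown can look like, here is a minimal sketch that splits Markdown text on its `#` headings. It is not the app's implementation (the app presumably relies on Docling for this); the function name `split_sections` and the returned dictionary keys are assumptions made for this example, and it only handles Markdown input.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown_text: str) -> list[dict]:
    """Split Markdown text into sections keyed by heading (hypothetical helper)."""
    sections = []
    # Text before the first heading is collected under an empty title.
    current = {"level": 0, "title": "", "lines": []}
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section, skipping an empty preamble.
            if current["title"] or current["lines"]:
                sections.append(current)
            current = {
                "level": len(match.group(1)),
                "title": match.group(2).strip(),
                "lines": [],
            }
        else:
            current["lines"].append(line)
    if current["title"] or current["lines"]:
        sections.append(current)
    return [
        {"level": s["level"], "title": s["title"], "text": "\n".join(s["lines"]).strip()}
        for s in sections
    ]


sample = "# Intro\nHello\n\n## Details\nMore text\n"
for section in split_sections(sample):
    print(section["level"], section["title"])
```

Each returned entry carries the heading level, so a caller could rebuild the document outline as a tree if needed.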
How to Use
- Upload Document: Click the upload button or drag & drop your document
- Process: Click "Process Document"
- View Results: Explore the extracted information in the four result tabs:
  - Content: Main document text
  - Metadata: Document properties
  - Sections: Document structure
  - Entities: Named entities
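
The four tabs suggest a result object with one field per view. The sketch below shows one way such a container could be shaped. The class name `ParsedDocument`, its field types, and the `tab_views` method are assumptions for illustration; the app itself is built on Pydantic, while this example uses standard-library dataclasses to stay dependency-free.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedDocument:
    """Hypothetical container with one field per result tab."""

    content: str
    metadata: dict[str, str] = field(default_factory=dict)
    sections: list[str] = field(default_factory=list)
    # Each entity is stored as a (text, label) pair, e.g. ("Paris", "LOCATION").
    entities: list[tuple[str, str]] = field(default_factory=list)

    def tab_views(self) -> dict[str, object]:
        """Map each tab name to the data it would display."""
        return {
            "Content": self.content,
            "Metadata": self.metadata,
            "Sections": self.sections,
            "Entities": self.entities,
        }


doc = ParsedDocument(content="Hello world", metadata={"title": "Example"})
for tab, value in doc.tab_views().items():
    print(f"{tab}: {value!r}")
```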
Supported Formats
- PDF Documents (*.pdf)
- Word Documents (*.docx)
- Text Files (*.txt)
- HTML Files (*.html)
- Markdown Files (*.md)
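
A simple way to decide whether an uploaded file is supported is to look at its extension. The following is a minimal sketch under that assumption; the app's actual detection logic is not documented here and may inspect file contents instead. The names `SUPPORTED_FORMATS` and `detect_format`, and the format labels, are hypothetical.

```python
from pathlib import Path

# Extensions come from the list above; the labels on the right are illustrative.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".html": "html",
    ".md": "markdown",
}


def detect_format(filename: str) -> str:
    """Return a format label for a supported file, or raise ValueError."""
    # Lower-case the suffix so "Report.PDF" and "report.pdf" are treated alike.
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


print(detect_format("Report.PDF"))
```

Raising `ValueError` for unknown extensions lets the caller show a clear message in the interface instead of failing later during parsing.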
Technical Details
Built with:
- Docling: Advanced document processing
- Gradio: Interactive web interface
- Pydantic: Type-safe data handling
- Hugging Face Spaces: Cloud deployment
License
MIT License