A comprehensive toolkit for evaluating and comparing document parsing libraries across multiple formats (PDF, DOCX, PPTX, etc.). This repository provides automated evaluation tools using LLM-based assessment to help you choose the best parser for your use case.
## 🛠️ Three Powerful Evaluation Tools
### 1. Standard Evaluation (`parser_evaluator.py`)

```bash
python parser_evaluator.py document.pdf --parsers docling_default marker pymupdf_pdf
```

- ⚡ Fast: 1-2 minutes
- 💰 Economical: $1-2 per evaluation
- 📝 Text-based assessment
- ✅ Best for most use cases
### 2. Vision Evaluation (`parser_evaluator_vision.py`)

```bash
python parser_evaluator_vision.py document.pdf --parsers docling_default marker
```

- 🔍 Compares output against original PDF images
- 👁️ Visual accuracy verification
- 📊 More expensive but most accurate
- ✅ Best for critical documents
### 3. LLM Judge (`llm_parser_evaluator.py`)

```bash
python llm_parser_evaluator.py evaluate output.md --parser-name docling --source-type pdf
```

- 🤖 Uses Claude 3.5 Sonnet as an expert judge
- 📋 Detailed scoring across 5 criteria
- 🔄 Batch evaluation support
## 📚 Documentation

- 📖 Full Documentation: EVALUATOR_README.md
- ⚡ Quick Reference: QUICK_REFERENCE.md
## 📋 Prerequisites

- Python 3.8 or higher
- AWS account (for Bedrock Claude access)
- Poppler (for PDF-to-image conversion; needed only for vision evaluation)
## 🚀 Installation

- Clone the repository

```bash
git clone https://github.com/AmazingK2k3/Parser_evals.git
cd Parser_evals
```

- Create a virtual environment (recommended)

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Configure environment variables

```bash
cp .env.example .env
# Edit .env and add your credentials
```

Required environment variables:
- `AWS_ACCESS_KEY_ID` - Your AWS access key
- `AWS_SECRET_ACCESS_KEY` - Your AWS secret key
- `AWS_DEFAULT_REGION` - AWS region (e.g., us-east-1)
- `LLAMA_CLOUD_API_KEY` - (Optional) For LlamaParse
- `OPENAI_API_KEY` - (Optional) For OpenAI-based features
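The loader in `core/env_config.py` presumably reads these variables from `.env`; a minimal stdlib-only sketch of such a loader (function names are hypothetical, not the repository's actual API) might look like:

```python
import os

def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Comment lines and blanks are skipped; variables already set in the
    environment are not overwritten.
    """
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

def require(name):
    """Return a required variable's value, failing fast if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Failing fast on missing credentials (`require("AWS_ACCESS_KEY_ID")`) gives a clearer error than a Bedrock authentication failure deep inside an evaluation run.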
### Installing Poppler

Windows:
- Download Poppler from here
- Extract and add `poppler/bin` to your PATH

Linux:

```bash
sudo apt-get install poppler-utils
```

macOS:

```bash
brew install poppler
```

## 📁 Project Structure

```
Parser_evals/
├── .env                    # Environment variables (API keys, config)
├── requirements.txt        # Python dependencies
├── core/                   # Core configuration and utilities
│   ├── __init__.py
│   └── env_config.py       # Environment configuration loader
├── parsing_extract/        # Parser implementations
│   ├── __init__.py         # Parser factory and exports
│   ├── base_parser.py      # Abstract base parser class
│   ├── pdf_parser.py       # Basic PDF parser using LangChain
│   └── advanced_parsers.py # Advanced parser implementations
└── *.py                    # Main evaluation scripts (see below)
```
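`base_parser.py` is described as an abstract base parser class; a hypothetical sketch of what such an interface could look like (class and method names are illustrative, not the repository's actual code):

```python
from abc import ABC, abstractmethod
from pathlib import Path

class BaseParser(ABC):
    """Common interface each parser implementation would subclass (sketch)."""

    #: file extensions this parser can handle, e.g. {".pdf", ".docx"}
    supported_extensions: set = set()

    @classmethod
    def supports(cls, path: str) -> bool:
        """Check whether this parser handles the file's extension."""
        return Path(path).suffix.lower() in cls.supported_extensions

    @abstractmethod
    def parse(self, path: str) -> str:
        """Parse the document at `path` and return markdown text."""

class EchoParser(BaseParser):
    """Trivial example parser: returns a plain-text file verbatim."""
    supported_extensions = {".txt", ".md"}

    def parse(self, path: str) -> str:
        return Path(path).read_text()
```

An interface like this lets the evaluation scripts treat Docling, Marker, PyMuPDF, and the rest interchangeably: pick any subclass whose `supports()` matches the input, call `parse()`, and compare the markdown outputs.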
## 📜 Main Scripts

### evaluate_docx_parsers.py

Purpose: Comprehensive evaluation tool specifically for DOCX-capable parsers.
Key Features:
- Tests all DOCX-compatible parsers with optimized configurations
- Supports parsers: Python-docx, Docling, Unstructured, MarkItDown, docx2md, LlamaParse
- Generates markdown outputs for each parser
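The "run every parser, write one markdown file each" loop can be sketched as follows (a simplified illustration, assuming parsers are exposed as callables; not the script's actual code):

```python
from pathlib import Path

def run_all(parsers: dict, doc_path: str, out_dir: str = "markdown_outputs") -> dict:
    """Run each parser callable over the document; write one .md per parser.

    A parser that raises is recorded as an error instead of aborting the run,
    so one broken backend does not block the comparison.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for name, parse in parsers.items():
        try:
            text = parse(doc_path)
        except Exception as exc:
            results[name] = f"ERROR: {exc}"
            continue
        (out / f"{name}.md").write_text(text)
        results[name] = text
    return results
```

The per-parser output files are what the LLM judge later scores and compares.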
### generate_markdown.py

Purpose: Universal markdown generation script supporting multiple document types and parsing strategies.
Key Features:
- Supports PDF, DOCX, DOC, PPTX, and other document formats
- Includes parsers: LlamaParse, Unstructured, PyMuPDF4LLM, MarkItDown, Marker, Docling, and DOCX-specific parsers
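A parsing "strategy" (such as `best_quality` or `balanced` from the usage examples below) presumably maps to a preference-ordered list of parsers; a hypothetical sketch of that dispatch (the actual mapping in `generate_markdown.py` may differ):

```python
# Hypothetical strategy table: preference-ordered parser names per strategy.
STRATEGIES = {
    "best_quality": ["docling_standard", "marker", "llamaparse"],
    "balanced": ["pymupdf4llm", "markitdown"],
}

def pick_parser(strategy: str, available: set) -> str:
    """Return the first preferred parser that is actually available."""
    for name in STRATEGIES.get(strategy, []):
        if name in available:
            return name
    raise ValueError(f"No available parser for strategy {strategy!r}")
```

Falling back along the preference list means a missing optional dependency (say, no LlamaParse API key) degrades gracefully instead of failing the batch.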
### llm_parser_evaluator.py

Purpose: LLM-based evaluation system using AWS Bedrock Claude as a judge.
Key Features:
- Uses AWS Bedrock Claude-3.5-Sonnet as an expert evaluator
- Evaluates parsing outputs based on 5 criteria: Text Accuracy, Structure Preservation, Formatting Quality, Completeness, and Readability
- Supports both single file evaluation and parser comparison
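The five per-criterion scores can be aggregated into a single overall score; here is one plausible aggregation (equal weights assumed by default; not necessarily how `llm_parser_evaluator.py` combines them):

```python
# The five evaluation criteria, as listed above.
CRITERIA = [
    "text_accuracy",
    "structure_preservation",
    "formatting_quality",
    "completeness",
    "readability",
]

def overall_score(scores: dict, weights: dict = None) -> float:
    """Combine per-criterion scores (e.g. on a 1-10 scale) into one number.

    With no weights given, every criterion counts equally.
    """
    weights = weights or {c: 1.0 for c in CRITERIA}
    total = sum(weights[c] for c in CRITERIA)
    return round(sum(scores[c] * weights[c] for c in CRITERIA) / total, 2)
```

Custom weights let you, for example, prioritize `text_accuracy` for legal documents while down-weighting `formatting_quality`.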
## 🎯 Usage Examples
### Basic DOCX Evaluation
```bash
# Evaluate a single DOCX file with all parsers
python evaluate_docx_parsers.py document.docx
# Evaluate with specific parsers only
python evaluate_docx_parsers.py document.docx --parsers mammoth_docx,docling,python_docx
```

### Markdown Generation

```bash
# Generate markdown using best quality parsers for PDF
python generate_markdown.py document.pdf --strategy best_quality
# Use specific parser with custom output directory
python generate_markdown.py document.docx --parser docling_standard --output results/
# Batch process all PDFs in a directory
python generate_markdown.py ./documents/ --batch --file-extension .pdf --strategy balanced
```

### LLM-Based Evaluation

```bash
# Evaluate parser output quality
python llm_parser_evaluator.py evaluate docling_output.md --parser-name docling --source-type pdf
# Compare two different parser outputs
python llm_parser_evaluator.py compare docling.md mammoth.md --parser1 docling --parser2 mammoth --source-type docx
# Batch evaluate all parser outputs in a directory
python llm_parser_evaluator.py batch ./eval_results/docx_parser_results/markdown_outputs/ --source-type docx
```

Note: For quick evaluation, you can also upload documents directly to your preferred LLM interface along with parser outputs for comparison.
## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments

- Built with LangChain
- Evaluation powered by AWS Bedrock Claude
- Parsers: Docling, Marker, PyMuPDF, MarkItDown, Mammoth, and more
## 💬 Support

For questions and support:
- 📖 Check the EVALUATOR_README.md for detailed documentation
- ⚡ See QUICK_REFERENCE.md for quick command examples
- 🐛 Report bugs via GitHub Issues
⭐ Star this repository if you find it helpful!