A comprehensive toolkit for evaluating and comparing document parsing libraries across multiple formats (PDF, DOCX, PPTX, etc.). This repository provides automated evaluation tools using LLM-based assessment to help you choose the best parser for your use case.
## 🛠️ Three Powerful Evaluation Tools
### 1. Standard Evaluation (`parser_evaluator.py`)

```bash
python parser_evaluator.py document.pdf --parsers docling_default marker pymupdf_pdf
```

- ⚡ Fast: 1-2 minutes
- 💰 Economical: $1-2 per evaluation
- 📝 Text-based assessment
- ✅ Best for most use cases
### 2. Vision Evaluation (`parser_evaluator_vision.py`)

```bash
python parser_evaluator_vision.py document.pdf --parsers docling_default marker
```

- 🔍 Compares output against original PDF images
- 👁️ Visual accuracy verification
- 📊 More expensive but most accurate
- ✅ Best for critical documents
### 3. LLM Judge (`llm_parser_evaluator.py`)

```bash
python llm_parser_evaluator.py evaluate output.md --parser-name docling --source-type pdf
```

- 🤖 Uses Claude 3.5 Sonnet as an expert judge
- 📋 Detailed scoring across 5 criteria
- 🔄 Batch evaluation support
## 📚 Documentation

- 📖 Full Documentation: EVALUATOR_README.md
- ⚡ Quick Reference: QUICK_REFERENCE.md
## 📋 Prerequisites

- Python 3.8 or higher
- AWS account (for Bedrock Claude access)
- Poppler (for PDF-to-image conversion; needed only for vision evaluation)
## 🚀 Installation

- Clone the repository

```bash
git clone https://github.com/AmazingK2k3/Parser_evals.git
cd Parser_evals
```

- Create a virtual environment (recommended)

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Configure environment variables

```bash
cp .env.example .env
# Edit .env and add your credentials
```

Required environment variables:
- `AWS_ACCESS_KEY_ID` - Your AWS access key
- `AWS_SECRET_ACCESS_KEY` - Your AWS secret key
- `AWS_DEFAULT_REGION` - AWS region (e.g., us-east-1)
- `LLAMA_CLOUD_API_KEY` - (Optional) For LlamaParse
- `OPENAI_API_KEY` - (Optional) For OpenAI-based features
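The loader in `core/env_config.py` presumably reads these variables from `.env`; a minimal stdlib-only sketch of such a loader (function names are hypothetical, not the repository's actual API) might look like:

```python
import os

def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Comment lines and blanks are skipped; variables already set in the
    environment are not overwritten.
    """
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

def require(name):
    """Return a required variable's value, failing fast if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Failing fast on missing credentials (`require("AWS_ACCESS_KEY_ID")`) gives a clearer error than a Bedrock authentication failure deep inside an evaluation run.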
### Installing Poppler

Windows:
- Download Poppler from here
- Extract and add `poppler/bin` to your PATH

Linux:

```bash
sudo apt-get install poppler-utils
```

macOS:

```bash
brew install poppler
```

## 📁 Project Structure

```
Parser_evals/
├── .env                    # Environment variables (API keys, config)
├── requirements.txt        # Python dependencies
├── core/                   # Core configuration and utilities
│   ├── __init__.py
│   └── env_config.py       # Environment configuration loader
├── parsing_extract/        # Parser implementations
│   ├── __init__.py         # Parser factory and exports
│   ├── base_parser.py      # Abstract base parser class
│   ├── pdf_parser.py       # Basic PDF parser using LangChain
│   └── advanced_parsers.py # Advanced parser implementations
└── *.py                    # Main evaluation scripts (see below)
```
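`base_parser.py` is described as an abstract base parser class; a hypothetical sketch of what such an interface could look like (class and method names are illustrative, not the repository's actual code):

```python
from abc import ABC, abstractmethod
from pathlib import Path

class BaseParser(ABC):
    """Common interface each parser implementation would subclass (sketch)."""

    #: file extensions this parser can handle, e.g. {".pdf", ".docx"}
    supported_extensions: set = set()

    @classmethod
    def supports(cls, path: str) -> bool:
        """Check whether this parser handles the file's extension."""
        return Path(path).suffix.lower() in cls.supported_extensions

    @abstractmethod
    def parse(self, path: str) -> str:
        """Parse the document at `path` and return markdown text."""

class EchoParser(BaseParser):
    """Trivial example parser: returns a plain-text file verbatim."""
    supported_extensions = {".txt", ".md"}

    def parse(self, path: str) -> str:
        return Path(path).read_text()
```

An interface like this lets the evaluation scripts treat Docling, Marker, PyMuPDF, and the rest interchangeably: pick any subclass whose `supports()` matches the input, call `parse()`, and compare the markdown outputs.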
## 📜 Main Scripts

### evaluate_docx_parsers.py

Purpose: Comprehensive evaluation tool specifically for DOCX-capable parsers.
Key Features:
- Tests all DOCX-compatible parsers with optimized configurations
- Supports parsers: Python-docx, Docling, Unstructured, MarkItDown, docx2md, LlamaParse
- Generates markdown outputs for each parser
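The "run every parser, write one markdown file each" loop can be sketched as follows (a simplified illustration, assuming parsers are exposed as callables; not the script's actual code):

```python
from pathlib import Path

def run_all(parsers: dict, doc_path: str, out_dir: str = "markdown_outputs") -> dict:
    """Run each parser callable over the document; write one .md per parser.

    A parser that raises is recorded as an error instead of aborting the run,
    so one broken backend does not block the comparison.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for name, parse in parsers.items():
        try:
            text = parse(doc_path)
        except Exception as exc:
            results[name] = f"ERROR: {exc}"
            continue
        (out / f"{name}.md").write_text(text)
        results[name] = text
    return results
```

The per-parser output files are what the LLM judge later scores and compares.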
### generate_markdown.py

Purpose: Universal markdown generation script supporting multiple document types and parsing strategies.
Key Features:
- Supports PDF, DOCX, DOC, PPTX, and other document formats
- Includes parsers: LlamaParse, Unstructured, PyMuPDF4LLM, MarkItDown, Marker, Docling, and DOCX-specific parsers
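A parsing "strategy" (such as `best_quality` or `balanced` from the usage examples below) presumably maps to a preference-ordered list of parsers; a hypothetical sketch of that dispatch (the actual mapping in `generate_markdown.py` may differ):

```python
# Hypothetical strategy table: preference-ordered parser names per strategy.
STRATEGIES = {
    "best_quality": ["docling_standard", "marker", "llamaparse"],
    "balanced": ["pymupdf4llm", "markitdown"],
}

def pick_parser(strategy: str, available: set) -> str:
    """Return the first preferred parser that is actually available."""
    for name in STRATEGIES.get(strategy, []):
        if name in available:
            return name
    raise ValueError(f"No available parser for strategy {strategy!r}")
```

Falling back along the preference list means a missing optional dependency (say, no LlamaParse API key) degrades gracefully instead of failing the batch.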
### llm_parser_evaluator.py

Purpose: LLM-based evaluation system using AWS Bedrock Claude as a judge.
Key Features:
- Uses AWS Bedrock Claude-3.5-Sonnet as an expert evaluator
- Evaluates parsing outputs based on 5 criteria: Text Accuracy, Structure Preservation, Formatting Quality, Completeness, and Readability
- Supports both single file evaluation and parser comparison
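The five per-criterion scores can be aggregated into a single overall score; here is one plausible aggregation (equal weights assumed by default; not necessarily how `llm_parser_evaluator.py` combines them):

```python
# The five evaluation criteria, as listed above.
CRITERIA = [
    "text_accuracy",
    "structure_preservation",
    "formatting_quality",
    "completeness",
    "readability",
]

def overall_score(scores: dict, weights: dict = None) -> float:
    """Combine per-criterion scores (e.g. on a 1-10 scale) into one number.

    With no weights given, every criterion counts equally.
    """
    weights = weights or {c: 1.0 for c in CRITERIA}
    total = sum(weights[c] for c in CRITERIA)
    return round(sum(scores[c] * weights[c] for c in CRITERIA) / total, 2)
```

Custom weights let you, for example, prioritize `text_accuracy` for legal documents while down-weighting `formatting_quality`.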
## 🎯 Usage Examples
### Basic DOCX Evaluation
```bash
# Evaluate a single DOCX file with all parsers
python evaluate_docx_parsers.py document.docx
# Evaluate with specific parsers only
python evaluate_docx_parsers.py document.docx --parsers mammoth_docx,docling,python_docx
```

### Markdown Generation

```bash
# Generate markdown using best quality parsers for PDF
python generate_markdown.py document.pdf --strategy best_quality
# Use specific parser with custom output directory
python generate_markdown.py document.docx --parser docling_standard --output results/
# Batch process all PDFs in a directory
python generate_markdown.py ./documents/ --batch --file-extension .pdf --strategy balanced
```

### LLM-Based Evaluation

```bash
# Evaluate parser output quality
python llm_parser_evaluator.py evaluate docling_output.md --parser-name docling --source-type pdf
# Compare two different parser outputs
python llm_parser_evaluator.py compare docling.md mammoth.md --parser1 docling --parser2 mammoth --source-type docx
# Batch evaluate all parser outputs in a directory
python llm_parser_evaluator.py batch ./eval_results/docx_parser_results/markdown_outputs/ --source-type docx
```

Note: For quick evaluation, you can also upload documents directly to your preferred LLM interface along with parser outputs for comparison.
## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments

- Built with LangChain
- Evaluation powered by AWS Bedrock Claude
- Parsers: Docling, Marker, PyMuPDF, MarkItDown, Mammoth, and more
## 💬 Support

For questions and support:
- 📖 Check the EVALUATOR_README.md for detailed documentation
- ⚡ See QUICK_REFERENCE.md for quick command examples
- 🐛 Report bugs via GitHub Issues
⭐ Star this repository if you find it helpful!