
Parser Evaluations

A comprehensive toolkit for evaluating and comparing document parsing libraries across multiple formats (PDF, DOCX, PPTX, etc.). This repository provides automated evaluation tools using LLM-based assessment to help you choose the best parser for your use case.

🚀 Features

Three Powerful Evaluation Tools:

1. parser_evaluator.py - Text-Based (Fast & Economical)

python parser_evaluator.py document.pdf --parsers docling_default marker pymupdf_pdf
  • ⚡ Fast: 1-2 minutes
  • 💰 Economical: $1-2 per evaluation
  • 📝 Text-based assessment
  • ✅ Best for most use cases

2. parser_evaluator_vision.py - Vision-Enhanced (Accurate & Thorough)

python parser_evaluator_vision.py document.pdf --parsers docling_default marker
  • 🔍 Compares output against original PDF images
  • 👁️ Visual accuracy verification
  • 📊 More expensive but most accurate
  • ✅ Best for critical documents

3. llm_parser_evaluator.py - LLM Judge Evaluation

python llm_parser_evaluator.py evaluate output.md --parser-name docling --source-type pdf
  • 🤖 Uses Claude 3.5 Sonnet as expert judge
  • 📋 Detailed scoring across 5 criteria
  • 🔄 Batch evaluation support

📖 Full Documentation: EVALUATOR_README.md
⚡ Quick Reference: QUICK_REFERENCE.md


🛠️ Installation

Prerequisites

  • Python 3.8 or higher
  • AWS Account (for Bedrock Claude access)
  • Poppler (for PDF to image conversion - vision evaluation only)

Setup

  1. Clone the repository
git clone https://github.com/AmazingK2k3/Parser_evals.git
cd Parser_evals
  2. Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Configure environment variables
cp .env.example .env
# Edit .env and add your credentials

Required environment variables:

  • AWS_ACCESS_KEY_ID - Your AWS access key
  • AWS_SECRET_ACCESS_KEY - Your AWS secret key
  • AWS_DEFAULT_REGION - AWS region (e.g., us-east-1)
  • LLAMA_CLOUD_API_KEY - (Optional) For LlamaParse
  • OPENAI_API_KEY - (Optional) For OpenAI-based features
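
Based on the variables above, a minimal .env might look like the following (all values are placeholders, not real credentials):

```shell
# AWS credentials for Bedrock Claude access
AWS_ACCESS_KEY_ID=AKIA-your-key-id
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_DEFAULT_REGION=us-east-1

# Optional: only needed for LlamaParse / OpenAI-based features
LLAMA_CLOUD_API_KEY=llx-your-llama-cloud-key
OPENAI_API_KEY=sk-your-openai-key
```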

Additional Setup for Vision Evaluation

Windows:

  1. Download a Poppler build for Windows
  2. Extract and add poppler/bin to your PATH

Linux:

sudo apt-get install poppler-utils

macOS:

brew install poppler
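
After installing, you can sanity-check that Poppler is reachable before running a vision evaluation. The snippet below is a small standalone check (not part of the repo's scripts); it looks for pdftoppm, the Poppler tool that PDF-to-image conversion shells out to:

```python
import shutil

def poppler_available() -> bool:
    # Vision evaluation converts PDF pages to images via Poppler's
    # pdftoppm; if it is not on PATH, that conversion will fail.
    return shutil.which("pdftoppm") is not None

if poppler_available():
    print("Poppler found")
else:
    print("Poppler missing: install it or add poppler/bin to your PATH")
```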

📁 Repository Structure

Parser_evals/
├── .env                          # Environment variables (API keys, config)
├── requirements.txt              # Python dependencies
├── core/                         # Core configuration and utilities
│   ├── __init__.py
│   └── env_config.py            # Environment configuration loader
├── parsing_extract/              # Parser implementations
│   ├── __init__.py              # Parser factory and exports
│   ├── base_parser.py           # Abstract base parser class
│   ├── pdf_parser.py            # Basic PDF parser using LangChain
│   └── advanced_parsers.py      # Advanced parser implementations
└── *.py                         # Main evaluation scripts (see below)
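
The layout above suggests a factory pattern built on an abstract base class. As an illustration only (the actual class and method names in base_parser.py may differ), such a base could look like:

```python
from abc import ABC, abstractmethod
from pathlib import Path

class BaseParser(ABC):
    """Illustrative sketch of a common parser interface.

    Note: the real base_parser.py may use different names and signatures.
    """

    # File extensions this parser accepts, e.g. {".pdf", ".docx"}
    supported_extensions = frozenset()

    @abstractmethod
    def parse(self, path: Path) -> str:
        """Parse the document at `path` and return markdown text."""

    def can_parse(self, path: Path) -> bool:
        return path.suffix.lower() in self.supported_extensions

class PlainTextParser(BaseParser):
    """Trivial concrete parser used here only to show the contract."""
    supported_extensions = frozenset({".txt"})

    def parse(self, path: Path) -> str:
        return path.read_text()
```

Concrete parsers (Docling, Marker, etc.) would each subclass the base and declare which extensions they handle, letting the factory in parsing_extract/__init__.py pick a parser by file type.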

🔧 Main Scripts

1. evaluate_docx_parsers.py

Purpose: Comprehensive evaluation tool specifically for DOCX-capable parsers

Key Features:

  • Tests all DOCX-compatible parsers with optimized configurations
  • Supports parsers: Python-docx, Docling, Unstructured, MarkItDown, docx2md, LlamaParse
  • Generates markdown outputs for each parser

2. generate_markdown.py

Purpose: Universal markdown generation script supporting multiple document types and parsing strategies

Key Features:

  • Supports PDF, DOCX, DOC, PPTX, and other document formats
  • Includes parsers: LlamaParse, Unstructured, PyMuPDF4LLM, MarkItDown, Marker, Docling, and DOCX-specific parsers

3. llm_parser_evaluator.py

Purpose: LLM-based evaluation system using AWS Bedrock Claude as a judge

Key Features:

  • Uses AWS Bedrock Claude 3.5 Sonnet as an expert evaluator
  • Evaluates parsing outputs based on 5 criteria: Text Accuracy, Structure Preservation, Formatting Quality, Completeness, and Readability
  • Supports both single file evaluation and parser comparison
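
For reference, combining the five criteria into one overall score might look like the sketch below. The criterion names come from the list above, but the unweighted mean and the 0-10 scale are assumptions for illustration, not necessarily what llm_parser_evaluator.py does:

```python
CRITERIA = [
    "text_accuracy",
    "structure_preservation",
    "formatting_quality",
    "completeness",
    "readability",
]

def overall_score(scores: dict) -> float:
    """Unweighted mean of the five criterion scores (assumed 0-10 scale)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError("missing criteria: " + ", ".join(missing))
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)

example = dict(zip(CRITERIA, [9, 8, 7, 9, 8]))
print(overall_score(example))  # 8.2
```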

🎯 Usage Examples

Basic DOCX Evaluation

# Evaluate a single DOCX file with all parsers
python evaluate_docx_parsers.py document.docx

# Evaluate with specific parsers only
python evaluate_docx_parsers.py document.docx --parsers mammoth_docx,docling,python_docx

Markdown Generation

# Generate markdown using best-quality parsers for PDF
python generate_markdown.py document.pdf --strategy best_quality

# Use a specific parser with a custom output directory
python generate_markdown.py document.docx --parser docling_standard --output results/

# Batch process all PDFs in a directory
python generate_markdown.py ./documents/ --batch --file-extension .pdf --strategy balanced

LLM-Based Evaluation

# Evaluate parser output quality
python llm_parser_evaluator.py evaluate docling_output.md --parser-name docling --source-type pdf

# Compare two different parser outputs
python llm_parser_evaluator.py compare docling.md mammoth.md --parser1 docling --parser2 mammoth --source-type docx

# Batch evaluate all parser outputs in a directory
python llm_parser_evaluator.py batch ./eval_results/docx_parser_results/markdown_outputs/ --source-type docx

Note: For a quick evaluation, you can also upload documents directly to your preferred LLM interface along with the parser outputs for comparison.
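
Before a batch run like the one above, it can help to confirm which files will be picked up. A minimal sketch (assuming the batch command consumes every .md file in the given directory, which matches the layout shown):

```python
from pathlib import Path

def collect_outputs(results_dir, suffix=".md"):
    """Return the parser output files a batch evaluation would consume."""
    return sorted(Path(results_dir).glob("*" + suffix))
```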


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with LangChain
  • Evaluation powered by AWS Bedrock Claude
  • Parsers: Docling, Marker, PyMuPDF, MarkItDown, Mammoth, and more

📧 Support

For questions and support, please open an issue in this repository.


⭐ Star this repository if you find it helpful!

