MistralQA: An Intelligent Document Question Answering System

MistralQA is an advanced Question Answering (QA) system that extracts information from documents and provides precise answers to user queries. This project leverages the Mistral-7B language model in a Retrieval-Augmented Generation (RAG) framework, integrating various components from the LangChain library to deliver accurate responses from large PDF documents.

Features

Document Processing: Loads PDF files and splits them into manageable chunks using PyPDFLoader and RecursiveCharacterTextSplitter.
Advanced Language Model: Utilizes the Mistral-7B model, fine-tuned for instruction-based tasks, for generating nuanced and contextually accurate answers.
Efficient Embeddings and Retrieval: Implements HuggingFaceEmbeddings and stores vector representations in a Chroma vector database for fast and scalable retrieval.
Conversational Memory: Maintains context across multiple interactions using ConversationBufferMemory, supporting coherent and continuous dialogue.
Performance Optimization: Adapts to available hardware, running efficiently on both CPU and GPU.

How It Works

Load and Split Document: The system loads a PDF and splits it into chunks using LangChain components.
Create Embeddings: Converts text chunks into dense vector representations with HuggingFaceEmbeddings.
Retrieve Relevant Chunks: Uses a semantic search to find relevant text chunks based on user queries.
Generate Answers: Feeds retrieved chunks into the Mistral-7B model to generate accurate responses.
Maintain Context: Uses conversational memory to keep track of dialogue context, enhancing multi-turn interactions.

Quick Start

Prerequisites

Python 3.9 or higher
PyTorch with CUDA support (optional for GPU acceleration)
Install required Python libraries with pip install -r requirements.txt

Hardware Requirements

CPU: Minimum 32GB RAM and Intel i7 processor
GPU: Minimum RTX 3060 with 8GB RAM

Installation

Clone the Repository:

git clone https://github.com/deepaks11/MistralQA
cd RAG_llamacpp_mistral

GPU and CPU Requirements

If you're using GPU, ensure you have the following dependencies installed:

   langchain
   langchain-community
   transformers
   sentence-transformers
   conda install -c conda-forge faiss-gpu
   pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   set LLAMA_CUDA=on
   pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python==0.2.76/whl/cu124
   PyMuPDFLoader
   chromadb

If you're using CPU, ensure you have the following dependencies installed:

   langchain
   langchain-community
   transformers
   sentence-transformers
   faiss-cpu
   torch torchvision torchaudio
   llama-cpp-python
   PyMuPDFLoader
   chromadb

Download the Mistral Model:
- Ensure the Mistral-7B model is downloaded and update the model path in the code to its location.

Running the Project

python run_cpu.py or python run_gpu.py

Interactive Mode: Enter your questions, and the system will provide answers based on the loaded document.
Exit: Type exit to end the session.

Usage Example

Enter a question: What are the main findings in the report?
Answer: The report highlights several key findings including...

Future Enhancements

Multi-document Support: Extend to query across multiple documents.
Enhanced Memory Management: Improve conversational flow handling.
Model Integration: Support additional language models.

Contribution

Contributions are welcome! Fork the repository and submit a pull request with your changes.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

LangChain: For robust document processing tools.
Hugging Face: For pre-trained models and the transformers library.
Mistral AI: For the Mistral-7B model.

You can now copy this entire block at once and paste it directly into your README file on GitHub.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
dataset		dataset
models		models
.gitattributes		.gitattributes
README.md		README.md
requirement_cpu.txt		requirement_cpu.txt
requirement_gpu.txt		requirement_gpu.txt
run_cpu.py		run_cpu.py
run_gpu.py		run_gpu.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MistralQA: An Intelligent Document Question Answering System

Features

How It Works

Quick Start

Prerequisites

Hardware Requirements

Installation

GPU and CPU Requirements

Running the Project

Usage Example

Future Enhancements

Contribution

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MistralQA: An Intelligent Document Question Answering System

Features

How It Works

Quick Start

Prerequisites

Hardware Requirements

Installation

GPU and CPU Requirements

Running the Project

Usage Example

Future Enhancements

Contribution

License

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages