AutoDocThinker is an agentic RAG system: an AI-powered document intelligence platform that lets users extract insights from uploaded files (PDF, Word, plain text) or web URLs through natural-language queries. Built with Python, Flask, and LangChain, it uses a multi-agent workflow to process documents, retrieve relevant passages from a vector database (ChromaDB), and generate answers, falling back to Wikipedia when the uploaded documents do not cover the question. The responsive web interface (HTML/CSS/Bootstrap) lets users ask questions conversationally, and the modular backend includes error handling, logging, and secure file processing.
Try it now: AutoDocThinker: Agentic RAG System with Intelligent Search Engine
| # | Module | Technology Stack | Implementation Details |
|---|---|---|---|
| 1 | LLM Processing | Groq + LLaMA-3-70B | Low temperature (0.2) with token limits |
| 2 | Document Parsing | PyMuPDF + python-docx | Parses PDF, DOCX, and TXT with metadata preservation |
| 3 | Text Chunking | RecursiveCharacterTextSplitter | 500-character chunks with 20% overlap for context |
| 4 | Vector Embeddings | all-MiniLM-L6-v2 | 384-dimensional embeddings |
| 5 | Vector Database | ChromaDB | Local persistent storage with cosine similarity |
| 6 | Agent Workflow | LangGraph | 7 specialized nodes with conditional routing |
| 7 | Planner Agent | LangGraph Planner Node | Generates execution plans |
| 8 | Executor Agent | LangGraph Node | Orchestrates tool calls |
| 9 | Web Fallback | Wikipedia API | Auto-triggered when document confidence < threshold |
| 10 | Memory System | deque(maxlen=3) | Conversation history buffer holding the last 3 entries |
| 11 | User Interface | HTML, CSS, Bootstrap, JS | Interactive web app with file, URL, and text input |
| 12 | Containerization | Docker | Portable deployment |
| 13 | CI/CD Pipeline | GitHub Actions | Automated linting/testing |
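
The chunking and memory rows in the table above can be illustrated with a small, dependency-free sketch. This is not the project's code: `chunk_text` is a hypothetical fixed-width sliding window that stands in for LangChain's `RecursiveCharacterTextSplitter` (which additionally prefers to split on separators such as paragraphs and sentences), and `remember` is a hypothetical helper around the `deque(maxlen=3)` buffer. Only the numbers (500 characters, 20% overlap, 3 entries) come from the table.

```python
from collections import deque

CHUNK_SIZE = 500                        # characters per chunk (from the table)
CHUNK_OVERLAP = int(CHUNK_SIZE * 0.20)  # 20% overlap -> 100 characters


def chunk_text(text, chunk_size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into fixed-width chunks that share `overlap` characters.

    Simplified stand-in for RecursiveCharacterTextSplitter: it ignores
    separators and cuts purely by character position.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Conversation memory: once a fourth entry arrives, the oldest is dropped.
history = deque(maxlen=3)


def remember(question, answer):
    """Append one question/answer pair to the bounded history buffer."""
    history.append((question, answer))
```

The overlap means the tail of each chunk reappears at the head of the next one, so a sentence that straddles a boundary is still seen whole by at least one chunk at retrieval time.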
AutoDocThinker/
├── .github/
│   └── workflows/
│       └── main.yml
│
├── agents/
│   ├── __init__.py
│   ├── document_processor.py
│   └── orchestration.py
│
├── data/
│   └── sample.pdf
│
├── notebooks/
│   └── experiment.ipynb
│
├── static/
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── script.js
│
├── templates/
│   └── index.html
│
├── tests/
│   └── test_app.py
│
├── uploads/
│
├── vector_db/
│   └── chroma_collection/
│       └── chroma.sqlite3
│
├── app.log
├── app.py
├── demo.mp4
├── demo.png
├── Dockerfile
├── LICENSE
├── render.yaml
├── README.md
├── requirements.txt
└── setup.py
%% Agentic RAG System Architecture - Colorful Version
graph TD
A[User Interface]:::ui -->|Upload/Input| B[Flask Web Server]:::server
B --> C[Tool Router Agent]:::router
C -->|File| D[Document Processor]:::processor
C -->|URL| E[Web Scraper]:::scraper
C -->|Text| F[Text Preprocessor]:::preprocessor
D --> G[PDF/DOCX/TXT Parser]:::parser
E --> H[URL Content Extractor]:::extractor
F --> I[Text Chunker]:::chunker
G --> J[Chunking & Embedding]:::embedding
H --> J
I --> J
J --> K[Vector Database]:::database
B -->|Query| L[Planner Agent]:::planner
L -->|Has Documents| M[Retriever Agent]:::retriever
L -->|No Documents| N[Fallback Agent]:::fallback
M --> K
K --> O[LLM Answer Agent]:::llm
N --> P[Wikipedia API]:::api
P --> O
O --> Q[Response Formatter]:::formatter
Q --> B
B --> A
classDef ui fill:#4e79a7,color:white,stroke:#333;
classDef server fill:#f28e2b,color:white,stroke:#333;
classDef router fill:#e15759,color:white,stroke:#333;
classDef processor fill:#76b7b2,color:white,stroke:#333;
classDef scraper fill:#59a14f,color:white,stroke:#333;
classDef preprocessor fill:#edc948,color:#333,stroke:#333;
classDef parser fill:#b07aa1,color:white,stroke:#333;
classDef extractor fill:#ff9da7,color:#333,stroke:#333;
classDef chunker fill:#9c755f,color:white,stroke:#333;
classDef embedding fill:#bab0ac,color:#333,stroke:#333;
classDef database fill:#8cd17d,color:#333,stroke:#333;
classDef planner fill:#499894,color:white,stroke:#333;
classDef retriever fill:#86bcb6,color:#333,stroke:#333;
classDef fallback fill:#f1ce63,color:#333,stroke:#333;
classDef llm fill:#d37295,color:white,stroke:#333;
classDef api fill:#a0d6e5,color:#333,stroke:#333;
classDef formatter fill:#b3b3b3,color:#333,stroke:#333;
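
The query path in the diagram (Planner Agent to either the Retriever Agent or the Fallback Agent) can be sketched as a plain routing function. This is an illustrative assumption, not the repository's LangGraph code: `Hit`, `route_query`, and the threshold value `0.5` are hypothetical, and the sketch assumes similarity scores where higher means a better match.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.5  # hypothetical value; the README does not state one


@dataclass
class Hit:
    """One retrieved chunk with its similarity score (higher is better)."""
    text: str
    score: float


def route_query(has_documents, hits, threshold=CONFIDENCE_THRESHOLD):
    """Return the name of the branch that should answer the query.

    "retriever": answer from the uploaded documents.
    "fallback":  answer from Wikipedia instead.
    """
    if not has_documents or not hits:
        return "fallback"  # nothing indexed, or nothing retrieved
    best = max(hit.score for hit in hits)
    return "retriever" if best >= threshold else "fallback"
```

In a LangGraph workflow, a function of this shape would typically serve as the condition on a conditional edge leaving the planner node.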
- Corporate HR Automation
- Legal Document Review
- Academic Research
- Customer Support
- Healthcare Compliance
- Financial Analysis
- Media Monitoring
- Education
- Technical Documentation
- Government Transparency
# 1. Clone the repository
git clone https://github.com/Md-Emon-Hasan/AutoDocThinker.git
cd AutoDocThinker
# 2. Install dependencies
pip install -r requirements.txt

Or with Docker:
# Build Docker Image
docker build -t auto-doc-thinker .
# Run the container
docker run -p 8501:8501 auto-doc-thinker

.github/workflows/main.yml
name: CI
on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Lint with flake8
        run: |
          pip install flake8
          flake8 .

- Multilingual document ingestion
- Audio document ingestion with Whisper
- Long-term memory + history viewer
- MongoDB/FAISS as alternatives to Chroma
- More tools (WolframAlpha, SerpAPI)
- Model selection dropdown (Gemini, LLaMA, GPT-4)
Md Emon Hasan
Email: [email protected]
LinkedIn: md-emon-hasan
GitHub: Md-Emon-Hasan
Facebook: mdemon.hasan2001/
WhatsApp: 8801834363533