Conversational PDF Chatbot Using OpenAI and FastAPI
PDF Chatbot is a conversational question-answering system: users upload a PDF document and ask questions that are answered from the document's content. It is built with FastAPI and OpenAI's language models, and it demonstrates a practical implementation of Retrieval-Augmented Generation (RAG).
- PDF Upload: Users can upload PDF documents to the system.
- Text Extraction: Automatically extracts text content from uploaded PDFs.
- Intelligent Chunking: Splits extracted text into manageable chunks for processing.
- Vector Embedding: Creates and stores vector embeddings of text chunks for efficient retrieval.
- Conversational AI: Enables users to ask questions about the uploaded document and receive contextually relevant answers.
- RAG Implementation: Utilizes Retrieval-Augmented Generation to provide accurate and context-aware responses.
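The chunking, embedding, and retrieval steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the project itself relies on LangChain, OpenAI embeddings, and FAISS for these steps, whereas the sketch swaps in a fixed-size character splitter and a toy bag-of-words vector so that it runs with the standard library alone. The function names (`chunk_text`, `embed`, `retrieve`) and the default sizes are hypothetical, not taken from the repository.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in (assumption) for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute force, no index)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a RAG pipeline the string returned by `build_prompt` would be sent to the language model; a vector index such as FAISS replaces the brute-force sort in `retrieve` once the number of chunks grows.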
- FastAPI: For creating robust and high-performance API endpoints.
- OpenAI API: Leverages GPT-3.5-turbo for natural language understanding and generation.
- LangChain: Facilitates the creation of the conversational retrieval chain.
- FAISS: Efficient similarity search and clustering of dense vectors.
- PyPDF: For extracting text from PDF documents.
- Pydantic: Data validation and settings management using Python type annotations.
- Python-dotenv: Management of environment variables.
- Clone this repository:
- Install the required dependencies: pip install -r requirements.txt
- Set up your OpenAI API key in a .env file: OPENAI_API_KEY=your_api_key_here
- Start the FastAPI server:
- Access the API documentation at http://localhost:8000/docs
- Use the /upload endpoint to upload a PDF file.
- Use the /query endpoint to ask questions about the uploaded PDF.
- GET /: Root endpoint; returns a welcome message.
- POST /upload: Uploads a PDF file and processes it for querying.
- POST /query: Accepts a query about the uploaded PDF and returns an AI-generated response.
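As a rough illustration of calling the query endpoint from a script, the sketch below builds a POST request with the standard library. The request schema is not documented here, so the JSON field name `question` is an assumption, as are the helper names `build_query_request` and `ask`; check the interactive docs at http://localhost:8000/docs for the actual body the server expects.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local server address used by the API docs


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    # ASSUMPTION: the endpoint accepts a JSON body with a "question" field.
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response; requires a running server."""
    with urllib.request.urlopen(build_query_request(question, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or adjust once the real schema is known; `ask("What is this document about?")` only works after a PDF has been uploaded through /upload.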
Run the unit tests using pytest: