A Streamlit-based web app for uploading and querying multiple PDF research papers concurrently, using FAISS for vector search, Redis for storage, the Gemini 1.5 Flash API for answers, and Kubernetes/Docker for deployment.
- Upload PDFs and ask questions about their content.
- Supports multiple users with isolated sessions via a unique `user_id` (see the Redis sketch after this list).
- FAISS enables fast similarity search for context retrieval.
- Redis stores text chunks, embeddings, and chat history.
- Gemini 1.5 Flash API generates concise answers.
- Deployed on Kubernetes with Docker for scalability.
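
For illustration, here is a minimal sketch of how per-user isolation can be keyed in Redis. The key names, helper functions, and the `redis-service` hostname are assumptions for this sketch, not the app's actual schema; the real logic lives in `app/utils/redis_client.py`.

```python
# Sketch only: per-user isolation via user_id-prefixed Redis keys.
# Key names and the "redis-service" host are illustrative assumptions.
import json
import uuid

import redis

r = redis.Redis(host="redis-service", port=6379, decode_responses=True)

def new_session() -> str:
    """Create a unique user_id so each browser session gets its own keys."""
    return uuid.uuid4().hex

def store_chunks(user_id: str, chunks):
    """Keep one user's text chunks under a key no other session reads."""
    r.set(f"{user_id}:chunks", json.dumps(chunks))

def append_chat(user_id: str, question: str, answer: str):
    """Append a question/answer pair to the per-user chat history list."""
    r.rpush(f"{user_id}:history", json.dumps({"q": question, "a": answer}))

def chat_history(user_id: str):
    """Return the full chat history for one user."""
    return [json.loads(item) for item in r.lrange(f"{user_id}:history", 0, -1)]
```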
- Docker Desktop: Installed and running on Windows.
- Kubernetes: Enabled in Docker Desktop.
- PowerShell: For running commands.
- Gemini API Key: Obtain from Google AI Studio.
- Free disk space: ~25GB for Docker images and builds.
- Set up the Gemini API key in `k8s/deployment.yaml` and `app/utils/gemini_api.py` (a Python sketch of reading the key follows this list):

  ```yaml
  - name: GEMINI_API_KEY
    value: "<your-real-key>"
  ```
- Ensure Docker Desktop and Kubernetes are running.
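
As a reference, a minimal sketch of how the key injected by `k8s/deployment.yaml` can be picked up in Python. The `ask_gemini` function name is an assumption for this sketch; `genai.configure` and `GenerativeModel` are the standard `google-generativeai` calls.

```python
# Sketch only: read GEMINI_API_KEY from the environment (set in k8s/deployment.yaml)
# and configure the Gemini 1.5 Flash client.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

def ask_gemini(prompt: str) -> str:
    """Send a prompt to Gemini 1.5 Flash and return the reply text."""
    return model.generate_content(prompt).text
```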
```
multi_user_pdf_qa/
├── app/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py
│   ├── utils/
│   │   ├── embeddings.py
│   │   ├── pdf_processor.py
│   │   ├── redis_client.py
│   │   ├── gemini_api.py
├── k8s/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── redis/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
├── screenshots/
│   ├── streamlit_paper1.png
│   ├── streamlit_paper2.png
│   ├── streamlit_paper3.png
│   ├── streamlit_paper4.png
│   ├── streamlit_paper5.png
│   ├── docker_build.png
│   ├── kubectl_commands.png
│   ├── docker_containers.png
│   ├── docker_images.png
│   ├── docker_volumes.png
│   ├── docker_builds.png
│   ├── kubectl_docker_status.png
├── readme.markdown
```
- PDF Upload: Users upload PDFs via Streamlit; files are saved temporarily.
- Text Extraction: `PyPDF2` extracts the text, which is split into chunks.
- Embedding: `sentence-transformers` (`all-MiniLM-L6-v2`) generates embeddings.
- Storage: Chunks, embeddings, and the FAISS index are stored in Redis.
- Querying: User questions are embedded, matched against chunks via FAISS, and the retrieved context is sent to Gemini 1.5 Flash (a sketch of this pipeline follows the list).
- Response: Gemini generates a concise answer, which is stored in the Redis chat history.
- Multi-User: Kubernetes/Docker ensures isolated sessions per user.
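
A minimal sketch of this pipeline, assuming fixed-size chunking and a flat L2 FAISS index; the function names, chunk size, and prompt wording are illustrative, not the project's exact code.

```python
# Sketch only: extract -> chunk -> embed -> index -> retrieve -> ask Gemini.
# Function names, chunk size, and prompt wording are illustrative assumptions.
import faiss
import numpy as np
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def extract_chunks(pdf_path: str, chunk_size: int = 1000):
    """Pull text from every page with PyPDF2 and split it into fixed-size chunks."""
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_index(chunks):
    """Embed the chunks and load them into a flat L2 FAISS index."""
    vectors = np.asarray(embedder.encode(chunks), dtype="float32")
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index

def answer(question: str, chunks, index, model, top_k: int = 3) -> str:
    """Embed the question, retrieve the top-k chunks, and ask Gemini 1.5 Flash."""
    q_vec = np.asarray(embedder.encode([question]), dtype="float32")
    _, ids = index.search(q_vec, top_k)
    context = "\n\n".join(chunks[i] for i in ids[0])
    prompt = (
        "Answer concisely using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return model.generate_content(prompt).text
```

With a configured `genai.GenerativeModel("gemini-1.5-flash")` passed as `model`, calling `answer("What is this paper about in one line?", chunks, build_index(chunks), model)` reproduces the single-question flow described above.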
- Build the Docker image:

  ```bash
  cd app
  docker build -t multi-user-pdf-qa:latest .
  ```

  See `screenshots/docker_build.png`.

- Deploy to Kubernetes:

  ```bash
  cd ../k8s
  kubectl apply -f redis/deployment.yaml
  kubectl apply -f redis/service.yaml
  kubectl apply -f deployment.yaml
  kubectl apply -f service.yaml
  ```

  See `screenshots/kubectl_commands.png`.

- Forward the Streamlit service port:

  ```bash
  kubectl port-forward svc/streamlit-service 8501:8501
  ```

  See `screenshots/kubectl_commands.png`.

- Access the app at `http://localhost:8501`:
  - Upload a PDF (e.g., `Generative Agents Interactive Simulacra of Human Behavior.pdf`).
  - Ask: "What is this paper about in one line?"
  - See `screenshots/streamlit_paper1.png` to `screenshots/streamlit_paper5.png` for the 5 papers.

- Verify system status:

  ```bash
  kubectl get pods
  kubectl get svc
  docker ps
  ```

  See `screenshots/kubectl_docker_status.png`.

- Check Docker Desktop:
  - Containers: `screenshots/docker_containers.png`
  - Images: `screenshots/docker_images.png`
  - Volumes: `screenshots/docker_volumes.png`
  - Builds: `screenshots/docker_builds.png`
- `streamlit==1.45.1`: Web app interface.
- `redis==3.2.0`: Stores chunks, embeddings, and history.
- `faiss-cpu==1.7.4`: Vector similarity search.
- `PyPDF2==3.0.1`: PDF text extraction.
- `sentence-transformers==2.2.2`: Text embeddings.
- `google-generativeai==0.8.5`: Gemini 1.5 Flash API.
- `torch==2.0.1`: ML framework.
- `transformers==4.28.1`: Hugging Face models.
- `huggingface_hub==0.16.4`: Model access.
- `numpy==1.23.1`: Numerical computations.
Tested with 5 research papers. Screenshots are available in the `screenshots/` folder: