RAG · AI · Self-Hosted · Ongoing · 2026

PaperMind

A local RAG-powered academic paper analyzer. Upload any PDF, ask questions, and get structured summaries — running 100% privately on self-hosted hardware. Zero cloud. Zero cost. Zero data leakage.

RAG LLM React + Vite FastAPI ChromaDB Ollama Self-Hosted

Status

In Development

Year

2026

Role

Solo Developer

Platform

Web

Blog

Read Blog ↗

Overview

PaperMind is a local RAG (Retrieval-Augmented Generation) system I built to solve a real problem: reading dense academic papers is time-consuming, and general-purpose AI tools like ChatGPT tend to hallucinate when asked about documents they haven't seen.

PaperMind reads your PDF first: it splits the text into chunks, converts each chunk to a vector embedding with nomic-embed-text, and stores the embeddings in ChromaDB. When you ask a question, it finds the most relevant chunks and sends them as context to Mistral 7B or Phi3 Mini running locally via Ollama. The LLM is instructed to answer only from that retrieved context, which keeps answers grounded in the paper rather than in guesswork.
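As a rough illustration of the last step, here is a minimal sketch of how retrieved chunks might be assembled into a context-only prompt. The function name `build_prompt`, the instruction wording, and the sample chunks are all hypothetical; the project's real prompt may differ.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a context-only prompt from retrieved chunks (hypothetical wording)."""
    # Number each chunk so an answer can be traced back to its source passage.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up chunks standing in for text retrieved from a paper.
prompt = build_prompt(
    "What dataset was used?",
    ["The authors evaluate on a corpus of abstracts.", "Training ran for ten epochs."],
)
```

Telling the model what to do when the context lacks an answer is one common way to discourage it from filling gaps with its own guesses.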

The entire stack runs on a single Linux mini PC: the React frontend, FastAPI backend, and ChromaDB under Docker Compose, with Ollama serving the LLM on the host machine. No cloud dependency. No API cost. All data stays on-device.

Key Features

Natural Language Q&A

Ask anything about your paper in plain English. PaperMind retrieves the 5 most relevant chunks via cosine similarity and answers strictly from the document.
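To make the retrieval step concrete, here is a small brute-force sketch of top-k selection by cosine similarity. It is only an illustration under simplifying assumptions: in PaperMind the search is delegated to ChromaDB, and the helper names `cosine_similarity` and `top_k` and the toy 2-D vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunk_vectors: list[list[float]], k: int = 5) -> list[int]:
    """Return indices of the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), idx) for idx, vec in enumerate(chunk_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy 2-D vectors; real embeddings have many more dimensions.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
best = top_k([1.0, 0.1], vectors, k=2)  # indices of the two closest chunks
```

Cosine similarity compares direction rather than magnitude, so a long chunk and a short chunk about the same topic can still score close to each other.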

Structured Auto-Summary

One-click structured summary covering main topic, objectives, methodology, key findings, and conclusions — extracted from the paper, not guessed.

100% Local LLM

Runs Mistral 7B or Phi3 Mini via Ollama on local hardware. No OpenAI API key. No per-token cost. No data leaves the machine.
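For a sense of what "no API key" looks like in practice, the sketch below builds a request for Ollama's local HTTP generate endpoint using only the standard library. The model tag `mistral`, the helper name `build_generate_request`, and the default address are assumptions about a typical Ollama install, not the project's confirmed configuration.

```python
import json
import urllib.request

# Ollama listens on port 11434 by default; adjust if your install differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("Summarize the methodology section.")

# To actually call the model (requires Ollama running locally):
# with urllib.request.urlopen(request) as resp:
#     answer = json.loads(resp.read())["response"]
```

The request targets localhost, so the prompt and the paper text it carries stay on the machine.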

Multi-Paper Library

Upload and manage multiple PDFs simultaneously. Each paper has its own isolated vector store in ChromaDB with metadata filtering.
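The idea behind per-paper isolation can be sketched in plain Python: every stored chunk carries metadata naming its source paper, and a query only considers chunks whose metadata matches. The `ChunkRecord` type and the `paper_id` key are hypothetical; in ChromaDB the equivalent filter would be passed through the `where` argument of a query.

```python
from dataclasses import dataclass


@dataclass
class ChunkRecord:
    paper_id: str  # hypothetical metadata key identifying the source PDF
    text: str


def chunks_for_paper(records: list[ChunkRecord], paper_id: str) -> list[ChunkRecord]:
    """Keep only the chunks whose metadata matches the selected paper."""
    return [record for record in records if record.paper_id == paper_id]


# Made-up library with chunks from two different uploads.
library = [
    ChunkRecord("paper-a", "Chunk from the first PDF."),
    ChunkRecord("paper-b", "Chunk from the second PDF."),
    ChunkRecord("paper-a", "Another chunk from the first PDF."),
]
selected = chunks_for_paper(library, "paper-a")
```

Filtering before ranking means a question about one paper can never be answered with passages from another.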

Privacy-First Design

Everything runs locally — ideal for unpublished research or sensitive academic work. No third-party API receives your documents.

RAG Architecture

Nginx — Reverse Proxy

Port 3001 · Routes /api/* → FastAPI

React 18 + Vite — Frontend

Static build · Lucide icons · Mobile responsive

FastAPI — Backend API

Python · /upload /ask /summarize /papers

ChromaDB — Vector Database

Persistent · Cosine similarity · Metadata filtering

Ollama — Local LLM Runtime

Mistral 7B / Phi3 Mini · nomic-embed-text · Host machine

PyMuPDF — PDF Processing

Text extraction · 800-char chunks · 100-char overlap
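The chunking parameters above can be illustrated with a short sliding-window sketch. This assumes simple fixed-size character windows; the function name `chunk_text` is hypothetical, PDF text extraction is not shown, and the project's actual splitter may handle sentence or paragraph boundaries differently.

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # with the defaults, each chunk starts 700 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Stand-in for 2,000 characters of text extracted from a PDF.
sample = "".join(str(i % 10) for i in range(2000))
pieces = chunk_text(sample)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk. With 5 chunks retrieved per query, the context sent to the model is at most 5 × 800 = 4,000 characters.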

Build Progress

In Development

PDF Upload + Chunking · 100%

Vector Embedding · 100%

Q&A via RAG · 100%

Auto Summary · 100%

Streaming Responses · ~20%

Citation Highlighting · 0%

Tech Stack

React 18 Vite FastAPI Python ChromaDB Ollama Mistral 7B Phi3 Mini nomic-embed-text PyMuPDF LangChain Docker Nginx Ubuntu Linux

By the Numbers

4.4GB

Local LLM Size

RM 0

API Cost/Month

800

Chars per Chunk

5

Chunks per Query

Links

Read Full Blog Post GitHub

Local AI. Zero cost. Zero cloud.

A 4.4GB language model running on my own hardware, answering questions about research papers without sending a single byte to any external server. Built from scratch — RAG pipeline, Docker stack, systemd config and all.
