Paperless-ngx Tutorial: Build a Paperless Office Hub with Docker
Scan, OCR, and automatically organize your receipts and documents. Deploy Paperless-ngx on a VPS using Docker Compose for digital document storage.
How to Self-Host Paperless-ngx on a VPS with Docker Compose
Paperless-ngx is an open-source document management system (DMS) that turns your physical documents into a searchable, organized digital archive. SQLite is suitable for testing, but a production deployment on a VPS calls for a more robust architecture: a PostgreSQL database for metadata, a Redis instance for task queues/caching, and a reverse proxy for SSL termination.
This guide provides a production-hardened docker-compose.yml configuration and explains the technical mechanics of document archiving, database bindings, OCR optimization, and backup strategies.
Architectural Overview
A robust Paperless-ngx deployment consists of three primary components communicating over an isolated Docker bridge network:
- Webserver & Worker (The Core Application): Runs Django, Gunicorn, and Celery. It handles the web UI, processes OCR tasks, parses metadata, and manages document storage.
- Database (PostgreSQL): Stores document metadata, user settings, tagging configurations, and indexing references. PostgreSQL is preferred over SQLite for parallel write handling and crash resilience.
- Message Broker & Cache (Redis): Handles Celery task distribution (e.g., orchestrating OCR worker tasks) and serves as an in-memory cache for web sessions.
               +----------------------------------+
               |          Reverse Proxy           |
               |         (Nginx / Caddy)          |
               +----------------------------------+
                                 | (Port 8000)
                                 v
               +----------------------------------+
               |     Paperless-ngx Container      |
               |    (Django, Gunicorn, Celery)    |
               +----------------------------------+
                      /          |          \
                /                |                \
+--------------------+   +---------------+   +--------------------+
|  PostgreSQL (DB)   |   | Redis (Queue) |   |  Host Filesystem   |
| (Port 5432 - Int)  |   |  (Port 6379)  |   |  (Consume, Media)  |
+--------------------+   +---------------+   +--------------------+
Directory Structure and Permissions
Before deploying, organize the host file system. Paperless-ngx processes files from a "consume" folder and saves the processed results into a "media" folder.
To prevent ownership conflicts between the host user and the Docker container processes, identify your host user's UID and GID:
id -u # Typically 1000
id -g # Typically 1000
Create the directory structure on the VPS:
sudo mkdir -p /opt/paperless/{config,data,media,consume,export,pgdata,redisdata}
sudo chown -R 1000:1000 /opt/paperless
- /opt/paperless/consume: Place files here to be automatically ingested.
- /opt/paperless/media: Stores original documents and generated PDF/A files.
- /opt/paperless/data: Application state (index files, temporary scratchpad).
- /opt/paperless/export: Backup destination.
- /opt/paperless/pgdata: Persistent PostgreSQL data.
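The hard-coded 1000:1000 only fits when id -u and id -g actually return those values. As a minimal sketch, the same layout can be created with an explicit UID and GID. The helper name create_layout and the SUBDIRS tuple are illustrative and not part of Paperless-ngx; the directory names are the ones from the mkdir command above.

```python
import os
from pathlib import Path
from typing import List

# Sub-directories taken from the mkdir command in this section.
SUBDIRS = ("config", "data", "media", "consume", "export", "pgdata", "redisdata")


def create_layout(base: Path, uid: int, gid: int) -> List[Path]:
    """Create `base` and each sub-directory, then assign them to uid:gid."""
    base.mkdir(parents=True, exist_ok=True)
    os.chown(base, uid, gid)  # needs root unless uid/gid are your own
    created = []
    for name in SUBDIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        os.chown(path, uid, gid)
        created.append(path)
    return created


# Hypothetical invocation (requires write access to /opt):
# create_layout(Path("/opt/paperless"), uid=1000, gid=1000)
```

When the script runs under sudo, os.getuid() reports root, so pass the target user's numeric UID and GID explicitly, as in the commented call.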
Production Docker Compose Configuration
The following docker-compose.yml defines the multi-container setup. It includes strict health checks to guarantee that dependent services (PostgreSQL and Redis) are fully operational before the Paperless web server initializes.
Create /opt/paperless/docker-compose.yml:
version: '3.8'

services:
  redis:
    image: docker.io/library/redis:7-alpine
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - /opt/paperless/redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  db:
    image: docker.io/library/postgres:16-alpine
    container_name: paperless-db
    restart: unless-stopped
    volumes:
      - /opt/paperless/pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless_usr
      # Set a strong password in production
      POSTGRES_PASSWORD: super_secret_db_password_here
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U paperless_usr -d paperless"]
      interval: 10s
      timeout: 5s
      retries: 5

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:2.9.0
    container_name: paperless-webserver
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - /opt/paperless/config:/usr/src/paperless/data
      - /opt/paperless/data:/usr/src/paperless/data/index
      - /opt/paperless/media:/usr/src/paperless/media
      - /opt/paperless/consume:/usr/src/paperless/consume
      - /opt/paperless/export:/usr/src/paperless/export
    environment:
      # PUID/PGID to match host permissions
      USERMAP_UID: 1000
      USERMAP_GID: 1000
      # Database Configuration
      PAPERLESS_DBENGINE: postgresql
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPORT: 5432
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless_usr
      PAPERLESS_DBPASS: super_secret_db_password_here
      # Broker & Cache
      PAPERLESS_REDIS: redis://redis:6379
      # Security & URL Bindings
      # Change this to your actual sub-domain
      PAPERLESS_URL: https://paperless.yourdomain.com
      # Generate a random 64-character alphanumeric string for SECRET_KEY
      PAPERLESS_SECRET_KEY: "change_me_to_a_random_long_secret_key"
      PAPERLESS_TIME_ZONE: "UTC"
      PAPERLESS_OCR_LANGUAGE: "eng"
      # OCR Tuning
      PAPERLESS_OCR_MODE: skip_noarchive
      PAPERLESS_TASK_WORKERS: 2
      PAPERLESS_THREADS_PER_WORKER: 1
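The Compose file asks for a random 64-character alphanumeric value for PAPERLESS_SECRET_KEY. One way to produce such a string is sketched below with Python's standard secrets module; the function name generate_secret_key is illustrative, and any cryptographically secure generator would serve equally well.

```python
import secrets
import string


def generate_secret_key(length: int = 64) -> str:
    """Return a random string of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


key = generate_secret_key()
```

The same helper can produce the database password; remember that POSTGRES_PASSWORD and PAPERLESS_DBPASS must hold the identical value.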
Technical Deep-Dive: Database and Broker Bindings
Redis Connection Protocol
Paperless-ngx relies on Redis for Celery tasks. The PAPERLESS_REDIS configuration takes a URI format: redis://redis:6379. Because Docker automatically maps container names to internal IPs using its embedded DNS server, the hostname redis correctly resolves to the Redis container.
For security, the Redis container does not expose port 6379 to the host system. It is only accessible to containers on the shared bridge network.
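To make the URI format concrete, here is a small sketch that splits a redis:// URL into host and port and tries a plain TCP connection. The function names are hypothetical, and the sketch deliberately handles only the redis:// and rediss:// schemes. The reachability check only makes sense when run from a container on the same Docker network, because the port is not published to the host.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def parse_redis_url(url: str) -> Tuple[str, int]:
    """Split a redis:// URL into (host, port); 6379 is the Redis default port."""
    parts = urlsplit(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    return parts.hostname, parts.port or 6379


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port = parse_redis_url("redis://redis:6379")
```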
PostgreSQL Connection Lifecycle
During boot, the webserver service waits for db to pass its health check (pg_isready). Once the database is healthy, Django runs its migrations to build or update the schema. The application then opens database connections on demand through Django's PostgreSQL backend; connection pooling is not enabled by default in this release.
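The health-gated startup can be pictured as a bounded polling loop. The sketch below is a simplified model, not Docker's implementation: it calls a check function up to retries times and waits interval seconds between attempts, using the values from the healthcheck blocks in the Compose file. The name wait_until_healthy and the injectable sleep parameter are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    interval: float = 10.0,
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it succeeds or `retries` attempts are used up."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(interval)  # wait before the next probe
    return False
```

Passing a custom sleep function keeps the loop easy to exercise without real delays, for example by collecting the requested waits in a list.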
Document Archiving and OCR Internals
When a PDF or image lands in the /opt/paperless/consume directory, the consumer detects it and queues a task for a Celery worker to process it. The core pipeline consists of:
- Extraction: Reading textual content if it already exists.
- OCR (Tesseract): If no text is found, Tesseract analyzes the document.
- Archiving: Creating a standardized PDF/A version.
Critical OCR Configurations
To control OCR behavior and VPS resource usage, configure the following environment variables:
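- PAPERLESS_OCR_MODE: Controls when OCR runs and whether an archive file is produced.
  - skip: Run OCR only on pages that have no text layer, so native digital documents keep their existing text, and still generate a PDF/A archive file.
  - redo: Re-run OCR on every page even if text is present. Use this for scanned documents that carry a poor-quality text layer.
  - skip_noarchive: Behave like skip, but do not generate a PDF/A archive file for documents that already contain text. This is the mode set in the Compose file above; it saves CPU time and disk space, at the cost of keeping only the original for those documents. Paperless-ngx 2.x marks this value as deprecated in favor of the separate PAPERLESS_OCR_SKIP_ARCHIVE_FILE setting.
- PAPERLESS_OCR_USER_ARGS: A JSON string of additional arguments passed to OCRmyPDF, the library that drives Tesseract. For example: {"deskew": true, "optimize": 3}. The keys must be options that OCRmyPDF accepts.
- PAPERLESS_FILENAME_FORMAT: Allows dynamic directory structuring. Example: {created_year}/{correspondent}/{title}. If undefined, Paperless stores documents flat in the media folder using database-indexed names.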
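To see what a format string such as {created_year}/{correspondent}/{title} produces, here is a rough illustration using Python's str.format. Paperless-ngx has its own template handling, so this only shows the idea; the function name render_storage_path and the sample field values are made up for the example.

```python
def render_storage_path(fmt: str, **fields: str) -> str:
    """Fill a filename format string, keeping slashes out of field values."""
    # A "/" inside a value would otherwise create an unintended sub-directory.
    safe = {key: str(value).replace("/", "-") for key, value in fields.items()}
    return fmt.format(**safe)


path = render_storage_path(
    "{created_year}/{correspondent}/{title}",
    created_year="2024",
    correspondent="ACME Corp",
    title="Invoice 0042",
)
print(path)  # 2024/ACME Corp/Invoice 0042
```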
Optimizing Resources on Single-Core / Low-RAM VPSs
By default, Paperless tries to use all available CPU threads for OCR operations. This can exhaust memory and crash a small VPS (1-2 GB of RAM). Limit resource usage with:
PAPERLESS_TASK_WORKERS: 1
PAPERLESS_THREADS_PER_WORKER: 1
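The number of OCR threads that can run at once is the product of the two settings. The sketch below estimates that budget. It assumes, as an approximation of the default behavior described above, that the available cores are split evenly across workers when PAPERLESS_THREADS_PER_WORKER is unset; both function names are hypothetical.

```python
import os
from typing import Optional


def default_threads_per_worker(task_workers: int, cpu_count: Optional[int] = None) -> int:
    """Assumed default: divide the cores evenly among workers, at least 1 each."""
    cores = cpu_count or os.cpu_count() or 1
    return max(cores // task_workers, 1)


def total_ocr_threads(task_workers: int, threads_per_worker: int) -> int:
    """Upper bound on OCR threads running at the same time."""
    return task_workers * threads_per_worker


# The low-resource settings shown above allow a single OCR thread.
budget = total_ocr_threads(task_workers=1, threads_per_worker=1)
```

With the values from the main Compose file (2 workers, 1 thread each) the budget is 2, which is worth lowering on a single-core machine.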
Reverse Proxy Configuration (Nginx)
To secure Paperless-ngx with Let's Encrypt SSL, configure Nginx on your host VPS to proxy requests to container port 8000.
Save the following configuration to /etc/nginx/sites-available/paperless:
server {
    listen 80;
    server_name paperless.yourdomain.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name paperless.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/paperless.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/paperless.yourdomain.com/privkey.pem;

    # Crucial for large document uploads
    client_max_body_size 100M;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for paperless live progress bars
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Backup and Maintenance Workflows
A complete backup covers the database contents, the application configuration, and the media files.
1. Document Exporter Tool
Paperless-ngx provides an internal command that dumps files, database entries, and configs into a structure optimized for portability.
Run the exporter inside the running webserver container, from the directory that holds docker-compose.yml:
docker compose exec webserver document_exporter ../export
2. Database Dumps
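Alternatively, back up the raw database directly with pg_dump. Create the target directory first, since /opt/paperless/backup is not part of the layout created earlier, and pass -T so that no pseudo-TTY is allocated while the output is redirected:
mkdir -p /opt/paperless/backup
docker compose exec -T db pg_dump -U paperless_usr paperless > /opt/paperless/backup/db_backup_$(date +%F).sql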
Include /opt/paperless/media, /opt/paperless/config, and the database dumps in your external backup (e.g., with Restic or rsync, or to AWS S3).
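Dated dump files accumulate over time, so some rotation is usually needed. The following sketch builds the same db_backup_YYYY-MM-DD.sql name that the pg_dump command produces with date +%F and lists the dumps that fall outside a retention window. The function names and the default of 7 retained dumps are assumptions for illustration, not part of Paperless-ngx.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def dump_path(backup_dir: Path, day: Optional[date] = None) -> Path:
    """Return the dump file path for `day` (today if omitted)."""
    day = day or date.today()
    return backup_dir / f"db_backup_{day.isoformat()}.sql"


def dumps_to_prune(backup_dir: Path, keep: int = 7) -> List[Path]:
    """List the oldest dumps beyond the newest `keep` files."""
    # ISO dates sort chronologically when sorted as plain strings.
    dumps = sorted(backup_dir.glob("db_backup_*.sql"))
    return dumps[:-keep] if keep > 0 else dumps


# Hypothetical cleanup step after a successful dump:
# for old in dumps_to_prune(Path("/opt/paperless/backup")):
#     old.unlink()
```

Prune only after the new dump has been written and copied off the server, so a failed run never removes the last good backup.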