Deployment

This guide describes three typical Vedana deployment scenarios: local dev, single-node production, and clustered production.

Local dev

The simplest path is docker compose -f apps/vedana/docker-compose.yml up --build -d. See Quick Start. Requires nothing beyond Docker.

Suited to:

  • development and debugging;
  • demos;
  • evaluation against the golden dataset.

Single-node production

For small to medium production workloads, the same docker-compose setup can run on a VPS or a dedicated server. Minimum configuration:

| Service | RAM | CPU | Disk | Notes |
|---|---|---|---|---|
| app | 2 GB | 2 | 5 GB | Reflex backoffice + Caddy |
| api | 1 GB | 1 | - | FastAPI (jims-api) |
| widget | 1 GB | 1 | - | if you need the widget |
| db | 4 GB | 2 | 50+ GB | Postgres + pgvector. Disk depends on embeddings volume |
| memgraph | 8 GB+ | 2+ | 20 GB | data in RAM (analytical mode) |
| grist | 1 GB | 1 | 5 GB | not needed if using managed Grist |

Minimum for a case with thousands of nodes and tens of thousands of document chunks: ~16 GB RAM, 8 vCPU, 100 GB SSD.
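As a rough cross-check of that figure, the per-service minimums from the sizing table can be added up. The sketch below is illustrative only: the numbers are read from the table, services without a disk figure are counted as 0 GB, and open-ended entries such as "8 GB+" and "50+ GB" are treated as their lower bounds.

```python
# Per-service minimums as read from the sizing table: (RAM GB, vCPU, disk GB).
# Assumption: services with no disk figure in the table count as 0 GB,
# and "8 GB+" / "2+" / "50+ GB" are taken at their lower bound.
MINIMUMS = {
    "app": (2, 2, 5),
    "api": (1, 1, 0),
    "widget": (1, 1, 0),
    "db": (4, 2, 50),
    "memgraph": (8, 2, 20),
    "grist": (1, 1, 5),
}


def total(services):
    """Sum RAM, vCPU and disk minimums for the chosen service names."""
    rows = [MINIMUMS[name] for name in services]
    ram, cpu, disk = (sum(column) for column in zip(*rows))
    return {"ram_gb": ram, "vcpu": cpu, "disk_gb": disk}


everything = total(MINIMUMS)
without_widget = total(name for name in MINIMUMS if name != "widget")
print("all services:  ", everything)
print("without widget:", without_widget)
```

With all six services the floors add up to 17 GB RAM, 9 vCPU and 80 GB disk; dropping the optional widget gives 16 GB and 8 vCPU. The 100 GB SSD recommendation therefore leaves headroom above the summed disk floors for embeddings growth and snapshots.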

Production-grade topology

flowchart LR
    User[Users] --> CDN[Cloudflare /<br/>Ingress]
    CDN --> WG[vedana-widget<br/>:8090]
    CDN --> API[vedana-api<br/>:8080]
    CDN --> BO[vedana-backoffice<br/>:9000<br/>+ SSO]
    TG[Telegram BotAPI] --> TGB[vedana-telegram]

    subgraph Cluster
        WG --> CORE[vedana-core<br/>+ JIMS]
        API --> CORE
        BO --> CORE
        TGB --> CORE

        ETL[etl-cron Job] --> CORE
    end

    CORE --> PG[(Postgres<br/>+ pgvector)]
    CORE --> MG[(Memgraph)]
    CORE --> GR[(Grist)]
    CORE --> LLM[LLM Provider<br/>OpenAI / OpenRouter / VertexAI]

    CORE -.metrics.-> Prom[Prometheus]
    CORE -.traces.-> OTel[Jaeger / Tempo]
    CORE -.errors.-> Sentry[Sentry]

Things to definitely change for production

  1. Change every default credential. The repo ships with several non-empty defaults purely for local Docker Compose — they are not safe for production:
    • MEMGRAPH_PWD="modular-current-bonjour-senior-neptune-8618" in apps/vedana/.env.example,
    • SENTRY_DSN (Epoch8 demo project) in apps/vedana/.env.example,
    • POSTGRES_PASSWORD: postgres, GRIST_SESSION_SECRET: dev-secret, and GRIST_API_KEY: 095081… hard-coded in apps/vedana/docker-compose.yml. For production, generate new secrets and inject them via your secrets manager (don’t keep them in the repo).
  2. Enable TLS. Caddy can do it automatically; you only need a public hostname and DNS. You can also put nginx or a Cloudflare Tunnel in front.
  3. Close unnecessary ports. Only the following should be public:
    • 80/443 (Caddy → backoffice / widget),
    • 443 for the api (if external),
    • everything else — private network only.
  4. Set up backups.
    • Postgres: pg_dump on a cron to S3 / managed snapshots.
    • Memgraph: snapshot + cypherl dumps (see Storage Model).
    • Grist: document export.
  5. Turn on Sentry: set SENTRY_DSN + SENTRY_ENVIRONMENT.
  6. Turn on Prometheus scraping — on each service’s --metrics-port. Defaults differ per service to avoid port collisions when several CLIs run on the same host: jims-api / jims-telegram / jims-max default to 8000; jims-widget defaults to 8001. In a Compose / Kubernetes setup where each service runs in its own container, you can keep these defaults; if you co-locate services, override with explicit --metrics-port. See API Overview → Common CLI configuration.
  7. Pin Docker image versions — don’t use :latest for Memgraph and Grist in production. Pin tags.
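Items 1 and 7 of the checklist above lend themselves to an automated preflight check before deploying. The following is a minimal sketch, not part of Vedana: the function names are hypothetical, only the defaults spelled out in this guide are checked (GRIST_API_KEY and SENTRY_DSN are left out because their default values are not given in full here), and the image references in the usage lines are illustrative.

```python
import secrets

# Defaults named in this guide. GRIST_API_KEY and SENTRY_DSN are omitted
# because the text does not give their full default values.
KNOWN_DEFAULTS = {
    "MEMGRAPH_PWD": "modular-current-bonjour-senior-neptune-8618",
    "POSTGRES_PASSWORD": "postgres",
    "GRIST_SESSION_SECRET": "dev-secret",
}


def find_default_credentials(env):
    """Return names of variables that are unset, empty, or still at a shipped default."""
    return sorted(
        name
        for name, default in KNOWN_DEFAULTS.items()
        if env.get(name, "") in ("", default)
    )


def find_unpinned_images(images):
    """Return image references that carry no tag or use :latest."""
    unpinned = []
    for image in images:
        if "@sha256:" in image:
            continue  # pinned by digest
        name_and_tag = image.rsplit("/", 1)[-1]  # drop a registry host:port prefix
        tag = name_and_tag.partition(":")[2]
        if tag in ("", "latest"):
            unpinned.append(image)
    return unpinned


def new_secret():
    """Generate a random URL-safe secret to store in a secrets manager."""
    return secrets.token_urlsafe(32)


# Illustrative usage with made-up inputs.
env = {"MEMGRAPH_PWD": new_secret(), "POSTGRES_PASSWORD": "postgres"}
print(find_default_credentials(env))
print(find_unpinned_images(["memgraph/memgraph:latest", "gristlabs/grist"]))
```

In a real pipeline the env mapping would come from the rendered deployment configuration and the image list from the Compose file or manifests; failing the deploy when either list is non-empty is one possible policy.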

A service unit (systemd, as an example)

[Unit]
Description=Vedana
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/vedana
ExecStart=/usr/bin/docker compose -f apps/vedana/docker-compose.yml up -d
ExecStop=/usr/bin/docker compose -f apps/vedana/docker-compose.yml down
Restart=on-failure

[Install]
WantedBy=multi-user.target

Kubernetes

For scalable production it’s easier to move to Kubernetes. There are no official Helm charts at the time of writing; below are recommendations for assembling your own.

Splitting into deployments

| Deployment | Replicas | Notes |
|---|---|---|
| vedana-api | 2–N stateless | scales horizontally. ENV from Secret. |
| vedana-widget | 2–N stateless | same |
| vedana-backoffice | 1–2 (state in DB) | one replica is fine; for HA, two with sticky sessions |
| vedana-telegram | 1 | aiogram does long-poll itself; doesn't scale horizontally (TG limit) |
| etl-cron | 1 (CronJob) | a CronJob that calls datapipe run every N minutes |

StatefulSets

| StatefulSet | What's inside | PVC |
|---|---|---|
| postgres | Postgres + pgvector | RWO, sized to the data |
| memgraph | Memgraph (one replica for analytics) | RWO, for snapshots |

For production-grade Postgres prefer an operator (Crunchy / Zalando). For Memgraph — the community / enterprise Memgraph operator (depending on licence).

Secrets and ConfigMaps

  • vedana-secrets — all passwords, API keys (MEMGRAPH_PWD, OPENAI_API_KEY, SENTRY_DSN, TELEGRAM_BOT_TOKEN, GRIST_API_KEY).
  • vedana-config — non-secret values (MODEL, EMBEDDINGS_MODEL, EMBEDDINGS_DIM).
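The split above can be sketched as a small helper that sorts a flat environment mapping into a Secret and a ConfigMap manifest. This is an illustration under assumptions: the helper name is hypothetical, the key lists are exactly those named above, and unknown keys are rejected so that nothing secret lands in the ConfigMap by accident. The manifests are plain dictionaries that can be serialized to JSON, which kubectl accepts alongside YAML.

```python
import json

# Key lists taken from the two bullets above.
SECRET_KEYS = {
    "MEMGRAPH_PWD", "OPENAI_API_KEY", "SENTRY_DSN",
    "TELEGRAM_BOT_TOKEN", "GRIST_API_KEY",
}
CONFIG_KEYS = {"MODEL", "EMBEDDINGS_MODEL", "EMBEDDINGS_DIM"}


def build_manifests(env):
    """Split env into (Secret, ConfigMap) manifests; refuse unclassified keys."""
    unknown = sorted(set(env) - SECRET_KEYS - CONFIG_KEYS)
    if unknown:
        raise ValueError(f"unclassified variables: {', '.join(unknown)}")
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "vedana-secrets"},
        "type": "Opaque",
        # stringData takes plain strings; the API server stores them base64-encoded.
        "stringData": {k: str(v) for k, v in env.items() if k in SECRET_KEYS},
    }
    config = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "vedana-config"},
        # ConfigMap values must be strings.
        "data": {k: str(v) for k, v in env.items() if k in CONFIG_KEYS},
    }
    return secret, config


# Placeholder values for illustration only.
secret, config = build_manifests({"OPENAI_API_KEY": "placeholder", "EMBEDDINGS_DIM": 1024})
print(json.dumps(config, indent=2))
```

A rendered Secret manifest contains the secret values in clear text, so it should be piped straight to the cluster or a secrets manager rather than written to the repository.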

Ingress / service mesh

  • Backoffice — behind external SSO (Cloudflare Access / Authentik).
  • API — behind a reverse proxy with rate limiting and auth headers.
  • Widget — public, with a CORS policy.

CI/CD

The repository uses auto-generated GitHub Actions via uv-workspace-codegen. Configuration lives in each library’s pyproject.toml.

Building packages:

make build           # uv build per package
make build-vedana-project   # build the Vedana Docker image

Publishing (with a GCP token):

UV_PUBLISH_USERNAME="oauth2accesstoken" \
UV_PUBLISH_PASSWORD="$(gcloud auth print-access-token)" \
make publish

In your custom CI:

  • run uv run pytest on changed packages;
  • run a smoke evaluation on a minimal golden dataset on staging;
  • build Docker, tag by git sha;
  • auto-deploy to staging, prod after approval.
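The first and third steps above can be sketched as follows. This is a hedged illustration, not the repository's actual CI: it assumes workspace packages live at `<root>/<name>/` with roots such as `apps` (seen in this guide) and `libs` (an assumption), that the changed-file list comes from something like `git diff --name-only`, and that the image name and the 12-character tag length are free choices. All function names are hypothetical.

```python
from pathlib import PurePosixPath


def changed_packages(changed_files, roots=("apps", "libs")):
    """Map changed file paths to the workspace package directories that contain them."""
    packages = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Expect at least <root>/<package>/<file>.
        if len(parts) >= 3 and parts[0] in roots:
            packages.add(f"{parts[0]}/{parts[1]}")
    return sorted(packages)


def pytest_commands(packages):
    """Build one `uv run pytest <package dir>` command per changed package."""
    return [["uv", "run", "pytest", package] for package in packages]


def image_tag(image, git_sha, length=12):
    """Tag an image by an abbreviated git commit SHA."""
    return f"{image}:{git_sha[:length]}"


# Illustrative input; the libs/ path is hypothetical.
files = ["apps/vedana/docker-compose.yml", "libs/example-lib/src/module.py", "README.md"]
for command in pytest_commands(changed_packages(files)):
    print(" ".join(command))
print(image_tag("vedana", "0123456789abcdef0123456789abcdef01234567"))
```

Tagging by commit SHA keeps every deployed image traceable to a commit and makes rollback a matter of redeploying an earlier tag.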

Migrations

Always apply migrations before releasing a new Vedana image:

docker compose -f apps/vedana/docker-compose.yml run --rm db-migrate

Or in Kubernetes — a separate Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: vedana-migrate-{{ .Values.image.tag }}
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: vedana:{{ .Values.image.tag }}
          command: ["uv", "run", "alembic", "upgrade", "head"]
          env: ...
      restartPolicy: Never

Healthchecks for the orchestrator

| Service | Endpoint | Description |
|---|---|---|
| api | GET /healthz | returns {"status":"ok"} |
| widget | GET /healthz | same |
| telegram | GET /healthz on 9000 | aiohttp in the background |
| backoffice | GET /healthz | through the Caddy proxy |
| db | standard pg_isready | as in docker-compose |
| memgraph | bolt-ping | mgconsole or cypher-shell |

Liveness — a simple /healthz. Readiness — a request that actually checks DB connectivity (you can build a /healthz/deep).
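The difference between the two probes can be sketched in framework-agnostic Python. This is a minimal illustration of what a /healthz/deep handler might do, not an existing Vedana endpoint: the function names, the 503 status for "not ready", and the response shape beyond {"status":"ok"} are assumptions. Each check is any callable that raises on failure, for example one that runs a trivial query against Postgres or Memgraph.

```python
def liveness():
    """Cheap probe: the process is up and able to answer."""
    return 200, {"status": "ok"}


def readiness(checks):
    """Deep probe: run every dependency check; any exception means not ready."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # report the failure instead of crashing the probe
            results[name] = f"error: {exc}"
    ready = all(value == "ok" for value in results.values())
    body = {"status": "ok" if ready else "unavailable", "checks": results}
    return (200 if ready else 503), body


# Illustrative stand-ins for real connectivity checks.
def postgres_ok():
    pass  # e.g. execute "SELECT 1" on a pooled connection


def memgraph_down():
    raise ConnectionError("bolt connection refused")


print(liveness())
print(readiness({"postgres": postgres_ok, "memgraph": memgraph_down}))
```

Keeping liveness free of dependency checks avoids restart loops when a database is briefly unavailable; the orchestrator only stops routing traffic to the pod until readiness recovers.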

Cost optimization

See Cost Management.

What’s next