Observability

Vedana is built to run under load and ships with tracing, metrics, and Sentry integration.

OpenTelemetry

Traces are created in several places:

Span | Where | Attributes
--- | --- | ---
jims.run_pipeline_with_context | jims_core.thread.thread_controller | jims.thread.id, jims.pipeline
memgraph.execute_ro_cypher_query | vedana_core.graph.MemgraphGraph | memgraph.query, memgraph.parameters
memgraph.run_cypher | vedana_core.graph.MemgraphGraph | memgraph.query, memgraph.parameters, memgraph.limit
memgraph.text_search | vedana_core.graph.MemgraphGraph | memgraph.label, memgraph.fts_query, memgraph.limit
memgraph.vector_search | vedana_core.vts.MemgraphVectorStore | label, prop_type, prop_name, top_n, threshold, query
pgvector.vector_search | vedana_core.vts.PGVectorStore | same + the generated SQL

Typical trace hierarchy. The LLM rows below are illustrative: the actual span names come from openinference.instrumentation.litellm and follow LiteLLM's provider/operation names (for example, litellm.completion). Vedana does not create custom llm.chat_completion_* spans itself.

gantt
    title Example OTel trace of one request (≈12.4s)
    dateFormat X
    axisFormat %s

    section Pipeline
    jims.run_pipeline_with_context :a1, 0, 12400

    section Filter step
    litellm.completion (FILTER_MODEL, structured) :a2, 100, 1300

    section Tool iter 1
    litellm.completion (with tools) :a3, 1300, 4800
    memgraph.execute_ro_cypher_query :a4, 4500, 4700
    pgvector.vector_search :a5, 4500, 4600

    section Tool iter 2
    litellm.completion (with tools) :a6, 5000, 9100
    memgraph.execute_ro_cypher_query :a7, 8800, 9100

    section Final answer
    litellm.completion (with tools) :a8, 9300, 12300

Exporter configuration is done through the standard ENV variables of the OpenTelemetry SDK:

  • OTEL_EXPORTER_OTLP_ENDPOINT
  • OTEL_SERVICE_NAME
  • OTEL_RESOURCE_ATTRIBUTES
  • and so on.
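
Before starting a service it can help to check how these variables will be read. The sketch below uses only the standard library and parses OTEL_RESOURCE_ATTRIBUTES, which is a comma-separated list of key=value pairs. The helper name parse_resource_attributes and all sample values are illustrative assumptions; they are not part of Vedana or of the OpenTelemetry SDK, which does its own parsing.

```python
def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Split a comma-separated key=value string into a dict.

    Hypothetical helper for illustration only.
    """
    attributes: dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():  # skip empty or malformed entries
            attributes[key.strip()] = value.strip()
    return attributes


# Sample values are made up; real ones come from the deployment environment.
sample_env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
    "OTEL_SERVICE_NAME": "jims-api",
    "OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=staging,service.version=1.2.3",
}

resource = parse_resource_attributes(sample_env["OTEL_RESOURCE_ATTRIBUTES"])
print(sample_env["OTEL_SERVICE_NAME"], resource)
```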

jims_core.util.setup_monitoring_and_tracing_with_sentry() plugs Sentry into the OTel pipeline.

Prometheus metrics

LLM (jims_core.llms.llm_provider)

Metric | Type | Labels | What it counts
--- | --- | --- | ---
llm_calls_total | Counter | model | number of LLM calls
llm_usage_prompt_tokens_total | Counter | model | prompt tokens consumed
llm_usage_completion_tokens_total | Counter | model | completion tokens consumed

Beyond Prometheus, LLMProvider keeps a local usage: dict[str, ModelUsage] counter with full parameters (prompt, completion, cached, request_cost). That data is included in rag.query_processed.event_data.technical_info.model_stats.
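
The local counter can be pictured as a small per-model accumulator. The sketch below is an assumption about its shape: the field names come from the parameter list above, while the types, the add method, the record_call function, and the model name are hypothetical and may differ from the real LLMProvider.

```python
from dataclasses import dataclass


@dataclass
class ModelUsage:
    """Hypothetical shape; only the field names are taken from the text."""
    prompt: int = 0
    completion: int = 0
    cached: int = 0
    request_cost: float = 0.0

    def add(self, prompt: int, completion: int, cached: int = 0, request_cost: float = 0.0) -> None:
        # Accumulate the usage reported for one LLM call.
        self.prompt += prompt
        self.completion += completion
        self.cached += cached
        self.request_cost += request_cost


usage: dict[str, ModelUsage] = {}


def record_call(model: str, prompt: int, completion: int, cached: int = 0, request_cost: float = 0.0) -> None:
    # Create the per-model entry on first use, then add to it.
    usage.setdefault(model, ModelUsage()).add(prompt, completion, cached, request_cost)


# "model-a" is a made-up model name.
record_call("model-a", prompt=100, completion=20, request_cost=0.25)
record_call("model-a", prompt=200, completion=30, cached=50, request_cost=0.5)
```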

Pipeline (jims_core.thread.thread_controller)

Metric | Type | Labels | What it measures
--- | --- | --- | ---
jims_pipeline_runs_total | Counter | status, pipeline | number of pipeline runs
jims_pipeline_run_duration_seconds | Histogram | status, pipeline | duration with buckets 0.1…600s

status is success or failure. pipeline is the class/function name.
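
To make the label semantics concrete, the sketch below records one observation per run, keyed by (status, pipeline). It uses in-memory structures as stand-ins for the Prometheus client, so it illustrates the labeling only; observe_pipeline_run and the pipeline name demo_pipeline are hypothetical and not taken from thread_controller.

```python
import time
from collections import Counter
from contextlib import contextmanager

# In-memory stand-ins for jims_pipeline_runs_total and the duration histogram.
runs_total = Counter()
run_durations = []  # entries are (status, pipeline, seconds)


@contextmanager
def observe_pipeline_run(pipeline):
    """Hypothetical helper: time one run and label it success or failure."""
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except Exception:
        status = "failure"
        raise  # the caller still sees the original error
    finally:
        runs_total[(status, pipeline)] += 1
        run_durations.append((status, pipeline, time.perf_counter() - start))


with observe_pipeline_run("demo_pipeline"):
    pass  # a run that succeeds

try:
    with observe_pipeline_run("demo_pipeline"):
        raise ValueError("boom")  # a run that fails
except ValueError:
    pass
```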

Starting metrics

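JIMS CLIs accept a --metrics-port option. The default is 8000 for jims-api, jims-telegram, and jims-max, and 8001 for jims-widget, so the widget can run next to one of the other services on the same host without a port collision. Running two of the services that default to 8000 on one host requires setting --metrics-port explicitly. For example: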

uv run python -m jims_api.main --app vedana_core.app:app --port 8080 --metrics-port 8000

--port 8080 sets the API’s own HTTP port; --metrics-port 8000 runs the Prometheus scrape endpoint at http://host:8000/metrics. setup_prometheus_metrics(port=...) brings up the standard Prometheus client HTTP server.

Sentry

Enabled by the --enable-sentry CLI flag. Configuration:

  • SENTRY_DSN — required.
  • SENTRY_ENVIRONMENT — environment name.

What goes to Sentry:

  • unhandled exceptions raised out of pipelines (anything not caught by RagPipeline’s try/except);
  • OpenTelemetry spans, via the Sentry → OTel integration (setup_monitoring_and_tracing_with_sentry registers a SentrySpanProcessor on the tracer provider).

Note on tool-call errors: there is no dedicated loguru → Sentry handler in setup_monitoring_and_tracing_with_sentry (jims_core/util.py:24-49). Tool-call errors inside LLM.create_completion_with_tools are caught with logger.exception(...) and returned to the LLM as a string — they don’t propagate out, so they only surface in Sentry if sentry_sdk’s default LoggingIntegration picks them up.
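
The catch-log-return pattern described in this note looks roughly like the sketch below. It uses the standard logging module as a stand-in for loguru, and run_tool, failing_tool, and the error string format are hypothetical; the real code in LLM.create_completion_with_tools may differ in all of these details.

```python
import logging

logger = logging.getLogger("tool_calls")


def run_tool(tool, arguments: dict) -> str:
    """Call a tool and always hand a string back to the LLM."""
    try:
        return str(tool(**arguments))
    except Exception as exc:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("Tool call failed: %s", getattr(tool, "__name__", tool))
        # The error is returned as text, so it never propagates to the caller.
        return f"Tool error: {exc}"


def failing_tool(query: str) -> str:
    # Made-up tool that always fails, to exercise the error path.
    raise RuntimeError("graph is unreachable")


result = run_tool(failing_tool, {"query": "MATCH (n) RETURN n"})
```

Because the exception is swallowed here, an error tracker only learns about it through whatever logging hook is attached to the logger, which is the situation the note above describes.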

What to log

Vedana uses loguru in applications and the standard logging module in libraries. Levels:

  • setup_verbose_logging() (the --verbose flag) — DEBUG for the entire project.
  • production — usually INFO, WARNING, ERROR.

Inside RagPipeline, debug logs reveal:

  • data model filtering parameters;
  • the reasoning produced by the filtering step;
  • parameters of every vts/cypher tool call;
  • end of the tool-calling loop.

Pipeline errors are written via self.logger.exception(...) — so the stack ends up in Sentry/log files but not in the user chat (where the user sees a generic “An error occurred while processing the request”).

Healthchecks

  • HTTP API (jims-api): GET /healthz returns {"status":"ok"} on the main HTTP port (no separate healthcheck server).
  • Web widget (jims-widget): GET /healthz on the main HTTP port. There is no --healthcheck-port flag for the widget.
  • Telegram bot (jims-telegram): separate aiohttp endpoints /health and /healthz on --healthcheck-port (default 9000).
  • Postgres: pg_isready -U postgres (compose healthcheck).
  • Grist: wget http://localhost:8484/api/status.
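
A probe for the jims-api endpoint can be written with the standard library alone. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, and the payload check matches only the {"status":"ok"} body documented for jims-api (the response bodies of the widget and Telegram endpoints are not specified here).

```python
import json
import urllib.request


def is_ok_payload(body: bytes) -> bool:
    """Return True if the body is the JSON object {"status": "ok"} (extra keys allowed)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str, timeout: float = 2.0) -> bool:
    """Hypothetical probe: True only for HTTP 200 with an ok payload."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and is_ok_payload(response.read())
    except OSError:
        # URLError, HTTPError, timeouts, and refused connections are OSError subclasses.
        return False
```

With the API started on --port 8080 as in the command shown earlier, the call would be check_health("http://localhost:8080/healthz").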

What to monitor in production

At minimum:

  • llm_usage_prompt_tokens_total{model} and llm_usage_completion_tokens_total{model} — cost per model.
  • jims_pipeline_run_duration_seconds_bucket{status="success"} — p50/p95/p99 of the pipeline.
  • rate of jims_pipeline_runs_total{status="failure"}: pipeline error rate.
  • size of the thread_events table (growth) and average thread length.
  • Memgraph CPU/RAM (it runs in IN_MEMORY_ANALYTICAL mode by default).
  • Postgres connections and lock waits.
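
The percentiles in the list above are estimated from cumulative histogram buckets. The sketch below shows the idea with linear interpolation inside the matching bucket, similar in spirit to what Prometheus's histogram_quantile() does. It is a simplified illustration, not a replacement for the PromQL function, and the bucket bounds and counts are made up (the source only states that buckets span 0.1…600s).

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                return prev_bound  # cannot interpolate inside this bucket
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up cumulative counts for jims_pipeline_run_duration_seconds_bucket.
sample = [(0.1, 0), (1, 10), (5, 60), (30, 95), (600, 100), (math.inf, 100)]
p50 = histogram_quantile(0.50, sample)
p95 = histogram_quantile(0.95, sample)
```

The estimate is only as fine as the bucket layout: with wide buckets, p95 and p99 can land on the same bound even when the real latencies differ.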

For more dashboard ideas, see Monitoring & Metrics.