Observability
Vedana is built to run under load and ships with tracing, metrics, and Sentry integration.
OpenTelemetry
Traces are created in several places:
| Span | Where | Attributes |
|---|---|---|
| jims.run_pipeline_with_context | jims_core.thread.thread_controller | jims.thread.id, jims.pipeline |
| memgraph.execute_ro_cypher_query | vedana_core.graph.MemgraphGraph | memgraph.query, memgraph.parameters |
| memgraph.run_cypher | vedana_core.graph.MemgraphGraph | memgraph.query, memgraph.parameters, memgraph.limit |
| memgraph.text_search | vedana_core.graph.MemgraphGraph | memgraph.label, memgraph.fts_query, memgraph.limit |
| memgraph.vector_search | vedana_core.vts.MemgraphVectorStore | label, prop_type, prop_name, top_n, threshold, query |
| pgvector.vector_search | vedana_core.vts.PGVectorStore | same as memgraph.vector_search, plus the generated SQL |
Typical trace hierarchy. The LLM rows below are illustrative: the actual span names come from openinference.instrumentation.litellm and use the provider/operation names from LiteLLM, e.g. litellm.completion. Vedana does not create custom llm.chat_completion_* spans itself.
gantt
title Example OTel trace of one request (≈12.4s)
dateFormat X
axisFormat %s
section Pipeline
jims.run_pipeline_with_context :a1, 0, 12400
section Filter step
litellm.completion (FILTER_MODEL, structured) :a2, 100, 1200
section Tool iter 1
litellm.completion (with tools) :a3, 1300, 3500
memgraph.execute_ro_cypher_query :a4, 4500, 200
pgvector.vector_search :a5, 4500, 100
section Tool iter 2
litellm.completion (with tools) :a6, 5000, 4100
memgraph.execute_ro_cypher_query :a7, 8800, 300
section Final answer
litellm.completion (with tools) :a8, 9300, 3000
Exporter configuration is done through the standard ENV variables of the OpenTelemetry SDK:
- OTEL_EXPORTER_OTLP_ENDPOINT
- OTEL_SERVICE_NAME
- OTEL_RESOURCE_ATTRIBUTES
- and so on.
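As a minimal sketch of what these variables carry, the snippet below reads them from a mapping and parses OTEL_RESOURCE_ATTRIBUTES, which the OpenTelemetry specification defines as a comma-separated list of key=value pairs. The helper names read_otel_settings and parse_resource_attributes and the example values are hypothetical; in practice the OpenTelemetry SDK reads and parses these variables itself.

```python
import os
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, skipping malformed items."""
    attributes: dict[str, str] = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            continue  # ignore empty or malformed entries
        # Values may be percent-encoded, so decode them.
        attributes[key.strip()] = unquote(value.strip())
    return attributes


def read_otel_settings(env=os.environ) -> dict:
    """Collect the three variables named above (hypothetical helper)."""
    return {
        "endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT"),
        "service_name": env.get("OTEL_SERVICE_NAME"),
        "resource_attributes": parse_resource_attributes(
            env.get("OTEL_RESOURCE_ATTRIBUTES", "")
        ),
    }


# Example values are made up for illustration.
example_env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
    "OTEL_SERVICE_NAME": "vedana-api",
    "OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=staging,service.version=1.2.3",
}
settings = read_otel_settings(example_env)
print(settings)
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without touching the real process environment.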
jims_core.util.setup_monitoring_and_tracing_with_sentry() plugs Sentry into the OTel pipeline.
Prometheus metrics
LLM (jims_core.llms.llm_provider)
| Metric | Type | Labels | What it counts |
|---|---|---|---|
| llm_calls_total | Counter | model | number of LLM calls |
| llm_usage_prompt_tokens_total | Counter | model | prompt tokens consumed |
| llm_usage_completion_tokens_total | Counter | model | completion tokens consumed |
Beyond Prometheus, LLMProvider keeps a local usage: dict[str, ModelUsage] counter with full parameters (prompt, completion, cached, request_cost). That data is included in rag.query_processed.event_data.technical_info.model_stats.
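A per-model usage record of this kind could look like the sketch below. The field names mirror the parameters listed above (prompt, completion, cached, request_cost), but the actual ModelUsage definition in jims_core is not shown here, so the class layout and the record_call helper are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class ModelUsage:
    """Assumed shape of a per-model usage record (not the real class)."""

    prompt: int = 0
    completion: int = 0
    cached: int = 0
    request_cost: float = 0.0


def record_call(usage, model, *, prompt, completion, cached=0, request_cost=0.0):
    """Add one LLM call's token counts and cost to the per-model totals."""
    entry = usage.setdefault(model, ModelUsage())
    entry.prompt += prompt
    entry.completion += completion
    entry.cached += cached
    entry.request_cost += request_cost
    return entry


usage: dict[str, ModelUsage] = {}
record_call(usage, "model-a", prompt=1200, completion=300, cached=200, request_cost=0.004)
record_call(usage, "model-a", prompt=800, completion=100, request_cost=0.002)
record_call(usage, "model-b", prompt=50, completion=10)

# A JSON-friendly view, similar in spirit to what an event payload might carry.
model_stats = {model: asdict(entry) for model, entry in usage.items()}
print(model_stats)
```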
Pipeline (jims_core.thread.thread_controller)
| Metric | Type | Labels | What it measures |
|---|---|---|---|
| jims_pipeline_runs_total | Counter | status, pipeline | number of pipeline runs |
| jims_pipeline_run_duration_seconds | Histogram | status, pipeline | duration with buckets 0.1…600s |
status is success or failure. pipeline is the class/function name.
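The labelling scheme can be illustrated without any Prometheus dependency. The sketch below wraps a pipeline callable, derives the pipeline label from its function or class name, and records status as success or failure together with the elapsed time. Plain dictionaries stand in for the Counter and Histogram, and track_pipeline_run is a hypothetical helper, not the code in thread_controller.

```python
import time
from collections import defaultdict

runs_total = defaultdict(int)      # stand-in for jims_pipeline_runs_total
run_durations = defaultdict(list)  # stand-in for the duration histogram


def track_pipeline_run(pipeline, *args, **kwargs):
    """Run `pipeline`, recording (status, pipeline) counts and durations."""
    # Functions expose __name__; for callable objects fall back to the class name.
    name = getattr(pipeline, "__name__", type(pipeline).__name__)
    started = time.perf_counter()
    status = "failure"  # assume failure until the call returns normally
    try:
        result = pipeline(*args, **kwargs)
        status = "success"
        return result
    finally:
        labels = (status, name)
        runs_total[labels] += 1
        run_durations[labels].append(time.perf_counter() - started)


def echo_pipeline(text):
    return text.upper()


def broken_pipeline(text):
    raise ValueError("boom")


track_pipeline_run(echo_pipeline, "hi")
try:
    track_pipeline_run(broken_pipeline, "hi")
except ValueError:
    pass  # the error still propagates; the run is recorded as a failure

print(dict(runs_total))
```

Recording in a finally block means a failed run is still counted and timed before the exception reaches the caller.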
Starting metrics
JIMS CLIs accept a --metrics-port option. Defaults differ per service so two services can run on the same host without colliding: jims-api / jims-telegram / jims-max default to 8000; jims-widget defaults to 8001. Run, for example:
uv run python -m jims_api.main --app vedana_core.app:app --port 8080 --metrics-port 8000
--port 8080 sets the API’s own HTTP port; --metrics-port 8000 runs the Prometheus scrape endpoint at http://host:8000/metrics. setup_prometheus_metrics(port=...) brings up the standard Prometheus client HTTP server.
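To check what the endpoint exposes, a scrape can be parsed by hand. The sketch below handles a simplified subset of the Prometheus text exposition format (no timestamps, no exemplars). parse_metrics and scrape are hypothetical helper names, and the sample text is made up for illustration.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # outside the simplified subset handled here
        labels = dict(LABEL_RE.findall(match["labels"] or ""))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def scrape(url, timeout=5.0):
    """Fetch a metrics endpoint and parse its body."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


sample = """\
# HELP llm_calls_total number of LLM calls
# TYPE llm_calls_total counter
llm_calls_total{model="model-a"} 3.0
llm_calls_total{model="model-b"} 1.0
"""
for name, labels, value in parse_metrics(sample):
    print(name, labels, value)
```

Against a running service, scrape("http://host:8000/metrics") would return the same tuple structure for the live counters and histogram buckets.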
Sentry
Enabled by the --enable-sentry CLI flag. Configuration:
- SENTRY_DSN: required.
- SENTRY_ENVIRONMENT: environment name.
What goes to Sentry:
- unhandled exceptions raised out of pipelines (anything not caught by RagPipeline’s try/except);
- OpenTelemetry spans, via the Sentry → OTel integration (setup_monitoring_and_tracing_with_sentry registers a SentrySpanProcessor on the tracer provider).
Note on tool-call errors: there is no dedicated loguru → Sentry handler in setup_monitoring_and_tracing_with_sentry (jims_core/util.py:24-49). Tool-call errors inside LLM.create_completion_with_tools are caught with logger.exception(...) and returned to the LLM as a string. They don’t propagate out, so they only surface in Sentry if sentry_sdk’s default LoggingIntegration picks them up.
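The behaviour described in this note follows a common pattern: catch the exception at the tool boundary, log it with its traceback, and hand a plain string back to the model. The sketch below shows that pattern with the standard logging module. call_tool_safely and the error message format are hypothetical and are not the actual create_completion_with_tools code.

```python
import logging

logger = logging.getLogger("tool_calls")


def call_tool_safely(tool, **arguments) -> str:
    """Run a tool; on failure log the traceback and return an error string."""
    try:
        return str(tool(**arguments))
    except Exception as exc:
        # The exception stops here, so anything that only watches for
        # unhandled exceptions never sees it; only the log record remains.
        logger.exception("Tool %s failed", getattr(tool, "__name__", tool))
        return f"Tool call failed: {exc}"


def divide(a: float, b: float) -> float:
    return a / b


print(call_tool_safely(divide, a=6, b=3))
print(call_tool_safely(divide, a=1, b=0))
```

Because the failure is converted into a return value, the model can react to it in the next tool iteration, while error tracking depends entirely on the log record being forwarded.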
What to log
Vedana uses loguru for applications and the standard logging module for libraries. Levels:
- setup_verbose_logging() (the --verbose flag): DEBUG for the entire project.
- production: usually INFO, WARNING, ERROR.
Inside RagPipeline, debug logs reveal:
- data model filtering parameters;
- the reasoning produced by the filtering step;
- parameters of every vts/cypher tool call;
- end of the tool-calling loop.
Pipeline errors are written via self.logger.exception(...), so the stack trace ends up in Sentry/log files but not in the user chat, where the user sees a generic “An error occurred while processing the request”.
Healthchecks
- HTTP API (jims-api): GET /healthz → {"status":"ok"} on the main HTTP port (no separate healthcheck server).
- Web widget (jims-widget): GET /healthz on the main HTTP port. There is no --healthcheck-port flag for the widget.
- Telegram bot (jims-telegram): separate aiohttp endpoints /health and /healthz on --healthcheck-port (default 9000).
- Postgres: pg_isready -U postgres (compose healthcheck).
- Grist: wget http://localhost:8484/api/status.
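For the HTTP API check, a small probe can verify both the status code and the documented body. This is a minimal sketch using only the standard library. is_healthy is a hypothetical helper, and it assumes the {"status":"ok"} body listed above for jims-api; the other services may answer with a different payload.

```python
import json
from http.client import HTTPException
from urllib.request import ProxyHandler, build_opener

# Bypass any configured HTTP proxy so the probe hits the service directly.
_opener = build_opener(ProxyHandler({}))


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers 200 with a JSON body {"status": "ok"}."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            if response.status != 200:
                return False
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, HTTPException, ValueError):
        # OSError covers connection errors, timeouts, and HTTP error statuses;
        # ValueError covers invalid JSON or undecodable bytes.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

With the API started on --port 8080 as in the earlier example, the call would be is_healthy("http://localhost:8080/healthz").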
What to monitor in production
At minimum:
- llm_usage_prompt_tokens_total{model} and llm_usage_completion_tokens_total{model}: cost per model.
- jims_pipeline_run_duration_seconds_bucket{status="success"}: p50/p95/p99 of the pipeline.
- jims_pipeline_runs_total{status="failure"} rate: error count.
- size of the thread_events table (growth) and average thread length.
- Memgraph CPU/RAM (it runs in IN_MEMORY_ANALYTICAL mode by default).
- Postgres connections and lock waits.
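The pipeline percentiles are derived from cumulative histogram buckets. The sketch below reimplements, in simplified form, the linear interpolation that PromQL's histogram_quantile performs, so the arithmetic is visible. The bucket boundaries and counts are made-up sample data, and bucket_quantile is a hypothetical helper; in a real setup the query would run in Prometheus itself.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return prev_bound  # cannot interpolate inside the +Inf bucket
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Cumulative counts per upper bound in seconds (sample data).
sample_buckets = [
    (0.1, 0), (1.0, 10), (5.0, 60), (10.0, 90), (30.0, 100), (math.inf, 100),
]
for q in (0.5, 0.95, 0.99):
    print(q, bucket_quantile(q, sample_buckets))
```

In Prometheus the equivalent is typically written over a rate, for example histogram_quantile(0.95, sum by (le) (rate(jims_pipeline_run_duration_seconds_bucket{status="success"}[5m]))). The estimate is only as precise as the bucket boundaries allow.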
For more dashboard ideas, see Monitoring & Metrics.