Troubleshooting
A collection of common problems and how to fix them.
Startup
docker compose up — db-migrate fails on pgvector
Symptom: the Alembic migration 2dfad73e5cce_move_emb_to_pgvector fails with extension "vector" is not available.
Cause: the Postgres provider doesn’t allow auto-creating extensions.
Fix:
- Self-hosted Postgres → make sure the image is pgvector/pgvector:pg15 (or another with pgvector pre-installed). Set CREATE_PGVECTOR_EXTENSION=true.
- Managed (Supabase / Neon / RDS) → create the extension manually via the provider console, set CREATE_PGVECTOR_EXTENSION=false.
Memgraph: Authentication failed
Symptom: at startup the app logs say it can’t log into Memgraph.
Cause: the password in .env doesn’t match what’s stored in Memgraph.
Fix: verify MEMGRAPH_USER, MEMGRAPH_PWD in .env — they must match the values used on the first Memgraph start (Memgraph stores auth across restarts).
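If you’ve recreated Memgraph with a different password, drop the mg_graph volume so that it is re-created with the new password. The usual way is docker compose down -v; note that -v removes every named volume declared in the compose file, not only mg_graph.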
Grist doesn’t open on 8484
Symptom: http://localhost:8484 shows a blank page or an error.
Fix:
- Check docker compose ps: is grist healthy?
- In the logs (docker compose logs grist) check for errors. Often it’s volume permissions: /persist must be writable.
- Restart the service: docker compose restart grist.
ETL
ETL ran but Memgraph is empty
Things to check:
- Backoffice → ETL logs: which steps succeeded, which failed?
- In Postgres check nodes and edges:
  SELECT COUNT(*) FROM nodes;
  SELECT COUNT(*) FROM edges;
  If they are empty, the data never arrived from Grist (error in get_grist_data / prepare_*).
- If Postgres has data but Memgraph doesn’t, the issue is in the pass_df_to_memgraph step. Check Memgraph logs; the bolt:// URI may be wrong.
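The checks above amount to a small decision procedure. Here is a minimal sketch in Python, assuming you have already collected the three counts yourself. The function name and the returned messages are illustrative and not part of Vedana.

```python
def diagnose_empty_graph(pg_nodes: int, pg_edges: int, mg_nodes: int) -> str:
    """Map row counts to the ETL stage that most likely failed.

    pg_nodes / pg_edges: results of SELECT COUNT(*) on the Postgres
    tables `nodes` and `edges`. mg_nodes: node count in Memgraph,
    for example from `MATCH (n) RETURN count(n)`.
    """
    if pg_nodes == 0 and pg_edges == 0:
        # Nothing was staged: the Grist extraction or prepare_* steps failed.
        return "extract: check get_grist_data / prepare_* in the ETL logs"
    if mg_nodes == 0:
        # Data was staged in Postgres but never loaded into the graph.
        return "load: check pass_df_to_memgraph, Memgraph logs, bolt:// URI"
    return "ok: both stores contain data"
```

Running it with the counts you observe points at the step whose logs to read first.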
ETL works but embeddings aren’t built
Symptom: rows don’t appear in rag_anchor_embeddings.
Causes:
- No attribute in the data model has embeddable=true.
- The LLM provider doesn’t respond (check OPENAI_API_KEY).
- Provider rate limit exceeded (LiteLLM logs this).
- The attribute is embeddable but values in the data are empty / NULL.
Data model filtering failed in the logs
Symptom: the logs show Data model filtering failed: .... Chat answers become slower and worse.
Cause: the data model filtering step crashed; the pipeline fell back to the full model.
Things to check:
- the FILTER_MODEL is reachable and didn’t error out;
- the data model isn’t too big for structured output (if there are too many descriptions, the LLM may fail to comply);
- check dm_filter_prompt / dm_filter_user_prompt: if you overrode them, there may be a syntax error.
Workaround: set ENABLE_DM_FILTERING=false. That makes requests more expensive but more stable.
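The fallback described in this section can be pictured as a try/except around the filtering step. The sketch below is an illustration of that pattern only, not Vedana’s actual code: select_data_model, filter_fn and the enabled flag are hypothetical names standing in for the real pipeline and for ENABLE_DM_FILTERING.

```python
import logging

logger = logging.getLogger("dm_filter_demo")


def select_data_model(full_model: dict, question: str, filter_fn, enabled: bool = True) -> dict:
    """Return a filtered data model, or the full one if filtering is off or fails."""
    if not enabled:
        # Stand-in for ENABLE_DM_FILTERING=false: always send the full model.
        return full_model
    try:
        return filter_fn(full_model, question)
    except Exception as exc:
        # Same shape as the log line in the symptom; the request still
        # succeeds, but with a larger (slower, costlier) context.
        logger.warning("Data model filtering failed: %s", exc)
        return full_model
```

Seen this way, the workaround simply takes the first branch on every request, which is why it trades cost for stability.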
Assistant answers
The assistant gives a vague / generic answer
Checklist:
- Open Details for the answer in the backoffice. Which tool calls did it make?
- If only vector_text_search: the playbook is the issue, since it should use cypher for a structural question. Check Queries.
- If cypher ran but returned an empty result: verify there’s data in the graph (Memgraph Lab).
- If cypher didn’t return the expected attribute: add/refine the query field in Anchor_attributes.
The assistant uses the wrong anchor
Cause: the anchor’s description is ambiguous or incomplete; the LLM doesn’t know when to use it.
Fix: improve description — add specific scenarios “use this anchor when …”.
The assistant invents an attribute / link
Cause: the system prompt isn’t strict enough about “don’t go beyond the data model”.
Fix: override generate_answer_with_tools_tmplt and add: “If the needed attribute/link isn’t in graph_composition, don’t make it up — say so”.
The assistant gives the right answer but in the wrong format
Cause: the format isn’t pinned in the playbook or the system prompt.
Fix: in the Queries for the corresponding intent, specify a step “Format the answer as: …” — a concrete template.
Vector search
Vector search never returns results
Checklist:
- Is the attribute embeddable=true?
- Did ETL run generate_embeddings without errors?
- Are there rows for that attribute in rag_anchor_embeddings?
- embed_threshold isn’t cranked to 0.95+ (try 0.5 to rule out the threshold).
- Embedding dimensions match (EMBEDDINGS_DIM in .env vs the model’s actual dimensionality).
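The last two checklist items can be reproduced in isolation. The sketch below assumes the threshold is a minimum cosine similarity, which is an assumption about how embed_threshold is applied. The search helper and the toy vectors used with it are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        # The same kind of mismatch arises when EMBEDDINGS_DIM differs
        # from the dimensionality the model actually produces.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: list[float], rows: dict[str, list[float]], threshold: float) -> list[str]:
    """Return the keys whose similarity to the query is at or above the threshold."""
    return [key for key, vec in rows.items() if cosine_similarity(query, vec) >= threshold]
```

With a query of [1.0, 0.0] and a stored vector [0.8, 0.6], the similarity is about 0.8, so a threshold of 0.95 hides the row while 0.5 returns it. That is the effect the "try 0.5" check is meant to rule out.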
Vector search returns junk
Causes:
- threshold too low → raise to 0.7+;
- an attribute is embeddable that shouldn’t be (e.g. a numeric code) → drop embeddable;
- EMBEDDINGS_MODEL was changed, but the stored embeddings were built with the previous model → rebuild embeddings.
Cypher
The LLM generates broken Cypher
Symptom: in the logs Error executing tool cypher: ....
Causes:
- Cypher templates in the query fields of anchors/links are wrong → the LLM learned from them.
- Label names don’t match what’s in Memgraph (e.g. Product vs product).
- The LLM gets confused by named edge labels: verify the link sentence matches what’s in Memgraph.
What to do:
- Open Memgraph Lab → run the query manually to see the syntax error.
- Fix the query in the data model so the LLM has the right template.
- If the LLM keeps producing the same wrong pattern, add a Queries scenario with an explicit instruction.
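Label mismatches such as Product vs product are easy to miss by eye because labels are case-sensitive. A minimal sketch of a comparison follows, assuming you have exported both label sets yourself (for example, the graph side from MATCH (n) RETURN DISTINCT labels(n) in Memgraph Lab). The function name is hypothetical.

```python
def find_label_mismatches(model_labels: set[str], graph_labels: set[str]) -> dict[str, str]:
    """Map each data-model label that is absent from the graph to a graph
    label that differs from it only by letter case."""
    by_lower = {label.lower(): label for label in graph_labels}
    mismatches = {}
    for label in sorted(model_labels):
        if label in graph_labels:
            continue  # exact match, nothing to fix
        candidate = by_lower.get(label.lower())
        if candidate is not None:
            mismatches[label] = candidate
    return mismatches
```

Any entry it returns is a label to align, either in the data model or in the graph, before refining the query templates.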
Cypher works in Memgraph Lab but fails in chat
Probable cause: the LLM interpolates a value containing single quotes (or otherwise wrong escaping) into a string literal, producing a query that Cypher rejects.
Fix: in the playbook, specify that parameters are passed via $param, not interpolated into the string.
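The difference between the two styles is easiest to see side by side. In this sketch the Product label comes from this page, while the name property and both function names are assumptions chosen for illustration.

```python
def unsafe_query(name: str) -> str:
    # Interpolation: a quote inside the value ends the string literal early.
    return f"MATCH (p:Product) WHERE p.name = '{name}' RETURN p"


def parameterized_query(name: str) -> tuple[str, dict[str, str]]:
    # The value travels separately from the query text, so no escaping is needed.
    return "MATCH (p:Product) WHERE p.name = $name RETURN p", {"name": name}
```

For a value like O'Brien the first function yields a query with an unbalanced quote, while the second keeps the query text constant and passes the value in the parameter map.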
Performance
Requests became very slow
Checklist:
- jims_pipeline_run_duration_seconds metrics: where does it grow?
- If it’s LLM latency, look at traces. Maybe additional tool-call iterations crept in.
- If it’s Memgraph latency, add indexes (CREATE INDEX ON :Product(price) for frequently-filtered fields).
- If it’s pgvector latency, add an HNSW index on rag_anchor_embeddings.embedding manually (CREATE INDEX ... USING hnsw (embedding vector_cosine_ops)). Vedana’s Alembic migrations do not create this index automatically: its optimal parameters depend on your row count, dimensionality, and recall/latency target, so it’s left as an explicit operator decision.
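Because the HNSW parameters are an operator decision, it can help to generate the statement from explicit values rather than type it by hand. The sketch below only renders SQL text. The index name is a hypothetical choice, and the defaults m = 16 and ef_construction = 64 are pgvector’s documented defaults, not a recommendation for your data.

```python
def hnsw_index_sql(
    m: int = 16,
    ef_construction: int = 64,
    index_name: str = "rag_anchor_embeddings_hnsw_idx",  # hypothetical name
) -> str:
    """Render a CREATE INDEX statement for a cosine-distance HNSW index."""
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        "ON rag_anchor_embeddings USING hnsw (embedding vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
```

Higher values of either parameter generally improve recall at the cost of build time and memory, so measure on your own row count before settling on them.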
Very expensive requests (lots of tokens)
Causes and fixes:
- Data model filtering is disabled. ENABLE_DM_FILTERING=true is the default and the right setting in production, so keep it on. Turn it off (false) only while debugging the agent’s behaviour with the full data model in context, or if your data model is so small that the filtering step costs more than it saves. If you’ve explicitly set it to false, set it back to true. (Cost Management gives the same recommendation.)
- Large data model in context → override rendering templates to drop query fields.
- pipeline_history_length too large → drop to 10–15.
- Many retry iterations → improve the playbook so the agent picks the right path immediately.
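To see why the history length matters for token usage, consider a simple truncation of the conversation before it is sent to the model. This is a sketch under the assumption that pipeline_history_length caps the number of most recent messages. The function and the message shape are illustrative, not Vedana’s implementation.

```python
def trim_history(messages: list[dict], history_length: int = 10) -> list[dict]:
    """Keep only the most recent `history_length` messages."""
    if history_length <= 0:
        # Guard: messages[-0:] would return the whole list, not an empty one.
        return []
    return messages[-history_length:]
```

Every message that survives trimming is re-sent on each request, so halving the limit roughly halves the history’s share of the prompt.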
Sentry / Logs
Sentry isn’t getting errors
Causes:
- SENTRY_DSN is empty or wrong.
- The --enable-sentry CLI flag wasn’t passed.
- Errors are swallowed inside the pipeline (RagPipeline catches exceptions and writes them via logger.exception, so Sentry only sees them through the logging handler).
Check: temporarily add raise Exception("test") to the pipeline and verify it reaches Sentry.
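The interaction between a swallowed exception and a logging handler can be reproduced with the standard library alone. In the sketch below, CapturingHandler is a stand-in for an error-reporting handler and is not the Sentry SDK. The logger name and run_step are hypothetical.

```python
import logging


class CapturingHandler(logging.Handler):
    """Stand-in for an error-reporting handler: stores every record it receives."""

    def __init__(self) -> None:
        super().__init__(level=logging.ERROR)
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("sentry_demo")
logger.propagate = False  # keep the demo output quiet
handler = CapturingHandler()
logger.addHandler(handler)


def run_step() -> str:
    try:
        raise Exception("test")
    except Exception:
        # The exception never leaves this function, but the handler
        # still receives it together with its traceback.
        logger.exception("pipeline step failed")
        return "fallback"


result = run_step()
```

If the handler is not attached (for example because reporting was never enabled), the same exception is logged and then lost, which matches the symptom above.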
Prometheus metrics are empty
Causes:
- --metrics-port isn’t exposed.
- Prometheus scrape config isn’t set up.
- The service hasn’t received any requests yet (metrics appear after the first call).
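To tell "no requests yet" apart from a scrape problem, you can look at the raw text the metrics endpoint returns. The helper below only parses text in the Prometheus exposition format. Fetching it (typically from a /metrics path on the metrics port) is left to you, and the function name is hypothetical.

```python
def metric_samples(exposition_text: str, metric_prefix: str) -> list[str]:
    """Return the sample lines for a metric family, skipping # HELP / # TYPE comments."""
    return [
        line
        for line in exposition_text.splitlines()
        if line.startswith(metric_prefix)
    ]
```

An empty result for jims_pipeline_run_duration_seconds on an endpoint that otherwise responds suggests the service simply hasn’t handled a request yet. No response at all points at the port or the scrape config.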