Data Model: Overview
The data model in Vedana is the contract between your domain and the assistant. It describes which entities exist, what properties they have, and how they are connected. Without it the assistant only sees text; with it, it sees the structure of knowledge.
The data model is the schema, not the data. The data itself (products, documents, contracts, branches) lives in the Grist Data doc and is loaded into the graph through ETL. The data model lives in the Grist Data Model doc and describes how that data is interpreted.
The seven tables
| Table | Required? | Purpose |
|---|---|---|
| Anchors | yes | graph node types (domain entities) |
| Anchor_attributes | yes | properties of nodes |
| Links | yes | edge types (relationships) |
| Link_attributes | only if edges have attributes | properties of edges |
| Queries | no | the playbook — scenarios for typical questions |
| Prompts | no | overrides for prompt templates |
| ConversationLifecycle | no | responses to lifecycle events (/start, etc.) |
The minimum working data model is Anchors + Anchor_attributes + Links. Everything else extends behaviour but isn’t required for the system to start.
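The required/optional split can be checked mechanically before startup. The sketch below is not part of Vedana: `check_data_model_tables` and the set-based input are hypothetical, and only the table names come from the table above.

```python
from __future__ import annotations

# Hypothetical helper, not part of Vedana; only the table names come from the docs.
REQUIRED_TABLES = {"Anchors", "Anchor_attributes", "Links"}
OPTIONAL_TABLES = {"Link_attributes", "Queries", "Prompts", "ConversationLifecycle"}


def check_data_model_tables(present: set[str]) -> dict[str, list[str]]:
    """Report required tables that are missing and table names that are not recognised."""
    known = REQUIRED_TABLES | OPTIONAL_TABLES
    return {
        "missing_required": sorted(REQUIRED_TABLES - present),
        "unknown": sorted(present - known),
    }


# Example: a Data Model doc that has no Anchor_attributes table yet.
report = check_data_model_tables({"Anchors", "Links", "Queries"})
print(report)
```

An empty `missing_required` list corresponds to the minimum working model; optional tables never appear in the report.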
How the model gets into the LLM
For each request:
- ETL has already synced Grist → the dm_* tables in Postgres.
- DataModel reads the tables and builds in-memory Anchor, Link, Attribute, Query, Prompt objects.
- (Optional) The data model filtering step (RagPipeline.filter_data_model) picks only the elements relevant to the current question.
- DataModel.to_text_descr(...) renders the chosen elements into text using the dm_*_descr_template templates (or their overrides from the Prompts table).
- That text is inserted into the system prompt for the main agent.
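The flow above can be pictured with a toy sketch. Everything here is a stand-in: `Element`, `ToyDataModel`, the keyword-based `filter_data_model`, and `build_system_prompt` are hypothetical and do not reproduce the real `DataModel` or `RagPipeline` signatures. Only the order of steps (load, filter, render, insert into the system prompt) is taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """Stand-in for an in-memory Anchor, Link, Attribute, or Query object."""
    kind: str
    name: str
    description: str


@dataclass
class ToyDataModel:
    """Hypothetical stand-in for DataModel; the real class reads the dm_* tables."""
    elements: list[Element] = field(default_factory=list)

    def to_text_descr(self) -> str:
        # The real method renders through the dm_*_descr_template templates.
        return "\n".join(f"- {e.name}: {e.description}" for e in self.elements)


def filter_data_model(model: ToyDataModel, question: str) -> ToyDataModel:
    """Naive keyword match; an assumption standing in for the optional filtering step."""
    words = set(question.lower().split())
    kept = [e for e in model.elements if e.name.lower() in words]
    # Fall back to the full model when nothing matches.
    return ToyDataModel(kept or list(model.elements))


def build_system_prompt(base_prompt: str, model: ToyDataModel, question: str) -> str:
    """Filter the model, render it to text, and append it to the base prompt."""
    relevant = filter_data_model(model, question)
    return f"{base_prompt}\n\n{relevant.to_text_descr()}"


model = ToyDataModel([
    Element("anchor", "product", "A sellable product in the catalog"),
    Element("anchor", "category", "A group of related products"),
])
prompt = build_system_prompt("You are an assistant.", model, "Which product is cheapest?")
print(prompt)
```

The point of the sketch is that the LLM never sees the tables themselves, only the rendered text of whichever elements survive filtering.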
Rendering templates
By default the data model is rendered into five sections. Section headers in the built-in template (vedana_core.data_model.dm_descr_template) are in Russian:
- "## Узлы:" for anchors;
- "## Атрибуты узлов:" for anchor attributes;
- "## Связи между узлами:" for links;
- "## Атрибуты связей:" for link attributes;
- "## Типичные вопросы:" for typical questions (the playbook).
You can override these by setting dm_descr_template in the Prompts table — e.g. to translate them to English:
## Nodes:
- Product: A sellable product...; example ID: product_id: "p-001"; query: MATCH (p:product) WHERE ...
## Node attributes:
- Product.price: Price in EUR; example: 999.0; query: ...
## Links:
- PRODUCT_belongs_to_CATEGORY: a product belongs to a category; example query: MATCH ...
## Link attributes:
- ...
## Typical questions:
- Who likes <interest>?
1) ...
2) ...
Templates (with names from Prompts):
| Template | What it renders |
|---|---|
| dm_descr_template | the wrapper for all sections |
| dm_anchor_descr_template | a row about an anchor |
| dm_attr_descr_template | a row about an anchor attribute |
| dm_link_descr_template | a row about a link |
| dm_link_attr_descr_template | a row about a link attribute |
| dm_query_descr_template | a row from the playbook |
You can override any of these in the Prompts table to change the format the LLM sees the model in.
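One way to picture the override mechanism is a dictionary merge in which rows from the Prompts table win over built-in defaults. This is a minimal sketch under assumptions: the default template strings and the `{name}`, `{description}`, `{anchors}`, `{links}` placeholders are invented for illustration, and `resolve_templates` and `render` are hypothetical helpers, not Vedana functions. Only the template names come from the table above.

```python
from __future__ import annotations

# Hypothetical defaults; the real built-in templates and placeholders may differ.
DEFAULT_TEMPLATES = {
    "dm_descr_template": "## Nodes:\n{anchors}\n\n## Links:\n{links}",
    "dm_anchor_descr_template": "- {name}: {description}",
    "dm_link_descr_template": "- {name}: {description}",
}


def resolve_templates(overrides: dict[str, str]) -> dict[str, str]:
    """Merge overrides (e.g. rows from the Prompts table) over the defaults."""
    return {**DEFAULT_TEMPLATES, **overrides}


def render(templates: dict[str, str], anchors: list[dict], links: list[dict]) -> str:
    """Render each row with its row template, then wrap all rows with the wrapper."""
    anchor_rows = "\n".join(templates["dm_anchor_descr_template"].format(**a) for a in anchors)
    link_rows = "\n".join(templates["dm_link_descr_template"].format(**l) for l in links)
    return templates["dm_descr_template"].format(anchors=anchor_rows, links=link_rows)


anchors = [{"name": "product", "description": "A sellable product"}]
links = [{"name": "PRODUCT_belongs_to_CATEGORY", "description": "a product belongs to a category"}]

# Override only the anchor row format; the wrapper and link rows keep their defaults.
templates = resolve_templates({"dm_anchor_descr_template": "* {name} ({description})"})
text = render(templates, anchors, links)
print(text)
```

Overriding a single row template changes only that row type; the other templates fall back to their defaults.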
Principles of a good data model
- Describe only what’s needed. Extra anchors/attributes inflate the context and reduce filtering accuracy.
- Anchor names: singular, Latin script. Use product, not products, not product_catalog.
- Clear descriptions. This is the primary signal to the LLM telling it when to apply an anchor / link. “Represents a product” is bad. “A sellable product in the catalog with a price, availability status, and category” is good.
- Fill in query for anchor / attribute / link. Without that, the assistant can’t reliably retrieve the data.
- Embeddable: only for text. Identifiers and numbers should be regular attributes.
- Tune embed_threshold through evaluation. Too low: irrelevant results; too high: misses. Start at 0.7.
- Write Queries (the playbook). That’s the cheapest way to make behaviour predictable.
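Tuning embed_threshold through evaluation can be as simple as sweeping candidate values over a small labeled set and watching precision and recall move in opposite directions. The sketch below assumes that a match is kept when its similarity score is at least the threshold; the scores, labels, and the `precision_recall` helper are made up for illustration and are not Vedana code.

```python
from __future__ import annotations


def precision_recall(scored: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """scored holds (similarity, is_relevant) pairs from a hand-labeled evaluation set."""
    # Assumption: a match is kept when its similarity is at least the threshold.
    kept = [relevant for score, relevant in scored if score >= threshold]
    total_relevant = sum(1 for _, relevant in scored if relevant)
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_relevant if total_relevant else 1.0
    return precision, recall


# Made-up evaluation data: similarity score and whether the match was actually relevant.
scored = [(0.92, True), (0.81, True), (0.74, False), (0.66, True), (0.55, False), (0.40, False)]

for threshold in (0.5, 0.7, 0.9):
    p, r = precision_recall(scored, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data the lowest threshold keeps every relevant match but lets irrelevant ones through, while the highest keeps only clean matches and misses most of the relevant ones, which is the trade-off described in the list above.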