Tuning Embeddings & Thresholds
The quality of semantic search in Vedana depends on three things:
- which embeddings model you use;
- what you make embeddable;
- what threshold is set on each embeddable field.
This guide is about practical tuning of those parameters.
Embeddings model
Set globally via EMBEDDINGS_MODEL (default text-embedding-3-large). The vector dimension is set via EMBEDDINGS_DIM (default 1024).
Important:
- Don’t change the dimension on the fly. It requires a SQL migration and recomputing all embeddings.
- If budget matters, text-embedding-3-small is faster and cheaper, but quality is lower.
- Local models (via VertexAI or OpenRouter) can give wildly different quality, so after a model change always run the golden dataset.
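As a rough illustration of the rules above, the sketch below reads the two settings with the defaults named in this guide and reports what a change would require. The helper names `load_embedding_settings` and `needs_reembedding` are hypothetical and are not part of Vedana. The sketch only encodes the two facts stated here: a model change means recomputing embeddings, and a dimension change additionally means a SQL migration.

```python
import os

# Defaults as stated in this guide; the helpers below are hypothetical.
DEFAULT_MODEL = "text-embedding-3-large"
DEFAULT_DIM = 1024


def load_embedding_settings(env=None):
    """Read EMBEDDINGS_MODEL and EMBEDDINGS_DIM, falling back to the defaults."""
    env = os.environ if env is None else env
    model = env.get("EMBEDDINGS_MODEL", DEFAULT_MODEL)
    dim = int(env.get("EMBEDDINGS_DIM", DEFAULT_DIM))
    return model, dim


def needs_reembedding(stored_model, stored_dim, model, dim):
    """Return (recompute, migrate) for a proposed settings change."""
    migrate = stored_dim != dim                    # dimension change: SQL migration
    recompute = migrate or stored_model != model   # either change: recompute all
    return recompute, migrate
```

Running a check like this before deployment makes an accidental dimension change visible before it reaches the database.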
What to make embeddable
| Field type | embeddable | Why |
|---|---|---|
| Entity names | ✅ | Users often write with typos / abbreviations |
| Descriptions, titles | ✅ | Meaning-based search is the whole point of vector search |
| Addresses | ✅ | “store on Nevsky” instead of an exact address |
| FAQ questions | ✅ | The main field for matching |
| Document chunks (content) | ✅ | The whole point of document RAG |
| Numeric values | ❌ | An embedding for “999.00” is meaningless |
| ID, SKU, article numbers | ❌ | Need exact match |
| boolean | ❌ | Two values aren’t enough for vector space |
| dates | ❌ | Semantically meaningless — you need filters/comparisons |
| enum (category, status) | usually ❌ | Better as a link / strictly typed attribute |
| URL, file path | ❌ | No semantics |
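The table can be turned into a small lint step for a data model. The sketch below is only an illustration: `AttributeSpec`, the `dtype` labels, and `lint` are hypothetical names, not Vedana's actual schema. It flags the choices the table advises against, plus a threshold set on a field that is not embeddable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeSpec:  # hypothetical shape, not Vedana's actual schema
    name: str
    dtype: str
    embeddable: bool
    threshold: Optional[float] = None


# Illustrative labels for the field types marked with a cross in the table.
NON_SEMANTIC_DTYPES = {"int", "float", "bool", "date", "id", "url"}


def lint(spec):
    """Return warnings for choices the table advises against."""
    warnings = []
    if spec.embeddable and spec.dtype in NON_SEMANTIC_DTYPES:
        warnings.append(f"{spec.name}: {spec.dtype} fields should not be embeddable")
    if not spec.embeddable and spec.threshold is not None:
        warnings.append(f"{spec.name}: threshold is ignored when embeddable is false")
    return warnings
```

For example, `lint(AttributeSpec("price", "float", True, 0.7))` returns one warning, while a text title with a threshold passes cleanly.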
Picking thresholds
Threshold is the cosine similarity above which a vector search result is considered relevant. Set per attribute in Anchor_attributes / Link_attributes.
Starting values
| Scenario | Start |
|---|---|
| Names / exact identifiers | 0.75–0.85 |
| Descriptions | 0.65–0.75 |
| Document chunks | 0.50–0.65 |
| FAQ | 0.70–0.78 |
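To show what a threshold does in practice, here is a minimal pure-Python sketch. The dictionary keys are illustrative labels, and the values are simply the midpoints of the starting ranges in the table. `is_relevant` is a hypothetical helper, not a Vedana function.

```python
import math

# Midpoints of the starting ranges in the table; the keys are illustrative.
START_THRESHOLDS = {
    "name": 0.80,
    "description": 0.70,
    "document_chunk": 0.575,
    "faq": 0.74,
}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(query_vec, field_vec, field_kind):
    """A result counts as relevant only if similarity exceeds the threshold."""
    return cosine_similarity(query_vec, field_vec) > START_THRESHOLDS[field_kind]
```

The same pair of vectors can pass for one field kind and fail for another: a similarity of about 0.6 clears the document-chunk threshold but not the description threshold.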
Iterative tuning
- Run the golden dataset at current thresholds. Record the Hit Rate and the failure list.
- Failure analysis:
  - False positive (returned an irrelevant result) → threshold is too low.
  - False negative (didn’t find a valid result) → threshold is too high.
- Change one field at a time by ±0.05.
- Re-run the eval. Compare Hit Rate.
- Repeat until the metric plateaus.
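The loop above can be sketched as follows. This is an illustration under stated assumptions rather than Vedana's evaluation code. `golden` is assumed to be a list of `(question, expected_id)` pairs, `search` is a hypothetical callable returning result ids for a question at the given thresholds, and Hit Rate is assumed to mean the share of questions whose expected id is returned.

```python
def hit_rate(golden, search, thresholds):
    """Share of golden questions whose expected id appears in the results."""
    hits = sum(
        1 for question, expected in golden
        if expected in search(question, thresholds)
    )
    return hits / len(golden)


def tune_field(golden, search, thresholds, field, step=0.05):
    """Try one field at -step and +step; keep the value with the best Hit Rate."""
    best = dict(thresholds)
    best_score = hit_rate(golden, search, best)  # baseline before any change
    for delta in (-step, step):
        candidate = dict(thresholds)
        candidate[field] = round(thresholds[field] + delta, 2)
        score = hit_rate(golden, search, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Calling `tune_field` repeatedly, one field per call, until the returned score stops improving corresponds to the plateau in the last step. Hit Rate as defined here does not penalize false positives, so the failure list still needs a manual look.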
Special cases
- One very common term appears in multiple anchors → raise the threshold for that field, otherwise you get permanent conflicts.
- Long texts vs short queries → lower the threshold: the embedding of a long text is “diffuse”.
- Short texts vs long queries → raise the threshold: too easy to land on a coincidental match.
Cosine similarity vs cosine distance
In Vedana we operate in similarity (1 - distance). Closer to 1.0 means a tighter match. In pgvector this is computed via the <=> operator (cosine distance), and the inversion happens on the SQLAlchemy side.
In the code (PGVectorStore.vector_search):
similarity = (1 - rag_anchor_embeddings.c.embedding.cosine_distance(embedding)).label("similarity")
stmt = (
    ...
    .where(similarity > threshold)
    .order_by(distance)
    .limit(top_n)
)
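To make the inversion concrete, here is a plain-Python mirror of the same filter, order, and limit steps. It assumes the rows arrive as `(item_id, cosine_distance)` pairs. `rank_by_similarity` is a hypothetical helper for illustration, not the actual query.

```python
def rank_by_similarity(rows, threshold, top_n=5):
    """rows: iterable of (item_id, cosine_distance) pairs."""
    # similarity = 1 - distance, so a smaller distance is a tighter match
    scored = [(item_id, 1.0 - distance) for item_id, distance in rows]
    # keep only results strictly above the threshold
    kept = [pair for pair in scored if pair[1] > threshold]
    # highest similarity first, which equals lowest distance first
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]
```

Ordering by ascending distance and ordering by descending similarity give the same ranking, so the threshold can be compared in similarity terms while the sort uses distance.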
top_n limits
top_n (default 5) is how many results to return per tool call. Too many results bloat the LLM context and cost extra tokens; too few risk missing the right one.
Start with top_n=5. For document questions with very specific queries, top_n=3 is sometimes enough. For broad exploratory questions, try top_n=10.
Tune at the RagPipeline(top_n=...) level or inside the tool call (if you’ve rewritten the agent).
Budget
| Action | Cost (order of magnitude) |
|---|---|
| Embedding one chunk | ~0.0001 USD on text-embedding-3-large |
| Vector search | free (only Postgres compute) |
| Reprocessing the full dataset | total_chunks × 0.0001 — usually pennies for thousands of documents |
| Changing EMBEDDINGS_MODEL | recompute all embeddings from scratch |
| Changing EMBEDDINGS_DIM | recompute + SQL migration (downtime!) |
The llm_calls_total{model} and llm_usage_prompt_tokens_total{model} metrics will tell you how many embeddings you actually run.
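The reprocessing estimate from the table is a single multiplication. The sketch below uses the table's order-of-magnitude figure as its default; the function name is hypothetical, and real costs depend on chunk length and current provider pricing.

```python
COST_PER_CHUNK_USD = 0.0001  # order-of-magnitude figure from the table


def reprocessing_cost(total_chunks, cost_per_chunk=COST_PER_CHUNK_USD):
    """Rough USD cost of re-embedding every chunk once."""
    return total_chunks * cost_per_chunk
```

For example, 50,000 chunks come to roughly 5 USD per full recompute. That matters mostly when you are switching EMBEDDINGS_MODEL repeatedly during experiments.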
Tuning checklist
- Golden dataset is collected and updated (at least 50 questions).
- Eval was run before changes — you have a baseline.
- One parameter at a time.
- Eval after every change.
- Track historical Hit Rate to see the trend.
- Don’t confuse threshold with embeddable: a field with embeddable=false doesn’t use a threshold at all.
What’s next
- Evaluation — how to measure.
- Customizing Prompts — improve answers beyond retrieval.