
CTO & Co-founder of Visivo
Relations: Why Joins Belong in Your Data Model, Not Every Query
Re-writing the same join in every query is how grain bugs and double-counted revenue happen. Declaring relations once in the model fixes it.

The single most common source of wrong numbers in analytics is not a typo in an aggregation. It is a join, written by hand, slightly differently, in every query that needs it. Declare the relationship between two tables once in your data model and you eliminate an entire class of bug: wrong grain, fan-out, and double-counted revenue all come from joins encoded ad hoc per query. A semantic layer with real relations resolves the correct join for you, automatically, every time.
I want to walk through exactly how repeated joins go wrong, why declaring relationships once is the fix, and how a semantic layer turns a join from a thing you write into a thing the system knows.
The join you have written a hundred times
If you write SQL for a living, you have typed this join more times than you can count:
SELECT ...
FROM orders o
JOIN customers c ON o.customer_id = c.id
It is muscle memory. And that is the problem. Because it lives in your fingers and not in the model, it gets re-typed in every analysis that touches both tables: the revenue report, the cohort query, the churn dashboard, the ad-hoc question a stakeholder asked on Slack. Each copy is a chance to get it subtly wrong, and "subtly wrong" in a join is the most dangerous kind of wrong, because the query runs, returns a plausible number, and nobody notices until the number is in front of someone who cares.
The join condition itself is usually fine. What varies, and what breaks, is the grain the join produces and how the downstream aggregation handles it. Get the cardinality wrong and you do not get an error. You get a number that is too big, confidently presented, on a dashboard people trust.
How repeated joins cause grain and fan-out bugs
Let me make the failure concrete, because "grain bug" sounds abstract until it has cost you a quarter's reported revenue.
Suppose each order has many line_items. You want total revenue per customer, so you join customers to orders to line_items and sum a revenue column. Innocent enough. But the moment you join orders to line_items, the order row fans out: an order with three line items becomes three rows. If your revenue lives at the order level and you sum it across those fanned-out rows, you have just tripled that order's revenue. The query runs. The total is wrong. This is a fan-out join, and it is the textbook way analytics double-counts money.
The mirror image is just as common. Join at the wrong grain in the other direction, deduplicate too aggressively, or join on a non-unique key, and you under-count, silently dropping rows. Either way, the root cause is identical: the person writing the query has to hold the entire join graph and every table's grain in their head, correctly, every single time, under deadline pressure. Humans do not do that reliably. I do not do that reliably, and I have been doing this for a long time.
The reason these bugs are endemic is structural. When the join is re-derived per query, there is no single place that knows the right way to relate two tables. The knowledge lives in tribal memory and in whoever happened to write each query. Multiply that across a team and across months and you do not have a data model. You have a folklore.
Declaring relationships once
The fix is the same move software engineering made decades ago with shared functions, and the same move the warehouse made with dbt™ models: stop copy-pasting the logic and define it once where everyone reads from it.
A relation is a named, declared statement of how two models connect. You write it a single time, in the model, as code:
relations:
- name: orders_to_customers
condition: ${ref(orders).customer_id} = ${ref(customers).id}
- name: orders_to_line_items
condition: ${ref(orders).id} = ${ref(line_items).order_id}
That is it. The relationship between orders and customers is now a fact about your project, not a string buried in forty queries. When someone needs revenue by customer, they do not write the join. The join already exists, declared once, reviewed once, correct once. Updating it, say the key changes during a migration, is a one-line edit that propagates everywhere, instead of a search-and-replace across every dashboard you can remember to check.
Because the relation is code in your repository, it gets the same treatment as everything else worth trusting: it goes through a pull request, a teammate reviews it, and the change lands in your Git history. This is the BI-as-code principle applied to the join graph itself. The shape of your data becomes a reviewable artifact rather than an oral tradition.
How a semantic layer resolves joins for you
Declaring the relation is half the value. The other half is what the semantic layer does with it: it resolves the correct join per query, automatically, including the grain handling that humans get wrong.
When you ask for a metric from one model split by a dimension from another, the layer walks the declared relations to find the path between them and constructs the join, then aggregates at the right grain so the metric does not fan out. You asked for revenue by customer segment; the layer knows revenue lives on orders, segment lives on customers, and orders_to_customers connects them, so it builds exactly the query that produces the right number. You never wrote a join, so you never wrote the wrong one.
This is the difference between a semantic layer that documents joins and one that enforces them. Documentation is a wiki page nobody reads at 5pm on a Friday. Enforcement is the system refusing to let you express the question incorrectly in the first place. The best semantic layers do the latter: they make the correct join the only join you can get, which is the same reason a single definition of a metric is worth so much more than a documented one.
Safer metrics across multiple tables
Relations are what make cross-table metrics safe, and cross-table metrics are most of the interesting ones. Revenue per customer, items per order, sessions per signup: every one of these spans tables, and every one of them is a fan-out bug waiting to happen if you hand-roll the join.
With relations declared, a metric can compose across models and trust the layer to assemble it correctly:
models:
- name: orders
sql: "SELECT * FROM orders"
metrics:
- name: net_revenue
expression: "SUM(amount - refunds - discounts)"
- name: customers
sql: "SELECT * FROM customers"
metrics:
- name: active_count
expression: "COUNT(DISTINCT id)"
insights:
- name: revenue-per-customer-by-segment
props:
type: bar
x: ?{ ${ref(customers).segment} }
y: ?{ ${ref(orders).net_revenue} }
interactions:
- split: ?{ ${ref(customers).segment} }
net_revenue lives on orders, the segment lives on customers, and the orders_to_customers relation lets the Insight pull from both without anyone writing the join or worrying about the grain. Centralizing the relationship means correctness is inherited, not re-earned per query. And when the data model changes, you fix the relation once and every metric built on it is correct on the next run.
Relations in Visivo
Relations shipped in Visivo as part of the semantic-layer release we recapped last week, and they work exactly as described above. You declare a relation with a condition that references the two models, and from then on Insights resolve the join for you whenever a metric and a dimension live on different models. You write the relationship once; the layer writes the SQL.
The practical effect on a team is hard to overstate. The grain bugs that used to surface in a board meeting stop happening, because the path that produced them no longer exists. New analysts become productive faster, because they do not need to memorize the join graph before they can ask a cross-table question. And every relation, like every metric, is text in Git: reviewed, tested, and versioned alongside the rest of your project.
You can see relations composing metrics across models in the examples gallery, and /get-started walks you from install to your first relation in a few minutes. The full reference for models, metrics, dimensions, and relations lives in the Visivo documentation.
The next time a revenue number looks too high, before you debug the aggregation, check the join. Then move it into the model so you never have to check it again.
Previously in Visivo
This is a deep dive on a feature from the release we covered in Visivo v1.0.76-78: the semantic layer lands. For the broader case that a metric's definition belongs in one shared place, read why your metrics belong in a semantic layer.