CEO & Co-founder of Visivo

What Is GenBI? Generative Analytics and Why It Needs a Semantic Layer

GenBI lets people ask questions in plain language and get real charts back. It only works when a governed semantic layer keeps the AI from guessing.

A natural language question producing a governed chart through a semantic layer

GenBI, short for generative BI, is analytics where you ask a question in plain language and get a real chart or answer back, generated by an AI model rather than hand-built by an analyst. It is the natural-language front door to your data. But GenBI is only trustworthy when a governed semantic layer sits between the AI and the warehouse, because the layer is what stops the model from guessing at what your metrics mean. Plain language is the easy part. Returning the correct number is the hard part, and that is where most GenBI systems either succeed or quietly fail.

The broader direction of the industry is toward AI-driven, agentic, and composable analytics, and natural language is becoming a primary way people interact with data. dbt™ Labs and others have made the case that by 2026 a large share of data questions will start as a sentence rather than a SQL query. The opportunity is real. So is the failure mode, and understanding the difference is the whole point of this post.

GenBI: the one-line definition

GenBI is the application of generative AI to business intelligence, so that a person can express an information need in natural language and receive a visualization or a direct answer without writing SQL or building a chart by hand.

"Show me revenue by region for the last four quarters." In a GenBI system, that sentence becomes a query, the query runs against your data, and the result comes back as a trended chart, all without a human touching a query editor. The model does the translation from intent to query to visualization.

What makes this a category rather than a feature is the shift in who builds the analysis. In traditional BI, an analyst translates a business question into SQL and a chart. In GenBI, the model does that translation, and the human stays in natural language the whole way through. When it works, it collapses the distance between having a question and seeing an answer from days to seconds. The promise is genuinely large, which is exactly why the reliability question matters so much.

Why the hype outran the reliability

The first wave of "ask your data anything" demos was dazzling and, in production, often unreliable. The pattern was familiar: a slick demo on a clean, tiny dataset, followed by disappointing results the moment it met a real company's warehouse.

The reason is that the early approach was usually text-to-SQL with no governance in between. You point a large language model at your raw database schema, hand it a question, and ask it to write SQL. On a five-table demo schema, this looks magical. On a real warehouse with three hundred tables, four columns that all sound like "revenue," and a dozen ways to define an "active" customer, it falls apart. The model has no way to know which "revenue" you mean, so it picks one, writes confident SQL, and returns a number that is plausible, well-formatted, and wrong.

That last property is the dangerous one. A GenBI system that returns an obviously broken answer does little harm, because the user distrusts it. A GenBI system that returns a plausible wrong answer is corrosive, because the user believes it and acts on it. The hype outran reliability not because the language understanding was bad (the models are excellent at parsing intent) but because the systems had no governed definition of what the business actually means by its own metrics. The model was guessing, and to the person reading the result, guessing dressed up in clean SQL is indistinguishable from knowing.
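
The ambiguity described above can be made concrete with a small sketch. It assumes a hypothetical `sales` table with four revenue-like columns and made-up values, held in an in-memory SQLite database. Every candidate column produces a valid query and a plausible total, and nothing in the schema says which one the business means.

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        gross_revenue REAL, net_revenue REAL,
        booked_revenue REAL, recognized_revenue REAL
    );
    INSERT INTO sales VALUES (100.0, 90.0, 120.0, 60.0), (200.0, 180.0, 200.0, 150.0);
""")

# Every column whose name matches the word in the question is a candidate.
columns = [row[1] for row in conn.execute("PRAGMA table_info(sales)")]
candidates = [c for c in columns if "revenue" in c]

# Each candidate yields runnable SQL and a well-formed number.
totals = {
    c: conn.execute(f"SELECT SUM({c}) FROM sales").fetchone()[0]
    for c in candidates
}
print(totals)
```

All four queries succeed and all four totals differ, so picking by column name alone is a guess with a three-in-four chance of being wrong in this toy schema.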

Plain language is easy, correct SQL is hard

It is worth being precise about where the difficulty actually lives, because it is not where people assume.

Modern language models are very good at the linguistic half. They reliably parse "revenue by region last four quarters" into the components of intent: a measure (revenue), a dimension (region), a time grain and range (four quarters). Understanding the sentence is close to solved.
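
One way to picture the output of that linguistic half is as a small structured record. The sketch below is illustrative only: the `QueryIntent` class and its field names are hypothetical, and the instance is written by hand as a stand-in for what a model might extract from the sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryIntent:
    """Structured form of a natural-language analytics question (hypothetical)."""
    measure: str     # what to aggregate, e.g. "revenue"
    dimension: str   # what to group by, e.g. "region"
    time_grain: str  # bucket size, e.g. "quarter"
    periods: int     # how many buckets back to include


# Hand-written stand-in for "revenue by region last four quarters".
intent = QueryIntent(measure="revenue", dimension="region",
                     time_grain="quarter", periods=4)
print(intent)
```

Note what the record does not contain: any table name, column name, or join key. It captures what was asked, not where the answer lives.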

The hard half is mapping that parsed intent onto your specific data correctly. Which physical column is "revenue," and is it gross or net? Does it already exclude refunds, or do you subtract them? What counts as "region," the billing country or the shipping region? Which tables join to produce a customer-level view, and on which keys, so the aggregate does not fan out and double-count? These are not language questions. They are questions about the structure and meaning of one company's data, and the model cannot answer them by reading column names, because column names lie. A column called rev might be the wrong one; the right number might require subtracting values from another table that the model never thought to join.

This is the gap. The model knows what you asked. It does not, on its own, know what your business means, and the cost of that gap is a confidently wrong query. Closing it is not a matter of a better model. It is a matter of giving the model a governed map of your business, which is exactly what a semantic layer is.
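
The fan-out risk mentioned above is easy to reproduce. The sketch below uses hypothetical `orders` and `order_items` tables with made-up values in an in-memory SQLite database. Both queries are valid SQL, but only one aggregates at the right grain.

```python
import sqlite3

# Hypothetical tables and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    -- Order 1 has two line items, order 2 has one.
    INSERT INTO order_items VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

# Correct: aggregate at the grain of the orders table.
correct_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Fan-out: the join repeats each order's amount once per line item,
# so order 1's 100.0 is counted twice.
fanned_out_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.id
""").fetchone()[0]

print(correct_total, fanned_out_total)  # 150.0 250.0
```

Nothing about the second query looks wrong on the page, which is why the correct join path has to be declared somewhere rather than inferred.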

The semantic layer is GenBI's guardrail

A semantic layer is the centralized, governed definition of your metrics, dimensions, and the relationships between your data models, expressed in business terms. "Net revenue" is defined once, as an exact expression. "Active customer" has one canonical definition. The joins between orders, customers, and products are declared, with the correct keys, so an aggregate is computed the way your team agreed it should be.

When you put that layer between the AI and the warehouse, the entire problem changes shape. The model no longer translates a question into raw SQL against three hundred ambiguous tables. It translates the question into a request against a small, governed set of named metrics and dimensions. Instead of guessing which column means revenue, it selects the net_revenue metric, whose expression was written and reviewed by humans who know what it means. The semantic layer executes the actual query using the governed definition, so the number that comes back is the number your finance team already agreed on.
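
A minimal sketch of that lookup-instead-of-guess step follows, assuming a toy in-memory registry. The `METRICS` and `DIMENSIONS` dictionaries and the `compile_query` function are hypothetical names for illustration, not the API of any particular semantic layer; the expressions reuse the net_revenue and region definitions shown later in this post, and the rows are made-up data.

```python
import sqlite3

# Hypothetical governed definitions: each name maps to one reviewed expression.
METRICS = {"net_revenue": "SUM(amount) - SUM(refund_amount)"}
DIMENSIONS = {"region": "billing_region"}
SOURCE_TABLE = "orders_table"


def compile_query(metric: str, dimension: str) -> str:
    """Build SQL from governed definitions only; unknown names raise KeyError."""
    metric_sql = METRICS[metric]
    dimension_sql = DIMENSIONS[dimension]
    return (
        f"SELECT {dimension_sql} AS {dimension}, {metric_sql} AS {metric} "
        f"FROM {SOURCE_TABLE} GROUP BY {dimension_sql} ORDER BY {dimension_sql}"
    )


# Made-up rows so the compiled query can actually run.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_table (billing_region TEXT, amount REAL, refund_amount REAL);
    INSERT INTO orders_table VALUES
        ('EMEA', 100.0, 10.0), ('EMEA', 50.0, 0.0), ('APAC', 80.0, 5.0);
""")

sql = compile_query("net_revenue", "region")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)  # [('APAC', 75.0), ('EMEA', 140.0)]
```

In this arrangement the model's only job is to choose the names `net_revenue` and `region`. The SQL expression itself never passes through the model, so it cannot be improvised.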

The effect on accuracy is not marginal. Research on natural-language querying has found dramatic accuracy improvements when a semantic layer guides the model rather than letting it generate SQL from scratch, with one widely cited study reporting roughly a 72.5 percentage point gain in answer accuracy when the AI queried through a semantic layer instead of writing raw SQL. The intuition behind the number is simple: you replaced "guess what the business means" with "look up what the business defined." The model stays in its zone of competence, understanding the question, and the semantic layer handles the part the model is bad at, knowing the truth.

This is also why the prior posts in this series matter here. A semantic layer is only trustworthy if its definitions are correct and stable, which is what a single source of truth for metrics, a Git review workflow, and testing dashboards like software provide. GenBI is the payoff at the top of a stack that has to be governed all the way down.

What good GenBI looks like in practice

Strip away the demos and a GenBI system you can actually trust has a recognizable shape.

It is grounded in a governed semantic layer, not pointed at raw tables. Every answer traces back to a defined metric, not an improvised SQL expression. It is transparent about what it computed: a good system shows you the metric it selected and the query it ran, so a skeptical user can verify rather than trust blindly. It declines gracefully when a question maps to something undefined, saying "there is no governed definition of that" instead of inventing one. And it produces the same answer the dashboard would, because both read from the same layer, so the number in the AI chat matches the number in the executive report rather than quietly diverging.

That last property is where GenBI and headless BI meet. The AI assistant is just another consumer of the metrics layer, the same way a dashboard or an embedded chart is. Define the metric once, and the chatbot, the dashboard, and the embedded widget all return the identical number, because they are all asking the same governed source. GenBI done right is not a separate magic system. It is one more head on a well-built metrics layer, and the layer is what makes the magic reliable instead of merely impressive.
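
Two of the properties above, transparency and declining gracefully, can be sketched in a few lines. The `Answer` class, the `answer` function, and the `GOVERNED_METRICS` registry are hypothetical names chosen for illustration, not part of any real product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry of governed metric expressions.
GOVERNED_METRICS = {"net_revenue": "SUM(amount) - SUM(refund_amount)"}


@dataclass(frozen=True)
class Answer:
    metric: Optional[str]  # governed metric selected, or None if declined
    sql: Optional[str]     # exact query, exposed so a user can verify it
    message: str


def answer(requested_metric: str) -> Answer:
    """Answer only from governed definitions; decline anything undefined."""
    expression = GOVERNED_METRICS.get(requested_metric)
    if expression is None:
        return Answer(None, None,
                      f"There is no governed definition of '{requested_metric}'.")
    sql = f"SELECT {expression} AS {requested_metric} FROM orders_table"
    return Answer(requested_metric, sql,
                  f"Computed from governed metric '{requested_metric}'.")


ok = answer("net_revenue")
declined = answer("customer_happiness")
print(ok.sql)
print(declined.message)
```

Because a dashboard and a chat interface would both go through the same registry, they would build the same expression and return the same number. The refusal path is what keeps an undefined request from turning into an invented metric.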

Where Visivo is headed

Visivo's semantic layer, the metrics, dimensions, and relations you define in code, is precisely the governed context an AI assistant needs to answer questions correctly. Today that layer powers dashboards and interactive Insights from a single set of definitions. The same definitions are exactly what a generative interface would query to stay grounded.

models:
  - name: orders
    sql: SELECT * FROM orders_table
    metrics:
      # Defined once; every consumer reuses this expression.
      - name: net_revenue
        expression: "SUM(amount) - SUM(refund_amount)"
        description: "Gross revenue minus refunds, per finance definition"
    dimensions:
      # "region" resolves to the billing region column.
      - name: region
        expression: "billing_region"

relations:
  # Assumes a customers model is defined elsewhere in the project.
  - name: orders_to_customers
    join_type: inner
    condition: "${ref(orders).customer_id} = ${ref(customers).id}"

A question like "net revenue by region" has an unambiguous answer here, because net_revenue is defined, region is defined, and the join is declared. There is nothing for a model to guess. We are building toward a world where that governed foundation is what every interface, including the conversational ones, reads from, so the answer is always the one your team already agreed on. We are not claiming a finished AI feature today; we are saying the groundwork that makes generative analytics trustworthy is the semantic layer, and that is what we build.

If you want to understand the layer GenBI depends on, start with our post on the semantic layer for AI-ready analytics, then get started by defining your first governed metric in a few minutes.

Previously in Visivo

This is the start of a short arc on AI and analytics. Previously we covered testing your dashboards like software, the practice that keeps the metric definitions underneath GenBI correct in the first place. A wrong definition produces a wrong AI answer just as surely as a wrong dashboard, which is why the governed, tested foundation comes before the generative front door.
