
CTO & Co-founder of Visivo
The Rise of In-Process Analytics: Why DuckDB Is Eating Local BI
In-process analytics engines like DuckDB are replacing heavyweight clusters for the workloads most teams actually have. Here is why, and what it means for local dashboards.

In-process analytics means the database runs inside your application's own process, with no separate server, no network round-trip, and no auth handshake between you and your data. DuckDB is the engine making this mainstream, and it is eating local BI for a simple reason: hardware has outpaced dataset growth for most teams, so the cluster you were told you needed is now overkill for the few million rows you actually query. This post is about why that shift is happening now and what it means for how you build local dashboards.
We have written a hands-on, performance-focused piece on building DuckDB dashboards in Visivo before. This one is the why-now: the market trend behind the engine, and why an embedded OLAP database is quietly becoming the default substrate for local analytics.
What in-process analytics actually means
For most of the last twenty years, "analytics database" meant a server. You provisioned a cluster, it ran somewhere over the network, your tool authenticated to it, and queries traveled there and back. That architecture exists for a real reason: when your data genuinely does not fit on one machine, you need many machines coordinating.
In-process analytics flips that default. The database is a library you import, and it runs inside the same process as your application or your dashboard tool. DuckDB is the canonical example: it is an embedded OLAP engine, the analytics-shaped counterpart to what SQLite is for transactional workloads. There is no server to stand up, no port to open, no network hop, and no credentials to manage between your code and the engine. You hand it SQL, it runs against columnar data in memory or on local disk, and it hands back results.
The phrase that captures it is "the database is where your code is." That sounds like a minor architectural detail. In practice it changes the economics and the developer experience of an entire class of workloads.
Why clusters are overkill for most workloads
The uncomfortable truth the cluster era papered over: most analytical workloads are not big. They are medium, and medium fits on a laptop now.
Hardware kept improving while the datasets most teams actually query did not grow nearly as fast. A modern machine has tens of gigabytes of RAM and many cores. A columnar engine that vectorizes its execution and only reads the columns a query touches can often chew through tens or hundreds of millions of rows on that hardware in well under a second for simple aggregations. The dashboard that drives a weekly business review is very often built on a few million rows of aggregated data. You do not need a distributed cluster for that. You need one capable machine and a good query engine, and you already have both.
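A back-of-envelope calculation shows why "medium fits on a laptop." The sketch below estimates the uncompressed bytes a columnar scan touches. The workload (5 million rows, 20 columns, 3 of them read by the query) and the flat 8 bytes per value are assumptions chosen for illustration, not measurements.

```python
def columnar_footprint_bytes(rows: int, columns_read: int, bytes_per_value: int = 8) -> int:
    """Uncompressed bytes for `columns_read` columns of `rows` values each.

    Assumes fixed-width 8-byte values (e.g. BIGINT or DOUBLE); real data
    is usually compressed and often smaller than this estimate.
    """
    return rows * columns_read * bytes_per_value


# Hypothetical workload: 5 million rows, 20 columns, query touches 3 of them.
full_table = columnar_footprint_bytes(5_000_000, 20)
query_scan = columnar_footprint_bytes(5_000_000, 3)

print(f"full table: {full_table / 1e9:.2f} GB")  # full table: 0.80 GB
print(f"query scan: {query_scan / 1e9:.2f} GB")  # query scan: 0.12 GB
```

Even the whole table is a small fraction of the RAM on a typical developer machine, and a columnar engine only has to read the second, smaller number.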
This is where the cost argument becomes hard to ignore. An always-on cluster bills you whether or not anyone is running a query, and it carries an operational tax: someone has to size it, secure it, patch it, and explain the bill. There is a carbon dimension too. Keeping a fleet of machines warm to serve workloads that a single node could handle is wasteful in both dollars and energy. When the marginal query could have run locally at effectively no extra cost, a standing cluster is overkill, and increasingly teams are noticing.
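The "bills you whether or not anyone is running a query" point is simple arithmetic. The sketch below uses invented inputs: 6 hours of actual query activity per week and a $4.00 hourly rate are hypothetical, so substitute your own numbers.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def idle_share(active_hours_per_week: float) -> float:
    """Fraction of the week an always-on cluster runs with no queries."""
    return 1 - active_hours_per_week / HOURS_PER_WEEK


def weekly_idle_cost(hourly_rate: float, active_hours_per_week: float) -> float:
    """Spend attributable to idle hours, given a flat hourly rate."""
    return hourly_rate * (HOURS_PER_WEEK - active_hours_per_week)


# Hypothetical inputs: 6 busy hours per week at $4.00 per hour.
share = idle_share(6)
cost = weekly_idle_cost(4.0, 6)

print(f"idle share: {share:.1%}")      # idle share: 96.4%
print(f"idle cost: ${cost:.2f}/week")  # idle cost: $648.00/week
```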
None of this means clusters are obsolete. Genuinely large data, high-concurrency serving, and shared production warehouses are real and important. The point is narrower and it is the one that matters: for the local, iterative, single-developer or single-team workloads that make up the bulk of day-to-day analytics, the cluster was always more than the job required.
The developer-experience advantage
Cost is the argument that gets executives nodding. Developer experience is the argument that actually changes behavior, and in-process analytics wins on it decisively.
Iteration is instant. When the engine is in your process and the data is on local disk, there is no network latency between a query and its result. You change a definition, re-run, and see the answer immediately. That tight loop is the difference between exploring your data and waiting on it. Fast feedback is the single biggest multiplier on analytical productivity, and removing the network is the cleanest way to get it.
Setup is nothing. There is no server to provision before you can ask your first question. You install a library, point it at a file, and query. For onboarding a teammate or spinning up a clean environment, "nothing to set up" is a feature that compounds every single time.
Deployment is simple. An in-process engine has no separate service to deploy, monitor, or keep available. The database ships with the application. That collapses an entire tier out of your architecture, which means fewer things to break and fewer things to reason about when something does.
It reads your files where they are. DuckDB queries CSV, Parquet, and JSON files directly, often without an explicit load step. Your data does not have to migrate into a proprietary store before you can work with it. It stays in open formats you can version, share, and inspect with other tools.
The cumulative effect is that analytics starts to feel like the rest of modern development: fast, local, reproducible, and yours. That is a profound change from the "submit a query, wait, hope the cluster is healthy" rhythm of the server era.
Where DuckDB fits in a modern stack
The honest framing is that DuckDB does not replace your warehouse. It complements it, and it owns a layer the warehouse was always clumsy at.
DuckDB is production-ready and the ecosystem around it has grown fast. It pairs naturally with Parquet as a columnar storage format, with dataframe libraries like Polars for in-memory transformation, and with emerging table-format work like DuckLake for managing larger local and lakehouse-style datasets. That ecosystem momentum matters, because it means choosing an in-process engine is no longer a bet on an experiment. It is adopting a tool with a healthy, expanding orbit of compatible pieces.
In a real stack, the division of labor tends to look like this. Your warehouse or lakehouse remains the system of record and the place heavy, shared, production transformations run. DuckDB becomes the engine for everything local and iterative: the analyst exploring a Parquet extract, the dashboard built on a curated slice of data, the CI job that needs to run a query without provisioning infrastructure, the prototype that does not justify a cluster yet. The two coexist. The trend is simply that more of the work that used to default to the cluster now sensibly lives in-process, and a lot of the modern data stack is being re-thought around that.
Building local dashboards on DuckDB with Visivo
This is exactly the workload Visivo is built for, and DuckDB is a first-class source in it. You point a Visivo source at a DuckDB database or a set of Parquet files, define your metrics and dimensions in the semantic layer, and build charts as Insights on top:
sources:
  - name: local-duckdb
    type: duckdb
    database: analytics.duckdb
models:
  - name: events
    source: ${ref(local-duckdb)}
    sql: "SELECT * FROM events"
    metrics:
      - name: weekly_active
        expression: "COUNT(DISTINCT user_id)"
    dimensions:
      - name: event_week
        expression: "DATE_TRUNC('week', occurred_at)"
insights:
  - name: weekly-active-users
    props:
      type: line
      x: ?{ ${ref(events).event_week} }
      y: ?{ ${ref(events).weekly_active} }
Run visivo serve and the whole thing executes locally: DuckDB runs in-process against your data, Visivo compiles your Insight, and the viewer renders it with hot reload. There is no warehouse to connect to and no cluster to wait on, so the iteration loop is as fast as the in-process model promises. Because the definition is a file, it is reviewable, testable, and reproducible like any other code. That is the in-process advantage and the BI-as-code advantage working together: a fast local engine under a governed, version-controlled definition.
If you want to try it, /get-started gets you from install to a running dashboard in minutes, the examples gallery has live projects to run locally, and the DuckDB source is documented in full at docs.visivo.io.
Previously in Visivo
Last week's perspective, BI-as-code vs GUI BI in 2026, made the case for a code-first, hybrid analytics practice. In-process analytics is the engine layer that makes the local, iterative half of that practice fast. For the hands-on performance walkthrough, see building DuckDB dashboards in Visivo.