Why We Split the Game in Two — DDD + Data-Driven in a Bevy Project

The problem we were about to have

Imagine you want to add a sawmill. Not a complicated one — just something that eats wood and spits out planks.

Before the refactor, here's roughly what that looked like: a new variant in components.rs for the sawmill's state, a new arm in the state machine in state.rs, a tweak to movement.rs because workers need to know what to do near it, two edits in lib.rs to register the resource and the plugin, and probably a new system file for the production loop because the existing one was hardcoded for the woodcutter. Five files, six edits, and the sawmill itself — the fact that it consumes 2 wood and produces 1 plank in 5 seconds — was scattered across all of them.

The tedious part isn't any single edit. It's the context-switching, and the silent fear that you forgot one of the six places. Multiply that by the dozens of buildings we want to have, and the cost stops being "annoying" and becomes "we'll quietly stop adding buildings." Not by decision. Just by erosion.

So we restructured early, while the codebase was still small enough to move cleanly.

Two patterns, not one

What we ended up with isn't really "DDD" alone. It's two patterns stacked:

DDD — split the simulation rules from the engine that runs them. This is a structural thing: where files live, what depends on what.
Data-driven content — describe new things (buildings, jobs) as data instead of code. This is a content thing: how you add a new building tomorrow without touching any system.

They're independent. You can do DDD without data-driven (each building is still its own hardcoded type). You can do data-driven without DDD (registries living inside the Bevy layer). But together they compound: DDD gives you a place to put the data, and data-driven gives you a reason to want that place.

DDD: split "what the game is" from "how Bevy runs it"

The part of Domain-Driven Design we lean on (in any context, not just games) is the isolated domain layer: your business logic shouldn't be entangled with your infrastructure. In web dev, that means separating database queries from business rules. In a Bevy game, it means separating "what a sawmill does" from "how Bevy's ECS renders and ticks it."

We now have two distinct layers:

packages/domain/ — pure Rust. No Bevy. No ECS. No rendering. Just data structures, rules, and algorithms. A BuildingDef describes a building. A JobDef describes a job. Inventory::transfer() enforces that you can't take what isn't there. This layer doesn't know what a Transform is and never will.

packages/core/ — Bevy infrastructure. This is where systems live, where ECS queries happen, where things get drawn. Its job is to read domain state, call domain logic, and write the results back. It's the adapter between the game world and the simulation rules.

The isolation is enforced by the Rust compiler itself: packages/domain/Cargo.toml has no Bevy dependency. If someone accidentally writes use bevy::* inside a domain file, it's a compile error. Not a code review comment. Not a convention. A hard failure.

In practice, the boundary looks boring. A Bevy system queries the ECS, hands the relevant data to a domain function, and writes the result back. Something like: read Inventory components → call Inventory::transfer() from the domain → write the updated values into ECS. The domain function doesn't know what an entity is. The Bevy system doesn't know what the rules are. They meet at a function signature.
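
To make the domain side of that boundary concrete, here is a minimal sketch of what a transfer function could look like in pure Rust. The HashMap storage, the TransferError type, the count and add helpers, and the exact signature of Inventory::transfer are assumptions made for illustration, not the project's real code.

```rust
use std::collections::HashMap;

/// Resource kinds used in this sketch; the real enum lives in the domain crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ResourceType {
    Wood,
    Plank,
}

/// Hypothetical error type: the source holds fewer units than requested.
#[derive(Debug, PartialEq, Eq)]
pub enum TransferError {
    Insufficient { requested: u32, available: u32 },
}

/// Assumed storage layout: one counter per resource kind.
#[derive(Debug, Default)]
pub struct Inventory {
    items: HashMap<ResourceType, u32>,
}

impl Inventory {
    /// Units of `resource` currently held (0 if never stored).
    pub fn count(&self, resource: ResourceType) -> u32 {
        self.items.get(&resource).copied().unwrap_or(0)
    }

    /// Adds `amount` units of `resource`.
    pub fn add(&mut self, resource: ResourceType, amount: u32) {
        *self.items.entry(resource).or_insert(0) += amount;
    }

    /// Moves `amount` units from `from` to `to`.
    /// On failure, neither inventory is modified.
    pub fn transfer(
        from: &mut Inventory,
        to: &mut Inventory,
        resource: ResourceType,
        amount: u32,
    ) -> Result<(), TransferError> {
        let available = from.count(resource);
        if available < amount {
            return Err(TransferError::Insufficient {
                requested: amount,
                available,
            });
        }
        from.items.insert(resource, available - amount);
        to.add(resource, amount);
        Ok(())
    }
}
```

Nothing here mentions entities, components, or queries. A Bevy system would fetch the two inventories from the ECS, call the function, and react to the Result.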

Data-driven: new content is data, not code

DDD on its own doesn't fix the "five files for a sawmill" problem — it just gives the simulation rules a clean home. The actual fix is the second pattern: representing buildings and jobs as data that a generic system reads at runtime, instead of as types that need their own dedicated systems.

Here's what adding a sawmill looks like now:

pub const BUILDING_SAWMILL: BuildingId = 3;

registry.register(BuildingDef {
    id: BUILDING_SAWMILL,
    name: "Sawmill",
    footprint: vec![GridPos::new(0, 0), GridPos::new(1, 0)],
    worker_slots: 2,
    recipe: Some(Recipe {
        inputs: vec![(ResourceType::Wood, 2)],
        output: (ResourceType::Plank, 1),
        duration_secs: 5.0,
    }),
    storage_capacity: 20,
});

That's it. One ID constant and one registry call. A single generic production system handles all buildings that have a recipe. You're not writing a new system. You're not adding a new component. You're describing a thing, and the existing machinery runs it.
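
For a sense of what "generic" means here, the following is a rough sketch of a recipe tick that works for any building. The Recipe fields mirror the registration above; ProductionState, its fields, and tick_production are hypothetical names, and the sketch ignores storage_capacity and worker slots entirely.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ResourceType {
    Wood,
    Plank,
}

/// Same shape as the recipe in the registration example.
pub struct Recipe {
    pub inputs: Vec<(ResourceType, u32)>,
    pub output: (ResourceType, u32),
    pub duration_secs: f32,
}

/// Hypothetical per-building runtime state (not part of the original design).
#[derive(Default)]
pub struct ProductionState {
    pub stock: HashMap<ResourceType, u32>,
    pub elapsed_secs: f32,
    pub running: bool,
}

impl ProductionState {
    pub fn count(&self, resource: ResourceType) -> u32 {
        self.stock.get(&resource).copied().unwrap_or(0)
    }
}

/// Advances one building by `dt_secs`. Returns true on the tick
/// in which a production cycle completes.
pub fn tick_production(recipe: &Recipe, state: &mut ProductionState, dt_secs: f32) -> bool {
    if !state.running {
        // Start a cycle only if every input is in stock.
        let can_start = recipe.inputs.iter().all(|(r, n)| state.count(*r) >= *n);
        if !can_start {
            return false;
        }
        for (r, n) in &recipe.inputs {
            *state.stock.entry(*r).or_insert(0) -= *n;
        }
        state.running = true;
        state.elapsed_secs = 0.0;
    }

    state.elapsed_secs += dt_secs;
    if state.elapsed_secs >= recipe.duration_secs {
        let (resource, amount) = recipe.output;
        *state.stock.entry(resource).or_insert(0) += amount;
        state.running = false;
        return true;
    }
    false
}
```

The function never asks which building it is ticking. A sawmill and any later recipe building differ only in the Recipe value passed in.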

Same idea for jobs. The woodcutter used to be hardcoded logic scattered across multiple systems. Now it's expressed as a sequence of phases:

JobDef {
    id: JOB_WOODCUTTER,
    name: "Woodcutter",
    phases: vec![
        JobPhase::GoTo(JobTarget::NearestResource(ResourceType::Wood)),
        JobPhase::Work { duration_secs: 1.2 },
        JobPhase::PickUp { resource: ResourceType::Wood, amount: 1 },
        JobPhase::GoTo(JobTarget::Hub),
        JobPhase::Deliver(JobTarget::Hub),
    ],
}

Adding the miner cost zero new systems. Zero new components. Just a new JobDef with different phases. The job executor doesn't care which job it's running — it reads the phases and executes them. (We know this works because the miner already exists — it was the first job we added after the woodcutter, and it was genuinely a single registry call.)

The "magic" of the executor isn't really magic. It's a match statement, more or less:

match current_phase {
    JobPhase::GoTo(target) => move_toward(target),
    JobPhase::Work { duration_secs } => tick_timer(duration_secs),
    JobPhase::PickUp { resource, amount } => take_from_world(resource, amount),
    JobPhase::Deliver(target) => drop_at(target),
}

That's the whole trick. Every phase variant has one piece of code that knows how to run it. Add a new phase variant, and you write the one branch that handles it — but you do that once, not once per job. New jobs that only combine existing phases cost nothing.
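
As a sketch of how the executor might step through phases without knowing which job it runs, here is a self-contained version with the world interaction stubbed out. JobProgress, tick_job, phase_done, the at_target flag, and the miner's phase list (including ResourceType::Stone and its 1.5 second work time) are all assumptions for illustration. In the real game the movement and inventory systems would supply that information.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ResourceType {
    Wood,
    Stone,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobTarget {
    NearestResource(ResourceType),
    Hub,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobPhase {
    GoTo(JobTarget),
    Work { duration_secs: f32 },
    PickUp { resource: ResourceType, amount: u32 },
    Deliver(JobTarget),
}

/// Hypothetical per-citizen progress through the current job.
#[derive(Debug, Default)]
pub struct JobProgress {
    pub phase_index: usize,
    pub phase_elapsed_secs: f32,
}

/// One branch per phase variant. PickUp and Deliver are treated
/// as instantaneous in this sketch.
fn phase_done(phase: &JobPhase, elapsed_secs: f32, at_target: bool) -> bool {
    match phase {
        JobPhase::GoTo(_) => at_target,
        JobPhase::Work { duration_secs } => elapsed_secs >= *duration_secs,
        JobPhase::PickUp { .. } | JobPhase::Deliver(_) => true,
    }
}

/// Advances the job by one tick and loops back to the first phase at the end.
/// `at_target` stands in for whatever the movement system would report.
pub fn tick_job(phases: &[JobPhase], progress: &mut JobProgress, dt_secs: f32, at_target: bool) {
    let Some(phase) = phases.get(progress.phase_index) else {
        return; // empty job or stale index: nothing to do
    };
    progress.phase_elapsed_secs += dt_secs;
    if phase_done(phase, progress.phase_elapsed_secs, at_target) {
        progress.phase_index = (progress.phase_index + 1) % phases.len();
        progress.phase_elapsed_secs = 0.0;
    }
}

/// A guess at what the miner could look like: same variants, different data.
pub fn miner_phases() -> Vec<JobPhase> {
    vec![
        JobPhase::GoTo(JobTarget::NearestResource(ResourceType::Stone)),
        JobPhase::Work { duration_secs: 1.5 },
        JobPhase::PickUp { resource: ResourceType::Stone, amount: 1 },
        JobPhase::GoTo(JobTarget::Hub),
        JobPhase::Deliver(JobTarget::Hub),
    ]
}
```

The only job-specific part is the list returned by miner_phases. Everything else is shared with the woodcutter.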

The citizen scheduler, briefly

Citizens aren't just passive workers. Each one has a priority list, RimWorld-style. Given [(Builder, High), (Woodcutter, Normal), (Hauler, Low)], a citizen will build if there's construction to do, cut wood if there isn't, and haul if neither is available.

The scheduler is a pure function in the domain layer. It takes a citizen's priorities and a trait object (WorkAvailability) that answers questions like "are there trees left?" — that trait is implemented by the ECS layer using actual queries. The domain doesn't know how the answer is computed. It just asks.

This is the cleanest thing in the whole architecture. The domain poses a question; the infrastructure answers. Neither knows how the other works. And the scheduler can be unit-tested with a fake WorkAvailability that just returns true or false — no Bevy, no world, no entities.
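
Here is roughly what that could look like, as a sketch rather than the actual scheduler. WorkAvailability and has_work are named in this post, but the method signature, the JobKind and Priority enums, pick_job, and the FakeWork test double are assumptions for illustration.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobKind {
    Builder,
    Woodcutter,
    Hauler,
}

/// Declared lowest to highest so the derived ordering matches the meaning.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Low,
    Normal,
    High,
}

/// The question the domain asks. The ECS layer would implement this
/// with real queries; the domain never sees how.
pub trait WorkAvailability {
    fn has_work(&self, job: JobKind) -> bool;
}

/// Picks the highest-priority job that currently has work.
/// Ties go to whichever entry appears first in the list.
pub fn pick_job(
    priorities: &[(JobKind, Priority)],
    work: &dyn WorkAvailability,
) -> Option<JobKind> {
    priorities
        .iter()
        .filter(|(job, _)| work.has_work(*job))
        // min_by_key returns the first minimum, so Reverse gives
        // "first entry with the highest priority".
        .min_by_key(|(_, priority)| Reverse(*priority))
        .map(|(job, _)| *job)
}

/// Test double: work exists only for the listed jobs.
pub struct FakeWork {
    pub available: Vec<JobKind>,
}

impl WorkAvailability for FakeWork {
    fn has_work(&self, job: JobKind) -> bool {
        self.available.contains(&job)
    }
}
```

With the priority list from above and a FakeWork that only offers woodcutting and hauling, pick_job returns the woodcutter. With nothing available it returns None, which is the idle case.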

Where this breaks down

Worth saying out loud, because this is where most "data-driven" articles get evangelical and stop being useful: not everything wants to be data.

The Recipe shape works because every production building does the same thing — consume inputs, wait, emit an output. The day we want a building that fires arrows at raiders, "fields on a struct" stops being enough. We'd need a new shape (a BehaviorDef enum?), or a small escape hatch where a building references a hand-coded behavior by ID. Either way, the generic executor can't handle it alone anymore.
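
Purely as a thought experiment, the "behavior by ID" escape hatch could be sketched like this. None of it exists in the codebase: BehaviorDef, BehaviorId, BEHAVIOR_ARROW_TOWER, BuildingState, and the one-shot-per-second tower logic are all hypothetical, and the Produce variant is reduced to a stub.

```rust
use std::collections::HashMap;

pub type BehaviorId = u32;

/// Hypothetical ID for a hand-coded arrow tower.
pub const BEHAVIOR_ARROW_TOWER: BehaviorId = 1;

pub enum BehaviorDef {
    /// Stays on the generic data-driven path (recipe fields elided).
    Produce { duration_secs: f32 },
    /// Escape hatch: resolved through a table of hand-written functions.
    Custom(BehaviorId),
}

/// Hypothetical state that a custom behavior may mutate.
#[derive(Default)]
pub struct BuildingState {
    pub elapsed_secs: f32,
    pub shots_fired: u32,
}

pub type CustomBehavior = fn(&mut BuildingState, f32);

/// Hand-coded one-off: fires once per full second of elapsed time.
fn arrow_tower(state: &mut BuildingState, dt_secs: f32) {
    state.elapsed_secs += dt_secs;
    while state.elapsed_secs >= 1.0 {
        state.elapsed_secs -= 1.0;
        state.shots_fired += 1;
    }
}

/// Table mapping behavior IDs to their hand-written functions.
pub fn custom_behaviors() -> HashMap<BehaviorId, CustomBehavior> {
    let mut table: HashMap<BehaviorId, CustomBehavior> = HashMap::new();
    table.insert(BEHAVIOR_ARROW_TOWER, arrow_tower);
    table
}

/// Returns false if a Custom ID has no registered function.
pub fn tick_building(
    def: &BehaviorDef,
    state: &mut BuildingState,
    dt_secs: f32,
    table: &HashMap<BehaviorId, CustomBehavior>,
) -> bool {
    match def {
        // Stub for the generic path: just accumulate time.
        BehaviorDef::Produce { .. } => {
            state.elapsed_secs += dt_secs;
            true
        }
        BehaviorDef::Custom(id) => match table.get(id) {
            Some(run) => {
                run(state, dt_secs);
                true
            }
            None => false,
        },
    }
}
```

The data still names the building's behavior, but the behavior itself is ordinary code. Whether that trade is worth it is exactly the open question.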

Same with jobs. As long as a job is a sequence of move, work, pick up, deliver, the JobPhase enum covers it. But "build a wall using whatever materials are in the nearest stockpile, and re-plan if the stockpile empties mid-job" doesn't compress into four phase variants. At some point we'll either add a phase that wraps a function pointer, or accept that this particular job lives outside the data-driven path.

We're fine with that. The pattern earns its keep on the 80% of content that's repetitive. The remaining 20% can be one-offs, and the trick is recognizing them as one-offs instead of contorting the schema until everything fits. The mistake would be treating the data-driven shape as a rule rather than a default.

UI is the other obvious gap. A generic system can spawn a generic building, but it can't draw a unique tooltip, play a one-of-a-kind animation, or make a specific building feel different. That's still custom code per building, and that's fine — the architecture isn't trying to solve everything.

What this actually costs

The ceremony is real. A new building isn't free anymore — it's an ID, a registry entry, sometimes a new field on BuildingDef if the existing fields don't cover what we want. You can't just paste logic into whatever system is closest. There's a right place for things now, and finding it takes a beat every time.

Debugging changes shape too. With hardcoded systems, you set a breakpoint where the bug lives. With data-driven, the bug might be in the data (wrong resource ID), in the executor (off-by-one in the phase index), or in the trait object answering a question wrong (WorkAvailability::has_work returning stale state). "Why is this citizen idle?" used to be one stack trace. Now it's three places to check before the question even narrows.

The trait-object indirection in the scheduler is mild but real — dyn WorkAvailability means a virtual call per query. Not measurable on a hundred citizens. Maybe measurable on ten thousand. We'll see.

What we get back, in exchange:

The domain tests run in milliseconds. cargo test -p game-domain doesn't spin up a Bevy app.
New content is additive, not editorial. We register things instead of editing existing systems.
The plugin tree stays thin: GamePlugin → WorldPlugin, CitizenPlugin, BuildingPlugin. Adding an entity type means two files and a wire-up, not a treasure hunt across lib.rs.

So: more upfront structure, less friction every time we add a thing. We're at the point where adding a thing is most of what we do, so the math works for us. It might not work for a smaller game with five buildings and no plans for more — that game should probably just write five hardcoded systems and ship.

Where this is going

Right now the domain has woodcutters, stone miners, buildings with recipes, and inventories. The scheduler exists but isn't fully wired to all job types yet. Production chains are defined but not visually represented in the UI.

The interesting question now isn't "can we add another building" — it's "what's the first building that will refuse to fit?" Probably the first defensive structure, or the first one with a behavior that depends on what's around it. When that happens we'll learn whether the schema bends gracefully or whether we end up writing the escape hatch we hand-waved earlier.

That's the part we're actually curious about. The plumbing was the easy half.

The NeToNuH Team