Stack

Moonage runs on Cloudflare end to end.

Layer	Choice
Edge + compute	Cloudflare Workers
Stateful coordination	Durable Objects
Relational storage	D1 (per-workspace, regionally pinned for enterprise)
Object storage	R2
Memory + retrieval	Workers + R2 + SuperMemory
Auth + sessions	Workers + KV
Site, handbook, docs	Next.js 16 + Tailwind v4 on OpenNext for Cloudflare
Sandboxed execution	Cloudflare Sandbox SDK

There is no legacy provider behind this stack. New services land directly on Workers.

Why Cloudflare

Three reasons, in priority order.

One control plane, one trust boundary. Compute, storage, and network policy share a single auth and audit fabric. Less seams means less drift between intent and reality.
Regional residency without a second cloud. Enterprise customers pin a dedicated D1 to their region. Subprocessors honor the same boundary. We do not need a multi-cloud strategy to deliver multi-region.
Edge-first economics. Cold starts are measured in single-digit milliseconds. We stop paying for warming, and the latency budget for an agent run shrinks accordingly.

These three principles — one runtime, regional residency, edge-first economics — are why the whole stack runs on Cloudflare.

Engineering principles

Ship to production first, talk about it second. No long-running feature branches. Trunk-based, behind flags where needed.
Tests are the brief. A change without a test is a change we don't trust. The test is how we describe the intended behavior.
Observability before optimization. Don't tune what you can't see.
The code review is part of the work. A thorough review is faster than a fast revert.
Cost is a constraint, not an afterthought. Cloudflare bills are visible to engineering, not buried in finance.
Boring on purpose. We adopt new tools when they earn it. We do not chase frameworks for fun.

How a change reaches production

One shape, with two named exceptions.

Open an issue with a brief on the change — what it does, what it touches, how it gets verified.
Land a PR with peer review and tests passing. Branch protection enforces this in GitHub.
Deploy via the standard pipeline. Manual deploys outside the pipeline are reviewable but visible.

Exceptions:

Incident mitigation. A fix may bypass review to stop the bleeding. The follow-up postmortem must add the missing tests.
Documentation. Doc-only changes can ship without a secondary reviewer.

Incidents

A small, opinionated definition.

An incident is anything that requires unplanned human intervention to keep the system honest.
We mitigate first, prevent recurrence second.
Every incident gets a postmortem, regardless of severity. The cost of a short doc is lower than the cost of forgetting why we made the fix.

The on-call engineer leads the response. Synchronous coordination happens in #incidents. The postmortem is written async within five business days, shared internally, and linked from the changelog when the fix lands.

Engineer learning ladder

How an engineer at Moonage builds operational fluency.

First incident. Own the response under the on-call's wing. Write the postmortem.
First twenty postmortems read. Annotate what the fix could have been if caught earlier.
Monthly PR feedback as a primary reviewer. Build the review muscle.
Quarterly authoring of a design doc. Land a non-trivial system change end-to-end.
Cross-team review. Review another team's design docs and incident write-ups.

The point is not the ladder. The point is to make sure nobody is stranded in their first incident.

Annual review

This page is reviewed every six months by the engineering lead. Last review: 2026-05.

How we build