How is this different from hiring full-time engineers?

Full-time engineers are usually the right answer eventually. We're the bridge. You hire us when you need someone shipping by Monday and a job posting won't close for months. When you've hired the right team, we hand off everything. The code is already yours, the infra is already in your cloud, the runbook is already written. Working ourselves out of the job is the goal, and we don't mind when it happens.

What if our codebase is a mess?

Most are. The founders' decade-each in production engineering has touched most stack vintages still in use: Rails monoliths, 4-year-old Next.js apps with three router migrations, greenfield TypeScript. We adapt to your conventions, your CI, your branching model. We won't try to rewrite your stack to use our preferred one. That's a tell of an agency that's actually selling templates.

Do you really have no contracts longer than a month?

Real answer: our MSA is signed once, and engagements run on monthly purchase orders. You can pause or end at any time on 30 days' notice. That's not a fine-print clause, it's the operating model. We'd rather stay because we're earning it than because you signed a year of it.

What does 'senior engineer' actually mean here?

Founder-built. Senior operators with a decade in production engineering across AI, infrastructure, and platform work. There is no junior bench to hide behind. You get the people who'd actually be writing the architecture doc anywhere else. We don't post bios on the site because LinkedIn isn't the right hiring surface; we'd rather you meet whoever is staffed on the first call and decide from there.

How fast can you actually start?

Sprint engagements: typically 7 days from signature. Pod engagements: 14 days. We won't lie about availability. Founder-led means slots are real and limited. We'd rather tell you 'we can't take you for six weeks' than start late and miss the timeline.

We've been burned by agencies before. Why is this different?

Mostly because we don't try to be everything. Six services. Senior engineers only. Code in your repo. Monthly contracts. Weekly demos. Founder-led. No PMs in the loop, no offshore handoff, no proprietary platform. If you've been burned, you know what bit you. We've tried to make ourselves the opposite of that.

What does week one actually look like?

Day 1: kickoff call, Slack channel created, repo access exchanged, problem statement written and pinned. Day 2: scoping doc with the smallest shippable thing identified. Day 3–4: working prototype in a sandbox. Day 5: Loom walkthrough, demo on your calendar for next Friday. Week one is choreographed. The improvising starts in week two.

Do you sign NDAs? BAAs? SOC 2 vendor questionnaires?

Yes to all three. Vendor security questionnaires are turned around quickly because we are deliberately small and there's no committee to route around. BAAs are ready for legal review on day one for healthcare engagements. The goal is to be the easiest vendor your procurement team deals with this quarter.

May 1, 20269 min readarchitecture · infrastructure · production

Boring on purpose: the stack that survives a year in production

Every framework you adopt is migration risk you accept on day one. The cheapest production system is the one made of components that have been in production for years. Boring is a feature.

The most expensive part of an AI system is rarely the AI. It is the framework you adopted in week one that turned into a migration in month nine.

We have walked into enough engagements with this exact shape to recognise the pattern on day one. A team builds an agent on the orchestration library that was hot when they started. The library makes the demo easier. The team ships. Then the library has a major version bump that changes its core abstractions. Then it gets acquired and the abstractions change again. Then a competing library gets popular and the team's hiring pool shifts. Then a contractor who is good but only knows the new library wants to be hired. Now the team is six months into a rewrite that produces the same product they already had.

This is the framework churn tax, and many AI teams in 2026 are paying it before they have a name for it. The way out is a decision the team has to make early. Pick boring on purpose.

                  Adopt now              Wait or avoid
   ──────────────────────────────────────────────────────────
   Proven runtime   Postgres              New platform layer
   Clear exit       Redis, SQLite         Proprietary agent OS
   One job          pgvector, Inngest     All-in-one AI suite
   Owned by you     Thin wrappers         Deep framework lock-in

Boring is not a preference for old tools. It is a preference for components with a clear job, a credible exit, and a low handoff cost.

What boring actually means

Boring does not mean old. It does not mean stagnant. It means proven, stable, and unlikely to require a migration in the next two years.

A few markers we use for boring:

The component has been in production at well-known engineering shops for at least three years. Not as a side project. As a load-bearing part of the system. Postgres is boring. SQLite is boring. Redis is boring. Most things that have a real birthday party event when they hit a major release are not yet boring.

The component does one thing. Components that try to be a platform tend to fight you when your needs do not match the platform's shape. The thing that lets you assemble many small components yourself does not have an opinion on your shape, which means it does not get in the way later.

The component has a credible exit. If the team that maintains it stops, the project does not die. There is community ownership, multiple competing implementations, or a clear migration path. Postgres meets this. Many newer specialised data stores do not clear this bar yet.

The community has lived through at least one major API change. Libraries in their first or second year tend to break things across versions. By year three, the project either has matured into a stable API or has been displaced by something else. Trust the libraries that have already done their breaking changes.

If a component meets all four of those, it is probably boring enough. Most components teams adopt early on AI projects meet none of them, because the boring options are not what the conference talks are about.

Postgres is the boring answer most of the time

The architecture decision we most often end up revisiting on engagements we are brought into is the data layer. Teams reach for specialised infrastructure too early. Postgres, with extensions, covers more of what an AI system needs than people think.

Vector search? pgvector. It is not the fastest pure vector database on the market. It is fast enough for many production workloads under ten million vectors, depending on dimensionality, filters, concurrency, and latency targets. It runs in your existing Postgres cluster, with your existing backups, your existing replication, your existing IAM. The operational simplicity of "it is just another column on the table" is worth more than the recall benchmark difference at the scale most systems actually operate at.

Full-text search? Postgres full-text. Solid for most use cases. If you need real BM25 with proper tuning, fine, layer Elasticsearch or Typesense in. Most teams reach for the specialised tool before they have proven the boring one is insufficient.

JSON storage? Postgres jsonb. Indexed, queryable, ACID. We have shipped agent memory layers entirely on jsonb fields with appropriate indexes, without introducing a separate document store.

Queueing? Postgres-based queues are not the fastest but they are transactional with the rest of your data, which removes whole classes of cross-system consistency bugs that come with separate queue infrastructure. For most agent workloads, the simplicity is worth more than the throughput.

Caching? Use Redis if you need it. Postgres is not the right tool. But check whether you need Redis at all. Often a small in-memory cache plus Postgres is enough.

The rule we use: start everything in Postgres. Move a workload out of Postgres when you have measured that Postgres is the bottleneck. Most workloads never reach that point.

The orchestration question

This is the most contested decision on most projects. Do you use an agent framework or roll your own orchestration?

There is no universally right answer, but there are usefully right answers depending on what you are building.

Use a framework if: you are building a system with many possible execution paths that you cannot enumerate in advance. Complex multi-agent setups, dynamic tool selection, anything where the agent's behaviour is genuinely open-ended. The framework's abstractions earn their cost here because writing the orchestration yourself would be a quarter of work.

Roll your own if: your system has a finite number of paths, the paths are knowable in advance, and the value is in the quality of each path rather than the variety of paths. Most customer-facing agents are in this category. The "agent" is really a state machine with five to ten possible flows, and modeling it as a state machine with explicit transitions makes it debuggable, testable, and inspectable. Modeling it as an open-ended agent framework with autonomous reasoning makes it none of those things.

For most production agents we build, the answer is some hand-rolled orchestration around a typed function-calling loop. A few hundred lines of code. We can read it. We can step through it. We can predict its behaviour. When something goes wrong in production, we can look at the trace and explain exactly what happened. Frameworks can make the trace inscrutable in exchange for letting you write less code, which is the wrong trade for systems that need to be debugged at 2am.

If you do use a framework, treat the framework as a temporary substrate. Write your code so that swapping the framework would be a refactor, not a rewrite. The agent business logic should live in your code, not in framework-specific abstractions. We default to thin wrappers, not deep integration.

Vendor lock-in is the bill nobody pays attention to

Every AI tool you adopt has a lock-in cost. Some are explicit and easy to see (proprietary data formats, proprietary APIs). Some are implicit and easy to ignore (the cost of training your team on the tool's mental model, the cost of building your evals against the tool's specific behaviour, the cost of writing your prompts against the tool's quirks).

The boring rule helps here too. Components with a credible exit have low lock-in cost. Components that are the centerpiece of a vendor's pitch have high lock-in cost because the vendor is incentivised to make leaving expensive.

The specific pattern to watch for: vendors that want to be the "operating system" for your AI workloads. They will manage your prompts, your tools, your observability, your evals, your deployment, your routing, your fallbacks. Each integration sounds reasonable. Each one is a strand of rope. By the time you want to leave, leaving is six months of work, and the vendor knows it.

Use vendors that do one thing well. Helicone or Langfuse for observability. Vanta or Drata for compliance plumbing. Model providers for models. Avoid anything pitching itself as a single platform for everything. The platform you can leave in a week is a platform that has to keep earning its place. The platform you cannot leave is one you have already lost the negotiation with.

The stack we tend to default to

For an AI agent in production, the defaults we reach for, in May 2026:

Postgres for everything that is not specifically not a Postgres problem. With pgvector for retrieval.

A typed function-calling loop in TypeScript or Python, hand-rolled, around the major model providers (Anthropic, OpenAI, Google, Meta). Provider SDKs wrapped in a thin abstraction that lets us swap providers without changing call sites.

Helicone or Langfuse for model observability. OpenTelemetry plus your existing APM for everything else.

Inngest or Trigger.dev for background jobs and durable workflows. Or just Postgres queues if the workload is simple.

Vercel or Cloudflare or your existing cloud for hosting. Whichever already has the rest of the company's infrastructure on it. Do not introduce a new hosting layer for the AI workload.

Vanta or Drata for compliance plumbing, partner pricing passed through.

That is the boring stack. It is not new. It does not have a hot Twitter following. It does not have a foundational framework that promises to be the agentic operating system. It also does not require a migration in a year. It is the stack that the team can hand off to the client's engineers without a six-week onboarding. It is the stack we would still be defending if we revisited the engagement two years later.

Where this leaves you

Stack decisions feel low-stakes during the build phase. They become high-stakes the moment you try to evolve the system, hand it off to a new team, or migrate off a vendor.

If you are starting an AI build now, the questions to ask before you adopt anything:

Has this been in production at a well-known engineering shop for at least three years?

Does this do one thing, or does it want to be a platform?

If the company maintaining this goes away tomorrow, what is the migration path?

How easy is it to leave this in two years?

If the answers are uncomfortable, the boring alternative is probably the right call. The boring alternative is also usually cheaper, easier to hire for, and easier to defend in a security review.

We almost never regret choosing boring. We have, more times than we can count, watched teams regret choosing exciting.

The system that is in production a year from now is the one made of components that were already boring when the system was built. That is not a coincidence. That is the operating principle.