milebits
Field notes

Operator notes on building AI for production.

Architecture decisions we’d defend, failure modes we keep watching teams hit, and the parts most vendor pitches leave out. Written by the founders, in the voice we actually use.

  1. 9 min read · rag · retrieval · evaluation · production

    Why most RAG systems fail before retrieval

    The retrieval algorithm is rarely the problem. Most RAG failures happen earlier, at stages the team isn't looking at. Here's the failure shape we keep seeing and the order we'd actually debug it in.

  2. 10 min read · evaluation · production · ai-engineering

    Why eval harnesses belong in week one

    Most teams treat evaluation as a post-launch optimisation. By the time launch happens, the team is debugging with vibes and reverting changes based on hunches. The eval set is week-one work, not week-six work.

  3. 10 min read · agents · cost · operations · production

    The real operational cost of AI agents

    Token bills are the visible part of the cost. The bigger numbers are hidden in retries, fallbacks, conversation context growth, and cost accounting nobody set up. Cost discipline is an architecture decision, not an optimisation.

  4. 9 min read · architecture · infrastructure · production

    Boring on purpose: the stack that survives a year in production

    Every framework you adopt is migration risk you accept on day one. The cheapest production system is the one made of components that have been in production for years. Boring is a feature.

  5. 9 min read · voice-ai · latency · performance · production

    The latency budget you didn't know you had

    Many voice AI and real-time agent projects ship with no explicit latency budget. They discover the budget exists when users start hanging up. The median is the lie; the p95 is the system.
