Andrew Chalmers

Written by Claude Opus 4.8 to summarize Andrew Chalmers' research findings and notes.

This is the companion to The Next Big Solution in AI — the part I left out of the main piece because it is the part nobody wants until afterward: the actual logic, written down precisely enough to be wrong, with a date on it.

I should be honest about why it exists. For most of my career I have been early, and I have learned the hard way that early is worth nothing. Being right in private is a feeling, not a fact. I have watched ideas I held clearly — and failed to make legible, or to fund, or to ship — arrive years later wearing someone else's name, presented as obvious, while the version of me that saw them first has no receipt to show for it. The curse was never being wrong. It was being right too quietly to prove it.

So this is the receipt. "I saw it coming" is a worthless sentence unless it was said before, in public, specifically enough to be checked. What follows is specific enough to be checked: one object, one judgment, nine rules, and an honest ledger of what we have actually built versus what is still a promise. If Shared Office lands — if the substrate earns its keep on a real working week before some giant ships the same shape and calls it inevitable — then this document is the proof the shape was named first, here, by someone who could not afford to keep being right in private. And if it doesn't, it is the honest record of a good idea its author failed to convert, which is its own kind of useful to whoever picks it up next.

Either way, here is the thing itself, with nothing held back.

The Calculus#

It begins with a single object.

A fact. A typed assertion whose truth is a function — of time, and of conditions — carrying its own provenance and its own price. Not a row, not a document, not a message: a belief, with an expiry curve, a signature, and a cost, sitting on an append-only log. Once you have it, the systems every organization runs separately and reconciles by hand become projections of one space. Memory is the log read backward. The calendar is the log read forward — a deadline, a meeting, a cron job, a cache expiry, and a rate-limit window are one primitive: a future-referencing fact. The scheduler is the query engine — "what runs next, and where" is inverting the validity curves of future facts under capacity. And an agent's context window is a budgeted rendering of the same space — the harness problem from the companion essay, solved as a projection rather than a program.

We are building the projector as the sequence library: the single, location-independent evaluator the companion essay kept demanding, so that a plan means the same thing on a laptop's SQLite and a cloud's Postgres. What follows is that calculus in plain terms. There is a formal version — one judgment and nine inference rules — and it is in the appendix, for anyone who wants to argue with the symbols. But the symbols are not the idea, and the idea is small enough to say in English.

What a fact carries#

Start with the object, and notice everything it knows about its own situation that ordinary data throws away. A row in a database knows its value and nothing else. A fact knows six more things — and every one of them is something you already track by hand, in your head, every day.

It knows who asserted it. No global authority owns the truth: your laptop and a server in the cloud can each decide what they will accept by their own rules, and still agree to share what they conclude.

It knows when — and that the answer can change. A fact's truth can decay: a download quota resets, a deadline passes, an estimate goes stale. So "is this still true?" is just the same question asked again later, against the same curve. Monitoring stops being a system you build and becomes something you re-ask.

It knows what it cost to produce, and what it would cost to rely on. That lets the system treat your token budget the way the household in the companion essay's budgeting example treats money — not as a wall you hit, but as a price that quietly changes what is worth doing. Low on budget, the cheaper plan wins; something urgent, you pay up. Same machinery, every resource.

It knows who it is for. A fact is always being told to someone — a teammate, an agent's context window, a cold archive — and it is compressed to fit that reader's attention and budget before it crosses over. There is no such thing as a fact "in general." This matters more than it looks: it is the rule whose absence, we will claim in a moment, cost us four years.

It knows how much to believe it — and that number is earned, not typed in. The system keeps score of its own past guesses: every time it predicted something and then saw what happened, it remembers. Confidence is a track record, not a number someone asserted.

And its type is a belief, not a guarantee. This is the type-value-time continuum the companion essay named, finally made concrete: a fact's "type" is a curve describing how valid it is over time and conditions — something a scheduler and a human can both read — rather than a rigid promise that is either kept or broken.

Put those together — this actor concluded this, of this type, at this time, for this reader, at this price, with this much earned confidence — and you have the whole object. Written in the notation of type theory, that single statement is the judgment, set out in the appendix below. Everything else is what happens when you let judgments pile up on a log and never throw one away.

The handful of moves#

From that object, the whole system is a small set of moves on an ever-growing log. None is exotic. Each is the honest version of something every tool already does, badly.

The log only grows, and reading it means folding it. You never overwrite. A narrative, a calendar, a budget, a permission check — each is the log folded into a particular shape, recomputed when the facts change. "What is the status of this?" never drifts out of sync with reality, because the status was never stored. It was derived.

Work that is not ready waits in the open, instead of failing. If a fact is missing the evidence it needs, it becomes a visible gap. If it is waiting on a time or a message, it sleeps until then. Either way the intent is never swallowed by a retry loop. And because the gaps are themselves facts, the system can hand you a ranked list of exactly what it does not yet know — worst blocker first. That list, ranked ignorance, is the one thing a notes app, a tracker, and an agent-memory tool cannot give you, because none of them treat their own incompleteness as something worth writing down.

Evidence only ever sharpens — and changing your mind is a real act. New information can narrow what a fact might mean, never quietly widen it. The single move that loosens a belief — admitting you were wrong — is special, and the system refuses to let it happen silently: a correction is recorded with its cause, and it ripples outward to everything built on the old version. Most software lets you overwrite the past and pretend; here, revising a belief is a first-class, witnessed event with a thread back to everything it touched.

Nothing crosses a boundary raw. When a fact leaves one actor for another — for a teammate, an agent, the cloud — it is rendered: compressed into a contract sized for that particular reader, carrying its confidence with it. A raw dump across a boundary is the bug behind every integration that ever silently drifted apart, every schema hack, every "just expose it as an API." This is the move we built last, and the four years were largely the cost of building it last.

Everything gets scored against what actually happened. Every prediction the system makes is later checked against the outcome, and the gap between the two updates its confidence for next time. The system grades itself, continuously, on its own ledger — which is how, over time, it learns what its own claims are worth.

And deciding what to do next is itself one of these moves. This is the surprising one. "What should run, where, and when" is not a scheduler bolted on top — it is the same query, asked of the future. Waiting is a legitimate answer, and choosing when to wake up and reconsider is part of the answer. The thing the companion essay kept circling — distributed scheduling over a shared plan — turns out not to be a component at all. It is what querying a fact space is, once the facts know about time and price.

Spelled out in symbols, that is nine rules over one judgment, and they are in the appendix for anyone who wants to argue with them. In plain terms it is the paragraph you just read: a log you never overwrite, work that waits in the open, beliefs that only sharpen, corrections that ripple, edges that render, predictions that get scored, and a "what next?" that is just another query. That is the bet.

What We Expect It to Buy#

A calculus is a bet, and this section is the payout we are betting on — set honestly next to how far we have actually built it. We will mark the line between the two exactly, because a substrate that hides its own unbuilt parts is the precise dishonesty the whole thing was meant to kill.

If the shape is right, four things stop being systems and become reads of one log. Memory is the log folded backward. The calendar is the same log folded forward, where a deadline, a meeting, a cron job, and a cache expiry are one kind of fact. The scheduler is not a component bolted on but the query engine itself — "what runs next" is just inverting those validity curves under capacity, which is what a query is. And an agent's context window is a rendering under a budget, so the harness problem from the companion essay dissolves into a projection instead of a program. That is not a feature list; it is the same object read four ways.

A second prediction: the meta-layer disappears. A permission is a condition over facts; a budget is a relation over an accumulating fact; a signature is a fact about a fact; billing is a fold of the price register. If the calculus holds, there is no ACL subsystem, no billing subsystem, no monitoring subsystem to build and keep in sync — those are derivations, and every new capability is governed, priced, and audited by machinery that already exists. We find this the most falsifiable prediction and the most interesting one, which is exactly why we want it stress-tested by people who are not us.

We have been stress-testing it ourselves by building an instantiation — Shared Office (sharedoffice.ai), a memory-and-coordination layer for AI-heavy teams: it watches Claude sessions and folds them into shared narratives, meters every AI action against a visible budget, and hands a teammate's agent a five-hundred-token brief so it opens already knowing what the team knows. It is not the point. It is the test rig. Here is what the rig has earned, and what it has not.

What held. The kernel is the sequence library — one pure-TypeScript evaluator, no runtime dependencies, the location-independent evaluator the calculus demands. On it: the fact space is a real append-only log whose reads are folds; types are serializable constraint sets — named distribution families with inspectable parameters, never opaque weights — and composition is the lattice meet, so narrowing a belief and cascading a change downstream are the core operations, not bolt-ons. Blocked work suspends and resumes itself rather than erroring, and office gaps exposes ranked ignorance: what the system knows it does not know, ordered by what each gap blocks — the one surface no notes app, tracker, or agent-memory tool has, because none of them treat their own incompleteness as a fact. And rendering has teeth — the agent's entire tool surface is a projection of one command registry, not a second implementation, so drift between what a human can do and what an agent can do is not unlikely here; it is structurally impossible.

What has not. Three of the nine rules are still promises, and we would be the very thing we are criticizing if we blurred that:

Prices are still caps. A price is meant to behave like a shadow cost — yielding defer, reroute, or choose-cheaper. Today the budget is a hard cutoff: score, sort, keep the top, suspend the rest. That dual is what we are building toward, not what ships.
Changing your mind does not yet ripple. office correct annotates and tombstones with a cause; it does not yet propagate a correction through every judgment that consumed the old one. The calculus demands that every rule be invertible data — its hardest requirement — and that machinery is still dormant.
The scheduler is still a clock. The calculus wants each actor to run a decision process that elects when to act, with waiting as a first-class choice. Today there is one wake queue firing future-dated facts deterministically at their declared times: the skeleton, not the decision process.

That gap is the honest state of the thing, and we are publishing it on purpose. The companion essay calls substrate a v1 implementation of a specification. This is that specification, turned into a calculus — nine rules and one judgment — and turned from private conviction into a public claim we can actually be wrong about: that the next big thing in AI, whatever it is called and whoever ships it, will have to coordinate many minds against one shared, timed, priced account of what is true, and that its solution will rhyme with the calculus laid out here. We are canonizing our best guess at it in the open. If you think the shape is wrong — a missing rule, a rule that should not be there, a register that does not earn its place — we genuinely want to hear it. Writing it down this precisely is what makes that possible.

Appendix — The Formal Calculus#

For readers who want the symbols. The object is a single judgment; the system is nine inference rules over it. The prose above is the honest summary — this is the thing to argue with.

A \;\vdash\; e : \tau \;\;@\,t \;\;\$\,b \;\;\to\; W \;\;[\,d\,] \qquad \text{over } \Gamma

Actor A derives that expression e has type τ, at time t, under prices b, rendered to audience W, with measured confidence d, over the shared fact space Γ. The registers: Γ, the append-only log of judgments and their folds — the only state; A, a sovereign actor evaluating under its own laws; e, an expression in the vocabulary (fact, claim, patch, contract, or a quoted address); τ, a validity-curve type ordered by evidence; t, the time of derivation; b, prices as Lagrangian duals rather than caps; W, the audience a judgment renders to; d, confidence measured from past conformance. The nine rules are how Γ moves — each is exactly one operation in the evaluator.

\frac{\; L_A(\text{paths},\, t)\ \text{holds} \qquad e\ \text{well-formed in the vocabulary} \;}{\Gamma \,\oplus\, e \;=\; \Gamma'} \tag{R1 ADMIT}

Admit. A fact enters Γ only if A's laws hold; failure suspends — it never silently rejects or passes.

\frac{\; \text{evidence}(e)\ \text{undetermined at}\ t \;}{\text{gap}(e) \in \Gamma \qquad \text{(resume on evidence — owned by the fact space)}} \tag{R2a GAP}

\frac{\; \text{derivation of}\ e\ \text{awaits a message}\ m \;}{\text{suspend until}\ \text{fire}(m) \qquad \text{(resume — owned by the clock)}} \tag{R2b BLOCK}

Suspend. A gap awaits evidence (owned by the fact space); a block awaits a wake (owned by the clock). Two rules, never fused.

\frac{\; \Gamma \vdash e:\tau \qquad e'\ \text{evidences}\ e \qquad \tau'' = \tau \sqcap \tau' \;}{\Gamma' \vdash e : \tau'' \qquad (\tau'' \sqsubseteq \tau)} \tag{R3 NARROW}

Narrow. Evidence tightens a type monotonically down the lattice; it never loosens silently.

\frac{\; p : e \rightsquigarrow \bar{e}\ \text{invertible} \qquad H = \{\, j \in \Gamma : e \in \text{deps}(j) \,\} \;}{\Gamma \,\oplus\, p \,\oplus\, \{\, \text{re-derive}\ j : j \in H \,\}} \tag{R4 RETRACT}

Retract. Revising a belief is a witnessed patch-plus-inverse, propagated to every judgment that consumed the old one. (No implementation yet.)

\frac{\; C \subseteq \text{Vocab}(A) \qquad R_W : \text{Judg}_A\,{\upharpoonright}\,C \;\to\; \text{Judg}_W \;}{A \vdash e:\tau \to W \;\;\equiv\;\; (A \vdash e:\tau)\,;\ R_W(e)\ \text{crosses the edge}} \tag{R5 RENDER}

Render. Vocabulary crosses a boundary only as a contract elected for its audience, compressed under price, carrying confidence — and the contract binds the sender.

\frac{\; \text{exercise of}\ R_W\ \text{yields}\ (\hat{e},\, a) \;}{\Gamma \,\oplus\, \langle \hat{e},\, a \rangle \qquad d' = \text{fold}(d, \hat{e}, a) \qquad \text{verdict} = \text{grade}(a;\, d)} \tag{R6 MEASURE}

Measure. Every exercise of a contract appends estimate-versus-actual; confidence updates by fold; verdicts are graded, not binary.

\frac{\; e\ \text{narrowed or retracted} \qquad H = \text{holders}(e) = \text{backlinks}^{\top}(e) \;}{\forall\, h \in H : \quad \text{exchange graded verdicts until}\ \text{admit}(h)\ \text{or}\ \text{retract}(h)} \tag{R7 CASCADE}

Cascade. A change propagates to every holder of the affected claim — the backlinks transpose — negotiated until each is re-admitted or retracted.

\frac{\; \begin{gathered} M_A = \big(S,\ \{\text{act},\dots,\text{WAIT}\},\ P,\ R,\ b\big) \\ \pi^{*}(s) = \operatorname*{arg\,max}_{a}\ \mathbb{E}\!\left[\, \text{value}(a) - \langle b,\ \text{cost}(a) \rangle + \gamma^{\Delta t}\, V(s') \,\right] \end{gathered} \;}{\text{next judgment to derive} \;:=\; \pi^{*}(\Gamma_{\text{now}})} \tag{R8 SCHEDULE}

Schedule. Choosing the next judgment to derive is itself a priced decision — a semi-Markov process in which waiting is a first-class action. Scheduling is querying.

\frac{\; v = \text{derive}(e,\, \Gamma)\ \text{forward-deterministic} \;}{\text{cache}[e] := v \qquad \text{replay}(e) = v\ \text{iff}\ \text{deps}(e)\ \text{unchanged}} \tag{R9 MEMOIZE}

Memoize. Every index, snapshot, or replica is a cached derivation; replay is forward-deterministic; invalidation is cascade over derivations.

For Posterity

The Calculus#

What a fact carries#

The handful of moves#

What We Expect It to Buy#

Appendix — The Formal Calculus#