Token efficiencyApril 28, 2026

Atomic facts and the token budget

The naive way to give a model access to your documents is to put the documents in the context window. It breaks at scale. Long contexts cost tokens linearly, dilute attention, and force the model to re-read material it has already seen. The unit is the document, and the document is almost always larger than the question being asked of it.

Watcher uses a smaller unit. During extraction, each assertion in a source becomes one row in facts.db, with its own fact hash and a pointer back to the chunk it came from. An agent asks for the specific claim it needs over MCP, instead of pulling an entire file into context to find one sentence inside it.

The savings are real. On atomic fact queries against a compiled corpus, this cuts tokens per answer by 82 to 96 percent compared to reading whole pages. The range tracks how much of the document was relevant to the question, so the savings are largest when the document was big and the answer was small.

Efficiency is the payoff. Auditability is the point. Because the fact returns with its hash, the agent can prove what it read before it reasons over it. Nothing gets summarized into vagueness and loses its citation along the way. The claim and its provenance stay attached as one object, all the way to the answer.

Re-reading the same context burns tokens on the same uncertainty over and over. Querying atomic, hashed facts buys a smaller, sharper, checkable answer instead. We think that is the better trade.