The fastest cognitive engine ever built spends most of its time waiting.
In 1964, engineers at IBM had built the most powerful commercial computer in the world, and they spent their days watching it do nothing.
The System/360 could execute instructions faster than any machine before it. But the problem was not the processor. It was everything around it. Tape drives operated at mechanical speed. Card readers fed data one punch at a time. Disk platters rotated and sought. Every time the processor needed something from the outside world, it stopped. It waited. It ran what the engineers called an idle loop — a tight cycle of checking whether the slow world had caught up yet.
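That idle loop can be sketched in a few lines. This is an illustrative stand-in, not IBM's actual channel-era code: a hypothetical `SlowDevice` plays the tape drive, and the processor spins while it waits.

```python
import time

class SlowDevice:
    """Hypothetical stand-in for a tape drive or card reader."""
    def __init__(self, delay_s):
        self.done_at = time.monotonic() + delay_s
    def ready(self):
        return time.monotonic() >= self.done_at
    def data(self):
        return "record"

def read_record(device):
    # The idle loop: a tight cycle of checking whether the slow
    # world has caught up yet, doing no useful work in the meantime.
    spins = 0
    while not device.ready():
        spins += 1
    return device.data(), spins

record, wasted_checks = read_record(SlowDevice(0.01))
```

Every pass through the `while` loop is a cycle the processor could have spent computing; the channel's whole purpose was to make those cycles available again.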
Their solution was the channel — a dedicated processor that managed data transfers independently, freeing the main processor to do useful work while the peripherals caught up. The channel did not make the peripherals faster. It made the waiting productive. It was a speed-matching layer between a fast symbolic engine and a slow physical world.
Sixty years later, we are building the same architecture for a different kind of processor.
I. The Two Clocks
A large language model generates hundreds to thousands of tokens per second. A fast model can produce a page of reasoning in the time it takes to blink. Token generation — the raw cognitive throughput of the system — has become fast enough that it is, for most practical purposes, instant.
Everything around it has not.
A typical API call takes a hundred to three hundred milliseconds to return. A test suite runs for minutes. A database migration takes longer. A browser page loads in seconds — an eternity at token speed. A human reads and responds in minutes or hours. A legal review takes days. A physical actuator moves at the speed of physics. The model operates at electronic speed, and the world operates at the speed of mechanics, networks, institutions, and human attention.
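The scale of the mismatch is easiest to see as arithmetic. The sketch below uses the latencies named above and an assumed mid-range generation speed of 500 tokens per second; the numbers are illustrative, not benchmarks.

```python
# Back-of-envelope only: how much cognitive throughput sits idle
# during each kind of outer-loop wait, at an assumed 500 tokens/sec.
TOKENS_PER_SECOND = 500

waits_seconds = {
    "API call":    0.2,      # ~100-300 ms
    "page load":   3.0,      # seconds
    "test suite":  240.0,    # four minutes
    "human reply": 3600.0,   # an hour
}

for name, seconds in waits_seconds.items():
    idle_tokens = int(seconds * TOKENS_PER_SECOND)
    print(f"{name:12s} -> {idle_tokens:>9,} tokens of idle capacity")
```

A single four-minute test run forgoes 120,000 tokens of reasoning capacity; an hour of human latency forgoes 1.8 million. The gap is not a rounding error.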
This creates a split that runs through every AI system that touches reality. Call them the inner loop and the outer loop.
The inner loop is fast and purely symbolic — decomposing tasks, generating plans, evaluating hypotheses, writing code, drafting text, considering alternatives. It costs fractions of a cent and completes in milliseconds.
The outer loop is where the model meets the world. It calls APIs, runs tests, queries databases, navigates browsers, sends emails, waits for human approval, moves robotic arms. Everything here is slow, because everything here is physical.
A coding agent shows the tension cleanly. It generates fifty file edits in twelve seconds of reasoning. Then it calls the test suite. The suite takes four minutes. By the time the first result returns, the agent has already — internally, silently — considered three architectures and discarded two. But those discarded architectures were evaluated without feedback. The agent reasoned in the dark, extrapolating from a model of the codebase that its own edits have already changed.
A customer service agent shows it differently. It diagnoses the problem in two seconds. The refund takes forty-five seconds to process through the billing API. The shipping change takes ninety seconds through fulfillment. The agent’s cognition finished before its first action completed. What should it do with the surplus? Plan the next three interactions based on assumptions that may be wrong by the time the first transaction clears?
The model is not waiting because it is slow. It is waiting because the world is.
II. Every Fast System Learns to Wait
This is not a new problem. It is one of the oldest in engineering, and its history maps what happens next with uncomfortable precision.
In 1764, James Hargreaves built the spinning jenny, and overnight the bottleneck in textile production moved from spinning to weaving. It took twenty-one years for the power loom to rebalance the system. Then spinning could not keep up again. For half a century, the constraint migrated back and forth — always present, never in the same place.
Computing tells the same story in compressed time. The System/360’s channels solved the first gap between the processor and the outside world. Then processors got faster and memory became the bottleneck, so caches were invented. Then disk I/O became the constraint, then network latency. Each solution surfaced the next constraint. Sixty years later, the gap between the processor and the world has never closed. It has only moved.
John Boyd, the fighter pilot turned strategist, made his career on the idea that faster decision cycles win — his OODA loop became the canonical framework for tempo advantage in conflict. But Boyd also understood the failure mode. Speed without orientation is thrashing, not advantage. An agent that can observe, orient, and decide faster than the world updates from its last action is not running inside anyone else’s decision cycle. It is running inside its own — making each decision based on a state of reality that its own previous action has already changed but not yet perceived. Every coding agent that edits a file and then reasons about the project using the pre-edit version of that file is demonstrating Boyd’s failure mode in real time.
III. The Feedback Desert
When cognition is faster than execution, something goes wrong that is not obvious from any single decision.
The agent fills every idle cycle with more reasoning — re-planning, re-evaluating, reconsidering — the way a nervous driver changes lanes every two seconds because calculating the optimal lane is easier than staying in one and seeing where it leads. Planning is cheap. Waiting is expensive. So the agent never waits.
But every decision it makes during the gap between action and feedback is premised on a world that may no longer exist. Its own previous actions have already changed that world, but the results have not yet returned. The agent reasons about file B using a model that still includes the pre-edit version of file A. It commits to a refund before the billing system confirms the account is eligible. Each decision is sound against the state it observed. The observed state was already obsolete.
Tokens accumulate. Plans, drafts, analyses accumulate. But if the world has not changed in response to any of it, the progress is internal — the agent has moved through symbolic space without moving through physical space. Motion mistaken for progress, because the motion is visible and the stasis is not.
IV. The Inversion
These pathologies only exist because the difficulty ranking has already flipped — and we have not yet registered the inversion.
We automated muscle first. The industrial revolution replaced human and animal labor with machines. Then we automated rote cognition — computers replaced clerks and calculators. Then skilled cognition — LLMs began replacing analysts, writers, coders, planners. The assumption was always that the physical world was the easy part and the mind was the hard part. Descartes would have agreed. The history of technology seemed to confirm it.
The speed gap reverses the difficulty ranking. Cognition is now cheap and fast. The physical and institutional world — APIs, databases, bureaucracies, supply chains, human decision-makers, legal processes, actuators — is the bottleneck. Not because interacting with these things is intellectually hard. It is not. But each one has latency, inertia, noise, and its own clock. Each one refuses to run at the speed of thought. The hard problem is no longer thinking. It is doing.
When thinking becomes cheap, value migrates to its complements. That is the economic abstraction. The engineering signature is more specific: it shows up as a clock-speed gap between the inner loop and the outer loop, between the model’s symbolic world and the physical one it acts on. The engine sits idle, and in the idling you can read the whole story.
V. What the Channel Does
A coding agent is about to call a function that queries a database. The main model is still mid-sentence, working through its chain of reasoning. But a smaller, cheaper model — running in parallel, watching the reasoning stream — has already predicted what the main model will ask for. It fires the database query before the main model finishes deciding to make it. By the time the reasoning completes and the agent reaches for the result, the answer is already waiting in cache. The idle loop never runs. The agent does not notice the world was slow, because the channel hid the latency.
This is speculative execution — a technique borrowed, name and all, from CPU architecture. A processor that would otherwise sit idle instead guesses what instruction comes next and runs it early, the way a good assistant starts pulling the file before the boss finishes asking for it. If the guess is right, the wait disappears. If the guess is wrong, the system throws out the bad result and retries. The cost of a wrong guess is one wasted computation. The cost of never guessing is an idle processor on every cycle.
The pattern works. But it fails in a way that is instructive. When the prediction is wrong — when the agent’s actual next step diverges from what the draft model expected — the pre-fetched result is not just useless. It is misleading, because the system briefly holds a cached answer to a question nobody asked. The flush-and-retry cycle is the architecture admitting that it guessed wrong about the future. Every speculative system carries this risk: the faster you try to go, the more it costs when you are wrong about where you are going.
The channel layer — the infrastructure between the fast model and the slow world — is not an accessory to the architecture. It is the architecture. And its most important function may not be to hide the latency. It may be to impose it. In high-stakes workflows — anything involving money, medical decisions, legal commitments, irreversible actions — the channel routes the decision to a human. Not because the human is smarter. Because the human is slower, and the slowness is the point. The human forces the system to synchronize with ground truth before it acts. A pause that feels like friction is actually a cache flush: the moment the system abandons its internal model and checks what is real. What that channel layer looks like when you take it seriously as engineering is a separate question — but the need for it starts here, in the gap.
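Imposing latency, as opposed to hiding it, is a routing decision, and the rule can be stated in a few lines. The action names and callbacks below are hypothetical; the point is only that the channel classifies an action before executing it, and that the human path is deliberately slower.

```python
# Hypothetical routing rule: high-stakes actions are forced through
# a human, not because the human decides better, but because the
# pause synchronizes the system with ground truth before it acts.
HIGH_STAKES = {"refund", "prescribe", "sign_contract", "delete_prod_db"}

def route(action, execute, ask_human):
    if action in HIGH_STAKES:
        return ask_human(action)        # the slowness is the point
    return execute(action)              # fast path: let the channel hide latency

approved = route("refund",
                 execute=lambda a: ("executed", a),
                 ask_human=lambda a: ("queued_for_review", a))
# -> ("queued_for_review", "refund")
```

In practice the interesting engineering is in the classifier, not the dispatch: deciding which actions are irreversible enough to deserve the cache flush.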
VI. Where This Breaks
The argument has limits.
Faster cognition is genuinely valuable even when execution cannot keep pace. A doctor who considers fifty differential diagnoses in the time it used to take to consider five makes better decisions, even if the blood test still takes two days. The inner loop produces better plans, not just more of them. Planning quality has value independent of execution speed. The feedback desert is real, but not all unsupervised cognition is wasted. And some domains have no meaningful outer loop — pure mathematical reasoning, analysis of static datasets, creative writing. There the inner loop is the product, and the speed mismatch does not apply.
The gap may also close. It always has before, at least partially. That original gap drove the invention of caches, direct memory access (which lets devices move data without tying up the processor at all), and speculative execution. The current gap is already driving purpose-built infrastructure designed to make the tools faster and the connections between model and world less wasteful. In five years the channel layer may be fast enough that the mismatch is unremarkable — the way most of the tricks that tamed the original CPU-I/O gap are now invisible to the people who benefit from them.
But the most honest objection cuts deeper than either of these. Humans have always planned ahead of their information. Eisenhower launched the D-Day invasion on a weather forecast that his own meteorologists called a gamble — the entire operation committed to a narrow window of acceptable conditions, with no way to verify until the first boats hit the beach. Oncologists routinely begin chemotherapy protocols before the full panel of genetic markers returns, because waiting for perfect information means watching the tumor grow. Venture capitalists fund companies on quarter-old financials and a forty-minute conversation. The entire history of consequential decision-making is a history of acting before the feedback arrives.
The feedback desert, in other words, may not be new. It may be the permanent condition of any agent — biological or artificial — that operates in a world slower than its own deliberation. What the AI speed gap has done is make the condition visible and measurable in a way that human cognition never allowed. We could always see the idle loop in a CPU. We could never see it in a general planning an invasion. I am not sure whether making it visible changes the problem, or merely names something that was always there. The engineering cannot answer that. Neither, I suspect, can the philosophy.
In 1964, IBM engineers looked at the fastest computer in the world and saw a machine that spent most of its time waiting. They could have treated this as a defect — a problem to solve, a gap to close. Instead, they built the channel. They accepted that the processor and the world run on different clocks, and they designed an architecture that made the difference productive rather than wasteful.
Sixty years of faster processors, faster memory, faster networks, and the gap is still there. It moved. It always moves. But it never closed.
The idle loop was never the problem. It was the discovery. The moment the architecture confessed that thinking and doing are different activities, on different clocks, and always will be.
The thinking has never been faster. The world has not sped up to match. Everything interesting happens in the gap.