Tim Davis | Probabilistic engineering and the 24-7 employee

timdavis.com

▲ 42 points • 27 comments • by kiyanwang • 5w ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is a mix of AI-generated and human-written content.

AI likelihood (overall): 49%
Verdict: Mixed (60% human-written, 40% AI-generated)
Segments: human 1 of 5, AI 3 of 5
Word count: 1,908
Peak AI %: 100% (§2)
Analyzed: Apr 21
Backend: pangram/v3.3
Segments scanned: 5 windows, avg 382 words each


Article text · 1,908 words · 5 segments analyzed


§1 Human · 10%

Software is quietly becoming a probabilistic system, and almost no one is saying it out loud. We built our profession around deterministic code. Write it, test it, ship it, know it works - but in my experience that contract is breaking. Inside the top few percent of operators at truly AI-native companies, the codebase has started to become something you believe works, with a probability you can no longer precisely state. The workday is changing as a consequence, and so are the roles, the organizations, the training pipelines, and the nature of what it means to ship.

I noticed because I built one. A few months ago, in the evenings after my day job running Modular, I started building a side project called Compound Loop - a system that orchestrates multiple frontier models against each other to write, review, and merge code more or less autonomously. I would set it running on a real problem before I went to bed, and I would wake up and triage a stack of pull requests that had not existed the night before. Some were excellent, some were wrong, and some surfaced a question I did not know to ask. By 8 a.m. I was not catching up on yesterday's work - I was deciding which of the overnight jobs to keep, while the system kept analyzing logs and adding more PRs. The continuous compounding nature of it was, and still is, infectious to watch.

For the first time in the history of knowledge work, the person who went home did not take the only copy of their brain with them. 9-9-6 as a concept is dead, and we are simply 24-7 employees now - but the 24-7 employee is not a person working 24 hours, it is a person whose agents work with enormous parallelization. Most teams in 2026 still bottleneck on coordination rather than typing, and most organizations have barely begun to restructure, but the frontier is always where the future shows up first, and the frontier is already here.

This essay is not a description of the industry at large, but rather a description of what is already happening inside the most AI-native teams, and where I believe that pulls the rest of the industry.

Roles are not just collapsing upward - they are splitting

Inside the most AI-native teams, the pattern is messier than the clean "everyone levels up" story most commentary is selling.

§2 AI · 100%

Some operators really are moving up the stack: the best engineers are becoming more effective product managers, working at engineering's abstraction layer, the best product managers are becoming system architects, and the best architects are thinking about distribution, growth, and the shape of the market. For this group - maybe the top tier of any team - the work is more leveraged than it has ever been, and they are having the best years of their careers.

But that is not the whole picture, and pretending it is does a disservice to everyone else. Alongside the upward shift, a downward pressure is fragmenting roles in ways the headlines are not covering. Plenty of engineers are not becoming architects - instead they are becoming spec writers, reviewers, and agent babysitters, operators who spend their days translating intent into machine-readable prompts and then grading the machine's work against standards they themselves might not fully possess. Some of that work is genuinely important, but some of it is the 2026 equivalent of data entry, dressed up in new terminology.

We need to be honest about what that means for the people doing it. These fragmented roles will be paid less, valued less, and in many cases become career dead ends - a layer of output-wrangling work the system needs but does not reward. The pay gap between the top tercile running fleets of agents effectively and the middle tier managing their exhaust will be wider than the pay gap between engineers and sales reps was in the previous era. That gap is already opening inside the companies I watch closely, and I don't believe it is going to close on its own.

One honest note on where the scarce work has moved. In AI infrastructure, kernel performance and compiler design and hardware abstraction remain deeply defensible moats, because there is still a high degree of determinism needed at the lowest levels of systems engineering. But at the level of building software on top of those moats, the center of gravity has shifted hard toward the human inputs a machine cannot yet replicate, and that shift is real and accelerating.

Jevons was right about coal, and he is right about code

In 1865, the economist William Stanley Jevons observed that more efficient steam engines led to more coal consumption rather than less - efficiency expanded the set of things worth building engines for. We are living the software version of that same observation, and it is one of the most exciting moments the profession has ever seen.

§3 Mixed · 42%

As the unit cost of writing code approaches zero, we are not writing less, we are writing vastly more and shipping vastly more, and the best teams are leaning into the curve with both hands.

The companies that believe the scaling laws are unbounded are building accordingly, and they will be the power-law-distributed winners.

Many of my friends at leading AI-native companies are already moving there in practice. Agents are opening pull requests, reviewing each other's work, and closing them without a human ever touching the keyboard, with a continuously running log-monitoring loop to fix issues quickly. Self-healing test suites rewrite themselves when the underlying code changes. Autonomous experimentation loops spin up, measure, and tear down a hundred hypotheses in the time a team once ran three. Documentation updates itself faster, at merge time, using tightly honed AI skills that also self-improve. We are moving from a world where features were bound by how fast engineers could type to one where we are bound by human creativity, the management of agentic systems, and how fast the product surface can absorb the output.

In my view, this is a wonderful moment to build. The throughput gains are not subtle: the teams that have genuinely restructured around agents are shipping three, five, or ten times what they shipped a year ago, and the curve is bending up rather than flattening. Many of the founders and operators I talk to who are running their companies this way are not complaining about noise - they are trying to figure out how to feed more work to their agent fleets tomorrow than they did today, because every incremental unit of well-directed agent output is a compounding advantage over competitors who are still typing.

But Jevons' second lesson applies here too, and it is the one that separates teams that ride the curve from teams that get thrown off it: when supply explodes, selection becomes everything.

§4 AI · 100%

More coal made engines more valuable, but it also made the discipline of choosing what to burn, what to power, and what to build with the output dramatically more important. Cheap energy without judgment is just waste, and the same logic applies to code.

For the teams running this well, selection is not a drowning problem - it is the new leverage point. The operator who can direct a fleet of agents toward the right problem, filter the outputs for what is actually valuable, and integrate the results into something coherent is doing the highest-leverage work in software right now. The value of a piece of work is no longer set by how much effort it took to produce, because effort has collapsed - it is set by how well someone pointed the agent fleet, chose from what came back, and integrated it into something that compounds even faster. Production is not where the work gets hard anymore. The hard parts now are direction, selection, and coherence, and those are the exact muscles the best teams are building as fast as they can.

From deterministic engineering to probabilistic engineering

We are rapidly moving from deterministic engineering to probabilistic engineering, and our tools, our training, and our organizational instincts are still built for the old paradigm. Deterministic engineering was the contract we operated under for most of the history of the profession - you wrote code, you tested it, you reviewed it, and you knew, within well-understood bounds, what it did. Failures were deterministic - given the same input, you got the same output, and a bug was a reproducible thing you could hunt.

Probabilistic engineering is different, and inside frontier teams it is already here. Large portions of the codebase were generated by stochastic systems, reviewed under time pressure against contexts too large to fully hold, and integrated into a whole that no single human ever designed end-to-end. The codebase still runs and still ships, but the confidence interval around "this works as intended" has widened, and most teams have not updated their practices to reflect it. This is where the asymmetry at the center of all this comes into focus: generation has become cheap, but validation has not.
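One way to make the widened confidence interval around "this works as intended" concrete is to treat repeated test runs as samples of an unknown pass rate. The sketch below is an illustration, not something from the essay: it uses the standard Wilson score interval, and the function name, the 20-run figure, and the 95% level are all assumptions chosen for the example.

```python
import math


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an unknown pass rate.

    z = 1.96 corresponds to roughly 95% confidence (an assumed choice).
    Assumes runs are independent samples of the same underlying behavior.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p_hat = passes / runs
    denom = 1 + z * z / runs
    center = (p_hat + z * z / (2 * runs)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / runs + z * z / (4 * runs * runs)
    )
    # Clamp to [0, 1] to absorb floating-point overshoot at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical scenario: a change passed all 20 of its test runs.
low, high = wilson_interval(passes=20, runs=20)
print(f"20/20 green runs: pass rate plausibly between {low:.2f} and {high:.2f}")
```

Under these assumptions, twenty green runs out of twenty still leave room for a true pass rate as low as roughly 0.84. In that sense, "all tests pass" is a belief with a width rather than a guarantee.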

§5 AI · 100%

An agent can produce a plausible-looking 500-line pull request in under a minute, but catching a subtle bug in that same pull request - a concurrency issue, a silent misinterpretation of the spec, or a case where the code does what was literally asked for but not what was actually wanted - can take a senior engineer an hour of careful reading, or longer. Review scales worse than generation, and crucially, review effort grows faster than linearly with output volume, because as more of your codebase is written by agents, the context you need to hold in your head to evaluate any single piece grows. You are not reviewing one pull request against a codebase you wrote; you are reviewing a pull request against a codebase largely written by other agents, reviewed by you at a depth you have started to forget, under time pressure that is always rising.

At some scale, the system produces more than humans can reliably evaluate, and correctness becomes probabilistic rather than assured.

This is not a future problem, it is a present one. Past a certain throughput, bugs slip through not because reviewers are careless, but because the output volume has exceeded what human attention can meaningfully inspect, and the models doing much of the review are non-deterministic themselves and miss plenty. The codebase stops being a thing you know works and becomes a thing you believe works, with a probability you can no longer precisely state. Concretely, this looks like a race condition that passes your test suite nine times out of ten, a feature that works perfectly in staging and fails under a prompt distribution you did not anticipate, or a migration that is silently corrupting one row in ten thousand and will take three weeks to catch. Proximal and Modular recently published joint research testing frontier agentic systems against basic tasks - the failure patterns we documented map directly to what I am describing. I’ve personally seen this in code I’ve written with my own multi-agent harness system. The failure mode is typically not a dramatic collapse but a slow, silent degradation - generation rises, review quality falls, unnoticed defects accumulate, and trust in the system quietly erodes until a customer or an auditor or a production incident forces the issue into the open. By then the technical debt runs deep.

The uncomfortable truth is that we do not yet have the tooling to solve this properly.
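The failure rates quoted above (a race condition that passes nine runs in ten, a migration corrupting one row in ten thousand) can be turned into rough arithmetic about how much checking it takes to notice them. The sketch below is only a back-of-the-envelope illustration: the function names are hypothetical, the 95% detection target is an assumed threshold, and it assumes each test run or sampled row is independent, which real systems often violate.

```python
import math


def prob_all_pass(pass_rate: float, runs: int) -> float:
    """Chance that a flaky defect stays hidden across `runs` independent runs."""
    return pass_rate ** runs


def trials_to_detect(miss_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of independent trials so that at least one trial
    exposes the defect with the given confidence.

    miss_rate is the per-trial probability that the defect does NOT show
    (0.9 for the race condition, 1 - 1/10_000 for the corrupted rows).
    """
    if not 0.0 < miss_rate < 1.0:
        raise ValueError("miss_rate must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(miss_rate))


# Race condition that passes 9 runs in 10: three green CI runs prove little.
print(f"P(3 green runs in a row) = {prob_all_pass(0.9, 3):.3f}")
print(f"Runs needed to catch it at 95%: {trials_to_detect(0.9)}")

# Migration corrupting 1 row in 10,000: rows to spot-check at random.
print(f"Rows to sample at 95%: {trials_to_detect(1 - 1 / 10_000)}")
```

Under these assumptions, the flaky test still goes green three times in a row about 73% of the time, and it takes 29 runs to be 95% sure of seeing it fail once. Catching the corrupted rows by random spot-check takes a sample of roughly thirty thousand rows, which is one way to see why such defects survive for weeks.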