The Optimal Amount of Slop is Non-zero

slater.dev

Discussed on Hacker News (submitted by sltr)


Regretting that code you vibed? Learn when skipping human review is and isn't a smart move.

Rigor should be proportional to risk. My regular readers might be shocked at the title of this post. If you've read my other posts, such as AI: Accelerated Incompetence or LLMs are not Bicycles for the Mind, you might expect that I would more readily miss my son's birthday than ship unreviewed LLM code. You would not be far off: there are just a few narrow situations where I have. Today you'll learn about those situations and, along the way, my criteria for deciding when to skip code review.

Definitions
agentic coding: An LLM edits, runs, and tests code for you in a loop.
vibe coding: Accepting LLM-generated code without reading it.
slop: Low-quality, high-quantity AI-generated content.

Looks can be deceiving

Month by month, I meet more people who have discovered agentic coding and come to trust it so much that they no longer hesitate to outsource not just implementation but also verification to it. Just yesterday I chatted with a dev who said he had stopped reviewing code; he lets a team of LLM agents do it for him. I was disappointed, because he should understand a vexatious property of software: externally observable appearance and behavior give very little signal about internal quality. A program that does everything expected of it can still be riddled with quality issues. It may work today and still break as it is revised and as the world around it changes.

As a daily user of Claude Code, I can attest that when given clear requirements and context, it regularly generates software that actually does what I asked. However, across hundreds of sessions, the code has not once been what I would call good, even after adversarial LLM review.

Closed-source software is both an experience good and a credence good.[1] An experience good is one whose quality you can judge only after using it; a credence good is one whose quality you can't fully judge even then. We've all bought downloadable software or subscribed to a SaaS product. Before buying, you evaluated whether the software worked well for your needs, but as a prospective customer you had no way to evaluate the quality of its implementation. You could judge it only on externals. If it had a security flaw, you couldn't discover it.


Certifications like SOC 2 exist to rebalance this information asymmetry between the developer and the customer. If you, a developer, outsource reading the code to an LLM, you discard your information advantage and bring no more value than a nonpractitioner. Here's how we know software is a credence good: give an exec a slick-looking prototype, and they'll be ready to write a check for millions. Really, all you've given them is a poster for a movie that doesn't even have a premiere date yet. This is why good prototypes deliberately look unfinished.

Prudent mockups and prototypes look like pencil sketches on purpose, so that no exec says "ship it now."

Programmers have a capability the general population doesn't: the ability to review the code their LLM generates. That's a valuable advantage, but internal code quality is paid for in the scarce currencies of time and attention. When is the effort worthwhile? What we're looking for is the right risk-rigor ratio.

Matching rigor to risk

In any situation, when deciding how much rigor to exercise, we have to weigh the price of that rigor against the possible costs of things going wrong. If the potential losses are low enough, the rigor isn't worth it; if they are high enough, it is. Let me tell you two stories that show what getting this wrong looks like.

Too much rigor

Imagine a dystopian future where hamburgers are extremely valuable. Crime rings regularly steal, launder, and resell them. When you walk into McDonald's, you pass through a metal detector and are subjected to a brisk frisking. When you order a hamburger, the cashier sternly asks to see your government-issued photo ID. In this dire world, such extreme measures are necessary to maximize McDonald's profits by preventing losses.

In the real world, this story is a laughable fiction because the rigor far outweighs the risk. Burgers would cost ten times more, and McDonald's wouldn't sell very many. At a sufficient level of risk, though, such drastic security measures land squarely inside the Overton window: they are routine at every commercial airport in the world.

Too little rigor

Now for a story that demonstrates too little rigor. The movie The Invention of Lying (2009)[2] takes place in a world where nobody has ever told a lie.


The main character, Mark Bellison, played by Ricky Gervais, is down on his luck: he's about to be evicted because he can't afford his rent. Defeated and expecting to become homeless, he saunters into his local bank branch and asks to close his account. The teller replies that unfortunately the computer system is down and she can't close the account, but if Mark will kindly tell her his balance, she can make a withdrawal right away. The account holds $300, but an epiphany hits Mark, and he tells the world's first lie: "I have $800 in my account." At that very moment, the computer system comes back up and correctly reports a balance of $300. Since lying is inconceivable to her, the teller assumes the computer is wrong and happily hands Mark $800. She even apologizes for the inconvenience.

A guilty-looking Mark Bellison realizing he just stole $500

In the real world, Mark would have asked for $8 billion. The bank would have failed, and the effects would have rippled through the US financial system. I think that would have been a more interesting movie, but in a world where nobody lies, I don't think banks would even exist. One purpose of a real bank is to keep your dirty, greedy hands off my money. In that world, a bank would be more like an office refrigerator holding a styrofoam takeout box with your initials etched into it by fingernail.

The right amount of rigor

The movie is not entirely fiction: to an extent, banks do pretend people don't lie. The title of this post is a snowclone of Patrick McKenzie's classic essay "The optimal amount of fraud is non-zero,"[3] which explains that banks tolerate some fraud as a policy decision that maximizes the overall value of commerce. The banking industry is not stupid or gullible. Smart people have converged on this arrangement after centuries of facilitating commerce and handling fraud. Enforcing zero fraud would be very expensive. Similarly, requiring human review of all code is expensive.

An important difference between a bank and your business is that for banks, the risk of fraud is distributed. Most card fraud is absorbed by retailers as a cost of doing business. Beyond that, the card network absorbs the cost.


Fraud is also policy. Banks are deputies of the state, regulated and backstopped by the US Federal Reserve.

What kind of software is it?

Well before LLMs, it was clear that some software needs more stringent verification than other software. The Python script that backs up your spouse's photos merits less scrutiny than your employer's authentication platform, which in turn needs less care than the software running your dad's pacemaker. Bertrand Meyer, an expert on software verification, uses a three-bucket "ABC" taxonomy: Acute, Business, and Casual.[4]

Casual software has limited distribution and loose quality constraints. Examples include an app for your personal use, a spreadsheet macro, or an internal proof of concept. Most software falls into this category.

if sometimes they crash, sometimes produce not-quite-right results, cannot be easily understood or maintained by anyone other than their original developers, target just one platform, run too slowly, eat up too much memory, are not easy to change, include duplicated code — it is not the end of the world

Business software is what most professional developers work on every day. If it doesn't work, your organization suffers losses. Acute software is mission-critical and merits the highest level of scrutiny.

if it does not work exactly right — someone will get killed, someone will lose huge amounts of money, or something else will go terribly wrong.

When deciding which bucket a piece of software belongs in, consider these factors:

Longevity: How long does the software have to keep working?
Potential harm, along two dimensions:
Reach: How many people or organizations can defects harm?
Severity: How badly can defects harm someone or your org?

Examples:

Banking infrastructure still runs COBOL written in the 1960s.
A disruption of flight scheduling can delay thousands of itineraries and inflict costly second-order economic loss.
A malfunctioning medical device can kill someone.
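To make the factors a little more concrete, here is a hypothetical triage helper in Python. It is an illustrative sketch, not Meyer's method: the three-point `Level` scale, the names `Software`, `abc_bucket`, and `review_plan`, and every decision rule are assumptions. Reach and severity (the two dimensions of potential harm) choose the bucket, only casual software is treated as a candidate for shipping unreviewed code, and longevity only flags when maintainability deserves extra investment.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point rating used for every factor."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Software:
    name: str
    longevity: Level  # How long does it have to keep working?
    reach: Level      # How many people or orgs can a defect harm?
    severity: Level   # How badly can a defect harm them?


def abc_bucket(sw: Software) -> str:
    """Map potential harm (reach, severity) onto Meyer's ABC buckets.

    The rules are illustrative assumptions, not Meyer's own criteria.
    """
    if sw.severity is Level.HIGH:
        # Someone could die or lose huge sums, however few users there are.
        return "acute"
    if sw.reach is Level.LOW and sw.severity is Level.LOW:
        # Limited distribution and loose quality constraints.
        return "casual"
    return "business"


def review_plan(sw: Software) -> dict:
    """Hypothetical summary: may review be skipped, and does maintainability matter?"""
    bucket = abc_bucket(sw)
    return {
        "bucket": bucket,
        # In this sketch, only casual software may ship without human review.
        "may_skip_review": bucket == "casual",
        # Code that must keep working through revisions earns extra care.
        "invest_in_maintainability": sw.longevity >= Level.MEDIUM,
    }


if __name__ == "__main__":
    examples = [
        Software("OLED screen blanker", Level.LOW, Level.LOW, Level.LOW),
        Software("Employer auth platform", Level.HIGH, Level.HIGH, Level.MEDIUM),
        Software("Pacemaker firmware", Level.HIGH, Level.MEDIUM, Level.HIGH),
    ]
    for sw in examples:
        print(sw.name, review_plan(sw))
```

Run as a script, the screen blanker lands in the casual bucket (review may be skipped), the auth platform in business, and the pacemaker firmware in acute. Letting severity alone trigger "acute" mirrors the pacemaker example: a device with a single user can still kill someone.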

The grey zone

For the two extremes of acute and casual software, appropriate LLM use is fairly evident. A biologist who vibe codes a Python script that produces incorrect data may publish a bad paper. Deploying unverified cancer treatment planning software will, in the best case, earn your business an FDA audit and, in the worst case, mistreat or kill a patient. For line-of-business software, the right amount of rigor is more elusive. It depends on what you ultimately want to achieve, your time horizon, and your appetite for risk.


What are you optimizing for?

Speed

If you're trying to ship as fast as possible right now, all else be damned, the optimal amount of unreviewed code to ship is close to 100%. On the other hand, if you're trying to keep a reasonably fast velocity for the long haul, you'll want to slow down so you can understand the code and invest more in its maintainability.

Business value

Businesses want to maximize profits and minimize costs, but getting greedy today can cost you later. Shipping unreviewed code can land a quick, lucrative sale. The same decision can also make it expensive to iterate or pivot later. This was true long before LLMs.

Learning

It's been well established for over 50 years that producing information makes you retain it better than just reading it.[5] If you're training junior software engineers, the optimal amount of vibe coding is probably zero. If you're an expert dev learning a new language or domain, the optimal amount is still probably close to zero.

Ethics

There are some serious ethical problems with LLMs: stolen training data; copyright violation; energy, water, and land use; wage suppression; and the devaluing of human labor are just a few. If any of these bother your conscience, the optimal use of LLMs might be zero.

Slop I have shipped

Here's a sample of the software I have created without human review:

A macOS app that turns the screen black after 5 idle minutes to spare my OLED monitors
A macOS app that rearranges my windows to preset locations
A macOS app that shows Claude usage in the menu bar
A private fork of WezTerm with vertical, draggable tabs
A VIM clone for the AlphaSmart Dana
A CLI that automatically says "yes" to Claude after a countdown
A CLI that watches a folder for receipt images and OCRs them
A web app for tracking prayers
A web and Android app for sending text messages from my browser
An iOS app for tracking baby routines like sleep and feeding
An Android and iOS app replacing the awful one that our smart thermostat uses

What do all of these apps have in common? They have limited distribution. They're just for me or my family.