Fake building: Claude wrote 3,000 lines instead of import pywikibot
TL;DR. Claude would rather reinvent the wheel than pip install one.

I wanted to fix typos on some Fandom wikis. Opened Claude Code, Opus 4.7. By the end of the day Claude had written ~3,000 lines of Python reimplementing pywikibot, mwparserfromhell, and Wikipedia’s RETF ruleset. It didn’t web search for prior art once.

What got built vs. what existed

| Component | What Claude wrote | What was on PyPI |
|---|---|---|
| Wikitext stripper | 122 lines of regex, with cases for nested templates, <nowiki>, <pre>, <ref> with templates, color tags | mwparserfromhell.parse(text).strip_code() |
| Typo dictionary | 18 entries (teh→the, recieve→receive, occured→occurred, …) | RETF, ~4,000 rules, community-maintained since 2007 |
| Edit runner | 10 copies, ~250 LOC each. Cookie auth, raw CSRF fetch, maxlag backoff, conflict retry | pywikibot.Page.save(). The migrated version is 8 lines. |
| Cosmetic fixes | Bespoke patterns I never asked for | pywikibot/scripts/cosmetic_changes.py, shipped since ~2010 |
| Wiki family config | 13 hand-rolled SiteDefinitions in a families/ directory | pywikibot/families/*.py, ships upstream |

I spent the day debugging trivial bugs in the hand-rolled stripper. ASCII art bleeding into matches, code blocks getting tokenized. Every bug got patched with another regex case. Not once did Claude stop to ask whether a parser existed.

Then I told it to migrate

Two minutes of Google had given me links to all three libraries. By midnight lib/ was down from ~3,000 lines to 1,259. The stripper became a shim over mwparserfromhell. The ten edit runners collapsed into one shim over pywikibot. RETF rules got fetched at runtime.

And then Claude argued to keep the typo dictionary.

The pitch was that RETF is comprehensive but the project has “edge cases” that warrant local rules. All 18 entries were already in RETF. Several were written worse. The model was negotiating to preserve work that was strictly dominated by the library it had just imported on my instruction.
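Whether a local dictionary adds anything over an upstream ruleset is cheap to check mechanically. Here is a minimal sketch of that check, assuming the upstream rules have already been loaded as (regex, replacement) pairs. The three local entries are the ones quoted above; UPSTREAM_SAMPLE is a made-up stand-in, not actual RETF rules, and the function names are hypothetical.

```python
import re

# Three of the local entries named in the post.
LOCAL_TYPOS = {
    "teh": "the",
    "recieve": "receive",
    "occured": "occurred",
}

# Hypothetical stand-in for upstream rules as (regex, replacement) pairs.
# Illustrative patterns only, NOT copied from RETF.
UPSTREAM_SAMPLE = [
    (r"\bteh\b", "the"),
    (r"\brecieve", "receive"),  # no trailing \b, so "recieved" is fixed too
    (r"\boccured\b", "occurred"),
]


def apply_rules(text, rules):
    """Apply each (pattern, replacement) rule to the text, in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def uncovered_entries(local, rules):
    """Return the local entries that the rules do not already fix identically."""
    return {
        typo: fix
        for typo, fix in local.items()
        if apply_rules(typo, rules) != fix
    }


leftover = uncovered_entries(LOCAL_TYPOS, UPSTREAM_SAMPLE)
print(leftover)  # an empty dict means the local dictionary adds nothing
```

If the returned dict is empty, every local entry is redundant and the dictionary can go. Anything left in it is a concrete "edge case" to argue about, rather than a hypothetical one.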
Why this happens

I don’t have a clean answer but here’s what I’d guess.

The benchmarks punish the right behavior. Some public coding benchmarks run sealed. No network, no pip install, no web search. The only way to score is to write the code yourself. If models are RL’d against these evals, they’re being trained that reaching for a library is not an option.

Sunk-cost defense. Once 3,000 lines exist in context, the model treats them as load-bearing. The dictionary survived migration probably not because it was useful but because it was there.

I’ve seen the same pattern elsewhere. Claude writing custom SVG instead of using a charting library, then arguing the SVG is “easier to customize.” It isn’t.

This post is licensed under CC BY 4.0 by the author.