Pangram verdict · v3.3
We believe that this document is fully human-written
AI likelihood · overall
HumanArticle text · 1,741 words · 5 segments analyzed
If you liked this piece, please subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI’s finances, and the AI bubble writ large. I recently put out the timely and important Hater’s Guide To The SaaSpocalypse, another on How AI Isn't Too Big To Fail, a deep (17,500 word) Hater’s Guide To OpenAI, and just last week put out the massive Hater’s Guide To Private Credit.I also just did a piece about how OpenAI will kill Oracle, and I’ve used some of the materials in today’s piece. It's one of my best pieces I've ever done and I'm extremely proud of it.Subscribing to premium is both great value and makes it possible to write these large, deeply-researched free pieces every week. Yesterday morning, GitHub Copilot users got confirmation of something I’d reported a week ago — that all GitHub Copilot plans would move to usage-based pricing on June 1, 2026. Instead of offering users a certain number of “requests,” Microsoft will now charge users based on the actual cost of the models they’re using, which it calls “...an important step toward a sustainable, reliable Copilot business and experience for all users.” Users instead get however much they spend on their GitHub Copilot subscription (EG: $19 of tokens a month on a $19-a-month plan).Translation: "we cannot continue to subsidize GitHub Copilot users, or Amy Hood will start hitting people with a baseball bat." Anyway, the announcement itself was a fascinating preview into how these price changes are going to get framed: Copilot is not the same product it was a year ago.It has evolved from an in-editor assistant into an agentic platform capable of running long, multi-step coding sessions, using the latest models, and iterating across entire repositories. Agentic usage is becoming the default, and it brings significantly higher compute and inference demands.Today, a quick chat question and a multi-hour autonomous coding session can cost the user the same amount.
GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable.Usage-based billing fixes that. It better aligns pricing with actual usage, helps us maintain long-term service reliability, and reduces the need to gate heavy users.You see, it’s not that Microsoft was subsidizing nearly two million people’s compute, it’s that AI has become so strong, powerful and complex that it’s basically a different product!While Copilot might not be “...the same product it was a year ago,” very little has changed about the underlying economic mismatch: that Microsoft was allowing users to burn more than their subscription costs in tokens every single month for three years. Per the Wall Street Journal in October 2023:Individuals pay $10 a month for the AI assistant. In the first few months of this year, the company was losing on average more than $20 a month per user, according to a person familiar with the figures, who said some users were costing the company as much as $80 a month.Naturally, GitHub Copilot users are in revolt, saying that the product is “dead” and “completely ruined.”And I called it two years ago in the Subprime AI Crisis:I hypothesize a kind of subprime AI crisis is brewing, where almost the entire tech industry has bought in on a technology sold at a vastly-discounted rate, heavily-centralized and subsidized by big tech. At some point, the incredible, toxic burn-rate of generative AI is going to catch up with them, which in turn will lead to price increases, or companies releasing new products and features with wildly onerous rates — like the egregious $2-a-conversation rate for Salesforce’s “Agentforce” product — that will make even stalwart enterprise customers with budget to burn unable to justify the expense.
And that day has finally arrived, because every single AI service you use subsidized compute, and every single service is losing money as a result:When you pay for access to an AI startup’s service — which, of course, includes OpenAI and Anthropic — you do so for a monthly fee, such as $20, $100 or $200-a-month in the case of Anthropic’s Claude, Perplexity’s $20 or $200-a-month plan, or OpenAI’s $8, $20, or $200-a-month subscriptions. In some enterprise use cases, you’re given “credits” for certain units of work, such as how Lovable allows users “100 monthly credits” in its $25-a-month subscription, as well as $25 (until the end of Q1 2026) of cloud hosting, with rollovers of credits between months.When you use these services, the company in question then pays for access to the AI models in question, either at a per-million-token rate to an AI lab, or (in the case of Anthropic and OpenAI) whatever cloud provider is renting them the GPUs to run the models. A token is basically ¾ of a word.As a user, you do not experience token burn, just the process of inputs and outputs. AI labs obfuscate the cost of services by using “tokens” or “messages” or 5-hour-rate limits with percentage gauges, and you, as the user, do not really know how much any of it costs. On the back end, AI startups are annihilating cash, with up until recently Anthropic allowing you to burn upwards of $8 in compute for every dollar of your subscription. OpenAI allows you to do the same, though it’s hard to gauge by how much.AI startups and hyperscalers assumed that they’d be able to get enough people through the door with subsidized, loss-making products to get them hooked on services badly enough that they’d refuse to change once businesses jacked up the prices. They also assumed, I imagine, that the cost of tokens would come down over time, versus what actually happened — while prices for some models might have come down, newer “reasoning” models burn way more tokens, which means the cost of inference has, somehow, gotten higher over time.
Both assumptions were wrong, because the monthly subscription model does not make sense for any service connected to a Large Language Model.The Core Economics of Generative AI Are BrokenThink of it like this. When Uber (and no, this is nothing like Uber) started jacking up the prices for its rides, the underlying economics stayed the same, as did those presented to both the rider and the driver — a user paid for a ride, a driver was paid for a ride. Drivers still paid for gas, car insurance, any permits that their local government might insist upon, and whatever financing costs might be associated with their vehicle, and said costs were not subsidized by Uber. Uber’s massive losses came from subsidies, endless marketing expenses, and doomed R&D efforts into things like driverless cars .Generative AI Subscriptions Are Nothing Like UberTo illustrate the scale of AI’s pricing mismatch, I’m going to ask you to imagine an alternate history where Uber had a very different business model.Generative AI subscriptions are like if Uber charged users $20 a month for 100 rides of any distance under 100 miles, and if gas was $150 a gallon, and Uber paid for the gas because somebody insisted that oil would one day be too cheap to meter.Uber would, eventually, decide to start charging users a monthly subscription to access rides, and bill them for the gas that they consumed. Suddenly users would go from paying $20 a month for 100 rides to paying $20 to access a driver and $26 for a 10 mile drive. Understandably, users would be a little upset.While this sounds a little dramatic, it’s actually a pretty accurate metaphor for what’s happening in the generative AI industry, and in particular, at Github Copilot. GitHub Copilot’s previous pricing allowed 300 premium requests a month, as well as “unlimited chat requests” using models like GPT-5 mini. Each of these requests (to quote Microsoft) is “...any interaction where you ask Copilot to do something for you,” with more-expensive models taking up more requests in the later life of the request-based system, such as Claude Opus 4.6 taking up three premium requests. When you ran out of premium requests, Copilot would let you use one of those cheaper models as much as you’d like for the rest of the month.
This wasn’t even always the case. Up until May 2025, Microsoft gave users unlimited access to models, and even then users were pissed off that there were any restrictions on the product. Microsoft — like every AI company — swindled its customers by selling an unsustainable service, because it never, ever made sense to sell LLM-powered services on a monthly subscription.If you’re wondering how much services are likely to cost under token based billing, a user on the GitHub Copilot Subreddit found that the token burn of what used to be a single premium request was somewhere around $11, as one “request” involved using 60,000 tokens in the context window, a few tools, and a bunch of internal “turns” (things that the model is doing) to produce the output. There’s also the underlying unreliability of hallucination-prone Large Language Models. While a premium request chasing its tail and spitting out half-broken code might be frustrating, that same fuckup is a lot less forgivable when you’re paying the costs yourself. Users have also been trained to use the product in an entirely different manner to token-based billing, and I’d imagine many of them don’t even really realize how many “tokens” they burn or how many of them a particular task takes, something which changes based on whatever model you use.This is absolutely nothing like Uber, and anyone telling you otherwise is attempting to rationalize bad behavior. Uber may have raised prices, but it didn’t have to dramatically change the underlying economics of the platform, nor did users have to entirely change how they used the product because Uber was suddenly charging them on a per-gallon basis.Monthly AI Subscriptions Are All Part of AI’s Subsidy Scam, A Deliberate Attempt To Separate Generative AI From Its Actual Costs There has never been — and never will be — an economically-feasible way to offer services powered by LLMs without charging the actual token burn of each user, and in the process of deceiving said users, these companies have created products with illusory benefits and questionable return on investment.And that’s been blatantly obvious for years. On an economic basis, a monthly subscription only makes sense with relatively static costs.