PACT: Anonymous Credentials for the Web – Mozilla Hacks - the Web developer blog

hacks.mozilla.org

This is the technical companion to our update on Distilled, “Keeping the web open and private in the bot era.” Here we take a deeper look at the problem space, the design we’re proposing, and the problems still left to solve.

Bots (and privacy-preserving browsers) not welcome

Browse a news site in a private window. Shop at a major retailer with a VPN. Visit a video streaming platform with anti-fingerprinting defenses tuned up. You’ll see the same responses: registration walls, block pages, and endless CAPTCHAs. The message is clear: if we think you might be a bot, you’re not welcome.

Websites have valid reasons for wanting to block bots. Bots enable volumetric abuse: abuse that wouldn’t be feasible if it had to be carried out by humans. Examples include SEO comment spam, credential stuffing and DDoSing. Consequently, many sites employ dedicated anti-abuse tooling that aims to keep the bots out whilst minimizing friction for human visitors.

Unfortunately, that tooling is increasingly failing at both tasks. Browser privacy protections are dismantling the passive signals that anti-abuse systems depended on to identify and distinguish visitors. Meanwhile, advances in generative AI have rendered CAPTCHAs ineffective: bots now solve them faster and more reliably than humans.

Many sites are switching to more invasive mechanisms. They ask visitors to disclose identifying information (e.g. an email address or a federated login) or to disable their VPN. This means greater friction for users, since providing these details on a first visit takes time. It also compromises their privacy, since these details enable the same kinds of cross-site tracking that browser privacy protections were intended to mitigate.

This leaves users with a dilemma. The more effectively they protect their privacy, the harder it is for websites to distinguish them from bots, and the worse the treatment they receive. Website operators are also suffering.
The additional friction they inflict on well-behaved visitors harms their sites, but many are willing to pay that cost if it mitigates volumetric abuse.

Browser-based AI agents make this tension more acute. Sites may want to allow agents acting on behalf of individual users while blocking agents engaged in volumetric abuse. However, with no effective mechanism to distinguish the two, websites are opting to block both.

That hurts users, who should be free to choose the user agent they use to access the web. It hurts new browsers and agents, which struggle to interoperate. And it hurts sites, which lose legitimate visitors. The consequence is that the web gets worse for everyone. Users get more friction, less privacy, or both. Website operators see more volumetric abuse, and the friction they add drives away users who would otherwise want to consume their content or services. New user agents struggle to access the same content as conventional browsers.

The Costs of Convenient Solutions

Some large ecosystem players have put forward solutions that leverage their control of the dominant operating systems and their deep integration with consumer hardware. These rely on device attestation: identifiers and privileged code baked into devices at the hardware level, which let manufacturers prove what software is running on a user’s device. Exposing this functionality to the web means attesting to sites that the user is running approved software on trusted hardware and therefore isn’t a bot.

There have been two substantive proposals. Google’s Web Environment Integrity, abandoned in 2023, was the blunt version. It attested to the user agent itself, as well as the operating system and device in use. Users would have lost control in two ways:

- once to the attester, which would decide which operating systems and devices could be blessed;
- and again to the website, which would decide which software to accept.

If sites had adopted allow-lists of approved user agents, building a new browser would have become virtually impossible. Sites could also have withdrawn access from any user agent they chose.

Apple’s Private Access Tokens, deployed across its ecosystem in 2022, have more subtle issues.
Built on the Privacy Pass protocol standardized at the IETF, they get a lot right. A user receives a renewed, limited batch of one-time tokens that can be presented to websites without linking their visits together. This provides privacy for users and has shown rate limits to be an effective tool for sites. We’ll return to both points later in this post. However, Private Access Tokens rely on device attestation, which requires that the hardware manufacturer be in overall control of the user’s device. Presenting a PAT tells a website you are locked into Apple’s rules for what counts as acceptable software.

Due to PAT’s technical design[1], there’s no way to open the system to other sources of scarcity without compromising its privacy properties. If PATs were more widely deployed, access to the web would become tied to having bought expensive hardware from a small, hard-to-change set of vendors.

Both approaches are ultimately hostile to users and to the openness of the web. Both are premised on parts of a user’s device that sit within the manufacturer’s control and beyond the user’s own. Were they widely deployed, the web would become just another walled garden, with centralized gatekeepers controlling acceptable hardware, operating systems and software. As convenient as these solutions are for the players who already dominate the ecosystem, we think there’s a better path.

A Better Path Forward

Bots’ harms arise from their ability to operate beyond human scale. To prevent volumetric abuse, sites don’t actually need to know the user’s identity or receive cryptographic proof that they’re running approved software. It would be enough for a site to know that each visitor is held to a rate limit the site itself sets.

Rate limits only make sense if they’re tied to something scarce: something an attacker can’t cheaply replicate to evade the limit. Without anchoring to a scarce resource, like the trusted hardware used in Private Access Tokens, attackers can generate as many fresh identities as they need to bypass the rate limit.

However, hardware is just one option for scarcity. Anything a user already has that an attacker can’t trivially spin up at scale will work. Email addresses and phone numbers are naturally scarce. A paid subscription costs an attacker the same as a real user. Even maintaining an account on a free service requires some non-trivial work.

What if we could use these scarce signals across the web? We could build an open ecosystem with many parties offering scarcity signals, with each site choosing which to accept.
By opening up who can provide a signal, and letting sites choose which to accept, we can avoid transferring control to device manufacturers, along with the harms that would follow.

As a concrete example of who might be well positioned to provide such a signal, consider VPN providers acting as a subscription service. Sites routinely block VPN users indiscriminately. Sometimes this is a deliberate policy choice, and sometimes it is an indirect consequence of rate limiting visitors per IP address.
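The argument about scarcity and the VPN example can be made concrete with a toy fixed-window rate limiter. What follows is a minimal Python sketch under stated assumptions. The `FixedWindowLimiter` class, the `run` helper, the limit of 5, and the IP and subscription keys are all hypothetical, and window expiry is not modelled. Keying the limit by IP address has two problems. It throttles honest VPN users who share an exit address, and it places no real cap on a bot that rotates through fresh addresses. Keying it by a scarce resource fixes both. In the privacy-preserving design this post works toward, a site would never see a subscription identifier. It appears here only to show why the choice of key matters.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical limiter: at most `limit` requests per key in one window.

    Window expiry is deliberately omitted to keep the sketch short.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: dict[str, int] = defaultdict(int)

    def allow(self, key: str) -> bool:
        # Each decision is a single bit: under the limit or not.
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


def run(limiter: FixedWindowLimiter, keys: list[str]) -> int:
    """Count how many requests, identified by `keys`, the limiter accepts."""
    return sum(limiter.allow(key) for key in keys)


if __name__ == "__main__":
    LIMIT = 5
    # Five VPN subscribers share one exit IP and make three requests each.
    shared_ip = ["203.0.113.7"] * 15
    per_subscriber = [f"sub-{u}" for u in range(5) for _ in range(3)]
    print(run(FixedWindowLimiter(LIMIT), shared_ip))       # 5: ten honest requests refused
    print(run(FixedWindowLimiter(LIMIT), per_subscriber))  # 15: every subscriber served

    # A bot holding one subscription versus a bot rotating through fresh IPs.
    print(run(FixedWindowLimiter(LIMIT), ["sub-bot"] * 100))                        # 5
    print(run(FixedWindowLimiter(LIMIT), [f"198.51.100.{i}" for i in range(100)]))  # 100
```

The limiter logic is identical in every case. Only the key changes, and the key is only as good as the scarcity behind it.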

A VPN subscription, however, is a perfect source of scarcity. Suppose the VPN provider could vouch for its users so that sites could rate limit each user individually. Users would then be able to browse the web with less friction and without giving up their VPN.

The catch is that building a system that enables this on the open web whilst maintaining users’ privacy is genuinely difficult. It requires taking information from one site (that this user holds some scarce thing) and exposing it to other sites, so that they can use it as the basis for their rate limiting. Letting one site verify a signal from another is exactly the sort of information flow that privacy-preserving browsers have spent the last decade locking down to prevent cross-site tracking. Our goal is that no more than the minimum information gets through: a single bit communicating whether the user is below the rate limit set by the site. Leaking anything more, such as the source of the scarcity that the rate limit is anchored to, would be unacceptable.

Enabling a new cross-site information flow might feel like compromising privacy to gain better access, but reality is more nuanced. If a new system moves sites away from demanding that visitors be identifiable (whether through fingerprinting or login forms), it can be a win for both privacy and access.

The Foundations

The good news is that the cryptographic foundations for a privacy-preserving approach already exist. The Privacy Pass protocol was originally developed in 2018 to reduce the friction of Cloudflare CAPTCHAs for Tor users. It introduced the core primitive: a token that is unlinkable between issuance and redemption. You prove something to an issuer (e.g. by solving a CAPTCHA), receive some tokens, and later present a token to a website. The website can verify that the token is legitimate, but can’t link it to the user it was issued to.

Figure 1: In Privacy Pass, a CAPTCHA provider can issue tokens to a client which can then be used to bypass challenges for future site visits. Even if the CAPTCHA provider and sites collude, they can’t use the tokens to identify the user or their browsing history.
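The unlinkable issue-then-redeem flow in Figure 1 can be illustrated with Chaum-style blind RSA signatures. This is a minimal Python sketch, and it assumes a great deal:

- it uses textbook RSA with no message encoding;
- its key is built from two publicly known Mersenne primes, so it is trivially breakable;
- the names `blind`, `issuer_sign`, `unblind` and `Verifier` are hypothetical;
- the issuer’s admission check (e.g. a solved CAPTCHA) is not modelled.

Standardized Privacy Pass issuance uses either a VOPRF or Blind RSA with proper encoding, so treat this as an illustration of the idea rather than of the protocol. The issuer only ever signs the token nonce multiplied by a random blinding factor. As a result, the nonce and signature the site later sees cannot be matched to the issuance. The site also keeps a set of spent nonces so that each token is one-time.

```python
import hashlib
import math
import secrets

# Toy issuer key from the Mersenne primes 2**61 - 1 and 2**89 - 1.
# Anyone can factor N, so this is insecure and for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # issuer's private exponent


def hash_to_int(message: bytes) -> int:
    """Map a token nonce into Z_N (textbook full-domain-hash style)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def blind(message: bytes) -> tuple[int, int]:
    """Client: hide the nonce behind a random factor r before issuance."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (hash_to_int(message) * pow(r, E, N)) % N, r


def issuer_sign(blinded: int) -> int:
    """Issuer: sign the blinded value; it never sees the nonce itself."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """Client: (h * r^E)^D = h^D * r, so multiplying by r^-1 leaves h^D."""
    return (blind_sig * pow(r, -1, N)) % N


class Verifier:
    """Site: checks signatures with the public key and rejects reuse."""

    def __init__(self) -> None:
        self.spent: set[bytes] = set()

    def redeem(self, message: bytes, signature: int) -> bool:
        if message in self.spent:
            return False
        if pow(signature, E, N) != hash_to_int(message):
            return False
        self.spent.add(message)
        return True


if __name__ == "__main__":
    nonce = secrets.token_bytes(32)
    blinded, r = blind(nonce)
    token = unblind(issuer_sign(blinded), r)
    site = Verifier()
    print(site.redeem(nonce, token))  # True
    print(site.redeem(nonce, token))  # False: already spent
```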

Privacy Pass has gone on to be successfully deployed in systems where the issuer and verifier have a prior trust relationship. Apple uses it to authenticate users of Private Cloud Compute and Private Relay without linking their activity to their identity. Chrome uses it for two-hop IP protection, and Kagi uses it to provide private search. These deployments work in part because a small number of parties have agreed in advance on who issues tokens and who accepts them.

Applying this approach to an open system, where any site can act as an issuer, brings real challenges. First, even though tokens are unlinkable, knowing that a user has access to a specific issuer is a privacy leak on its own, because you can infer that the user meets the relevant issuance criteria. If one site can learn that you have a token from another site, it learns that you have been to that site, which can be a major privacy problem. This compounds if sites can learn the whole set of issuers you have visited, since that set becomes a fingerprint which can be used to identify you.

Generic techniques exist for proving a statement in zero knowledge. With them, we can prove that a client has a token from a set of acceptable issuers without revealing which specific issuer it is. We’ll call this issuer blinding. The generic approach is often slow, but bespoke approaches tailored to the underlying cryptography can improve on it considerably.

Another challenge is how sites using rate limits decide who to trust to issue tokens. If an issuer misbehaves, the site’s rate limits become ineffective, enabling volumetric abuse. However, if we need to prevent the site from learning which issuers a user has access to, the site will only know that one of its trusted issuers was used, not which one. This makes mistakes or misbehaviour by an issuer difficult to detect, and makes it hard for sites to evaluate new issuers. Solving this challenge is essential for openness.
Without adequate information, sites are likely to lean towards conservative issuer selection. That could narrow the choice of Anchors, which in turn could create a new form of gatekeeper. To solve this, sites need, at a minimum, a way to calculate an aggregate score for each issuer they use.
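Issuer blinding, described above, is the kind of “one of a set” statement that generic zero-knowledge techniques handle through OR-composition. The minimal Python sketch below applies the classic Cramer–Damgård–Schoenmakers OR-proof to Schnorr proofs. The client proves that it knows the secret behind one of several public values, without revealing which one. As an assumption for illustration, treat each public value as standing in for an acceptable issuer. A real issuer-hiding credential would instead prove “I hold a valid token from one of these issuer keys”, which is where the bespoke constructions mentioned above come in. The toy group parameters and all function names are hypothetical, and this is not PACT’s construction.

```python
import hashlib
import secrets

# Toy prime-order group: P = 2*Q + 1 with Q prime, and G = 4 generates the
# subgroup of order Q. These sizes are for illustration only, never for use.
P, Q, G = 2039, 1019, 4


def keygen() -> tuple[int, int]:
    """Return (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def challenge(message: bytes, values: list[int]) -> int:
    """Fiat-Shamir: hash the message and transcript values into Z_Q."""
    data = hashlib.sha256(message).hexdigest() + "|" + ",".join(map(str, values))
    return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big") % Q


def commitment(y: int, c: int, s: int) -> int:
    """Schnorr commitment implied by (c, s): t = G^s * y^(-c) mod P."""
    return (pow(G, s, P) * pow(y, (-c) % Q, P)) % P


def prove(keys: list[int], index: int, secret: int,
          message: bytes) -> tuple[list[int], list[int]]:
    """Prove knowledge of the secret for keys[index] without revealing index."""
    n = len(keys)
    cs, ss, ts = [0] * n, [0] * n, [0] * n
    for j in range(n):
        if j != index:
            # Simulate an accepting transcript for every key we do not know.
            cs[j] = secrets.randbelow(Q)
            ss[j] = secrets.randbelow(Q)
            ts[j] = commitment(keys[j], cs[j], ss[j])
    r = secrets.randbelow(Q)
    ts[index] = pow(G, r, P)  # the one genuine Schnorr commitment
    c = challenge(message, keys + ts)
    # The real branch absorbs the leftover challenge (cs[index] is still 0 here).
    cs[index] = (c - sum(cs)) % Q
    ss[index] = (r + cs[index] * secret) % Q
    return cs, ss


def verify(keys: list[int], message: bytes,
           proof: tuple[list[int], list[int]]) -> bool:
    cs, ss = proof
    if not (len(cs) == len(ss) == len(keys)):
        return False
    ts = [commitment(y, c, s) for y, c, s in zip(keys, cs, ss)]
    # Accept if the per-key challenges sum to the Fiat-Shamir challenge.
    return sum(cs) % Q == challenge(message, keys + ts)


if __name__ == "__main__":
    issuers = [keygen() for _ in range(3)]
    public = [y for _, y in issuers]
    proof = prove(public, 1, issuers[1][0], b"site.example")
    print(verify(public, b"site.example", proof))  # True
```

The proof carries one challenge and one response per candidate key. Its size and verification cost therefore grow linearly with the size of the issuer set, which is one reason the generic approach can be slow.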