GPT‑NL: a sovereign language model for the Netherlands

tno.nl

▲ 251 points • 304 comments • by root-parent • 6d ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is a mix of AI-generated and human-written content.

AI likelihood (overall): 45%
Verdict: Mixed
Distribution: 58% human-written, 42% AI-generated
Segments: human 1 of 2, AI 1 of 2
Word count: 489
Peak AI %: 76% (§2)
Analyzed: Jun 16
Backend: pangram/v3.3
Segments scanned: 2 windows, avg 245 words each

Article text · 489 words · 2 segments analyzed


§1 Human · 23%

GPT‑NL values

We are building a responsible language model for the Dutch language and context: trustworthy, transparent, reciprocal and sovereign.

Sovereign: control over technology that matters

GPT‑NL is developed within the Netherlands and Europe. This gives us full control over the model, the data and the choices we make. We avoid dependency on non‑European providers and invest in a sustainable AI ecosystem aligned with our laws, values and societal goals.

Open and transparent: insight from source to model

GPT‑NL is built on transparency. We clearly document the choices we make during data collection and training, and how we address risks such as bias and ethical concerns. We publish the source code as open source and share detailed insights into the dataset. Model weights are made available under a controlled licence. This allows us to know who uses the model and to inform users about updates or changes, for example following a data opt‑out. In this way, we operate transparently without compromising security or regulatory compliance.

Trustworthy: protecting users and citizens

We train GPT‑NL entirely from scratch. This prevents unclear data provenance, copyright risks or potential personal data from being inherited from existing models. To ensure a reliable foundation, our data collection meets strict criteria:

- Safeguarding intellectual property
- Removing and anonymising personal data before model training
- Excluding confidential information
- Excluding harmful content
- Avoiding duplication within the dataset

Reciprocal: fair agreements on data and value

GPT‑NL deliberately works with a clean and lawful data supply chain. We collaborate closely with data providers and actively involve them in the development of the model. Through the Content Board, these data providers and rights holders have a voice in the future of GPT‑NL. Part of the revenues flows back to the creators. This creates a fairer innovation model in which value is shared rather than extracted.

Using resources efficiently

AI development requires significant computing power and energy. That is why we actively focus on energy efficiency and responsible use of resources. Based on scientific research, we optimise both the size of the model and the training process, with explicit attention to energy and water consumption.

Publicly funded, publicly accountable

GPT‑NL is funded by the Netherlands Enterprise Agency (RVO) on behalf of the Ministry of Economic Affairs and Climate Policy.

§2 AI · 76%

A total of €13.5 million has been allocated to the project. This public investment underlines the importance of an independent, trustworthy and future‑proof Dutch language model.

GPT‑NL shows that powerful AI and public values can go hand in hand. Together, we are building technology that makes the Netherlands stronger, more autonomous and fairer.

Get inspired

Impact Acceleration Challenge: Futureproof AI - Pitching and Ecosystem Building Day

Join us on Thursday, 18 June for the closing event of the Futureproof AI challenge where teams will pitch ideas and solutions for AI that is futureproof: sustainable and sovereign.

Start date: 18 Jun
Location: World Trade Centre Amsterdam, Strawinskylaan 1, 1077 XW Amsterdam