
Language Models Need Sleep

Hacker News: 212 points, 140 comments, submitted by juxtapose 3 weeks ago.

Pangram verdict (v3.3, analyzed May 26): we believe that this document is fully human-written.
AI likelihood: 2% overall (100% human-written, 0% AI-generated; 1 of 1 segments classified as human).

Article text (192 words, 1 segment analyzed)

Abstract: Transformer-based large language models are increasingly used for long-horizon tasks; however, the cost of their attention mechanism grows with context length (quadratically in compute for full attention, and linearly in key-value cache memory). To address this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. At inference time, this shifts extra computation into sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as on a realistic math reasoning task, on which both a regular transformer and SSM-attention hybrid models fail. We then show that increasing the sleep duration $N$ improves our models' performance, with the largest gains on examples that require deeper reasoning.
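A toy sketch may help make the sleep/wake split concrete. It is a hypothetical illustration under stated assumptions, not the authors' method: the learned local rule inside the paper's SSM blocks is replaced by a fixed delta rule, the "model" is a single fast-weight matrix rather than SSM blocks, and the names `SleepingMemory`, `wake_write`, `sleep`, `read`, and `n_passes` (standing in for $N$) are invented for this example. While awake, the memory only appends to a key-value cache; sleep runs `n_passes` offline sweeps over the cache, writes it into the fast weights, and clears the cache, so a read afterward costs the same regardless of how much context came before.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def normalize(x: Vector) -> Vector:
    """Scale to unit L2 norm; with unit keys and 0 < lr < 2 each delta step shrinks that pair's error."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Compute W @ x for a row-major matrix W."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class SleepingMemory:
    """Hypothetical toy: a growing key-value cache plus a fast-weight matrix.

    Assumption: a fixed delta rule stands in for the paper's learned local rule.
    """

    def __init__(self, d_key: int, d_value: int, lr: float = 0.5) -> None:
        self.fast_weights: Matrix = [[0.0] * d_key for _ in range(d_value)]
        self.kv_cache: List[Tuple[Vector, Vector]] = []
        self.lr = lr

    def wake_write(self, key: Vector, value: Vector) -> None:
        """While awake, new context only appends to the cache (no weight updates)."""
        self.kv_cache.append((normalize(key), value))

    def sleep(self, n_passes: int) -> None:
        """Run n_passes offline sweeps over the cache, then clear it."""
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                # Delta rule: W += lr * (v - W k) k^T, a local outer-product update.
                error = [v - p for v, p in zip(value, matvec(self.fast_weights, key))]
                for i, e in enumerate(error):
                    for j, kj in enumerate(key):
                        self.fast_weights[i][j] += self.lr * e * kj
        self.kv_cache.clear()

    def read(self, key: Vector) -> Vector:
        """Wake-time retrieval costs O(d_key * d_value), independent of past context length."""
        return matvec(self.fast_weights, normalize(key))


def retrieval_error(n_passes: int) -> float:
    """Store three correlated key-value pairs, sleep for n_passes, return total squared error."""
    keys = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    memory = SleepingMemory(d_key=4, d_value=2)
    for k, v in zip(keys, values):
        memory.wake_write(k, v)
    memory.sleep(n_passes)
    assert not memory.kv_cache  # the cache is empty after sleep
    return sum(
        (v - p) ** 2
        for k, val in zip(keys, values)
        for v, p in zip(val, memory.read(k))
    )


for n in (1, 5, 50):
    print(f"N={n:>2}: retrieval error {retrieval_error(n):.6f}")
```

Because the three keys are correlated, a single sweep leaves earlier pairs partly overwritten; since they are also linearly independent, repeated sweeps converge toward weights that store all three pairs, so the error should fall as `n_passes` grows. This only loosely echoes, in spirit, the abstract's report that longer sleep helps; it is not evidence for that result.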

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.26099 [cs.CL] (or arXiv:2605.26099v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.26099 (arXiv-issued DOI via DataCite, pending registration)
Submission history: from Sangyun Lee; [v1] Mon, 25 May 2026 17:55:39 UTC (319 KB)