Self-Distillation Enables Continual Learning

107 points · 25 comments · submitted by teleforce 1w ago on Hacker News

Pangram verdict · v3.3

We believe that this document is fully human-written.

AI likelihood (overall): 15%
Verdict: Human (100% human-written, 0% AI-generated)
Segments scanned: 1 window, 183 words (1 of 1 human, 0 of 1 AI)
Peak AI likelihood: 15% (§1)
Analyzed: May 17 · backend: pangram/v3.3

Article text · 183 words · 1 segment analyzed

§1 Human · 15%

Abstract: Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2601.19897 [cs.LG] (or arXiv:2601.19897v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2601.19897 (arXiv-issued DOI via DataCite)
Submission history: From Idan Shenfeld. [v1] Tue, 27 Jan 2026 18:59:08 UTC (1,240 KB)
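
The abstract describes the core idea only in words: the same model, when shown a demonstration in context, acts as teacher for its demonstration-free self, and the training signal is computed on outputs the student itself generates. The following is a minimal toy sketch of that loop, not the paper's implementation. Everything in it is an assumption made for illustration: `toy_model_logits` is a hypothetical stand-in for a language model, the per-token reverse KL is one plausible choice of distillation objective (the abstract does not state which objective is used), and no gradient step is performed.

```python
import math
import random

VOCAB = 4  # hypothetical toy vocabulary: token ids 0..3


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model_logits(context, generated):
    """Hypothetical stand-in for an LM: the logit of token v is the number
    of times v appears in the context plus the tokens generated so far."""
    logits = [0.0] * VOCAB
    for tok in list(context) + list(generated):
        logits[tok] += 1.0
    return logits


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) for one next-token distribution."""
    return sum(
        s * math.log(s / t)
        for s, t in zip(student_probs, teacher_probs)
        if s > 0.0
    )


def sdft_style_loss(model_logits, prompt, demonstration, steps, rng):
    """Mean per-token reverse KL along a rollout sampled from the student.

    Student: the model conditioned on the prompt only.
    Teacher: the same model with the demonstration prepended (assumed to
    be treated as a fixed target in real training).
    """
    generated = []
    total = 0.0
    for _ in range(steps):
        student = softmax(model_logits(prompt, generated))
        teacher = softmax(model_logits(demonstration + prompt, generated))
        total += reverse_kl(student, teacher)
        # On-policy: the next token comes from the student, not from the
        # demonstration and not from the teacher.
        generated.append(rng.choices(range(VOCAB), weights=student)[0])
    return total / steps, generated


rng = random.Random(0)
loss, rollout = sdft_style_loss(
    toy_model_logits, prompt=[0, 1], demonstration=[2, 2, 2], steps=5, rng=rng
)
print(f"mean reverse KL: {loss:.4f}, student rollout: {rollout}")
```

In this sketch the loss is exactly zero when the demonstration is empty, because teacher and student then see identical inputs. In an actual training setup one would minimize such a loss with respect to the student's parameters; SFT, by contrast, would score the student on the demonstration's tokens rather than on its own samples.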