Self-Distillation Enables Continual Learning

arxiv.org

▲ 109 points • 26 comments • by teleforce • 2mo ago • HN discussion ↗

Abstract: Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while substantially reducing catastrophic forgetting. In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2601.19897 [cs.LG] (or arXiv:2601.19897v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2601.19897 (arXiv-issued DOI via DataCite)
Submission history: From Idan Shenfeld. [v1] Tue, 27 Jan 2026 18:59:08 UTC (1,240 KB)
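The core idea in the abstract, a demonstration-conditioned copy of the model acting as teacher for the model's own outputs, can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a single-token "model" (a vector of logits), simulates in-context conditioning by adding a hypothetical `demo_bias` to the demonstrated token's logit, freezes the teacher at the initial weights, and minimizes the reverse KL divergence KL(student || teacher), whose expectation is taken under the student's own distribution. The vocabulary is small enough to compute that expectation exactly; a real system would sample responses from the student instead. The abstract does not state the loss or the teacher schedule, so both are assumptions here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(p, q):
    """KL(p || q): the expectation is taken under the student's distribution p."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def teacher_probs(base_logits, demo_token, demo_bias):
    """Assumed stand-in for in-context learning: the demonstration
    raises the logit of the demonstrated token; all else is unchanged."""
    logits = list(base_logits)
    logits[demo_token] += demo_bias
    return softmax(logits)


def sdft_toy(init_logits, demo_token, demo_bias=2.0, lr=0.5, steps=500):
    """Toy self-distillation loop (hypothetical names and hyperparameters).

    Returns the final student logits, the teacher distribution, and the
    KL value recorded before each update.
    """
    student = list(init_logits)
    # Teacher = same initial weights, conditioned on the demonstration (frozen).
    q = teacher_probs(init_logits, demo_token, demo_bias)
    history = []
    for _ in range(steps):
        p = softmax(student)
        kl = reverse_kl(p, q)
        history.append(kl)
        # Exact gradient of KL(p || q) w.r.t. the student logits:
        # dKL/dz_j = p_j * ((log p_j - log q_j) - KL)
        grad = [
            pi * ((math.log(pi) - math.log(qi)) - kl)
            for pi, qi in zip(p, q)
        ]
        student = [z - lr * g for z, g in zip(student, grad)]
    return student, q, history
```

Calling `sdft_toy([1.0, 0.5, 0.0, -0.5], demo_token=2)` moves probability mass toward the demonstrated token while leaving the relative preference among the other tokens close to where it started. That is a loose analogue of acquiring a new skill without disturbing prior behavior.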