
Stable Audio 3

▲ 98 points 18 comments by guardienaveugle 4d ago HN discussion ↗

Pangram verdict · v3.3

We believe that this document is fully human-written

AI likelihood (overall): 0% · Human
100% human-written · 0% AI-generated
Segments: 1 of 1 human, 0 of 1 AI
Word count: 233
Peak AI %: 0% (§1)
Analyzed: May 20 · backend: pangram/v3.3
Segments scanned: 1 window, avg 233 words each
Distribution (human / AI fraction): 100 / 0%

Article text · 233 words · 1 segment analyzed

§1 Human · 0%

Title: Stable Audio 3

Abstract: Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generation is key to avoiding the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent. Finally, we run adversarial post-training to both accelerate inference and improve generation quality, reducing the number of inference steps while improving fidelity and prompt adherence. Stable Audio 3 models are trained on licensed and Creative Commons data to generate music and sounds in less than 2s on an H200 GPU and in a few seconds on a MacBook Pro M4. We release the weights of the small and medium models, which can run on consumer-grade hardware, together with their training and inference pipeline.

Comments: Training code: this https URL. Inference and weights: this http URL
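
The abstract mentions inpainting for targeted editing and for continuing short recordings, but does not say how it is implemented. As a rough illustration only, the sketch below shows one generic way mask-based inpainting is often done in diffusion samplers: after each denoising step, the positions marked as known are overwritten with the reference latents. This is not confirmed to be the Stable Audio 3 method. The names `blend_known_region` and `toy_inpaint` are hypothetical, the "denoiser" is a placeholder that simply halves each value, and practical samplers usually noise the reference to the current noise level before blending.

```python
import random


def blend_known_region(generated, reference, keep_mask):
    """Keep reference values where keep_mask is truthy, generated values elsewhere."""
    if not (len(generated) == len(reference) == len(keep_mask)):
        raise ValueError("all sequences must have the same length")
    return [r if m else g for g, r, m in zip(generated, reference, keep_mask)]


def toy_inpaint(reference, keep_mask, steps=4, seed=0):
    """Toy inpainting loop over a 1-D list of latent values (illustrative only)."""
    rng = random.Random(seed)
    # Start from Gaussian noise, one value per latent position.
    latents = [rng.gauss(0.0, 1.0) for _ in reference]
    for _ in range(steps):
        # Placeholder for a real denoiser step (assumption): shrink toward zero.
        latents = [0.5 * x for x in latents]
        # Re-impose the known region so only the masked-out part is generated.
        latents = blend_known_region(latents, reference, keep_mask)
    return latents


if __name__ == "__main__":
    # Continuation: keep the first two positions, generate the last two.
    print(toy_inpaint([1.0, 2.0, 3.0, 4.0], [1, 1, 0, 0]))
```

Under this framing, continuing a short recording is the special case where the mask keeps a prefix and the remainder is generated.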

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.17991 [cs.SD] (or arXiv:2605.17991v1 [cs.SD] for this version)
https://doi.org/10.48550/arXiv.2605.17991 (arXiv-issued DOI via DataCite, pending registration)
Submission history: From: Jordi Pons. [v1] Mon, 18 May 2026 07:47:03 UTC (67 KB)