FatGid - FreeBSD 14.x kernel LPE

fatgid.io

▲ 104 points • 41 comments • by WhyNotHugo • 3d ago • HN discussion ↗

Summary

A kernel stack buffer overflow exists in the setcred(2) system call introduced in FreeBSD 14.x. The overflow occurs before any privilege check, allowing any unprivileged local user to trigger arbitrary behaviour ranging from a kernel panic to full local privilege escalation (LPE). Working LPE exploits against an amd64 GENERIC kernel, both without SMAP/SMEP and with SMAP/SMEP enabled, have been developed and are described below. The SMAP/SMEP-safe variant requires only that zfs.ko be loaded, which is the case on every FreeBSD installation with a ZFS pool. The root cause is a single sizeof type error in kern_setcred_copyin_supp_groups() (sys/kern/kern_prot.c).

The bug was silently fixed in the main branch on 2025-11-27 (commit 000d5b52c19ff3858a6f0cbb405d47713c4267a4) as a side effect of a broader function refactoring. The FreeBSD Security Team published FreeBSD-SA-26:18.setcred on 2026-05-21, and patches have been issued for all currently supported branches. Users of 14.3, 14.4 and 15.0 should update to 14.3-RELEASE-p14, 14.4-RELEASE-p5 or 15.0-RELEASE-p9 respectively.

On FreeBSD 15.0 the surrounding code differed enough from 14.4 that the chain primitives developed here did not lift the overflow into a working LPE; on that branch the bug remained a kernel panic triggered by any unprivileged user. 15.0 is now patched as well.
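The patchlevels named in the summary can be compared mechanically against a release string of the form printed by freebsd-version -k (for example 14.4-RELEASE-p3). The following is a minimal sketch, not an official tool: the function name needs_update and the table layout are hypothetical, and only the three release/patchlevel pairs come from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fixed patchlevels, copied from the summary above. */
struct fixed_release {
    const char *release;
    int patch;
};

static const struct fixed_release fixed[] = {
    { "14.3-RELEASE", 14 },
    { "14.4-RELEASE", 5 },
    { "15.0-RELEASE", 9 },
};

/*
 * Hypothetical helper: returns 1 if `version` is one of the listed releases
 * and is below its fixed patchlevel, 0 if it is at or above it, and -1 if
 * the string is not one of the listed releases (stable branches, 13.x, etc.).
 */
int needs_update(const char *version)
{
    assert(version != NULL);
    for (size_t i = 0; i < sizeof(fixed) / sizeof(fixed[0]); i++) {
        size_t n = strlen(fixed[i].release);
        if (strncmp(version, fixed[i].release, n) != 0)
            continue;
        int patch = 0;                      /* no "-pN" suffix means p0 */
        if (version[n] != '\0' && sscanf(version + n, "-p%d", &patch) != 1)
            return -1;                      /* unexpected suffix */
        return patch < fixed[i].patch;
    }
    return -1;
}
```

A result of -1 only means the string is outside this small table; it says nothing about whether that system is affected.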

Impact

01 / LPE - SMAP & SMEP enabled
Modern-kernel root, no info-leak
A single setcred(2) syscall lifts an unprivileged shell to uid=0 on a kernel with SMAP and SMEP enabled. No kernel info-leak primitive is required. This is the headline result.

02 / LPE - no mitigations
Legacy-kernel root
Same single syscall, on a kernel without SMAP/SMEP. Useful as a stepping stone and as a reference for the amd64_syscall+0x155 chain primitive that both techniques share.

Full chain on FreeBSD 14.4-RELEASE-p3 amd64.

Am I affected?

FreeBSD-SA-26:18.setcred was published on 2026-05-21 and patches have been issued for every currently supported branch. If your system is at or above the patchlevel listed below, you are not affected.

Patched (update available)
- FreeBSD 14.3-RELEASE: fixed in 14.3-RELEASE-p14 (2026-05-20)
- FreeBSD 14.4-RELEASE: fixed in 14.4-RELEASE-p5 (2026-05-20)
- FreeBSD 15.0-RELEASE: fixed in 15.0-RELEASE-p9 (2026-05-20)
- FreeBSD stable/14: fixed in tree (2026-05-20)
- FreeBSD stable/15: fixed in tree (2026-01-06)

Vulnerable + exploitable if unpatched
- Any 14.3 / 14.4 system below the patchlevels listed above (working LPE under SMAP & SMEP)

Vulnerable, panic only if unpatched
- Any 15.0 system below 15.0-RELEASE-p9 (same source-level typo, but the surrounding code differs enough from 14.4 that no chain primitive we know of lifts the overflow into a working LPE)

Not affected
- FreeBSD main (fixed in commit 000d5b5, 2025-11-27)
- FreeBSD 13.x and earlier (setcred(2) not present)

Vulnerability details

File: sys/kern/kern_prot.c
Function: kern_setcred_copyin_supp_groups()
Lines: 528-533

The function signature uses a double pointer for the groups argument:

    static int
    kern_setcred_copyin_supp_groups(struct setcred *const wcred,
        const u_int flags, gid_t *const smallgroups, gid_t **const groups)

Because groups has type gid_t **, the expression sizeof(*groups) evaluates to sizeof(gid_t *) == 8 on LP64, rather than the intended sizeof(gid_t) == 4. This sizeof expression is used in two places:

    /* line 528-530: allocation */
    *groups = wcred->sc_supp_groups_nb < CRED_SMALLGROUPS_NB ?
        smallgroups :
        malloc((wcred->sc_supp_groups_nb + 1) * sizeof(*groups),
            M_TEMP, M_WAITOK);                       /* sizeof(*groups) == 8 */

    /* line 532-533: copyin */
    error = copyin(wcred->sc_supp_groups, *groups + 1,
        wcred->sc_supp_groups_nb * sizeof(*groups)); /* sizeof(*groups) == 8 */

The allocation on the heap path is 2× oversized, which is safe. However, for the stack path (when sc_supp_groups_nb < CRED_SMALLGROUPS_NB == 16), *groups is set to smallgroups, a gid_t[CRED_SMALLGROUPS_NB] array declared as a local variable in the caller user_setcred():

    gid_t smallgroups[CRED_SMALLGROUPS_NB];  /* 16 * 4 = 64 bytes */

The copyin destination is *groups + 1 == &smallgroups[1], which leaves 15 * 4 == 60 bytes of usable space. The copyin copies sc_supp_groups_nb * sizeof(*groups) == sc_supp_groups_nb * 8 bytes. With the maximum stack-path value of sc_supp_groups_nb == 15:

    Bytes written:    15 * 8 = 120
    Buffer capacity:  15 * 4 = 60
    Overflow:         60 bytes past the end of smallgroups[]

The overflow is written with fully attacker-controlled data from user space (wcred->sc_supp_groups points to an attacker-supplied buffer).

Trigger path and privilege-check ordering

The overflow happens in kern_setcred_copyin_supp_groups(), which is called from user_setcred() at line 604, before the privilege check. The privilege check (priv_check_cred(PRIV_CRED_SETCRED)) does not occur until kern_setcred() is called at line 623, and within that function at line 813.
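The size arithmetic from the vulnerability details can be reproduced in user space without touching the kernel. The sketch below is illustrative only: gid_model_t and the helper names are hypothetical stand-ins, and sizeof(**groups) is shown as the intended element size, not as the text of the actual fix (which the source describes as part of a broader refactoring).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t gid_model_t;            /* stand-in for the 4-byte gid_t */
#define CRED_SMALLGROUPS_NB 16

static_assert(sizeof(gid_model_t) == 4, "model assumes a 4-byte gid");

/* Length computed with the buggy expression: groups is a double pointer. */
size_t buggy_copy_len(size_t nb)
{
    gid_model_t **groups = NULL;
    return nb * sizeof(*groups);         /* sizeof(gid_model_t *); operand is not evaluated */
}

/* Length computed with the intended element size. */
size_t intended_copy_len(size_t nb)
{
    gid_model_t **groups = NULL;
    return nb * sizeof(**groups);        /* sizeof(gid_model_t) */
}

/* Bytes available from &smallgroups[1] to the end of the array. */
size_t stack_capacity(void)
{
    return (CRED_SMALLGROUPS_NB - 1) * sizeof(gid_model_t);
}

/* Bytes written past the end of the array for a given group count. */
size_t overflow_bytes(size_t nb)
{
    size_t len = buggy_copy_len(nb);
    size_t cap = stack_capacity();
    return len > cap ? len - cap : 0;
}
```

On an LP64 build, buggy_copy_len(15) is 120 against a capacity of 60, matching the figures above. Under the same model the stack path first overflows at a count of 8 (64 bytes into 60).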

Any local user can trigger the overflow by issuing:

    setcred(SETCREDF_SUPP_GROUPS, &wcred, sizeof(wcred))

with wcred.sc_supp_groups_nb == 15 and wcred.sc_supp_groups pointing to a 15 * 8 == 120-byte user-space buffer.

LPE technique (no SMAP, no SMEP)

The 60-byte overflow corrupts every callee-saved register slot in user_setcred()'s prologue except saved RBP. Compiler ordering on 14.4 GENERIC places the corruption window at [rbp - 0x40 .. rbp - 0x05]:

    buf[60..67]    mac.m_buflen
    buf[68..75]    mac.m_string
    buf[76..83]    td pointer spill         <- controls kern_setcred(td=...)
    buf[84..91]    saved rbx
    buf[92..99]    saved r12                <- propagates up the stack
    buf[100..107]  saved r13
    buf[108..115]  saved r14
    buf[116..119]  low 32 bits of saved r15

The crucial observation is that sys_setcred()'s prologue saves only rbp/r14/rbx; it does not save r12. The corrupted r12 popped by user_setcred()'s epilogue therefore propagates unchanged through sys_setcred() up to amd64_syscall(), which at +0x155 uses it as if it were the live td_proc pointer:

    ffffffff8105b6e5: mov  rcx, [r12 + 0x3f8]   ; r12 fully controlled
    ffffffff8105b6ed: mov  rdi, rbx             ; rdi = real curthread
    ffffffff8105b6f0: mov  esi, eax             ; esi = setcred retval
    ffffffff8105b6f2: call [rcx + 0xc8]         ; INDIRECT CALL

This is a two-level indirect call entirely controlled by the attacker: *(r12+0x3f8) supplies rcx, and *(rcx+0xc8) is the call target.

Without SMAP, the kernel happily dereferences user-mode pointers, so both indirections can be satisfied by fake structures placed in user memory. Without SMEP, the indirect call may target user-space code.

The published no-SMAP exploit constructs a fake struct sysentvec whose sv_set_syscall_retval slot (offset 0xc8) points to user-space shellcode. The shellcode reads gs:[0] for the real curthread, restores r12, then zeroes cr_uid/cr_ruid/cr_svuid/cr_rgid/cr_svgid on the real td_ucred and returns.

LPE technique (SMAP/SMEP, no info-leak)

The chain primitive at amd64_syscall+0x155 reaches its target with rcx = K1 (an attacker-chosen 8-byte value). If the target gadget writes rcx + 1 to td->td_ucred, the current thread's credential pointer is now set to any address we choose, and if that address happens to lie inside a kernel buffer we control (a heap-resident pargs slab), the fake credential we planted there immediately takes effect.
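The chain primitive at amd64_syscall+0x155 that both techniques share is, structurally, an ordinary two-level pointer dispatch. The user-space model below is only a sketch of that shape: the struct and function names are hypothetical, the padding is opaque, and the only values taken from the disassembly are the two offsets 0x3f8 and 0xc8. It shows why whoever supplies the first pointer also chooses the function that runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the offsets 0x3f8 and 0xc8 come from the listing. */
typedef void (*retval_fn)(void *td, int error);

struct sysent_model {
    unsigned char pad[0xc8];
    retval_fn     set_retval;     /* slot read by: call [rcx + 0xc8] */
};

struct proc_model {
    unsigned char        pad[0x3f8];
    struct sysent_model *sysent;  /* slot read by: mov rcx, [r12 + 0x3f8] */
};

static_assert(offsetof(struct sysent_model, set_retval) == 0xc8, "slot offset");
static_assert(offsetof(struct proc_model, sysent) == 0x3f8, "slot offset");

static int last_error = -1;

/* Benign handler used to observe which function the dispatch reached. */
void record_retval(void *td, int error)
{
    (void)td;
    last_error = error;
}

int last_recorded_error(void)
{
    return last_error;
}

/* Models the four instructions: load the table through r12, then call through it. */
void dispatch(const struct proc_model *r12, void *td, int error)
{
    const struct sysent_model *rcx = r12->sysent;  /* first indirection  */
    rcx->set_retval(td, error);                    /* second indirection */
}
```

In the model, dispatch() never validates where r12 came from, which mirrors the assumption in the kernel code that r12 still holds the value it had before the call.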

The gadget lives inside zfs.ko, in ZSTD_initCStream_advanced:

    push rbp; mov rbp, rsp
    push r15; push r14; push rbx
    sub  rsp, 0x38
    mov  rbx, rdx
    mov  r14, rsi
    mov  r15, rdi                          ; r15 = arg1 = real_td (from chain)
    mov  rax, [rip + __stack_chk_guard]
    mov  [rbp - 0x20], rax                 ; canary spill
    xor  eax, eax
    cmp  dword ptr [rbp + 0x2c], 0
    lea  rdx, [rcx + 1]                    ; rdx = K1 + 1
    cmovne rax, rdx
    test rcx, rcx
    mov  dword ptr [rdi + 0x430], 0
    cmovne rax, rdx                        ; rcx != 0 (always) -> rax = K1 + 1
    mov  qword ptr [rdi + 0x180], rax      ; *** td->td_ucred = K1+1 ***

The first cmovne depends on the dword at [rbp + 0x2c]; the second fires whenever rcx != 0, so rax holds K1 + 1 at the final store either way. The function continues with stores into td+0x10..0x3c which corrupt TAILQ_ENTRY scheduler-link fields with garbage drawn from amd64_syscall's stack frame, then performs its canary check and returns. Empirically the corruption is survivable until the thread next reaches the scheduler.

Fake ucred placement (parent's pargs slab)

setproctitle(2) is exposed to unprivileged users; the kernel allocates a 256-byte slot in the PARGS UMA zone and copies up to 244 user bytes verbatim into the ar_args field.

The parent process's pargs slab P_base becomes our fake_ucred:

    slot offset  field        value
    +0x20        cr_ref       0x7fffffff (high; defeats crfree)
    +0x28        cr_users     0x7fffffff
    +0x2c        cr_flags     0
    +0x60        cr_uid       0
    +0x64        cr_ruid      0
    +0x68        cr_svuid     0
    +0x6c        cr_ngroups   1
    +0x88        cr_prison    &prison0 (real kernel symbol)
    +0xb0        cr_groups    &prison0 (TRICK: see note)
    +0xb8        cr_agroups   1
    +0xc7        call target  ZSTD_initCStream_advanced

cr_groups trick: setting cr_groups = &prison0 makes cred->cr_groups[0] read the first 4 bytes of struct prison, which is pr_id = 0 = wheel gid. This lets the in-kernel groupmember(0, cred) check inside the VFS chmod path return 1 without a NULL dereference.

K1 placement (child's pargs slab)

The chain primitive reads K1 via mov rcx, [r12 + 0x3f8]; we want K1 = P_base - 1 so that K1+1 = P_base is our fake ucred. We can't write P_base - 1 back into the parent's slab (it already contains fake ucred fields), and we can't use the td_name trick: UMA-heap addresses always have a NUL byte at byte offset 4 of P_base - 1, which truncates thr_set_name's strlcpy.

Solution: fork a CHILD process that does its own setproctitle with the qword P_base - 1 placed at offset 0xd0 of its own pargs.
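The truncation problem that rules out the td_name route is a general property of NUL-terminated copies and can be checked in isolation. The sketch below assumes a little-endian host such as amd64; the helper name is hypothetical, and the sample value in the note after it is an arbitrary 64-bit number chosen only because its byte at offset 4 is zero, not a real kernel address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: how many leading bytes of a 64-bit value survive a
 * NUL-terminated string copy (strlcpy, strcpy and friends stop at the first
 * zero byte of the value's in-memory representation).
 */
size_t bytes_surviving_string_copy(uint64_t value)
{
    unsigned char raw[sizeof(value)];

    assert(sizeof(raw) == 8);
    memcpy(raw, &value, sizeof(value));            /* in-memory byte order */
    const unsigned char *nul = memchr(raw, 0, sizeof(raw));
    return nul != NULL ? (size_t)(nul - raw) : sizeof(raw);
}
```

For the sample value 0xfffff80012345677 on a little-endian host, only the low 4 bytes survive, so the upper half of the qword is lost. A length-delimited copy such as the setproctitle path described above has no such restriction.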