The Linux Kernel Startup

internals-for-interns.com

▲ 80 points • 21 comments • by valyala • 1w ago • HN discussion ↗

Have you ever wondered what really happens between the moment you press the power button and the moment your login screen shows up? That gap, usually just a few seconds, hides one of the most intricate initialization sequences in computing. Today I want to walk you through it.

This is the first article in a series where I’ll try to make sense of the Linux kernel internals together with you. We’ll talk about how Linux boots, how it manages processes and memory, how it deals with hardware, and so on. If you’ve ever been curious about what’s happening under the hood, you’re in the right place.

⚠️ Quick disclaimer

I’m not a kernel expert; I’m learning out loud. The goal here isn’t a deep, exhaustive tour but a useful map: what the main pieces are and how they fit together. For the deep dives, the source is the real teacher.

This article focuses on x86_64. The big picture applies broadly, but specifics vary on ARM, RISC-V, etc.

Now let me throw a metaphor at you, because this is going to be a long ride and we’ll need a thread to hold onto.

Imagine We’re Setting Up a Space Colony

Picture a barren planet.

No air to breathe, no roads, no buildings, no power, no comms. We send a small advance team in a dropship. Their mission: turn this rock into a working colony, and do it before life support runs out.

The advance team can’t just unload everyone and start hosting town hall meetings. They have to do things in a very specific order. First the basics: confirm the lander didn’t crash, set up emergency procedures in case anything goes wrong. Then map the terrain, find usable resources, set aside areas for storage. Then bring up the construction equipment, build the first habitats, the power grid, the comms tower. Then start the proper governance: a colonial governor, a dispatch office that handles future crew arrivals, and a maintenance crew that takes over the boring “keep things running” duties. Finally, they wake up the rest of the colonists from cryosleep and hand them the keys to the place.

That’s pretty much what the Linux kernel does at boot. The bootloader is the dropship. Your computer is the barren planet. The advance team is the startup code in the Linux kernel, the code we’ll be following the whole time. And by the end of this article, that advance team will literally have transformed itself into the standby maintenance crew while a brand-new civilian government takes over. Bear with me; it’ll make sense as we go.

Here’s the rough trip we’re about to take:

Let’s start where the bootloader leaves off.

The Handoff: What the Bootloader Hands Us

So GRUB (or whatever bootloader you’re using) hands control to the kernel. What do we actually have to work with?

Honestly, not much.

The CPU is already running, but in one of several modes, which roughly determine how wide the registers are and how memory addressing works.

On x86 that’s 16-bit Real Mode, 32-bit Protected Mode, or 64-bit Long Mode. UEFI puts us straight in Long Mode; a legacy BIOS boot usually leaves us in Protected Mode. We’ll deal with the rest of the CPU’s state (page tables, interrupts) once we get to Phase 1.

Memory is awkward. The kernel was loaded low in physical RAM (typically around 0x1000000, the 16 MiB mark), but it’s compiled to run at a high virtual address (something like 0xffffffff81000000). That mismatch is going to bite us soon.

What else? A memory map from the firmware (the E820 map on x86) telling us where RAM is, what’s reserved, and where the ACPI (Advanced Configuration and Power Interface) tables live; a bag of boot parameters (the command line, the initrd location, etc.); and that’s it. No console, no allocator, no interrupts, no log.

Let’s build something.

Phase 1: The Assembly Trampoline

Unpacking the Kernel First

One small twist before anything else: the file the bootloader handed us is a bzImage, and most of it is compressed. Shipping the kernel compressed saves space on disk and in memory during boot, but the CPU obviously can’t execute compressed bytes. So the very first code that runs isn’t the kernel proper; it’s a tiny decompressor living in arch/x86/boot/compressed/. Its job is to unpack the real kernel image into memory and then jump to it.

The decompressor also picks a random base address to load the kernel at. This is KASLR (Kernel Address Space Layout Randomization), and it makes life harder for attackers who’d like to guess where kernel code lives.

Once the decompressor is done, control jumps into the real kernel.

Into the Real Kernel

Where we land depends on the bootloader. On a legacy 32-bit boot we start at startup_32 in the decompressor, which has to climb the CPU into 64-bit Long Mode itself: building a tiny page table where every virtual address points to the same physical address (an identity mapping, the simplest possible setup), flipping the “you are now a 64-bit chip” bit, turning paging on, and jumping to startup_64.
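To make the identity mapping idea concrete, here is a minimal sketch in C. It is not the kernel’s code: the real tables are multi-level and are built in assembly, while this toy uses a single flat table of 2 MiB pages. Every name prefixed with toy_ is made up for illustration; only the flag bit positions (present, writable, large page) follow the x86 page table entry layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_2M   (UINT64_C(1) << 21)  /* 2 MiB large page */
#define OFFSET_MASK    (PAGE_SIZE_2M - 1)   /* low 21 bits: offset within a page */
#define ENTRY_PRESENT  UINT64_C(0x01)       /* bit 0: mapping is valid */
#define ENTRY_WRITABLE UINT64_C(0x02)       /* bit 1: writes allowed */
#define ENTRY_LARGE    UINT64_C(0x80)       /* bit 7: entry maps a large page */
#define TOY_ENTRIES    512                  /* 512 x 2 MiB = 1 GiB covered */

/* Hypothetical helper: make virtual page i point at physical page i. */
void toy_build_identity_map(uint64_t *table)
{
    assert(table != NULL);
    for (size_t i = 0; i < TOY_ENTRIES; i++)
        table[i] = (uint64_t)i * PAGE_SIZE_2M
                 | ENTRY_PRESENT | ENTRY_WRITABLE | ENTRY_LARGE;
}

/* Hypothetical helper: walk the toy table the way the MMU would.
 * Returns UINT64_MAX when the address is not mapped. */
uint64_t toy_translate(const uint64_t *table, uint64_t vaddr)
{
    assert(table != NULL);
    uint64_t index = vaddr / PAGE_SIZE_2M;
    if (index >= TOY_ENTRIES || !(table[index] & ENTRY_PRESENT))
        return UINT64_MAX;
    uint64_t frame = table[index] & ~OFFSET_MASK;  /* strip the flag bits */
    return frame | (vaddr & OFFSET_MASK);
}
```

With the table filled in, toy_translate returns its input unchanged for any address below 1 GiB, which is all that “identity” means here.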

A modern UEFI bootloader skips the climb and jumps straight to startup_64. Either way, every path converges there. And we’re as bare-bones as it gets: almost-pure assembly, no C runtime, no library calls, just instructions and registers.

So what’s the very first thing the kernel does? Two pieces of plumbing it can’t function without: it points the stack pointer at a small pre-allocated buffer (you can’t call any function without a working stack), and it installs a minimal GDT and IDT, two CPU-required lookup tables for memory segments and exception handlers, respectively. With those in place, the more interesting work can begin.

A Detour for Encrypted Hardware

First interesting step: memory encryption. Some AMD CPUs can transparently encrypt RAM so that someone with physical access to the memory chips can’t just read your data off them. The feature is called SME (Secure Memory Encryption), with a sibling for virtual machines called SEV (Intel’s analogue here is TDX); the simpler “encrypt all RAM with one key” feature on Intel is TME (Total Memory Encryption). If we’re on hardware like that, the kernel has to turn encryption on right now, because you can’t go back and encrypt data you’ve already written in the clear.

With encryption sorted, we can ask the next question.

Did the Lander Even Survive?

Before going any further, we should check that the gear works. The advance team isn’t going to start unpacking power generators if the air recyclers don’t even turn on. The kernel does its equivalent with verify_cpu:

    verify_cpu:
        # Check for Long Mode support
        # Verify SSE2 (required by x86_64 ABI)
        # Validate other CPU features

Long Mode? Check. SSE2? Check. If something essential is missing, we just stop right here. This is, by the way, exactly why you can’t run a 64-bit kernel on a 32-bit CPU: it’s not that something subtle goes wrong later, it’s that we fail the very first checklist item.
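The two checks named in that checklist come down to testing single bits in what the CPUID instruction reports. The sketch below shows just that bit testing in C, assuming the register values have already been captured. The real verify_cpu is assembly and checks more than this; the struct and the toy_ functions are hypothetical names. The bit positions are the documented ones: SSE2 is bit 26 of EDX for CPUID leaf 1, and Long Mode is bit 29 of EDX for extended leaf 0x80000001.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_LEAF1_EDX_SSE2  (UINT32_C(1) << 26)  /* leaf 1, EDX bit 26 */
#define CPUID_EXT1_EDX_LM     (UINT32_C(1) << 29)  /* leaf 0x80000001, EDX bit 29 */

/* Hypothetical container for CPUID results captured elsewhere. */
struct toy_cpuid_snapshot {
    uint32_t leaf1_edx;      /* EDX after CPUID with EAX = 1 */
    uint32_t ext_leaf1_edx;  /* EDX after CPUID with EAX = 0x80000001 */
};

bool toy_has_sse2(const struct toy_cpuid_snapshot *s)
{
    assert(s != NULL);
    return (s->leaf1_edx & CPUID_LEAF1_EDX_SSE2) != 0;
}

bool toy_has_long_mode(const struct toy_cpuid_snapshot *s)
{
    assert(s != NULL);
    return (s->ext_leaf1_edx & CPUID_EXT1_EDX_LM) != 0;
}

/* Pass only if every essential feature is present, as in the checklist. */
bool toy_verify_cpu(const struct toy_cpuid_snapshot *s)
{
    return toy_has_long_mode(s) && toy_has_sse2(s);
}
```

Taking a snapshot as input rather than executing CPUID directly keeps the sketch portable; on real hardware the values would come from the instruction itself.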

With the gear verified, it’s time to deal with that awkwardness from the inventory list.

The Address Mismatch Problem

Remember the awkwardness from earlier? The kernel was loaded at one address but compiled for another, and we have to fix it before going further. Think of the advance team carrying a detailed map of the colony, but landing two kilometers off from where the map says “you are here.” Every reference like “the power station is 500m north” is now wrong.

In kernel terms, this fix is called page table fixup. A page table is the lookup table the CPU uses to turn the virtual addresses in your code into the physical addresses where bytes actually live. The kernel ships with a small set of these tables pre-filled by the linker, assuming it’ll be loaded at a specific address, an assumption KASLR just broke.

So startup_64 calls a C helper, __startup_64() (via a position-independent thunk called __pi___startup_64, in case you go grepping the source), which computes the difference between “where the code thinks it is” and “where we actually landed” and patches the page table entries by that offset. Once it returns, virtual addresses translate to the right physical bytes, and the map matches reality.

With addresses sorted, we can finally leave the bare-metal assembly world behind.

Jumping into C

With the page tables fixed, the kernel switches the CPU over to them and jumps to its first real C function, x86_64_start_kernel. From this point on, the kernel runs at the high virtual addresses the linker originally targeted. We’re leaving bare assembly behind, but we get C only barely: no allocator, no console, no library calls. Just C with raw pointers and discipline.

Phase 2: Early C Initialization

The first C function is x86_64_start_kernel in arch/x86/kernel/head64.c. We’re still in advance-team mode, but we have slightly better tools now.

Before the more interesting work, x86_64_start_kernel does some bookkeeping we won’t dwell on: cr4_init_shadow() caches a copy of the CR4 control register so future code can avoid re-reading it from the CPU, and reset_early_page_tables() throws away the identity-mapped page tables that got us this far; we don’t need that training-wheels mapping anymore.
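The page table fixup described above is easier to picture with numbers. This sketch, in C, computes the offset between the address the image was linked for and the address it was actually loaded at, then adds that offset to every present entry of a toy table. The function names and the flat table are assumptions made for illustration; the real __startup_64() works on the kernel’s own multi-level tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_PRESENT  UINT64_C(0x01)              /* bit 0: entry is in use */
#define ALIGN_2M_MASK  ((UINT64_C(1) << 21) - 1)   /* low 21 bits */

/* "Where we actually landed" minus "where the code thinks it is".
 * Unsigned arithmetic wraps, so a lower-than-expected load address
 * still produces a delta that adds back correctly. */
uint64_t toy_load_delta(uint64_t actual_phys_base, uint64_t linked_phys_base)
{
    return actual_phys_base - linked_phys_base;
}

/* Shift every present entry by the delta; empty entries stay empty. */
void toy_fixup_table(uint64_t *table, size_t count, uint64_t delta)
{
    assert(table != NULL);
    /* Assumption of this sketch: the delta is 2 MiB aligned, so adding
     * it never disturbs the flag bits stored in the low bits. */
    assert((delta & ALIGN_2M_MASK) == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i] & ENTRY_PRESENT)
            table[i] += delta;
    }
}
```

For example, an image linked for physical 0x1000000 but loaded at 0x3000000 gives a delta of 0x2000000, and an entry of 0x1000083 becomes 0x3000083: the frame moved, the flags in the low byte did not.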

The first chore worth talking about is embarrassingly mundane.

Wiping Down the Workbenches

In a normal C program, the runtime takes care of zeroing out uninitialized globals before main() runs. But we are the runtime. Nobody’s going to do it for us:

    clear_bss();

This zeroes out the .bss section, the region holding uninitialized globals and statics. It’s the equivalent of unpacking the gear and making sure every storage bin starts empty before anyone puts anything in.

Next, we want to get some safety machinery on its feet, even if only as a placeholder.

KASAN, the Safety Inspector with No Office Yet

Right before KASAN, a quick sme_early_init() finishes wiring up the memory-encryption setup we started back in Phase 1, so any page table entries we touch from now on come out encrypted on hardware that needs it.

Then, if the kernel was built with KASAN (Kernel Address Sanitizer), the safety inspector that catches use-after-free bugs and buffer overflows, we need to bring it up here. The catch: KASAN needs a huge region of shadow memory to track every byte the kernel allocates, and we don’t have a real memory allocator yet.

The trick is to point that entire shadow region at a single zero page. KASAN-instrumented code can read from any shadow address and just see zeros, which keeps it from crashing even though nothing is actually being tracked. Once the real memory allocator comes online later, KASAN gets proper shadow memory and starts doing its job for real.

Speaking of safety nets, we also need to handle the case where something blows up.

Emergency Procedures Before You Need Them

If something goes wrong now (a bad memory access, a divide by zero, anything), the CPU needs to know who to call. The way it figures that out on x86 is by looking up a table called the Interrupt Descriptor Table (IDT): a fixed-size array where each entry says “if exception number N happens, jump to this handler function.” If we haven’t set one up, the CPU has no handler to run for the original problem. That failure itself becomes a second exception, and the handler lookup for that one fails too. After three failures in a row, the CPU gives up and resets the machine. This is a triple fault, which from the outside just looks like a silent reboot with no error message.
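The lookup-and-escalate behavior can be sketched as a small table of function pointers in C. This is a deliberately simplified model, not how the hardware is specified: it collapses the escalation into three stages (the original vector, then the double-fault vector, then giving up), and every toy_ name is invented for the example. The vector numbers are the real x86 ones: 0 for divide error and 8 for double fault.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IDT_ENTRIES   256  /* x86 has 256 interrupt vectors */
#define TOY_VEC_DIVIDE    0    /* #DE, divide error */
#define TOY_VEC_DOUBLE    8    /* #DF, double fault */

typedef void (*toy_handler_fn)(int vector);

/* Toy IDT: one handler slot per vector, NULL meaning "nobody to call". */
struct toy_idt {
    toy_handler_fn handlers[TOY_IDT_ENTRIES];
};

enum toy_outcome {
    TOY_HANDLED,               /* the vector's own handler ran */
    TOY_DOUBLE_FAULT_HANDLED,  /* fell back to the double-fault handler */
    TOY_TRIPLE_FAULT           /* nothing left to call: machine resets */
};

static int toy_handled_count;  /* how many handlers have run so far */

/* Example handler that only records that it was called. */
void toy_count_handler(int vector)
{
    (void)vector;
    toy_handled_count++;
}

/* Simplified delivery: try the vector, then vector 8, then give up. */
enum toy_outcome toy_deliver(const struct toy_idt *idt, int vector)
{
    assert(idt != NULL && vector >= 0 && vector < TOY_IDT_ENTRIES);
    if (idt->handlers[vector] != NULL) {
        idt->handlers[vector](vector);
        return TOY_HANDLED;
    }
    if (idt->handlers[TOY_VEC_DOUBLE] != NULL) {
        idt->handlers[TOY_VEC_DOUBLE](TOY_VEC_DOUBLE);
        return TOY_DOUBLE_FAULT_HANDLED;
    }
    return TOY_TRIPLE_FAULT;
}
```

With an all-empty table, delivering a divide error ends in TOY_TRIPLE_FAULT, which is the silent-reboot case. Installing even one handler at vector 8 is enough to turn that into something the kernel can report.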