When 'Close to the Hardware' Isn't Close Enough

by lmilz · lmilz.dev

I recently bought myself an STM32 Nucleo microcontroller board to play around with. What fascinated me was how much more flexible things are at this level, how much more you can do yourself. With an ESP32 that’s not really the case: you’re always tied to ESP-IDF or some other framework.

I started with a first simple example, the kind everyone knows and has done before: the famous Hello World. It’s simple, and that’s exactly why it’s useful. You don’t learn a language’s syntax with it; it’s too small for that. You learn how to actually use the language: what file format, how to compile, how to link, how to run the result.

That’s why I always reach for Hello World first whenever I pick up a new language or a new environment. It forces me to run the whole build system once before I do anything else. I learned two things this way that weren’t obvious to me at the start. First: you can learn a surprising amount from a simple example if you take it seriously. Second: simple is almost never really simple. Most things that look easy are easy because someone else hid the complexity for you.

The embedded world has a Hello World too: the blinking LED. Most microcontroller boards have an onboard LED that you turn on and off at some frequency. Sounds trivial. And it is, if you use a Hardware Abstraction Layer (HAL) and some ready-made project template.

I’ve been working in automotive software for years, and before that on physics simulations at university. In my head I’d always been “close to the hardware”. I write embedded software after all, not frontend. A while back I wrote about the roles of C, C++, and Rust in automotive and quietly took for granted that “embedded equals close to the hardware”. At some point it hit me that this was a delusion. MCAL, AUTOSAR OS, RTE: there are more layers between me and the silicon than between a web app and the kernel. I wanted to actually get down to the bottom for once. No HAL, no framework, no vendor black box.
Just the reference manual and the compiler.

A Blinky in Rust

In the Rust ecosystem the example quickly ends up looking like this:

```rust
#![no_std]
#![no_main]

use cortex_m_rt::entry;
use panic_halt as _;
use stm32f4xx_hal::{pac, prelude::*};

#[entry]
fn main() -> ! {
    // Take ownership of the peripherals and fix the clock tree at 48 MHz.
    let dp = pac::Peripherals::take().unwrap();
    let rcc = dp.RCC.constrain();
    let clocks = rcc.cfgr.sysclk(48.MHz()).freeze();

    // Split GPIOB into individual pins and turn PB0 and PB7 into outputs.
    let gpiob = dp.GPIOB.split();
    let mut led1 = gpiob.pb0.into_push_pull_output();
    let mut led2 = gpiob.pb7.into_push_pull_output();
    let mut delay = dp.TIM1.delay_ms(&clocks);

    loop {
        led1.toggle();
        delay.delay_ms(400u32);
        led2.toggle();
        delay.delay_ms(100u32);
    }
}
```

It’s short, type-safe, and it works. The clock config goes through a builder pattern. Pin types carry their configuration in the type system, so the compiler keeps me from toggling an input pin: toggle() on one is a compile error. The delay comes from a hardware timer (TIM1) that the HAL sets up from the clock configuration. And everything you don’t see, the vector table, the reset handler, the copy loop for .data, the zeroing of .bss, all of that comes from the cortex-m-rt crate. The linker just gets a small memory.x that tells it where flash and RAM are.

That’s exactly the problem I wanted to dig into. Not in the “the HAL is bad” sense (I actually came away appreciating it more), but: I wanted to see what the HAL does for me. So the same thing again, but in C, no HAL, no CMSIS, just register addresses straight out of the reference manual.

Hello World, embedded

The board I picked was the Nucleo-F446ZE, and I started reading the docs (Reference Manual RM0390, chapter 6 for RCC and chapter 8 for GPIO).

The blinky itself is quickly explained. Enable the GPIOB clock, configure PB0 as an output, and toggle the output register in a loop. In C, with no abstraction, it looks like this.
First the registers as macros, then main():

```c
#include <stdbool.h>
#include <stdint.h>

/* Peripheral base addresses */
#define RCC_BASE   0x40023800UL
#define GPIOB_BASE 0x40020400UL

/* Registers as dereferenced volatile pointers (base + offset) */
#define RCC_AHB1ENR (*(volatile uint32_t*)(RCC_BASE + 0x30UL))
#define GPIOB_MODER (*(volatile uint32_t*)(GPIOB_BASE + 0x00UL))
#define GPIOB_ODR   (*(volatile uint32_t*)(GPIOB_BASE + 0x14UL))

#define RCC_AHB1ENR_GPIOBEN (1UL << 1)
#define LED_PIN 0U

/* Busy-wait: burns cycles, no defined duration */
static void delay(volatile uint32_t n) {
    while (n--) {
        __asm__("nop");
    }
}

int main(void) {
    /* Enable the clock for GPIOB */
    RCC_AHB1ENR |= RCC_AHB1ENR_GPIOBEN;

    /* PB0: clear both mode bits, then set them to 01 (output) */
    GPIOB_MODER &= ~(3UL << (LED_PIN * 2));
    GPIOB_MODER |= (1UL << (LED_PIN * 2));

    while (true) {
        GPIOB_ODR ^= (1UL << LED_PIN); /* toggle PB0 */
        delay(500000);
    }
}
```

Three things about this code need explaining.

First: *(volatile uint32_t*)(...). That’s memory-mapped I/O in its purest form. The hardware exposes certain addresses that don’t point to ordinary RAM cells, but to registers of the peripherals. Writing to RCC_AHB1ENR doesn’t mean “write into a memory cell”, it means “tell the RCC block which clocks to enable”. The volatile cast isn’t a style choice, it’s mandatory. Without volatile, the compiler doesn’t have to care how often you write: it may merge repeated accesses or drop them as dead stores, and the blinky can silently do nothing. volatile is the contract with the compiler: “hands off, every access has a side effect you can’t see.”

Second: initializing GPIOB_MODER. I clear the two mode bits for PB0 first, then set them to 01 (General Purpose Output). Read-modify-write with &= and |=, so that other pins in the same register stay untouched. On Cortex-M, by the way, this is not atomic. Each of these statements is three instructions (LDR, ORR/BIC, STR), and an ISR could fire in between. It works here because no interrupts are active during init. If you actually need atomicity, you use the bit-band region (where available, it’s gone on the Cortex-M7) or LDREX/STREX. For pure set-or-clear on GPIO output pins there’s also the BSRR register, which is specifically designed to let you set or reset individual bits atomically in one write, no read-modify-write required.

Third: delay(). The combination of volatile on the parameter and the explicit nop isn’t decoration. Without volatile, and depending on the optimization level, the compiler may simply skip decrementing the counter, because nobody reads the value. Without the nop, it’s free to collapse the loop body. Together they force the loop to actually run. The comment “500 ms at 16 MHz” is wishful thinking, since the real duration depends on the optimizer, flash wait states, and the pipeline.

For a blinky that’s fine; in production you’d use SysTick.

So much for the functionality. The really interesting question isn’t what’s in main(), it’s: how does main() ever get called in the first place? On a PC the operating system does that. On a microcontroller there is no operating system, no loader, no process, nothing that reads in code, allocates memory, or prepares a runtime. Someone has to do all of this by hand. That’s where it got interesting for me.

The hardware doesn’t know about main()

When the ARM Cortex-M4 in the STM32 powers on, it does something very concrete. It reads 4 bytes from the start of flash at 0x08000000 (strictly speaking from address 0x00000000, where flash is aliased when the chip boots from it) and loads them as the initial stack pointer. Then it reads the next 4 bytes from 0x08000004, interprets them as an address, and jumps there. That’s not a software instruction, that’s circuit logic, set in silicon. Everything that happens after that is software.

One detail that can cost you hours if you don’t know it: bit 0 of the reset vector address has to be set. The Cortex-M4 only knows the Thumb instruction set, and the CPU uses bit 0 of the jump address as a mode bit. If it’s zero, you get a HardFault right after reset. The linker usually takes care of this for you, but anyone who builds the vector table by hand and has to cast a function pointer symbol themselves will learn this one the hard way.

Which gives us a clear requirement: at address 0x08000000 exactly the right thing has to be sitting there. This structure is called the vector table, and it’s really just an array of function pointers. The first entry is the stack pointer (cast as a function pointer; the hardware doesn’t care about the type, it just reads 4 bytes). The second entry is the address of the reset handler. After that come NMI, HardFault, and the other handlers. On an interrupt, the hardware looks into this table, reads the address, and jumps there. It’s a hardware jump table, not a software dispatch.

In code, heavily shortened, it looks like this.

The full table also has MemManage, BusFault, UsageFault, SVCall, PendSV, SysTick, and then the roughly 80 STM32-specific IRQs:

```c
#include <stdint.h>

/* Declarations assumed here: _estack comes from the linker script,
   the handlers from the startup code. */
extern uint32_t _estack;
void Reset_Handler(void);
void Default_Handler(void);

__attribute__((section(".isr_vector")))
void (*const vector_table[])(void) = {
    (void (*)(void))(&_estack), /* initial stack pointer */
    Reset_Handler,
    Default_Handler, /* NMI */
    Default_Handler, /* HardFault */
};
```

The section(".isr_vector") attribute matters. It tells the compiler: this data belongs in a specially named section. Where that section ends up in memory, though, isn’t decided here. That was the first moment I realized the compiler and the hardware don’t talk to each other directly. Something’s missing in between.

Since the Cortex-M4 is a licensed ARM core, none of this is STM-specific. It works the same way on boards from NXP, Microchip, or TI. Once you’ve understood it once, you can dive right in on a different board.

Sections floating in nothing

The STM32 has two memory regions. Flash, non-volatile, starting at 0x08000000. RAM, volatile, starting at 0x20000000. Both sit on the same 32-bit address bus. From the CPU’s point of view both regions are equally addressable; which addresses point to flash and which to RAM is decided by how the chip is wired.

The C compiler knows none of this. It takes main.c, produces machine code, and puts it into a section called .text. Constants go into .rodata, initialized variables into .data, uninitialized variables into .bss. These are all just names. The compiler has no idea that .text is supposed to end up in flash later and .bss in RAM. It doesn’t even know that flash and RAM exist. The sections have no absolute addresses. They’re just floating in nothing.

So someone has to decide which section ends up at which physical address. That’s the job of the linker script.

The linker script is the floor plan

A linker script is a text file with a .ld extension. It describes two things: which memory regions exist, and which section goes where.

The line ENTRY(Reset_Handler) tells the linker where the entry point is.

The MEMORY block lists the physical regions.