Pangram verdict · v3.3
We believe that this document is fully human-written
AI likelihood · overall
HumanArticle text · 1,834 words · 5 segments analyzed
Nick Kossolapov · April 2026 I’ve been working as a software engineer for over 8 years at this point, and admittedly I’ve never understood how computers actually work. So I figured I’d try to learn how they work by emulating one. Sorry Ben Eater, I’m not going to build one just yet. I spent hundreds of hours as a kid catching Pokémon, so the Game Boy was the perfect candidate: real hardware, relatively simple in scope, and something with a strong personal connection. Instead of jumping straight into it, I first did From NAND to Tetris. It was a great course, and it made me really understand the fundamentals of computers, like registers, memory, and the ALU. Then to get used to building an emulator, I built a CHIP-8 emulator in F#: Fip-8. A few months later, and after many nights of going to bed at 2 AM even though I told myself I’d only work on it for an hour or two, I have a working Game Boy emulator: Fame Boy. Complete with sound and runs on desktop and web. How it works I wanted to have the emulator work on both desktop and web, so I focused on having a simple interface between the emulator core and whatever frontend is running it. The interface between the frontends and core is essentially just two arrays and two functions: framebuffer - a 160x144 array of shades (white, light, dark, black). audiobuffer - a ring audio buffer at sample rate of 32768 Hz with read and write heads. stepEmulator() - a function that executes one CPU instruction and returns the number of cycles taken. getJoypadState(state) - a callback for the frontend to give the emulator the joypad state, usually once a frame. I tried to model Fame Boy in a similar way to the actual hardware of the Game Boy. The CPU, like the real Sharp LR35902 in the Game Boy, knows nothing about the hardware except a memory map (and the IoController for interrupt signals only). It’s also the most F#-ish part of my codebase, leaning heavily into functional domain modelling. Memory.fs holds most of the RAM used in the Game Boy, and acts as the memory map and bus between the CPU, IO Controller, and cartridge.
It also shares a reference to the same VRAM and OAM RAM arrays with the PPU for performance. IoController.fs emerged when I found myself adding too much logic to Memory.fs. While a singular IO controller doesn’t exist in the Game Boy hardware, handling all the hardware registers through it simplified and added safety to the interfaces for the individual components. The stepper function in Emulator.fs is the glue that brings the whole emulator together, composing all the components’ individual step functions: let stepper () = // Execute a single instruction // Each instruction uses a different amount of cycles let mCycles = stepCpu cpu io for _ in 1..mCycles do stepTimers timer io stepSerial serial io // The APU technically runs at 4x CPU-cycles, but can be batched stepApu apu let tCycles = mCycles * 4 // The PPU operates at 4x CPU-cycles. The APU should be here too for _ in 1..tCycles do stepPpu ppu // Return cycles taken so the frontend runs the emulator at the right speed mCycles While the real hardware components all run in parallel based on a central master oscillator, my emulator is single threaded and so the components have to run in sequence. The stepper function centralises the execution and ensures that all the components are synchronised. Lastly, for the emulator to be playable it needs to run at the correct number of cycles per second, around 17500 CPU-cycles per 60 FPS frame. The frontends use the audio sampling rate to drive the emulator when the sound is on, and the frame rate to drive the emulator when it’s muted. More on that later. Emulating the CPU, and F# First of all, I’d like to apologise to the functional programming purists. While my CHIP-8 emulator is completely pure (no mutable members and all arrays are copied for none of that side-effect nonsense), Fame Boy uses mutability liberally. The Game Boy runs a lot faster than the CHIP-8, and copying 16+ kB of memory a million times every second didn’t seem like the smart thing to do. So, why F# for Fame Boy? Firstly, I think its extensive typing system works really well for modelling CPU instructions.
Secondly, and more importantly, I just really like F#. I used to work primarily in F# at my previous company, and so I’m always looking for an excuse to keep on using it. Domain modelling As an example why I think the CPU modelling works well in F#, I was following Gekkio’s Complete Technical Reference when implementing the CPU. I grouped the instructions like the reference, and ended up with something like this in Instructions.fs: type LoadInstr = | Load8Immediate of uint8 | Load8Direct of Register | Load8Indirect // ... other load instructions type ArithmeticInstr = | IncrementDirect of uint8 | IncrementIndirect of Register // ... other arithmetic instructions And it wasn’t just the load instructions. A lot of the other instructions shared similar concepts, like the location of the instruction’s operand: read the byte value immediately after the instruction in memory (immediate), read/write a CPU register (direct), read/write a memory location specified by the HL CPU register (indirect). Even though this is a small domain and most Game Boy devs know the opcodes/instructions basically as is, I felt like it could be neatened up. The code below shows the extraction of the location concept. The code uses different names and order from the source code to make the load instruction more readable for anyone not familiar with F#’s DU. type To = | Direct of Register | Indirect type From = | Immediate of uint8 | Direct of Register | Indirect type LoadInstr = | Load of From * To // These form a tuple, like Load<From, To> in C# // ... other instructions This helped reduce the CPU instructions down from 512 opcodes to just 58 instructions. Generalising a domain like this risks allowing invalid states, but using a good type system can avoid them. For example, if I had used a location type, Loc, instead of the From and To types, this instruction would compile without any complaints: Load(Loc.Direct D, Loc.Immediate) (storing a register to the immediate value). The Game Boy’s hardware (its domain) doesn’t support writing to the immediate value, so the domain would contain an illegal state. By using the F# type system to model the domain correctly, you get a guarantee that illegal states can’t be expressed in your system.
You don’t necessarily need unit tests, it just won’t compile. So with simplified types Fame Boy still captures exactly what the Game Boy’s CPU supports and nothing more (with one cheeky exception). Now the eagle-eyed Game Boy emulator devs would say to me “hey Nick, but what about the opcode 0x76?”, and I would reply “A monad is a monoid in the category of endofunctors” to show that I’m using a functional programming language and therefore smarter than them. Joking aside though, it’s a compromise I decided on because I felt it simplified the CPU domain a lot. If you look at the patterns that the opcodes follow, 0x76 would be Load(From.Indirect, To.Indirect), or load the 8-bit value from the memory at location HL to the memory at location HL. My emulator’s typing allows it, but the instruction doesn’t actually exist on the Game Boy. Logically, it’s a NOP and not dangerous, and in practice it’s unreachable since the opcode reader decodes 0x76 as HALT. But it’s a notable blemish in what I think is otherwise a decent domain model. Now you can do something similar in most languages, but if you’ve worked with a functional language it’s hard to properly describe how smooth it feels working with these types. After using a match statement or Options in F#, going back to a switch statement feels clunky and prone to mistakes. For anyone who hasn’t worked with a functional programming language I’d recommend you go out and try one. Keep it simple, stupid Since the goal of this project was to learn about computer hardware rather than building the best emulator, I almost never looked at other emulators’ code in depth. However, while casually browsing the source code for CAMLBOY, I spotted lines like this: set_flags ~h:false ~z:(!a = zero) (); I really liked that you could pass in however many flags you wanted, in any order. And since they are just named parameters for the method, the performance overhead is minimal. But I couldn’t create something exactly like it because F# avoids method overloading and default parameters due to its type system supporting partial application.
Instead I settled on something like this: cpu.setFlags [ Half, false; Zero, a = 0uy ] It never sat well with me, it required an array and a flag type (e.g. Half). But I carried on anyway as I wanted to make progress. As I got near the end, I spent a lot of time revisiting my old code and refactoring, and wanted to try and improve the setFlags function. So after a lot of mulling over and trying out other approaches, I ended up with this (Cpu/State.fs L81): module Flags = let inline setZ (v: bool) (f: uint8) = if v then f ||| ZMask else f &&& ~~~ZMask let inline setH (v: bool) (f: uint8) = // ... the other flag functions and definitions // Other files cpu.Flags <- cpu.Flags |> setH false |> setZ (a = 0uy) The functions are exactly what I wanted. Effortlessly composable and testable, just simple pure functions. Chef’s kiss. The previous function required hoisting the values into DU types and putting them in an array, and the setFlags function was more verbose as a result. Furthermore, because the functions are inline and don’t require any heap allocations, the new functions are actually much more performant, they increased the emulator’s FPS by about 10%. I think that simple 16-line Flags module is possibly my favourite F# I’ve ever written. Testing I initially tackled the CPU with just this function and running the Tetris ROM: match opcode with | 0x00 -> Nop | _ -> failwith "Unimplemented opcode" And then every time it hit that exception I would implement the instruction for that opcode. I quickly hit two issues with this approach: the loop was getting a bit tedious navigating around technical references randomly instead of focusing on groups of instructions at a time, and I had no idea if I was actually implementing the instructions correctly. Fixing both of these was simple: unit tests. This is where AI really came in handy. To improve my learning I wanted to write the emulator code myself, but coming up with test cases would be tedious, and I might have tunnel vision and miss some important test cases.