Hunting a 34 year old pointer bug in EtherSlip

B brutman.com ↗


mbbrutman@gmail.com
Posted: 2026-04-19
Tags: DOS, Networking, Segmented memory is hard

A few weeks ago I was revisiting my instructions for running a SLIP connection between a DOS PC and Linux. If you are not familiar with SLIP it stands for Serial Line Internet Protocol, and it lets you run TCP/IP over a PC serial port. TCP/IP is much faster over Ethernet, but a serial port can work too.

There are several packet drivers for DOS that let you make SLIP connections. One that I use often is "EtherSLIP" which is handy because it emulates an Ethernet packet driver but it is really just SLIP over a serial port. The emulation allows you to use programs designed for Ethernet packet drivers unmodified; otherwise, you'd have to run programs that are designed specifically for SLIP packet drivers. All of the mTCP programs expect Ethernet, but they don't actually know what is happening "under the covers" so any packet driver that emulates Ethernet works too. (Besides EtherSLIP there is also a Token Ring packet driver that emulates Ethernet.) EtherSLIP is included in the Crynwr packet driver collection, which covers most classic ISA Ethernet cards.

I used Telnet to do my testing and there was something wrong with my cabling; it was slow and dropping packets like crazy. (It turned out to be a hardware problem.) When Telnet exited it gave me this error message:

    *** NULL assignment detected

Well, that doesn't sound good. The compiler I use (Open Watcom 1.9) checks the heap at the end of a program to let you know if there was heap corruption, but this is a different error message. I dug through the PDF documentation and I found an explanation in the "Watcom C/C++ Programmer's Guide."

Here is a summary of the problem: Normally it is an error to use a NULL pointer and on a real operating system you will get a signal or an interrupt if you try to read or write using one. 16-bit DOS doesn't have that capability so it is allowed, even if it is an error. While you can't detect reads using a NULL pointer, the compiler has a trick for trying to detect writes using it. The compiler reserves 32 bytes at the start of the data segment and writes a known pattern to it. At the end of the program the compiler checks to see if those 32 bytes have been altered. If they have, then something might have used a NULL pointer to do it. (Nothing else should point into that area.) If you get the warning message then something clobbered those first 32 bytes in the data segment and you probably have a bug. Even if you don't get the warning message you might have a bug, but this trick can't detect that - a write outside of those 32 bytes will not be detected by this mechanism.

Ok, so here is the situation: This only happens when using EtherSLIP. I've never even seen this error before and I've been using this compiler for 15 years. It seems to be triggered only when I have packets getting lost, which then requires retrying sending those packets. The machine I'm using is an 8088 class machine so I can't use the Open Watcom debugger to catch the code that is causing this.

My first attempt: Lots of if-checks

The compiler run-time is telling me that I'm writing using a NULL pointer, so all I need to do is add some trace points on the suspected path and write a warning if I see a NULL pointer being used. That is simple to do but somewhat tedious as I might have to add a trace point for every pointer I use. But the suspect code path (resending a lost packet) is not that complicated so I started with this approach. Here is a sample of what I did:

    void near TcpSocket::resendPacket( TcpBuffer *buf ) {
      if ( buf == NULL ) {
        TRACE_WARN(("Whoops: resendPacket tried to reference a NULL pointer."));
        return;
      }
      TcpPacket_t* packetPtr = &buf->headers;
      ...

I kept running the code and recreating the problem, but I never got my warning message. So I kept adding trace points in my code until I eventually determined that this approach was not working and I would need to try something different.

My second attempt: Detect the corruption earlier

The compiler can detect the corruption, but it only runs the check when the program exits. To get closer to the problem I can do the same check and do it while the program is running, hopefully narrowing down when and where it happens.

To get started I first looked at the compiler source code to see exactly what it was doing, and I found what I needed in bld/clib/startup/a/cstrt086.asm. (I've slightly simplified and reformatted it here for clarity.)

This is where the 32 bytes of reserved storage are defined. (It is allocated as 16 words of 0x0101.)

            assume  ds:DGROUP

    INIT_VAL        equ 0101h
    NUM_VAL         equ 16

    _NULL   segment para public 'BEGDATA'
    __nullarea label word
            dw      NUM_VAL dup(INIT_VAL)
            public  __nullarea
    _NULL   ends

Here is the error message that I was seeing:

    ;
    ; miscellaneous code-segment messages
    ;
    NullAssign      db      '*** NULL assignment detected',0

And here is the code that checks the storage for changes at the end of the program:

    __exit  proc near
            public  "C",__exit
            push    ax
            mov     dx,DGROUP
            mov     ds,dx
            cld                             ; check lower region for altered values
            lea     di,__nullarea           ; set es:di for scan
            mov     es,dx
            mov     cx,NUM_VAL
            mov     ax,INIT_VAL
            repe    scasw
            pop     ax                      ; restore return code
            je      ok
            ;
            ; low memory has been altered
            ;
            mov     bx,ax                   ; get exit code
            mov     ax,offset NullAssign    ; point to msg
            mov     dx,cs                   ; . . .
            ...

That code defines 32 bytes of 0x01 at the beginning of the data segment, and they can be addressed using the variable name "__nullarea". The bytes are present and initialized before the program starts. At the end of the program the __exit routine will be called and it will check to see that those 32 bytes are still 0x01. If they are not, you will get an error message.

I created a callable function in C that does the same thing:

    extern "C" uint8_t _nullarea;
    uint8_t *_nullareap = &_nullarea;

    bool failed = false;

    extern "C" void nullCheck( const char *loc ) {

      // Only generate a trace message the first time it is detected.
      if (failed == true) return;

      int good = true;
      for ( int i=0; i < 32; i++ ) {
        if ( _nullareap[i] != 0x01 ) { good = false; break; }
      }

      if ( good == false ) {
        TRACE_WARN(("Null check failed at %s\n", loc));
        Utils::dumpBytes( Trace_Stream, _nullareap, 32 );
        failed = true;
      }
    }

And then, I inserted a call to this code in various places in my Telnet code to try to narrow down where the problem was happening.

Eventually I got to the code that calls the packet driver to send a packet on the wire:

    nullCheck("Packet_send_pkt sendattempt");
    int86x( Packet_int, &inregs, &outregs, &segregs);
    nullCheck("Packet_send_pkt after soft int");

The second call to nullCheck was tripping. So it was not a problem in my Telnet code, it was something in the packet driver which is why my if-checks for NULL pointers never showed anything.

The trace showed me the following:

    2026-04-17 16:41:13.76 Nullarea is at 318a:0000
    . . .
    2026-04-17 16:41:28.59 W Null check failed at Packet_send_pkt after soft int
    Buffer address: 318a:0000
    01 01 00 02 12 00 56 34 01 01 01 01 01 01 01 01 ......V4........
    01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 ................

Interestingly, only six bytes were corrupted and the contents of the six bytes were the same in every trace that I looked at. Sometimes the six bytes would move around but that was probably due to the changes I was making to Telnet to add the code or the trace points.

Knowing that this was only happening on the packet send path I looked for those six bytes in the trace and found this right at the very top:

    2026-04-17 16:41:13.22 mTCP telnet Version: Apr 17 2026
    2026-04-17 16:41:13.27 PACKETINT=0x60 MAC=00.02.12.00.56.34 MTU=1400
    2026-04-17 16:41:13.27 IPADDR=192.168.2.122 NETMASK=255.255.255.255 GATEWAY=192.168.2.121
    2026-04-17 16:41:13.33 Debug level: 0xff, DOS Version: 6.00
    2026-04-17 16:41:13.33 Tcp: Allocated 1 sockets, MTU is 1400, My MSS is 1360
    2026-04-17 16:41:13.33 NAMESERVER=192.168.2.1
    2026-04-17 16:41:13.38 DOS Sleep calls enabled: int 0x28:1 int 0x2f,1680:0

So the six bytes of corruption in the _nullarea are the MAC address of the simulated Ethernet device that EtherSLIP is providing. This is a very powerful clue - I now knew to look in EtherSLIP on the send path, specifically where it might be copying a MAC address.

Some quick notes on x86 programming

Before we pick apart the packet driver code and expose the bug, we should review some x86 architecture.

Classic x86 architecture has 16 bit registers and a segmented memory model that allows you to address up to 1MB of memory. A segment is a region of memory that starts on a 16 byte boundary (a paragraph); on a classic IBM PC there are 64K possible segments, each spaced 16 bytes apart. Segment values are stored in special registers called segment registers.

To address a single byte of memory you combine a segment register and a 16 bit offset to construct a pointer. The segment defines the start of the memory region (always at a paragraph boundary) and the offset lets you reach any byte in that region, up to the range of the offset. Since the offset is a 16 bit value, that lets you address up to 64KB. To go outside of that 64KB region you have to change the segment register to point at a different paragraph.

Sixteen bit x86 has four segment registers and four corresponding offset registers:

    Segment register      Offset register
    CS (Code Segment)     IP (Instruction Pointer)
    SS (Stack Segment)    SP (Stack Pointer)
    DS (Data Segment)     SI (Source Index)
    ES (Extra Segment)    DI (Destination Index)

In addition to these registers, there are four general purpose registers (AX, BX, CX, and DX), another stack offset register (BP), and a flags register.

A full pointer is usually written as the segment register and the offset register, such as "DS:SI" or "ES:DI". If the segment register is not specified it is implied.

Here is a more concrete example of how this mechanism works.
Assume we want to look for the second parallel port I/O address in the BIOS data area.