Learn Something Old Every Day, Part XX: 8087 Emulation on 8086 Systems

os2museum.com

Not too long ago I had a need and an opportunity to re-acquaint myself with the mechanism used for software emulation of the 8087 FPU on 8086/8088 machines.

As mentioned elsewhere, the 8086 CPU (1978) had a generic co-processor interface first utilized by the Intel 8089 I/O processor (1979) and later the Intel 8087 FPU (1980), initially called the Numeric Processor Extension or NPX.

The 8087 was a somewhat expensive add-on, assuming that a given system actually had a socket to plug the 8087 into (IBM PCs did, but other 8086/8088 systems did not necessarily have one). There was a largish class of software which could significantly benefit from the 8087 (e.g. spreadsheets), but in the era of shrink-wrapped software, there was a significant incentive to ship software which could use an 8087 when present, yet would still run on a bare 8086/8088 machine with no FPU.

There was also a desire to develop and test floating-point software without having to install an 8087 into every system. Given the initial limited availability of 8087 chips, it was in Intel’s best interest to give developers a way to write 8087 software without requiring 8087 hardware.

Intel released the E8087 software emulation package together with the 8087 chip. This is evidenced by the original Numerics Supplement to The 8086 Family User’s Manual from July 1980, Intel order no. 121586-001. The Numerics Supplement outlines how the E8087 package works. Actually there were two packages: the full E8087 library, and also a “partial” PE8087 library which implemented just enough functionality for Intel’s PL/M language tools.

Intel’s PL/M compiler was the first high-level language translator capable of utilizing the 8087.

Because the 8086 had no facility for emulating an FPU (unlike the 80286 and later processors), the emulation mechanism was somewhat complex and required tight cooperation of assemblers/compilers, linkers, and run-time libraries.

Assembler/Compiler – Intel Original

The assembler or compiler generated “emulatable” 8087 code. The translator in fact produced normal 8086/8087 code, but the object modules included special fix-ups for every 8087 ESC instruction and for (F)WAIT.

Early on, Intel established the convention that the WAIT mnemonic was translated directly to the WAIT opcode, while the FWAIT mnemonic could be emulated.

A key fact is that the language translator did not directly produce 8087 emulation code. It only prepared object modules for emulation while emitting regular 8087 instructions, and the actual decision whether to emulate or not was made at link time.

Linker – Intel Original

During the linking process, the decision to emulate or not was made. The user could link with a no-emulation library (8087.LIB), in which case the linker effectively left the object code alone.

Much more interesting things happened when linking with emulation libraries (E8087.LIB or PE8087.LIB). In that case, the special fix-ups caused the linker to replace the 90/Dx (NOP/ESC) or 9B/Dx (WAIT/ESC) sequences with software INT instructions.

In Intel’s original implementation, ESC opcodes D8h-DFh were replaced with INT 18h-1Fh, as shown in the Intel ASM86 Reference Manual, order no. 121703-003 (1983). Note that eight separate interrupt vectors were required to replace eight possible ESC opcode bytes. The emulator may (and likely does) use a single routine to handle all eight interrupt vectors, but the 8 vectors are needed to preserve the 3 bits of FPU opcode information from the ESC instruction.
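The opcode-to-vector substitution described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Intel's linker: the function names are made up for the example, and only the WAIT/ESC (9B/Dx) form is handled.

```python
# Illustrative sketch only: models the Intel convention described above,
# where ESC opcodes D8h-DFh are replaced with INT 18h-1Fh.
# All function names are hypothetical, not part of any Intel tool.

INT_OPCODE = 0xCD         # 8086 "INT imm8" opcode
WAIT_OPCODE = 0x9B        # 8086 WAIT opcode
INTEL_BASE_VECTOR = 0x18  # first of the eight emulator vectors


def esc_to_vector(esc_opcode: int) -> int:
    """Map an ESC opcode byte (D8h-DFh) to its emulator interrupt vector."""
    if not 0xD8 <= esc_opcode <= 0xDF:
        raise ValueError(f"not an ESC opcode: {esc_opcode:#04x}")
    # The low 3 bits of the ESC opcode select one of the eight vectors,
    # so no opcode information is lost by the substitution.
    return INTEL_BASE_VECTOR + (esc_opcode & 0x07)


def vector_to_esc(vector: int) -> int:
    """Recover the ESC opcode byte from the interrupt vector number."""
    if not INTEL_BASE_VECTOR <= vector <= INTEL_BASE_VECTOR + 7:
        raise ValueError(f"not an emulator vector: {vector:#04x}")
    return 0xD8 + (vector - INTEL_BASE_VECTOR)


def patch_for_emulation(code: bytes) -> bytes:
    """Replace a leading WAIT/ESC pair with INT n; later bytes are untouched."""
    if len(code) < 2 or code[0] != WAIT_OPCODE:
        raise ValueError("expected a WAIT-prefixed ESC instruction")
    return bytes([INT_OPCODE, esc_to_vector(code[1])]) + code[2:]


if __name__ == "__main__":
    finit = bytes([0x9B, 0xDB, 0xE3])            # WAIT + FINIT
    print(patch_for_emulation(finit).hex(" "))   # cd 1b e3
```

Because the vector number carries the low three bits of the ESC opcode, a handler can rebuild the full instruction from the vector plus the bytes that follow the INT.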

Microsoft’s DOS Implementation

Intel’s 8087 emulation mechanism was adopted by Microsoft and, with a few changes, implemented in their DOS development tools. It was also used by several other vendors of DOS development tools (Borland, Watcom, and others).

For obvious reasons, Microsoft needed to change the range of software interrupts used by the 8087 emulator (on the IBM PC, interrupts 18h-1Fh were already claimed by the BIOS). Instead of interrupts 18h-1Fh, the DOS emulator uses vectors 34h-3Dh. Yes, that’s 10 vectors instead of 8. While Intel replaced WAIT instructions with NOP for emulation, Microsoft emulated WAIT instructions as well, and Microsoft also had a provision to emulate FPU instructions with ES segment override.

Emulator + 8087

Microsoft added one significant improvement compared to Intel’s original emulator. If the program with a built-in emulator was run on a system with an 8087 present, the emulator detected that during startup. Whenever an emulated instruction was executed (via INT 34h-3Dh), the emulator replaced the software INT instruction with the original NOP or WAIT plus the corresponding ESC opcode, and returned to execute the real floating-point instruction.

This mechanism had a minimal performance impact (emulated instructions were replaced with real 8087 instructions the first time they were executed) and ensured that programs with the emulator ran at effectively 100% speed on systems with an 8087, yet the same binary could still run on a system with no FPU.

This was often used for binaries shipped to end users, since the program could take advantage of an 8087 but didn’t require it.

MASM Implementation

The oldest implementation of Microsoft’s FPU emulation mechanism I could find was in MASM 1.12 and 1.25 from 1983 (no, I don’t understand the version numbering, and I am not sure which is older). Note that these assemblers do not support the .8087 directive yet and do not accept 8087 instructions by default, unlike Intel’s ASM86. To assemble FPU instructions, the /R switch must be used. To generate emulation-ready code, the /E switch must be used as well.
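The patch-on-first-use behavior described under Emulator + 8087 can be modeled with a small Python simulation. It is a sketch under stated assumptions, not Microsoft's code: it assumes that vector 34h+n stands for ESC opcode D8h+n and that the original prefix byte was WAIT (9Bh). A real emulator is 8086 code that works from the interrupt return address on the stack. The function name is hypothetical.

```python
# Illustrative simulation only. Assumptions: vector 34h+n stands for ESC
# opcode D8h+n, and the original prefix byte was WAIT (9Bh). The name
# patch_back is hypothetical.

INT_OPCODE = 0xCD      # 8086 "INT imm8" opcode
WAIT_OPCODE = 0x9B     # 8086 WAIT opcode
MS_BASE_VECTOR = 0x34  # first vector of the assumed ESC range 34h-3Bh


def patch_back(code: bytearray, return_offset: int) -> int:
    """Undo the INT substitution for the instruction that just trapped.

    return_offset is where execution would resume after the INT, i.e. two
    bytes past the start of the "INT n" instruction. The function rewrites
    those two bytes in place and returns the offset at which execution
    should restart so that the real FPU instruction runs.
    """
    start = return_offset - 2
    if start < 0 or code[start] != INT_OPCODE:
        raise ValueError("no INT instruction before the return offset")
    vector = code[start + 1]
    if not MS_BASE_VECTOR <= vector <= MS_BASE_VECTOR + 7:
        raise ValueError(f"vector {vector:#04x} is outside the assumed ESC range")
    code[start] = WAIT_OPCODE
    code[start + 1] = 0xD8 + (vector - MS_BASE_VECTOR)
    return start


if __name__ == "__main__":
    # INT 39h followed by the ModRM/displacement bytes of FSTSW [bp], then RET.
    program = bytearray([0xCD, 0x39, 0x7E, 0x00, 0xC3])
    restart = patch_back(program, return_offset=2)
    print(restart, program.hex(" "))  # 0 9b dd 7e 00 c3
```

After the first trap the bytes in memory are an ordinary WAIT/ESC pair again, so the same code path never enters the emulator a second time.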

I prepared the following miniature test module:

    _TEXT   SEGMENT PUBLIC 'CODE'
            ASSUME  CS:_TEXT
    start:
            finit
            fwait
            fstsw   [bp]
            fstsw   ds:[bp]         ; MASM 1.x/2.x gets this wrong
            fstsw   [bx]
            fstsw   es:[bx]
    ;       fstsw   ss:[bx]         ; MASM 1.x/2.x can't do this
    ;       fstsw   cs:[bx]         ; MASM 1.x/2.x can't do this
            wait
            ret
    _TEXT   ENDS
            END     start

Then I assembled the module using MASM 1.25 with the help of the EMU2 emulator (setting EMU2_LOWMEM=1 so that early MASM versions would not hang):

    c:\emu2>emu2 TOOLS\MASM125.EXE /E /R emu.asm;
    The Microsoft MACRO Assembler , Version 1.25
     Copyright (C) Microsoft Corp 1981,82,83

    Warning Severe
    Errors  Errors
    0       0

Then I disassembled the result with the Watcom disassembler, showing the object code emitted by the assembler but also the fix-ups the assembler added as a result of the /E switch:

    Module: A
    Segment: _TEXT PARA USE16 00000017 bytes
                                    ; FPU fixup FIDRQQ
    0000  9B DB E3              finit
                                    ; FPU fixup FIWRQQ
    0003  90                    nop
    0004  9B                    fwait
                                    ; FPU fixup FIDRQQ
    0005  9B DD 7E 00           fstsw   word ptr [bp]
                                    ; FPU fixup FIDRQQ
    0009  9B 3E DD 7E 00        fstsw   word ptr ds:[bp]
                                    ; FPU fixup FIDRQQ
    000E  9B DD 3F              fstsw   word ptr [bx]
                                    ; FPU fixup FIERQQ
    0011  9B 26 DD 3F           fstsw   word ptr es:[bx]
    0015  9B                    fwait
    0016  C3                    ret

    Routine Size: 23 bytes,    Routine Base: _TEXT + 0000

    No disassembly errors

The /R /E switch combination causes MASM to produce almost the same object code as /R alone, but adds fix-ups to all FPU instructions and FWAIT.

Attempting to assemble the commented out instructions results in the following errors:

    c:\emu2>emu2 TOOLS\MASM125.EXE /E /R emu.asm;
    The Microsoft MACRO Assembler , Version 1.25
     Copyright (C) Microsoft Corp 1981,82,83

     0015  9B 36: DD 3F         fstsw   ss:[bx]         ; MASM 1.x/2.x can't do this
    E r r o r  ---  84:8087 opcode can't be emulated
     0019  9B 2E: DD 3F         fstsw   cs:[bx]         ; MASM 1.x/2.x can't do this
    E r r o r  ---  84:8087 opcode can't be emulated

    Warning Severe
    Errors  Errors
    0       2

Notice that the old MASM version in fact couldn’t handle the FSTSW DS:[BP] instruction either, although it did not report an error. It just effectively dropped the DS: prefix, which would cause incorrect execution, since addressing through BP uses the SS segment register by default.
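To make the DS:[BP] problem concrete, the following Python sketch applies the standard 8086 default-segment rule to the ModRM bytes seen in the listings above. It is a simplified illustration with made-up function names, and it only understands WAIT-prefixed ESC instructions with at most one segment override.

```python
# Sketch of the 8086 default-segment rule for memory operands, keyed on the
# ModRM byte. Function names are illustrative, not from any real tool.

SEGMENT_PREFIXES = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS"}


def default_segment(modrm: int) -> str:
    """Return the segment register used when no override prefix is present."""
    mod = (modrm >> 6) & 0x03
    rm = modrm & 0x07
    if mod == 0b11:
        raise ValueError("register operand, no memory access")
    # [bp+si] and [bp+di] always default to SS.
    if rm in (0b010, 0b011):
        return "SS"
    # rm=110 means [bp+disp] unless mod=00, where it is a plain 16-bit
    # direct address that defaults to DS.
    if rm == 0b110 and mod != 0b00:
        return "SS"
    return "DS"


def effective_segment(instruction: bytes) -> str:
    """Segment actually used by a WAIT-prefixed ESC instruction."""
    pos = 0
    if instruction[pos] == 0x9B:          # skip the WAIT prefix
        pos += 1
    override = SEGMENT_PREFIXES.get(instruction[pos])
    if override is not None:              # skip the segment override
        pos += 1
    if not 0xD8 <= instruction[pos] <= 0xDF:
        raise ValueError("expected an ESC opcode (D8h-DFh)")
    modrm = instruction[pos + 1]
    return override if override is not None else default_segment(modrm)


if __name__ == "__main__":
    print(effective_segment(bytes([0x9B, 0xDD, 0x7E, 0x00])))        # SS
    print(effective_segment(bytes([0x9B, 0x3E, 0xDD, 0x7E, 0x00])))  # DS
    print(effective_segment(bytes([0x9B, 0xDD, 0x3F])))              # DS
```

With the 3Eh prefix lost, FSTSW DS:[BP] behaves like FSTSW [BP] and writes through SS, which is a different location whenever DS and SS differ.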

The problems were clearly noticed and fixed in Microsoft MASM 3.0 (1984), which can deal with the lines commented out for the old assemblers (and no, Microsoft’s MASM 2.0 has no 8087 support, because the version numbering was a complete mess):

    c:\emu2>emu2 TOOLS\MASM300.EXE /E /R emu.asm;
    Microsoft MACRO Assembler  Version 3.00
    (C)Copyright Microsoft Corp 1981, 1983, 1984

    49722 Bytes free

    Warning Severe
    Errors  Errors
    0       0

Disassembling the MASM 3.0 output, we now see the following:

    Module: A
    Segment: _TEXT PARA USE16 0000001F bytes
                                    ; FPU fixup FIDRQQ
    0000  9B DB E3              finit
                                    ; FPU fixup FIWRQQ
    0003  90                    nop
    0004  9B                    fwait
                                    ; FPU fixup FIDRQQ
    0005  9B DD 7E 00           fstsw   word ptr [bp]
                                    ; FPU fixup FIARQQ
    0009  9B 3E DD 7E 00        fstsw   word ptr ds:[bp]
                                    ; FPU fixup FIDRQQ
    000E  9B DD 3F              fstsw   word ptr [bx]
                                    ; FPU fixup FIERQQ
    0011  9B 26 DD 3F           fstsw   word ptr es:[bx]
                                    ; FPU fixup FISRQQ
    0015  9B 36 DD 3F           fstsw   word ptr ss:[bx]
                                    ; FPU fixup FICRQQ
    0019  9B 2E DD 3F           fstsw   word ptr cs:[bx]
    001D  9B                    fwait
    001E  C3                    ret

    Routine Size: 31 bytes,    Routine Base: _TEXT + 0000

    No disassembly errors

We can now observe six different fix-ups:

FIDRQQ – normal FP instructions

FIWRQQ – FWAIT

FIARQQ – FP instructions with DS segment override

FICRQQ – FP instructions with CS segment override

FIERQQ – FP instructions with ES segment override

FISRQQ – FP instructions with SS segment override

The names of the fix-ups certainly look strange. Although they are normal symbol names, they are quite unlikely to be used by normal programs.

Microsoft (unlike Intel) never supplied a standalone 8087 emulator for use with MASM; only Microsoft’s high-level language libraries came with the emulator. One of the first Microsoft products with 8087 support was Microsoft Pascal version 3.04 (February 1983). At least since 1981, MS Pascal used symbol names ending with QQ for implementation internals; this followed ancient conventions where symbols were limited to 6 significant characters and a double underscore was not yet used for reserved symbols. I am not sure if other Microsoft languages used the same convention, but certainly the fix-up names fit right in with Microsoft Pascal internals.

Linker and the Fix-Ups

At least in Microsoft’s initial implementation, the linker did not need any special support for floating-point emulation at all. All the magic was achieved through carefully coordinated cooperation between the language translators and run-time libraries.

How does that work? The fix-ups refer to library symbols. These are absolute symbols with carefully chosen values. For example, FIWRQQ (FWAIT fix-up) has the value 0A23Dh. Why is that?

The assembler emits FWAIT as NOP/WAIT, opcode sequence 90 9B. When interpreted as a little-endian 16-bit value, it is 9B90h.

      09B90h    ; NOP/WAIT
    + 0A23Dh    ; FIWRQQ value
    --------
      13DCDh

The carry out of the 16-bit sum is discarded, leaving 3DCDh, so the byte sequence 90 9B in the object file is replaced with CD 3D in the final executable. And that of course is INT 3Dh.
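The fix-up arithmetic above is easy to check mechanically. The Python sketch below treats a fix-up as a 16-bit addition applied to the first two bytes of an instruction, which is how the example above works. The helper names are invented for the illustration, and only the FIWRQQ value quoted in the text is used.

```python
# Sketch of the additive fix-up trick. Helper names are hypothetical; the
# only fix-up value used is the FIWRQQ value quoted in the text (0A23Dh).

FIWRQQ = 0xA23D


def apply_fixup(code: bytes, fixup_value: int) -> bytes:
    """Add a 16-bit fix-up to the first two bytes, read as a little-endian word."""
    word = int.from_bytes(code[:2], "little")
    patched = (word + fixup_value) & 0xFFFF   # the carry out of bit 15 is dropped
    return patched.to_bytes(2, "little") + code[2:]


def required_fixup(before: bytes, after: bytes) -> int:
    """Compute the 16-bit addend that turns the first two bytes of `before`
    into the first two bytes of `after`."""
    old = int.from_bytes(before[:2], "little")
    new = int.from_bytes(after[:2], "little")
    return (new - old) & 0xFFFF


if __name__ == "__main__":
    nop_wait = bytes([0x90, 0x9B])                  # FWAIT as emitted by MASM
    print(apply_fixup(nop_wait, FIWRQQ).hex(" "))   # cd 3d
    print(f"{required_fixup(nop_wait, bytes([0xCD, 0x3D])):04X}")  # A23D
```

Running the calculation in reverse with `required_fixup` shows why the symbol value looks arbitrary: it is simply the difference between the INT encoding and the original opcode pair, modulo 65536.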