RISC-V and Floating-Point

fprox.substack.com

RISC-V support for floating-point arithmetic is a topic we have partially covered in a few previous posts, but we felt it deserves a full overview post of its own.

The RISC-V base ISA (RV32I or RV64I) does not define any floating-point instructions. RISC-V provides extensions to the base ISA that add support for floating-point arithmetic.

The map of ratified floating-point ISA extensions (and their dependencies) is presented in the figures below. The first figure presents both scalar and vector ratified floating-point extensions. The listed scalar extensions are built around a dedicated scalar floating-point register file (FRF).

There exists a mirror set of scalar floating-point extensions operating from the general purpose register file (XRF). They are listed in the figure below.

RISC-V introduced support for floating-point through the F extension. This extension specifies a new register file (FRF) with FLEN-bit wide registers, and a set of instructions to perform single-precision floating-point (and related) operations. It builds on the 2008 revision of the IEEE-754 standard and includes basic operations (addition, subtraction, division/square root, conversions, …), but also the Fused Multiply-Add (FMA), which was not present in the original version of the IEEE-754 standard (1985).

RISC-V architects chose to assign a dedicated register file to floating-point operands and to the results of floating-point operations. We covered this in a previous post.

Compared to a unified integer¹/floating-point register file, this choice implies additional cost for supporting floating-point: it adds 32 extra registers and requires dedicated floating-point loads/stores as well as data moves between the FRF and the XRF. At the same time, it simplifies register allocation in programs and provides more flexibility to assign architectural storage to floating-point operands without competing for the same storage resources as general purpose operations.
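To put rough numbers on that extra cost, the sketch below counts architectural register storage in bits for a few XLEN/FLEN combinations. It is a back-of-the-envelope model, assuming only the 31 writable x registers need storage (x0 is hardwired to zero) and a separate FRF of 32 registers of FLEN bits; it ignores fcsr, physical registers used for renaming, and every other implementation detail. The function name `arch_register_bits` is purely illustrative.

```python
# Back-of-the-envelope model of architectural register storage (in bits).
# Assumptions: x0 is hardwired to zero, so only 31 x registers need storage;
# a separate FRF holds 32 registers of FLEN bits; CSRs such as fcsr are ignored.


def arch_register_bits(xlen: int, flen: int) -> dict:
    """Return XRF, FRF and total storage in bits (flen=0 means no separate FRF)."""
    xrf = 31 * xlen
    frf = 32 * flen
    return {"xrf": xrf, "frf": frf, "total": xrf + frf}


configs = {
    "RV32 + F": (32, 32),
    "RV32 + D": (32, 64),
    "RV64 + F": (64, 32),
    "RV64 + D": (64, 64),
    "RV64 + Zfinx (no FRF)": (64, 0),
}

for name, (xlen, flen) in configs.items():
    bits = arch_register_bits(xlen, flen)
    print(f"{name:24s} XRF={bits['xrf']:5d}  FRF={bits['frf']:5d}  total={bits['total']:5d}")
```

Under this model, RV32 + D needs 992 bits of XRF and 2048 bits of FRF: the floating-point state is larger than the integer state.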
The separate register files offer an extra layer of flexibility: the general purpose register width, XLEN, can differ from the floating-point register width, FLEN, which makes it possible to size each register file for the actual workload needs.

For example, RV32 + D requires 32 × 32-bit general purpose architectural registers (well, actually 31, since x0 is hardwired to zero and needs no storage) and 32 × 64-bit floating-point registers; similarly, RV64 + F requires 31 × 64-bit general purpose registers and 32 × 32-bit floating-point registers: you only pay for what you need.

The base floating-point extension to the RISC-V ISA, the F extension, specifies formats and operations which make implementations compliant with the IEEE 754 standard (in particular its 2008 revision).

The IEEE-754 standard is the most widely accepted floating-point arithmetic standard for CPUs, so selecting it appears as a natural choice for RISC-V. You can find a brief history of the IEEE 754 standard(s) in this post.

The F extension defines RISC-V support for the single-precision format (IEEE-754's binary32, sometimes dubbed FP32) and an associated set of general operations (including conversions from and to integer formats). RISC-V floating-point support is built around IEEE-754 formats.

Historically, the second floating-point extension is the D extension, bringing scalar support for the double-precision format (IEEE-754's binary64) and instructions operating on it.

A quadruple-precision (IEEE-754's binary128) extension, dubbed Q, has also been specified. It seems to have attracted only limited interest from the RISC-V community and has seen limited adoption.

The original set of F, D, and Q extensions has since been extended with support for half precision (IEEE 754's binary16), which was added with the Zfh and Zfhmin instruction set extensions. Zfhmin is a strict subset of Zfh and only contains basic data moves and conversions.
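The conversions that Zfhmin provides (fcvt.s.h and fcvt.h.s) are enough to compute on binary16 data in binary32. The sketch below emulates that flow in Python, using the standard-library `struct` formats `'e'` (binary16) and `'f'` (binary32) as stand-ins for the hardware conversions; the helper names are hypothetical. Python floats are binary64, so the binary32 sum is obtained by rounding a binary64 sum, which gives the correctly rounded binary32 result for a single addition (53 ≥ 2×24 + 2).

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def fcvt_s_h(h_bits: int) -> float:
    """Hypothetical model of fcvt.s.h: binary16 bits -> binary32 value (always exact)."""
    return struct.unpack('<e', struct.pack('<H', h_bits & 0xFFFF))[0]


def fcvt_h_s(x: float) -> int:
    """Hypothetical model of fcvt.h.s: binary32 value -> binary16 bits.

    struct rounds to nearest-even; unlike hardware, it raises OverflowError
    instead of returning infinity for values beyond the binary16 range.
    """
    return struct.unpack('<H', struct.pack('<e', x))[0]


def add_half_via_single(a_bits: int, b_bits: int) -> int:
    """Add two binary16 values by promoting to binary32, adding, and narrowing back."""
    a, b = fcvt_s_h(a_bits), fcvt_s_h(b_bits)
    # a + b is computed in binary64, then rounded once to binary32.
    s = f32(a + b)
    return fcvt_h_s(s)


print(hex(add_half_via_single(0x3C00, 0x3800)))  # 1.0 + 0.5 -> 0x3e00 (1.5)
```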
Note: One of the rationales behind Zfhmin is that it allows implementers to choose to support binary16 only as a storage format and not bother with full scalar support, saving hardware while still allowing computation on half-precision data after a promotion to binary32.

The F extension (scalar support for single precision) is a prerequisite for most of the other floating-point extensions.

For example, it is required to enable half-precision support (Zfh or Zfhmin) or vector floating-point support (Zve32f onward). Later in this post, we will cover IEEE-754 support on the vector side and also look at support for non-standard floating-point formats.

The F extension has been extended by the Zfa extension, which offers a few useful scalar floating-point operations. Zfa operations include a floating-point load-immediate instruction (with 32 useful floating-point constants), a set of quiet floating-point comparisons, various roundings to integer values (in floating-point format), and a few other operations. We covered Zfa in more detail in this post.

Zfa is defined for the 3 standard formats (binary32, binary64, and binary16) with respective dependencies on F, D, and Zfh.

Note: fli.h is defined if and only if Zfh or Zvfh is implemented. Since Zvfh depends on Zfhmin, Zfhmin is necessary but not sufficient for fli.h to exist. The rationale is that without vector support fli.h is not very useful on its own: e.g., using fli.s makes more sense than fli.h followed by fcvt.s.h.

Part of the extra cost of RISC-V floating-point support, namely the additional floating-point register file, can be avoided by selecting the Zfinx family of floating-point extensions. In this case, floating-point instructions operate on general purpose registers. Zfinx should be read as “Z-F-in-X”, and indicates that the operations from the F extension are implemented using the XRF, a.k.a. the general purpose register file.

RISC-V specifies Zfinx as the equivalent of the F extension (single precision), Zdinx as the equivalent of the D extension, and finally Zhinx / Zhinxmin as the equivalents of Zfh / Zfhmin. These extensions reuse the F / D / Zfh / Zfhmin encodings and remove a few instructions (namely floating-point loads, stores, and moves between register files) which would be redundant with existing general purpose instructions.
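On an RV64 core, the difference between the F/D register layout and the Zfinx layout is visible in the raw 64-bit register contents. In the sketch below, plain Python integers stand in for registers and all helper names are hypothetical: `nan_box` models how the F extension writes a binary32 value into a 64-bit f register when FLEN = 64 (upper 32 bits set to ones), `unbox` models the rule that an improperly boxed binary32 operand reads as the canonical NaN, and `sign_extend` models the Zfinx behaviour described in the note on NaN boxing.

```python
import struct

UPPER_ONES = 0xFFFFFFFF << 32   # bits 63:32 all set
CANONICAL_NAN_S = 0x7FC00000    # RISC-V canonical binary32 NaN


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to binary32."""
    return struct.unpack('<I', struct.pack('<f', x))[0]


def nan_box(bits32: int) -> int:
    """F with FLEN=64: a binary32 result is NaN-boxed into a 64-bit f register."""
    return UPPER_ONES | (bits32 & 0xFFFFFFFF)


def unbox(reg64: int) -> int:
    """F with FLEN=64: reading a binary32 operand; improper boxing yields canonical NaN."""
    if (reg64 >> 32) == 0xFFFFFFFF:
        return reg64 & 0xFFFFFFFF
    return CANONICAL_NAN_S


def sign_extend(bits32: int) -> int:
    """Zfinx on RV64: bits 63:32 are filled with copies of bit 31 (the sign bit)."""
    bits32 &= 0xFFFFFFFF
    return (UPPER_ONES | bits32) if bits32 & 0x80000000 else bits32


for x in (1.0, -1.0):
    b = f32_bits(x)
    print(f"{x:+.1f}: NaN-boxed={nan_box(b):#018x}  sign-extended={sign_extend(b):#018x}")
```

For a negative value both layouts coincide (-1.0 gives 0xffffffffbf800000 either way), while a positive value differs: 1.0 is 0xffffffff3f800000 when NaN-boxed but 0x000000003f800000 when sign-extended.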

Note: Zfinx does not specify NaN boxing when a 32-bit F value is stored in an XLEN=64 (RV64) register; it mandates sign extension (the same as for integer values).

The baseline vector support for floating-point in RISC-V comes with RVV 1.0, which specifies single- and double-precision support. The support is even more comprehensive than on the scalar side: RVV 1.0 defines vector variants for all existing scalar instructions, but also instructions without scalar counterparts:
- Widening addition/subtraction/multiply/multiply-accumulate instructions
- Reciprocal and reciprocal square-root estimate instructions
- Narrowing float-to-float conversions with rounding towards odd

One difference to notice is that, contrary to their scalar counterparts, the vector multiply-accumulate instructions are destructive: one of the operands is overwritten as the destination. Multiple variants are defined with different “destructive” schemes: for instance, vfmacc overwrites the addend, while vfmadd overwrites one of the multiplicands.

Note: Both vfmacc and vfmadd implement FMA-like semantics: fused multiply-add with a single final rounding.

The vector extension also defines vector-specific operations: floating-point reductions (sum/min/max) and vector-scalar/vector-vector variants (including reverse vector-scalar operations such as vfrdiv.vf and vfrsub.vf).

Later extensions, Zvfhmin and Zvfh, added minimal (resp. extensive) vector support for half precision / binary16. More recently, minimal support for BFloat16 was added with Zvfbfmin (conversions) and Zvfbfwma (widening BF16 × BF16 multiply-accumulate into FP32).

A few non-standard floating-point formats have made their way into ratified or in-progress RISC-V extensions. Some formats inherit from IEEE 754 patterns (e.g., BFloat16); others differ more widely (e.g., OpenCompute's OFP8).

The first one is BFloat16 (a.k.a. BrainFloat16). This 16-bit floating-point format was never officially listed in any IEEE 754 specification (at least up to the 2019 revision) but draws from the standard pattern: same encodings, same bias definition, same special values.
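Because BFloat16 keeps binary32's sign bit, 8-bit exponent and bias (127) and only shortens the fraction to 7 bits, converting between the two formats is mostly bit manipulation. The Python sketch below (function names are hypothetical; `struct` is used only to reach the binary32 bit pattern) widens BF16 to binary32 exactly, and narrows binary32 to BF16 either by plain truncation or with round-to-nearest-even. Subnormals are rounded like any other value rather than flushed to zero; NaNs map to 0x7FC0, assuming RISC-V's canonical-NaN convention carries over to BF16.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to binary32."""
    return struct.unpack('<I', struct.pack('<f', x))[0]


def bits_f32(b: int) -> float:
    """binary32 value encoded by the 32-bit pattern b."""
    return struct.unpack('<f', struct.pack('<I', b & 0xFFFFFFFF))[0]


def bf16_to_f32(h: int) -> float:
    """Widening is exact: BF16 is the upper half of a binary32 encoding."""
    return bits_f32((h & 0xFFFF) << 16)


def f32_to_bf16_trunc(x: float) -> int:
    """Truncation: keep the upper 16 bits (rounds toward zero)."""
    return f32_bits(x) >> 16


def f32_to_bf16_rne(x: float) -> int:
    """Round to nearest, ties to even; subnormals are rounded, not flushed."""
    b = f32_bits(x)
    if (b & 0x7FFFFFFF) > 0x7F800000:  # any NaN
        return 0x7FC0                  # assumed canonical BF16 NaN
    # Add just under half a BF16 ulp, plus 1 when the kept LSB is odd, so exact
    # ties go to the even neighbour; carries propagate into the exponent, so
    # values above the largest BF16 (plus half an ulp) overflow to infinity.
    b += 0x7FFF + ((b >> 16) & 1)
    return (b >> 16) & 0xFFFF


x = 1 + 2**-8 + 2**-10  # between two BF16 neighbours, above their midpoint
print(hex(f32_to_bf16_trunc(x)), hex(f32_to_bf16_rne(x)))  # 0x3f80 0x3f81
```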

The original definition² corresponds to a truncation of IEEE 754 binary32 (keeping only the upper 16 bits), and the original specification of operations on BFloat16 numbers specifies that subnormal numbers should be flushed to zero. RISC-V does use the standard encoding definition but mandates full subnormal support (adhering to the IEEE 754 floating-point arithmetic mandate).

On the scalar side, the RISC-V ISA can be extended with Zfbfmin (spec source), which defines basic data moves and BFloat16 conversions from/to single precision.

On the vector side, RISC-V can be extended with either basic data moves and conversions with Zvfbfmin, or with widening multiply-accumulate with Zvfbfwma. Those extensions were covered in a previous post.

Those 3 extensions are ratified and listed as optional in the user-level RISC-V application profile, RVA23U64. Thus, it can be expected that many RISC-V implementations will support them.

A new extension bringing extended vector BFloat16 support to RVV is making its way through the RISC-V specification process: Zvfbfa. The extension project is available as a pull request against the official riscv-isa-manual repo. We covered this extension in a section of a previous post.

Zvfbfa represents a very large step towards full vector BFloat16 support compared to Zvfbfmin and Zvfbfwma: it offers almost as broad support for BFloat16 as Zve32f does for binary32. The only excluded operations are “division, square root, reductions, and conversions to/from integers wider than 8 bits”.

Note: at the time of writing (April 2026), there are no specified instructions to convert between half precision and BFloat16. Such conversions have to go through single precision. It is assumed that such cases are rare and would not justify the opcode allocation. Similarly, there are no mixed-format operations, e.g., a widening product between half precision and BFloat16 with a binary32 result.

With the growing interest in low-precision formats (e.g., 8-bit floating-point), RISC-V has been extended to support other non-IEEE floating-point formats.
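Coming back to the note on conversions between half precision and BFloat16: without a direct instruction, the conversion is a two-step trip through binary32. The sketch below models that trip in Python (hypothetical helper names; `struct`'s `'e'` format stands in for the binary16 conversions). Widening either 16-bit format to binary32 is exact, so each direction rounds only once and routing through single precision adds no extra rounding error.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to binary32."""
    return struct.unpack('<I', struct.pack('<f', x))[0]


def bits_f32(b: int) -> float:
    """binary32 value encoded by the 32-bit pattern b."""
    return struct.unpack('<f', struct.pack('<I', b & 0xFFFFFFFF))[0]


def f32_to_bf16_rne(x: float) -> int:
    """binary32 -> BF16, round to nearest even (NaN -> assumed canonical 0x7FC0)."""
    b = f32_bits(x)
    if (b & 0x7FFFFFFF) > 0x7F800000:
        return 0x7FC0
    b += 0x7FFF + ((b >> 16) & 1)
    return (b >> 16) & 0xFFFF


def half_to_bf16(h: int) -> int:
    """binary16 -> binary32 (exact, like fcvt.s.h) -> BF16 (one rounding, like fcvt.bf16.s)."""
    x = struct.unpack('<e', struct.pack('<H', h & 0xFFFF))[0]
    return f32_to_bf16_rne(x)


def bf16_to_half(b: int) -> int:
    """BF16 -> binary32 (exact, like fcvt.s.bf16) -> binary16 (one rounding, like fcvt.h.s).

    struct raises OverflowError for magnitudes beyond the binary16 range, where
    hardware would return infinity; this sketch does not model that case.
    """
    x = bits_f32((b & 0xFFFF) << 16)
    return struct.unpack('<H', struct.pack('<e', x))[0]


print(hex(half_to_bf16(0x3C01)), hex(bf16_to_half(0x4049)))  # 0x3f80 0x4248
```

The first call shows the precision loss (binary16 keeps 10 fraction bits, BFloat16 only 7), so 1 + 2⁻¹⁰ rounds to 1.0; the second is exact because 3.140625 needs only 7 fraction bits.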