GitHub - xtellect/fc: fc is a research-grade, lossless floating point compressor.

G github.com ↗

▲ 108 points • 33 comments • by enduku • 2w ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is a mix of AI-generated, and human-written content

46 %

AI likelihood · overall

Mixed

67% human-written 33% AI-generated

SEGMENTS · HUMAN 1 of 7

SEGMENTS · AI 1 of 7

WORD COUNT 1,494

PEAK AI % 83% · §2

Analyzed

May 13

backend: pangram/v3.3

Segments scanned

7 windows

avg 213 words each

Distribution

67 / 33%

human / AI fraction

Verdict

Mixed

Pangram v3.3

Article text · 1,494 words · 7 segments analyzed

Human AI-generated

§1 Human · 11%

Copyright (c) 2026 Praveen Vaddadi thynktank@gmail.com Licensed under the Apache License, Version 2.0. See LICENSE and NOTICE. fc is a lossless compressor for streams of IEEE-754 64-bit doubles. It splits the input into adaptively-sized blocks (quanta), runs a competition between many specialized codecs on each block, and emits the smallest result. Compression and decompression are multi-threaded with POSIX threads and the hot paths are hand-vectorized for x86-64 with AVX2 + SSE4.2 + BMI + LZCNT. The current version string is fc 1.56 (see fc_ver in fc.c). Representative numbers Aggregate result of the bundled test_fc harness across 17 synthetic datasets, 1 Mi doubles each (8 MiB per dataset, ≈136 MiB / 142.6 MB decimal total), median-of-5 trials per dataset, 8 encoder threads, decoder threads auto-selected, on the maintainer's x86-64 + AVX2 box. The "Ratio" column is total_original / total_compressed; the throughput columns are the harmonic mean of per-dataset MB/s. Numbers come straight out of test_fc.csv and will move on different hardware — re-run make test to regenerate.

Codec Ratio Encode (MB/s) Decode (MB/s)

fc 3.07 120 1277

zstd -3 2.07 528 1556

zstd -9 2.09 111 1572

fpzip 1.71 123 121

lz4hc -9 1.69 51 5835

gorilla 1.66 683 971

lz4 1.62 2176 5353

Headline: fc is the best size-oriented floating-point compressor in this benchmark, not the best all-around compressor.

§2 AI · 83%

It wins ratio on 10 of 17 datasets outright and is never worse than roughly third on any of them. Where fc wins (often by a lot):

Structured / analytic floats — constant ≈ 39,756× (vs. zstd-9 ≈ 11,619×), parabolic ≈ 2,973× (vs. 2.5×), int×1000 ≈ 5,388× (vs. 4.3×), piecewise ≈ 26.8× (vs. 14.9×), stocks ≈ 15.6× (vs. 7.1×). Periodic / quasi-periodic signals — sin-low-freq, audio-mix, ar2-damped beat gorilla and fpzip by 1.3×–2.7×. Decode is fast and parallel (auto-threaded inside fc_dec): aggregate ≈ 1.28 GB/s, ~10× faster than encode, ~10× faster than fpzip, ~1.3× faster than gorilla, and ~80% of zstd-3 decode. Good fit for write-once / read-many time-series stores.

Where it loses:

Byte-pattern-friendly quantized data — zstd wins on decimal-cents (zstd-9 ≈ 3,465× vs fc 268×), dict-16 (zstd-9 ≈ 10,268× vs fc 4,660×), and quantized-4lvl (zstd-9 ≈ 9,675× vs fc 1,037×). General-purpose LZ catches structure fc doesn't currently model. Noisy natural-float arrays — fpzip is marginally better on random-walk (1.39 vs 1.38), climate (1.395 vs 1.380), geo-coords (2.012 vs 2.000), and sensor-noisy (1.397 vs 1.386). Differences are small but consistent. Encode throughput — ~120 MB/s aggregate. The mode competition is what buys the ratio; you pay for it in encoder CPU.

§3 Mixed · 61%

If encode is the bottleneck, lz4 (~2.2 GB/s) or zstd-3 (~530 MB/s) dominate. Pseudo-random data — ratio ≈ 1.000. Same as every other lossless codec; fc just doesn't waste headers.

Hard constraints: lossless only (no quantization). The encoder takes 8-byte-aligned input in multiples of 8 bytes; the heuristics and most modes are tuned for IEEE-754 double-precision bit patterns, so other 8-byte payloads will compress but you may leave ratio on the table. Requires x86-64 with AVX2 + SSE4.2 + BMI + LZCNT. Rule of thumb:

Want the smallest file for a floating-point stream and decode-side latency matters → fc. Encode-bound on heterogeneous data → zstd or lz4. Noisy scientific arrays where every byte counts → benchmark fc vs fpzip on your data; they're within ~1% of each other.

Status This is a research-grade single-file library. The on-disk format is versioned by a magic number in the stream header; backwards-incompatible format changes will bump the magic. Treat compressed output as opaque — do not rely on internal layout. Repository layout

File Purpose

fc.h Public API (encode, decode, monitoring counters).

fc.c Compressor implementation. ~50 codecs + dispatch + threads.

gorilla.h Bundled Redis "Gorilla" XOR/delta codec — third-party.

gorilla.c Bundled Redis "Gorilla" XOR/delta codec — third-party.

test_fc.c Round-trip + benchmark harness over 17 synthetic datasets.

§4 Mixed · 30%

Makefile Builds fc.o, gorilla.o, and the test_fc benchmark.

LICENSE Apache License 2.0 (this project's own code).

NOTICE Attribution and third-party license disclosures.

CONTRIBUTING.md Contribution guide and DCO.

Public API The whole API is in fc.h: typedef struct { int p, t, c; } fc_cfg;

size_t fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg); size_t fc_dec(const void *src, size_t bytes, void *dst);

extern const char fc_ver[]; extern unsigned long fc_dec_mode_hist[64]; extern unsigned long fc_enc_mode_time_ns[64]; extern unsigned long fc_enc_mode_calls[64]; extern unsigned long fc_enc_mode_wins[64];

void fc_dec_mode_hist_reset(void); const char *fc_mode_name(int mode); fc_cfg fields

Field Meaning

p log2 of the predictor table size. Clamped to [10, 16] inside fc_enc. The test_fc benchmark passes 18, which is silently clamped to 16.

t Worker thread count for the encoder. The decoder takes no fc_cfg and auto-selects threads via sysconf(_SC_NPROCESSORS_ONLN), capped at 128 and at the per-stream block count.

c Reserved. Currently unused ((void)cfg.c; in fc_enc).

fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg) Compresses bytes of input (a multiple of 8 — one double per 8 bytes). Returns the number of compressed bytes written, or 0 on failure: NULL src / dst, bytes not a multiple of 8, or an internal allocation failure. The destination buffer must be sized by the caller. The library does not publish a closed-form comp_bound. The bundled benchmark allocates 2 * bytes + 64 KiB (test_fc.c), which has been sufficient for every dataset in the test suite; for arbitrary inputs, allocate at least that much and check the return value (the encoder will return 0 rather than overflow).

§5 Mixed · 45%

fc_dec(const void *src, size_t bytes, void *dst) Decompresses a stream produced by fc_enc. Returns the number of decompressed bytes written, or 0 on failure. Failure conditions: NULL src / dst, truncated header, wrong magic, header predsizelg2 outside [10, 16], a per-block header that claims more input or output than remains, total decoded size that doesn't match the header's original_bytes, or an internal allocation failure. The caller is responsible for sizing dst from the original byte count recorded in the stream header (use fc_dec once with a header read, or simply allocate the original buffer length you compressed). Note: unknown / unimplemented mode IDs are not currently treated as a hard failure — the affected block is silently left as zeros. Treat streams from untrusted sources accordingly. Monitoring counters The fc_*_mode_* arrays are 64-entry tables indexed by mode ID. They are updated atomically and are intended for diagnostics only:

fc_dec_mode_hist[m] — number of blocks decoded with mode m. fc_enc_mode_calls[m] — number of times mode m was evaluated during the encoder competition. fc_enc_mode_wins[m] — number of times mode m won and was written. fc_enc_mode_time_ns[m] — cumulative wall-clock nanoseconds spent evaluating mode m.

fc_mode_name(m) returns a short string for mode m (or "?" for unused IDs); see the table below. Modes fc currently defines 50 mode IDs (0–49 used; 12, 14, and 50–63 reserved). Mode names as exposed by fc_mode_name: 0 PRED 25 PRED2 1 CONST 26 PRED_ADAPTIVE 2 STRIDE 27 VITERBI 3 XORZ 28 DELTA_BINNED 4 LZ 29 PRED_RC 5 RAW 30 PRED_INTERLEAVED 6 FLOAT32 31 BWT 7

§6 Mixed · 41%

ORDERED_DELTA 32 LZ_DICT 8 FUZZY_STRIDE 33 CONV_N 9 ALP 34 SIGN_CONV 10 TRAILING_ZERO_BP 35 CONV_DOUBLE 11 BYTE_TRANSPOSE 36 MTF_LZ 13 XOR128 37 CONV_DOUBLE_BP 15 LSB_STRIP 38 CONV_N_BINNED 16 LOOKBACK_DELTA 39 PRED_SIMD_INTERLEAVED 17 FLOAT_MULT 40 FUZZY_STRIDE_ANS 18 FCM_RLE 41 PAQ_MIXER 19 DICT 42 PAQ4_MIXER 20 DELTA2 43 BWT_MTF_TANS 21 BITPLANE 44 PRED4 22 INT_MULT 45 DELTA_DP_BINNED 23 CONV1 46 CONV_N_DP_BINNED 24 PRED_TANS 47 ELF 48 LZ_SPLIT 49 BWT_MTF_RC

Roughly grouped:

Predictors: PRED, PRED2, PRED4, PRED_TANS, PRED_RC, PRED_ADAPTIVE, PRED_INTERLEAVED, PRED_SIMD_INTERLEAVED, VITERBI, LSB_STRIP — FCM/DFCM-style hash predictors with various residual coders (raw, tANS, range coding) and SIMD layouts. XOR / delta: XORZ, XOR128, LOOKBACK_DELTA, ORDERED_DELTA, DELTA2, DELTA_BINNED, DELTA_DP_BINNED. Constant / stride / dictionary: CONST, STRIDE, FUZZY_STRIDE, FUZZY_STRIDE_ANS, DICT, LZ_DICT, MTF_LZ. Lempel-Ziv: LZ, LZ_SPLIT.

§7 Mixed · 43%

Floating-point specific: FLOAT32, FLOAT_MULT, INT_MULT, ALP (Adaptive Lossless Floating-point), ELF (Erasing Last bits). Transforms: BYTE_TRANSPOSE, BITPLANE, TRAILING_ZERO_BP, SIGN_CONV, BWT, BWT_MTF_TANS, BWT_MTF_RC. Convolutional / linear models: CONV1, CONV_N, CONV_DOUBLE, CONV_DOUBLE_BP, CONV_N_BINNED, CONV_N_DP_BINNED. Mixers: PAQ_MIXER, PAQ4_MIXER, FCM_RLE. Fallback: RAW.

fc.c is the source of truth for what each mode actually does. How it works

Header. The stream begins with a fixed-size header containing a magic number, the original byte count, the quantum size used, and the clamped predictor log2. Block planning. Input is scanned and split into blocks. The default quantum is 256 KiB (C_QUANTUM_BYTES) but a probe (ceq_probe_chunk_values) can grow a block up to 1 MiB (C_QUANTUM_MAX_BYTES) when the data looks low-entropy. Mode competition. Workers pull blocks from a queue. For each block, the encoder evaluates a feature-gated subset of the modes (exp_range, sign_flips, distinct_count, looks_like_repeats, and the running best decide which modes are worth trying) and keeps the smallest output. Per-mode wall time, call count, and win count are recorded in the fc_enc_mode_* counters. Emission. Each block is written as [1-byte mode][payload], with length and offset bookkeeping in the stream so fc_dec can find block boundaries without scanning the payload. Decode. fc_dec walks the block index and dispatches each block's payload to the matching codec. Like encode, decode is multi-threaded; fc_dec itself picks the worker count automatically (online CPUs, capped at 128 and at the block count).