Skip to content
HN On Hacker News ↗

GitHub - dventimisupabase/pg_flight_recorder: Server-side performance telemetry for PostgreSQL 15+ - extracted from SA-Toolkit

▲ 16 points 5 comments by tanelpoder 2w ago HN discussion ↗

Pangram verdict · v3.3

We believe that this document is a mix of AI-generated, and human-written content

60 %

AI likelihood · overall

Mixed
22% human-written 78% AI-generated
SEGMENTS · HUMAN 0 of 4
SEGMENTS · AI 3 of 4
WORD COUNT 752
PEAK AI % 100% · §1
Analyzed
May 8
backend: pangram/v3.3
Segments scanned
4 windows
avg 188 words each
Distribution
22 / 78%
human / AI fraction
Verdict
Mixed
Pangram v3.3

Article text · 752 words · 4 segments analyzed

Human AI-generated
§1 AI · 100%

Server-side flight recorder for PostgreSQL. Answers "what was happening in my database?" View the project website pg_flight_recorder continuously samples PostgreSQL system state in the background via pg_cron -- no external agents, sidecars, or polling required. It captures wait events, active sessions, locks, WAL activity, checkpoints, I/O, table and index stats, query performance, replication state, and configuration changes. When something goes wrong, the data is already there. Architecture Flight Recorder collects two types of data:

System What it captures Frequency Retention

Sampled Activity Wait events, sessions, locks 1 min Ring buffer: 2h, Archives: 7d

Snapshots WAL, checkpoints, I/O, tables, indexes 1 min 30 days

Data flows through ring buffers (hot, low-overhead) into durable archives and aggregates (cold, long-retention). Safety mechanisms -- circuit breaker, load shedding, per-section timeouts, and pg_cron job timeouts -- prevent the recorder from impacting production workloads. Extensions Two extensions, each published as a separate dbdev package:

Extension Schema Purpose README

pgfr_record pgfr_record Core: tables, collection, scheduling, ring buffers pgfr_record/README.md

pgfr_analyze pgfr_analyze Optional: reporting, anomaly detection, time travel pgfr_analyze/README.md

Requirements

PostgreSQL 15, 16, 17, or 18 pg_cron extension Superuser privileges for installation Optional: pg_stat_statements for query-level analysis

Quick start Download from GitHub Releases or clone the repo, then: # Install core + optional analysis

§2 AI · 100%

extension psql --single-transaction -f pgfr_record/install.sql psql --single-transaction -f pgfr_analyze/install.sql -- Enable collection SELECT pgfr_record.enable();

-- Check health SELECT * FROM pgfr_record.health_check();

-- Generate a diagnostic report SELECT pgfr_analyze.report('1 hour'); Common workflows Daily monitoring SELECT * FROM pgfr_record.health_check(); SELECT pgfr_analyze.report('1 hour'); Incident response -- Switch to high-frequency collection SELECT * FROM pgfr_record.apply_profile('troubleshooting');

-- What was happening at a specific time? SELECT * FROM pgfr_analyze.what_happened_at('2024-01-15 14:32');

-- Reconstruct an incident timeline SELECT * FROM pgfr_analyze.incident_timeline( '2024-01-15 14:00'::timestamptz, '2024-01-15 15:00'::timestamptz );

-- Return to normal after incident SELECT * FROM pgfr_record.apply_profile('default'); XID / MultiXID wraparound monitoring -- Current XID and MultiXID ages at database level (from the latest snapshot) SELECT datfrozenxid_age, datminmxid_age FROM pgfr_record.snapshots ORDER BY captured_at DESC LIMIT 1;

-- Top 10 tables by XID or MultiXID age SELECT relid::regclass, relfrozenxid_age, relminmxid_age FROM pgfr_record.table_snapshots WHERE snapshot_id = (SELECT max(id) FROM pgfr_record.snapshots) ORDER BY greatest(relfrozenxid_age, relminmxid_age) DESC NULLS LAST LIMIT 10;

-- Wraparound anomalies (XID + MultiXID, cluster + per-table) SELECT anomaly_type, severity, metric_value, recommendation FROM pgfr_analyze.anomaly_report(now() - interval '1 hour', now()) WHERE anomaly_type LIKE '%WRAPAROUND%';

§3 AI · 100%

-- Tune thresholds (lower warning ratio to alert earlier on busy clusters) INSERT INTO pgfr_record.config (key, value) VALUES ('xid_warning_ratio', '0.25'), ('mxid_warning_ratio', '0.25') ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value; See REFERENCE.md for the full config key list and rationale (postgres-howto #0044 guidance). Performance analysis -- Find performance regressions SELECT * FROM pgfr_analyze.detect_regressions('1 day');

-- Find query storms SELECT * FROM pgfr_analyze.detect_query_storms('1 hour');

-- Table hotspots SELECT * FROM pgfr_analyze.table_hotspots(now() - '1 day', now());

-- Unused indexes SELECT * FROM pgfr_analyze.unused_indexes('7 days'); Capacity planning SELECT * FROM pgfr_analyze.capacity_summary('7 days'); SELECT * FROM pgfr_analyze.quarterly_review(); SELECT * FROM pgfr_analyze.capacity_dashboard; Configuration profiles Profiles are pre-configured settings for different environments:

Profile Sample Interval Use Case

default 60s General purpose monitoring

production_safe 300s Production with maximum safety margins

development 60s Staging and development

troubleshooting 60s Active incident response

minimal_overhead 300s Resource-constrained systems

SELECT * FROM pgfr_record.list_profiles(); SELECT * FROM pgfr_record.explain_profile('production_safe'); SELECT * FROM pgfr_record.apply_profile('production_safe'); Safety Flight Recorder includes automatic protections:

Protection Description

Circuit Breaker Skips collection if recent runs averaged > 1s

Load Shedding Skips collection when > 70% connections active

Section Timeouts Per-query timeout (250ms) prevents catalog lock hangs

Job Timeouts Outer statement_timeout on pg_cron collector jobs (500ms–60s)

Collection modes provide manual control: normal, light, emergency.

§4 Mixed · 38%

-- Reduce to minimum collection (300s sampling, locks/progress off) SELECT pgfr_record.set_mode('emergency');

-- Full stop: unschedule all pg_cron jobs SELECT pgfr_record.disable();

-- Resume SELECT pgfr_record.enable(); Export With default retention: ~2.5GB uncompressed, ~150MB compressed. # Without compression pg_dump -d your_database -n pgfr_record --data-only -f pgfr_data.sql

# With compression (PostgreSQL 16+) pg_dump -d your_database -n pgfr_record --data-only --compress=gzip:9 -f pgfr_data.sql.gz

# With compression (PostgreSQL 15) pg_dump -d your_database -n pgfr_record --data-only | gzip > pgfr_data.sql.gz Upgrade Re-running install scripts is safe -- they use CREATE OR REPLACE and IF NOT EXISTS, updating functions and views while preserving all data. psql --single-transaction -f pgfr_record/install.sql psql --single-transaction -f pgfr_analyze/install.sql Uninstall # Remove everything (stops jobs, drops all schemas and data) psql --single-transaction -f pgfr_record/uninstall.sql

# Remove only reporting functions (keeps core + data) psql --single-transaction -f pgfr_analyze/uninstall.sql Testing ./test.sh # Test all PostgreSQL versions in parallel (requires Docker) ./test.sh 17 # Test a specific PostgreSQL version (15, 16, 17, or 18) Reference See REFERENCE.md for the full function reference, table schemas, configuration settings, and detailed documentation.