
Parse, don't validate through the years with C++

Hacker News: 86 points, 48 comments, submitted by dwrodri 3w ago


Alexis King’s Parse, don’t validate had a huge impact on how I write code, particularly my stance toward Python type annotations in production. However, as someone who has written practically zero Haskell, the idea didn’t click for me until I started seeing other examples, like this one in Rust. This post explores how we can take this paradigm and apply it to a simple date-parsing problem in C++98, C++11, C++17, and finally C++23.

If you haven’t read the original essay, it is a great explainer for a genuinely tricky topic. Here’s my attempt at summarizing the key idea: Use your language’s type system to parse unstructured inputs. A successful instantiation of that type should mean the data is valid, and you eliminate the need for validation control flow down the line. I’ve revised this summary a few times and I still don’t love it. That’s likely a mix of a skill issue and the fact that the idea is very hard to stuff into the small box of two sentences without relying on more jargon, which I am still using.

Our Worked Example: Parsing a date

Timestamp and date parsing are notoriously riddled with edge cases. That being said, much like validating email addresses, date parsing is a perfect example of a problem where a programmer has to make a contextually optimal tradeoff between correctness and readability.

In this case, let’s imagine we’re building something that consumes unstructured data from a file, and we need to extract personally identifiable information. For the purposes of this post, I’ll be ingesting a raw string, file_text, that needs to be used by the rest of the code.

Starting point

Let’s start with a dead simple implementation that is mostly apathetic to the idea of type-driven design.


See it in Compiler Explorer

```cpp
#include <cstdio>

struct Birthdate {
    int year, month, day;
};

// perhaps if it were more C-like we'd return a status code
// to make everyone upset I settled on this
Birthdate make_birthdate(const char* user_input) {
    Birthdate b = {0, 0, 0};
    std::sscanf(user_input, "%d-%d-%d", &b.year, &b.month, &b.day);
    return b;
}

int main() {
    // imagine this is the text we're reading from a file
    const char* file_text = "2026-04-17";
    Birthdate b = make_birthdate(file_text);
    std::printf("Year: %d Month: %d Day: %d\n", b.year, b.month, b.day);
    return 0;
}
```

You can change quite a few characters in that instance of file_text being passed into the function and the program won’t crash. clang will happily compile it, and the OS will happily report that the program exited successfully. Imagine we’re handling input coming from an OCR pipeline and it was instead something like 2O26-04-I7. We’d get the following output from printf:

Year: 2 Month: 0 Day: 0

This code is problematic because we are kicking the verification can down the road. This program will accept that faulty example input just fine, return a struct with data no one wants, and create headaches for everyone who needs to use make_birthdate, despite “working”. Say we later need to implement a User class with a Birthdate member field and a User::getAge() method. Does the User constructor validate the Birthdate fields? Does getAge()? Both of these are likely bad ideas. We could probably ask an LLM of our choosing to unload a bucket of if-statements into make_birthdate() and make it much more robust, but likely at the expense of readability.


First pass: C++98

Instead of doing any of the above, let’s make a Birthdate type that enforces some sane constraints. A valid birthdate should:

- Have an integer month value between 1 and 12 inclusive
- Have a four-digit year value between 1900 and 9999
- Have a day value between 1 and 31 inclusive, where the upper bound is based on the month and year to account for leap years

If you’re bothering with C++98 in this day and age, it is very likely you’re running in an embedded environment where heap allocations are few and far between and exceptions can’t be used. A few notes on this version:

- sscanf is replaced with our own parsing
- a private constructor is only used for setting fields
- key logic is easily moved out to functions we can mark static to show they touch no internal state

See it in Compiler Explorer

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

enum ParseStatus {
    PARSE_OK = 0,
    PARSE_NULL_INPUT,
    PARSE_BAD_FORMAT,
    PARSE_YEAR_RANGE,
    PARSE_MONTH_RANGE,
    PARSE_DAY_RANGE
};

namespace {
const unsigned char kDaysInMonth[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
}

class Birthdate {
public:
    // There are a few ways to let API callers bring their own
    // memory, as they would in a no-malloc environment and this
    // stack-friendly c'tor is a stand-in for that.
    static Birthdate epoch() { return Birthdate(1900, 1, 1); }

    unsigned short year() const { return y_; }
    unsigned char month() const { return m_; }
    unsigned char day() const { return d_; }

    static ParseStatus parse_iso_yyyy_mm_dd(const char* s, std::size_t s_len, Birthdate& out) {
        if (!s) return PARSE_NULL_INPUT;
        if (s_len != 10) return PARSE_BAD_FORMAT;
        if (s[4] != '-' || s[7] != '-') return PARSE_BAD_FORMAT;

        unsigned int y = 0, m = 0, d = 0;
        if (!parse4(s, y) || !parse2(s + 5, m) || !parse2(s + 8, d)) {
            return PARSE_BAD_FORMAT;
        }

        if (y < 1900U || y > 9999U) return PARSE_YEAR_RANGE;
        if (m < 1U || m > 12U) return PARSE_MONTH_RANGE;

        unsigned int max_day = kDaysInMonth[m - 1U];
        if (m == 2U && is_leap((unsigned short)y)) max_day = 29U;
        if (d < 1U || d > max_day) return PARSE_DAY_RANGE;

        // Only a fully checked value is ever written to the caller's object.
        out = Birthdate((unsigned short)y, (unsigned char)m, (unsigned char)d);
        return PARSE_OK;
    }

private:
    Birthdate(unsigned short y, unsigned char m, unsigned char d) : y_(y), m_(m), d_(d) {}

    static bool is_digit(char c) { return c >= '0' && c <= '9'; }

    static bool parse2(const char* p, unsigned int& out) {
        if (!is_digit(p[0]) || !is_digit(p[1])) return false;
        out = (unsigned int)(p[0] - '0') * 10U + (unsigned int)(p[1] - '0');
        return true;
    }

    static bool parse4(const char* p, unsigned int& out) {
        if (!is_digit(p[0]) || !is_digit(p[1]) || !is_digit(p[2]) || !is_digit(p[3])) return false;
        out = (unsigned int)(p[0] - '0') * 1000U + (unsigned int)(p[1] - '0') * 100U
            + (unsigned int)(p[2] - '0') * 10U + (unsigned int)(p[3] - '0');
        return true;
    }

    static bool is_leap(unsigned short y) {
        return (y % 400U == 0U) || ((y % 4U == 0U) && (y % 100U != 0U));
    }

    unsigned short y_;
    unsigned char m_;
    unsigned char d_;
};

int main() {
    const char* file_text = "2026-04-17";
    Birthdate b = Birthdate::epoch();
    ParseStatus status = Birthdate::parse_iso_yyyy_mm_dd(file_text,
                                                         std::strlen(file_text),  // we do a little stdlib cheating
                                                         b);
    if (status == PARSE_OK) {
        std::printf("Parsed: %u-%u-%u\n", (unsigned)b.year(), (unsigned)b.month(), (unsigned)b.day());
    } else {
        std::printf("Parse failed: %d\n", (int)status);
    }
    return 0;
}
```

Giving the API caller more control over memory while keeping the class’s internal state “locked down” was a welcome exercise in API design. This version has no exceptions, no heap allocations, and no post-hoc validation branches elsewhere. Birthdate values entering the rest of the program are already parsed and known-good.

For further reading on exceptions, I would recommend:

- The original document pitching exceptions in ‘89
- This document covering the history of C++, including some extra context as to how exceptions made it in
- This retrospective looking back at their use over the years

Round 2: C++11

Hopefully, by now, this is the C++ feature set that is mostly taught in schools, disgusting std::vector<bool> warts included. For this code snippet, I can offload the date parsing to std::get_time! We rolled it ourselves once, but we weren’t handling all the proper edge cases, and it’ll make our examples much shorter. Again, the real point is to have the class only contain valid states, and to use the construction code as a hard boundary for parsing logic.


If code in our codebase consuming this type breaks, it shouldn’t be breaking due to the contents of the class instance at all.

Style notes:

- I think the istringstream is a better stand-in for raw input, but I never liked working with streams. fmt was a clear improvement for most string manipulation.
- It felt more idiomatic to have a “boring” public constructor.

Outside of all the hand-rolled code replaced with standard library headers and calls, the key sequence of events doesn’t feel that different from C++98.

See it in Compiler Explorer

```cpp
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <stdexcept>

class Birthdate {
public:
    Birthdate(int y, int m, int d) : y_(y), m_(m), d_(d) {
        if (y_ < 1900 || y_ > 9999) throw std::invalid_argument("year must be 1900..9999");
        if (m_ < 1 || m_ > 12) throw std::invalid_argument("month must be 1..12");
        // std::get_time's %d only bounds the day to 1..31, so the
        // month-length and leap-year check still has to happen here.
        if (d_ < 1 || d_ > days_in_month(y_, m_)) throw std::invalid_argument("day out of range for month");
    }

    int year() const { return y_; }
    int month() const { return m_; }
    int day() const { return d_; }

private:
    static int days_in_month(int y, int m) {
        static const int kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
        const bool leap = (y % 400 == 0) || (y % 4 == 0 && y % 100 != 0);
        return (m == 2 && leap) ? 29 : kDays[m - 1];
    }

    int y_, m_, d_;
};

Birthdate parse_birthdate(std::istream& in) {
    std::tm t = {};
    in >> std::get_time(&t, "%Y-%m-%d");
    if (in.fail()) throw std::invalid_argument("expected YYYY-MM-DD");
    in >> std::ws;
    if (!in.eof()) throw std::invalid_argument("trailing characters");
    return Birthdate(t.tm_year + 1900, t.tm_mon + 1, t.tm_mday);
}

int main() {
    try {
        std::istringstream ss("2026-04-17");
        Birthdate b = parse_birthdate(ss);
        std::cout << b.year() << "-" << b.month() << "-" << b.day() << "\n";
    } catch (const std::exception& e) {
        std::cerr << "Parse failed: " << e.what() << "\n";
    }
}
```

Round 3: C++17

I imagine many engineers working with C++ in production environments generally have access to C++17. There are likely some notable exceptions (no pun intended), but this version stands out because std::optional gives us a standard-library way to make failure explicit without throwing. Personally, I find this much nicer since I don’t love reasoning about control flow that sneaks past library boundaries the way exceptions often do.

I’m also choosing to sneak manual parsing back into our example here since it lets us swap out string streams for a string view. The constructor has become a little less “default,” but you’d probably want a tight grip on those anyway if you’re going down this path.

See it in Compiler Explorer

#include <array>
#include <iostream>
#include <optional>
#include <string_view>

class Birthdate {
public:
    static std::optional<Birthdate> parse(const std::string_view s) {
        if (s.size() != 10 || s[4] != '-' || s[7] != '-') return std::nullopt;

        const auto y = parse_n_digits(s, 0, 4);
        const auto m = parse_n_digits(s, 5, 2);
        const auto d = parse_n_digits(s, 8, 2);
        if (!y || !m || !d) return std::nullopt;

        return from_ymd(*y, *m, *d);
    }

    static std::optional<Birthdate> from_ymd(int y, int m, int d) {
        if (y < 1900 || y > 9999) return std::nullopt;
        if (m < 1 || m > 12) return std::nullopt;