Announcing Pyro Caml: The First Continuous Profiler for OCaml | Semgrep

The core SAST engine of Semgrep is written in OCaml. There are a lot of good technical and historical reasons for this that I’ll leave for another time. An important consequence of using a language with a (relatively) small ecosystem like OCaml is that there aren’t a lot of libraries for things like observability, which are critical for running industrial software like Semgrep on hundreds of thousands of code repositories while keeping it both reliable and performant. We’ve made heavy use of existing libraries like the OCaml OpenTelemetry library, contributed to some, and written others of our own. Last year I gave a workshop at FunOCaml explaining how we use and benefit from observability, and how you too can implement it in your OCaml program. After the workshop, though, multiple people came up to me and asked “what about continuous profiling?”, to which my answer was, “it just doesn’t exist yet”. Well, 7 months later, I’m happy to announce that we’re releasing 1.0.0 of Pyro Caml, a continuous profiler for OCaml.

What is Continuous Profiling?

Before we dive into the technical details, we must first understand the difference between a normal profiler and a continuous one. Profiling is a form of dynamic analysis that measures aspects of a program like time complexity, instruction usage, or where in the code time is spent. OCaml has a handful of profilers, such as the built-in ocamlprof, magic-trace, or olly, to name just a few. What differentiates a continuous profiler is that it is not run directly by the developer, but instead runs in production, continuously profiling the program and reporting that data back to a central location.

For us this distinction is incredibly important. We had gotten by with the other profilers mentioned, along with more general profilers like perf, but that only got us so far. Semgrep runs static analysis on code, and in general, we try to avoid giving our engineers easy access to the user code we analyze. So if we can’t get a local copy of the source code to profile on our own machines, continuous profiling becomes the only option. Additionally, metrics and tracing only help root out performance problems when you know where to look, and that becomes rarer as a code base matures.

We must profile while customers scan their code, or else we are destined to live in the dark.*

Requirements for a Continuous Profiler

So we know a continuous profiler is an incredibly helpful tool, but Semgrep has some additional restrictions.

Runs under gVisor

We’re a security company, so we like to keep things secure, which means when we scan someone’s code, we sandbox the scan using gVisor, which implements the Linux API in userspace. One part of the Linux API it does NOT implement is perf_event_open**, which is how profilers like perf work, and is what some continuous profilers (like ddprof) are built on. Our first attempt used one of these tools. Everything worked perfectly, even in our test environment, and then when we deployed to production, we got some nasty errors about how this system call just did not work. Eventually we figured out it was gVisor, and although we were bummed it didn’t work, I was secretly excited that this meant I might get to write my own profiler.

Supports OCaml

There are some very nice continuous profilers out there that integrate well with a language’s runtime (and therefore don’t use perf_event_open), such as Pyroscope or Datadog’s Python Profiler. None exist for OCaml. At the time, we knew that if we were to build our own, we’d want to build on an open source standard so we could make it useful for the community (and so we wouldn’t have to write as much tooling from scratch). Notably, Pyroscope’s SDK is open source whereas Datadog’s is not.

Maturity

OpenTelemetry also had a (at the time pre-alpha) profiling specification, along with a profiler kindly donated by Elastic. This profiler is very, very cool, and works via eBPF programs that decode stack traces from raw memory, including for interpreted languages like Python and Ruby. It’s some crazy stuff, and it’s worth digging around in that repo if you find this post interesting. Running this locally with an OCaml program resulted in some very mysterious stack traces, and we soon realized that to get useful info out of it, we’d have to write our own eBPF program for walking OCaml stacks.

Though that does sound like fun, the OTel signal was also pre-alpha, along with the rest of the infrastructure, so even if we did tame eBPF, there were still a bunch of what-ifs. Additionally, we weren’t even sure if this would run in gVisor (after further research, we found out it does not). So this wasn’t a strict no, but it would have been a very risky and difficult project.

Performance and Safety

Finally, a continuous profiler needs to be performant and safe! If a profiler significantly impacts the runtime of your program, then you can’t easily run it in production. The existing OCaml profilers are solid tools, but they impose significant overhead, which makes them poor starting points. If you’re trying to figure out what’s making your program slow and your profiler adds an 80% overhead… remove the profiler. In a perfect world, a profiler adds no overhead, but we’re willing to accept ~5%. Additionally, the profiler must be safe: if it fails (whether while walking stacks, reporting data, etc.), it should not affect the correctness of your program.

Enter Pyro Caml

At this point, it was obvious nothing off the shelf would really work for us, and our options for building our own were integrating with Pyroscope or the OTel profiler. I already listed the risks with the OTel profiler, so what about Pyroscope? Their Rust SDK is nice, and is set up to accept data from novel sources, not to mention that the backing infrastructure is something we can easily set up in a Grafana instance we already have. So the infrastructure part is trivial (unlike the OTel profiler) and the SDK is nice; we just needed a way to gather the performance data without perf_event_open and actually get it over to the Pyroscope SDK.

What follows is the architecture of Pyro Caml:

Call Stack Sampling via Memprof

First, we needed a way to get the call stack of the currently running program. There are a few ways to go about this, such as looking at the raw memory and using something called DWARF symbols, similar to how eBPF and perf do it. Once upon a time I actually did something like this for our library OBackward, which provides pretty backtraces on segfaults for OCaml programs.

The TL;DR is that this sort of code is the complete opposite of portable, and although the compiler does a good job of generating DWARF symbols, the backtraces are nowhere near as helpful as what’s built into the runtime.

So the other option is using what’s built in, specifically Printexc.get_callstack, which is a function in the OCaml standard library that returns the current callstack. But a problem arises here: even if we have a nice way to get this over to the SDK, how do we easily instrument programs with this? A nice way would be to write a PPX (OCaml macro) that adds the code everywhere you want, which is what the Landmarks library does. Unfortunately, unless this code is faster than most of your code, you run into an issue where repeated calls to a fast, instrumented function make that code path take longer because the instrumentation takes a while. This means you need to be careful with these PPXs, and so you must already know which parts of your program are slow or fast, at which point the profiler doesn’t satisfy our requirements.

This is why many profilers are statistical sampling profilers. Instead of measuring when a function starts and stops, along with its callstack, such a profiler just samples at a set frequency. This means that the profiler overhead is constant, instead of a function of how many times instrumented code is called. The results end up being a little less accurate, but they’re more than enough to get by.

So how can we do this with OCaml, when we don’t have access to eBPF/perf_event_open, and would rather not introspect the raw memory? We can be a bit tricky and use the built-in memory profiler for OCaml, called Memprof. Memprof runs a set of callback functions on a configurable percentage of allocations. This is nice because we know that a program like Semgrep spends pretty much all of its time doing pure CPU work, which involves a lot of allocating.

This means we’re sampling at some frequency of allocations, not time. So we’re still liable to get inaccurate profiles and introduce overhead if one function allocates more than another, which is a problem we’ll treat in the next section. Lastly, one small bonus is that the Memprof callback already provides the callstack of where the allocation happened! So one less thing for us to do.
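To make the Memprof idea concrete, here is a minimal sketch of allocation-driven call stack sampling using the standard library's Gc.Memprof. It is not Pyro Caml's actual code: the names counts, on_alloc, with_sampling, and build_list are made up for illustration, and the 0.01 sampling rate is an arbitrary choice. It assumes a compiler where Gc.Memprof is implemented (OCaml 4.14, or 5.3 and later) and a build with -g so the backtraces carry source locations.

```ocaml
(* Sketch only: assumes OCaml 4.14, or 5.3 and later, where Gc.Memprof is
   implemented. All names below are hypothetical, not Pyro Caml's API. *)

(* Number of sampled words seen per rendered call stack. *)
let counts : (string, int) Hashtbl.t = Hashtbl.create 64

(* Called by the runtime for each sampled allocation. The callstack field is
   the stack of the allocation site, already captured by the runtime. *)
let on_alloc (info : Gc.Memprof.allocation) : unit option =
  let stack = Printexc.raw_backtrace_to_string info.Gc.Memprof.callstack in
  let seen = Option.value (Hashtbl.find_opt counts stack) ~default:0 in
  Hashtbl.replace counts stack (seen + info.Gc.Memprof.n_samples);
  None (* None: do not track this block's promotion or deallocation *)

(* Only the two allocation callbacks are overridden. *)
let tracker =
  { Gc.Memprof.null_tracker with
    Gc.Memprof.alloc_minor = on_alloc;
    alloc_major = on_alloc }

(* Run [f] with allocation sampling enabled, then stop sampling. *)
let with_sampling ~sampling_rate f =
  (* start returns unit on 4.14 and a profile handle on 5.3+; ignore either. *)
  ignore (Gc.Memprof.start ~sampling_rate ~callstack_size:32 tracker);
  Fun.protect ~finally:Gc.Memprof.stop f

(* A toy allocating workload standing in for real CPU-bound work. *)
let build_list n = List.init n (fun i -> (i, string_of_int i))

let () =
  let xs = with_sampling ~sampling_rate:0.01 (fun () -> build_list 100_000) in
  Printf.printf "built %d elements\n" (List.length xs);
  Hashtbl.iter
    (fun stack n -> Printf.printf "%d sampled words at:\n%s\n" n stack)
    counts
```

Note that the samples here are keyed by allocation volume, not by elapsed time, which is exactly the skew described above: code that allocates more shows up more often.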

In fact, the Memprof callbacks run on their own stack, so if this wasn’t the case, we’d have to be even trickier.

Emitting Profiling Events via the OCaml Runtime Events

Now that we had a bunch of samples, we had to do two things. First, we needed a way to interpret the samples such that we had samples at a regular interval, instead of whenever the Memprof sampler decided to run. Then we needed to actually call into the Pyroscope SDK, and send the samples off to our infrastructure. As mentioned before, the longer the sampling process takes, the more likely we are to skew our data, and networking/interpreting samples could take a long time. So if we got these samples written to disk quickly, they could be processed and sent by another program, minimizing our overhead.

OCaml 5 introduced Runtime Events, which are exactly what we want!

> [...] the runtime events tracing system which enables continuous extraction of performance information from the OCaml runtime with very low overhead.

This lets us write events to a file-backed ring buffer and read them out from another program. It’s similar in kind to the Java Flight Recorder, or Go’s built-in runtime tracing. So now we can write these samples and read them from another OCaml program.

To help us keep track of the samples, we also recorded in these events the time at which they were taken. The corresponding profiler program read these events and started to process them. Since the samples were not guaranteed to be taken at a set time interval, what we did was try to generate as many as possible, and then choose whichever was closest to the proper sample interval. This meant for a given sample interval, we had many possible samples to choose from, which allowed us to choose a sample timestamped sufficiently close to the single point in time we wanted to generate a complete callstack for.
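The selection step can be sketched in a few lines of plain OCaml. This is an illustration of the idea rather than Pyro Caml's implementation: the sample record, the closest_to and resample functions, the half-interval tolerance, and the example stacks are all assumptions made for the sketch.

```ocaml
(* A sample as read back from the ring buffer: a timestamp in milliseconds
   plus a rendered call stack. The field names are hypothetical. *)
type sample = { time_ms : float; stack : string }

(* For the tick at [tick_ms], return the sample whose timestamp is closest,
   provided it lies within [tolerance_ms]; otherwise return None. *)
let closest_to ~tick_ms ~tolerance_ms samples =
  let better best s =
    let d = Float.abs (s.time_ms -. tick_ms) in
    match best with
    | Some (best_d, _) when best_d <= d -> best
    | _ -> if d <= tolerance_ms then Some (d, s) else best
  in
  List.fold_left better None samples |> Option.map snd

(* Map irregular samples onto the regular ticks
   start_ms, start_ms + interval_ms, ... ([ticks] of them). The tolerance of
   half an interval is an assumption of this sketch. *)
let resample ~start_ms ~interval_ms ~ticks samples =
  List.init ticks (fun i ->
      let tick_ms = start_ms +. (float_of_int i *. interval_ms) in
      (tick_ms, closest_to ~tick_ms ~tolerance_ms:(interval_ms /. 2.0) samples))

let () =
  let samples =
    [ { time_ms = 1.0; stack = "func_a;func_b" };
      { time_ms = 9.0; stack = "func_a;func_b;func_d" };
      { time_ms = 12.0; stack = "func_a;func_c" };
      { time_ms = 31.0; stack = "func_a" } ]
  in
  resample ~start_ms:0.0 ~interval_ms:10.0 ~ticks:4 samples
  |> List.iter (fun (tick, chosen) ->
         match chosen with
         | Some s ->
             Printf.printf "t+%.0f: %s (sampled at %.0f)\n" tick s.stack s.time_ms
         | None -> Printf.printf "t+%.0f: no sample\n" tick)
```

With these made-up inputs, the tick at 20 ms gets no sample because nothing was recorded within 5 ms of it, and the sample at 12 ms is never chosen because the one at 9 ms is closer to the 10 ms tick.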
The downside here was that for programs that didn't emit many samples, we lost accuracy for function calls that lasted less than the sample interval, and may not have received any samples for intervals of time in which the program didn't allocate often.

Consider this flame graph of an example program (read this as: func_a calls func_b, which in turn calls func_d). Assume that we are sampling every 10ms (the default for Pyro Caml):

Say we're sampling at time t+10.