Static Devirtualization of Themida

back.engineering

Table of Contents

Introduction
Themida Architecture Analysis
Warning To The Wise
Guided Symbolic Evaluation
Concretizing Stack Pointer
Optimizations
Constant Promotion & Memory Modeling
Constant Folding
Dead Store Elimination
Instruction Combination
Branch Folding
VMEXIT Behavior
Virtualized Control Flow
Dead Dependency Analysis Pass
Stack Pointer Rewrite Pass
Lowering IR
Results
Preventing Symbolic Evaluation

Introduction

Before reading this article, I highly recommend studying the following community research on binary deobfuscation.

https://arxiv.org/pdf/1909.01752
https://github.com/Colton1skees/Dna/pull/8
https://github.com/JonathanSalwan/VMProtect-devirtualization
https://github.com/NaC-L/Mergen
https://www.youtube.com/watch?v=3LtwqJM3Qjg
https://github.com/backengineering/vmp2
https://back.engineering/blog/17/05/2021/
https://www.youtube.com/watch?v=vYAJCfafYTY
https://www.youtube.com/watch?v=KYQOtGiH9pQ
https://github.com/r3bb1t/bin_lift
https://nac-l.github.io/2025/01/25/lifting_0.html
https://blog.thalium.re/posts/llvm-powered-devirtualization/
https://github.com/avast/retdec
https://github.com/ergrelet/themida-unmutate
https://github.com/lifting-bits/remill

This article demonstrates devirtualization of CodeVirtualizer/Themida protected code; however, the techniques described here apply to pretty much every virtual-machine-based obfuscator, requiring only minor modifications to support each of them. The following is a non-exhaustive list of obfuscators that can be reduced using the technique described in this article.

https://vmpsoft.com/
https://www.oreans.com/themida.php
https://github.com/vxlang/vxlang-page
https://github.com/snowsnowsnows/EagleVM
https://github.com/dmaivel/covirt
https://github.com/noahware/binprotect

Themida Architecture Analysis

Themida’s virtual machine architecture differs from VMProtect primarily in its support for nested virtualization.

This is made possible by the fact that the VM context and virtual stack live inside the binary itself rather than on the native stack as they do in VMProtect. This article will not go deep on the architecture since it is largely not relevant to the devirtualization approach. The only VM-specific components that matter here are virtual branching and VMEXIT behavior, both of which are covered in their own sections. For a thorough breakdown of the Themida architecture, see this research.

Warning To The Wise

Pattern matching VM handlers back to x86 instructions is not an approach I recommend. I have tried it, and it does not scale. Any small change the protector vendor makes to handler layout, opcode tables, or dispatch logic can silently break your tooling across an entire version range. The more your implementation depends on VM-specific behavior, the more fragile it becomes.

The approach presented in this article deliberately minimizes VM-specific knowledge. That is what makes it work across a wide range of Themida versions. That said, studying the VM architecture is still worthwhile, not to pattern match against it, but to orient yourself within it and make informed decisions about how to guide the symbolic evaluation engine.

The vast majority of devirtualization work is done by a handful of general optimizations. VM-specific knowledge only becomes necessary when dealing with control flow, specifically virtual branching and virtualized calls.

Guided Symbolic Evaluation

The core idea is to lift native instructions into a malleable intermediate representation and drive the lifting process forward by concretizing control flow as optimizations resolve unknown branch destinations. Back Engineering Labs maintains its own binary lifting and recompilation engine for this purpose called BLARE2. It sports a custom SSA IR with support for AMD64 and ARM64, along with a full pass system, optimizer, instruction selector, register allocator, and linker. That last part is what separates it from most lifting frameworks: BLARE2 can lower optimized IR back to native code and reinsert it into the binary, producing output that is near 1:1 with the original. Anyone looking to follow the techniques in this article can get most of the way there with Triton or an LLVM-based lifter like Remill. Both are capable of producing clean optimized IR. The gap is on the backend: getting LLVM to emit tight, well-behaved native code that reinserts cleanly.

Lifting starts with all registers and flags symbolic.

From there, instructions are disassembled and lifted until the next instruction pointer cannot be determined. What happens next depends on the control flow instruction. A lifted ret means the value most recently stored at the address held in RSP is the next IP. When an address genuinely cannot be concretized, it means one of two things: either the optimizations have not run far enough, or the branch has multiple real destinations, as is the case with a virtualized JCC.

Concretizing Stack Pointer

At the start of symbolic evaluation, all registers and flags are symbolic except for the stack pointer, which is given a concrete initial value. This is a deliberate design choice rather than a strict requirement. Keeping RSP concrete means the existing load/store propagation machinery handles stack accesses automatically, and any arithmetic that adjusts the stack pointer can be constant folded without any special casing. The alternative is keeping RSP symbolic and writing dedicated stack propagation logic, which is more work for no meaningful gain in the context of devirtualization.

The tradeoff is that functions with dynamic stack allocations, think alloca or compiler-generated variable-length arrays, are not supported by this approach since the stack displacement is no longer statically knowable. In practice this is rarely a problem. Dynamically allocated stack frames are uncommon in the kinds of functions that tend to get virtualized, so the simplicity of a concrete RSP is worth the limitation.

Optimizations

Fully reducing Themida or VMP virtualization does not require an exhaustive suite of compiler optimizations. In practice, a small set of passes running together to convergence is enough to collapse the entire VM scaffolding. The following sections cover each one and explain how it contributes to devirtualization.

The passes are not independent. A bytecode load gets promoted to a constant, which lets the decryption arithmetic around it fold away, which produces a concrete handler index, which lets the handler table lookup resolve, which exposes the next handler address as a constant. Each pass feeds the next, and the VM scaffolding unravels as a consequence of all of them running together.

Constant Promotion & Memory Modeling

Data loaded from memory frequently feeds into indirect jump computations, and VM bytecode is the most important example of this. When the lifter encounters a load from a bytecode address, that value needs to be promoted to a constant so the rest of the optimization passes have something concrete to work with. Once a bytecode load is promoted, constant folding can run on the decode arithmetic that surrounds it.
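
As a rough illustration of what promoting a bytecode load means, the sketch below models the binary image as a dictionary of bytes and replaces a load with a constant only when the whole access falls inside an analyst-supplied address range. Everything here is hypothetical and chosen for illustration: the `Const` and `Load` types, the `promote_load` helper, the addresses, and the byte values. None of it is BLARE2's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: int
    size: int  # width in bytes

@dataclass(frozen=True)
class Load:
    address: int  # concrete address (assumed to be already folded)
    size: int     # width in bytes

def promote_load(load, image, promotable_ranges):
    """Return a Const if the whole load lies in a promotable range, else the load unchanged."""
    end = load.address + load.size
    in_range = any(lo <= load.address and end <= hi for lo, hi in promotable_ranges)
    if not in_range:
        return load  # leave loads outside the configured ranges untouched
    data = bytes(image[load.address + i] for i in range(load.size))
    return Const(int.from_bytes(data, "little"), load.size)

# Hypothetical image: four bytecode bytes at 0x1000 and one byte of user data at 0x2000.
IMAGE = {0x1000: 0x78, 0x1001: 0x56, 0x1002: 0x34, 0x1003: 0x12, 0x2000: 0xAA}
BYTECODE_RANGES = [(0x1000, 0x1004)]  # half-open [lo, hi), an assumed policy format

promoted = promote_load(Load(0x1000, 4), IMAGE, BYTECODE_RANGES)
untouched = promote_load(Load(0x2000, 1), IMAGE, BYTECODE_RANGES)
```

Here `promoted` becomes a 4-byte constant built from the little-endian image bytes, while `untouched` stays a load because its address is outside the configured range. A real implementation would also have to account for stores that overwrite the image before the load executes, which this sketch ignores.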

The handler decryption logic, the opcode table indexing, the VPC update math, all of it progressively folds away until the only thing left is a concrete handler address. That address is then used to continue lifting.

The load/store propagation logic in BLARE2 is configurable. A programmer specifies which memory ranges inside the binary are safe to promote from, which keeps VM-private constant promotion from accidentally touching user data. The pass also tracks prior stores, so if a store to address 0x5000 occurs and a load of 0x5000 follows, the SSA value from that store is forwarded rather than pulling from the original image. Propagation is modeled at the byte level, so overlapping stores are handled correctly. A narrow store that partially overlaps a wider load is composed properly rather than silently producing a stale or incorrect value. There are two failure modes worth keeping in mind. Promoting a load from an address that gets written before the load is reached will produce incorrect results, which is why store tracking exists. The other risk is over-promotion: if a load reflects original program semantics rather than VM scaffolding, replacing it with a constant destroys those semantics. The configurable range policy is what separates the two cases.

Constant Folding

When all operands of an expression are known constants, the expression itself can be replaced with its result. There is no reason to carry 10 + 10 through the IR when it can just be 20. This applies to all binary and arithmetic operations: addition, subtraction, multiplication, bitwise AND, OR, XOR, shifts, and so on. For devirtualization, the important detail is that this pass needs to run until convergence. A single pass may fold one expression into a constant that then makes a previously non-constant expression fully constant, which enables another fold, and so on. Each optimization pass feeding into the next is what causes the VM scaffolding to progressively collapse.
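
The convergence requirement can be sketched with a toy model. This is an illustration under stated assumptions, not BLARE2's IR: instructions are hypothetical `(dest, op, operands...)` tuples, the decode arithmetic and table layout are made up, and load promotion is reduced to reading bytes from a dictionary. The point is only that folding and promotion have to alternate until neither makes progress.

```python
MASK64 = (1 << 64) - 1

# Arithmetic the toy folder understands; results are truncated to 64 bits.
FOLDERS = {
    "add": lambda a, b: (a + b) & MASK64,
    "xor": lambda a, b: a ^ b,
    "shl": lambda a, b: (a << (b & 63)) & MASK64,
}

def resolve(operand, known):
    """Replace an SSA name with its constant if one is known."""
    return known.get(operand, operand) if isinstance(operand, str) else operand

def fold_pass(instrs, known):
    """Fold arithmetic whose operands are all constants. Returns True on any change."""
    changed = False
    for dest, op, *operands in instrs:
        if dest in known or op not in FOLDERS:
            continue
        values = [resolve(o, known) for o in operands]
        if all(isinstance(v, int) for v in values):
            known[dest] = FOLDERS[op](*values)
            changed = True
    return changed

def promote_pass(instrs, known, image):
    """Promote loads whose address is a constant inside the modeled image."""
    changed = False
    for dest, op, *operands in instrs:
        if dest in known or op != "load":
            continue
        address, size = resolve(operands[0], known), operands[1]
        if isinstance(address, int) and all(address + i in image for i in range(size)):
            raw = bytes(image[address + i] for i in range(size))
            known[dest] = int.from_bytes(raw, "little")
            changed = True
    return changed

def run_to_fixed_point(instrs, image):
    """Alternate both passes until a full round changes nothing."""
    known, rounds = {}, 0
    while True:
        rounds += 1
        folded = fold_pass(instrs, known)
        promoted = promote_pass(instrs, known, image)
        if not (folded or promoted):
            return known, rounds

# Hypothetical dispatch: decode one bytecode byte, then index a handler table.
PROGRAM = [
    ("t0", "load", 0x1000, 1),    # encrypted opcode byte at the VPC
    ("t1", "xor", "t0", 0x5A),    # made-up decryption step
    ("t2", "shl", "t1", 3),       # scale the index by 8 bytes per table entry
    ("t3", "add", "t2", 0x4000),  # address of the table entry
    ("t4", "load", "t3", 8),      # next handler address
]
IMAGE = {0x1000: 0x58}  # 0x58 ^ 0x5A == 2
IMAGE.update({0x4010 + i: b for i, b in enumerate((0x140012340).to_bytes(8, "little"))})

known, rounds = run_to_fixed_point(PROGRAM, IMAGE)
```

In this run the first round can only promote `t0`, the second folds `t1` through `t3` and then promotes `t4`, and the third confirms that nothing is left to do. Running either pass once would have stopped well short of the handler address in `known["t4"]`.
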
Bytecode decode arithmetic folds away, handler table indices become concrete, and dispatch logic disappears, all as a consequence of constant folding running to a fixed point alongside the other passes.

Dead Store Elimination

Dead store elimination is generally unsafe to apply broadly. A store that looks unused may have real side effects: a kernel routine writing to the same MMIO address twice can trigger distinct hardware actions, and exceptional control flow can observe stores that appear dead along normal paths.

Blindly removing stores breaks things. The reason it is safe here is that the stores we are targeting are scoped to VM-private memory. Themida uses its own section for the virtual machine context, virtual stack, and related scaffolding. None of that memory is observed by the original program. Once lifting has reached a VMEXIT, any store that only ever touched the Themida section is provably dead from the perspective of the recovered function and can be removed. Skipping this pass has a visible cost. VM handlers constantly shuffle state through the context and virtual stack, and without elimination those stores persist as dangling expressions in the IR that have no consumer and no path to a native-visible output. Combined with the dead dependency analysis pass, this is what produces IR that actually looks like a function rather than a VM interpreter.

Instruction Combination

Instruction combination simplifies expressions by recognizing algebraic identities and collapsing operations that have knowable outcomes regardless of their inputs. The goal is to reduce the IR down to the smallest expression that preserves the original semantics.

These identities matter for devirtualization because VM handlers are full of this kind of noise. Obfuscated code frequently introduces arithmetic that cancels itself out, redundant masking operations, and identity multiplications inserted purely to obscure intent. Instruction combination running to convergence progressively peels those layers away, and the simplified expressions it produces feed directly into constant folding and branch folding. An expression that looks complex after lifting often reduces to a single constant once a few of these rules have fired.

Branch Folding

When the preceding optimizations have done their job, flag computations should resolve to either a constant or undefined. Any branch that depends on constant flags has a statically knowable destination, so the opaque branch target can be eliminated entirely.

VMEXIT Behavior

Because the stack pointer is concretized at the start of lifting, the initial RSP value is always known. When the lifter encounters a return instruction with RSP at initRSP - 0x10, that is a VMEXIT-CALL. Themida and VMP both use this pattern: the call target is placed at RSP and the return address is placed at RSP + 0x8, accounting for the 0x10 displacement.
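
The check described above can be sketched in a few lines. This is a simplified model under assumptions: the stack is a dictionary of already-concrete 64-bit slots rather than byte-level SSA values, and `INIT_RSP`, `VmExitCall`, and `classify_ret` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

INIT_RSP = 0x7FFF0000            # hypothetical concrete value chosen when lifting starts
VMEXIT_CALL_DISPLACEMENT = 0x10  # two 8-byte slots: call target and return address

@dataclass(frozen=True)
class VmExitCall:
    call_target: int
    return_address: int

def classify_ret(rsp: int, stack: Dict[int, int]) -> Optional[VmExitCall]:
    """Return a VmExitCall if this ret matches the RSP displacement pattern, else None."""
    if rsp != INIT_RSP - VMEXIT_CALL_DISPLACEMENT:
        return None
    # The ret pops the call target from [RSP]; the callee then sees the
    # return address at what was [RSP + 8].
    return VmExitCall(call_target=stack[rsp], return_address=stack[rsp + 8])

# Hypothetical stack contents at the moment the ret is lifted.
stack = {INIT_RSP - 0x10: 0x140001000, INIT_RSP - 0x8: 0x140002000}

exit_call = classify_ret(INIT_RSP - 0x10, stack)
not_exit = classify_ret(INIT_RSP - 0x20, stack)
```

Only the displacement test and the two slot reads come from the description above. What a lifter does with the recovered pair, for example emitting a native call and resuming at the return address, is outside the scope of this sketch.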