In the first article we walked through Pluto — six passes that turn readable LLVM IR into flattened, substituted, MBA-wrapped spaghetti. In the second, Polaris raised the floor with encrypted state dispatch, alias pointer mazes, calling convention scrambling, and function merging. Together, they cover control flow, data access, ABI, and symbol erasure.
But both projects share a blind spot. Neither touches the string literals sitting in your
binary's .rodata section — the "Enter password: " and "License expired" that
strings your_binary | grep password finds in under a second. And neither attacks the
tool itself — the disassembler that a reverse engineer uses to turn machine code back
into something readable. You can flatten every branch and encrypt every global, and the
analyst can still grep for your error messages and read the surrounding code in IDA.
VMWhere fills both gaps. Two focused passes — string encryption and anti-disassembly — that go after the parts of the binary that Pluto and Polaris leave alone.
What Is VMWhere?
VMWhere is a set of LLVM obfuscation passes by MrRoy09 (21verses), an undergraduate at IIT Roorkee. It targets LLVM 14+ using the new pass manager and ships four compile-time passes plus a link-time anti-debug hook. Despite the name, VMWhere does not implement virtualization — the name is a play on VMware.
Of the four compile-time passes, two — instruction substitution and control flow
flattening — overlap with what Pluto and Polaris already do (and do better, with
encrypted state dispatch and modular-arithmetic predicates). The remaining two — string
encryption and anti-disassembly — are genuinely unique contributions that neither project
addresses. Those are the ones Shifting.Codes ports. The link-time anti-debug hook is a C
technique injected via __attribute__((constructor)), not an LLVM IR pass, so it falls
outside what llvm-nanobind can reach.
String Encryption: Making strings Useless
The Problem
Run this on any binary compiled with Pluto and Polaris together, every pass enabled:
strings obfuscated_binary | grep -i "password\|license\|error\|invalid"
You will find them all. Hours of control-flow obfuscation, XOR-encrypted globals,
indirect calls through pointer arithmetic, function merging — and the analyst recovers
your intent in one command because the string "Invalid license key" is sitting in
plaintext in the read-only data section. String literals are the lowest-hanging fruit
in reverse engineering, and neither Pluto nor Polaris picks it.
How It Works
String encryption is a module pass that operates in three phases.
Phase 1 — Discovery. The pass scans mod.globals looking for [N x i8] constant
arrays with internal, private, or linkonce_odr linkage. These are the globals that
Clang generates for string literals. Integer globals and non-constant globals are
skipped — those belong to Polaris's GlobalEncryptionPass, which handles them with a
different strategy.
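The Phase 1 filter is easy to sketch in plain Python. The GlobalInfo stand-in and its attribute names below are illustrative, not llvm-nanobind's actual API; the predicate itself mirrors the rules above: constant [N x i8] arrays with internal, private, or linkonce_odr linkage.

```python
import re
from dataclasses import dataclass

# Stand-in for the fields the pass reads off each global; the attribute
# names here are illustrative, not llvm-nanobind's actual API.
@dataclass
class GlobalInfo:
    name: str
    type_str: str      # LLVM type as text, e.g. "[14 x i8]"
    is_constant: bool
    linkage: str       # "internal", "private", "linkonce_odr", "external", ...

ELIGIBLE_LINKAGES = {"internal", "private", "linkonce_odr"}

def is_encryptable_string(g: GlobalInfo) -> bool:
    """Phase 1 filter: constant byte arrays with non-exported linkage."""
    return (g.is_constant
            and re.fullmatch(r"\[\d+ x i8\]", g.type_str) is not None
            and g.linkage in ELIGIBLE_LINKAGES)
```

A private [14 x i8] constant like @.str qualifies; a non-constant i32 or an [N x i32] array does not, and falls through to GlobalEncryptionPass.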
Phase 2 — Encryption. Each qualifying global gets a random 32-bit key. The initializer bytes are XOR-encrypted with the key's 4 bytes cycling across the array:
encrypted[i] = original[i] ^ key_bytes[i % 4]
The global's initializer is replaced with the encrypted array, its constant flag is
cleared (the data is no longer a compile-time constant in the eyes of LLVM), and
linkonce_odr linkage is demoted to internal so the linker cannot merge an encrypted
copy with an unencrypted one from a different translation unit.
Phase 3 — Per-function decryption. For every function that uses the string, the pass inserts code at the top of the entry block:
- Allocate a stack copy (alloca [N x i8])
- Byte-by-byte load/store memcpy from the encrypted global to the stack copy
- Store the 4-byte key into a local alloca
- Call the shared decrypt helper __obfu_strenc_dec(ptr %copy, ptr %key, i64 %len, i64 %keyLen)
- Rewrite every operand that referenced the global to point at the stack copy instead
The decrypt helper is a simple XOR loop — data[i] ^= key[i % keyLen] — shared across
all encrypted strings in the module. One helper, N decryption sites.
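The whole scheme fits in a few lines of Python. This is a model of the arithmetic only, not the pass's actual code, and the byte order in which the key's 4 bytes cycle is an assumption; the round trip holds as long as encryption and the helper agree:

```python
import struct

def encrypt_bytes(original: bytes, key: int) -> bytes:
    """Phase 2 (compile time): encrypted[i] = original[i] ^ key_bytes[i % 4]."""
    kb = struct.pack("<I", key)                  # the key's 4 bytes (order assumed)
    return bytes(b ^ kb[i % 4] for i, b in enumerate(original))

def obfu_strenc_dec(data: bytearray, key: bytes) -> None:
    """Model of the shared runtime helper: data[i] ^= key[i % keyLen], in place."""
    for i in range(len(data)):
        data[i] ^= key[i % len(key)]

plaintext = b"Hello, world!\x00"
key = 0xDEADBEEF
stack_copy = bytearray(encrypt_bytes(plaintext, key))  # the per-function copy
obfu_strenc_dec(stack_copy, struct.pack("<I", key))    # decrypt at runtime
assert bytes(stack_copy) == plaintext                  # XOR round-trips
```

Because XOR is its own inverse, encryption and decryption are the same loop; the helper never needs to know which direction it is running.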
Before and After
A function that prints a greeting:
@.str = private unnamed_addr constant [14 x i8] c"Hello, world!\00"
define void @greet() {
entry:
call void @puts(ptr @.str)
ret void
}
After string encryption:
; Global — encrypted at compile time, no longer marked constant
@.str = internal unnamed_addr global [14 x i8] <encrypted bytes>
define void @greet() {
entry:
; stack-local copy of the string
%se.copy = alloca [14 x i8]
; byte-by-byte copy from global to stack
%se.src.0 = getelementptr i8, ptr @.str, i64 0
%se.byte.0 = load i8, ptr %se.src.0
%se.dst.0 = getelementptr i8, ptr %se.copy, i64 0
store i8 %se.byte.0, ptr %se.dst.0
%se.src.1 = getelementptr i8, ptr @.str, i64 1
%se.byte.1 = load i8, ptr %se.src.1
%se.dst.1 = getelementptr i8, ptr %se.copy, i64 1
store i8 %se.byte.1, ptr %se.dst.1
; ... (14 bytes total)
; decrypt the local copy
%se.key = alloca i32
store i32 <random_key>, ptr %se.key
call void @__obfu_strenc_dec(ptr %se.copy, ptr %se.key, i64 14, i64 4)
; use the decrypted local copy
call void @puts(ptr %se.copy)
ret void
}
; shared decrypt helper (one per module)
define private void @__obfu_strenc_dec(ptr %data, ptr %key, i64 %len, i64 %keyLen) {
; data[i] ^= key[i % keyLen] for i in 0..len
...
}
Now running strings on the binary finds encrypted garbage instead of "Hello, world!". The
plaintext exists only on the stack, briefly, at runtime.
Design Decisions
A few choices worth calling out:
- Per-function copies, not in-place decryption. The global stays encrypted in memory at all times. Each function that uses the string gets its own stack copy that is decrypted independently. An analyst cannot set one breakpoint on "the decryption" — if three functions use the string, there are three decryption sites.
- Shared helper function. All strings in the module share one __obfu_strenc_dec function. This keeps code size from exploding — the per-string overhead is just the alloca, memcpy, and call, not a full inlined XOR loop.
- LinkOnceODR demotion. String literals from templates and inline functions get linkonce_odr linkage, meaning the linker normally deduplicates them across translation units. If one TU encrypts its copy and another does not, the linker might pick the unencrypted one. Demoting to internal prevents this.
- Only [N x i8] arrays. Integer globals and integer arrays are left to GlobalEncryptionPass. The two passes are complementary — string encryption targets the data that strings finds, global encryption targets the data that a hex editor finds.
If you read the Polaris article's section on Global Encryption,
you will recognize the architecture: stack copy, shared decrypt helper, per-function
sites. StringEncryptionPass uses the same build_decrypt_function utility from
ir_helpers.py — just with a different name (__obfu_strenc_dec instead of
__obfu_globalenc_dec) and a different target (byte arrays instead of integer types).
Anti-Disassembly: Teaching the CPU to Lie
Everything we have covered so far — in this article and the two before it — attacks the logic of the program. The control flow, the data, the types, the function boundaries. But the reverse engineer still has one tool we have not touched: the disassembler itself. IDA, Ghidra, objdump, Binary Ninja — the programs that turn raw machine code bytes back into assembly instructions. If you can break that tool, nothing downstream works. Decompilation, cross-referencing, control-flow graphs — all of it depends on the disassembler correctly parsing the instruction stream.
Anti-disassembly attacks that layer. Not the program's semantics, but the parser that reads the program.
How Linear-Sweep Disassembly Works
The simplest disassembly strategy is linear sweep: start at byte offset 0, decode the instruction, advance by the instruction's length, decode the next instruction, repeat. objdump uses this. It is fast, simple, and catastrophically wrong when bytes lie — because x86 instructions are variable-length. If the disassembler misidentifies the length of one instruction, every subsequent instruction is decoded starting at the wrong offset. The error cascades forward indefinitely.
Recursive-descent disassemblers (IDA, Ghidra) are smarter — they follow control flow edges and decode from known-good entry points. But they still have to parse each instruction starting from a byte offset. If you can trick the parser into consuming the wrong number of bytes at a known entry point, the desynchronization is just as fatal.
The Byte Sequence
This is the core of VMWhere's anti-disassembly pass. Fifteen bytes, carefully chosen so the CPU and the disassembler see completely different instruction streams:
0x48 0xB8 r1 r2 r3 0xEB 0x08 0xFF 0xFF 0x48 0x31 0xC0 0xEB 0xF7 0xE8
Let's walk through what each segment does.
0x48 0xB8 — The bait. These two bytes encode REX.W + MOV rax, imm64 — the
start of a 10-byte instruction that loads a 64-bit immediate into RAX. The next
8 bytes (offsets 0x02 through 0x09) are immediate data, and both the CPU and the
disassembler consume them as part of this single instruction. RAX ends up holding
garbage, which is one reason the pass declares a register clobber.
r1 r2 r3 — Random padding. Three random bytes, swallowed as immediate data.
They vary at every injection site, so the sequence cannot be matched byte-for-byte.
0xEB 0x08 — The hidden instruction. On the first pass these two bytes are just
immediate data. But they also encode JMP rel8 +8, and the backward jump later in
the sequence lands the CPU exactly here. Executed from offset 0x05, it hops over
the remaining junk to offset 0x0F, the first real instruction after the sequence.
One byte range with two valid decodings depending on the entry offset: the classic
overlapping-instruction trick, and something most disassemblers cannot represent.
0xFF 0xFF — Junk. More filler inside the fake immediate. Never decoded as an
instruction by anyone.
0x48 0x31 0xC0 — Resynchronization. The 0x48 is the eighth and final byte of
the movabs immediate. Decoding resumes at 0x31 0xC0, which is xor eax, eax — a
legitimate instruction that the CPU executes (harmlessly; it zeroes the
already-clobbered RAX) and the disassembler decodes. For a moment, the two agree.
0xEB 0xF7 — Backward jump. JMP rel8 -9, targeting offset 0x05 — the hidden
jump buried inside the movabs immediate. The CPU takes this jump, executes the
hidden JMP +8, and exits the sequence cleanly. A linear-sweep disassembler does
not follow jumps; it simply keeps decoding at the next offset. A recursive-descent
disassembler that does follow the edge finds an instruction starting in the middle
of bytes it already assigned to the movabs — overlapping code that pollutes CFG
recovery.
0xE8 — The cascade byte. The first byte of CALL rel32 — an instruction that
consumes 4 more bytes as a relative offset. The CPU never reaches it (the
backward-then-forward jump pair hops over it), but the linear sweep falls straight
through into it, eats the first 4 bytes of the next real instruction as a call
offset, and resumes decoding 4 bytes into real code. The desynchronization is now
permanent.
CPU vs. Disassembler: A Side-by-Side View
Here is the execution trace next to what a linear sweep decodes from the same 15 bytes:
Offset Bytes              CPU executes                Linear sweep decodes
------ -----              ------------                --------------------
0x00   48 B8 .. (10 B)    movabs rax, <garbage>       movabs rax, <imm64>
0x0A   31 C0              xor eax, eax                xor eax, eax
0x0C   EB F7              jmp -9 → 0x05  ← TAKEN      jmp -9 (edge not followed)
0x05   EB 08              jmp +8 → 0x0F  ← TAKEN      (already consumed as imm bytes)
0x0E   E8                 (never reached)             call <next 4 bytes> ← eats real code
0x0F   ...                real code ← CPU lands here  ... desynchronized ...
The CPU executes four instructions — the movabs, the xor, the backward jump, and
the hidden forward jump — and lands at offset 0x0F, the start of whatever real
code follows the junk sequence, with nothing changed but a clobbered RAX. The
linear sweep stays synchronized for the first three instructions, then falls
through the jump into the 0xE8, decodes a phantom call that swallows 4 bytes of
real code, and is hopelessly desynchronized from there on. A recursive-descent
disassembler fares little better: following the backward edge uncovers an
instruction hidden inside one it already decoded — overlapping code that most
tools flag as an error or mis-model entirely.
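You can watch the cascade happen with a toy linear sweep. The decoder below knows only the handful of opcodes that appear in the sequence, with lengths matching their real x86-64 encodings; it is a demonstration rig, not a real disassembler, and its mnemonics ignore ModRM details.

```python
def decode_at(code: bytes, i: int):
    """Return (mnemonic, length) for the instruction at offset i."""
    start = i
    if code[i] == 0x48:                      # REX.W prefix
        i += 1
    op = code[i]
    rex = i != start
    if op == 0xB8:                           # MOV rax, imm64 / MOV eax, imm32
        return "movabs rax, imm64", (i - start) + 1 + (8 if rex else 4)
    if op == 0x31:                           # XOR r/m32, r32 (opcode + ModRM)
        return "xor eax, eax", (i - start) + 2
    if op == 0xEB:                           # JMP rel8 (opcode + 1-byte offset)
        return "jmp rel8", (i - start) + 2
    if op == 0xE8:                           # CALL rel32 (opcode + 4-byte offset)
        return "call rel32", (i - start) + 5
    return f"db 0x{op:02x}", (i - start) + 1

junk = bytes([0x48, 0xB8, 0xA7, 0x3C, 0xF1, 0xEB, 0x08,
              0xFF, 0xFF, 0x48, 0x31, 0xC0, 0xEB, 0xF7, 0xE8])
real = bytes([0x48, 0x31, 0xC0, 0xC3])       # real code: xor rax, rax; ret
code = junk + real

sweep, i = [], 0
while i < len(code):
    name, n = decode_at(code, i)
    sweep.append((i, name))
    i += n

# The sweep decodes the phantom call at 0x0E and never produces an
# instruction at offset 15, where the real code actually starts.
assert (0x0E, "call rel32") in sweep
assert all(off != 15 for off, _ in sweep)
```

The sweep yields movabs at 0x00, xor at 0x0A, jmp at 0x0C, then the call at 0x0E that swallows all four real bytes; the xor rax, rax; ret that the CPU actually runs never appears in the listing.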
Injection Strategy
The pass is a function pass that inserts junk sequences as inline assembly calls.
Each injection is a call void asm sideeffect ".byte ..." instruction — the assembler
emits the raw bytes, the optimizer cannot remove them (side effects), and the register
allocator knows RAX is clobbered (~{eax} constraint).
The injection logic:
- Always once per block. Every basic block gets one junk injection at the start, before the first non-PHI instruction. This guarantees every block entry point is poisoned.
- Probabilistic interior injection. A density parameter (default 0.3, clamped to [0.0, 1.0]) controls additional injections before non-PHI, non-terminator instructions. At density 0.3, roughly 30% of eligible instructions get a junk prefix. Higher density means more desynchronization points but larger binaries.
- x86 only. The pass checks the module's target triple for x86, i386, or i686. On ARM, RISC-V, or any non-x86 target, the pass is a no-op — the byte sequence is meaningless outside the x86 instruction encoding.
- Three random bytes per site. r1, r2, r3 are different at every injection point. Pattern matching for "the junk sequence" requires matching a 15-byte template with 3 wildcard positions — feasible but not trivial, and stacking with other passes buries the call instructions under layers of flattening and substitution.
In the IR, each injection looks like this:
call void asm sideeffect ".byte 0x48, 0xB8, 0xa7, 0x3c, 0xf1, 0xEB, 0x08, 0xFF, 0xFF, 0x48, 0x31, 0xC0, 0xEB, 0xF7, 0xE8", "~{eax}"()
That is a single LLVM instruction — call void to an inline assembly value. The
assembler emits 15 raw bytes. The optimizer sees a void function with side effects and
leaves it alone.
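Rendering the per-site byte string is straightforward. The sketch below uses hypothetical names (JUNK prefix/suffix constants, junk_asm_string) and Python's secrets module for the padding; the actual pass draws its randomness from the pipeline's seeded CryptoRandom instead.

```python
import secrets

PREFIX = [0x48, 0xB8]                        # bait: start of movabs rax, imm64
SUFFIX = [0xEB, 0x08, 0xFF, 0xFF, 0x48,      # hidden jmp, junk, resync bytes,
          0x31, 0xC0, 0xEB, 0xF7, 0xE8]      # backward jmp, cascade byte

def junk_asm_string() -> str:
    """Render one injection site's inline-asm .byte directive with
    three fresh random padding bytes (r1 r2 r3)."""
    padding = [secrets.randbelow(256) for _ in range(3)]
    byte_list = PREFIX + padding + SUFFIX
    return ".byte " + ", ".join(f"0x{b:02X}" for b in byte_list)
```

Each call produces a distinct instantiation of the 15-byte template; wrapped in call void asm sideeffect "...", "~{eax}"(), it becomes the single IR instruction shown above.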
The Inspiration
VMWhere's anti-disassembly technique is based on the "Assembly Wrapping" article by Tim Blazytko, which describes the general principle of exploiting variable-length x86 instruction encoding to desynchronize disassemblers. VMWhere takes the concept and packages it as a reusable LLVM pass with randomized byte variation and configurable density.
Using VMWhere Passes
If you followed the Pluto article's setup instructions — Python 3.12+, UV, LLVM 21 — you already have everything you need. The VMWhere passes are in the same package:
import llvm_nanobind as llvm
from shifting_codes.passes import PassPipeline
from shifting_codes.passes.string_encryption import StringEncryptionPass
from shifting_codes.passes.anti_disassembly import AntiDisassemblyPass
from shifting_codes.utils.crypto import CryptoRandom
rng = CryptoRandom(seed=42)
with llvm.create_context() as ctx:
    mod = llvm.parse_bitcode_file("your_code.bc", ctx)
    mod.target_triple = "x86_64-pc-linux-gnu"  # required for anti-disassembly
    pipeline = PassPipeline()
    pipeline.add(StringEncryptionPass(rng=rng))
    pipeline.add(AntiDisassemblyPass(rng=rng, density=0.5))  # default density is 0.3
    pipeline.run(mod, ctx)
    mod.write_bitcode_to_file("obfuscated.bc")
The density parameter controls how aggressively anti-disassembly injects junk — 0.0
means only block starts, 1.0 means every eligible instruction. The default of 0.3 is a
reasonable balance between disruption and binary size.
These passes compose naturally with everything from Pluto and Polaris. Run string encryption before flattening so the decryption code gets flattened too. Run anti-disassembly last so the junk bytes survive other passes' instruction rewriting.
Credits
VMWhere — designed and authored by MrRoy09 (21verses). A focused, lightweight project that targets two problems the larger frameworks ignore. The anti-disassembly technique in particular is a genuinely clever application of x86 encoding quirks to LLVM IR.
Assembly Wrapping — the Medium article that inspired VMWhere's anti-disassembly approach. Worth reading for the general principles of exploiting variable-length instruction encodings.
llvm-nanobind — the binding library that makes all three articles possible. Special thanks to mrexodia for maintaining Python bindings against a C++ API that actively resists being bound.
Note on scope: VMWhere ships two additional compile-time passes (instruction
substitution and control flow flattening) that are not ported here — Pluto and Polaris
handle those techniques with stronger implementations (MBA-augmented substitution,
encrypted state dispatch). The link-time anti-debug hook
(__attribute__((constructor))) is a C technique, not an LLVM IR pass, and falls outside
what llvm-nanobind can reach.
Shifting.Codes is provided for legitimate use cases including software protection, security research, CTF challenge authoring, and compiler education. The authors make no representations regarding fitness for any particular purpose and accept no liability for any misuse or damages arising from the use of this software. Use is entirely at your own risk and responsibility.