
Junk Bytes and Disappearing Strings: VMWhere's Anti-Analysis Passes

vmwhere · llvm · obfuscation · reverse-engineering · python · x86

In the first article we walked through Pluto — six passes that turn readable LLVM IR into flattened, substituted, MBA-wrapped spaghetti. In the second, Polaris raised the floor with encrypted state dispatch, alias pointer mazes, calling convention scrambling, and function merging. Together, they cover control flow, data access, ABI, and symbol erasure.

But both projects share a blind spot. Neither touches the string literals sitting in your binary's .rodata section — the "Enter password: " and "License expired" that strings your_binary | grep password finds in under a second. And neither attacks the tool itself — the disassembler that a reverse engineer uses to turn machine code back into something readable. You can flatten every branch and encrypt every global, and the analyst can still grep for your error messages and read the surrounding code in IDA.

VMWhere fills both gaps. Two focused passes — string encryption and anti-disassembly — that go after the parts of the binary that Pluto and Polaris leave alone.


What Is VMWhere?

VMWhere is a set of LLVM obfuscation passes by MrRoy09 (21verses), an undergraduate at IIT Roorkee. It targets LLVM 14+ using the new pass manager and ships four compile-time passes plus a link-time anti-debug hook. Despite the name, VMWhere does not implement virtualization — the name is a play on VMware.

Of the four compile-time passes, two — instruction substitution and control flow flattening — overlap with what Pluto and Polaris already do (and do better, with encrypted state dispatch and modular-arithmetic predicates). The remaining two — string encryption and anti-disassembly — are genuinely unique contributions that neither project addresses. Those are the ones Shifting.Codes ports. The link-time anti-debug hook is a C technique injected via __attribute__((constructor)), not an LLVM IR pass, so it falls outside what llvm-nanobind can reach.


String Encryption: Making strings Useless

The Problem

Run this on any binary compiled with Pluto and Polaris together, every pass enabled:

strings obfuscated_binary | grep -i "password\|license\|error\|invalid"

You will find them all. Hours of control-flow obfuscation, XOR-encrypted globals, indirect calls through pointer arithmetic, function merging — and the analyst recovers your intent in one command because the string "Invalid license key" is sitting in plaintext in the read-only data section. String literals are the lowest-hanging fruit in reverse engineering, and neither Pluto nor Polaris picks them.

How It Works

String encryption is a module pass that operates in three phases.

Phase 1 — Discovery. The pass scans mod.globals looking for [N x i8] constant arrays with internal, private, or linkonce_odr linkage. These are the globals that Clang generates for string literals. Integer globals and non-constant globals are skipped — those belong to Polaris's GlobalEncryptionPass, which handles them with a different strategy.

Phase 2 — Encryption. Each qualifying global gets a random 32-bit key. The initializer bytes are XOR-encrypted with the key's 4 bytes cycling across the array:

encrypted[i] = original[i] ^ key_bytes[i % 4]

The global's initializer is replaced with the encrypted array, its constant flag is cleared (the data is no longer a compile-time constant in the eyes of LLVM), and linkonce_odr linkage is demoted to internal so the linker cannot merge an encrypted copy with an unencrypted one from a different translation unit.
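
The encryption phase can be modeled in plain Python (a sketch of the scheme, not the pass's actual code — the real pass rewrites LLVM constant initializers rather than byte strings):

```python
import os

def encrypt_initializer(original: bytes) -> tuple[bytes, bytes]:
    """Model of Phase 2: XOR the literal's bytes with a random
    32-bit key whose 4 bytes cycle across the array."""
    key = os.urandom(4)  # fresh random key for each global
    encrypted = bytes(b ^ key[i % 4] for i, b in enumerate(original))
    return encrypted, key

enc, key = encrypt_initializer(b"Hello, world!\x00")
# XOR is its own inverse: applying the cycling key again restores the plaintext
assert bytes(b ^ key[i % 4] for i, b in enumerate(enc)) == b"Hello, world!\x00"
```

Because XOR is self-inverse, the runtime decrypt helper is the exact same loop — no separate decryption algorithm is needed.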

Phase 3 — Per-function decryption. For every function that uses the string, the pass inserts code at the top of the entry block:

  1. Allocate a stack copy (alloca [N x i8])
  2. Byte-by-byte load/store memcpy from the encrypted global to the stack copy
  3. Store the 4-byte key into a local alloca
  4. Call the shared decrypt helper __obfu_strenc_dec(ptr %copy, ptr %key, i64 %len, i64 %keyLen)
  5. Rewrite every operand that referenced the global to point at the stack copy instead

The decrypt helper is a simple XOR loop — data[i] ^= key[i % keyLen] — shared across all encrypted strings in the module. One helper, N decryption sites.
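
A Python model of the helper's contract (the IR version is the same loop over i8 loads and stores; the function name below mirrors the helper's signature for illustration):

```python
def obfu_strenc_dec(data: bytearray, key: bytes, length: int, key_len: int) -> None:
    """Model of the shared decrypt helper: data[i] ^= key[i % keyLen], in place."""
    for i in range(length):
        data[i] ^= key[i % key_len]

# Round trip: "encrypting" and decrypting with the same cycling key is a no-op,
# since the helper is its own inverse.
buf = bytearray(b"Invalid license key\x00")
key = bytes([0xDE, 0xAD, 0xBE, 0xEF])   # example key, chosen arbitrarily
obfu_strenc_dec(buf, key, len(buf), 4)  # encrypt (same operation)
obfu_strenc_dec(buf, key, len(buf), 4)  # decrypt
assert bytes(buf) == b"Invalid license key\x00"
```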

Before and After

A function that prints a greeting:

@.str = private unnamed_addr constant [14 x i8] c"Hello, world!\00"

define void @greet() {
entry:
  call void @puts(ptr @.str)
  ret void
}

After string encryption:

; Global — encrypted at compile time, no longer marked constant
@.str = internal unnamed_addr global [14 x i8] <encrypted bytes>

define void @greet() {
entry:
  ; stack-local copy of the string
  %se.copy = alloca [14 x i8]
  ; byte-by-byte copy from global to stack
  %se.src.0 = getelementptr i8, ptr @.str, i64 0
  %se.byte.0 = load i8, ptr %se.src.0
  %se.dst.0 = getelementptr i8, ptr %se.copy, i64 0
  store i8 %se.byte.0, ptr %se.dst.0
  %se.src.1 = getelementptr i8, ptr @.str, i64 1
  %se.byte.1 = load i8, ptr %se.src.1
  %se.dst.1 = getelementptr i8, ptr %se.copy, i64 1
  store i8 %se.byte.1, ptr %se.dst.1
  ; ... (14 bytes total)
  ; decrypt the local copy
  %se.key = alloca i32
  store i32 <random_key>, ptr %se.key
  call void @__obfu_strenc_dec(ptr %se.copy, ptr %se.key, i64 14, i64 4)
  ; use the decrypted local copy
  call void @puts(ptr %se.copy)
  ret void
}

; shared decrypt helper (one per module)
define private void @__obfu_strenc_dec(ptr %data, ptr %key, i64 %len, i64 %keyLen) {
  ; data[i] ^= key[i % keyLen] for i in 0..len
  ...
}

Now strings on the binary finds encrypted garbage instead of "Hello, world!". The plaintext only exists on the stack, briefly, at runtime.

Design Decisions

A few choices worth calling out:

  • Per-function copies, not in-place decryption. The global stays encrypted in memory at all times. Each function that uses the string gets its own stack copy that is decrypted independently. An analyst cannot set one breakpoint on "the decryption" — if three functions use the string, there are three decryption sites.

  • Shared helper function. All strings in the module share one __obfu_strenc_dec function. This keeps code size from exploding — the per-string overhead is just the alloca, memcpy, and call, not a full inlined XOR loop.

  • LinkOnceODR demotion. String literals from templates and inline functions get linkonce_odr linkage, meaning the linker normally deduplicates them across translation units. If one TU encrypts its copy and another does not, the linker might pick the unencrypted one. Demoting to internal prevents this.

  • Only [N x i8] arrays. Integer globals and integer arrays are left to GlobalEncryptionPass. The two passes are complementary — string encryption targets the data that strings finds, global encryption targets the data that a hex editor finds.

If you read the Polaris article's section on Global Encryption, you will recognize the architecture: stack copy, shared decrypt helper, per-function sites. StringEncryptionPass uses the same build_decrypt_function utility from ir_helpers.py — just with a different name (__obfu_strenc_dec instead of __obfu_globalenc_dec) and a different target (byte arrays instead of integer types).


Anti-Disassembly: Teaching the CPU to Lie

Everything we have covered so far — in this article and the two before it — attacks the logic of the program. The control flow, the data, the types, the function boundaries. But the reverse engineer still has one tool we have not touched: the disassembler itself. IDA, Ghidra, objdump, Binary Ninja — the programs that turn raw machine code bytes back into assembly instructions. If you can break that tool, nothing downstream works. Decompilation, cross-referencing, control-flow graphs — all of it depends on the disassembler correctly parsing the instruction stream.

Anti-disassembly attacks that layer. Not the program's semantics, but the parser that reads the program.

How Linear-Sweep Disassembly Works

The simplest disassembly strategy is linear sweep: start at byte offset 0, decode the instruction, advance by the instruction's length, decode the next instruction, repeat. objdump uses this. It is fast, simple, and catastrophically wrong when bytes lie — because x86 instructions are variable-length. If the disassembler misidentifies the length of one instruction, every subsequent instruction is decoded starting at the wrong offset. The error cascades forward indefinitely.

Recursive-descent disassemblers (IDA, Ghidra) are smarter — they follow control flow edges and decode from known-good entry points. But they still have to parse each instruction starting from a byte offset. If you can trick the parser into consuming the wrong number of bytes at a known entry point, the desynchronization is just as fatal.

The Byte Sequence

This is the core of VMWhere's anti-disassembly pass. Fifteen bytes, carefully arranged so that the CPU and the disassembler, after initially agreeing, end up reading different instruction streams:

0x48  0xB8  r1  r2  r3  0xEB  0x08  0xFF  0xFF  0x48  0x31  0xC0  0xEB  0xF7  0xE8

Let's walk through what each segment does.

0x48 0xB8 — The bait. These two bytes encode REX.W + MOV rax, imm64 — the start of a 10-byte instruction whose final 8 bytes are a 64-bit immediate. Everything from offset 0x02 through 0x09 is swallowed as immediate data. Both the CPU and the disassembler decode this movabs the same way; the trap is armed but not yet sprung.

r1 r2 r3 — Random padding. Three random bytes, the low bytes of the movabs immediate. Their only job is to make every injection site byte-unique, so the sequence cannot be stripped with a fixed-pattern search.

0xEB 0x08 — The hidden jump. JMP rel8 +8, buried at offsets 0x05–0x06 inside the movabs immediate. Nobody decodes it yet. It only becomes an instruction when the backward jump below steers the CPU to offset 0x05, mid-immediate; executed from there, it jumps forward to offset 0x0F — the first byte of real code after the sequence.

0xFF 0xFF 0x48 — Immediate filler. The remaining three bytes of the movabs immediate. Note that the 0x48 belongs to the immediate, not to the instruction that follows it.

0x31 0xC0 — A decoy that actually runs. With the immediate complete at offset 0x09, the next instruction for everyone is xor eax, eax at 0x0A. The CPU executes it, zeroing RAX (which is why the inline assembly declares an EAX clobber), and the disassembler decodes it. Still no divergence.

0xEB 0xF7 — The trap. JMP rel8 -9, targeting offset 0x05 — the middle of the already-decoded movabs immediate. The CPU takes the jump, finds the hidden JMP +8 there, and escapes cleanly to the real code at 0x0F. The disassembler instead records an unconditional branch into the interior of an instruction — an overlapping-code edge that pollutes CFG recovery.

0xE8 — The cascade byte. Never executed: the CPU has already escaped through the hidden jump. But a linear-sweep disassembler keeps decoding after the unconditional jump and reads 0xE8 as the first byte of CALL rel32, consuming the first 4 bytes of the next real instruction as its relative offset. It resumes 4 bytes into the real code, and every instruction from there on is decoded at the wrong boundary. The desynchronization is permanent.

CPU vs. Disassembler: A Side-by-Side View

Here is what each sees when processing the same 15 bytes:

Offset  Bytes              CPU executes                      Linear sweep sees
------  -----              ------------                      -----------------
0x00    48 B8 <8 imm>      movabs rax, <imm64>               movabs rax, <imm64>  ← 10-byte instruction
0x0A    31 C0              xor eax, eax                      xor eax, eax
0x0C    EB F7              jmp -9 → 0x05 (into the imm)      jmp -9  ← edge into mid-instruction
0x05    EB 08              jmp +8 → 0x0F (hidden jump)       (never decoded — imm bytes 4–5)
0x0E    E8                 (never reached)                   call <next 4 bytes>  ← eats real code
0x0F    ... real code ...  ← CPU lands here                  ... desynchronized ...

The CPU executes four instructions — a harmless movabs, xor eax, eax, the backward jump, and the hidden forward jump — then lands at offset 0x0F, the start of whatever real code follows, and continues normally. A linear-sweep disassembler decodes the same first three instructions, but it keeps sweeping past the unconditional jump, misreads 0xE8 as a call that swallows 4 bytes of the next real instruction, and never resynchronizes. A recursive-descent disassembler that follows the backward edge is forced to decode overlapping instructions inside the movabs immediate, leaving a contradictory listing and a bogus backward edge in the recovered control-flow graph.
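
The divergence is easy to reproduce with a toy decoder — a minimal linear-sweep and CPU-trace simulator that knows only the handful of opcodes in the sequence (x86 instruction lengths hardcoded; the real code that follows is modeled as NOPs):

```python
# The 15-byte junk sequence (r1 r2 r3 chosen arbitrarily), followed by
# 8 bytes of "real code" modeled as NOPs starting at offset 0x0F.
JUNK = bytes([0x48, 0xB8, 0xA7, 0x3C, 0xF1, 0xEB, 0x08, 0xFF,
              0xFF, 0x48, 0x31, 0xC0, 0xEB, 0xF7, 0xE8])
code = JUNK + bytes([0x90] * 8)

def insn_len(code: bytes, off: int) -> int:
    """x86 lengths for just the opcodes that appear here."""
    b = code[off]
    if b == 0x48 and code[off + 1] == 0xB8:
        return 10   # REX.W mov rax, imm64
    if b == 0x31:
        return 2    # xor r/m32, r32
    if b == 0xEB:
        return 2    # jmp rel8
    if b == 0xE8:
        return 5    # call rel32
    return 1        # nop / unknown

# Linear sweep: decode, advance by length, never follow jumps.
off, sweep = 0, []
while off < len(code):
    sweep.append(off)
    off += insn_len(code, off)

# CPU: same decoding, but the unconditional jmp rel8 is actually taken.
ip, trace = 0, []
while ip < len(code) and len(trace) < 32:
    trace.append(ip)
    if code[ip] == 0xEB:   # jmp rel8: signed 8-bit displacement
        disp = code[ip + 1] - 256 if code[ip + 1] >= 0x80 else code[ip + 1]
        ip = ip + 2 + disp
    else:
        ip += insn_len(code, ip)

print("sweep:", [hex(o) for o in sweep])
print("cpu:  ", [hex(o) for o in trace])
assert 0x0F in trace       # the CPU reaches the real code...
assert 0x0F not in sweep   # ...the linear sweep never decodes its first byte
```

The sweep resumes at 0x13, four bytes into the real code, while the CPU passes through the hidden jump at 0x05 and lands on 0x0F.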

Injection Strategy

The pass is a function pass that inserts junk sequences as inline assembly calls. Each injection is a call void asm sideeffect ".byte ..." instruction — the assembler emits the raw bytes, the optimizer cannot remove them (side effects), and the register allocator knows RAX is clobbered (~{eax} constraint).

The injection logic:

  • Always once per block. Every basic block gets one junk injection at the start, before the first non-PHI instruction. This guarantees every block entry point is poisoned.

  • Probabilistic interior injection. A density parameter (default 0.3, clamped to [0.0, 1.0]) controls additional injections before non-PHI, non-terminator instructions. At density 0.3, roughly 30% of eligible instructions get a junk prefix. Higher density means more desynchronization points but larger binaries.

  • x86 only. The pass checks the module's target triple for x86, i386, or i686. On ARM, RISC-V, or any non-x86 target, the pass is a no-op — the byte sequence is meaningless outside the x86 instruction encoding.

  • Three random bytes per site. r1, r2, r3 are different at every injection point. Pattern matching for "the junk sequence" requires matching a 15-byte template with 3 wildcard positions — feasible but not trivial, and stacking with other passes buries the call instructions under layers of flattening and substitution.
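
The per-site randomization can be sketched as a template with three wildcard positions (an illustrative model — the names here are not the pass's actual identifiers):

```python
import random

# 15-byte junk template; None marks the three randomized positions (r1 r2 r3)
JUNK_TEMPLATE = [0x48, 0xB8, None, None, None, 0xEB, 0x08, 0xFF,
                 0xFF, 0x48, 0x31, 0xC0, 0xEB, 0xF7, 0xE8]

def junk_asm(rng: random.Random) -> str:
    """Render one injection site as a .byte directive for inline asm."""
    site = [rng.randrange(256) if b is None else b for b in JUNK_TEMPLATE]
    return ".byte " + ", ".join(f"0x{b:02X}" for b in site)

rng = random.Random(42)
print(junk_asm(rng))   # every call yields a different 15-byte variant
```

Matching all variants thus requires a 15-byte mask with three wildcards rather than a fixed byte string.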

In the IR, each injection looks like this:

call void asm sideeffect ".byte 0x48, 0xB8, 0xa7, 0x3c, 0xf1, 0xEB, 0x08, 0xFF, 0xFF, 0x48, 0x31, 0xC0, 0xEB, 0xF7, 0xE8", "~{eax}"()

That is a single LLVM instruction — call void to an inline assembly value. The assembler emits 15 raw bytes. The optimizer sees a void function with side effects and leaves it alone.

The Inspiration

VMWhere's anti-disassembly technique is based on the "Assembly Wrapping" article by Tim Blazytko, which describes the general principle of exploiting variable-length x86 instruction encoding to desynchronize disassemblers. VMWhere takes the concept and packages it as a reusable LLVM pass with randomized byte variation and configurable density.


Using VMWhere Passes

If you followed the Pluto article's setup instructions — Python 3.12+, UV, LLVM 21 — you already have everything you need. The VMWhere passes are in the same package:

import llvm_nanobind as llvm
from shifting_codes.passes import PassPipeline
from shifting_codes.passes.string_encryption import StringEncryptionPass
from shifting_codes.passes.anti_disassembly import AntiDisassemblyPass
from shifting_codes.utils.crypto import CryptoRandom

rng = CryptoRandom(seed=42)

with llvm.create_context() as ctx:
    mod = llvm.parse_bitcode_file("your_code.bc", ctx)
    mod.target_triple = "x86_64-pc-linux-gnu"  # required for anti-disassembly

    pipeline = PassPipeline()
    pipeline.add(StringEncryptionPass(rng=rng))
    pipeline.add(AntiDisassemblyPass(rng=rng, density=0.5))  # default density is 0.3

    pipeline.run(mod, ctx)
    mod.write_bitcode_to_file("obfuscated.bc")

The density parameter controls how aggressively anti-disassembly injects junk — 0.0 means only block starts, 1.0 means every eligible instruction. The default of 0.3 is a reasonable balance between disruption and binary size.

These passes compose naturally with everything from Pluto and Polaris. Run string encryption before flattening so the decryption code gets flattened too. Run anti-disassembly last so the junk bytes survive other passes' instruction rewriting.


Credits

VMWhere — designed and authored by MrRoy09 (21verses). A focused, lightweight project that targets two problems the larger frameworks ignore. The anti-disassembly technique in particular is a genuinely clever application of x86 encoding quirks to LLVM IR.

Assembly Wrapping — the Medium article that inspired VMWhere's anti-disassembly approach. Worth reading for the general principles of exploiting variable-length instruction encodings.

llvm-nanobind — the binding library that makes all three articles possible. Special thanks to mrexodia for maintaining Python bindings against a C++ API that actively resists being bound.

Note on scope: VMWhere ships two additional compile-time passes (instruction substitution and control flow flattening) that are not ported here — Pluto and Polaris handle those techniques with stronger implementations (MBA-augmented substitution, encrypted state dispatch). The link-time anti-debug hook (__attribute__((constructor))) is a C technique, not an LLVM IR pass, and falls outside what llvm-nanobind can reach.


Shifting.Codes is provided for legitimate use cases including software protection, security research, CTF challenge authoring, and compiler education. The authors make no representations regarding fitness for any particular purpose and accept no liability for any misuse or damages arising from the use of this software. Use is entirely at your own risk and responsibility.
