In the previous article, we walked through Pluto — six LLVM obfuscation passes that turn readable IR into something a decompiler would rather not think about. We ended with a teaser: Pluto's successor, Polaris, pushes the techniques further.
Here is the thing about Pluto's obfuscation, though. A sufficiently motivated analyst with Z3 and an afternoon can unpick a lot of it. The opaque predicates are algebraic identities any solver can discharge. The flattening uses plaintext state constants you can grep out of the binary. The global encryption decrypts at a single site. None of this makes Pluto bad — it makes it a first draft.
Polaris is the second draft. Where Pluto hides individual constructs behind thin disguises, Polaris makes the structural relationships between blocks, variables, and functions unrecoverable. The reverse engineer does not just need a solver — they need to reconstruct dominance trees, key schedules, pointer graphs, and ABI semantics that no longer exist in the output. That is a different category of problem.
What Is Polaris?
Polaris is a suite of LLVM obfuscation passes written by za233 (not bluesadi — different author, different repository). Where Pluto targets LLVM 14.0.6, Polaris targets LLVM 16.0.6 and ships 10 IR-level passes plus 2 MIR/backend passes (X86 and AArch64 "rubbish code" junk-instruction inserters, though only the X86 version is implemented — the AArch64 one is a stub). Pluto's own README points to Polaris as its successor.
Like Pluto, Polaris is frozen in time. LLVM 16 is not LLVM 21, and the C++ API drift means it will not build against a modern toolchain without porting work. The same vintage-car problem, just a slightly newer model year. Shifting.Codes ports 8 of Polaris's IR-level passes to Python, running against LLVM 21 today. (The remaining two — Substitution and LinearMBA — overlap with Pluto's versions already ported in the previous article.)
The Upgrades: Four Passes, Rebuilt
Polaris takes four of Pluto's passes and replaces their weak points. In each case, the transformation is semantically the same — the output still does what the input did — but the obfuscation is qualitatively harder to reverse.
The IR examples below are lightly simplified for readability — actual output uses random constants and generated names — but the instruction sequences and structure match the Python implementation exactly.
Bogus Control Flow: From Algebra Homework to Modular Arithmetic
Pluto's weakness: The opaque predicate y < 10 || x*(x+1) % 2 == 0 is an
algebraic identity. Feed it to Z3, get always true in milliseconds. The dead path
is immediately identifiable.
Polaris's fix: Two i64 allocas (bcf.var and bcf.var0) are initialized to the
same prime value chosen from a table of 1,087 primes (1009–9973). The predicate
bcf.var == bcf.var0 is always true. To keep the invariant alive across blocks,
Polaris inserts a modular arithmetic update before every terminator:
Var0 = ((a * Var0) mod m - b) mod m
The constants a, b, and m are chosen per-block so that a * x_val ≡ x_val + b (mod m)
— meaning the update always maps bcf.var0 back to its initial value x_val. Solving
for a involves the modular inverse of x_val mod m, computed at compile time via the
extended Euclidean algorithm.
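The fixed-point arithmetic can be checked in a few lines of Python. This is a sketch with hypothetical helper names, and Python's `%` of a negative value is non-negative, which glosses over the unsigned `urem` wraparound in the actual IR:

```python
import random

def pick_bcf_constants(x0: int, m: int) -> tuple[int, int]:
    """Pick (a, b) so that ((a * x0) % m - b) % m == x0.

    Choosing a freely and setting b = (a*x0 - x0) % m makes x0 a fixed
    point of the update. Requires x0 < m so that x0 % m == x0.
    """
    assert x0 < m
    a = random.randrange(2, m)
    b = (a * x0 - x0) % m
    return a, b

def bcf_update(var0: int, a: int, b: int, m: int) -> int:
    """The per-block state update inserted before each terminator."""
    return ((a * var0) % m - b) % m

x0, m = 1009, 7919            # initial prime in bcf.var/bcf.var0, modulus
a, b = pick_bcf_constants(x0, m)
assert bcf_update(x0, a, b, m) == x0   # update is a no-op on the invariant
```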
Each processable block is split into head / body / tail, a clone of the body is created, and the head branches on the opaque predicate — real path to body, dead path to clone. The clone unconditionally loops back to body, creating a cycle that never executes but looks plausible to a disassembler.
After BCF (showing one block's transformation):
pos: ; head — opaque branch
%bcf.lhs = load i64, ptr %bcf.var
%bcf.rhs = load i64, ptr %bcf.var0
%bcf.cmp = icmp eq i64 %bcf.lhs, %bcf.rhs ; always true
br i1 %bcf.cmp, label %bcf.body.1, label %bcf.clone.1
bcf.body.1: ; real computation + modular state update
%r1 = mul i32 %x, 2
%bcf.v = load i64, ptr %bcf.var0
%bcf.av = mul i64 911, %bcf.v ; a * Var0
%bcf.avmod = urem i64 %bcf.av, 7919 ; mod m
%bcf.sub = sub i64 %bcf.avmod, 2753 ; - b
%bcf.result = urem i64 %bcf.sub, 7919 ; mod m again
store i64 %bcf.result, ptr %bcf.var0 ; invariant restored
%bcf.cmp2 = icmp eq i64 %bcf.lhs, %bcf.result ; always true
br i1 %bcf.cmp2, label %bcf.tail.1, label %bcf.clone.1
bcf.tail.1:
ret i32 %r1
bcf.clone.1: ; dead block — never reached
%r1.c = mul i32 %x, 2 ; cloned computation
br label %bcf.body.1 ; loops back, looks like real control flow
Why it is harder: A static analyzer must perform whole-function data-flow analysis to track the modular arithmetic chain across every block and prove the invariant holds. Z3 can still solve it in theory, but the solver needs to be told what to solve — and the predicate no longer announces itself as an algebraic identity.
Control Flow Flattening: Encrypted State Dispatch
Pluto's weakness: The dispatch switch uses plaintext case constants. grep the
binary for the state values, match them to switch targets, and the original CFG falls
out.
Polaris's fix: State values are XOR-encrypted with per-block keys derived from the
dominator tree. Each block gets a random key, and a block's effective decryption key
is the XOR of all keys from blocks that dominate it. A __cff_update_key helper
function XORs the key array on first visit, propagating the key schedule as execution
flows through the dominance hierarchy.
The pass also allocates per-block visited flags ([N x i8]) and per-block global
arrays listing which blocks each block dominates, so the key update knows which slots
to XOR.
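The compile-time side of the key schedule can be sketched in Python. The CFG and block names below are hypothetical, and whether a block's own key participates in its effective key is an assumption of this sketch; the real pass derives dominance from LLVM's dominator tree and applies the XORs at runtime via __cff_update_key:

```python
import random
from functools import reduce

# Toy CFG (hypothetical block names): block -> successors
cfg = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
preds = {b: [p for p, ss in cfg.items() if b in ss] for b in cfg}

# Standard iterative dominator computation
dom = {b: set(cfg) for b in cfg}
dom["entry"] = {"entry"}
changed = True
while changed:
    changed = False
    for b in cfg:
        if b == "entry":
            continue
        new = {b} | set.intersection(*(dom[p] for p in preds[b]))
        if new != dom[b]:
            dom[b], changed = new, True

# Random per-block key; a block's effective key is the XOR of the keys
# of every block that dominates it (reflexively, in this sketch)
key = {b: random.getrandbits(32) for b in cfg}
eff = {b: reduce(lambda x, y: x ^ y, (key[d] for d in dom[b])) for b in cfg}

assert eff["loop"] == key["entry"] ^ key["loop"]
```

The point of the construction: eff["loop"] is only correct once "entry" has executed and XORed its key into the slot, so the schedule cannot be read out of the binary statically.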
After flattening (simplified — showing the dispatch and one block):
define i32 @loop_sum(i32 %n) {
entry:
%i.demoted = alloca i32 ; PHI nodes demoted to stack
%acc.demoted = alloca i32
store i32 0, ptr %i.demoted
store i32 0, ptr %acc.demoted
%cff.state = alloca i32 ; encrypted state variable
%cff.keys = alloca [2 x i32] ; per-block XOR key slots
%cff.visited = alloca [2 x i8] ; first-visit flags
store i32 <loop_state>, ptr %cff.state ; initial (unencrypted) state
br label %cff.dispatch
cff.dispatch:
%cff.sw = load i32, ptr %cff.state
switch i32 %cff.sw, label %cff.default [
i32 <loop_state>, label %loop
i32 <exit_state>, label %exit
]
loop:
%i = load i32, ptr %i.demoted
%acc = load i32, ptr %acc.demoted
%i.next = add i32 %i, 1
%acc.next = add i32 %acc, %i
store i32 %i.next, ptr %i.demoted
store i32 %acc.next, ptr %acc.demoted
%done = icmp eq i32 %i.next, %n
; dominance-based key update — XOR key slots for dominated blocks on first visit
call void @__cff_update_key(i8 %vis, i32 1, ptr @.cff.dom.loop_sum.0,
ptr %cff.keys, i32 <key_list[loop]>)
store i8 1, ptr %cff.vptr ; mark visited
; select next encrypted state, XOR-decrypt with this block's key slot
%cff.sel = select i1 %done, i32 <exit_enc>, i32 <loop_enc>
%cff.key = load i32, ptr %cff.kptr
%cff.enc = xor i32 %cff.key, %cff.sel
store i32 %cff.enc, ptr %cff.state
br label %cff.dispatch
; ...
}
define private void @__cff_update_key(i8 %flag, i32 %len, ptr %posArray,
ptr %keyArray, i32 %num) { ... }
Why it is harder: The case constants in the switch are the decrypted states. What
gets stored to cff.state is their XOR with a key that only becomes correct after all
dominating blocks have executed. Recovering the original CFG requires reconstructing
both the dominator tree and the key schedule — and neither is present in the binary as
a static artifact.
Global Encryption: Scattered Decryption
Pluto's weakness: Each encrypted global is decrypted inline at one site — find the XOR loop, find the key, done.
Polaris's fix: Instead of decrypting in place, Polaris creates a stack-local copy
of the global in every function that uses it. Each copy is independently decrypted via
a shared helper function __obfu_globalenc_dec(ptr %data, ptr %key, i64 %len, i64 %keyLen),
which runs a byte-level XOR loop.
Only is_global_constant globals with internal/private linkage are encrypted — a
deliberate safety measure that avoids corrupting mutable globals or externally visible
symbols. The pass uses a random 32-bit key per global, cycling the 4-byte key across
the data via encrypt_bytes().
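The cycling-key XOR is its own inverse, which is why one helper serves for decryption. A minimal sketch (the helper name is illustrative; the key bytes echo the @.genc.key.0 value in the IR example):

```python
def xor_cycling_key(data: bytes, key: bytes) -> bytes:
    """data[i] ^= key[i % keyLen], the same loop as __obfu_globalenc_dec."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = bytes([0xAB, 0xCD, 0xEF, 0x12])      # 4-byte key, one per global
plain = b"secret table contents here!!"
enc = xor_cycling_key(plain, key)          # compile time: encrypt_bytes()
assert xor_cycling_key(enc, key) == plain  # runtime: decrypt the stack copy
```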
After global encryption (showing per-function decryption):
; Global — encrypted at compile time, no longer marked constant
@secret_table = internal global [8 x i32] <encrypted_initializer>
define i32 @lookup(i32 %idx) {
entry:
; stack-local copy of the global
%local = alloca [8 x i32]
; byte-by-byte copy from global to stack
%g0 = load i8, ptr @secret_table
store i8 %g0, ptr %local
%g1 = load i8, ptr getelementptr (i8, ptr @secret_table, i64 1)
store i8 %g1, ptr getelementptr (i8, ptr %local, i64 1)
; ... (32 bytes total)
; decrypt the local copy
call void @__obfu_globalenc_dec(ptr %local, ptr @.genc.key.0, i64 32, i64 4)
; use the decrypted local copy
%ptr = getelementptr [8 x i32], ptr %local, i32 0, i32 %idx
%val = load i32, ptr %ptr
ret i32 %val
}
@.genc.key.0 = private global [4 x i8] c"\AB\CD\EF\12"
define private void @__obfu_globalenc_dec(ptr %data, ptr %key,
i64 %len, i64 %keyLen) {
; data[i] ^= key[i % keyLen] for i in 0..len
...
}
Why it is harder: If three functions use @secret_table, three independent stack
copies exist, each decrypted separately. An analyst cannot set a single breakpoint on
"the decryption" — there are N decryption sites, one per consumer, with no shared
mutable state to intercept.
Indirect Call: Per-Site Masking
Pluto's weakness: Every indirect call loads from a shared global variable per function. Find the GV table, resolve all the pointers, and the original call graph is back.
Polaris's fix: Each call site gets its own private global variable and a unique random 32-bit mask applied via pointer arithmetic:
; Before: direct call
%result = call i32 @helper(i32 %x)
; After: per-site indirect call with pointer masking
%loaded = load ptr, ptr @.indcall.0 ; private GV holding @helper
%int_val = ptrtoint ptr %loaded to i64
%added = add i64 %int_val, 749312581 ; + mask (random per site)
%unmasked = sub i64 %added, 749312581 ; - mask (cancels out)
%func_ptr = inttoptr i64 %unmasked to ptr
%result = call i32 %func_ptr(i32 %x) ; indirect call through recovered pointer
@.indcall.0 = private global ptr @helper
The add/sub cancel out at runtime, but a static analyzer sees an integer
computation producing the call target — it cannot resolve the pointer without
evaluating the arithmetic or recognizing the identity.
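The cancellation holds even when the addition wraps around 2^64. A quick model in Python (the address is hypothetical; the mask matches the IR example above):

```python
M64 = (1 << 64) - 1   # model i64 wraparound

def masked_call_target(ptr: int, mask: int) -> int:
    """The add-then-sub pair the pass emits, both taken mod 2^64."""
    added = (ptr + mask) & M64       # add i64 %int_val, mask
    return (added - mask) & M64      # sub i64 %added, mask

ptr = 0x140001033                    # hypothetical address of @helper
assert masked_call_target(ptr, 749312581) == ptr
assert masked_call_target(M64, 749312581) == M64   # survives overflow
```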
Why it is harder: With Pluto, one global per function means N call sites share one pointer — find it once, resolve them all. With Polaris, N call sites mean N globals and N masks. Each site looks independent. Automated CFG recovery that searches for a function-pointer table comes up empty.
The New Passes: Four Techniques Pluto Never Had
Polaris does not just upgrade existing passes. It introduces four entirely new transformations that attack parts of the binary Pluto never touched: branch targets, local variable layout, calling conventions, and function boundaries.
Indirect Branch: Jump Tables on the Stack
Every conditional br instruction is a gift to a disassembler — it reveals two
successor blocks, and the condition tells you why the program goes one way or the
other. Indirect Branch takes that gift back.
For each branch, the pass allocates a [2 x ptr] array on the stack and stores
blockaddress constants for the true and false targets. The branch condition is
inverted, zero-extended, and then run through an MBA-style obfuscation that
computes the same index via bitwise decomposition:
index = zext(not(cond))
xor1 = index ^ rand
result = (~xor1 & rand) | (xor1 & ~rand) ; == index ^ rand ^ rand == index
This uses the identity a ^ b = (~a & b) | (a & ~b) to XOR with rand a second
time — the two XORs cancel, so the result is just index, but the IR shows five
bitwise operations with a random constant mixed in.
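You can verify the rewritten XOR against the plain one exhaustively over the two possible indices (the function name is hypothetical):

```python
import random

M32 = (1 << 32) - 1   # model i32 two's-complement bitwise ops

def obfuscated_index(cond: bool, rand: int) -> int:
    idx = int(not cond)           # zext(not(cond)): 0 if true, 1 if false
    xor1 = idx ^ rand
    not_xor1 = xor1 ^ M32         # ~xor1 in i32
    not_rand = rand ^ M32         # ~rand in i32
    # a ^ b == (~a & b) | (a & ~b), applied to re-XOR with rand
    return (not_xor1 & rand) | (xor1 & not_rand)

random.seed(0)
for _ in range(1000):
    r = random.getrandbits(32)
    assert obfuscated_index(True, r) == 0
    assert obfuscated_index(False, r) == 1
```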
Before and after:
; Before: conditional branch
%cond = icmp sgt i32 %x, 0
br i1 %cond, label %then, label %else
; After: indirect branch through obfuscated jump table
%cond = icmp sgt i32 %x, 0
; build jump table on the stack
%sibr.table = alloca [2 x ptr]
%slot0 = getelementptr [2 x ptr], ptr %sibr.table, i32 0, i32 0
store ptr blockaddress(@func, %then), ptr %slot0 ; true target at [0] (condition is inverted below)
%slot1 = getelementptr [2 x ptr], ptr %sibr.table, i32 0, i32 1
store ptr blockaddress(@func, %else), ptr %slot1 ; false target at [1]
; obfuscated index computation
%not.cond = xor i1 %cond, true ; invert
%idx = zext i1 %not.cond to i32 ; 0 or 1
%xor1 = xor i32 %idx, 839201457 ; mix with random
%not.xor1 = xor i32 %xor1, -1 ; ~xor1
%not.rand = xor i32 839201457, -1 ; ~rand
%left = and i32 %not.xor1, 839201457 ; (~xor1 & rand)
%right = and i32 %xor1, %not.rand ; (xor1 & ~rand)
%final = or i32 %left, %right ; == idx
; load target and jump
%tgt.ptr = getelementptr [2 x ptr], ptr %sibr.table, i32 0, i32 %final
%tgt = load ptr, ptr %tgt.ptr
indirectbr ptr %tgt, [label %then, label %else]
Impact: Linear-sweep and recursive-descent disassemblers both choke on indirectbr.
IDA shows jmp rax with no cross-references. The jump table is on the stack, not in
.rodata, so table-recovery heuristics do not find it. And the index computation looks
like it depends on a runtime value even though it is a deterministic function of the
branch condition.
Alias Access: The Pointer Maze
Local variables — alloca instructions — are the easiest things in LLVM IR to
understand. They have a name, a type, and a fixed set of loads and stores. Alias Access
makes all of that go away.
The pass builds a multi-level indirection graph:
- Raw nodes. Original allocas are packed into randomly-structured structs with ptr-typed padding fields. An alloca that was i32 might end up at index 5 of a 7-field struct where the other 6 fields are pointer-sized noise.
- Transition nodes. 3 × N additional structs are created, each a 6-slot pointer struct ({ ptr, ptr, ptr, ptr, ptr, ptr }). Each slot randomly points to another transition node or to a raw node, forming a directed graph.
- Getter functions. One private function per slot index (0–5) is generated: __obfu_aa_getter_N(ptr) → ptr, which loads slot N from a transition struct and returns the pointer.
- Access replacement. Every use of an original alloca is replaced with a chain of getter calls that traverse the graph from a transition node to the raw node, followed by a GEP to the element position within the struct.
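The traversal that replaced accesses perform can be sketched as a search over the node graph. Class and helper names here are hypothetical; the wiring uses two edges, trans1 slot 2 to trans0 and trans0 slot 4 to raw, with the noise edges the real pass would add left empty so the search stays deterministic:

```python
SLOTS = 6

class Node:
    def __init__(self, kind: str):
        self.kind = kind                 # "raw" or "transition"
        self.slots = [None] * SLOTS

raw, trans0, trans1 = Node("raw"), Node("transition"), Node("transition")
trans1.slots[2] = trans0
trans0.slots[4] = raw

def getter_chain(start, target, depth=4, path=()):
    """DFS for a slot-index path from a transition node to a raw node:
    the chain of __obfu_aa_getter_N calls the pass would emit."""
    if start is target:
        return list(path)
    if start is None or start.kind != "transition" or depth == 0:
        return None
    for i, nxt in enumerate(start.slots):
        found = getter_chain(nxt, target, depth - 1, path + (i,))
        if found is not None:
            return found
    return None

assert getter_chain(trans1, raw) == [2, 4]   # getter_2 then getter_4
```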
Before and after:
; Before: simple alloca usage
%x = alloca i32
store i32 42, ptr %x
%val = load i32, ptr %x
; After: struct indirection through getter chain
; raw node: %x packed into a struct at index 3
%raw.0 = alloca { ptr, ptr, ptr, i32, ptr }
; transition nodes wired together
%trans.0 = alloca { ptr, ptr, ptr, ptr, ptr, ptr }
%trans.1 = alloca { ptr, ptr, ptr, ptr, ptr, ptr }
; edge: trans.1 slot 2 → trans.0
%t1.s2 = getelementptr { ptr, ptr, ptr, ptr, ptr, ptr }, ptr %trans.1, i32 0, i32 2
store ptr %trans.0, ptr %t1.s2
; edge: trans.0 slot 4 → raw.0
%t0.s4 = getelementptr { ptr, ptr, ptr, ptr, ptr, ptr }, ptr %trans.0, i32 0, i32 4
store ptr %raw.0, ptr %t0.s4
; ... (other edges omitted)
; access %x: traverse trans.1 → slot 2 → trans.0 → slot 4 → raw.0 → index 3
%p0 = call ptr @__obfu_aa_getter_2(ptr %trans.1) ; → trans.0
%p1 = call ptr @__obfu_aa_getter_4(ptr %p0) ; → raw.0
%elem = getelementptr { ptr, ptr, ptr, i32, ptr }, ptr %p1, i32 0, i32 3
store i32 42, ptr %elem
; ... (load follows the same chain)
define private ptr @__obfu_aa_getter_2(ptr %node) {
%slot = getelementptr { ptr, ptr, ptr, ptr, ptr, ptr }, ptr %node, i32 0, i32 2
%val = load ptr, ptr %slot
ret ptr %val
}
Impact: Type recovery — the bread and butter of decompilers like Hex-Rays and Ghidra — fails completely. Every local variable is buried inside a random struct at a random offset, accessed through a chain of pointer dereferences that looks different at every use site. Stack variable correlation, the technique decompilers use to give names and types to locals, sees a forest of unrelated pointer loads.
Custom Calling Convention
This is the simplest pass in the set, and possibly the most cost-effective in terms of damage per line of code.
Polaris randomly assigns a non-standard calling convention to every internal or private function, selected from a pool of seven:
| Convention | What Changes |
|---|---|
| fastcc | Arguments in registers, tail-call eligible |
| coldcc | Callee-saved everything, optimized for unlikely paths |
| preserve_mostcc | Almost all registers callee-saved |
| preserve_allcc | All registers callee-saved |
| x86_regcallcc | Microsoft __regcall — up to 16 register args |
| x86_64_sysvcc | System V AMD64 ABI |
| win64cc | Windows x64 ABI |
The pass then scans the module and updates every call site to match:
; Before
define internal void @helper(i32 %x) {
...
}
call void @helper(i32 %arg)
; After
define internal fastcc void @helper(i32 %x) {
...
}
call fastcc void @helper(i32 %arg)
Impact: Every decompiler assumes the platform ABI by default. When @helper uses
fastcc but the decompiler assumes win64cc, the register allocation looks wrong, the
stack frame layout is misinterpreted, arguments appear in the wrong positions, and
return values go missing. The function "decompiles" but the output is semantically
garbage. Multiply this across every internal function in the module and the decompiler
output becomes unreliable everywhere.
Merge Function: Erasing Boundaries
Function boundaries are the highest-value landmarks in a binary. Remove them and the reverse engineer loses the ability to reason about the program in units smaller than "everything."
Merge Function works in three phases:
Phase 1 — Wrap. Every function is wrapped in a void-returning shell. Non-void
functions get an extra ptr parameter; the wrapper stores the return value through that
pointer instead of returning it. The original function's body is replaced with a thin
stub that calls the wrapper and loads the result.
Phase 2 — Merge. All wrappers are cloned into a single function
__merged_function(i32 %selector, ...). The first parameter is a selector index; the
remaining parameters are the union of all wrapper parameters (unused slots filled with
undef). A switch on %selector dispatches to the correct cloned entry block.
Phase 3 — Replace. Every call site in the module is rewritten to call
__merged_function with the appropriate selector and arguments at the correct offsets.
The wrapper functions are erased.
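The dispatcher the three phases produce is easy to model in Python. Names are illustrative, and lists stand in for the ptr return slots:

```python
def merged_function(selector, a0=None, b0=None, ret0=None,
                    a1=None, b1=None, ret1=None):
    """Void dispatcher: results are written through return-slot cells,
    never returned. Unused parameter slots receive None (undef)."""
    if selector == 0:
        ret0[0] = a0 + b0       # cloned body of @add
    elif selector == 1:
        ret1[0] = a1 * b1       # cloned body of @mul

cell = [None]                   # stands in for the ptr %ret.alloca
merged_function(0, 3, 4, cell)              # was: add(3, 4)
assert cell[0] == 7
merged_function(1, a1=5, b1=6, ret1=cell)   # was: mul(5, 6)
assert cell[0] == 30
```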
Before and after:
; Before: two simple functions
define i32 @add(i32 %a, i32 %b) {
%r = add i32 %a, %b
ret i32 %r
}
define i32 @mul(i32 %a, i32 %b) {
%r = mul i32 %a, %b
ret i32 %r
}
; call sites
%sum = call i32 @add(i32 3, i32 4)
%prod = call i32 @mul(i32 5, i32 6)
; After: single merged dispatcher
define void @__merged_function(i32 %selector, i32 %a0, i32 %b0, ptr %ret0,
i32 %a1, i32 %b1, ptr %ret1) {
entry:
switch i32 %selector, label %default [
i32 0, label %add.entry
i32 1, label %mul.entry
]
add.entry:
%r0 = add i32 %a0, %b0
store i32 %r0, ptr %ret0
br label %return
mul.entry:
%r1 = mul i32 %a1, %b1
store i32 %r1, ptr %ret1
br label %return
return:
ret void
default:
unreachable
}
; call sites become:
%ret.alloca = alloca i32
call void @__merged_function(i32 0, i32 3, i32 4, ptr %ret.alloca,
i32 undef, i32 undef, ptr undef)
%sum = load i32, ptr %ret.alloca
call void @__merged_function(i32 1, i32 undef, i32 undef, ptr undef,
i32 5, i32 6, ptr %ret.alloca)
%prod = load i32, ptr %ret.alloca
Impact: Function names, boundaries, and signatures are gone. The decompiler sees
one enormous function with a switch statement and a parameter list that makes no sense.
Cross-referencing "who calls add" returns "one function calls __merged_function
with selector 0" — which is useless without knowing the selector mapping. Combine this
with Indirect Call and the selector itself is masked behind pointer arithmetic.
Stacking the Layers
Each Polaris pass independently defeats a different class of analysis tool. Together, they form five defense layers:
- Control flow — Bogus Control Flow + Flattening + Indirect Branch. The CFG is flattened into a switch dispatcher, the state values are encrypted, the branches are indirect, and the predicates require whole-function data-flow analysis to resolve. Both IDA's recursive descent and Ghidra's decompiler produce nonsense.
- Data access — Alias Access + Global Encryption. Local variables are buried in pointer mazes and globals are per-function stack copies decrypted through a shared helper. Type recovery fails and memory access patterns become opaque.
- ABI — Custom Calling Convention. Register allocation and stack layout assumptions are wrong for every internal function. Decompiler output looks syntactically valid but is semantically incorrect.
- Symbol erasure — Merge Function + Indirect Call. Function boundaries, names, and call-graph edges are gone. The binary looks like one function calling itself through pointer arithmetic.
- Machine code — X86RubbishCodePass. After instruction selection, junk instructions fill dead registers, instruction substitutions obscure operands, and dirty bytes between call/ret splits break linear disassembly. This layer operates below the IR — none of the above passes can see or undo it.
No single tool defeats all five layers simultaneously. An SMT solver can handle the predicates but not the pointer maze. A dynamic tracer can follow the control flow but not reconstruct the function boundaries. A decompiler can guess at types but gets the ABI wrong.
Below the IR: X86RubbishCodePass
Everything above operates on LLVM IR — the platform-independent representation that
sits above instruction selection. Polaris has one more trick: a MachineFunctionPass
that runs after instruction selection, on the final x86 machine instructions (MIR).
At this level, the obfuscation is invisible to IR-level analysis because it doesn't
exist yet when the IR passes run.
X86RubbishCodePass activates per-function via an inline-asm marker
("backend-obfu") and applies three transformations:
1. Junk instruction insertion. The pass walks every instruction and computes
liveness — which registers and flags are dead at each point. It then inserts
random x86 instructions that write only to dead registers: add, sub, xor,
shr, neg, not, rdrand, cmp, test, memory loads from the stack, and more.
Because the destinations are provably dead, the junk cannot affect program behavior.
But a disassembler sees a dense wall of arithmetic that looks indistinguishable from
real computation.
2. Instruction substitution. mov reg, imm is split into two operations
(mov reg, A; add reg, B where A + B = imm, or xor/sub variants). mov reg, reg
is replaced with a temp-register triangle. Memory operands are recomputed through
arithmetic on a temporary register — the base+index+offset addressing is reconstructed
via mov, shl, add/sub, and neg, then the original operand is rewritten to use
the computed pointer. The original instruction is erased.
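The immediate-split substitution is simple to model (helper names are hypothetical; the real pass also emits xor and sub variants):

```python
import random

M64 = (1 << 64) - 1

def split_mov_imm(imm: int, rng: random.Random):
    """mov reg, imm -> mov reg, A; add reg, B with (A + B) mod 2^64 == imm."""
    a = rng.getrandbits(64)
    b = (imm - a) & M64
    return [("mov", a), ("add", b)]

def run(ops):
    """Tiny interpreter for the two-instruction sequence."""
    reg = 0
    for op, val in ops:
        reg = val if op == "mov" else (reg + val) & M64
    return reg

rng = random.Random(0)
assert run(split_mov_imm(0x65BAED59, rng)) == 0x65BAED59
```

Neither emitted constant equals the original immediate, so scanning the binary for the literal value finds nothing.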
3. Control flow obfuscation. Basic blocks are split at random points. Some
splits use call/ret pairs instead of jumps — the pass pushes a fake return address
via add qword ptr [rsp], <offset> and issues ret, making the control flow look
like function calls. Between the call and the real continuation, raw .byte
directives inject garbage data that the disassembler tries to decode as instructions,
producing nonsense. Finally, blocks are randomly reordered so the linear layout no
longer matches execution order.
Before and After
A trivial function compiled at -O1 without the pass:
add:
lea eax, [rcx+rdx] ; return a + b
ret
The same function with X86RubbishCodePass active (addresses omitted for
readability):
add:
neg r8b ; junk — r8 is dead
sub r11b, r10b ; junk — r11 is dead
sub rsp, 0 ; junk — nop on stack pointer
dec r8d ; junk
call .split_block ; control flow split via call
; --- dirty bytes: raw data injected between call and continuation ---
mov dword ptr [rbp+2*rdi+0x540D94F6], 0x65BAED59 ; garbage decode
and esp, esp ; garbage decode
insb ; garbage decode
db 67h, 3Fh ; undecoded bytes
sar dword ptr [rbp+41h] ; garbage decode
retf ; garbage decode (not a real return)
; --- actual continuation (reached via return address fixup) ---
add rsp, 0 ; junk
sbb r10, r8 ; junk
ret ; real return from add
.split_block:
add r8d, dword ptr [rsp] ; fixup: adjust return address
movabs rax, 0 ; memory address computation
xor r9d, r9d ; junk
sub rax, 0x45 ; \
mov r11, rsp ; | recomputed memory operand
neg r11 ; | (replaces original addressing)
sub rax, r11 ; /
sub rax, 0 ; junk
add qword ptr [rax+0x45], 0x16 ; return address fixup
clc ; junk — flags are dead
cmp dx, 0xB18 ; junk — flags are dead
lea eax, [rcx+rdx] ; ← the real computation: a + b
shr r11d, 0xB9 ; junk
ret ; return to caller (via fixed address)
Two instructions became 30+, with the real lea eax, [rcx+rdx] buried among junk
operations. The control flow is split via call/ret with dirty bytes in between
that create false instructions when disassembled linearly.
What IDA Pro Sees
IDA's Hex-Rays decompiler produces this for main calling the obfuscated add:
__int64 __fastcall main(int argc, const char **argv, const char **envp)
{
__int64 v5; // rbp
__int64 v6; // rdi
sub_140001033(3, 4);
*(_DWORD *)(v5 + 2 * v6 + 1410176246) = 1706749273;
__asm { insb }
JUMPOUT(0x140001021LL);
}
The function returns 7. The decompiler doesn't know that. It fell through the
call into dirty bytes, decoded garbage as a memory write and an insb, then hit
undecoded bytes and emitted JUMPOUT. The variables v5 and v6 are phantom
registers from the garbage decode — they were never initialized.
Building It
The pass requires a custom LLVM build — it uses private X86 backend headers and
must be compiled into the X86 code generator, not loaded as a plugin. We ported
Polaris's X86RubbishCodePass from LLVM 16 to LLVM 21 and verified it builds and
runs correctly. The port required exactly one API fix:
// LLVM 16: getRegSizeInBits() returns unsigned
return TRI->getRegSizeInBits(*RC);
// LLVM 21: getRegSizeInBits() returns TypeSize — extract the fixed value
return TRI->getRegSizeInBits(*RC).getFixedValue();
The remaining APIs — MachineFunctionPass, LivePhysRegs, BuildMI,
MachineBasicBlock::splitAt, INITIALIZE_PASS — are unchanged between LLVM 16
and 21.
Quick start (requires Visual Studio 2022 and ~30 minutes build time):
# 1. Clone LLVM 21.1.0
git clone --depth 1 --branch llvmorg-21.1.0 https://github.com/llvm/llvm-project.git
# 2. Apply the patch (adds X86RubbishCode.cpp + hooks it into the build)
cd llvm-project
git apply ../shifting-codes-python-port/patches/x86-rubbish-code-llvm21-1-0.patch
# 3. Build (from a VS Developer Command Prompt)
cmake -S llvm -B build -G Ninja \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DLLVM_ENABLE_PROJECTS=clang \
-DCMAKE_BUILD_TYPE=Release
ninja -C build
The patch is in the patches/ directory of the Shifting.Codes repository. It
modifies four files: it adds the ported X86RubbishCode.cpp (with the
getFixedValue() fix), registers it in CMakeLists.txt, declares it in X86.h, and
hooks it into X86TargetMachine.cpp.
The pass activates on any function containing an INLINEASM with the symbol name
"backend-obfu". In LLVM IR, the marker is:
call void asm sideeffect "backend-obfu", ""()
Functions without this marker are compiled normally.
Using Polaris Passes
If you followed the Pluto article's setup instructions — Python 3.12+, UV, LLVM 21 — you already have everything you need. The Polaris passes are in the same package:
import llvm_nanobind as llvm
from shifting_codes.passes import PassPipeline
from shifting_codes.passes.substitution import SubstitutionPass
from shifting_codes.passes.mba_obfuscation import MBAObfuscationPass
from shifting_codes.passes.bogus_control_flow import BogusControlFlowPass
from shifting_codes.passes.flattening import FlatteningPass
from shifting_codes.passes.global_encryption import GlobalEncryptionPass
from shifting_codes.passes.indirect_call import IndirectCallPass
from shifting_codes.passes.indirect_branch import IndirectBranchPass
from shifting_codes.passes.alias_access import AliasAccessPass
from shifting_codes.passes.custom_cc import CustomCCPass
from shifting_codes.passes.merge_function import MergeFunctionPass
from shifting_codes.utils.crypto import CryptoRandom
rng = CryptoRandom(seed=42)
with llvm.create_context() as ctx:
    mod = llvm.parse_bitcode_file("your_code.bc", ctx)
    pipeline = PassPipeline()
    # Pluto foundations
    pipeline.add(SubstitutionPass(rng=rng))
    pipeline.add(MBAObfuscationPass(rng=rng))
    # Polaris upgrades
    pipeline.add(BogusControlFlowPass(rng=rng))
    pipeline.add(FlatteningPass(rng=rng))
    pipeline.add(GlobalEncryptionPass(rng=rng))
    pipeline.add(IndirectCallPass(rng=rng))
    # Polaris new passes
    pipeline.add(IndirectBranchPass(rng=rng))
    pipeline.add(AliasAccessPass(rng=rng))
    pipeline.add(CustomCCPass(rng=rng))
    pipeline.add(MergeFunctionPass(rng=rng))
    pipeline.run(mod, ctx)
    mod.write_bitcode_to_file("obfuscated.bc")
Credits
Polaris-Obfuscator — designed and authored by za233. The pass designs — particularly the modular-arithmetic predicates, dominance-based key schedule, and alias access graph — are significantly more sophisticated than their Pluto predecessors. Well worth reading the C++ source.
Pluto — authored by bluesadi. The foundation that Polaris builds on. Clean, well-commented, and still the best starting point for understanding LLVM obfuscation.
llvm-nanobind — the binding library that makes both articles possible. Special thanks to mrexodia for maintaining Python bindings against a C++ API that actively resists being bound.
Note on scope: Polaris ships two MIR-level passes — X86RubbishCodePass and
AArch64RubbishCodePass — that insert junk machine instructions after instruction
selection. Only the X86 version is implemented; the AArch64 pass is a stub. The X86
pass has been ported to LLVM 21 and is documented above; unlike the IR-level passes
(which run via llvm-nanobind in Python), the MIR pass requires a custom LLVM build
because it uses private X86 backend headers.
Shifting.Codes is provided for legitimate use cases including software protection, security research, CTF challenge authoring, and compiler education. The authors make no representations regarding fitness for any particular purpose and accept no liability for any misuse or damages arising from the use of this software. Use is entirely at your own risk and responsibility.
