In the previous article, we walked through Pluto — six LLVM obfuscation passes that turn readable IR into something a decompiler would rather not think about. We ended with a teaser: Pluto's successor, Polaris, pushes the techniques further.
Here is the thing about Pluto's obfuscation, though. A sufficiently motivated analyst with Z3 and an afternoon can unpick a lot of it. The opaque predicates are algebraic identities any solver can discharge. The flattening uses plaintext state constants you can grep out of the binary. The global encryption decrypts at a single site. None of this makes Pluto bad — it makes it a first draft.
Polaris is the second draft. Where Pluto hides individual constructs behind thin disguises, Polaris makes the structural relationships between blocks, variables, and functions unrecoverable. The reverse engineer does not just need a solver — they need to reconstruct dominance trees, key schedules, pointer graphs, and ABI semantics that no longer exist in the output. That is a different category of problem.
What Is Polaris?
Polaris is a suite of LLVM obfuscation passes written by za233 (not bluesadi — different author, different repository). Where Pluto targets LLVM 14.0.6, Polaris targets LLVM 16.0.6 and ships 10 IR-level passes plus 2 MIR/backend passes (X86 and AArch64 "rubbish code" junk-instruction inserters, though only the X86 version is implemented — the AArch64 one is a stub). Pluto's own README points to Polaris as its successor.
Like Pluto, Polaris is frozen in time. LLVM 16 is not LLVM 21, and the C++ API drift means it will not build against a modern toolchain without porting work. The same vintage-car problem, just a slightly newer model year. Shifting.Codes ports 8 of Polaris's IR-level passes to Python, running against LLVM 21 today. (The remaining two — Substitution and LinearMBA — overlap with Pluto's versions already ported in the previous article.)
The Upgrades: Four Passes, Rebuilt
Polaris takes four of Pluto's passes and replaces their weak points. In each case, the transformation is semantically the same — the output still does what the input did — but the obfuscation is qualitatively harder to reverse.
The IR examples below are lightly simplified for readability — actual output uses random constants and generated names — but the instruction sequences and structure match the Python implementation exactly.
Bogus Control Flow: From Algebra Homework to Modular Arithmetic
Pluto's weakness: The opaque predicate y < 10 || x*(x+1) % 2 == 0 is an
algebraic identity. Feed it to Z3, get always true in milliseconds. The dead path
is immediately identifiable.
Polaris's fix: Two i64 allocas (bcf.var and bcf.var0) are initialized to the
same prime value chosen from a table of 1,087 primes (1009–9973). The predicate
bcf.var == bcf.var0 is always true. To keep the invariant alive across blocks,
Polaris inserts a modular arithmetic update before every terminator:
Var0 = ((a * Var0) mod m - b) mod m
The constants a, b, and m are chosen per-block so that a * x_val ≡ x_val + b (mod m)
— meaning the update always maps bcf.var0 back to its initial value x_val. Solving
for a involves the modular inverse of x_val mod m, computed at compile time via the
extended Euclidean algorithm.
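The fixed-point arithmetic can be checked in a few lines of Python. This is a sketch with hypothetical helper names, and Python's `%` of a negative value is non-negative, which glosses over the unsigned `urem` wraparound in the actual IR:

```python
import random

def pick_bcf_constants(x0: int, m: int) -> tuple[int, int]:
    """Pick (a, b) so that ((a * x0) % m - b) % m == x0.

    Choosing a freely and setting b = (a*x0 - x0) % m makes x0 a fixed
    point of the update. Requires x0 < m so that x0 % m == x0.
    """
    assert x0 < m
    a = random.randrange(2, m)
    b = (a * x0 - x0) % m
    return a, b

def bcf_update(var0: int, a: int, b: int, m: int) -> int:
    """The per-block state update inserted before each terminator."""
    return ((a * var0) % m - b) % m

x0, m = 1009, 7919            # initial prime in bcf.var/bcf.var0, modulus
a, b = pick_bcf_constants(x0, m)
assert bcf_update(x0, a, b, m) == x0   # update is a no-op on the invariant
```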
Each processable block is split into head / body / tail, a clone of the body is created, and the head branches on the opaque predicate — real path to body, dead path to clone. The clone unconditionally loops back to body, creating a cycle that never executes but looks plausible to a disassembler.
After BCF (showing one block's transformation):
pos: ; head — opaque branch
%bcf.lhs = load i64, ptr %bcf.var
%bcf.rhs = load i64, ptr %bcf.var0
%bcf.cmp = icmp eq i64 %bcf.lhs, %bcf.rhs ; always true
br i1 %bcf.cmp, label %bcf.body.1, label %bcf.clone.1
bcf.body.1: ; real computation + modular state update
%r1 = mul i32 %x, 2
%bcf.v = load i64, ptr %bcf.var0
%bcf.av = mul i64 911, %bcf.v ; a * Var0
%bcf.avmod = urem i64 %bcf.av, 7919 ; mod m
%bcf.sub = sub i64 %bcf.avmod, 2753 ; - b
%bcf.result = urem i64 %bcf.sub, 7919 ; mod m again
store i64 %bcf.result, ptr %bcf.var0 ; invariant restored
%bcf.cmp2 = icmp eq i64 %bcf.lhs, %bcf.result ; always true
br i1 %bcf.cmp2, label %bcf.tail.1, label %bcf.clone.1
bcf.tail.1:
ret i32 %r1
bcf.clone.1: ; dead block — never reached
%r1.c = mul i32 %x, 2 ; cloned computation
br label %bcf.body.1 ; loops back, looks like real control flow
Why it is harder: A static analyzer must perform whole-function data-flow analysis to track the modular arithmetic chain across every block and prove the invariant holds. Z3 can still solve it in theory, but the solver needs to be told what to solve — and the predicate no longer announces itself as an algebraic identity.
Control Flow Flattening: Encrypted State Dispatch
Pluto's weakness: The dispatch switch uses plaintext case constants. grep the
binary for the state values, match them to switch targets, and the original CFG falls
out.
Polaris's fix: State values are XOR-encrypted with per-block keys derived from the
dominator tree. Each block gets a random key, and a block's effective decryption key
is the XOR of all keys from blocks that dominate it. A __cff_update_key helper
function XORs the key array on first visit, propagating the key schedule as execution
flows through the dominance hierarchy.
The pass also allocates per-block visited flags ([N x i8]) and per-block global
arrays listing which blocks each block dominates, so the key update knows which slots
to XOR.
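The compile-time side of the key schedule can be sketched in Python. The CFG and block names below are hypothetical, and whether a block's own key participates in its effective key is an assumption of this sketch; the real pass derives dominance from LLVM's dominator tree and applies the XORs at runtime via __cff_update_key:

```python
import random
from functools import reduce

# Toy CFG (hypothetical block names): block -> successors
cfg = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
preds = {b: [p for p, ss in cfg.items() if b in ss] for b in cfg}

# Standard iterative dominator computation
dom = {b: set(cfg) for b in cfg}
dom["entry"] = {"entry"}
changed = True
while changed:
    changed = False
    for b in cfg:
        if b == "entry":
            continue
        new = {b} | set.intersection(*(dom[p] for p in preds[b]))
        if new != dom[b]:
            dom[b], changed = new, True

# Random per-block key; a block's effective key is the XOR of the keys
# of every block that dominates it (reflexively, in this sketch)
key = {b: random.getrandbits(32) for b in cfg}
eff = {b: reduce(lambda x, y: x ^ y, (key[d] for d in dom[b])) for b in cfg}

assert eff["loop"] == key["entry"] ^ key["loop"]
```

The point of the construction: eff["loop"] is only correct once "entry" has executed and XORed its key into the slot, so the schedule cannot be read out of the binary statically.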
After flattening (simplified — showing the dispatch and one block):
define i32 @loop_sum(i32 %n) {
entry:
%i.demoted = alloca i32 ; PHI nodes demoted to stack
%acc.demoted = alloca i32
store i32 0, ptr %i.demoted
store i32 0, ptr %acc.demoted
%cff.state = alloca i32 ; encrypted state variable
%cff.keys = alloca [2 x i32] ; per-block XOR key slots
%cff.visited = alloca [2 x i8] ; first-visit flags
store i32 <loop_state>, ptr %cff.state ; initial (unencrypted) state
br label %cff.dispatch
cff.dispatch:
%cff.sw = load i32, ptr %cff.state
switch i32 %cff.sw, label %cff.default [
i32 <loop_state>, label %loop
i32 <exit_state>, label %exit
]
loop:
%i = load i32, ptr %i.demoted
%acc = load i32, ptr %acc.demoted
%i.next = add i32 %i, 1
%acc.next = add i32 %acc, %i
store i32 %i.next, ptr %i.demoted
store i32 %acc.next, ptr %acc.demoted
%done = icmp eq i32 %i.next, %n
; dominance-based key update — XOR key slots for dominated blocks on first visit
call void @__cff_update_key(i8 %vis, i32 1, ptr @.cff.dom.loop_sum.0,
ptr %cff.keys, i32 <key_list[loop]>)
store i8 1, ptr %cff.vptr ; mark visited
; select next encrypted state, XOR-decrypt with this block's key slot
%cff.sel = select i1 %done, i32 <exit_enc>, i32 <loop_enc>
%cff.key = load i32, ptr %cff.kptr
%cff.enc = xor i32 %cff.key, %cff.sel
store i32 %cff.enc, ptr %cff.state
br label %cff.dispatch
; ...
}
define private void @__cff_update_key(i8 %flag, i32 %len, ptr %posArray,
ptr %keyArray, i32 %num) { ... }
Why it is harder: The case constants in the switch are the decrypted states. What
gets stored to cff.state is their XOR with a key that only becomes correct after all
dominating blocks have executed. Recovering the original CFG requires reconstructing
both the dominator tree and the key schedule — and neither is present in the binary as
a static artifact.
Global Encryption: Scattered Decryption
Pluto's weakness: Each encrypted global is decrypted inline at one site — find the XOR loop, find the key, done.
Polaris's fix: Instead of decrypting in place, Polaris creates a stack-local copy
of the global in every function that uses it. Each copy is independently decrypted via
a shared helper function __obfu_globalenc_dec(ptr %data, ptr %key, i64 %len, i64 %keyLen),
which runs a byte-level XOR loop.
Only is_global_constant globals with internal/private linkage are encrypted — a
deliberate safety measure that avoids corrupting mutable globals or externally visible
symbols. The pass uses a random 32-bit key per global, cycling the 4-byte key across
the data via encrypt_bytes().
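The cycling-key XOR is its own inverse, which is why one helper serves for decryption. A minimal sketch (the helper name is illustrative; the key bytes echo the @.genc.key.0 value in the IR example):

```python
def xor_cycling_key(data: bytes, key: bytes) -> bytes:
    """data[i] ^= key[i % keyLen], the same loop as __obfu_globalenc_dec."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = bytes([0xAB, 0xCD, 0xEF, 0x12])      # 4-byte key, one per global
plain = b"secret table contents here!!"
enc = xor_cycling_key(plain, key)          # compile time: encrypt_bytes()
assert xor_cycling_key(enc, key) == plain  # runtime: decrypt the stack copy
```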
After global encryption (showing per-function decryption):
; Global — encrypted at compile time, no longer marked constant
@secret_table = internal global [8 x i32] <encrypted_initializer>
define i32 @lookup(i32 %idx) {
entry:
; stack-local copy of the global
%local = alloca [8 x i32]
; byte-by-byte copy from global to stack
%g0 = load i8, ptr @secret_table
store i8 %g0, ptr %local
%g1 = load i8, ptr getelementptr (i8, ptr @secret_table, i64 1)
store i8 %g1, ptr getelementptr (i8, ptr %local, i64 1)
; ... (32 bytes total)
; decrypt the local copy
call void @__obfu_globalenc_dec(ptr %local, ptr @.genc.key.0, i64 32, i64 4)
; use the decrypted local copy
%ptr = getelementptr [8 x i32], ptr %local, i32 0, i32 %idx
%val = load i32, ptr %ptr
ret i32 %val
}
@.genc.key.0 = private global [4 x i8] c"\AB\CD\EF\12"
define private void @__obfu_globalenc_dec(ptr %data, ptr %key,
i64 %len, i64 %keyLen) {
; data[i] ^= key[i % keyLen] for i in 0..len
...
}
Why it is harder: If three functions use @secret_table, three independent stack
copies exist, each decrypted separately. An analyst cannot set a single breakpoint on
"the decryption" — there are N decryption sites, one per consumer, with no shared
mutable state to intercept.
Indirect Call: Per-Site Masking
Pluto's weakness: Every indirect call loads from a shared global variable per function. Find the GV table, resolve all the pointers, and the original call graph is back.
Polaris's fix: Each call site gets its own private global variable and a unique random 32-bit mask applied via pointer arithmetic:
; Before: direct call
%result = call i32 @helper(i32 %x)
; After: per-site indirect call with pointer masking
%loaded = load ptr, ptr @.indcall.0 ; private GV holding @helper
%int_val = ptrtoint ptr %loaded to i64
%added = add i64 %int_val, 749312581 ; + mask (random per site)
%unmasked = sub i64 %added, 749312581 ; - mask (cancels out)
%func_ptr = inttoptr i64 %unmasked to ptr
%result = call i32 %func_ptr(i32 %x) ; indirect call through recovered pointer
@.indcall.0 = private global ptr @helper
The add/sub cancel out at runtime, but a static analyzer sees an integer
computation producing the call target — it cannot resolve the pointer without
evaluating the arithmetic or recognizing the identity.
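The cancellation holds even when the addition wraps around 2^64. A quick model in Python (the address is hypothetical; the mask matches the IR example above):

```python
M64 = (1 << 64) - 1   # model i64 wraparound

def masked_call_target(ptr: int, mask: int) -> int:
    """The add-then-sub pair the pass emits, both taken mod 2^64."""
    added = (ptr + mask) & M64       # add i64 %int_val, mask
    return (added - mask) & M64      # sub i64 %added, mask

ptr = 0x140001033                    # hypothetical address of @helper
assert masked_call_target(ptr, 749312581) == ptr
assert masked_call_target(M64, 749312581) == M64   # survives overflow
```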
Why it is harder: With Pluto, one global per function means N call sites share one pointer — find it once, resolve them all. With Polaris, N call sites mean N globals and N masks. Each site looks independent. Automated CFG recovery that searches for a function-pointer table comes up empty.
The New Passes: Four Techniques Pluto Never Had
Polaris does not just upgrade existing passes. It introduces four entirely new transformations that attack parts of the binary Pluto never touched: branch targets, local variable layout, calling conventions, and function boundaries.
Indirect Branch: Jump Tables on the Stack
Every conditional br instruction is a gift to a disassembler — it reveals two
successor blocks, and the condition tells you why the program goes one way or the
other. Indirect Branch takes that gift back.
For each branch, the pass allocates a [2 x ptr] array on the stack and stores
blockaddress constants for the true and false targets. The branch condition is
inverted, zero-extended, and then run through an MBA-style obfuscation that
computes the same index via bitwise decomposition:
index = zext(not(cond))
xor1 = index ^ rand
result = (~xor1 & rand) | (xor1 & ~rand) ; == index ^ rand ^ rand == index
This uses the identity a ^ b = (~a & b) | (a & ~b) to XOR with rand a second
time — the two XORs cancel, so the result is just index, but the IR shows five
bitwise operations with a random constant mixed in.
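You can verify the rewritten XOR against the plain one exhaustively over the two possible indices (the function name is hypothetical):

```python
import random

M32 = (1 << 32) - 1   # model i32 two's-complement bitwise ops

def obfuscated_index(cond: bool, rand: int) -> int:
    idx = int(not cond)           # zext(not(cond)): 0 if true, 1 if false
    xor1 = idx ^ rand
    not_xor1 = xor1 ^ M32         # ~xor1 in i32
    not_rand = rand ^ M32         # ~rand in i32
    # a ^ b == (~a & b) | (a & ~b), applied to re-XOR with rand
    return (not_xor1 & rand) | (xor1 & not_rand)

random.seed(0)
for _ in range(1000):
    r = random.getrandbits(32)
    assert obfuscated_index(True, r) == 0
    assert obfuscated_index(False, r) == 1
```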
Before and after:
; Before: conditional branch
%cond = icmp sgt i32 %x, 0
br i1 %cond, label %then, label %else
; After: indirect branch through obfuscated jump table
%cond = icmp sgt i32 %x, 0
; build jump table on the stack
%sibr.table = alloca [2 x ptr]
%slot0 = getelementptr [2 x ptr], ptr %sibr.table, i32 0, i32 0
store ptr blockaddress(@func, %then), ptr %slot0 ; true target at [0] (condition is inverted below)
%slot1 = getelementptr [2 x ptr], ptr %sibr.table, i32 0, i32 1
store ptr blockaddress(@func, %else), ptr %slot1 ; false target at [1]
; obfuscated index computation
%not.cond = xor i1 %cond, true ; invert
%idx = zext i1 %not.cond to i32 ; 0 or 1
%xor1 = xor i32 %idx, 839201457 ; mix with random
%not.xor1 = xor i32 %xor1, -1 ; ~xor1
%not.rand = xor i32 839201457, -1 ; ~rand
%left = and i32 %not.xor1, 839201457 ; (~xor1 & rand)
%right = and i32 %xor1, %not.rand ; (xor1 & ~rand)
%final = or i32 %left, %right ; == idx
; load target and jump
%tgt.ptr = getelementptr [2 x ptr], ptr %sibr.table, i32 0, i32 %final
%tgt = load ptr, ptr %tgt.ptr
indirectbr ptr %tgt, [label %then, label %else]
Impact: Linear-sweep and recursive-descent disassemblers both choke on indirectbr.
IDA shows jmp rax with no cross-references. The jump table is on the stack, not in
.rodata, so table-recovery heuristics do not find it. And the index computation looks
like it depends on a runtime value even though it is a deterministic function of the
branch condition.
Alias Access: The Pointer Maze
Local variables — alloca instructions — are the easiest things in LLVM IR to
understand. They have a name, a type, and a fixed set of loads and stores. Alias Access
makes all of that go away.
The pass builds a multi-level indirection graph:
- Raw nodes. Original allocas are packed into randomly-structured structs with ptr-typed padding fields. An alloca that was i32 might end up at index 5 of a 7-field struct where the other 6 fields are pointer-sized noise.
- Transition nodes. 3 × N additional structs are created, each a 6-slot pointer struct ({ ptr, ptr, ptr, ptr, ptr, ptr }). Each slot randomly points to another transition node or to a raw node, forming a directed graph.
- Getter functions. One private function per slot index (0–5) is generated: __obfu_aa_getter_N(ptr) → ptr, which loads slot N from a transition struct and returns the pointer.
- Access replacement. Every use of an original alloca is replaced with a chain of getter calls that traverse the graph from a transition node to the raw node, followed by a GEP to the element position within the struct.
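The traversal that replaced accesses perform can be sketched as a search over the node graph. Class and helper names here are hypothetical; the wiring uses two edges, trans1 slot 2 to trans0 and trans0 slot 4 to raw, with the noise edges the real pass would add left empty so the search stays deterministic:

```python
SLOTS = 6

class Node:
    def __init__(self, kind: str):
        self.kind = kind                 # "raw" or "transition"
        self.slots = [None] * SLOTS

raw, trans0, trans1 = Node("raw"), Node("transition"), Node("transition")
trans1.slots[2] = trans0
trans0.slots[4] = raw

def getter_chain(start, target, depth=4, path=()):
    """DFS for a slot-index path from a transition node to a raw node:
    the chain of __obfu_aa_getter_N calls the pass would emit."""
    if start is target:
        return list(path)
    if start is None or start.kind != "transition" or depth == 0:
        return None
    for i, nxt in enumerate(start.slots):
        found = getter_chain(nxt, target, depth - 1, path + (i,))
        if found is not None:
            return found
    return None

assert getter_chain(trans1, raw) == [2, 4]   # getter_2 then getter_4
```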
Before and after:
; Before: simple alloca usage
%x = alloca i32
store i32 42, ptr %x
%val = load i32, ptr %x
; After: struct indirection through getter chain
; raw node: %x packed into a struct at index 3
%raw.0 = alloca { ptr, ptr, ptr, i32, ptr }
; transition nodes wired together
%trans.0 = alloca { ptr, ptr, ptr, ptr, ptr, ptr }
%trans.1 = alloca { ptr, ptr, ptr, ptr, ptr, ptr }
; edge: trans.1 slot 2 → trans.0
%t1.s2 = getelementptr { ptr, ptr, ptr, ptr, ptr, ptr }, ptr %trans.1, i32 0, i32 2
store ptr %trans.0, ptr %t1.s2
; edge: trans.0 slot 4 → raw.0
%t0.s4 = getelementptr { ptr, ptr, ptr, ptr, ptr, ptr }, ptr %trans.0, i32 0, i32 4
store ptr %raw.0, ptr %t0.s4
; ... (other edges omitted)
; access %x: traverse trans.1 → slot 2 → trans.0 → slot 4 → raw.0 → index 3
%p0 = call ptr @__obfu_aa_getter_2(ptr %trans.1) ; → trans.0
%p1 = call ptr @__obfu_aa_getter_4(ptr %p0) ; → raw.0
%elem = getelementptr { ptr, ptr, ptr, i32, ptr }, ptr %p1, i32 0, i32 3
store i32 42, ptr %elem
; ... (load follows the same chain)
define private ptr @__obfu_aa_getter_2(ptr %node) {
%slot = getelementptr { ptr, ptr, ptr, ptr, ptr, ptr }, ptr %node, i32 0, i32 2
%val = load ptr, ptr %slot
ret ptr %val
}
Impact: Type recovery — the bread and butter of decompilers like Hex-Rays and Ghidra — fails completely. Every local variable is buried inside a random struct at a random offset, accessed through a chain of pointer dereferences that looks different at every use site. Stack variable correlation, the technique decompilers use to give names and types to locals, sees a forest of unrelated pointer loads.
Custom Calling Convention
This is the simplest pass in the set, and possibly the most cost-effective in terms of damage per line of code.
Polaris randomly assigns a non-standard calling convention to every internal or private function, selected from a pool of seven:
| Convention | What Changes |
|---|---|
| fastcc | Arguments in registers, tail-call eligible |
| coldcc | Callee-saved everything, optimized for unlikely paths |
| preserve_mostcc | Almost all registers callee-saved |
| preserve_allcc | All registers callee-saved |
| x86_regcallcc | Microsoft __regcall — up to 16 register args |
| x86_64_sysvcc | System V AMD64 ABI |
| win64cc | Windows x64 ABI |
The pass then scans the module and updates every call site to match:
; Before
define internal void @helper(i32 %x) {
...
}
call void @helper(i32 %arg)
; After
define internal fastcc void @helper(i32 %x) {
...
}
call fastcc void @helper(i32 %arg)
Impact: Every decompiler assumes the platform ABI by default. When @helper uses
fastcc but the decompiler assumes win64cc, the register allocation looks wrong, the
stack frame layout is misinterpreted, arguments appear in the wrong positions, and
return values go missing. The function "decompiles" but the output is semantically
garbage. Multiply this across every internal function in the module and the decompiler
output becomes unreliable everywhere.
Merge Function: Erasing Boundaries
Function boundaries are the highest-value landmarks in a binary. Remove them and the reverse engineer loses the ability to reason about the program in units smaller than "everything."
Merge Function works in three phases:
Phase 1 — Wrap. Every function is wrapped in a void-returning shell. Non-void
functions get an extra ptr parameter; the wrapper stores the return value through that
pointer instead of returning it. The original function's body is replaced with a thin
stub that calls the wrapper and loads the result.
Phase 2 — Merge. All wrappers are cloned into a single function
__merged_function(i32 %selector, ...). The first parameter is a selector index; the
remaining parameters are the union of all wrapper parameters (unused slots filled with
undef). A switch on %selector dispatches to the correct cloned entry block.
Phase 3 — Replace. Every call site in the module is rewritten to call
__merged_function with the appropriate selector and arguments at the correct offsets.
The wrapper functions are erased.
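The dispatcher the three phases produce is easy to model in Python. Names are illustrative, and lists stand in for the ptr return slots:

```python
def merged_function(selector, a0=None, b0=None, ret0=None,
                    a1=None, b1=None, ret1=None):
    """Void dispatcher: results are written through return-slot cells,
    never returned. Unused parameter slots receive None (undef)."""
    if selector == 0:
        ret0[0] = a0 + b0       # cloned body of @add
    elif selector == 1:
        ret1[0] = a1 * b1       # cloned body of @mul

cell = [None]                   # stands in for the ptr %ret.alloca
merged_function(0, 3, 4, cell)              # was: add(3, 4)
assert cell[0] == 7
merged_function(1, a1=5, b1=6, ret1=cell)   # was: mul(5, 6)
assert cell[0] == 30
```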
Before and after:
; Before: two simple functions
define i32 @add(i32 %a, i32 %b) {
%r = add i32 %a, %b
ret i32 %r
}
define i32 @mul(i32 %a, i32 %b) {
%r = mul i32 %a, %b
ret i32 %r
}
; call sites
%sum = call i32 @add(i32 3, i32 4)
%prod = call i32 @mul(i32 5, i32 6)
; After: single merged dispatcher
define void @__merged_function(i32 %selector, i32 %a0, i32 %b0, ptr %ret0,
i32 %a1, i32 %b1, ptr %ret1) {
entry:
switch i32 %selector, label %default [
i32 0, label %add.entry
i32 1, label %mul.entry
]
add.entry:
%r0 = add i32 %a0, %b0
store i32 %r0, ptr %ret0
br label %return
mul.entry:
%r1 = mul i32 %a1, %b1
store i32 %r1, ptr %ret1
br label %return
return:
ret void
default:
unreachable
}
; call sites become:
%ret.alloca = alloca i32
call void @__merged_function(i32 0, i32 3, i32 4, ptr %ret.alloca,
i32 undef, i32 undef, ptr undef)
%sum = load i32, ptr %ret.alloca
call void @__merged_function(i32 1, i32 undef, i32 undef, ptr undef,
i32 5, i32 6, ptr %ret.alloca)
%prod = load i32, ptr %ret.alloca
Impact: Function names, boundaries, and signatures are gone. The decompiler sees
one enormous function with a switch statement and a parameter list that makes no sense.
Cross-referencing "who calls add" returns "one function calls __merged_function
with selector 0" — which is useless without knowing the selector mapping. Combine this
with Indirect Call and the selector itself is masked behind pointer arithmetic.
Stacking the Layers
Each Polaris pass independently defeats a different class of analysis tool. Together, they form five defense layers:
- Control flow — Bogus Control Flow + Flattening + Indirect Branch. The CFG is flattened into a switch dispatcher, the state values are encrypted, the branches are indirect, and the predicates require whole-function data-flow analysis to resolve. Both IDA's recursive descent and Ghidra's decompiler produce nonsense.
- Data access — Alias Access + Global Encryption. Local variables are buried in pointer mazes and globals are per-function stack copies decrypted through a shared helper. Type recovery fails and memory access patterns become opaque.
- ABI — Custom Calling Convention. Register allocation and stack layout assumptions are wrong for every internal function. Decompiler output looks syntactically valid but is semantically incorrect.
- Symbol erasure — Merge Function + Indirect Call. Function boundaries, names, and call-graph edges are gone. The binary looks like one function calling itself through pointer arithmetic.
- Machine code — X86RubbishCodePass. After instruction selection, junk instructions fill dead registers, instruction substitutions obscure operands, and dirty bytes between call/ret splits break linear disassembly. This layer operates below the IR — none of the above passes can see or undo it.
No single tool defeats all five layers simultaneously. An SMT solver can handle the predicates but not the pointer maze. A dynamic tracer can follow the control flow but not reconstruct the function boundaries. A decompiler can guess at types but gets the ABI wrong.
Below the IR: X86RubbishCodePass
Everything above operates on LLVM IR — the platform-independent representation that
sits above instruction selection. Polaris has one more trick: a MachineFunctionPass
that runs after instruction selection, on the final x86 machine instructions (MIR).
At this level, the obfuscation is invisible to IR-level analysis because it doesn't
exist yet when the IR passes run.
X86RubbishCodePass activates per-function via an inline-asm marker
("backend-obfu") and applies three transformations:
1. Junk instruction insertion. The pass walks every instruction and computes
liveness — which registers and flags are dead at each point. It then inserts
random x86 instructions that write only to dead registers: add, sub, xor,
shr, neg, not, rdrand, cmp, test, memory loads from the stack, and more.
Because the destinations are provably dead, the junk cannot affect program behavior.
But a disassembler sees a dense wall of arithmetic that looks indistinguishable from
real computation.
2. Instruction substitution. mov reg, imm is split into two operations
(mov reg, A; add reg, B where A + B = imm, or xor/sub variants). mov reg, reg
is replaced with a temp-register triangle. Memory operands are recomputed through
arithmetic on a temporary register — the base+index+offset addressing is reconstructed
via mov, shl, add/sub, and neg, then the original operand is rewritten to use
the computed pointer. The original instruction is erased.
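The immediate-split substitution is simple to model (helper names are hypothetical; the real pass also emits xor and sub variants):

```python
import random

M64 = (1 << 64) - 1

def split_mov_imm(imm: int, rng: random.Random):
    """mov reg, imm -> mov reg, A; add reg, B with (A + B) mod 2^64 == imm."""
    a = rng.getrandbits(64)
    b = (imm - a) & M64
    return [("mov", a), ("add", b)]

def run(ops):
    """Tiny interpreter for the two-instruction sequence."""
    reg = 0
    for op, val in ops:
        reg = val if op == "mov" else (reg + val) & M64
    return reg

rng = random.Random(0)
assert run(split_mov_imm(0x65BAED59, rng)) == 0x65BAED59
```

Neither emitted constant equals the original immediate, so scanning the binary for the literal value finds nothing.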
3. Control flow obfuscation. Basic blocks are split at random points. Some
splits use call/ret pairs instead of jumps — the pass pushes a fake return address
via add qword ptr [rsp], <offset> and issues ret, making the control flow look
like function calls. Between the call and the real continuation, raw .byte
directives inject garbage data that the disassembler tries to decode as instructions,
producing nonsense. Finally, blocks are randomly reordered so the linear layout no
longer matches execution order.
Before and After
A trivial function compiled at -O1 without the pass:
add:
lea eax, [rcx+rdx] ; return a + b
ret
The same function with X86RubbishCodePass active (addresses omitted for
readability):
add:
neg r8b ; junk — r8 is dead
sub r11b, r10b ; junk — r11 is dead
sub rsp, 0 ; junk — nop on stack pointer
dec r8d ; junk
call .split_block ; control flow split via call
; --- dirty bytes: raw data injected between call and continuation ---
mov dword ptr [rbp+2*rdi+0x540D94F6], 0x65BAED59 ; garbage decode
and esp, esp ; garbage decode
insb ; garbage decode
db 67h, 3Fh ; undecoded bytes
sar dword ptr [rbp+41h] ; garbage decode
retf ; garbage decode (not a real return)
; --- actual continuation (reached via return address fixup) ---
add rsp, 0 ; junk
sbb r10, r8 ; junk
ret ; real return from add
.split_block:
add r8d, dword ptr [rsp] ; fixup: adjust return address
movabs rax, 0 ; memory address computation
xor r9d, r9d ; junk
sub rax, 0x45 ; \
mov r11, rsp ; | recomputed memory operand
neg r11 ; | (replaces original addressing)
sub rax, r11 ; /
sub rax, 0 ; junk
add qword ptr [rax+0x45], 0x16 ; return address fixup
clc ; junk — flags are dead
cmp dx, 0xB18 ; junk — flags are dead
lea eax, [rcx+rdx] ; ← the real computation: a + b
shr r11d, 0xB9 ; junk
ret ; return to caller (via fixed address)
Two instructions became 30+, with the real lea eax, [rcx+rdx] buried among junk
operations. The control flow is split via call/ret with dirty bytes in between
that create false instructions when disassembled linearly.
What IDA Pro Sees
IDA's Hex-Rays decompiler produces this for main calling the obfuscated add:
__int64 __fastcall main(int argc, const char **argv, const char **envp)
{
__int64 v5; // rbp
__int64 v6; // rdi
sub_140001033(3, 4);
*(_DWORD *)(v5 + 2 * v6 + 1410176246) = 1706749273;
__asm { insb }
JUMPOUT(0x140001021LL);
}
The function returns 7. The decompiler doesn't know that. It fell through the
call into dirty bytes, decoded garbage as a memory write and an insb, then hit
undecoded bytes and emitted JUMPOUT. The variables v5 and v6 are phantom
registers from the garbage decode — they were never initialized.
Building It
The pass requires a custom LLVM build — it uses private X86 backend headers and
must be compiled into the X86 code generator, not loaded as a plugin. We ported
Polaris's X86RubbishCodePass from LLVM 16 to LLVM 21 and verified it builds and
runs correctly. The port required exactly one API fix:
// LLVM 16: getRegSizeInBits() returns unsigned
return TRI->getRegSizeInBits(*RC);
// LLVM 21: getRegSizeInBits() returns TypeSize — extract the fixed value
return TRI->getRegSizeInBits(*RC).getFixedValue();
The remaining APIs — MachineFunctionPass, LivePhysRegs, BuildMI,
MachineBasicBlock::splitAt, INITIALIZE_PASS — are unchanged between LLVM 16
and 21.
Quick start (requires Visual Studio 2022 and ~30 minutes build time):
# 1. Clone LLVM 21.1.0
git clone --depth 1 --branch llvmorg-21.1.0 https://github.com/llvm/llvm-project.git
# 2. Apply the patch (adds X86RubbishCode.cpp + hooks it into the build)
cd llvm-project
git apply ../shifting-codes-python-port/patches/x86-rubbish-code-llvm21-1-0.patch
# 3. Build (from a VS Developer Command Prompt)
cmake -S llvm -B build -G Ninja \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DLLVM_ENABLE_PROJECTS=clang \
-DCMAKE_BUILD_TYPE=Release
ninja -C build
The patch is in the patches/ directory of the Shifting.Codes repository. It
modifies four files: it adds the ported X86RubbishCode.cpp (with the
getFixedValue() fix), registers it in CMakeLists.txt, declares it in X86.h, and
hooks it into X86TargetMachine.cpp.
The pass activates on any function containing an INLINEASM with the symbol name
"backend-obfu". In LLVM IR, the marker is:
call void asm sideeffect "backend-obfu", ""()
Functions without this marker are compiled normally.
Using Polaris Passes
If you followed the Pluto article's setup instructions — Python 3.12+, UV, LLVM 21 — you already have everything you need. The Polaris passes are in the same package:
import llvm_nanobind as llvm
from shifting_codes.passes import PassPipeline
from shifting_codes.passes.substitution import SubstitutionPass
from shifting_codes.passes.mba_obfuscation import MBAObfuscationPass
from shifting_codes.passes.bogus_control_flow import BogusControlFlowPass
from shifting_codes.passes.flattening import FlatteningPass
from shifting_codes.passes.global_encryption import GlobalEncryptionPass
from shifting_codes.passes.indirect_call import IndirectCallPass
from shifting_codes.passes.indirect_branch import IndirectBranchPass
from shifting_codes.passes.alias_access import AliasAccessPass
from shifting_codes.passes.custom_cc import CustomCCPass
from shifting_codes.passes.merge_function import MergeFunctionPass
from shifting_codes.utils.crypto import CryptoRandom
rng = CryptoRandom(seed=42)
with llvm.create_context() as ctx:
    mod = llvm.parse_bitcode_file("your_code.bc", ctx)
    pipeline = PassPipeline()
    # Pluto foundations
    pipeline.add(SubstitutionPass(rng=rng))
    pipeline.add(MBAObfuscationPass(rng=rng))
    # Polaris upgrades
    pipeline.add(BogusControlFlowPass(rng=rng))
    pipeline.add(FlatteningPass(rng=rng))
    pipeline.add(GlobalEncryptionPass(rng=rng))
    pipeline.add(IndirectCallPass(rng=rng))
    # Polaris new passes
    pipeline.add(IndirectBranchPass(rng=rng))
    pipeline.add(AliasAccessPass(rng=rng))
    pipeline.add(CustomCCPass(rng=rng))
    pipeline.add(MergeFunctionPass(rng=rng))
    pipeline.run(mod, ctx)
    mod.write_bitcode_to_file("obfuscated.bc")
Credits
Polaris-Obfuscator — designed and authored by za233. The pass designs — particularly the modular-arithmetic predicates, dominance-based key schedule, and alias access graph — are significantly more sophisticated than their Pluto predecessors. Well worth reading the C++ source.
Pluto — authored by bluesadi. The foundation that Polaris builds on. Clean, well-commented, and still the best starting point for understanding LLVM obfuscation.
llvm-nanobind — the binding library that makes both articles possible. Special thanks to mrexodia for maintaining Python bindings against a C++ API that actively resists being bound.
Note on scope: Polaris ships two MIR-level passes — X86RubbishCodePass and
AArch64RubbishCodePass — that insert junk machine instructions after instruction
selection. Only the X86 version is implemented; the AArch64 pass is a stub. The X86
pass has been ported to LLVM 21 and is documented above; unlike the IR-level passes
(which run via llvm-nanobind in Python), the MIR pass requires a custom LLVM build
because it uses private X86 backend headers.
Shifting.Codes is provided for legitimate use cases including software protection, security research, CTF challenge authoring, and compiler education. The authors make no representations regarding fitness for any particular purpose and accept no liability for any misuse or damages arising from the use of this software. Use is entirely at your own risk and responsibility.
