Call-Stack Laundering: Registration-Free COM as an Execution Primitive

The Problem

In 2019 I wrote about Registration-Free COM loading as a way for operators to avoid registry writes and sidestep the LoadLibrary + GetProcAddress combo that EDRs flag. The core technique holds up, but the detection story was incomplete in ways that matter operationally. This post rebuilds the topic from the ground up, corrects those gaps, and introduces three loading variants that span the tradeoff space between simplicity and forensic stealth.


What Registration-Free COM Is

An in-process COM server (a DLL) is normally made discoverable by writing its CLSID under HKCR\CLSID\{...}\InprocServer32, typically via regsvr32. Out-of-process EXE servers register under LocalServer32 instead. When a client calls CoGetClassObject, the COM runtime reads that key, finds the server binary, loads it, and returns the factory interface.

Registration-Free COM replaces the registry lookup with a Windows Activation Context (SxS manifest). The CLSID-to-file mapping lives in an XML manifest rather than the registry. The manifest can be a file on disk or embedded directly into the binary as a Win32 resource.

<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <assemblyIdentity type="win32" name="Server" version="1.0.0.0" />
  <file name="Server.exe">
    <comClass clsid="{5d8a7d33-059f-418a-8d77-5f3944d63b6d}" threadingModel="Both" />
  </file>
</assembly>

No registry key is written. No elevation is required. The manifest above maps the CLSID to Server.exe — meaning the host can resolve a COM factory from itself (as we will see in Case A), or point to a DLL on disk.

flowchart LR
    subgraph trad["Traditional COM"]
        direction TB
        T1["Client\nCoGetClassObject"] --> T2["Registry\nHKCR\\CLSID\\{...}\\InprocServer32"]
        T2 --> T3["combase.dll\nLoadLibraryExW(server.dll)"]
    end
    subgraph regfree["Registration-Free COM"]
        direction TB
        R1["Client\nCoGetClassObject"] --> R2["SxS Manifest\nCLSID → file mapping"]
        R2 --> R3["combase.dll\nLoadLibraryExW(server.dll)"]
    end

The Contract: One Header, Two Independent Binaries

Before the loading variants, the architecture matters. The entire technique rests on a shared interface definition:

// The CLSID carrier — __uuidof(Det) is what CoGetClassObject takes
struct __declspec(uuid("5d8a7d33-059f-418a-8d77-5f3944d63b6d")) Det;

// Detonation type — controls what Detonate() simulates
enum DetonationType : DWORD
{
    Det_MessageBox = 0x1,   // show a message box — baseline simulation action
};

// Versioned context passed to Detonate(). nullptr = use defaults.
// Caller sets cbSize = sizeof(DetContext); implementor reads only fields within that size.
// Matches the Windows API cbSize convention (e.g. STARTUPINFOEX).
struct DetContext
{
    DWORD          cbSize;
    DetonationType dwType;  // v1
};

// The payload interface — what the host calls after loading
struct __declspec(uuid("5a196c0f-e296-4b35-9249-f3d7ad5999fd")) IDet : IUnknown
{
    virtual void __stdcall Detonate(const DetContext* ctx) = 0;
    virtual void __stdcall EndDetonate()                   = 0;
};

// The factory interface — what DllGetClassObject hands back
struct __declspec(uuid("9362f817-85b6-4a80-81de-772c792922ff")) IDetFactory : IUnknown
{
    virtual HRESULT __stdcall CreateDet(IDet** result) = 0;
};

This header is the only compile-time dependency between the host (WinMain) and any payload. The host never names the concrete implementation type. It never calls new Det. It holds only interface pointers and talks through vtables. This is not incidental: it is the property that breaks the static call graph, which is all too visible in a regular DLL load.
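The cbSize convention in DetContext is easiest to see from the implementor's side. What follows is a minimal, portable sketch rather than the project's actual code: DWORD is swapped for std::uint32_t so it compiles off Windows, and HasField and ResolveType are hypothetical helper names that do not appear in the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-ins for the header's types (assumption: DWORD -> uint32_t).
enum DetonationType : std::uint32_t
{
    Det_MessageBox = 0x1,
};

struct DetContext
{
    std::uint32_t  cbSize;
    DetonationType dwType;  // v1
};

// Hypothetical helper: true if the caller's struct is large enough to
// contain a field that ends at byte offset fieldEnd.
inline bool HasField(const DetContext* ctx, std::size_t fieldEnd)
{
    assert(fieldEnd <= sizeof(DetContext));  // field must exist in this build
    return ctx != nullptr && ctx->cbSize >= fieldEnd;
}

// Hypothetical implementor-side resolution: a nullptr context or a struct
// too short to hold dwType falls back to the default action instead of
// reading past cbSize.
inline DetonationType ResolveType(const DetContext* ctx)
{
    constexpr std::size_t kTypeEnd =
        offsetof(DetContext, dwType) + sizeof(DetonationType);
    return HasField(ctx, kTypeEnd) ? ctx->dwType : Det_MessageBox;
}
```

The point of the pattern is that a newer payload can append fields to DetContext without breaking an older host: the older host reports a smaller cbSize, and the payload simply never reads the fields that lie beyond it.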

The host side reduces to:

CoGetClassObject(__uuidof(Det), CLSCTX_INPROC_SERVER, nullptr,
                 __uuidof(armory), (void**)armory.GetAddressOf());

armory->CreateDet(det.GetAddressOf());

DetContext ctx  = {};
ctx.cbSize      = sizeof(ctx);
ctx.dwType      = Det_MessageBox;

det->Detonate(&ctx);

There is no CALL Det::Detonate instruction anywhere in the host binary. The disassembly at the call site is:

mov  rax, [rcx]        ; load vtable pointer from the IDet object
call [rax + 0x18]      ; jump through vtable slot — target is a runtime value

The payload’s address is resolved at runtime from a vtable filled in by the DLL instance. Static analysis of the host alone cannot follow this call to its target.
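The 0x18 in that disassembly is plain slot arithmetic, and a short sketch makes it explicit. It assumes the MSVC x64 layout, in which virtual functions occupy vtable slots in declaration order, and that IDet derives directly from IUnknown as in the header above. The helper names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// IUnknown always contributes the first three vtable slots:
// QueryInterface, AddRef, Release.
constexpr std::size_t kIUnknownSlots = 3;

// Byte offset of a vtable slot for a given pointer size.
constexpr std::size_t VtableSlotOffset(std::size_t slotIndex, std::size_t pointerSize)
{
    return slotIndex * pointerSize;
}

// Slot index of the n-th method (0-based) that an interface declares after
// IUnknown, assuming single inheritance directly from IUnknown.
constexpr std::size_t SlotAfterIUnknown(std::size_t declaredIndex)
{
    return kIUnknownSlots + declaredIndex;
}

// Detonate is the first method IDet declares, EndDetonate the second.
static_assert(VtableSlotOffset(SlotAfterIUnknown(0), 8) == 0x18, "Detonate on x64");
static_assert(VtableSlotOffset(SlotAfterIUnknown(1), 8) == 0x20, "EndDetonate on x64");
```

Read the other way round, an analyst who knows the interface layout can map call [rax + 0x18] back to "fourth slot of an IUnknown-derived interface", but still has no static way to learn which module supplied the vtable.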


Three Loading Variants

The same IDet contract works across three structurally distinct loading mechanisms. Each trades off stealth properties against operational prerequisites.

Case A — Self-Load via Embedded Manifest

The host binary embeds the manifest as RT_MANIFEST #1:

mt.exe -manifest CaseA.manifest -outputresource:"Server.exe";#1

RT_MANIFEST #1 is the standard EXE manifest resource. Windows creates an activation context from it automatically at process start — no CreateActCtx call is needed. When CoGetClassObject runs, the activation context maps the CLSID to Server.exe, and combase.dll calls:

LoadLibraryExW("Server.exe")        ← second load of the same file
GetProcAddress(hMod, "DllGetClassObject")

Two things happen that matter for analysis. First, LoadLibraryExW loads the binary a second time, rebased by ASLR to a different address (e.g. EXE at 0x00400000, DLL copy at 0x6FD00000). The payload runs from the DLL copy’s address range. Any hook, tracer, or instrumentation anchored to the EXE’s address range misses the execution entirely.

This tripped me up when I first worked through it. The standard mental model is that if a DLL is already loaded, a second LoadLibrary call on the same file just increments the reference count and hands back the same HMODULE — no second mapping, no new address. That model is correct for DLL-to-DLL re-loading. It does not apply here. The OS loads Server.exe as the process image — a distinct load type that the loader tracks separately from LoadLibrary-managed modules. When combase.dll later calls LoadLibraryExW("Server.exe"), the loader finds no existing LoadLibrary refcount entry for it and performs a fresh NtMapViewOfSection. ASLR assigns a new base. The result is two independent mappings of the same file: the process image at its original address, and the DLL copy somewhere else entirely. Read-only sections (.text, .rdata) are backed by the same on-disk section object — no physical memory is duplicated — but the two mappings live at separate virtual addresses with separate PEB module list entries.

The stealth property this creates is precise: any instrumentation anchored to the process image’s address range — inline hooks, breakpoints, page guards — does not cover the DLL copy. Detonate() executes from the DLL mapping. The hook range never overlaps it.

There is a second, quieter bonus here. Both mappings are backed by the on-disk file. Modern EDRs and memory scanners specifically flag unbacked executable memory — private allocations that contain a PE header or shellcode but have no corresponding file on disk. That is the primary detection signal for in-memory loaders and shellcode runners. The Case A DLL copy does not trigger it: the mapping has a backing file (Server.exe), passes the image-backed check, and looks to a memory scanner like any other legitimately-loaded module. More on this in the disk vs memory section below.

Second, the call stack at the moment LoadLibraryExW fires is:

KernelBase!LoadLibraryExW          ← the actual load
combase!CInprocServer::Load        ← called from a signed MS DLL
combase!CoGetClassObject           ← the only thing WinMain touched
Host!WinMain

LoadLibraryExW is not present in Server.exe’s import table. It is not called by any code in the binary. The EDR heuristic “untrusted binary called LoadLibrary” does not fire because the binary that called it is combase.dll.

This is call-stack laundering: the suspicious operation (LoadLibraryExW) exists and is visible, but its attributed caller is a signed Microsoft DLL.

The procmon stack at the Load Image event for Payload.dll confirms this. Frames are bottom-up (oldest first):

44  Server.exe      WinMain + 0xd7   Host.cpp(23)        ← last host frame; CoGetClassObject call
43  combase.dll     CoGetClassObject + 0x4a              ← entry from host
42  combase.dll     CoGetClassObject + 0xcb4
41  combase.dll     RoGetActivatableClassRegistration
    ... (19 combase frames — COM's internal activation machinery) ...
30  combase.dll     CoGetClassObject + 0x19fc
29  combase.dll     RoGetActivationFactory
25  combase.dll     PropVariantClear + 0x2d3f            ← nearest export symbol; actual fn is CInprocServer::Load
24  KernelBase.dll  LoadLibraryExW + 0x156               ← the load; attributed to KernelBase, called by combase
 0  ntoskrnl.exe    NtMapViewOfSection                   ← kernel ImageLoad — fires regardless

Frame 44 is the last Server.exe frame. Every frame after it in this listing belongs to a signed Microsoft module. LoadLibraryExW is called by combase.dll, not by the host binary. The frames after frame 24 (ntdll and the kernel, down to frame 0) confirm that ETW ImageLoad still fires: the kernel is caller-agnostic.
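From the defender's side, the difference between "who called LoadLibraryExW" and "who ultimately asked for the load" can be sketched as two attribution heuristics over a resolved stack. This is an illustration, not any vendor's logic: the Frame struct, the function names, and the name-based system-module list are all assumptions made for this example, and a real pipeline would rely on signature and path checks rather than module names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One resolved stack frame. Frames are ordered innermost (most recent) first.
struct Frame
{
    std::string module;  // e.g. "KernelBase.dll"
    std::string symbol;  // e.g. "LoadLibraryExW"
};

// Hypothetical allow-list for this sketch only.
inline bool IsSystemModule(const std::string& m)
{
    return m == "ntoskrnl.exe" || m == "ntdll.dll" ||
           m == "KernelBase.dll" || m == "combase.dll";
}

// Index of the LoadLibraryExW frame, or -1 if it is absent.
inline int FindLoadFrame(const std::vector<Frame>& stack)
{
    for (std::size_t i = 0; i < stack.size(); ++i)
        if (stack[i].symbol == "LoadLibraryExW") return static_cast<int>(i);
    return -1;
}

// Heuristic 1: attribute the load to the frame that called LoadLibraryExW.
inline std::string ImmediateCaller(const std::vector<Frame>& stack)
{
    const int i = FindLoadFrame(stack);
    if (i < 0 || static_cast<std::size_t>(i) + 1 >= stack.size()) return "";
    return stack[static_cast<std::size_t>(i) + 1].module;
}

// Heuristic 2: keep walking outward until a non-system module appears.
inline std::string FirstNonSystemCaller(const std::vector<Frame>& stack)
{
    const int i = FindLoadFrame(stack);
    if (i < 0) return "";
    for (std::size_t j = static_cast<std::size_t>(i) + 1; j < stack.size(); ++j)
        if (!IsSystemModule(stack[j].module)) return stack[j].module;
    return "";
}
```

Fed the abbreviated stack shown earlier, the first heuristic answers combase.dll and the second answers Server.exe. The laundering works against the first kind of rule. The second kind recovers the host, at the price of also attributing every legitimate COM client to its own loads.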

sequenceDiagram
    box rgba(200,50,50,0.1) Direct Load (EDR sees host as caller)
        participant HostD as loader.exe (WinMain)
        participant KBD as KernelBase.dll
        participant DLLD as payload.dll
    end
    box rgba(50,150,50,0.1) COM Load (EDR sees combase as caller)
        participant Host as Server.exe (WinMain)
        participant COM as combase.dll
        participant KB as KernelBase.dll
        participant DLL as Payload.dll
    end
    HostD->>KBD: LoadLibraryExW("payload.dll")
    Note over HostD,KBD: ❌ host in call stack, EDR fires
    KBD->>DLLD: map into process
    HostD->>DLLD: GetProcAddress("RunPayload")
    DLLD-->>HostD: fn ptr
    HostD->>DLLD: call fn ptr
    Host->>COM: CoGetClassObject(CLSID)
    Note over COM: 19 combase frames
    COM->>KB: LoadLibraryExW("Payload.dll")
    Note over COM,KB: ✅ combase in call stack, host not attributed
    KB->>DLL: map into process
    Note over KB: kernel ETW ImageLoad fires regardless
    COM->>DLL: GetProcAddress("DllGetClassObject")
    DLL-->>COM: IDetFactory*
    COM-->>Host: IDetFactory*
    Host->>DLL: Detonate() via call [rax+0x18]

The module-definition (.def) exports that let combase.dll find DllGetClassObject:

EXPORTS
DllGetClassObject    PRIVATE
DllRegisterServer    PRIVATE
DllUnregisterServer  PRIVATE

PRIVATE keeps them out of the .lib import library while remaining reachable via GetProcAddress. DllRegisterServer and DllUnregisterServer are no-op stubs — regsvr32 can probe them without error, and no registry write occurs.

Case A bakes the payload at compile time — both its strength and its constraint. When execution is static, the behavior known and fixed, Case A is the cleanest option: single binary, nothing to stage, no operational dependency on a second artifact being in the right place. The moment that changes, the costs become harder to ignore. The payload can’t be swapped without recompiling the host, whereas Cases B and C load whatever DLL the manifest points to at runtime — one Server.exe, different payloads per objective. The self-load pattern is also the most distinctive fingerprint Case A leaves: an EXE whose RT_MANIFEST #1 registers itself as a COM server is unusual in legitimate software, the CLSID-to-filename mapping is immediately visible to any PE tool that reads the standard manifest slot, and reversing Server.exe exposes the payload implementation directly. In Cases B and C the host binary contains no payload code — an analyst reversing the host gets nothing. When any of those tradeoffs matter, Case A stops being the right tool. That is the natural push toward Case B.
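The self-load fingerprint can be checked mechanically once the manifest text has been pulled out of RT_MANIFEST #1 (extraction is not shown here). The following is a rough sketch under stated assumptions: RegistersSelfAsComServer is a hypothetical name, and the function uses naive substring scanning rather than a real XML parser, so it only handles manifests shaped like the ones in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Lower-cases ASCII so file names compare case-insensitively.
inline std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if the manifest text contains a <file name="..."> element whose name
// matches hostFileName and which encloses a <comClass> entry.
inline bool RegistersSelfAsComServer(const std::string& manifestXml,
                                     const std::string& hostFileName)
{
    const std::string host = ToLower(hostFileName);
    const std::string key  = "name=\"";
    std::size_t pos = 0;
    while ((pos = manifestXml.find("<file ", pos)) != std::string::npos)
    {
        const std::size_t end = manifestXml.find("</file>", pos);
        if (end == std::string::npos) break;
        const std::string element = manifestXml.substr(pos, end - pos);

        const std::size_t nameStart = element.find(key);
        if (nameStart != std::string::npos)
        {
            const std::size_t valueStart = nameStart + key.size();
            const std::size_t valueEnd   = element.find('"', valueStart);
            if (valueEnd != std::string::npos)
            {
                const std::string name = element.substr(valueStart, valueEnd - valueStart);
                if (ToLower(name) == host &&
                    element.find("<comClass") != std::string::npos)
                    return true;
            }
        }
        pos = end;  // continue scanning after this element
    }
    return false;
}
```

Run against the Case A manifest with the host's own file name it returns true. Against the Case B fragment, which names Payload.dll, it returns false, and that is exactly the signal Case B removes.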

Case B — External DLL via Dynamic Disk Manifest

Case B builds on the same COM activation architecture, but replaces the embedded manifest with one supplied at runtime. The mechanism is ActCtxRuntime, a thin RAII wrapper around four Win32 functions: CreateActCtx parses the manifest XML into an in-memory activation context structure, ActivateActCtx pushes it onto the thread’s activation context stack, DeactivateActCtx pops it on scope exit, and ReleaseActCtx frees the context. While the context is active, any COM resolution on that thread, including CoGetClassObject, consults it before falling back to the registry. The scope is deliberately narrow: the context wraps only the CoGetClassObject call, lives for milliseconds, and leaves no persistent state anywhere.

The host accepts a manifest path at runtime and activates it around CoGetClassObject:

ActCtxRuntime actCtx(argv[2]);   // CreateActCtx + ActivateActCtx
CoGetClassObject(__uuidof(Det), CLSCTX_INPROC_SERVER, ...);
// ActCtxRuntime dtor: DeactivateActCtx + ReleaseActCtx

The manifest on disk maps the same CLSID to Payload.dll:

<file name="Payload.dll">
  <comClass clsid="{5d8a7d33-059f-418a-8d77-5f3944d63b6d}" threadingModel="Both" />
</file>

combase.dll resolves Payload.dll relative to the manifest’s directory and loads it. The call stack at load time is identical to Case A. The loaded DLL exports the same three symbols and implements the same IDet / IDetFactory interfaces.

The forensic difference from Case A:

| Signal | Case A | Case B |
|---|---|---|
| <comClass> in RT_MANIFEST #1 | Present (extractable statically) | Absent |
| DLL name anywhere in host binary | Server.exe (self, anomalous) | Nowhere |
| Self-loading anomaly | Present | Absent (normal COM client pattern) |

Case B’s call stack is indistinguishable from any legitimate application loading a COM add-in. Excel loading an in-process COM server produces the same shape.

The payload DLL defines no DllMain, so no author-written code runs at DLL_PROCESS_ATTACH. Whether the loader calls anything at all depends on how the DLL is linked. With /NOENTRY there is no entry point, and LdrpCallInitRoutine is never invoked for this module. Without /NOENTRY, the MSVC CRT still supplies its own entry point (_DllMainCRTStartup), which the loader calls even though no payload logic runs there. In the /NOENTRY build, EDRs that instrument LdrpCallInitRoutine to intercept DLL initialization (a common userspace hook site for detecting injected or side-loaded code) receive no callback. Either way the DLL is fully mapped and executable, and the first payload code to run is DllGetClassObject, called directly by combase.dll.

Case C — Embedded Manifest at Non-Standard Resource ID

Case C eliminates the persistent on-disk manifest of Case B while keeping the external DLL. A second manifest is embedded in Server.exe at RT_MANIFEST #2:

mt.exe -manifest CaseC.manifest -outputresource:"Server.exe";#2

Resource #2 is not the standard EXE manifest. Windows does not process it automatically. Tools that report only the process manifest at RT_MANIFEST #1, sigcheck among them, do not show it. It stays out of view unless the analyst or tool specifically enumerates all resource entries.

At runtime, the host extracts resource #2, writes it briefly to the DLL directory as a temp file, calls CreateActCtx against that file (which parses the manifest into memory structures), then immediately deletes the temp file before CoGetClassObject runs:

Process start         →  no manifest file on disk
WriteFile             →  ~actctx.manifest written briefly to DLL directory
CreateActCtx returns  →  manifest parsed into memory; ~actctx.manifest deleted
CoGetClassObject      →  no manifest file on disk anywhere

The activation context is parsed into memory at CreateActCtx time. The file is not needed again. No artifact remains on disk when the load occurs.

The directory supplied at runtime serves two purposes simultaneously: it becomes the assembly root for <file> resolution (so combase.dll constructs directory\Payload.dll as the full path), and it is where the temp manifest must be written. These conditions cannot be separated.

| Signal | Case A | Case B | Case C |
|---|---|---|---|
| <comClass> visible to standard tools | Yes (#1) | N/A (file) | No (#2 only) |
| Manifest file on disk during load | None | Required | None |
| DLL name in host binary | Server.exe | None | In #2 resource (non-standard location) |
| Self-load anomaly | Yes | No | No |
| DLL directory in host binary | N/A | None | None (runtime input only) |

How each case sources its activation context before CoGetClassObject runs:

flowchart TD
    START([Server.exe starts]) --> ARG{switch}
    ARG -->|Case A: /A| CA_CTX[RT_MANIFEST #1 auto-activated at process start]
    ARG -->|Case B: /B path| CB_CTX[CreateActCtx from disk manifest file]
    ARG -->|Case C: /C dir| CC_RES[FindResource: extract RT_MANIFEST #2 from binary]
    CC_RES --> CC_WRITE[WriteFile: dir/~actctx.manifest written briefly]
    CC_WRITE --> CC_ACT[CreateActCtx: manifest parsed into memory]
    CC_ACT --> CC_DEL[DeleteFile: no manifest on disk]
    CA_CTX --> CGCO[CoGetClassObject]
    CB_CTX --> CGCO
    CC_DEL --> CGCO
    CGCO --> LOAD[combase.dll calls LoadLibraryExW + DllGetClassObject]
    LOAD --> IFACE[IDetFactory → IDet → Detonate via vtable]

Invocation

All three cases use the same binary and the same interface. The switch is a command-line argument:

Server.exe /A                                      # Case A: self-load, no prerequisites
Server.exe /B C:\Code\Armory\Payload\Dist\Payload.manifest  # Case B: external DLL, disk manifest
Server.exe /C C:\Code\Armory\Payload\Dist\         # Case C: external DLL, embedded manifest #2

The host binary (Server.exe) is a reusable dispatch shell. The payload (Payload.dll) is independently deployable and can be swapped without recompiling the host. Both compile against the same Interfaces.h contract and nothing else.


The LoadLibrary Evasion Landscape

With the three cases established, it helps to place this technique in the wider taxonomy of how malware avoids attributable LoadLibrary calls. The approaches cluster into seven families, each attacking a different point in the detection chain.

The seven families

1. Direct LoadLibrary (baseline). The naive case: call LoadLibraryW("payload.dll") directly. Every detection signal fires. Included here as the reference point everything else is measured against.

2. PEB walk + manual export resolution. Walk PEB→Ldr→InLoadOrderModuleList to locate kernel32.dll in memory, parse its export table manually to retrieve LoadLibraryW’s address, then call through that pointer. This removes the IAT entry and the string "LoadLibrary" from the binary. The function is still called, so userspace hooks on it still fire, but static analysis tools find nothing to flag. On x86 the PEB pointer is at fs:[0x30]; on x64, gs:[0x60].

3. String obfuscation / API hashing. Compute a hash of "LoadLibraryW" and resolve it from the export table at runtime rather than importing by name. Variants use XOR encoding or stack-allocated strings. Like PEB walking, this defeats static string and IAT analysis; dynamic hooks remain intact.

4. Direct syscalls (bypass Win32 entirely). Extract the syscall number (SSN) for NtMapViewOfSection from ntdll at runtime, then issue the syscall instruction directly. The file is mapped as an image-backed section into the process without LoadLibraryExW ever executing. Userspace hooks on all Ldr* and LoadLibrary* functions are bypassed entirely. The kernel ImageLoad callback (PsSetLoadImageNotifyRoutine) still fires, because it is kernel-generated and agnostic to how the section was mapped. The technique is also detectable by syscall call-origin mismatch.

5. Manual PE mapping / reflective loading. Allocate memory, copy the PE manually, fix relocations, resolve imports, call DllMain directly. No LoadLibraryExW, no LdrpLoadDll, no loader involvement at all. The module does not appear in the PEB module list unless deliberately inserted. ETW ImageLoad may not fire if the mapping is done as anonymous memory rather than a file-backed section. The entire Windows loader is reimplemented in userspace.

6. Trusted module delegation (this project). Cause a trusted, signed OS component to call LoadLibraryExW on the attacker’s behalf. The load happens normally: it goes through the Windows loader, creates a PEB module list entry, and generates an ETW ImageLoad event. LdrpCallInitRoutine runs only if the DLL has an entry point (see footnote 1 under the matrix below). The attributed caller in the call stack, however, is the trusted module, not the host binary. LoadLibrary is absent from the host’s IAT because the host never calls it.

7. Module stomping. Locate a DLL already loaded in the target process, overwrite its .text section with new code, and redirect execution there. No load event fires because the module is already present. The PEB module list entry exists and shows the original legitimate DLL’s path. ETW sees no new ImageLoad. The cost is that the overwritten module’s legitimate functionality is destroyed, and memory forensics will detect the mismatch between the on-disk image and the in-memory content.


Detection signal matrix

| Technique | IAT entry | Host in load call stack | LdrpCallInitRoutine fires | ETW ImageLoad | PEB module list entry | Benign prevalence |
|---|---|---|---|---|---|---|
| Direct LoadLibrary | 🔴 Yes | 🔴 Yes | 🔴 Yes | 🔴 Yes | 🔴 Yes | High |
| PEB walk + manual resolve | 🟢 No | 🔴 Yes | 🔴 Yes | 🔴 Yes | 🔴 Yes | Very low |
| String obfuscation / hashing | 🟢 No | 🔴 Yes | 🔴 Yes | 🔴 Yes | 🔴 Yes | Low |
| Direct syscalls | 🟢 No | 🟢 No | 🟡 Partial | 🔴 Yes | 🟡 Partial | Very low |
| Manual PE mapping | 🟢 No | 🟢 No | 🟢 No | 🟡 Partial | 🟢 No | Very low |
| Trusted module delegation (COM) | 🟢 No | 🟢 No | 🟢 No¹ | 🔴 Yes | 🔴 Yes | Very high |
| Module stomping | 🟢 No | 🟢 No | 🟢 No | 🟢 No | 🔴 Yes² | Very low |

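¹ The payload DLL in Cases B and C defines no DllMain, so no payload code runs during DLL initialization. If the DLL is also linked with /NOENTRY, it has no entry point and LdrpCallInitRoutine is not invoked for it; otherwise only the CRT startup stub runs there.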

² The PEB entry shows the legitimate DLL’s name, not the payload. Memory forensics will detect the content mismatch between on-disk and in-memory images.

Why the bottom-left quadrant matters

The matrix reveals a pattern: every other technique that removes the host from the load call stack (rows 4, 5, and 7) has very low benign prevalence. In the commodity threat actor profile, direct syscalls, reflective loading, and module stomping appear almost exclusively in malicious tooling; legitimate security software and packers are exceptions rather than the norm. An EDR that alerts on those signals generally fires with high fidelity against that population.

Trusted module delegation occupies the only cell that combines low host attribution with very high benign prevalence. COM loading via combase.dll is what Office add-ins, browser extension hosts, shell extensions, and in-process COM servers all do as a matter of routine. In environments with meaningful COM usage, an EDR alerting on “DLL loaded via CoGetClassObject” without additional context is working against a very noisy signal — the load pattern alone is not enough to distinguish the technique from legitimate software.

The technique does not hide the load. It makes the load indistinguishable from normal COM usage at the signal layer where most automated detection operates. The kernel still sees everything — ImageLoad fires, the module appears in the PEB. The question is whether the detection pipeline can distinguish a registration-free COM load of an unsigned DLL from the thousands of legitimate COM loads happening in the same process lifetime. That is a behavioral and context problem, not a visibility problem.


Detection Ledger: What Fires, What Doesn’t

The 2019 post framed the value as “avoiding the LoadLibrary + GetProcAddress combo.” That framing is imprecise in a way that matters. LoadLibraryExW does execute — it is called by combase.dll. GetProcAddress also executes — combase.dll calls it to locate DllGetClassObject. Neither appears in the host binary’s import table or source code because the host binary never calls them. They are called by a signed Microsoft DLL on behalf of the host.

What the technique bypasses:

| Detection Method | Bypassed | Mechanism |
|---|---|---|
| IAT scan for LoadLibrary* | Yes | Not in host’s import table |
| Dynamic GetProcAddress("LoadLibrary") pattern | Yes | Never called by host code |
| PEB walk / hash-based API resolution | Yes | Never done |
| “Untrusted module called LoadLibrary” heuristic | Yes | Caller is combase.dll (signed, trusted) |
| Userspace API hook on LoadLibraryA/W in host module | Yes | Hook on host’s address range is never reached |
| EDR hook on DllMain / LdrpCallInitRoutine | Yes | Payload DLL has no DllMain, so no payload code runs at DLL_PROCESS_ATTACH; with /NOENTRY, LdrpCallInitRoutine is not invoked for this load |
| Registry CLSID write (HKCR\CLSID) monitoring | Yes | Activation context replaces registry entirely |
| Two-file artifact pattern (loader.exe + payload.dll) | Yes (Case A) | Single binary, no second artifact |
| Static call graph from WinMain to payload | Yes | No direct call edge; vtable dispatch breaks the graph |

What still fires:

| Detection Method | Still Fires | Why |
|---|---|---|
| Kernel PsSetLoadImageNotifyRoutine | Yes | Fires regardless of which module called LoadLibrary |
| ETW ImageLoad (Microsoft-Windows-Kernel-Process) | Yes | Kernel-generated, caller-agnostic |
| Loaded module characteristics (unsigned, unusual path) | Yes | The DLL’s metadata is still inspectable |
| RT_MANIFEST #1 anomaly analysis (Case A) | Yes | EXE registering itself as a COM server is forensically unusual |
| Payload behavior (CreateFile, WriteFile, network, etc.) | Yes | File system minifilter and network driver see I/O regardless of execution path |
| Static binary analysis (YARA, strings, disassembly) | Yes | The payload code is on disk |
| Memory forensics (pe-sieve, volatility) | Yes | In-memory PE is inspectable |

The technique’s ceiling is userspace and signature-based detection. It is not a bypass for kernel-driver EDRs with ImageLoad callbacks, behavioral analytics that model self-loading as anomalous, or a thorough human analyst. The operational value is not invisibility — it is cost elevation: the binary walks into a sandbox, loads its payload through CoGetClassObject, and the sandbox reports “normal COM usage.” Reconstructing what happened requires correlating the activation context, enumerating RT_MANIFEST #2, and tracing the temp manifest write to the subsequent load event. That is meaningfully more work than following a direct LoadLibrary call.


Disk vs Memory: Why Attackers Use Both

A reasonable objection to any disk-based loading technique is that sophisticated attackers prefer memory-only execution — no file on disk, no ImageLoad ETW event, no artifact for AV to scan. If manual PE mapping removes all the remaining signals that COM delegation still exposes (ETW ImageLoad, PEB module list entry), why use disk at all?

The answer is that the techniques are not alternatives. They are stages.

Why pure memory-only loading is harder than it looks

Reflective loading means reimplementing the Windows PE loader — relocations, import resolution, TLS callbacks, SEH exception directory registration, forwarded exports, delay-load imports. Getting every edge case right across Windows versions and architectures is genuinely non-trivial; a missed relocation or wrong base delta crashes the process silently. By comparison, COM delegation is thirty lines of standard Windows API code that has been shipping in production software for twenty years.

The “no ETW” property of memory-only loading also turns out to require more than just avoiding LoadLibraryExW. It specifically requires mapping into anonymous memory. If NtMapViewOfSection is called against an actual file rather than a memory buffer, the kernel ImageLoad callback fires identically to LoadLibraryExW. Eliminating ETW means anonymous allocation + manual copy + manual mapping, which circles back to the full reflective loader complexity above.

On x64 there is another practical obstacle: structured exception handling requires the .pdata section to be registered with the runtime via RtlAddFunctionTable. Reflective loaders that skip this step crash the moment an exception unwinds through payload code — including exceptions thrown internally by system APIs the payload calls. Registering .pdata re-introduces a detectable artifact: a call to RtlAddFunctionTable with a base address that has no corresponding PEB module list entry.

Detection note — unbacked memory sweep. Newer EDRs scan specifically for executable memory regions with no backing file: private committed pages with execute permission (VirtualAlloc + PAGE_EXECUTE_*). This is a primary signal for shellcode runners and reflective loaders. The disk-loaded DLL in Cases A, B, and C does not produce it — every executable page passes VirtualQuery checks for MEM_IMAGE type and is indistinguishable from any other file-mapped module. File-backed mappings trade AV scan exposure for immunity to the unbacked-memory sweep. Which side of that tradeoff is preferable depends on which detection layer the target environment operates at.
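The unbacked-memory check itself is a small predicate over the fields VirtualQuery returns. Here is a portable sketch of that classification, under a few assumptions: the constants mirror the Windows SDK values so the code compiles off Windows, RegionInfo is a hypothetical subset of MEMORY_BASIC_INFORMATION, and the actual region walk is left out.

```cpp
#include <cassert>
#include <cstdint>

// Values mirror the Windows SDK constants of the same meaning.
constexpr std::uint32_t kMemCommit  = 0x1000;     // MEM_COMMIT
constexpr std::uint32_t kMemPrivate = 0x20000;    // MEM_PRIVATE
constexpr std::uint32_t kMemImage   = 0x1000000;  // MEM_IMAGE

constexpr std::uint32_t kPageExecute          = 0x10;  // PAGE_EXECUTE
constexpr std::uint32_t kPageExecuteRead      = 0x20;  // PAGE_EXECUTE_READ
constexpr std::uint32_t kPageExecuteReadWrite = 0x40;  // PAGE_EXECUTE_READWRITE
constexpr std::uint32_t kPageExecuteWriteCopy = 0x80;  // PAGE_EXECUTE_WRITECOPY

// Hypothetical subset of MEMORY_BASIC_INFORMATION that the check needs.
struct RegionInfo
{
    std::uint32_t state;
    std::uint32_t protect;
    std::uint32_t type;
};

constexpr bool IsExecutable(std::uint32_t protect)
{
    return (protect & (kPageExecute | kPageExecuteRead |
                       kPageExecuteReadWrite | kPageExecuteWriteCopy)) != 0;
}

// True for regions mapped from an image file, as a loaded DLL's pages are.
constexpr bool IsImageBacked(const RegionInfo& r)
{
    return r.type == kMemImage;
}

// True for committed, executable, private pages: the "unbacked" signal.
constexpr bool IsUnbackedExecutable(const RegionInfo& r)
{
    return r.state == kMemCommit && r.type == kMemPrivate && IsExecutable(r.protect);
}
```

A .text page of the COM-loaded DLL is committed, executable and MEM_IMAGE, so it passes. An RWX VirtualAlloc region holding a reflectively mapped PE is committed, executable and MEM_PRIVATE, so it is flagged.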

Detection note — memory forensics. pe-sieve, moneta, and commercial EDR memory scanners flag PE headers in executable regions with no corresponding PEB module list entry, or where in-memory content diverges from the on-disk image. A reflectively-loaded DLL with no PEB entry is more anomalous to a memory scanner than a normally-loaded one — absence of a module entry for an executable region is itself a detection signal.

Finally, a memory-only load evaporates on process exit. If the operator needs capability to survive a restart, something touches disk eventually — the DLL, a scheduled task, a registry run key, a dropper. Disk is unavoidable somewhere in the persistence chain.

The two-stage chain

These constraints produce a natural division of labor:

flowchart TD
    S1["Stage 1: COM delegation (disk)
─────────────────────────────────────
Low signal · indistinguishable from legitimate COM client

Buys: call-stack laundering · no IAT entry · vtable dispatch
Costs: ETW ImageLoad fires · PEB entry created · DLL on disk"]
    S2["Stage 2: Reflective / manual map (memory)
─────────────────────────────────────
No further disk artifact · no ImageLoad for second stage

Buys: scan-resistant · no new PEB entry · no ETW for stage 2
Costs: implementation complexity · RtlAddFunctionTable exposure"]
    S1 -->|"Payload.dll loads · Detonate() executes · hands off in-process"| S2
    style S1 fill:#1a3a1a,stroke:#4caf50,color:#e8f5e9
    style S2 fill:#1a1a3a,stroke:#5c6bc0,color:#e8eaf6

Stage 1 provides legitimacy and call-stack cover. Stage 2 provides scan resistance and eliminates the remaining disk artifact. The stage 1 DLL (Payload.dll) can be small, unsigned, and behaviorally inert — it exists only to receive execution from combase.dll and hand off to the in-memory second stage. Its on-disk content gives analysts nothing because the real capability never touches disk.


End of technical reference. The sections above cover the mechanism in full: what Registration-Free COM is, how the contract and three loading variants work, where the technique sits in the broader evasion landscape, what detection signals fire and which don’t, and why disk-based loading and memory-only loading are stages rather than alternatives. What follows is a more generalised discussion of how this TTP is positioned and exercised operationally — applicable to any implementation of the technique, not only this one.


Red and Purple Team Perspectives

The two functions look at the same technique from different angles. Red team asks: is this good enough to land, and what does it cost me to use it? Purple team asks: does our detection pipeline catch this, at which stage, and what does the answer tell us about our coverage gaps? The technique serves both simultaneously.

Red team: signal value and tradecraft economy

Running this technique on an engagement and observing whether an alert fires returns a precise capability measurement about the defender’s stack:

| Outcome | What it reveals |
|---|---|
| No alert | Userspace detection only; kernel telemetry absent or not alerting on COM loads |
| Alert on ImageLoad, no triage | Telemetry present but behavioral analytics not built for this pattern |
| Alert with fast triage | Mature COM-aware analytics; defender is ahead of commodity tooling |
| Alert with correct kill chain reconstruction | Activation context inspection operational; manifest-to-load correlation works |

A red team engagement that uses only well-known techniques — default Cobalt Strike profiles, standard shellcode injection — only tests whether the defender catches commodity malware. This technique tests a specific, less-commonly-tuned detection surface and returns a calibrated answer about defender maturity on that surface.

There is a second, less obvious reason red teamers prefer this class of technique at stage 0 that has nothing to do with stealth: not burning advanced tradecraft early. Every technique a red team exposes gives the defender something to build a detection for. If stage 0 uses a novel memory injection method, a custom syscall stub, or an undocumented kernel primitive, the defender’s incident response team now has a signature for it. The red team has spent a high-value technique to achieve initial execution on a workstation, and that technique is no longer deniable on future engagements.

COM delegation via registration-free manifests is documented, has prior art, and is already in the threat intelligence corpus. Using it at stage 0 exposes nothing new. The advanced tradecraft — novel injection, custom C2 channel, kernel-level persistence — stays reserved for the objectives that actually require it. The principle is triage inversion: use the minimum capability required to pass each defensive layer, and save headroom for the layers that are actually hard.

Red team: known limitations

The technique’s rough edges — unsigned DLL, inspectable activation context, Case A self-load anomaly — are already catalogued in the Detection Ledger above. The less obvious one is the CLSID: a manifest that references a CLSID with no HKCR entry is detectable by a defender correlating COM loads against known registrations. The cleaner counter is using a CLSID that already exists in the registry and remapping it via the activation context — activation context takes precedence over registry, so the substitution is transparent to the COM runtime while the CLSID itself looks legitimate.

Purple team: the Cases as a detection ladder

Purple team uses the same three cases as a structured escalation exercise rather than a sequential narrative. Each case removes one more artifact; each removal tests whether detection degrades gracefully or fails silently.

Case B  →  disk manifest present, external DLL, loudest artifact profile
Case C  →  no disk manifest, external DLL, manifest hidden in RT_MANIFEST #2
Case A  →  no manifest on disk, no second binary, self-load anomaly only

The exercise runs as a loop: deploy a case, observe what fires in the SIEM/EDR, tune the detection rule, confirm the rule catches the current case, then advance to the next. A detection that catches Case B but not Case C has a manifest-presence dependency — it is looking for the file artifact rather than the load pattern itself. A detection that catches Case C but not Case A has a two-file dependency. Each gap is a precise capability statement.
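The gap reading described above can be written down as a small decision function. This is a minimal sketch, not part of the published tooling: `LadderResult` and `classify_gap` are hypothetical names, and the three booleans are assumed to be filled in by hand from whatever the SIEM/EDR showed after each run.

```cpp
#include <string>

// Hypothetical record of one pass over the ladder: did any rule fire per case?
struct LadderResult {
    bool caught_b;  // Case B: manifest on disk, external DLL
    bool caught_c;  // Case C: manifest embedded as a resource, external DLL
    bool caught_a;  // Case A: no disk manifest, single self-loading binary
};

// Walk the ladder in exercise order (B, C, A) and name the first gap found.
std::string classify_gap(const LadderResult& r) {
    if (!r.caught_b) return "no coverage";                   // loudest case missed
    if (!r.caught_c) return "manifest-presence dependency";  // rule keys on the file artifact
    if (!r.caught_a) return "two-file dependency";           // rule keys on the external DLL
    return "anchored to load pattern";                       // survives every artifact removal
}
```

Calling `classify_gap({true, false, false})` returns "manifest-presence dependency", which is the signal to retune the rule before advancing to Case C again.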

Purple team: isolating the load signal from the payload signal

DetContext::dwType is useful here in a way that isn’t obvious until you’re running the exercise. The detonation type can be varied independently of the loading mechanism. Set dwType = Det_MessageBox and the payload produces a visible, benign, easily-confirmed action — a message box — without generating any file system, network, or process creation events. This isolates the question: does the detection fire on the load event itself, or does it only fire because of downstream payload behavior?

If detection fires on the COM load regardless of what Detonate() does, the detection is anchored to the loading pattern — robust. If detection only fires when the payload writes a file or spawns a process, the detection is anchored to payload behavior — it will miss a behaviorally inert stage 1 loader that hands off to a memory-only stage 2.
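The same reasoning can be captured as a two-trial comparison per loading case. The sketch below is illustrative only: `AnchorTrial` and `classify_anchor` are hypothetical names, and the two booleans are assumed to come from running the same case once with an inert detonation (such as Det_MessageBox) and once with a payload that produces file or process events.

```cpp
#include <string>

// Hypothetical outcome of two runs of the same loading case.
struct AnchorTrial {
    bool fired_with_inert_payload;   // detonation is a benign message box
    bool fired_with_active_payload;  // detonation writes a file or spawns a process
};

// Decide what the detection is anchored to, based on which runs alerted.
std::string classify_anchor(const AnchorTrial& t) {
    if (t.fired_with_inert_payload && t.fired_with_active_payload)
        return "load-anchored";      // fires on the COM load regardless of payload
    if (t.fired_with_active_payload)
        return "payload-anchored";   // would miss an inert stage 1 loader
    if (t.fired_with_inert_payload)
        return "inconsistent";       // unexpected combination: rerun the trial
    return "no detection";
}
```

A "payload-anchored" result for any case is the one worth escalating, since it predicts a miss on a behaviorally inert stage 1.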

Purple team: disk vs memory as a calibration variable

The two-stage chain is not just an attacker construction — it is a detection engineering calibration target. Running the exercise with a benign stage 1 (Det_MessageBox) and a simulated but observable stage 2 produces a precise answer about which layer the defender’s pipeline covers. If detection fires only on stage 2 behavior, the pipeline is anchored to payload actions rather than the load pattern itself — and a behaviorally inert stage 1 loader handing off to memory will pass unobserved. A detection that fires on the COM load itself, before any detonation, is the one that collapses the chain at its earliest and highest-leverage point.


MITRE ATT&CK

| Technique | ID |
| --- | --- |
| System Binary Proxy Execution | T1218 |
| Component Object Model | T1559.001 |
| Reflective Code Loading | T1620 |
| Indirect Command Execution | T1202 |
| Modify Registry (avoided) | T1112 |

This technique does not map cleanly to any single ATT&CK entry — it is the combination that matters. T1559.001 (COM) is the mechanism; T1218 (System Binary Proxy Execution) describes the attribution effect: a signed OS component (combase.dll) performs the load on behalf of the host. T1620 (Reflective Code Loading) captures the vtable-mediated execution where no direct call edge from WinMain to payload code exists in the binary. T1202 (Indirect Command Execution) reflects the broader principle that the host binary’s observable behavior is decoupled from the payload’s actual execution. Defenders correlating on any single technique ID in isolation will miss the chain.
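For defenders, correlating on the combination rather than a single ID might look like the sketch below. It assumes events have already been tagged with ATT&CK technique IDs by an upstream pipeline; `TaggedEvent`, `chain_candidates`, and the `min_distinct` threshold are hypothetical and do not correspond to any particular product's schema.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical event shape: a process ID plus the technique ID a rule assigned.
struct TaggedEvent {
    int pid;
    std::string technique;
};

// Return PIDs (ascending) that show at least `min_distinct` different
// technique IDs from `chain`. Repeats of the same ID count once.
std::vector<int> chain_candidates(const std::vector<TaggedEvent>& events,
                                  const std::set<std::string>& chain,
                                  std::size_t min_distinct) {
    std::map<int, std::set<std::string>> seen;
    for (const auto& e : events) {
        if (chain.count(e.technique)) seen[e.pid].insert(e.technique);
    }
    std::vector<int> out;
    for (const auto& entry : seen) {
        if (entry.second.size() >= min_distinct) out.push_back(entry.first);
    }
    return out;
}
```

With `chain` set to the four IDs discussed above and `min_distinct` set to 2, a process tagged with both T1559.001 and T1218 is returned, while a process tagged only with T1218 is not. The threshold is a tuning choice, not a recommendation.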


Prior Art

Registration-Free COM as a loading primitive has been discovered and rediscovered independently across several years.

2019 — original writeup. The technique was first documented here in the context of avoiding registry writes and the LoadLibrary + GetProcAddress call pattern that first-generation EDRs flagged. That post covered what is now called Case A — the self-loading EXE via RT_MANIFEST #1. The detection analysis in that post was incomplete: it understated what kernel-level telemetry still fires and framed the value as “avoiding LoadLibrary” rather than the more precise “laundering the LoadLibrary call through a signed Microsoft DLL.” This post corrects those gaps and adds Cases B and C as two further primitives along the tradeoff curve.

2019 — Philip Tsukerman, “Activation Contexts: A Love Story”. Tsukerman documented activation context abuse for a different goal: poisoning existing application activation contexts to achieve persistence by redirecting COM server resolution without registry writes. The mechanism (CreateActCtx / activation context override) overlaps with what is described here; the intent differs — persistence hijack versus clean-stack loading. The parallel independent discovery confirms that activation contexts were an underexplored primitive at the time.

2023 — 0xDarkVortex, thread-pool load proxying. A post-2019 refinement targeting the same call-stack attribution problem via a different route: routing LoadLibrary through Windows thread-pool APIs (TpAllocWork / TpPostWork) so that the attributed caller is ntdll!TppWorkerThread rather than host code. The goal is the same, making the suspicious load appear to originate from a trusted system component, but it is achieved without COM. The convergence on this class of problem from multiple independent directions indicates it represents a genuine detection gap in userspace-anchored EDR architectures.

COM hijacking (general). A large body of work exists on abusing registered COM servers for persistence and lateral movement (e.g., HKCU\Software\Classes\CLSID hijacks). That class of technique requires registry writes and targets existing CLSIDs. Registration-Free COM is structurally distinct: no registry write occurs, no existing registration is hijacked, and the CLSID is under operator control.


Summary

Registration-Free COM gives operators a native Windows loading primitive that routes LoadLibraryExW through a signed Microsoft DLL, severs the static call graph from entry point to payload, and requires no registry writes, no second file (Case A), and no elevated privileges. The three loading variants are a tradeoff ladder: Case B has the largest artifact footprint and the simplest setup; Case A has the smallest and the most anomalous self-load behavior; Case C sits between them.

The technique’s ceiling is userspace and signature-based detection: that is the most it evades. Kernel telemetry fires regardless. The value is cost elevation: automated first-pass analysis returns nothing interesting, and reconstruction requires correlating the activation context with the load event, which is work most automated pipelines do not do.

For purple teams, the three cases are not just loading variants — they are a calibration ladder. Each case removes one artifact class, each removal tests whether detection is anchored to the artifact or to the underlying load pattern. The DetContext detonation type provides a clean way to isolate the load signal from payload behavior. A detection that fires on the COM load itself, before any detonation, is the detection that collapses the two-stage chain at its earliest and highest-leverage point.

Code

The source is available on GitHub: dsnezhkov/armory-rfcom