• Assembly Language
• Disassembly
Reverse engineering is a broad topic that extends beyond computing and predates computers. I believe it
was summed up well in Eldad Eilam’s book ‘Reversing: Secrets of Reverse Engineering’ (ref 2):
“Reverse Engineering is the process of extracting the knowledge or design blue-prints from anything
man-made”.
Many people used to take apart radios and other electronic devices, especially when these first came
out and were not widely understood. Modern electronics are usually so miniaturized that there
isn’t much information to be gained by taking them apart. Taking apart modern
technology usefully often requires a better knowledge of the technology than the person who originally
made it.
Reverse engineering computer software is a very similar concept. It requires not only a good
understanding of software engineering but also of the basic structure of software at a very low
level.
There are several reasons to reverse engineer software. The most common reverse engineering jobs
advertised in the industry, by organizations such as GCHQ or Norton, are for malware analysis.
Malware analysis allows interested companies to look into the workings of a piece of malware and create
heuristics for detecting it in the future. Companies such as Norton can then use this analysis
to see whether new malware is trying to circumvent or even disable their anti-virus systems. Other
organizations, such as GCHQ, may reverse engineer the application further in order to trace
its creator. For example, a key-logger will often connect to a remote server in order to provide
the attacker with the logged information. A quick analysis of the software can often reveal the IP
address of this server.
Understanding or improving undocumented software is another reason for reverse engineering.
Gaining an understanding of how a piece of software works can be very informative to
competing software companies. This doesn’t seem to happen as often as one would expect, since it takes
a lot more programming skill to reverse engineer a piece of software and re-create it. It does, however,
often occur when a developer needs to use an existing, undocumented piece of software.
For example, if you have an old library for an Unreal Engine 2 game (like the one used in Unreal
Tournament) you may struggle to find much information on specific functions. In fact, the reason why the
really old Unreal Engine versions are so well documented is that reverse engineers have filled in the
missing information.
Assembler
Assembly language is a human-readable, near one-to-one representation of machine code.
Assemblers also extend this translation with macros and directives in order to make coding easier.
Understanding assembly language makes disassembled machine code much easier to read,
although a disassembled program will not usually yield assembly code that can be reassembled as-is.
In order to research and learn more about assembly I have been programming with Flat Assembler
(FASM). It was of particular interest to me since you can create software which is ten or
more times smaller than the equivalent program created in C++.
Here is a ‘Hello World’ program based on the documentation provided with the FASM assembler.
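format MZ                     ; build a 16-bit MS-DOS (MZ) executable
push cs
pop ds                        ; DS = CS so that DS:DX can address the string
mov ah, 9                     ; DOS function 09h: print a '$'-terminated string
mov dx, hello
int 21h
mov ax, 4C00h                 ; DOS function 4Ch: exit with return code 0
int 21h
hello db 'Hello world!', 24h  ; 24h is the '$' terminator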
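The syntax of the language is essentially one instruction per line, similar to C statements but with the
line break, rather than a semi-colon, ending each one. The instruction mnemonic is followed by its
operands, usually a destination and a source, separated by a comma; each operand is a register, a memory
location or a constant. Unlike in a higher level language, most x86 instructions accept at most one
memory operand, so values are normally moved into CPU registers before being operated on. Operating
directly on memory is possible, but it is slower than working in registers because every access has to
go through the processor caches (L1, L2 etc.) or main memory.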
The most important instructions to recognize when reverse engineering are push and pop,
the different ‘jump’ instructions, call and mov. There are many others, enough to fill
a book’s worth of documentation.
JMP – Jumps over a given number of bytes (a relative jump) or to a different address in memory
CALL – Calls a function. It is essentially the same as a JMP except that the address of the next
instruction is saved on the stack and returned to later on via the RETN instruction.
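To make the ‘jump over a certain amount of bytes’ case concrete, here is a minimal sketch of how the destination of a near relative JMP (opcode E9) or CALL (opcode E8) can be worked out by hand on 32-bit x86. The function name relativeTarget is my own and purely illustrative; the sketch assumes the 5-byte encoding (one opcode byte plus a 32-bit displacement), where the displacement is counted from the address of the following instruction.

```cpp
#include <cstdint>

// Destination of a near relative JMP (E9) or CALL (E8) on 32-bit x86.
// The 32-bit displacement is relative to the address of the NEXT instruction,
// and both encodings are 5 bytes long (1 opcode byte + 4 displacement bytes).
std::uint32_t relativeTarget(std::uint32_t instructionAddress, std::int32_t displacement) {
    constexpr std::uint32_t kInstructionLength = 5;
    // Unsigned arithmetic wraps modulo 2^32, which matches how the CPU
    // handles negative (backward) displacements.
    return instructionAddress + kInstructionLength + static_cast<std::uint32_t>(displacement);
}
```

A displacement of -5 therefore makes the instruction jump back to itself, which is a quick sanity check when reading raw bytes in a hex editor.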
A disassembler is a tool that takes an executable file (by which I mean a sequence of bytes stored on
disk or in memory) and translates the binary bytes (machine code) into assembly code. The assembly code
that is output by the disassembler is formatted so that a reverse engineer can read it.
Example #1
#include <stdio.h>

int main()
{
    printf("hello");
    return 0;
}
By setting a breakpoint on the printf call and then opening Visual Studio’s own disassembly view we can
see the assembly at run time.
Since this is my own application being disassembled while running in debug mode, the
assembly provided is particularly easy to manipulate and even easier to interpret, thanks to Visual Studio
placing the assembly alongside the C++ code.
In order to disassemble a virtual world, the aid of a standalone Disassembler is required. For this
purpose I will be using IDA Pro (version 5.5 is freely available from their website) in order to create an
‘IDA Map’. The IDA map will essentially give the formatting required to make the assembly more
human-readable.
There are other disassembling tools available, such as: MacNosy, a disassembler for Macintosh ROM or
PowerPC machine code; TRACE32, which includes emulation capabilities for multiple processors; and Reverse
Engineering Compiler, which attempts to produce C-like code (but not very well in my personal opinion).
Example #2
In this example I have expanded the first program from example #1 to contain a local variable and an ‘if’
statement. This will allow us to view the ‘flow’ of the program as I like to call it as well as analyzing the
actual assembly code.
#include <stdio.h>

int main()
{
    bool MyBoolean = true;
    if (MyBoolean == true)
    {
        printf("Hello!");
    }
    else
    {
        printf("Goodbye!");
    }
    return 0;
}
This program is definitely a Portable Executable file (as discussed in the following section of this paper).
‘Rename DLL entries’ is selected because I want useful names. (ref 1) “If not checked, IDA makes
repeatable comments for entries imported by ordinals. Otherwise, IDA renames the entries to
meaningful names.” The imports section provides useful information about which functions are called
within the assembly code, so the option for an imports segment is also selected (the imports section is
discussed later on in this paper).
IDA Formatting
By switching to graph view in IDA you can clearly see the ‘flow’ of the program. IDA even highlights the
branches of an IF statement, with the false flow indicated by a red line and the true flow by a green line.
Here you can see the flow of the IF statement from the source code in example #2.
After the JNZ (jump if not zero) instruction the red line flows to
‘Hello’ while the green line flows to ‘Goodbye’. This might seem the opposite of what is expected, but
the ‘true flow’ means that the JNZ condition is true, not that ‘MyBoolean == true’. Changing this JNZ
instruction to JZ (jump if zero) would essentially hack the program into outputting the opposite
message.
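The JNZ to JZ change described above comes down to altering a single byte. The following is a minimal sketch, assuming the compiler emitted the short (8-bit displacement) forms of the jumps, whose opcodes are 0x75 for JNZ and 0x74 for JZ. The function name and the idea of holding the code in a byte vector are my own illustration; the near forms (0F 85 and 0F 84) are deliberately not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Swaps a short JNZ (0x75) for a short JZ (0x74), or the other way round,
// at the given offset. The displacement byte that follows is left untouched.
// Returns false if the byte at `offset` is neither of the two opcodes.
bool invertShortZeroJump(std::vector<std::uint8_t>& code, std::size_t offset) {
    constexpr std::uint8_t kJz = 0x74;
    constexpr std::uint8_t kJnz = 0x75;
    if (offset >= code.size()) return false;
    if (code[offset] != kJz && code[offset] != kJnz) return false;
    code[offset] ^= 0x01;  // the two opcodes differ only in the lowest bit
    return true;
}
```

Refusing to touch any other byte value is a small safety net: patching the wrong offset in a binary usually produces a crash rather than the inverted branch.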
An exe is a file on disk which the operating system recognizes as an executable program.
When the program is opened by the user, a process is created through the CreateProcess(…) API: the
operating system allocates the necessary memory pages, maps the file into them and starts execution at
the program’s entry point. Since this research aims to reverse engineer a Win32 executable file, it’s
important to understand what an executable is at the lowest level.
Reversing a program requires detailed knowledge of what an executable or library contains. Since
Windows-based virtual worlds are the target of my research, it is important to look at the detailed
structure of a Win32 Portable Executable file.
The PE Format
Image taken from Microsoft’s MSDN, article cc301805 by Matt Pietrek (accessed 25-04-14)
The PE file format contains multiple separate layers of data. These layers start with
the MS-DOS header and the MS-DOS program stub. When you open a Portable Executable in a
disassembler or a hex editor that shows the ASCII conversion of the bytes, you will see that the first two
bytes are the letters ‘MZ’, followed a little further on by the text ‘This program cannot be run in
DOS mode’. The ‘MZ’ signature belongs to the MS-DOS header and the message to the MS-DOS stub. The
information stored here provides backward compatibility with the old executable format: if the
executable is run in a DOS environment, the stub simply prints that message and exits.
After the ancient MS-DOS compatibility data come the PE file signature, the PE header and the optional
header. DOS itself never reads them; it only runs the stub and prints the error message described above.
If the program is loaded by a compatible operating system, such as Windows NT, the loader instead
follows a pointer stored at the end of the MS-DOS header (the e_lfanew field), which gives the file
offset of the PE signature and the headers that follow it.
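As a rough illustration of that pointer, the sketch below checks a file image held in memory for the ‘MZ’ signature, reads the 32-bit e_lfanew field at offset 0x3C of the MS-DOS header, and confirms that it leads to the bytes ‘PE\0\0’. The helper names readLe32 and findPeSignature are my own, and the image is assumed to have already been read into a byte vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a little-endian 32-bit value; the caller guarantees offset + 4 <= size.
static std::uint32_t readLe32(const std::vector<std::uint8_t>& bytes, std::size_t offset) {
    return static_cast<std::uint32_t>(bytes[offset]) |
           (static_cast<std::uint32_t>(bytes[offset + 1]) << 8) |
           (static_cast<std::uint32_t>(bytes[offset + 2]) << 16) |
           (static_cast<std::uint32_t>(bytes[offset + 3]) << 24);
}

// Returns the file offset of the "PE\0\0" signature, or -1 if the image
// does not look like a Portable Executable.
long long findPeSignature(const std::vector<std::uint8_t>& image) {
    const std::size_t kLfanewOffset = 0x3C;  // e_lfanew, last field of the MS-DOS header
    if (image.size() < kLfanewOffset + 4) return -1;
    if (image[0] != 'M' || image[1] != 'Z') return -1;

    const std::uint32_t peOffset = readLe32(image, kLfanewOffset);
    if (peOffset > image.size() || image.size() - peOffset < 4) return -1;
    if (std::memcmp(image.data() + peOffset, "PE\0\0", 4) != 0) return -1;
    return peOffset;
}
```

Reading the field byte by byte avoids any dependence on the alignment or byte order of the machine doing the analysis.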
The PE Header
A portable executable header, in my own opinion, is like a contents page at the start of a book. The
content of a PE file header is definitely a vital resource for anyone reverse engineering the executable.
This is because it gives both information about the EXE and the location of every part of it.
From this structure, judging by the names of its fields, we can obtain some information
about the executable. This includes the type of machine that it was compiled for, the number of sections
within the PE file (sections will be explained later) and other (less useful) information.
The ‘Optional Header’ (which isn’t really optional for executables, considering I’ve never seen it
omitted) is contained within the next 224 bytes following the PE header. The pointer to this is
slightly harder to obtain, however it can be calculated with a simple C macro.
#define OPTHDROFFSET(a) ((LPVOID)((BYTE *)a + \
((PIMAGE_DOS_HEADER)a)->e_lfanew + SIZE_OF_NT_SIGNATURE + \
sizeof (IMAGE_FILE_HEADER)))
This macro takes advantage of the fact that the structures have fixed sizes: it starts from e_lfanew,
the last field of the MS-DOS header, and adds the size of the PE signature and of the file header to
reach the optional header. The optional header contains most of the really important
data that will be needed, such as the location of the entry point and the alignment of each section within
the program. The optional header also contains the preferred base address for the process image (usually
0x400000 for an EXE).
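The same arithmetic can be sketched without the Windows headers. The code below is only an illustration under a few assumptions: the image is a 32-bit (PE32) file already loaded into a byte vector, it has been validated beforehand (no bounds checks are made here), and the function names are my own. The field positions used, 16 bytes into the optional header for AddressOfEntryPoint and 28 bytes for ImageBase, follow from the sizes of the fields in the structure listed below.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian 32-bit value; the caller guarantees offset + 4 <= size.
static std::uint32_t readLe32(const std::vector<std::uint8_t>& bytes, std::size_t offset) {
    return static_cast<std::uint32_t>(bytes[offset]) |
           (static_cast<std::uint32_t>(bytes[offset + 1]) << 8) |
           (static_cast<std::uint32_t>(bytes[offset + 2]) << 16) |
           (static_cast<std::uint32_t>(bytes[offset + 3]) << 24);
}

// e_lfanew + size of the "PE\0\0" signature + size of the file header.
std::size_t optionalHeaderOffset(const std::vector<std::uint8_t>& image) {
    const std::size_t kSizeOfNtSignature = 4;
    const std::size_t kSizeOfFileHeader = 20;  // sizeof(IMAGE_FILE_HEADER)
    return readLe32(image, 0x3C) + kSizeOfNtSignature + kSizeOfFileHeader;
}

// Address at which execution starts when the image is loaded at its
// preferred base: ImageBase + AddressOfEntryPoint (PE32 layout).
std::uint32_t preferredEntryPoint(const std::vector<std::uint8_t>& image) {
    const std::size_t opt = optionalHeaderOffset(image);
    const std::uint32_t addressOfEntryPoint = readLe32(image, opt + 16);
    const std::uint32_t imageBase = readLe32(image, opt + 28);
    return imageBase + addressOfEntryPoint;
}
```

If the loader has to relocate the image, the real entry point moves with it, so the value returned here should be treated as the address expected in a static disassembly rather than a guaranteed run-time address.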
typedef struct _IMAGE_OPTIONAL_HEADER {
//
// Standard fields.
//
USHORT Magic;
UCHAR MajorLinkerVersion;
UCHAR MinorLinkerVersion;
ULONG SizeOfCode;
ULONG SizeOfInitializedData;
ULONG SizeOfUninitializedData;
ULONG AddressOfEntryPoint;
ULONG BaseOfCode;
ULONG BaseOfData;
//
// NT additional fields.
//
ULONG ImageBase;
ULONG SectionAlignment;
ULONG FileAlignment;
USHORT MajorOperatingSystemVersion;
USHORT MinorOperatingSystemVersion;
USHORT MajorImageVersion;
USHORT MinorImageVersion;
USHORT MajorSubsystemVersion;
USHORT MinorSubsystemVersion;
ULONG Reserved1;
ULONG SizeOfImage;
ULONG SizeOfHeaders;
ULONG CheckSum;
USHORT Subsystem;
USHORT DllCharacteristics;
ULONG SizeOfStackReserve;
ULONG SizeOfStackCommit;
ULONG SizeOfHeapReserve;
ULONG SizeOfHeapCommit;
ULONG LoaderFlags;
ULONG NumberOfRvaAndSizes;
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;
As you will note from the comments the first variables are the ‘Standard Fields’ and this is followed by
the ‘NT Additional Fields’. The standard fields are named standard because they used to be in the
Common Object File Format (COFF).
In order to view the program header in real time we can use a debugger. It’s also possible to ‘dump’ an
image of the process while it’s running and then disassemble it within IDA Pro, but debugging allows
browsing and modifying the binary while the software is running. IDA comes with a local Win32
debugger which does the job; alternatively there is OllyDbg, a highly rated Windows
debugger. I personally prefer the interface of IDA Pro and its slightly better disassembly
output (since it is primarily a disassembler).
In order to view the process at runtime I needed to either add a pause into the source code of the
program such as _getch(..) or add a breakpoint. I opted to add a breakpoint just before the last ‘return’
of the main function.
Screenshot - Pausing the program in real time after the ‘if statement’ has finished.
Browsing the first 1000 bytes allows a reverse engineer to view the executable’s header, and the same goes
for all loaded modules. Each DLL has very similar properties to an executable, including its own
PE header. A library simply allows the system to be modular by separating unrelated functions into
different files; the linking can be done dynamically, as with a DLL, or statically, as with lib files.
What is a DLL
A DLL is a Dynamic-Link Library. Like an executable (an EXE), a DLL contains a header, a data
section and executable code. DLLs are produced so that common functions can be linked to by
processes without having to recreate the functions every time. For example, on a Win32-based NT
machine a program will automatically load necessary DLLs such as Kernel32.dll, User32.dll and
NTDLL.dll. Other operating system DLLs are optional, such as Comdlg32.dll for dialog-based applications
and d3d9.dll for DirectX-based applications.
DLLs allow separate parts of a user’s software to be updated individually. For example, if Microsoft
releases an operating system update that includes changes to NTDLL.dll then they can simply update
that file without having to update every Win32-based NT software package. DLLs also use fewer resources,
since multiple programs can simultaneously use the functions within a DLL. This means that there will be
less code duplication, not just within one process but across the whole operating system.
Creating a DLL
The basic structure of a DLL is very similar to that of a standard Win32 console application. The defining
difference is that the entry point will be DllMain instead of main. While a process will execute main by
default after the operating system has created the process in memory and allocated a thread, a DLL’s
entry point is executed by an existing thread of the process that loads it. The parameters of these entry
point functions are also different. The DllMain parameter list contains the HINSTANCE of the DLL being
loaded (essentially a HANDLE to itself) and the reason for the entry point being called. There are 4
possible reasons for the DLL entry point to be called: DLL_PROCESS_DETACH, DLL_PROCESS_ATTACH,
DLL_THREAD_ATTACH and DLL_THREAD_DETACH. These are all self-descriptive constants.
• Disassembling a 3D Game
For this project I have started to reverse engineer a 3D world, the game Soldierfront, using a
single-player environment so that no ‘third-party add-on’ rules were broken.
Due to the anti-debugging features within Soldierfront I had to create a run-time image by pausing the
program and all of its threads and then using a custom-made tool to write each byte to a file on disk.
The resulting .exe file is not capable of running; it is, however, a perfect image of Soldierfront at
run time.
Finding a useful function
Unlike the previous disassembly examples, a full game contains so much data that it’s nearly impossible
to find the functions that you want unless you can find a good reference point.
For this purpose I am going to look for the variables relating to player health, with the aim of
displaying health bars for all players in the game on screen. I will not be editing the actual health,
but that is possible as well.
After checking possible strings such as ‘Health’, ‘/100’ and ‘hp’ I found the following string in the .text
section of the Portable Executable file.
The information given here is that in the ‘.text’ section, at the memory address ‘00BC2B3C’ (in
hexadecimal), there is a string that reads ‘%s [%dm]’ followed by ‘[hp:%d]’. These look just like the
in-game name tags for your own teammates, which read for example ‘TheXSniperX [104m]’ ‘[hp:90]’.
By looking at the ‘XREF’ we can see where this string is referenced by the code. The important
information is the CALL instruction following the string reference and the values pushed before it.
Disassembled data:
push edx
push eax
push edx
push eax
call sub_B0754A
To reverse engineer this function, the call must first be roughly re-created in order to understand it.
Because of the format string I am expecting something similar to a printf or sprintf function. So we work
backwards from the CALL to recover the parameters, since they are pushed in reverse order: the last value
pushed is the first parameter of the function.
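The bookkeeping for this backwards walk is simple enough to sketch. The helper below takes the operands of the push instructions in the order they appear in the disassembly and returns them in source-code parameter order; its name and the string labels used to describe each operand are purely illustrative and are not taken from the game.

```cpp
#include <string>
#include <vector>

// Given the pushed operands in execution order (top of the listing first),
// returns them in parameter order: the last push before the CALL is the
// first parameter, the push before that is the second, and so on.
std::vector<std::string> parametersInSourceOrder(
        const std::vector<std::string>& pushesInExecutionOrder) {
    return std::vector<std::string>(pushesInExecutionOrder.rbegin(),
                                    pushesInExecutionOrder.rend());
}
```

For a sprintf-like call this means the destination buffer is the push immediately before the CALL and the format string is the one before that.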
The next opcode as we work backwards is lea. (ref 3) “The lea instruction places the address specified by
its second operand into the register specified by its first operand. Note, the contents of the memory
location are not loaded; only the effective address is computed and placed into the register. This is
useful for obtaining a pointer into a memory region.”
This means that eax, before being pushed as the first parameter, contains a pointer (the effective
address) to esp (esp being the stack pointer) plus 13C (hexadecimal). Since this looks like a sprintf
type of function (with the format string as the second parameter) it’s a good guess that eax is
a pointer to the buffer that will receive the resulting string.
So far the following has been reconstructed with the information at hand:
char string_buffer[255]; // 255 is an educated guess; it could be a different size but it doesn’t matter.
The good thing here is that the formatted string tells us how many more parameters there are and what
they contain. If this wasn’t the case it would be necessary to count every push and pop to see what’s on
the stack.
To skip to the value we want, health, we can just skip over the previous two push opcodes. This leaves
us with the first instance of PUSH EDX. At this moment in time EDX contains the value from this opcode:
This is the second parameter of the function that we are currently within. In order to trace the origin
of this health value it is necessary to go to the top of this function and check for XREFs. We need to
find the function that calls the one we are currently inside, so that we can trace the second parameter
and understand which address the health is being loaded from.
Tracing the 2nd Argument from the previous stack pointer
[esp+130h+arg_4]
Since arg_4 is the second parameter (the first argument plus 4 bytes) we’re now tracing back from the
calling function to see what it puts in the 2nd parameter.
.text:00607709
• An interesting thing to note is that it gets compared to 0FFFFFFFFh, which is -1, and if it does equal
-1 then the code leading to our original function will never be called.
Repeating the process got me to this function call:
• 5A23C8 is where the health gets loaded into the 2nd parameter.
Attempted reconstruction
result = sub_563C70( hp );
By loading the correct offsets into the original function we can generate a pseudo-C construction.
Function Reconstruction
Using a combination of IDA Pro and Visual Studio I produced the following reverse engineered
reconstruction of the original function. I commented out the parts that would have taken hundreds of
functions of tracing or driver debugging. The calling convention for this function is __thiscall, which
means it is a member function of a class and its first (hidden) parameter is always a pointer to its own
class object. I identified this calling convention because the function saves the ECX register and uses
it as the first parameter.
int __thiscall sub_6012D0(int PTR, char Pointer_To_Class, int Health, int Distance)
{
    int result; // eax@1
    int v5; // esi@1
    int v6; // ebx@4
    int v7; // ebp@4
    int v8; // edi@4
    int v9; // eax@5
    int v10; // ecx@6
    __int64 v11; // qax@7
    int Unk_LocalVaraible1; // [sp+50h] [bp-11Ch]@1 stack pointer reference
    int Unk_LocalVaraible2; // [sp+54h] [bp-118h]@1 4 byte decrement
    int Unk_LocalVaraible4; // [sp+58h] [bp-114h]@1
    int Unk_LocalVaraible5; // [sp+5Ch] [bp-110h]@1
    int Unk_LocalVaraible6; // [sp+60h] [bp-10Ch]@1
    int Unk_LocalVaraible7; // [sp+64h] [bp-108h]@1
    char Unk_LocalVaraible8; // [sp+68h] [bp-104h]@1
    char Unk_LocalVaraible9; // [sp+69h] [bp-103h]@1
    unsigned int Pointer_To_C9D470; // [sp+168h] [bp-4h]@1

    Unk_LocalVaraible8 = 0;
    sub_B074D0(&Unk_LocalVaraible9, 0, 254);
    sub_B0754A(&Unk_LocalVaraible8, "%s [%dm]\r [hp:%d]",
               -108 * Pointer_To_Class + (_BYTE)v5 - 100);
    Unk_LocalVaraible1 = 0;
    Unk_LocalVaraible2 = 0;
    Unk_LocalVaraible4 = 0;
    Unk_LocalVaraible5 = 0;
    // too many layers back:
    //dword_BBA490(*(_DWORD *)(v5 + 120), &Unk_LocalVaraible8, -1, &Unk_LocalVaraible1, 9280, 0);
    //v76feb21d(*(_DWORD *)(v5 + 120), 0, 0, Unk_LocalVaraible4, Unk_LocalVaraible5, 66);
    dword_BBA490(*(_DWORD *)(v5 + 120), &Unk_LocalVaraible8, -1, &Unk_LocalVaraible1,
                 8256, 0);
    Unk_LocalVaraible6 = 0;
    Unk_LocalVaraible7 = 0;
    result = sub_4B5E60(0, &Unk_LocalVaraible6, 0, 0);
    if (!result)
    {
        if (Unk_LocalVaraible5 > 64)
            Unk_LocalVaraible5 = 64;
        v6 = Unk_LocalVaraible7;
        v7 = *(_DWORD *)(v5 + 124);
        sub_B074D0(Unk_LocalVaraible7, 0, 32768);
        v8 = 0;
        if (Unk_LocalVaraible5 > 0)
        {
            v9 = Unk_LocalVaraible4;
            do
            {
                v10 = 0;
                if (v9 > 0)
                {
                    do
                    {
                        v11 = v8 * Unk_LocalVaraible6;
                        if (*(_BYTE *)(v7 + v8 * *(_DWORD *)(v5 + 192) + v10))
                            *(_WORD *)(v6 + 2 * (v10 + (((_DWORD)v11 - HIDWORD(v11)) >> 1))) = -1;
                        else
                            *(_WORD *)(v6 + 2 * (v10 + (((_DWORD)v11 - HIDWORD(v11)) >> 1))) = 0;
                        v9 = Unk_LocalVaraible4;
                        ++v10;
                    } while (v10 < Unk_LocalVaraible4);
                }
                ++v8;
            } while (v8 < Unk_LocalVaraible5);
        }
        result = sub_4B5E90(0);
    }
    return result;
}
Drawing a custom health bar
For the boxes I used DirectX 9.0c since that is what the game uses and I can use its device by hooking
EndScene.
D3DRECT BarRect = { x, y, x + w, y + h };
class CPlayer
{
public:
    char unk22[0x37C];
    DWORD Test_Val; //0x37C
    char unk[0x4D8]; //0x858
    DWORD valid; //0x85C
    char unk1[0x55DE4]; //0x56640
    char name[12]; //0x5664C
    char unk2[0xF8]; //0x56744
    DWORD health; //0x56748
};
Now that we have this layout we can get a pointer into the game’s memory, load a DLL into the game so
that we are in the same process space, and read the health value into the DrawESPHealth function.
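As a sanity check on the layout, the sketch below re-declares the same members with fixed-width types so that it compiles without the Windows headers, and lets the compiler confirm where the health field falls. PlayerLayout and readHealth are my own illustrative names, and std::uint32_t stands in for DWORD. With the member sizes as listed, the health field occupies the four bytes from 0x56744 up to the 0x56748 noted in the comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable mirror of the CPlayer layout; std::uint32_t stands in for DWORD.
struct PlayerLayout {
    char unk22[0x37C];
    std::uint32_t Test_Val;
    char unk[0x4D8];
    std::uint32_t valid;
    char unk1[0x55DE4];
    char name[12];
    char unk2[0xF8];
    std::uint32_t health;
};

// Fails to compile if the padding arrays ever stop adding up.
static_assert(offsetof(PlayerLayout, Test_Val) == 0x37C, "Test_Val offset");
static_assert(offsetof(PlayerLayout, valid) == 0x858, "valid offset");
static_assert(offsetof(PlayerLayout, health) == 0x56744, "health offset");

// Reads the health field given the address of a player object. memcpy is
// used so that the read is valid for any raw byte pointer.
std::uint32_t readHealth(const unsigned char* playerBase) {
    std::uint32_t health = 0;
    std::memcpy(&health, playerBase + offsetof(PlayerLayout, health), sizeof health);
    return health;
}
```

Inside an injected DLL the base pointer would come from the game’s own memory; how that pointer is located is specific to the game and is not shown here.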
Testing
In this picture you can see the string I used to obtain the health information which is shown for people
on your own team. Below the enemy player you can see the health box which I am drawing to show
their health in real time.
References
Useful Sources
http://support.microsoft.com/kb/815065