005 Pietrek Matt Just Enough Assembly To Survive Kit HOOD SVE

Figure 1 Common Intel x86 Registers

EAX: Multipurpose. Return values from a function are usually stored in EAX. Low 16 bits are referenced as AX. AX can be further subdivided into AL (the low 8 bits), and AH (the upper 8 bits of AX).
EBX: Multipurpose. Low 16 bits are referenced as BX. BX can be further subdivided into BL (the low 8 bits), and BH (the upper 8 bits of BX).
ECX: Multipurpose. Often used as a counter, for example, to hold the number of loop iterations that should be performed. Low 16 bits are referenced as CX. CX can be further subdivided into CL (the low 8 bits), and CH (the upper 8 bits of CX).
EDX: Multipurpose. Low 16 bits are referenced as DX. DX can be further subdivided into DL (the low 8 bits), and DH (the upper 8 bits of DX).
ESI: Multipurpose. In certain operations that move or compare memory, ESI contains the source address. Low 16 bits are referenced as SI.
EDI: Multipurpose. In certain operations that move or compare memory, EDI contains the destination address. Low 16 bits are referenced as DI.
ESP: Stack pointer. Implicitly changed by PUSH, POP, CALL, and RET instructions.
EBP: Base pointer. Usually points to the current stack frame for a procedure. Procedure parameters are usually at positive offsets from EBP (for example, EBP+8). Local variables are usually at negative offsets (for example, EBP-16). Sometimes, optimizing compilers won't use a stack frame, and use EBP as a multipurpose register.
EFLAGS: Rarely directly referenced. Instead, instructions implicitly set or clear bitfields within the EFLAGS register to represent a certain state. For example, when the result of a mathematical operation is zero, the Zero flag is turned on in the EFLAGS register. The conditional jump instructions make use of the EFLAGS register.
FS: 16-bit. Under Win32, the FS register points to a data structure with information pertaining to the current thread. FS is a segment register (segment registers are beyond the scope of this discussion). Intel CPUs have six segment registers, but the operating system sets them up and maintains them. Win32 compilers only need to explicitly refer to the FS segment register, which is used for things like structured exception handling and thread local storage.
Procedure Entry and Exit

These instructions are automatically inserted by the compiler to create a standard method for accessing parameters and local variables. This method is called a stack frame, as in "frame of reference." In fact, the Intel CPU dedicates the EBP register to maintaining a stack frame. For this group of instructions, it's especially important to note that not every procedure will use exactly the same sequence, and that certain things may be omitted entirely.

Sequence: PUSH EBP / MOV EBP,ESP / SUB ESP,XX
Purpose: Sets up the EBP stack frame for a new procedure
Examples:
    PUSH EBP
    MOV  EBP,ESP
    SUB  ESP,24
Description: "PUSH EBP" saves the previous frame pointer on the stack. "MOV EBP,ESP" sets the EBP register to the same value as the stack pointer (ESP). "SUB ESP,XX" creates space for local variables below the EBP frame. In optimized code, you may see this sequence interspersed with other instructions (for example, "PUSH ESI"). Since "PUSH EBP" and "MOV EBP,ESP" both use the EBP register, a processor with multiple pipelines would ordinarily need to stall one of the pipelines. By interspersing other instructions that don't use the EBP register, the processor can do more work in the same amount of time.

Instruction: ENTER
Purpose: Sets up the EBP stack frame for a new procedure
Examples:
    ENTER 8, 0    ; Sets up stack frame with 8 bytes of local variables
Description: The ENTER instruction first became available on the 80286 processor. It was intended to replace the "PUSH EBP / MOV EBP,ESP / SUB ESP,XX" sequence with a single, smaller instruction. On current processors the ENTER instruction is slower than the three-instruction sequence, so ENTER is rarely used.

Sequence: MOV ESP,EBP / POP EBP
Purpose: Removes the EBP stack frame before leaving a procedure
Description: The "MOV ESP,EBP" instruction bumps up the stack pointer past any space allocated for local variables on the stack. "POP EBP" restores the stack frame pointer to point at the previous EBP frame. This sequence is normally followed by a return instruction to return control to the calling procedure.

Instruction: LEAVE
Purpose: Removes the EBP stack frame before leaving
Description: The LEAVE instruction is the inverse of the ENTER instruction. It can also be used to remove a frame set up by the "PUSH EBP / MOV EBP,ESP" sequence. The LEAVE instruction is only 1 byte long, which is smaller than the "MOV ESP,EBP / POP EBP" sequence. Unlike the ENTER instruction, there's no performance penalty for using it, so some compilers use LEAVE.

Instruction: PUSH register
Purpose: Saves the previous values of register variables
Examples:
    PUSH EBX
    PUSH ESI
    PUSH EDI
Description: Sometimes compilers use a general-purpose register to hold the value of parameters or local variables. This can be more efficient than storing the same value in memory. These are commonly known as register variables. The EBX, ESI, and EDI registers are most often used as register variables. The convention most compilers use is that register variable values are preserved across procedure calls. If the compiler decides to use register variables in a procedure, it is responsible for preserving the value of the registers that it alters (typically, EBX, ESI, and EDI). Typically, compilers preserve these register values on the stack as part of setting up the procedure's stack frame. If the compiler uses only one or two of the aforementioned registers, it needs to preserve only those registers.

Instruction: POP register
Purpose: Restores the previous values of register variables
Examples:
    POP EDI
    POP ESI
    POP EBX
Description: In preparing to return from a procedure, the register variable registers need to be restored to their previous values. These instructions remove a value from the stack and place it into the designated register. Note that the registers are popped in the reverse of the order in which they were pushed.
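To make the frame setup and teardown concrete, here is a minimal C++ sketch that models the stack as an array and ESP/EBP as byte offsets into it. The ToyCpu type and its member names are hypothetical, invented for illustration; this is a toy model under those assumptions, not how a real CPU or debugger represents its state.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy model of a downward-growing stack (not real machine state).
struct ToyCpu {
    std::vector<std::uint32_t> stack = std::vector<std::uint32_t>(64, 0);
    std::uint32_t esp = 64 * 4;   // byte offset just past the top of the toy stack
    std::uint32_t ebp = 0;

    // PUSH: move ESP down one DWORD, then store the value there.
    void push(std::uint32_t value) { esp -= 4; stack[esp / 4] = value; }

    // POP: read the DWORD at ESP, then move ESP up one DWORD.
    std::uint32_t pop() {
        std::uint32_t value = stack[esp / 4];
        esp += 4;
        return value;
    }

    // PUSH EBP / MOV EBP,ESP / SUB ESP,localBytes
    void prologue(std::uint32_t localBytes) {
        push(ebp);
        ebp = esp;
        esp -= localBytes;
    }

    // MOV ESP,EBP / POP EBP
    void epilogue() {
        esp = ebp;
        ebp = pop();
    }
};
```

Calling prologue(0x18) and then epilogue() on a ToyCpu leaves both esp and ebp exactly where they started, which is the whole point of the paired sequences.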
Accessing Variables
The Intel CPU has many instructions that work with variables, which are just locations in memory. For example, you can add or subtract from a variable representing a counter. Likewise, a variable may contain a pointer to something. There are just too many instructions to describe here, and in most cases the instruction name gives a good clue about what the instruction is doing. However, I will show how variables of different storage classes appear in assembly language.

Instruction: instruction [global]
Purpose: Global/static variables
Examples:
    MOV  EAX,[00401234]
    MOV  [00401238],ESI
    PUSH [77852432]
    ADD  [00620428],00001000
Description: When you see an instruction that includes an actual machine address inside the square brackets, it's accessing memory that was declared as either a global or static variable. These addresses are known at program load time, so the instruction contains the actual memory address to read or write.

Instruction: instruction [parameter]
Purpose: Procedure parameters and this pointers
Examples:
    MOV ESI,[EBP+14]
    MOV [ESP+30],EAX
    ADD [EBP+0C],2
    OR  [ESP+20],00000010
Description: Parameters to procedures are usually passed on the thread's stack. Since these values are pushed before the procedure call and before the called procedure sets up its stack frame, the parameters appear at positive offsets from the stack frame base pointer (EBP). Just about any instruction that makes reference to memory above EBP (for example, "[EBP+8]") is making use of a procedure parameter. The advantage of using EBP for accessing parameters is that EBP doesn't change throughout the lifetime of a procedure. This makes it easier to keep track of the procedure's parameters.

Prior to the 80386, the only effective way to access parameters was with the base pointer register. The 386 added the ability to access memory just as easily with displacements from the stack pointer (ESP) register. Thus, optimized code can dispense with setting up an EBP frame and still reference parameters by using positive offsets from ESP. For example, "ADD [ESP+20],4" adds four to whatever DWORD is at [ESP+20]. From a debugging standpoint, using ESP to access parameters is inconvenient. Since ESP can change during a procedure, a given parameter may be at different offsets from ESP at different points in a procedure's code.

One last word on parameters. In C++, the this pointer of a member function is really a hidden parameter. Usually the this pointer is the last parameter pushed on the stack before the call. In Visual Basic, the self-referential me is the same thing as the C++ this pointer.

Instruction: instruction [local]
Purpose: Local variables
Examples:
    MOV ESI,[EBP-14]
    MOV [EBP-30],EAX
    SUB [ESP],2
    AND [ESP+4],00000010
Description: From the vantage point of an assembly instruction, local variables aren't much different than parameters when an EBP frame is used. The only distinction is that local variables are at negative offsets from the EBP stack frame. You can get an idea of how big the sandbox for local variables will be by examining the "SUB ESP,XX" instruction near the beginning of the procedure.

Things do get messy when the compiler decides to omit an EBP frame. When this happens, the compiler addresses both local variables and parameters as positive offsets from the ESP register. There's no good way to tell a local apart from a parameter in this situation except to find out how much space the procedure has allocated for locals (see above). If the offset is less than the space allocated, it's a local. Otherwise, it's probably a parameter.

Instruction: LEA variable
Purpose: Load Effective Address
Examples:
    LEA EAX,[ESP+14]
    LEA EDX,[EBP-24]
Description: Despite the square brackets, LEA doesn't actually read memory or dereference a pointer. Instead, it loads the first operand with the address specified by the second operand. For example, "LEA EAX,[ESP+14]" takes the current value of the ESP register, adds 14 to it, and puts the result in EAX. LEA's primary use is to obtain the address of local variables and parameters. For example, in C++, if you use the & operator on a local variable or parameter, the compiler will likely generate an LEA instruction. As another example, "LEA EAX,[EBP-8]" loads EAX with the address of the local variable at EBP-8.

A less obvious use of LEA is as a fast multiplication. For example, multiplying a value by 5 is relatively expensive. Using "LEA EAX,[EAX*4+EAX]" turns out to be faster than the MUL instruction. The LEA instruction uses hardwired address generation tables that make multiplying by a select set of numbers very fast (for example, multiplying by 3, 5, and 9). Twisted, but true.
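The LEA multiplication trick is easier to see when the address arithmetic is written out. The following C++ sketch models "LEA dest,[base + index*scale + disp]" as plain integer math; the function names lea, timesThree, timesFive, and timesNine are hypothetical helpers introduced here for illustration.

```cpp
#include <cstdint>

// Models the address calculation of "LEA dest,[base + index*scale + disp]".
// The result is just a number; no memory is read. On real hardware the
// scale can only be 1, 2, 4, or 8, which this sketch does not enforce.
constexpr std::uint32_t lea(std::uint32_t base, std::uint32_t index,
                            std::uint32_t scale, std::uint32_t disp) {
    return base + index * scale + disp;
}

// "LEA EAX,[EAX*2+EAX]" multiplies by 3.
constexpr std::uint32_t timesThree(std::uint32_t x) { return lea(x, x, 2, 0); }

// "LEA EAX,[EAX*4+EAX]" multiplies by 5.
constexpr std::uint32_t timesFive(std::uint32_t x) { return lea(x, x, 4, 0); }

// "LEA EAX,[EAX*8+EAX]" multiplies by 9.
constexpr std::uint32_t timesNine(std::uint32_t x) { return lea(x, x, 8, 0); }
```

Because the only available scales are 1, 2, 4, and 8, using the same register as both base and index yields exactly the multipliers 3, 5, and 9 mentioned above.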
Calling Procedures
Instruction: CALL location
Purpose: Transfer control to another procedure
Examples:
    CALL 00682568
    CALL [00401234]
    CALL ESI
    CALL [EAX+24]
Description: The CALL instruction doesn't need much explanation in itself. It pushes the address of the instruction following it onto the stack, then transfers control to the address given by the argument. The various ways of specifying a target address are worth mentioning, however.

The simplest form of the CALL instruction is when the argument contains the destination address as an immediate value (for example, "CALL 00682568"). This type of call is almost always to another location within the same module (EXE or DLL). Slightly more complicated is when the CALL instruction indirects through an address (for example, "CALL [00401234]"). You'll see this form of CALL instruction when calling a function imported from another module. It's also seen when calling through a function pointer stored in a global variable.

Two other forms of CALL instruction use registers as part of their address. If just a register name is specified (for example, "CALL ESI"), the CPU transfers to whatever address is in the register. If a register is used within brackets, perhaps with an additional displacement ("CALL [EAX+24]"), the instruction is calling through a table of function addresses. Where would these come from? You may know these tables by the more familiar name of vtables. In the preceding instruction example, the displacement selects a slot in the table: dividing it by the size of a DWORD gives the zero-based slot number. Since disassemblers normally display displacements in hexadecimal, 24 here is 0x24 (36 decimal), and 36 divided by 4 is 9, so the call goes through slot 9, the tenth member function.

Instruction: PUSH value
Purpose: Places a parameter onto the stack in preparation for calling a procedure
Examples:
    PUSH [00405234]    ; Push a global variable
    PUSH [EBP+C]       ; Push a parameter
    PUSH [EBP-14]      ; Push a local variable
    PUSH EAX           ; Push whatever is in EAX
    PUSH 12345678      ; Push an immediate value
Description: When it comes to passing parameters, all variations of the PUSH instruction are used by the compiler. Global variables, local variables, parameters, the results of a calculation, and immediate values can all be passed with a single instruction. When you see a sequence of PUSH instructions prior to a CALL instruction, the odds are good that the PUSHes are putting the parameters onto the stack. As mentioned earlier, if a member function or method is being called, the this or me pointer is usually passed last. In some cases, the this pointer is passed in the ECX register instead. You can identify when this occurs by looking for code that initializes the ECX register and then does nothing with it before the CALL instruction.

Instruction: RET
Purpose: Return from a procedure call
Examples:
    RET
    RET 8
Description: The RET instruction returns from a procedure call. It simply pops whatever value is currently at [ESP] into the EIP (instruction pointer) register. The "RET XX" form does the same thing, and then adds XX to the ESP value. This is how __stdcall procedures clear parameters off the stack before returning to their caller. (Most Win32 APIs are __stdcall based.) By dividing the number of cleared bytes by four (the size of a DWORD), you can usually figure out how many parameters a procedure takes. For instance, a procedure that returns with a "RET 8" instruction takes two parameters.

Functions that return an integer or pointer value usually return the value in the EAX register. By examining what's in EAX before executing the RET instruction, you can see the function's return value.

Instruction: ADD ESP, value
Purpose: Removes parameters from the stack
Examples:
    ADD ESP,24
Description: When calling procedures that don't remove parameters before returning, it's up to the calling function to remove its parameters. This is the case with __cdecl functions, which is the default for C and C++ code. The "ADD ESP,XX" instruction bumps up the stack pointer so that any passed parameters are below the resulting ESP. If the function doesn't take a variable number of parameters, the "ADD ESP,XX" instruction gives insight into how many parameters the called procedure accepts. (See the description above for "RET XX".) If the called procedure takes a variable number of parameters (like printf and wsprintf do), the "ADD ESP,XX" instruction tells you how many parameters were passed for that particular CALL.
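The byte-count arithmetic used for "RET XX", "ADD ESP,XX", and vtable displacements can be sketched in a few lines of C++. The helper names parameterCount and vtableSlot are hypothetical, and the sketch assumes every parameter and every table entry occupies exactly one DWORD (a double or a structure passed by value would take more) and that the byte counts are read as hexadecimal, the way a disassembler usually prints them.

```cpp
#include <cstdint>

constexpr std::uint32_t kDwordSize = 4;   // bytes in a DWORD

// Number of DWORD-sized parameters removed by "RET XX" or "ADD ESP,XX".
constexpr std::uint32_t parameterCount(std::uint32_t clearedBytes) {
    return clearedBytes / kDwordSize;
}

// Zero-based table slot selected by "CALL [reg+displacement]".
constexpr std::uint32_t vtableSlot(std::uint32_t displacement) {
    return displacement / kDwordSize;
}
```

For example, parameterCount(8) is 2 for a "RET 8", parameterCount(0x24) is 9 for an "ADD ESP,24", and vtableSlot(0x24) is 9 for a "CALL [EAX+24]".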
Flow Control
In the context of this column, flow control means code that affects which portions of a program's code are subsequently executed. At the simplest level, this means conditional execution (colloquially known as if statements). More complex flow control sequences such as while loops and for statements are usually built from the lower-level if statement constructs. In one case though (the LOOP instruction), the processor has built-in knowledge of these higher-level language constructs.

Before I get to these instruction sequences, let me highlight two things that can easily trip you up. For starters, the term "Jcc" is used as a stand-in for any of the 16 conditional jump instructions. The cc means condition code. More insidiously, there are several sets of Jcc instructions that are aliases for one another. For example, JZ (Jump if Zero flag set) is the same instruction as JE (Jump if Equal). Likewise, JNZ (Jump if Zero flag NOT set) is the same instruction as JNE (Jump if Not Equal). Unfortunately, some disassemblers use the JZ/JNZ form, while others use the JE/JNE form. Is this confusing? Yes! The moral of the story: be prepared to mentally substitute an aliased form of the instruction if it makes the code easier to understand.

Sequence: CMP value, value / Jcc location
Purpose: Compare two values, and branch accordingly
Examples:
    CMP EAX,2
    JE  10036728

    CMP [EBP+20],1000
    JNE 00427824
Description: The CMP instruction is used when two values are to be compared. The CMP instruction sets or clears a variety of flags, including the Zero, Sign, and Overflow flags. From this, a variety of Jcc instructions can then be used to branch accordingly. Most often, the JE and JNE instructions follow a CMP instruction. The following C++ code sequence would be implemented with a CMP / JNE sequence:

    if ( MyVariable == 2 )
    {
        // Whatever code you want
    }

If the CMP instruction determines that MyVariable isn't 2, it leaves the Zero flag clear, so the JNE instruction that follows jumps past the code in curly brackets.
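Note that the compiler tests the opposite of what the source says: the C++ asks "is it equal?", while the generated code asks "is it not equal?" and jumps away if so. A minimal sketch of the two shapes follows; the function names withIf and withCompareAndJump are hypothetical, and the goto version only imitates the structure of compiler output rather than reproducing any particular compiler's code.

```cpp
// The source-level form of the if statement.
int withIf(int myVariable) {
    int result = 0;
    if (myVariable == 2) {
        result = 1;                     // "whatever code you want"
    }
    return result;
}

// The same logic in the shape of a CMP / JNE sequence: test the
// inverted condition and jump past the body when it holds.
int withCompareAndJump(int myVariable) {
    int result = 0;
    if (myVariable != 2) goto skip;     // CMP myVariable,2 / JNE skip
    result = 1;                         // body of the if
skip:
    return result;
}
```

Both functions return the same value for every input; only the direction of the test differs.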
Sequence: TEST value, value / Jcc location
Purpose: Determine if a bit is set, and branch accordingly
Examples:
    TEST EAX,EAX
    JNZ  00400124

    TEST EDX,00400024
    JZ   77f85624
Description: The TEST instruction does a logical AND of the two arguments, which sets or clears the Zero flag in the EFLAGS register. The next instruction (JZ or JNZ) does a jump to the target address if the Zero flag is set or cleared, depending on the instruction used. If the JZ/JNZ doesn't jump, execution continues at the following instruction. This sequence is typically used to test one or more bits as part of an if statement. For example, this C++ code could be implemented using a "TEST / JZ" sequence:

    if ( MyVariable & 0x00400024 )
    {
        // Whatever code you want
    }

If MyVariable has any of the same bits set as the value 0x00400024, the Zero flag won't be set. This prevents the JZ instruction from jumping, and execution falls into the code in the curly brackets.
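The relationship between the AND result, the Zero flag, and the JZ can be modeled directly. In this hedged C++ sketch, zeroFlagAfterTest and bodyRuns are hypothetical names, and the Zero flag is represented as a plain bool rather than a bit in EFLAGS.

```cpp
#include <cstdint>

// Models "TEST a,b": computes a AND b, reports the resulting Zero flag,
// and (like the real instruction) leaves both operands unchanged.
constexpr bool zeroFlagAfterTest(std::uint32_t a, std::uint32_t b) {
    return (a & b) == 0;
}

// Models "if ( myVariable & mask ) { body }" compiled as TEST / JZ:
// JZ jumps past the body when the Zero flag is set, so the body runs
// only when the flag is clear.
constexpr bool bodyRuns(std::uint32_t myVariable, std::uint32_t mask) {
    return !zeroFlagAfterTest(myVariable, mask);
}
```

The common "TEST EAX,EAX" idiom is the special case where both operands are the same register, so the Zero flag ends up set exactly when EAX is zero.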
Instruction: JMP location
Purpose: Transfer control to some other location
Examples:
    JMP 10047820
Description: The first place you'll see JMP instructions is in if/else statements, where a JMP at the end of the if code skips over the else code:

    if ( MyVariable == 2 )
    {
        // some code
        // JMP past "else" code
    }
    else
    {
        // some other code
    }

The second place where JMP instructions crop up is as part of a loop. At the end of the loop's code, some code sequence determines if it's time to break out of the loop. If the loop isn't finished, a JMP instruction transfers control back to the beginning of the loop's code.

The third scenario where you'll see JMP instructions is when a procedure has a common exit sequence. That is, no matter how many return statements there are in the procedure, there's only one spot in the code that cleans up the stack frame and returns. In this situation, a return statement in the middle of the procedure's code is implemented as a JMP to the common exit sequence code.

It's also possible that you'll encounter a JMP instruction from a goto statement. Fortunately, most programmers don't bother with gotos anymore. Finally, if you see a JMP instruction that simply jumps to the next instruction, you're probably in code that wasn't compiled with optimizations enabled.

Instructions: LOOP, LOOPZ, LOOPNZ
Purpose: Jump back to the beginning of a loop's code, if conditions are right
Examples:
    LOOP  00401234
    LOOPZ 65432108
Description: The LOOP instruction uses the contents of the ECX register as a counter. Each invocation of the LOOP instruction decrements the ECX register. In the simplest case, the LOOP instruction branches back to the beginning of the instruction sequence if ECX isn't zero. LOOPZ branches only if ECX is nonzero and the Zero flag in EFLAGS is set, while LOOPNZ branches only if ECX is nonzero and the Zero flag is clear.

The C++ for loop construct can be implemented with the LOOP instruction if the number of iterations is known ahead of time. Before executing the actual code inside the loop, ECX is loaded with the number of iterations. At the end of the code inside the loop is a LOOP instruction. After the specified number of iterations, ECX becomes zero and the LOOP instruction doesn't branch.
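A counted loop built on LOOP has the shape of a C++ do/while that counts ECX down to zero. The sketch below uses a hypothetical function name, sumWithLoopInstruction, and assumes the count is greater than zero; a real LOOP entered with ECX equal to zero would wrap around and iterate roughly four billion times, which is why compilers guard against that case.

```cpp
#include <cstdint>

// Sums 1..count the way a LOOP-based sequence would. ECX is loaded with
// the iteration count up front, and each pass ends by decrementing ECX
// and branching back while it is still nonzero.
// Assumption: count > 0.
std::uint32_t sumWithLoopInstruction(std::uint32_t count) {
    std::uint32_t ecx = count;      // MOV ECX,count
    std::uint32_t total = 0;
    do {
        total += ecx;               // the loop body
    } while (--ecx != 0);           // LOOP back to the top of the body
    return total;
}
```

Notice that the counter runs downward, from count to 1, even if the source loop counted upward; the compiler is free to do this when the body doesn't depend on the direction.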
Bitwise Manipulation
The bitwise instructions are used to turn individual bits on and off in a value. The value can be a global variable, a local variable, a parameter, or a register. Here, I'll show the two most common instructions, AND and OR. There's also an XOR instruction, but it's less commonly used.

Instruction: AND value,bitfield
Purpose: Performs a logical AND of the bitfields of two operands
Examples:
    AND EAX,00001000
    AND [ESI+4],00000004
Description: Unlike the TEST instruction (see above), the AND instruction actually modifies the destination operand. For example, in C++, the statement

    MyVar &= 0x00010001;

could be implemented as:

    AND [MyVar],00010001h

The AND instruction is also used to turn off particular bitfields. To do this, the bits to be turned off are set to zero in the source operand. All of the bits to be left alone are set to one in the source operand.

Instruction: OR value,bitfield
Purpose: Performs a logical OR of the bitfields of the two operands
Examples:
    OR EDX,10101010
    OR [EBP+24],00080000
Description: The OR instruction is used to turn on one or more bits in the destination operand. For example, the value of WS_VISIBLE is 0x10000000. The following C++ statement

    wndStyle |= WS_VISIBLE;

would translate to something like this:

    OR [wndStyle],10000000h
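Turning bits on with OR and off with AND are mirror images of each other, which a short C++ sketch makes plain. The helper names setBits and clearBits are hypothetical; kWsVisible simply repeats the WS_VISIBLE value quoted above so the example doesn't depend on windows.h.

```cpp
#include <cstdint>

constexpr std::uint32_t kWsVisible = 0x10000000;   // WS_VISIBLE value from the text

// OR dest,mask : every bit that is one in the mask is turned on.
constexpr std::uint32_t setBits(std::uint32_t value, std::uint32_t mask) {
    return value | mask;
}

// AND dest,~mask : the AND operand has zeros where bits should be turned
// off and ones everywhere else, so all other bits are left alone.
constexpr std::uint32_t clearBits(std::uint32_t value, std::uint32_t mask) {
    return value & ~mask;
}
```

In a disassembly, the clearing case shows up as an AND with a constant that is mostly F digits, for example "AND EAX,EFFFFFFF" to turn off the 0x10000000 bit.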
String Manipulation
The string instructions allow sequences of consecutive memory locations to be processed without branching after every operation. The instructions are able to do this because the CPU dedicates two registers (ESI and EDI) to point at the source and/or destination locations. After every operation, these registers are incremented or decremented based upon the CPU's direction flag. In the real world, the registers are rarely decremented, so from here on out I'll just say "incremented."

When combined with the REP / REPE / REPNE class of instruction prefixes, very powerful code sequences can be implemented using only a single instruction. For example, with one string instruction, you can find the end of a null-terminated string (some setup and assembly required). In addition, the code executes much faster than if it were implemented as a series of instructions in a loop.

Instruction: SCASB, SCASW, SCASD
Purpose: Scan for a particular BYTE, WORD, or DWORD
Examples:
    REPNE SCASB
    REPZ  SCASD
Description: These instructions scan consecutive locations in memory looking for a particular BYTE, WORD, or DWORD. Alternatively, they can be used to find the first occurrence of a value that's different from a target value. The BYTE, WORD, or DWORD target value is placed in the AL, AX, or EAX register. Each iteration of SCASx compares the contents of the AL/AX/EAX register to the memory pointed at by the EDI register, and sets EFLAGS accordingly. Afterwards, EDI is incremented.

To search for the zero byte in an ANSI string, the AL register should be set to zero and EDI should be set to the beginning of the string. The ECX register is set to the maximum number of bytes to search. Finally, the REPNE SCASB instruction executes. The REPNE prefix causes the SCASB instruction to execute until one of two conditions is met. If ECX reaches zero with the Zero flag still clear, no zero byte was found in the bytes searched. Otherwise, a zero byte was found, and EDI points to the byte just past it.
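The strlen-style use of REPNE SCASB can be sketched as an ordinary C++ loop in which local variables stand in for the registers. The function name scanLength is hypothetical, and the sketch models only the forward-scanning case (direction flag clear).

```cpp
#include <cstddef>

// Models "REPNE SCASB" with AL = 0: examine at most maxBytes bytes looking
// for a zero byte. Returns the string length, or maxBytes if no terminator
// was found within the bytes searched.
std::size_t scanLength(const char* str, std::size_t maxBytes) {
    const char* edi = str;          // EDI: current scan position
    std::size_t ecx = maxBytes;     // ECX: bytes left to examine
    bool zeroFlag = false;          // set when the byte at EDI matched AL

    while (ecx != 0 && !zeroFlag) {
        zeroFlag = (*edi == 0);     // SCASB compares AL with [EDI]...
        ++edi;                      // ...then increments EDI
        --ecx;                      // REPNE decrements ECX each iteration
    }

    // On a match, EDI is one past the zero byte, so back up by one.
    return zeroFlag ? static_cast<std::size_t>(edi - str) - 1 : maxBytes;
}
```

Compiled code usually skips the pointer subtraction and derives the length from ECX instead, which is why you'll often see NOT ECX and DEC ECX after an inline strlen.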
Instruction: CMPSB, CMPSW, CMPSD
Purpose: Compares two strings in memory
Examples:
    REPE CMPSB
Description: These instructions are used to compare the BYTEs, WORDs, or DWORDs pointed to by the ESI and EDI registers with the EFLAGS set appropriately after the comparison. Each iteration of the CMPSx instruction causes the ESI and EDI registers to increment by the appropriate amount (one, two, or four bytes). It's not hard to see how the C++ memcmp function could be implemented by using the REPE prefix with the CMPSx instructions. The REPE prefix causes the CMPSx instruction to keep iterating while the two memory locations are equal and ECX is nonzero. The memcmp function could be implemented using "REPE CMPSB", although optimized code will use "REPE CMPSD" for the bulk of the string and "REPE CMPSB" for the last three or fewer bytes.

Instruction: MOVSB, MOVSW, MOVSD
Purpose: Moves BYTEs, WORDs, or DWORDs from the source string to the destination string
Examples:
    REP MOVSD
Description: The MOVSx instructions copy memory pointed to by ESI into the memory pointed at by EDI. After each iteration, ESI and EDI are incremented. Typically, MOVSx is used with the REP prefix to copy a predetermined number of BYTEs, WORDs, or DWORDs. The number to copy is specified in the ECX register. The C++ memcpy function can be implemented using "REP MOVSB".

Instruction: STOSB, STOSW, STOSD
Purpose: Sets a series of BYTEs, WORDs, or DWORDs to a specified value
Examples:
    REP STOSB
Description: The STOSx instructions copy the value in AL, AX, or EAX into the memory pointed to by the EDI register. Typically, STOSB is used with the REP prefix to copy the number of bytes specified in the ECX register. The C++ memset function can be implemented with "REP STOSB", or by a combination of "REP STOSD" and "REP STOSB".
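The "REP STOSD plus REP STOSB" combination for memset splits the work into a DWORD-sized bulk fill and a byte-sized tail. Here is a hedged C++ sketch of that split; the function name fillLikeStos is hypothetical, and std::memcpy is used for the four-byte stores only to keep the example free of alignment and aliasing problems, not because STOSD works that way.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fills count bytes at dest with value, in two phases that mirror
// "REP STOSD" (count / 4 DWORDs) followed by "REP STOSB" (count % 4 bytes).
void fillLikeStos(void* dest, std::uint8_t value, std::size_t count) {
    unsigned char* edi = static_cast<unsigned char*>(dest);

    // EAX holds the fill byte replicated into all four byte positions.
    const std::uint32_t eax = value * 0x01010101u;

    for (std::size_t ecx = count / 4; ecx != 0; --ecx) {   // REP STOSD
        std::memcpy(edi, &eax, 4);
        edi += 4;
    }
    for (std::size_t ecx = count % 4; ecx != 0; --ecx) {   // REP STOSB
        *edi++ = value;
    }
}
```

The same bulk-then-tail split applies to the "REPE CMPSD" / "REPE CMPSB" pairing described for memcmp above.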
Miscellaneous
In this final group are random instructions that you'll often encounter. Of the list, "XOR EAX,EAX" is most prevalent.

Instruction: XOR register, register
Purpose: Sets a register's value to zero
Examples:
    XOR EAX,EAX
Description: Using the XOR instruction to zero out a register takes less space than the equivalent MOV instruction. For example, "MOV EAX,0" takes five bytes, while "XOR EAX,EAX" uses only two bytes. Is using XOR twisted? Yes. But after years of stepping through assembly code, you too will automatically substitute "zero out the register" when you see this instruction.
Instruction: MOVZX DWORD value, byte or word value
Purpose: Copies an unsigned value into a larger type
Examples:
    MOVZX EAX,BYTE PTR [EBP+8]
    MOVZX EAX,WORD PTR [00451234]
Description: In most languages, a value of a smaller type can be copied into or used in place of a larger type. For example, in C++ an unsigned char can be copied into an unsigned short (aka a WORD). Likewise, an unsigned short can be used where an unsigned long is expected. The compiler uses MOVZX (move with zero extend) to convert the smaller type into a larger type. In C++, BYTEs can be converted to WORDs or DWORDs, and WORDs can be converted to DWORDs.

Instruction: MOVSX DWORD value, byte or word value
Purpose: Copies a signed value into a larger type
Examples:
    MOVSX EAX,BYTE PTR [EBP+8]
    MOVSX EAX,WORD PTR [77f81234]
Description: In most languages, a value of a smaller type can be copied into or used in place of a larger type. For example, in C++ a char can be copied into a short. Likewise, a short can be used where a long is expected. The compiler uses MOVSX (move with sign bit extend) to convert the smaller type into a larger type. In C++, chars can be converted to shorts or longs, and shorts can be converted to longs.

Instructions: MOV EAX,FS:[0], MOV FS:[0],ESP
Purpose: Establish a new structured exception handling frame
Examples:
    MOV  EAX,FS:[00000000]
    PUSH EAX
    MOV  FS:[00000000],ESP
Description: In Win32, the FS register points to the Thread Environment Block (TEB). A data structure unique for each thread, the TEB contains values that the system uses to control the thread. At offset 0 in the TEB is a pointer to the first node in the structured exception handling chain. When you see code that uses FS:[0], it's usually setting up or tearing down a try block.

Instruction: MOV EAX,FS:[18]
Purpose: Makes a linear pointer to the TEB
Examples:
    MOV EAX,FS:[18]
    MOV EAX,[EAX+24]
Description: The TEB is always pointed to by the FS register. To make code portable, it's helpful to use a flat, linear address for the TEB. The TEB's linear address can be found at offset 0x18 in the TEB. Code that reads from FS:[18] is preparing to read some other value from the TEB. Step through all three instructions in GetCurrentThreadId under Windows NT to see this for yourself.

Instruction: MOV ECX,FS:[2C]
Purpose: Makes a pointer to the Thread Local Storage (TLS) array
Examples:
    MOV ECX,DWORD PTR FS:[0000002C]
    MOV EDX,DWORD PTR [ECX+EAX*4]
Description: At offset 0x2C in the TEB is a pointer to the TLS array for the thread. This array contains 64 DWORDs, each corresponding to a particular index value that would be passed to TlsGetValue. Code that uses FS:[2C] is using TLS.
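Looking back at the MOVZX and MOVSX entries in this group, the two instructions differ only in how they fill the upper bits of the wider destination. The C++ sketch below shows the difference using fixed-width types; the function names zeroExtend and signExtend are hypothetical, and the fixed-width types are an assumption made for portability (the text's BYTE and DWORD correspond to the 8-bit and 32-bit types here).

```cpp
#include <cstdint>

// What MOVZX does: widen an unsigned value. The upper 24 bits of the
// result are always zero.
constexpr std::uint32_t zeroExtend(std::uint8_t value) {
    return value;
}

// What MOVSX does: widen a signed value. The upper 24 bits of the
// result are copies of the source's sign bit, so the value is preserved.
constexpr std::int32_t signExtend(std::int8_t value) {
    return value;
}
```

The same byte pattern, 0xFF, becomes 0x000000FF (255) through zeroExtend but 0xFFFFFFFF (-1) through signExtend, which is why the compiler has to pick the instruction based on the signedness of the source type.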
To The Code!
To show many of the instructions and sequences that I've described, I wrote the InstructionDemo program. A quick look at the source code in Figure 2 shows that the two functions don't do anything worthwhile. But the code is well commented, pointing out the particular instruction or instruction sequence it's designed to produce. I compiled InstructionDemo.CPP with the following command line:
CL InstructionDemo.CPP
I then disassembled the relevant parts of the executable and annotated the listing. Above each instruction or sequence is the C++ code responsible for it (see Figure 3). This is similar to what the Developer Studio IDE does when you select "Go To Disassembly" in the source window. Many of the instructions don't need explanation, but it's worthwhile to point out a few things.

First, examine the instructions at offset 0x401000. They're establishing the stack frame for the procedure, including creation of space below the frame for local variables. If you look throughout the procedure, you won't see the EBX and ESI registers used, so the stack frame only preserves the EDI register.

After a whole bunch of variable initialization instructions, notice that the signed type promotion (char to long) at offset 0x401040 requires two instructions. This is because (in the general case) the Intel architecture doesn't allow one instruction to reference two memory addresses. Therefore, the assignment must go through a register that acts as an intermediate location.

Also interesting is the if statement starting at offset 0x40104D. After the code that executes when the expression evaluates to TRUE, note the JMP instruction at offset 0x401060. This JMP instruction makes the CPU skip over all the code for the else clause. A bit later (at offset 0x40106C), another if statement uses the TEST instruction to see if bitfields are set. In that sequence, the compiler treats the ECX register as a private, unnamed local variable.

Examining the for loop at offset 0x4010A9 is interesting because of the way the compiler orders the initialization, termination condition, and post-iteration code. The MOV instruction at 0x4010A9 performs the initialization, and then control JMPs past the post-iteration code to get to the termination condition code. The termination condition code looks very similar to an if statement. If you understand what the code is doing here, you can see how a for statement could be rewritten using if and goto statements.
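Here is one way that rewrite might look for the demo's "for ( i = 0; i < 4; i++ ) localUnsignedLong += i;" loop, with the pieces laid out in the order just described: initialization, a jump over the post-iteration code, the termination test, and the body. The function name sumWithGotos and the label names are hypothetical; this is a sketch of the structure, not a transcription of the compiler's output.

```cpp
// The for loop from InstructionDemo, rewritten with if and goto in the
// order the compiler emits the pieces.
unsigned long sumWithGotos(unsigned long localUnsignedLong) {
    int i;

    i = 0;                          // initialization
    goto condition;                 // JMP past the post-iteration code

post_iteration:
    i++;                            // post-iteration code

condition:
    if (!(i < 4)) goto done;        // termination test, much like an if
    localUnsignedLong += i;         // loop body
    goto post_iteration;            // JMP back for the next pass

done:
    return localUnsignedLong;
}
```

Starting from the demo's initial value of 2, the function returns 8, the same result the original for loop produces.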
Starting at offset 0x4010E1, the code begins pushing parameters on the stack in preparation for calling printf. It's important to realize that the parameters are passed right to left. Note that there are two distinct LEA instructions. The first calculates the address of the szBuffer array, while the second calculates the address of the argc parameter. After the call to printf at offset 0x4010F9, the code cleans all the pushed parameters off the stack with the "ADD ESP,14" instruction.

In the MySubProcedure code starting at offset 0x40112E, the stack frame setup is considerably more complex than the prior procedure's. The instructions like "PUSH 00405058" and "MOV EAX,FS:[00000000]" are building a frame for the structured exception handler code that results from using __try. Also, this time the stack frame setup code preserves all the register variable registers (EBX, ESI, and EDI).

At offset 0x401154, the code modifies the TLS variable called tlsVariable. The "MOV ECX,DWORD PTR FS:[0000002C]" instruction loads the ECX register with a pointer to the array of 64 DWORDs that each thread uses for TLS. The next instruction uses an advanced addressing form to index into the array and read the slot corresponding to a particular TLS index. ECX contains the pointer to the array, while EAX contains the TLS index. The code multiplies EAX by four (the size of a DWORD), and adds it to the TLS array pointer.
Figure 2 InstructionDemo.CPP
//==========================================
// Matt Pietrek
// Microsoft Systems Journal, February 1998
// Program: InstructionDemo.CPP
// FILE: InstructionDemo.CPP
//==========================================
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdlib.h>
#include <stdio.h>

// Force these functions inline (/O2 would normally do this)
#pragma intrinsic( memset, strlen, strcmp )

__declspec(thread) int tlsVariable = 0;  // Make a thread local variable

int g_myGlobalVariable;                  // Make a global variable

void MySubProcedure( void );

int main( int argc, char *argv[] )
{
    char szBuffer[128];
    char *pszString = "Hello";
    unsigned long localUnsignedLong = 2;
    unsigned char localUnsignedChar = 2;
    long localSignedLong = 2;
    char localSignedChar = 2;
    int i;

    g_myGlobalVariable = 0x12345678;     // Assignment to global

    localSignedLong = localSignedChar;   // signed type promotion

    // Conditional execution
    if ( localUnsignedLong == 2 )
        localSignedLong = 1;
    else
        localSignedLong = 2;

    // Using TEST
    if ( localUnsignedLong & 0x00040008 )
        i = 3;

    // AND'ing off bitfields
    localUnsignedLong &= 0x01020304;

    // OR'ing on bitfields
    localSignedLong |= 0x05060708;

    // LOOP code
    for ( i = 0; i < 4; i++ )
        localUnsignedLong += i;

    // Procedure invocation
    printf( "%u %u %08X %s", localUnsignedLong, argc, &argc, szBuffer );

    // Using STOSD / STOSB
    memset( szBuffer, 0, sizeof(szBuffer) );

    // Using SCASB
    i = strlen( szBuffer );

    MySubProcedure( );

    return 0;
}

void MySubProcedure( void )
{
    tlsVariable = 2;

    // Use of try/except code
    __try
    {
        g_myGlobalVariable = 2;
    }
    __except( EXCEPTION_EXECUTE_HANDLER )
    {
        g_myGlobalVariable = 4;
    }
}
Figure 3 InstructionDemo Mixed Source and Assembly

int main( int argc, char *argv[] )
{
401000: PUSH  EBP
401001: MOV   EBP,ESP
401003: SUB   ESP,00000098
401009: PUSH  EDI
    char *pszString = "Hello";
40100A: MOV   DWORD PTR [EBP-0000008C],00406030
    unsigned long localUnsignedLong = 2;
401014: MOV   DWORD PTR [EBP-00000088],00000002
    unsigned char localUnsignedChar = 2;
40101E: MOV   BYTE PTR [EBP-00000094],02
    long localSignedLong = 2;
401025: MOV   DWORD PTR [EBP-00000084],00000002
    char localSignedChar = 2;
40102F: MOV   BYTE PTR [EBP-00000098],02
    g_myGlobalVariable = 0x12345678;   // Assignment to global
401036: MOV   DWORD PTR [004088E8],12345678
    localSignedLong = localSignedChar; // signed type promotion
401040: MOVSX EAX,BYTE PTR [EBP-00000098]
401047: MOV   DWORD PTR [EBP-00000084],EAX
    // Conditional execution
    if ( localUnsignedLong == 2 )
40104D: CMP   DWORD PTR [EBP-00000088],02
401054: JNE   00401062
        localSignedLong = 1;
401056: MOV   DWORD PTR [EBP-00000084],00000001
    else
401060: JMP   0040106C
        localSignedLong = 2;
401062: MOV   DWORD PTR [EBP-00000084],00000002
    // Using TEST
    if ( localUnsignedLong & 0x00040008 )
40106C: MOV   ECX,DWORD PTR [EBP-00000088]
401072: AND   ECX,00040008
401078: TEST  ECX,ECX
40107A: JE    00401086
        i = 3;
40107C: MOV   DWORD PTR [EBP-00000090],00000003
    // AND'ing off bitfields
    localUnsignedLong &= 0x01020304;
401086: MOV   EDX,DWORD PTR [EBP-00000088]
40108C: AND   EDX,01020304
401092: MOV   DWORD PTR [EBP-00000088],EDX
    // OR'ing on bitfields
    localSignedLong |= 0x05060708;
401098: MOV   EAX,DWORD PTR [EBP-00000084]
40109E: OR    EAX,05060708
4010A3: MOV   DWORD PTR [EBP-00000084],EAX
    // LOOP code
    for ( i = 0; i < 4; i++ )
4010A9: MOV   DWORD PTR [EBP-00000090],00000000
4010B3: JMP   004010C4
4010B5: MOV   ECX,DWORD PTR [EBP-00000090]
4010BB: ADD   ECX,01
4010BE: MOV   DWORD PTR [EBP-00000090],ECX
4010C4: CMP   DWORD PTR [EBP-00000090],04
4010CB: JNL   004010E1
        localUnsignedLong += i;
4010CD: MOV   EDX,DWORD PTR [EBP-00000088]
4010D3: ADD   EDX,DWORD PTR [EBP-00000090]
4010D9: MOV   DWORD PTR [EBP-00000088],EDX
4010DF: JMP   004010B5
    // Procedure invocation
    printf( "%u %u %08X %s", localUnsignedLong, argc, &argc, szBuffer );
4010E1: LEA   EAX,[EBP-80]
4010E4: PUSH  EAX
4010E5: LEA   ECX,[EBP+08]
4010E8: PUSH  ECX
4010E9: MOV   EDX,DWORD PTR [EBP+08]
4010EC: PUSH  EDX
4010ED: MOV   EAX,DWORD PTR [EBP-00000088]
4010F3: PUSH  EAX
4010F4: PUSH  00406038
4010F9: CALL  004011C0
4010FE: ADD   ESP,14
    // Using STOSD / STOSB
    memset( szBuffer, 0, sizeof(szBuffer) );
401101: MOV   ECX,00000020
401106: XOR   EAX,EAX
401108: LEA   EDI,[EBP-80]
40110B: REP STOSD
    // Using SCASB
    i = strlen( szBuffer );
40110D: LEA   EDI,[EBP-80]
401110: OR    ECX,FF
401113: XOR   EAX,EAX
401115: REPNE SCASB
401117: NOT   ECX
401119: ADD   ECX,FF
40111C: MOV   DWORD PTR [EBP-00000090],ECX
    MySubProcedure( );
401122: CALL  0040112E
    return 0;
401127: XOR   EAX,EAX
}
401129: POP   EDI
40112A: MOV   ESP,EBP
40112C: POP   EBP
40112D: RET

void MySubProcedure( void )
{
40112E: PUSH  EBP
40112F: MOV   EBP,ESP
401131: PUSH  FF
401133: PUSH  00405058
401138: PUSH  004012F8
40113D: MOV   EAX,FS:[00000000]
401143: PUSH  EAX
401144: MOV   DWORD PTR FS:[00000000],ESP
40114B: SUB   ESP,08
40114E: PUSH  EBX
40114F: PUSH  ESI
401150: PUSH  EDI
401151: MOV   DWORD PTR [EBP-18],ESP
    tlsVariable = 2;
401154: MOV   EAX,[004088EC]
401159: MOV   ECX,DWORD PTR FS:[0000002C]
401160: MOV   EDX,DWORD PTR [ECX+EAX*4]
401163: MOV   DWORD PTR [EDX+00000004],00000002
    __try {
40116D: MOV   DWORD PTR [EBP-04],00000000
        g_myGlobalVariable = 2;
401174: MOV   DWORD PTR [004088E8],00000002
40117E: MOV   DWORD PTR [EBP-04],FFFFFFFF
401185: JMP   004011A1
    __except( EXCEPTION_EXECUTE_HANDLER )
401187: MOV   EAX,00000001
40118C: RET
40118D: MOV   ESP,DWORD PTR [EBP-18]
        g_myGlobalVariable = 4;
401190: MOV   DWORD PTR [004088E8],00000004
    }
40119A: MOV   DWORD PTR [EBP-04],FFFFFFFF
}
4011A1: MOV   ECX,DWORD PTR [EBP-10]
4011A4: MOV   DWORD PTR FS:[00000000],ECX
4011AB: POP   EDI
4011AC: POP   ESI
4011AD: POP   EBX
4011AE: MOV   ESP,EBP
4011B0: POP   EBP
4011B1: RET
Wrap-up
In the real world, you will no doubt encounter instructions beyond what I've described here. But now you should be familiar with most of the commonly used registers and how memory is addressed. You should be able to tell a local variable apart from a parameter. You should also be able to distinguish these type classes from global and TLS variables. Beyond the basic theory, I've also shown a reasonably large subset of the instructions that Win32 compilers generate. It's unlikely that my introduction will enable you to start writing your code in MASM. Still, with this working knowledge, you can be more confident when your debugger takes you to dark, scary places in other people's code, especially when even the dim light of source code isn't available. Read more about assembly language in the June 1998 installment of Under the Hood.
Common Instructions
Instructions: INC value, DEC value
Purpose: Increments or decrements integer value by 1
Example:
INC ESI
INC [EBP-8]
DEC [EAX+4]
The INC and DEC instructions are used to increment and decrement values kept in memory or registers. As you might imagine, these instructions map precisely to the ++ and -- operators in C++ for standard integer operations. You could use the ADD or SUB instructions to achieve the same effect as INC and DEC, although it would be more expensive in terms of size. Since they are so commonly used, the smallest versions of the INC/DEC instructions take only a single byte. Looking at the Intel opcode map, you'll see that there's an opcode for each of the eight general-purpose registers that INC can be used against (EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP). Another eight opcodes are used for the DEC instruction and the same set of registers.

Instructions: MUL value, value; DIV value, value
Purpose: Multiplication and division
Example:
MUL EAX,EDX
MUL AL,BYTE PTR [EBP-14h]
DIV EAX,EBX
I didn't cover the ADD and SUB instructions in my February column since their operation is straightforward. However, the MUL and DIV instructions have some quirks that make them difficult to read and downright quirky to write. Throughout this column, when I mention (E)AX, I'm referring to AL, AX, or EAX. Likewise, when I mention (E)DX, I'm referring to DL, DX, or EDX.

Both MUL and DIV treat their operands as unsigned values. The operands can't be immediate values (such as 3); rather, they must be in registers or memory. You may have noticed that the destination value (the first argument) always seems to be (E)AX. This is by design. The use of the (E)AX register is an implicit part of the instruction. Beyond the implicit use of (E)AX, the (E)DX register is also silently involved. The high bits of the MUL result end up in (E)DX. Likewise, for the DIV instruction, (E)DX holds the remainder and (E)AX holds the quotient.

If you write any assembler code, MUL and DIV get even weirder. The assembler (both MASM and the Visual C++ inline assembler) won't let you specify the (E)AX operand. Thus, if you want the instruction MUL EAX,ECX, you would write MUL ECX. It's just another example of the intuitive language syntax that's made assembly language wildly popular in recent years.

Instructions: IMUL value, value; IDIV value, value
Purpose: Signed multiplication and division
Example:
IMUL WORD PTR [EBP+8]
IMUL EDX,ECX,8
IDIV EAX,DWORD PTR [EDX]
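The implicit use of (E)AX and (E)DX by the multiply and divide instructions can be modeled in C++. What follows is a minimal sketch of the 32-bit unsigned forms (MUL and DIV) only, assuming the usual EDX:EAX pairing in which EDX supplies the high 32 bits. The Regs struct and the function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the two registers involved.
struct Regs
{
    std::uint32_t eax;
    std::uint32_t edx;
};

// Models "MUL operand": EDX:EAX = EAX * operand (unsigned).
Regs mul32(Regs r, std::uint32_t operand)
{
    std::uint64_t product = static_cast<std::uint64_t>(r.eax) * operand;
    r.eax = static_cast<std::uint32_t>(product);        // low 32 bits
    r.edx = static_cast<std::uint32_t>(product >> 32);  // high 32 bits
    return r;
}

// Models "DIV operand": EDX:EAX is divided by operand (unsigned).
// EAX receives the quotient and EDX the remainder.
Regs div32(Regs r, std::uint32_t operand)
{
    assert(operand != 0);  // the real instruction faults on division by zero
    std::uint64_t dividend =
        (static_cast<std::uint64_t>(r.edx) << 32) | r.eax;
    std::uint64_t quotient = dividend / operand;
    assert(quotient <= 0xFFFFFFFFu);  // the real instruction faults on overflow
    r.edx = static_cast<std::uint32_t>(dividend % operand);
    r.eax = static_cast<std::uint32_t>(quotient);
    return r;
}
```

For example, multiplying 0x80000000 by 4 leaves 0 in the modeled EAX and 2 in EDX, and dividing 17 by 5 (with EDX initially 0) leaves a quotient of 3 in EAX and a remainder of 2 in EDX.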
The IMUL and IDIV instructions treat the operands as signed values. Contrast this to MUL and DIV, which work on unsigned values. IDIV uses (E)AX as the implicit first operand, just as DIV does. Also, like its DIV counterpart, IDIV only works with register or memory values. IMUL, on the other hand, doesn't fit the general patterns of MUL, DIV, and IDIV. It can work with immediate values and it can have a non-(E)AX register as the destination. There's even a form of the IMUL instruction that takes three operands. To my knowledge, this is the only instruction in the Intel opcode set with this distinction.

Instructions: PUSHAD, POPAD
Purpose: Saves or restores all general-purpose registers via the stack

PUSHAD and POPAD push or pop EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI on the stack, in that order. These instructions are used in situations where many registers may be modified and the programmer wants to leave no evidence of the execution in the code. Although interrupt handlers are passé for most programmers, they're a perfect example of where PUSHAD and POPAD come in handy. Besides taking fewer opcodes than eight individual PUSH instructions, they also execute faster (five clock cycles on a Pentium).

Instructions: PUSHFD, POPFD
Purpose: Push or pop the EFLAGS register

In some cases, it's inconvenient to use the flags set by a prior operation immediately. Alternatively, you may want to make sure that some operation you're about to execute won't change the current flag values. For these situations, PUSHFD and POPFD are the easiest methods to save and restore those bits. PUSHFD is one of the atomic components of an interrupt. When an interrupt or an exception occurs, the following code effectively executes:

PUSHFD
PUSH CS
PUSH EIP
Following the three pushes, the EIP register changes to the interrupt handler address contained in the appropriate slot in the Interrupt Descriptor Table (IDT). Likewise, the IRETD effectively does a POPFD as part of returning from an interrupt.

Instructions: SHL, SHR, SHLD, SHRD
Purpose: Shift bits to the left or right
Example:
SHL EBX,3
SHR EBX,CL
SHLD EDX,ECX,4
SHRD ESI,EDI,CL
The SHL and SHR instructions are logically equivalent to the C++ << and >> operators. Many of you probably recall that bitwise shifting is a quick way to perform multiplication and division by powers of 2. For example, the SHL EBX,3 instruction has the same effect as multiplying EBX by 8 (2^3 == 8). Indeed, if you write C++ code that multiplies or divides an unsigned value by 2, 4, 8, 16, and so on, it will most likely compile to a SHL or SHR instruction. When shifting left, the low-order bits are filled with zeroes. The final high-order bit that's "shifted out" is moved to the carry flag (CF). In other words, the carry flag is like a virtual 33rd bit. When shifting right, the high-order bits are filled with zeroes, and the last bit shifted out moves to the carry flag.

Instruction: ADD [EAX],AL
Purpose: None

You may see a lot of this particular instruction, and you'll probably see it repeated. However, ADD [EAX],AL has no special significance. The opcode bytes for this instruction are 00 00. In other words, it's what you'll see if you're viewing a series of data bytes that all contain the value 0. Nothing to see here. You can all go home now.

Instruction: CLD
Purpose: Clears the direction flag

In my February 1998 column, I described the string instructions LODSx, SCASx, STOSx, and MOVSx. Each of these instructions uses the ESI or EDI register to point at the memory to be read or written to. These instructions are typically used in conjunction with the REP, REPE, or REPNE prefixes, which cause the string instruction to execute several times until some specific condition is met. After each REPx-induced iteration, the CPU changes the ESI or EDI register to point to an adjacent memory location. The direction in which the registers move is given by the direction flag. If the direction flag is clear, ESI or EDI is incremented after each iteration (thus causing the next higher memory location to be referenced in the next iteration). When the direction flag is set, ESI or EDI decrements after each iteration. Most of the time it's easiest to work moving forward in memory (toward higher addresses), so the direction flag is usually clear. However, it's generally not safe to assume that the flag is clear. Thus, you'll often see the CLD instruction somewhere before a string operation such as REP MOVSB.

Instructions: NOT value, NEG value
Purpose: Negation of values
Example:
NOT DWORD PTR [EBP-8]
NEG EDX
The NOT instruction does ones-complement negation. That is, it applies the NOT operation to each bit in the operand. An initial value of 0 will become 0xFFFFFFFF after a NOT instruction. The C++ ~ operator is typically implemented via the NOT instruction. The NEG instruction does twos-complement negation. (If you're not 100 percent up on ones versus twos-complement negation, don't feel bad. I learned this stuff 10 years ago in college, and I've completely forgotten it!) An easier way to think of the NEG instruction is that it puts a - sign in front of the value. Thus, using NEG on -3 yields 3, while NEG applied to 4 yields -4. To summarize, you can think of NOT as affecting individual bits, while NEG operates on the entire value.

Instruction: NOP
Purpose: No operation

The NOP instruction does nothing and affects nothing. It's a single-byte opcode that executes in one clock cycle and is primarily used to pad code. For example, a compiler might want the beginning of a procedure to start on a 16-byte boundary. The compiler/linker would insert enough NOP instructions between the end of one procedure and the beginning of the next procedure to create the desired alignment.

If you're confident in your assembler abilities, the NOP instruction can be applied to code in memory or in the executable file. You might know that some instruction you're about to execute will cause a fault in a debugger. If you want to skip that instruction, use the debugger to write enough NOP opcodes (0x90) to eliminate the instruction. This is useful to squash hardcoded INT 3 breakpoint instructions while you're running under the debugger, effectively not stopping at the breakpoint. Really advanced users can implement NOP instructions to obliterate entire regions of code in an executable. (Warning! Harder than it looks.)

Another advanced use of the NOP instruction is when you want to make it easy to patch or hook into your code. At the beginning of a procedure or block of code, put in enough NOP instructions for the desired goal. Subsequent patching or hooking code can write JMPs, CALLs, or whatever into the NOP area.

Instruction: INT 3
Purpose: Debugger interrupt

INT 3 has two uses: one intended by the original CPU designers, the other accidental. The INT 3 instruction is the standard method to suspend a program and transfer control to a debugger. In normal use, programs don't include INT 3 instructions in their code. Rather, when you set a traditional breakpoint with a debugger, it temporarily overwrites the target instruction with an INT 3 instruction. (The LODPRF32 program from my July 1995 column illustrates this.) Note that an INT 3 instruction is the heart of the DebugBreak API for Intel CPUs.

The other offbeat use of the INT 3 instruction is as a paranoid NOP. In those cases where a NOP would be used for padding (and theoretically never executed), an INT 3 can be used instead. Like NOP, an INT 3 instruction is only a single byte. The key difference is that if a bug crept in and you executed the INT 3 instruction, you'd pop into the debugger. In the same scenario, the CPU would blithely sail through NOP instructions and wreak havoc someplace farther away from the original error. The Microsoft linker uses INT 3s as paranoid NOPs when creating padding for incremental linking.
The linker also uses them as padding between procedures it wants to align on a particular memory boundary. Usually this alignment is on a multiple of 16 bytes unless you have the "optimize for size" compiler option set. Figure 1 shows a section of code from CALC.EXE that illustrates INT 3 padding in action.
Figure 1 INT 3 Padding

1285EC4: INT  3
1285EC5: INT  3
1285EC6: INT  3
1285EC7: INT  3
1285EC8: INT  3
1285EC9: INT  3
1285ECA: INT  3
1285ECB: INT  3
1285ECC: INT  3
1285ECD: INT  3
1285ECE: INT  3
1285ECF: INT  3
1285ED0: CMP  DWORD PTR [0128F4E8],01
1285ED7: JNE  01285EDE
1285ED9: CALL 012875B0
1285EDE: MOV  EAX,DWORD PTR [ESP+04]
1285EE2: PUSH EAX
1285EE3: CALL 012875F0
1285EE8: ADD  ESP,04
1285EEB: PUSH 000000FF
1285EF0: CALL DWORD PTR [0128F4E4]
1285EF6: ADD  ESP,04
1285EF9: RET
1285EFA: INT  3
1285EFB: INT  3
1285EFC: INT  3
1285EFD: INT  3
1285EFE: INT  3
1285EFF: INT  3
1285F00: MOV  EAX,DWORD PTR [ESP+04]
1285F04: MOV  [0128F4F0],EAX
1285F09: RET
Instruction: LOCK
Purpose: Locks the memory bus during the next instruction
Example:
LOCK INC DWORD PTR [EDX+04]

Technically speaking, LOCK is an instruction prefix rather than an instruction in its own right. In a multiprocessor environment, multiple processors could access the same memory location at the same time. The LOCK prefix ensures that the instruction associated with it will have exclusive access to the destination memory location. If you've ever examined the EnterCriticalSection API, you'll see that if the critical section isn't currently held, the code essentially just increments a counter. A LOCK prefix is used with an INC instruction to guarantee that one thread won't increment the counter while another thread on another CPU is reading it. You'll also see the LOCK prefix used with multiprocessor synchronization APIs such as InterlockedExchange and InterlockedIncrement.

A final thought on the LOCK prefix: you may recall a bug on older Pentium CPUs where a particular instruction sequence could cause the CPU to freeze up. (See the February 1998 Editor's Note if you need a refresher.) That instruction sequence isn't a valid sequence, and the LOCK prefix plays a vital role in the ensuing CPU meltdown.
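In portable C++, the closest equivalent to a LOCK-prefixed increment is an atomic read-modify-write. The sketch below is only an illustration of the idea, not the actual EnterCriticalSection code: the function names are invented, and the convention that the counter starts at -1 when nobody holds the lock is an assumption of this example.

```cpp
#include <atomic>
#include <cassert>

// Atomically adds 1 and returns the new value, in the spirit of
// InterlockedIncrement. On x86, compilers typically implement
// fetch_add with a LOCK-prefixed instruction.
long locked_increment(std::atomic<long>& counter)
{
    return counter.fetch_add(1) + 1;
}

// Hypothetical fast path: the caller that moves the counter from
// -1 to 0 is the one that found the lock free.
bool try_acquire(std::atomic<long>& counter)
{
    long newValue = locked_increment(counter);
    assert(newValue >= 0);  // holds as long as the counter started at -1 or above
    return newValue == 0;
}
```

With a counter initialized to -1, the first call to try_acquire returns true and every later call returns false, no matter how many CPUs race on the increment, because each increment is indivisible.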
Common Instruction Sequences

Sequence:
CMP register_X, immediate_value_A
JE XXXXXXXX
CMP register_X, immediate_value_B
JE XXXXXXXX
Purpose: C++ switch statement
Example:
CMP EAX,1
JE 00400248
CMP EAX,3
JE 0040026E
CMP EAX,7
JE 004002A0
This sequence (compare and JMP if equal) is the most straightforward encoding of a C++ switch statement that I've seen. It's also very easy to pick out when you encounter it in a debugger. In the example code, the switch statement would look something like this:
switch ( value )
{
    case 1: // code for case 1
    case 3: // code for case 3
    case 7: // code for case 7
}
The trick to understanding this code sequence is realizing that compiler-generated code for switch statements usually differs from your mental model. The code for all the case comparisons is usually generated in one place. Following the value comparison code are discrete blobs of code that implement the code specified for a particular case. The value comparison code is optimized to quickly figure out just which case blob to jump to. By no means is this sequence the only encoding for switch statements. More efficient encodings may involve JMP tables or subtractive countdowns using the zero flag. However, these encodings definitely don't fit into my criteria of "just enough to get by."

Sequence: opcode [register+offset]
Purpose: Structure member access
Example:
PUSH [EAX+157C]
MOV EAX,[ESI+34]
ADD [EAX+44],ESI
Here's a common scenario: you have a pointer to a structure or class instance with which you read, write, or otherwise manipulate some field. In this situation, the compiler typically puts the pointer value into a register. The offset of the specified field within the structure is then added to the register. For instance, consider this structure:
struct Foo
{
    int   i;
    short j;
    char  k;
};
If you had a pointer to an instance of this structure and wanted to add 2 to each structure member, the code would look something like this (assuming ESI points to the structure instance):
ADD DWORD PTR [ESI],2    ;; Foo.i
ADD WORD PTR [ESI+4],2   ;; Foo.j
ADD BYTE PTR [ESI+6],2   ;; Foo.k
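You can ask the compiler to confirm field offsets like these instead of working them out by hand. The following sketch assumes a typical 32-bit or 64-bit compiler with a 4-byte int and a 2-byte short; the static_assert lines will fail to compile anywhere that assumption does not hold. The function name add_two_to_each is made up for this example.

```cpp
#include <cassert>
#include <cstddef>

struct Foo
{
    int   i;
    short j;
    char  k;
};

// Compile-time checks of the offsets used in the assembly listing.
static_assert(offsetof(Foo, i) == 0, "i is expected at offset 0");
static_assert(offsetof(Foo, j) == 4, "j is expected at offset 4");
static_assert(offsetof(Foo, k) == 6, "k is expected at offset 6");

// The C++ that would correspond to the three ADD instructions,
// with the pointer parameter playing the role of ESI.
void add_two_to_each(Foo* esi)
{
    assert(esi != nullptr);
    esi->i += 2;   // ADD DWORD PTR [ESI],2
    esi->j += 2;   // ADD WORD PTR [ESI+4],2
    esi->k += 2;   // ADD BYTE PTR [ESI+6],2
}
```

The offsetof macro from <cstddef> reports the same numbers that show up as displacements in the disassembly, which makes it a handy cross-check when matching a structure definition to unfamiliar code.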
Note that for the first structure field (i), the field offset is 0, so no addition is needed. The i field is 4 bytes long, placing the next field (j) at offset 4. The j field is a short, so it's only two bytes long. The final field (k) is at offset 6, which I arrived at by adding 4 and 2. Compilers must place structure fields into memory locations in exactly the same sequence as the structure is declared. Thus, you can usually look at any structure or class definition and figure out the offsets of various fields. Be aware that compilers often place padding between structure fields so that each field starts at some natural boundary (typically 4 or 8 bytes). Using #pragma pack lets you specify the exact padding (or lack thereof) in your structure definitions.

Sequence: MOV value,EAX, many times in a row
Purpose: Serial initialization of several variables to the same value
Example:
MOV EAX,0
MOV [EBP-4],EAX
MOV [EBP-10],EAX
MOV [EBP-18],EAX
When a collection of variables is assigned the same value, the compiler may load the value into a register and copy the register into each of the variables. For example, at the beginning of a function you might initialize several int variables to the value 0. The example code sequence shows one way this might be encoded.

Sequence:
CMP register_X,01
SBB register_X, register_X
NEG register_X
Purpose: Converts 0 input value to 1, all other values to 0
Example:
CMP EAX,01
SBB EAX,EAX
NEG EAX
In many cases, generated code needs to inspect a value to determine if it's 0. If so, the result of the inspection should be nonzero (typically 1). If the input value is any value other than 0, the result should be 0. Using 0 to mean Boolean FALSE, and everything else being TRUE, this instruction sequence does a logical NOT of the input value. The code comprising this instruction sequence certainly isn't intuitive. Its distinctive characteristic is the use of the SBB instruction (integer subtraction with borrow). SBB is rarely used outside of this sequence. The first instruction (CMP) sets or clears the carry flag as appropriate. SBB then uses the carry flag as part of its subtraction. Since the two arguments to SBB in this sequence are always the same, the carry flag alone determines the outcome (which is always 0 or -1). The NEG instruction finishes up by changing a -1 to a 1 and leaving 0 values alone.
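The three steps can be traced in C++ by modeling the carry flag as a bool. This is a minimal sketch under that simplification; the function name logical_not is invented, and the only CPU behavior it relies on is that CMP sets the carry flag when an unsigned subtraction needs a borrow.

```cpp
#include <cassert>
#include <cstdint>

// Models CMP EAX,1 / SBB EAX,EAX / NEG EAX on a 32-bit value.
std::uint32_t logical_not(std::uint32_t eax)
{
    // CMP EAX,1: computes EAX - 1 and sets the carry flag on borrow,
    // which for unsigned values happens only when EAX is 0.
    bool carry = eax < 1u;

    // SBB EAX,EAX: EAX = EAX - EAX - carry, so 0 or 0xFFFFFFFF (-1).
    eax = eax - eax - (carry ? 1u : 0u);

    // NEG EAX: twos-complement negation turns -1 into 1, leaves 0 alone.
    eax = 0u - eax;

    assert(eax == 0u || eax == 1u);
    return eax;
}
```

An input of 0 comes out as 1, and any nonzero input comes out as 0, which matches the behavior of the C++ ! operator applied to an integer.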
Oops! How did I Get Here?

Let's examine some of the common clues you can look for when something faults and you're rudely popped into the debugger. Think of this as a first aid quick reference. You won't find instructions on surgery here, but the common cuts and scrapes can be dealt with. Picture this scenario: everything is working fine until suddenly your program stops in the debugger because of a fault, and none of the code looks familiar. Never fear. The faulting address usually yields some sort of information that steers you toward a resolution. One of the more common and easy to find bugs is calling through a NULL function pointer. The signature characteristic of this bug is that the instruction pointer (EIP) is 0 or very close to 0. Under Windows NT, the first 64KB of the address space is off limits, so the fault occurs exactly at address 0. In Windows 95, it's slightly more tricky. Memory at address 0 is accessible, but it's certainly not code. In this case, the faulting address may or may not be 0. However, the faulting address will almost certainly be just a little bit higher (for example, 0x00000003). When this happens, the CPU miraculously manages to execute one or two "instructions" before it hits something that triggered a fault. Regardless of where you faulted, the vital information you need to know is: where were you executing before the NULL function pointer was called? In these situations the stack window may not be helpful, since the calling routine almost certainly won't appear in the stack window. This is a by-product of the way call stacks are walked. (See my May 1997 column for details on stack walking.) Luckily, when a NULL pointer call happens, there is a way to see where you came from. A CALL instruction pushes a return address on the stack. If you can find this return address, you can change the code window to display at that location. To find the return address, use the data window to display memory starting at the ESP value. 
Make sure that the memory is being displayed in the DWORD format. The first DWORD at ESP is most likely the return address. Remember, the return address you obtain will be for the instruction after the bad CALL instruction. You'll need to back up in the code window to see the code that led up to the CALL. In Figure 2, I've shown a NULL function pointer fault in the Visual Studio debugger. In the register window, the ESP value is 0x12FF7C. This is the same value that I've changed the data pane to display in DWORD format. The left column is the memory address. The second DWORD at the top (0x00401009) is the return address.
Figure 2 A NULL Pointer Fault
Incidentally, if the DWORD at ESP doesn't turn out to be a valid return address, it's certainly worth your while to look further up on the stack for values that look like they could be return addresses. If something looks like a valid address, change the code window to display at that address and see if you can make sense of it. If your ESP register is bogus, try looking for return addresses at positive offsets from the EBP register. Remember, this isn't an exact science. You're sifting through the rubble, looking for something that will give you a clue as to where you'll start doing more in-depth investigation.

Moving away from NULL function pointers, let's say you've faulted in some code that you don't recognize, but the faulting address is nowhere near 0. What's worse, the code looks like garbage. In other words, it doesn't look like the normal instructions you'd see. Instead, you see instructions such as ARPL, AAA, and OUTSB. There are two likely ways your code got there. First, you may have called through a corrupted function pointer. Second, you may have corrupted the return address on the stack. When the RET instruction executed, control transferred to the bogus address. In either situation, the underlying problem is valid code addresses that were overwritten with garbage. In this case, your chance of getting a valid return address is lessened. However, you may be able to get an idea of what happened by looking at the faulting address. Try interpreting the fault address as a stream of data; you may find a pattern.
Figure 3 HoseStack.cpp
#include <string.h>
#include <stdio.h>

int main()
{
    char szBuffer[4];
    strcpy( szBuffer, "Hello World!\n" );
    printf( szBuffer );
    return 0;
}
Figure 3 shows the code for a small Hello World program with a big bug. The szBuffer array is only four characters wide, while the strcpy function copies all 14 bytes of "Hello World!\n" (13 characters plus the terminating null). This buffer overrun actually overwrites the stack frame where function main's return address is stored. When I run the program, it correctly prints out "Hello World!," but then faults at address 0x21646C72. The faulting address yields a clue if you think of the address as a pattern of bytes. In memory, 0x21646C72 is stored as four sequential bytes: 0x72, 0x6C, 0x64, and 0x21. Note that each of these values is above 0x20, and below 0x80. That happens to be the range of printable ASCII characters. Looking up the four bytes in an ASCII table, you get

0x72 = 'r'
0x6C = 'l'
0x64 = 'd'
0x21 = '!'
As you can see, those four bytes form the end of the string "Hello World!" You could then search your code for places where rld! appears. While not a perfect answer, you'll have substantially narrowed down the places to begin an initial search for the problem. Admittedly, this is a contrived example and there are tools available that find these types of memory overwrites. Nonetheless, I've found many obnoxiously difficult bugs only because I noticed a familiar pattern in the corrupted data.
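The byte-by-byte lookup is easy to automate. Here is a small sketch of a helper that takes a 32-bit faulting address, splits it into bytes in little-endian memory order, and returns them as text when all four are printable ASCII. The function name address_as_text is hypothetical, and the sketch treats 0x20 through 0x7E as the printable range.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns the four bytes of the address as they would sit in memory on
// an Intel CPU (lowest byte first), or an empty string if any byte is
// outside the printable ASCII range.
std::string address_as_text(std::uint32_t address)
{
    std::string text;
    for (int i = 0; i < 4; ++i)
    {
        unsigned char byte =
            static_cast<unsigned char>((address >> (8 * i)) & 0xFF);
        if (byte < 0x20 || byte > 0x7E)
            return std::string();   // not text, probably a real address
        text += static_cast<char>(byte);
    }
    assert(text.size() == 4);
    return text;
}
```

Passing 0x21646C72 yields the string "rld!", while an ordinary code address such as 0x00401009 yields an empty string because it contains non-printable bytes.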
Other Common Causes of Faults

Figure 4 String Instructions and Registers

Instructions             Registers used
MOVSB, MOVSW, MOVSD      Reads from ESI, writes to EDI
SCASB, SCASW, SCASD      Reads from EDI
STOSB, STOSW, STOSD      Writes to EDI
LODSB, LODSW, LODSD      Reads from ESI
Common sources of faults are the string instructions shown in Figure 4. Usually string instructions were either given bad data to start with or they operated past their intended range of memory. Remember, these string instructions implicitly use ESI, EDI, or both registers. They're almost always used with a REP, REPE, or REPNE prefix, which causes the instruction to execute multiple times with the registers incrementing or decrementing after each iteration.

Tracking down the core cause of a fault from one of these string instructions is almost always trivial. Figure 4 shows which registers the instructions use. Regardless of the particular instruction in the group, the registers are pointer values. It's immediately noticeable if a NULL pointer is the culprit. For example, if the faulting instruction is REP STOSB and you see that EDI is 0, you know that the CPU was trying to write using a NULL pointer.

If the registers in question aren't 0, check if their value is a multiple of 4KB, the size of a page on Intel CPUs. It's entirely possible that the instruction has executed successfully a number of times until the ESI or EDI register pointed to a page of memory that's not accessible. An easy way to know if you're on a page boundary is to look at the bottom three digits of the hex address. If they are 000, you're on a page boundary. You can double-check this invalid memory diagnosis by trying to display memory at the value of ESI or EDI. If the debugger can't see it, your code can't either. I'm assuming you're using an application debugger such as Visual Studio. If you're using a system-level debugger, this may not be true since the memory may only be visible from kernel mode. On the other hand, if you're using a system-level debugger, you probably already know how to track down this kind of problem.

If you use recursive functions (or just lots of stack space), stack faults might plague you.
Unfortunately, the operating system and debugger don't go out of their way to clarify that it's a stack overflow problem. For example, Figure 5 shows a very simple program that recurses until it runs out of stack space and faults. Figure 6 shows the none too helpful fault dialog that results.
Figure 5 RecursionOverflow.cpp
int foo( int i )
{
    return foo( i );
}

int main()
{
    return foo( 2 );
}
Figure 6 An Unhelpful Fault Dialog
If you select Cancel to debug, the Visual Studio debugger briefly tells you that a stack overflow occurred, but not at the same time as it shows you the faulting instruction. However, there are clues you can infer from the debugger that would indicate a stack overflow. For starters, the ESP register value is probably on a 4KB boundary. Likewise, the faulting instruction is probably a PUSH. There are other ways to cause a stack fault, but most of the time it will look something like what I've described.

While I'm on the subject of the stack, my final tidbit this month is on problems caused by PUSHing or POPping too much data to or from the stack. When whole programs were written in assembly language, programmers spent a lot of time matching up every PUSH instruction with an equivalent POP or ADD ESP,XX instruction. Now that compilers are so widespread, this tedious process isn't normally necessary. Believe it or not, though, it's still sometimes necessary to verify that what's pushed on the stack eventually gets removed.

For example, if the code for calling a __stdcall function places two DWORD values on the stack, the called function should end with a RET 8 instruction. Likewise, if you see a __cdecl function being called with three DWORD parameters, there should be an ADD ESP,0Ch instruction following the call. More importantly, the called function should return with a simple RET instruction. If you're not familiar with __cdecl versus __stdcall functions, see my February 1998 column.

These kinds of stack parameter mismatch problems can be minimized by following a few simple rules. First, make sure that there's only one prototype for any given function. Second, put that prototype in a .H file, never in a .C or .CPP file. Finally, make sure that the source file that actually defines the function includes the .H file. If you follow all these steps, you'll get a compiler or linker error rather than a bogus program.

I've seen programmers cheat by including prototypes for just one or two functions in their code modules. (You know who you are!) These functions have a prototype in a .H file, but the programmer doesn't want to incur the overhead of bringing in a whole .H file for just a few items. Inevitably something changes and the programmer ends up counting PUSHes, POPs, and ADD ESPs because the code crashes.
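The cleanup amounts in the __stdcall and __cdecl examples above are just the parameter count times four. As a rough sketch of that arithmetic, here are two helpers that produce the instruction text you'd expect to find in the disassembly. The function names are hypothetical, and the code assumes 32-bit x86 with every parameter exactly one DWORD wide. It only formats strings and doesn't inspect real code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumption: 32-bit x86, every parameter is one DWORD (4 bytes).
constexpr unsigned kDwordBytes = 4;

constexpr unsigned paramBytes(unsigned dwordParams) {
    return dwordParams * kDwordBytes;
}

// __stdcall: the called function removes its own parameters, so it should
// end with RET n, where n is the total parameter size in bytes.
inline std::string stdcallReturn(unsigned dwordParams) {
    const unsigned bytes = paramBytes(dwordParams);
    if (bytes == 0) return "RET";
    return "RET " + std::to_string(bytes);
}

// __cdecl: the called function ends with a plain RET, and the caller
// follows the CALL with ADD ESP,n. Formatted as two hex digits to match
// the column's 0Ch example.
inline std::string cdeclCallerCleanup(unsigned dwordParams) {
    const unsigned bytes = paramBytes(dwordParams);
    if (bytes == 0) return "";  // nothing was pushed, nothing to remove
    char buf[32];
    std::snprintf(buf, sizeof buf, "ADD ESP,%02Xh", bytes);
    return buf;
}

// The two cases given in the text.
inline void checkColumnExamples() {
    assert(stdcallReturn(2) == "RET 8");
    assert(cdeclCallerCleanup(3) == "ADD ESP,0Ch");
}
```

If the number you compute from the prototype doesn't match what the disassembly shows at the call site or at the function's return, the caller and the called function probably disagree about the prototype.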
Wrap-up
I use "DUMPBIN /DISASM filename.obj" to look at the code generated by the C++ compiler. However, Paul DiLascia (my fellow MSJ columnist) mentioned that Visual C++ has a compiler switch, /FAs, that produces an .ASM file from the input C++ code. The .ASM file that is generated contains all the necessary blood and guts that go along with hardcore assembler programming. Although you may never need to program in assembler, it's always enlightening to see what your tools are doing under the hood.

Have a question about programming in Windows? Send it to Matt at mpietrek@tiac.com.