
Code Generation: Assembler syntax

NASM

The NASM assembler is an open source project to develop a Netwide Assembler. The assembler is included as standard in most Linux distributions and is available for download to run under Windows. It supports the full Intel and AMD SIMD instruction sets and also recognises some extra MMX instructions that run on Cyrix CPUs. NASM supports multiple object module formats, from the old MS-DOS com files to the obj and elf formats used under Windows and Linux. If you are programming in assembler, NASM provides a more complete range of instructions, together with better portability between operating systems, than competing assemblers. Microsoft's MASM assembler is restricted to Windows. The GNU assembler, as, runs under both Linux and Windows, but by default uses AT&T syntax rather than Intel syntax, which makes it awkward to use in conjunction with Intel documentation.

It is beyond the scope of this course to provide a complete guide to assembler programming for the Intel processor family. Readers wanting a general background in assembler programming should consult appropriate textbooks in conjunction with the processor reference manuals published by Intel and AMD.

General instruction syntax


Assembler programs take the form of a sequence of lines with one machine instruction per line. Each instruction consists of an optional label and an operation code name, followed by up to three comma-separated operands, depending on the instruction. For example:
    l1: SFENCE                          ; 0 operand instruction
        PREFETCH [100]                  ; 1 operand instruction
        MOVQ MM0,MM1                    ; 2 operand instruction
        PSHUFD XMM1,XMM3,00101011b      ; 3 operand instruction

As shown above, a comment can be placed on an assembler line; it is distinguished from the instruction by a leading semicolon. The label, if present, is separated from the operation code name by a colon.

Case is significant neither in operation code names nor in the names of registers. Thus prefetch is equivalent to PREFETCH, and mm4 is equivalent to MM4. Labels, however, are case-sensitive in NASM: myvec and MyVec are different labels.

In the NASM assembler, as in the original Intel assembler, the direction of assignment in an instruction follows high level language conventions. It is always from right to left [1], so that MOVQ MM0,MM4 is equivalent to MM0 := MM4 and ADDSS XMM0,XMM3 is equivalent to XMM0 := XMM0 + XMM3.

[1] If you choose to use the GNU assembler as instead of NASM, you should be aware that it follows the opposite convention of left to right assignment. This is a result of as having originated as a Motorola assembler that was converted to recognise Intel opcodes. Motorola follow a left to right assignment convention.

Operand forms
Operands to instructions can be constants, register names or memory locations.

Constants
Constants are values known at assembly time, and take the form of numbers, labels, characters or arithmetic expressions whose components are themselves constants.

Constant forms
The most important constant values are numbers. Integer numbers can be written in base 16, 10, 8 or 2:

    mov al,0a2h          ; base 16, leading zero required
    mov bh,$0a2          ; base 16, alternate notation
    mov cx,0xa2          ; base 16, C style
    add ax,101           ; base 10
    mov bl,76q           ; base 8
    xor ax,11010011b     ; base 2
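For readers coming from C, the sketch below spells the same values in C notation so the constants above can be checked. The enum names and the helper from_binary are hypothetical: standard C before C23 has no binary literal, so the b-suffixed constant is decoded by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The NASM constants above, written in C notation (names are illustrative). */
enum {
    HEX_A2  = 0xa2,  /* NASM: 0a2h, $0a2 or 0xa2 */
    DEC_101 = 101,   /* NASM: 101 */
    OCT_76  = 076    /* NASM: 76q; C marks octal with a leading 0 instead */
};

/* Hypothetical helper: reads a string of binary digits the way NASM reads
   the digits of a b-suffixed constant such as 11010011b. */
static uint32_t from_binary(const char *digits) {
    uint32_t value = 0;
    assert(digits != NULL);
    for (; *digits == '0' || *digits == '1'; ++digits)
        value = value * 2u + (uint32_t)(*digits - '0');
    return value;
}
```

All four values fit in a byte, which is why they can be used with the 8-bit registers al and bl above.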

Floating constants
Floating point constants are also supported as operands to storage allocation directives:

    dd 3.14156
    dq 9.2e3

It is important to realise that, due to limitations of the AMD and Intel instruction sets, floating point constants cannot be used directly as operands to instructions. Any floating point constants used in an algorithm have to be assembled into a distinct area of memory and loaded into registers from there.
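A floating constant assembled with dd or dq is simply an IEEE 754 bit pattern in memory, which an instruction such as movss or fld then loads. The C sketch below assumes the host uses IEEE 754 single and double precision (as x86 does) and shows the pattern for dd 1.0 and for the dq 9.2e3 above; dd_bits and dq_bits are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 32-bit pattern that "dd <value>" places in memory (IEEE 754 single). */
static uint32_t dd_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);  /* reinterpret the bytes safely */
    return bits;
}

/* The 64-bit pattern that "dq <value>" places in memory (IEEE 754 double). */
static uint64_t dq_bits(double value) {
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For example, dd_bits(1.0f) is 0x3F800000; that 32-bit word is what a movss or fld dword would read from the labelled location.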

Labels

Constants can also take the form of labels. As the assembler program is processed, NASM allocates an integer value to each label. The value is either the address of the instruction or data that the label prefixes, or a value explicitly set by an EQU directive:

    Fseek equ 23
    Fread equ 24

We can load a register with the address referred to by a label by including the label as a constant operand:
mov esi, sourcebuf

Using the same syntax we can load a register with an equated constant:


mov cl, Fread

Constant expressions
Suppose there is a data structure for which we have a base address label. It is often convenient to refer to fields within this structure by their offset from the start of the structure. Consider a vector of 4 single precision floating point values at a location with label myvec. The actual address at which myvec will be placed is determined by NASM, so we do not know it. We may know, however, that we want the address of element 3 of the vector (counting from zero, that is, the fourth element):

    mov esi, myvec + 3*4

will place the address of this 4-byte element into the esi register.
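One difference from C is worth spelling out: in NASM, label arithmetic is in bytes, so the element size 4 has to be written explicitly, whereas C pointer arithmetic scales by the element size automatically. This C sketch, with a hypothetical stand-in array myvec and helper element_address, computes the address the NASM way and checks it against &myvec[3].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the four floats at label myvec. */
static float myvec[4] = {1.0f, 2.0f, 3.0f, 4.0f};

/* Address of element `index`, formed the NASM way: the base address plus
   index * element size, with the arithmetic done in bytes. */
static const float *element_address(size_t index) {
    assert(index < 4);
    return (const float *)((const char *)myvec + index * sizeof(float));
}
```

With 4-byte floats, element_address(3) lies 12 bytes past myvec, matching the myvec + 3*4 operand.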

NASM allows one to place arithmetic expressions whose sub-expressions are constants wherever a constant can occur. The arithmetic operators are written C style, as shown below:

    operator   means
    |          or
    ^          xor
    &          and
    <<         shift left
    >>         shift right
    +          add
    -          subtract
    *          multiply
    /          unsigned division
    //         signed division
    %          unsigned modulus
    %%         signed modulus
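The split between / and // (and between % and %%) matters only when a negative value is involved. The sketch below models the four operators with 64-bit C integers; the 64-bit width is an assumption made for illustration, the function names are hypothetical, and the signed results follow C's rule of truncating toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of NASM's division operators on 64-bit values.
   The divisor b must be nonzero. */
static uint64_t unsigned_div(int64_t a, int64_t b) { return (uint64_t)a / (uint64_t)b; } /* a / b  */
static int64_t  signed_div(int64_t a, int64_t b)   { return a / b; }                     /* a // b */
static uint64_t unsigned_mod(int64_t a, int64_t b) { return (uint64_t)a % (uint64_t)b; } /* a % b  */
static int64_t  signed_mod(int64_t a, int64_t b)   { return a % b; }                     /* a %% b */
```

For -7 and 2, the signed operators give -3 and -1, while the unsigned ones treat -7 as a huge positive bit pattern and give 0x7FFFFFFFFFFFFFFC and 1.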

Registers
You should be aware that in the Intel architecture a number of registers are aliased to the same storage; for example, the eax, ax, al and ah registers all share bits. More insidiously, the floating point registers ST0..ST7 not only share state with the MMX registers, but the mapping between the two sets of names is dynamic: it changes as the floating point stack is pushed and popped.

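Register classes

    number  byte reg  word reg  dword reg
    0       al        ax        eax
    1       cl        cx        ecx
    2       dl        dx        edx
    3       bl        bx        ebx
    4       ah        sp        esp
    5       ch        bp        ebp
    6       dh        si        esi
    7       bh        di        edi

The word and dword registers on each row are aliased (ax is the low 16 bits of eax, and so on). For rows 0 to 3 the byte register is the low byte of the word register. ah, ch, dh and bh are the high bytes of ax, cx, dx and bx; they share only an encoding number with sp, bp, si and di.

    float reg  mmx reg
    st0        mm0
    st1        mm1
    st2        mm2
    st3        mm3
    st4        mm4
    st5        mm5
    st6        mm6
    st7        mm7

The float and mmx registers are aliased, but the pairing shown holds only when the top of the fpu stack is physical register 0.

    sse reg: xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7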
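The aliasing of eax, ax, ah and al can be modelled with shifts and masks on a 32-bit value. This C sketch uses hypothetical helper names and is not a description of how the hardware is built. It shows that the smaller registers are views of bits inside eax, and that writing al changes only the low byte of eax.

```c
#include <assert.h>
#include <stdint.h>

/* al, ah and ax viewed as bit fields of eax. */
static uint16_t ax_of(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */
static uint8_t  al_of(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */
static uint8_t  ah_of(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */

/* "mov al, value" replaces bits 0-7 of eax and leaves the rest alone. */
static uint32_t write_al(uint32_t eax, uint8_t value) {
    return (eax & ~(uint32_t)0xFFu) | value;
}
```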

Memory Locations
Memory locations are represented syntactically by square brackets around an address expression: [100], [myvec] and [esi] all represent memory locations. Address expressions, unlike constant expressions, can contain components whose values are not known until the program executes. The final example above refers to the memory location addressed by the value in the esi register, and so depends on the history of prior computations affecting that register.

Addresses
Address expressions have to be encoded into machine instructions, and since machine instructions, although of variable length on a CISC, are nonetheless finite, so too must the address expressions be. On Intel and AMD machines this constrains the complexity of address expressions to the following grammar:
    memloc  ::= address | format address
    format  ::= byte | word | dword | qword
    address ::= [ const ] | [ aexp ] | [ aexp + const ]
    aexp    ::= reg | reg + iexp
    iexp    ::= reg | reg * scale
    scale   ::= 2 | 4 | 8
    reg     ::= eax | ecx | ebx | edx | esp | ebp | esi | edi
    const   ::= integer | label

Note that esp cannot be used as the index register (the one that may be scaled).

Examples
    byte[edx]              byte pointed to by edx
    dword[edx+10]          4 byte doubleword pointed to by edx+10
    dword[edx+esi+34]      4 byte doubleword at the address given by the sum of the esi and edx registers + 34
    word[eax+edi*4+200]    2 byte word at the address given by the eax register + 4 * the edi register + 200

The format qualifiers are used to disambiguate the size of an operand in memory where the operation code name and the other, non-memory operands are insufficient to do so. For example, in mov [edx],1 nothing indicates whether a byte, word or doubleword is to be stored, so it must be written as mov byte[edx],1, mov word[edx],1 or mov dword[edx],1.
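Every address form in the grammar reduces to one computation: base + index * scale + displacement. Here is a minimal C sketch, assuming 32-bit addressing in which the sum wraps modulo 2^32; effective_address is a hypothetical name. For word[eax+edi*4+200] one would call effective_address(eax, edi, 4, 200).

```c
#include <assert.h>
#include <stdint.h>

/* base + index*scale + displacement, wrapping modulo 2^32 as in 32-bit mode.
   Use index = 0 for forms without an index register. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t displacement) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)displacement;
}
```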

Sectioning
Programs running under Linux have their memory divided into 4 sections:

    text   the section of memory containing the operation codes to be executed. It is typically mapped as read-only by the paging system.
    data   the section of memory containing initialised global variables, which can be altered after the program starts.
    bss    the section containing uninitialised global variables.
    stack  the section in which the dynamically allocated local variables of subroutines are located.

The section directive is used by assembler programmers to specify into which section of memory subsequent lines of code should be assembled. For example, in the listing shown in algorithm 1 we divide the program into three sections: a text section containing myfunc, a bss section containing 64 undefined bytes, and a data section containing a vector of 4 integers. The label myfuncBase can be used with negative offsets to access locations within the bss, whilst the label myfuncglobal can be used with positive offsets to access elements of the vector in the data section.

Algorithm 1 Examples of the use of section and data reservation directives

    section .text
    global myfunc
    myfunc: enter 128,0
            ; body of function goes here
            leave
            ret 0

    section .bss
            alignb 16
            resb 64          ; reserve 64 bytes
    myfuncBase:

    section .data
    myfuncglobal:            ; define four 32-bit integers
            dd 1
            dd 2
            dd 3
            dd 5

Data reservation
Data must be reserved in distinct ways in the different sections. In the data section, the data definition directives db, dw, dd and dq are used to define bytes, words, doublewords and quadwords. The directive must be followed by a constant expression. When defining bytes or words the constant must be an integer. Doublewords and quadwords may be defined with floating point or integer constants, as shown previously. In the bss section the directive resb is used to reserve a specified number of bytes, but no value is associated with these bytes; resw, resd and resq reserve words, doublewords and quadwords in the same way.

Stack data
Data can be allocated in the stack section by use of the enter operation code name. This takes the form:

    enter space, level

It should be used as the first operation code name of a function. The level parameter is only of relevance in block structured languages and should be set to 0 for assembler programming. The space parameter specifies the number of bytes to be reserved for the private use of the function. Once the enter instruction has executed, the data can be accessed at negative offsets from the ebp register, for example [ebp-4].

Releasing stack space dynamically


The last two instructions in a function should, as shown in algorithm 1, be

    leave
    ret 0

The combined effect of these is to free the space reserved on the stack by enter, restore the caller's ebp, and pop the return address from the stack. The parameter to the operation code name ret specifies how many bytes of function parameters should be discarded from the stack. If one is interfacing to C this should always be set to 0, since under the C calling convention the caller removes its own arguments.
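To make the effect of enter and leave concrete, the following C sketch simulates a 32-bit stack. It relies on the standard equivalences enter space,0 = push ebp; mov ebp,esp; sub esp,space and leave = mov esp,ebp; pop ebp. The machine structure, the sim_* functions and the return address value used in the check are hypothetical, and this toy model requires space to be a multiple of 4.

```c
#include <assert.h>
#include <stdint.h>

/* A toy 32-bit stack: esp and ebp are byte offsets into `stack`,
   which grows downwards from STACK_BYTES. */
enum { STACK_WORDS = 256, STACK_BYTES = STACK_WORDS * 4 };

struct machine {
    uint32_t stack[STACK_WORDS];
    uint32_t esp;
    uint32_t ebp;
};

static struct machine cpu;

static void sim_reset(uint32_t ebp) {
    cpu.esp = STACK_BYTES;                    /* empty stack */
    cpu.ebp = ebp;
}

static void sim_push(uint32_t value) {        /* push: pre-decrement esp by 4 */
    assert(cpu.esp >= 4 && cpu.esp % 4 == 0);
    cpu.esp -= 4;
    cpu.stack[cpu.esp / 4] = value;
}

static uint32_t sim_pop(void) {               /* pop: read, then increment esp by 4 */
    uint32_t value;
    assert(cpu.esp < STACK_BYTES && cpu.esp % 4 == 0);
    value = cpu.stack[cpu.esp / 4];
    cpu.esp += 4;
    return value;
}

/* enter space,0  ==  push ebp ; mov ebp,esp ; sub esp,space */
static void sim_enter(uint32_t space) {
    assert(space % 4 == 0);                   /* toy-model restriction */
    sim_push(cpu.ebp);
    cpu.ebp = cpu.esp;                        /* locals: ebp-1 down to ebp-space */
    assert(space <= cpu.esp);
    cpu.esp -= space;
}

/* leave  ==  mov esp,ebp ; pop ebp */
static void sim_leave(void) {
    cpu.esp = cpu.ebp;
    cpu.ebp = sim_pop();
}

/* call (pushes the return address), enter space,0, leave, ret 0.
   Returns the address that ret would load into eip. */
static uint32_t sim_call_enter_leave_ret(uint32_t return_address, uint32_t space) {
    sim_push(return_address);
    sim_enter(space);
    sim_leave();
    return sim_pop();
}
```

After the whole sequence, esp and ebp are back to their values before the call, which is the property the leave / ret 0 pair is meant to guarantee.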

Label qualification
The default scope of a label is the assembler source file containing the line it prefixes. But labels can be used to mark the start of functions that are to be called from C or other high level languages. To indicate that they have scope beyond the current assembler file, the global directive should be used, as shown in algorithm 1. The converse case, where an assembler file calls a function exported by a C program, is handled by the extern directive:

    extern printreal
    call printreal

In the above example we assume that printreal is a C function called from assembler.

Linking and object file formats


There are 4 object file formats that are commonly used on Linux and Windows systems, as shown in the table below. This lists the name of the format, its file extension (which is often ambiguous) and the combination of operating system and compiler that makes use of it. A flag provided to NASM (-f) specifies which format it should use. We will only go into the use of the gcc compiler, since this is portable between Windows and Linux.

Let us assume we have a C program called c2asm.c and an assembler file asmfromc.asm. Suppose we wish to combine these into a single executable module c2asm. We issue the following commands at the console:

    nasm -felf -o asmfromc.o asmfromc.asm
    gcc -oc2asm c2asm.c asmfromc.o

This assumes that we are working either under Linux or under Cygwin. (On a 64-bit Linux system, add -m32 to the gcc command so that the 32-bit object file can be linked.) If we are using djgpp we type:

    nasm -fcoff -o asmfromc.o asmfromc.asm
    gcc -oc2asm c2asm.c asmfromc.o

    Format  Extension  Operating system  C++ compiler
    win32   .obj       Windows           Microsoft C++
    obj     .obj       Windows           Borland C++
    coff    .o         Windows           djgpp gcc
    elf     .o         Windows           Cygwin gcc
    elf     .o         Linux             gcc

Leading underbars
If working with djgpp, all external labels in your program, whether imported with extern or exported with global, must have a leading underbar character. Thus to call the C procedure printreal one would write:

    extern _printreal
    call _printreal

whilst to export myfunc one would write:

    global _myfunc
    _myfunc: enter 128,0

Useful x86 instructions


This is a very small subset of the available instructions but should be enough for your purposes.

Data movement
mov mem, reg/lit
    example:  mov [ebp+12],eax
              mov dword[esp+4],12
    means:    store the right operand in the memory location on the left.

mov reg, mem/reg/lit
    example:  mov ebx,1
              mov ecx,[esi+ebp]
              mov eax,ebx
    means:    load the right operand into the register on the left.

movss mem, reg
    example:  movss [ebp+12],xmm0
    means:    store the right operand in the memory location on the left. The right operand is the bottom 32 bits of an xmm register.

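movss reg, mem/reg
    example:  movss xmm1,[esi+ebp]
              movss xmm1,xmm2
    means:    load the right operand into the register on the left. The left operand is the lower 32 bits of an xmm register and the data should be a 32 bit float. When the source is memory, the upper 96 bits of the destination are cleared; when it is a register, they are left unchanged.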

movups
movups mem, reg
    example:  movups [ebp+12],xmm0
    means:    store the right operand in the memory location on the left. The right operand is a 128 bit xmm register.

movups
movups reg, mem
    example:  movups xmm1,[esi+ebp]
    means:    load the right operand into the register on the left. The left operand is a 128 bit xmm register. The memory operand does not need to be aligned on a 16 byte boundary.

push
push mem/reg/lit
    example:  push dword 10
              push dword[esi+ebp+40]
              push ecx
    means:    push the operand on the stack, pre-decrementing the esp register by 4.

pop
pop mem/reg
    example:  pop dword[esi+ebp+40]
              pop ecx
    means:    the operand is assigned the value on the top of the stack, and the stack pointer is then incremented by 4.

fld
fld mem
    example:  fld dword[esi+ebp+40]
    means:    the operand, which is assumed to be a 32 bit floating point value, is pushed on the fpu stack.

fild
fild mem
    example:  fild dword[esi+ebp+40]
    means:    the 32 bit integer operand is pushed on the fpu stack as a floating point number.

fstp
fstp mem
    example:  fstp dword[esi+ebp+40]
    means:    the operand is assigned the value on top of the fpu stack as a 32 bit floating point number; the fpu stack is then popped.

fistp
fistp mem
    example:  fistp dword[esi+ebp+40]
    means:    the floating point value on top of the fpu stack is converted to a 32 bit integer (using the current rounding mode, round to nearest by default) and stored in the operand; the fpu stack is then popped.

Arithmetic
Integer arithmetic instructions can be divided into 3 classes:
1. Add, subtract, and, or, xor. These are treated absolutely regularly as two operand instructions, as described below.
2. Multiply, which comes in both 2 and 3 operand forms.
3. Divide and modulus, which are irregular and make use of specific registers.

Regular integer arithmetic


These take the form

    operation dest, src

and mean dest := dest operation src. The following operation codes are allowed: add, sub, and, or, xor.

The table shows the allowed combinations of destination and source:

    dest      src
    register  register
    register  constant
    register  memory
    memory    register
    memory    constant

Examples:

    add esp,5
    sub eax,ebx
    and [eax+12],ebp
    add dword[esi+edi],1
    add esi,[edi]

Multiply
imul reg,reg/mem
This is functionally the same as the regular 2 operand integer arithmetic instructions. Example:

    imul ebx,dword[ebp-26]

imul reg,reg,const
This three operand form sets the first register to the second register multiplied by the constant, and is particularly useful for computing array offsets. Example:

    imul esi,eax,16
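Here is a sketch of what the three operand form is typically used for: turning an element index into a byte offset when each element is 16 bytes long, such as a vector of four floats. The struct vec4 type, the records array and record_offset are hypothetical names used only for this C illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte element: a vector of four single precision floats. */
struct vec4 { float v[4]; };
static struct vec4 records[8];   /* hypothetical array of such elements */

/* esi := eax * 16, as computed by  imul esi,eax,16
   before an access such as  movups xmm0,[ebx+esi]  with ebx = records. */
static int32_t record_offset(int32_t index) {
    return index * 16;
}
```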

Divide/modulus
A single instruction is used for both division and modulus:

    idiv reg/mem

The 64 bit value in edx:eax is divided by the 32 bit operand; the quotient is placed in eax, and the remainder is placed in edx. To divide a 32 bit value held in eax, first sign-extend it into edx with cdq. Example:

    cdq
    idiv dword[ebp+64]
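The following C model shows the register results of the cdq ; idiv sequence for a 32-bit dividend held in eax; cdq_idiv is a hypothetical name. x86 division truncates the quotient toward zero and gives the remainder the sign of the dividend, which C99 integer division also does, so C's / and % can stand in for the instruction here.

```c
#include <assert.h>
#include <stdint.h>

struct idiv_result {
    int32_t eax;   /* quotient  */
    int32_t edx;   /* remainder */
};

/* Model of  cdq ; idiv divisor  for a dividend held in eax. */
static struct idiv_result cdq_idiv(int32_t eax, int32_t divisor) {
    int64_t dividend = eax;                   /* cdq: edx:eax = eax sign-extended */
    struct idiv_result r;
    /* The real instruction raises a divide error in these cases. */
    assert(divisor != 0);
    assert(!(eax == INT32_MIN && divisor == -1));
    r.eax = (int32_t)(dividend / divisor);    /* quotient, truncated toward zero */
    r.edx = (int32_t)(dividend % divisor);    /* remainder, sign of the dividend */
    return r;
}
```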

Floating point arithmetic


The floating point stack can be used to perform arithmetic in a postfix manner. The following fpu opcodes operate on the top two items on the fpu stack: faddp st1, fsubp st1, fdivp st1 and fmulp st1. These perform an operation between the top of the fpu stack (st0) and st1, store the result in st1, then pop the stack so that st1 becomes the new top of stack. Bear in mind that the maximum depth of the fpu stack is 8. Operations use 80 bit internal floating point.
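Here is a small C simulation of postfix evaluation on the fpu stack, computing (a + b) * c with the sequence fld, fld, faddp st1, fld, fmulp st1, fstp. It is a model only: long double stands in for the 80 bit internal format (on some compilers it is just a double), and all function and variable names are hypothetical.

```c
#include <assert.h>

/* Toy model of the 8-entry x87 register stack; fpu[fpu_top-1] is st0. */
enum { FPU_DEPTH = 8 };
static long double fpu[FPU_DEPTH];
static int fpu_top = 0;

static void fpu_fld(float value) {       /* fld dword[...] : push */
    assert(fpu_top < FPU_DEPTH);         /* at most 8 values fit */
    fpu[fpu_top++] = value;
}
static void fpu_faddp(void) {            /* faddp st1 : st1 := st1 + st0, pop */
    assert(fpu_top >= 2);
    fpu[fpu_top - 2] += fpu[fpu_top - 1];
    fpu_top--;
}
static void fpu_fmulp(void) {            /* fmulp st1 : st1 := st1 * st0, pop */
    assert(fpu_top >= 2);
    fpu[fpu_top - 2] *= fpu[fpu_top - 1];
    fpu_top--;
}
static float fpu_fstp(void) {            /* fstp dword[...] : store st0 as 32 bits, pop */
    assert(fpu_top >= 1);
    return (float)fpu[--fpu_top];
}

/* (a + b) * c in postfix order "a b + c *", i.e. in assembler:
   fld dword[a] / fld dword[b] / faddp st1 / fld dword[c] / fmulp st1 / fstp dword[r] */
static float add_then_multiply(float a, float b, float c) {
    fpu_fld(a);
    fpu_fld(b);
    fpu_faddp();
    fpu_fld(c);
    fpu_fmulp();
    return fpu_fstp();
}
```

Each binary operation consumes two stack entries and leaves one, so the stack is empty again once the result has been stored.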

Vector arithmetic
It is possible to perform parallel operations on vectors of 32 bit floats using the xmm registers. These instructions have the general format

    operationPS xmmreg,xmmreg

For example:

    mulps xmm0,xmm5

The suffix PS stands for Packed Single precision floats. In this case the 4 floats in xmm0 are multiplied by the corresponding floats in xmm5 and the result is stored in xmm0.

The other useful vector arithmetic instructions in this context are addps, subps and divps. These instructions also exist in a memory to register form, but to use it you have to guarantee that the memory operand is aligned on a 16 byte boundary. Since this is complicated to ensure, I suggest that you restrict yourself to the register to register forms of these instructions.

Scalar arithmetic
It is also possible to perform scalar arithmetic in the low order 32 bits of the xmm registers. You can perform each of the vector operations on scalars by using the suffix SS, standing for Scalar Single precision, after the operation. Thus

    addss xmm2,xmm0

would add the bottom 32 bit float in xmm0 to the bottom float in xmm2 and leave the result in xmm2; the upper three floats of xmm2 are unchanged.
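The difference between the packed and scalar forms can be modelled in C by treating an xmm register as four floats, with lane 0 holding the low-order 32 bits. The xmm type and the functions below are hypothetical illustrations of the register-to-register forms, not an API.

```c
#include <assert.h>

/* Hypothetical model of an xmm register; lane[0] is the low-order 32 bits. */
typedef struct { float lane[4]; } xmm;

static xmm make_xmm(float l0, float l1, float l2, float l3) {
    xmm r = {{l0, l1, l2, l3}};
    return r;
}

/* mulps dst,src : every lane of dst is multiplied by the matching lane of src. */
static xmm mulps(xmm dst, xmm src) {
    for (int i = 0; i < 4; i++)
        dst.lane[i] *= src.lane[i];
    return dst;
}

/* addss dst,src : only lane 0 is added; lanes 1-3 of dst are unchanged. */
static xmm addss(xmm dst, xmm src) {
    dst.lane[0] += src.lane[0];
    return dst;
}
```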

Conversion instructions

If you are going to use these scalar instructions, it is worth taking note of the conversion instructions cvtsi2ss and cvtss2si, which convert signed doubleword integers to single precision floats and vice versa. cvtss2si rounds according to the current rounding mode (round to nearest by default); the related instruction cvttss2si truncates toward zero instead.

    operation  dest              src
    cvtsi2ss   xmm register      general register
    cvtsi2ss   xmm register      memory
    cvtss2si   general register  xmm register
    cvtss2si   general register  memory

Examples:

    cvtsi2ss xmm4,ebx
    cvtsi2ss xmm3,[ebp+20]
    cvtss2si eax,xmm0
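Because cvtss2si rounds (to nearest, ties to even, under the default rounding mode) while a C cast truncates, the two can give different answers. The C sketch below models both. It assumes the default rounding mode and inputs inside the 32-bit integer range (the real instruction returns 0x80000000 for values outside it), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of cvtss2si under the default rounding mode: nearest, ties to even. */
static int32_t cvtss2si_default(float x) {
    int32_t t;
    float frac;
    assert(x >= -2147483648.0f && x < 2147483648.0f);
    t = (int32_t)x;                      /* truncated toward zero */
    frac = x - (float)t;                 /* exact fractional part, sign of x */
    if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        t += 1;                          /* round up, or tie to the even neighbour */
    else if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        t -= 1;                          /* round down, or tie to the even neighbour */
    return t;
}

/* Model of cvttss2si: always truncates toward zero, like a C cast. */
static int32_t cvttss2si(float x) {
    assert(x >= -2147483648.0f && x < 2147483648.0f);
    return (int32_t)x;
}
```

So cvtss2si turns 3.7 into 4 and 2.5 into 2, whereas cvttss2si turns 3.7 into 3.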

Integer comparisons
Comparison instructions exist which place the results of a comparison in the flags. The cmp instruction compares two integers. Examples:

    cmp eax,12
    cmp eax,ecx
    cmp ebx,[ebx+16]

Set
The result of the comparison is written to the flags and can be used either by a SET instruction or by a conditional jump instruction. For instance, to test whether the eax register is less than 10 we could write

    cmp eax,10
    setl bl

At the end of this the bl register will contain a boolean value of 1 if eax was less than 10 and 0 otherwise. The suffixes used by the SET instruction indicate which comparison is being tested. The suffixes most likely to be of use to you are L, G and E, standing for <, > and =. L and G treat the operands as signed integers; for unsigned comparisons use B (below) and A (above) instead.
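The signed versus unsigned distinction shows up as soon as eax holds a value with its top bit set. In this C model (the function names are hypothetical), eax = 0xFFFFFFFF is -1 when read as signed, so setl yields 1. Read as unsigned, the same value is 4294967295, so setb yields 0.

```c
#include <assert.h>
#include <stdint.h>

/* cmp eax,imm ; setl bl : "less", comparing the operands as signed.
   Converting eax to int32_t assumes the usual two's complement behaviour. */
static uint8_t cmp_setl(uint32_t eax, int32_t imm) {
    return (uint8_t)((int32_t)eax < imm);
}

/* cmp eax,imm ; setb bl : "below", comparing the operands as unsigned. */
static uint8_t cmp_setb(uint32_t eax, uint32_t imm) {
    return (uint8_t)(eax < imm);
}
```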

fcomip st0,st1
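The instruction fcomip compares the top two elements of the floating point stack, pops the top one from the stack and places the result of the comparison in the cpu flags. The flags are set as for an unsigned comparison, so test them with the A, B and E condition suffixes (for example ja or setb) rather than G and L. It may then be necessary to discard the remaining item on the fpu stack, for example with fstp st0. Note that fincstp only increments the floating point stack pointer and does not mark the register as empty.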

cmpss
There is a family of comparison operations that work between scalar xmm registers. These leave an integer mask as the result in a register; thus

    cmpltss xmm2,xmm4

would compare the float in the bottom 32 bits of xmm2 with the corresponding float in xmm4 and set the bottom 32 bits of xmm2 to all 1s if xmm2 was less than xmm4, and to zero otherwise. The upper three floats of xmm2 are unchanged.

Scalar comparisons

    instruction           means
    cmpltss xmma,xmmb     xmma < xmmb
    cmpeqss xmma,xmmb     xmma = xmmb
    cmpnless xmma,xmmb    not (xmma <= xmmb): xmma > xmmb, or either value is NaN
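The all-ones or all-zeros result of cmpltss is a mask rather than a boolean, which makes branch-free selection possible. The C sketch below models the mask and uses it to pick the smaller of two floats bit by bit, the kind of selection that andps, andnps and orps perform on xmm registers. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* cmpltss a,b : the low 32 bits of a become all ones if a < b, else all zeros. */
static uint32_t cmpltss_mask(float a, float b) {
    return (a < b) ? 0xFFFFFFFFu : 0u;
}

/* Branch-free selection of the smaller value using the mask. */
static float select_min(float a, float b) {
    uint32_t mask = cmpltss_mask(a, b), abits, bbits, rbits;
    float result;
    memcpy(&abits, &a, sizeof abits);
    memcpy(&bbits, &b, sizeof bbits);
    rbits = (abits & mask) | (bbits & ~mask);   /* a where mask is set, else b */
    memcpy(&result, &rbits, sizeof result);
    return result;
}
```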

comiss
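Comiss is an alternative technique for performing scalar comparisons: it compares the low floats of two xmm registers and returns the result in the cpu flags. As with fcomip, the flags are set as for an unsigned comparison, so use the A, B and E condition suffixes. Example:

    comiss xmm1,xmm7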

Branches
Branches can be unconditional and direct:

    jmp lab

or unconditional and indirect:

    jmp dword[ebp+10]

or conditional on a condition code and direct:

    jl lab1
    jg lab3
    je lab4

Calls
Calls can be direct:

    call lab

or indirect:

    call dword[ebp+10]

In either case the return address (the address of the instruction following the call) is pushed on the stack and the eip register is loaded from the operand. Returns are performed using the ret instruction, which pops the top of the stack into the eip register.
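An indirect call such as call dword[ebp+10] takes its target from memory, just as a C call through a function pointer does. The sketch below uses a small table of function pointers to show this; twice, square, handlers and dispatch are hypothetical names.

```c
#include <assert.h>

/* Hypothetical handlers reached through an indirect call. */
static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

/* A table of code addresses held in memory. */
static int (*const handlers[2])(int) = { twice, square };

/* Corresponds to loading a target from the table and doing an indirect call. */
static int dispatch(unsigned i, int x) {
    assert(i < 2);
    return handlers[i](x);   /* indirect call: target loaded from memory */
}
```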
