CHAPTER 2
In this chapter, we shall discuss the Assembly Language Program development tools, PC memory structure and Assembler directives.
Books to be Referred:
1. Microprocessors and Interfacing, 2nd Edition, Douglas V. Hall
2. Intel Microprocessors, 6th Edition, Barry Brey
3. Peter Norton's DOS Guide
4. IBM PC Assembly Language Programming, Peter Abel
5. Hardware and Software of Computers, S. K. Bose
Programming Models:
Depending on the size of the memory the user program occupies, different types of assembly language models are defined.
TINY: all data and code in one segment
SMALL: one data segment and one code segment
MEDIUM: one data segment and two or more code segments
COMPACT: one code segment and two or more data segments
LARGE: any number of data and code segments
To designate a model, we use the .MODEL directive.
The TPA (transient program area) holds the OS and other programs that control the system operation. The System area holds Video RAM, Video ROM, BIOS ROM, etc. DOS controls the way the disk memory is organized and controlled. The BIOS is a collection of programs stored in ROM or Flash memory that access I/O devices and internal features of the system. IO.SYS is a program that loads into the TPA from the disk whenever MS-DOS is started. Device drivers are programs that control installable I/O devices. Windows uses a file called System.ini to load the drivers. The Command.Com program controls the operation of the computer from the keyboard. That is, it processes the DOS commands as they are keyed in. The free TPA area holds application programs as they are executed. The TPA also holds TSR programs, which remain in memory in an inactive state until activated through a hot key or through a command.
2. ASSEMBLER:
An assembler is a system program used to translate the assembly language mnemonics for instructions into the corresponding binary codes. An assembler makes two passes through your source code. On the first pass, it determines the displacement of named data items, the offset of labels, etc., and puts this information in a symbol table. On the second pass, the assembler produces the binary code for each instruction and inserts the offsets, etc., that were calculated during the first pass.
The assembler checks for correct syntax in the assembly instructions and provides appropriate warning and error messages. You have to open your file again using the editor to correct the errors and then reassemble it. Until all the errors are corrected, the program cannot proceed to the next step.
The assembler generates two files from the source file. The first, called the object file, has the extension .obj and contains the binary codes for the instructions and information about the addresses of the instructions. The second, called the list file, has the extension .lst. It contains the assembly language statements, the binary codes for each instruction, and the offset of each instruction. It also indicates any syntax errors or typing errors in the source program.
Note: The assembler generates only offsets (i.e., effective addresses), not absolute physical addresses.
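The two passes described above can be sketched in a few lines of Python. This is only a toy model, not how MASM is implemented: the instruction sizes in SIZES and the labels START, AGAIN and DONE are made up for illustration. It shows why a second pass is needed: the jump to DONE refers to a label that is defined later in the source.

```python
# Toy two-pass "assembler". The sizes below are invented for illustration
# and do not correspond to real 8086 instruction encodings.
SIZES = {"MOV": 3, "ADD": 2, "JMP": 3, "NOP": 1}

# Each statement is (label or None, mnemonic, label operand or None).
SOURCE = [
    ("START", "MOV", None),
    (None,    "JMP", "DONE"),    # forward reference: DONE is defined later
    ("AGAIN", "ADD", None),
    (None,    "JMP", "AGAIN"),   # backward reference
    ("DONE",  "NOP", None),
]

def pass1(source):
    """First pass: record the offset of every label in a symbol table."""
    symbols = {}
    location = 0                       # location counter
    for label, mnemonic, _operand in source:
        if label is not None:
            symbols[label] = location
        location += SIZES[mnemonic]
    return symbols

def pass2(source, symbols):
    """Second pass: replace label operands by the offsets found in pass 1."""
    listing = []
    location = 0
    for _label, mnemonic, operand in source:
        target = symbols[operand] if operand is not None else None
        listing.append((location, mnemonic, target))
        location += SIZES[mnemonic]
    return listing

symbols = pass1(SOURCE)
listing = pass2(SOURCE, symbols)
for offset, mnemonic, target in listing:
    print(f"{offset:04X}  {mnemonic}  {'' if target is None else format(target, '04X')}")
```

Note that every value produced is an offset from the start of the segment, which matches the note above: no physical address appears anywhere.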
3. LINKER:
It is a program used to join several object files into one large object file. For large programs, usually several modules are written, and each module is tested and debugged. When all the modules work, their object modules can be linked together to form a complete functioning program. The LINK program must be run on the .obj files. The linker produces a link file, which contains the binary codes for all the combined modules. The linker also produces a link map file, which contains the address information about the linked files. The linker assigns only relative addresses starting from zero, so that the program can be put anywhere in physical primary memory later (by another program called the locator or loader). Therefore, this file is called relocatable. The linker produces link files with the .exe extension. Object modules of useful programs (such as square root, factorial, etc.) can be kept in a library and linked to other programs when needed.
4. LOADER (LOCATOR):
It is a program used to assign absolute physical addresses in memory to the segments in the .exe file. The IBM PC DOS environment comes with the EXE2BIN program, which converts an .exe file into a .bin file. The physical addresses are assigned at run time by the loader, so the assembler does not know the segment starting addresses at the time the program is assembled.
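Once the loader has chosen a segment starting address, the 8086 forms each 20-bit physical address by shifting the 16-bit segment value left by four bits and adding the 16-bit offset. A minimal Python sketch of this calculation follows; the function name physical_address and the segment value 2000h are assumptions chosen only for illustration.

```python
def physical_address(segment, offset):
    """Return the 20-bit 8086 physical address for segment:offset."""
    # The segment value is multiplied by 16 (shifted left 4 bits), the offset
    # is added, and the result is truncated to the 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF

# Hypothetical case: the loader places a segment at 2000h and the assembler
# gave a variable the offset 0100h within that segment.
addr = physical_address(0x2000, 0x0100)
print(f"{addr:05X}")
```

The assembler supplies only the offset (0100h here); the segment part becomes known only when the program is loaded.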
5. DEBUGGER:
If your program requires no external hardware, you can use a program called a debugger to load and run the .exe file. A debugger is a program that allows you to load your object code program into system memory, execute the program, and troubleshoot or debug it. The debugger also allows you to look at the contents of registers and memory locations after you run your program, to change the contents of registers and memory locations, and to rerun the program. It also lets you set breakpoints in your program, single-step through it, and use other convenient features. If you are using a prototype SDK-86 board, the debugger is usually called the monitor program.
We will be using the development tool MASM 6.0 or a higher version from Microsoft. MASM stands for Microsoft Macro Assembler. Another assembler, TASM (Turbo Assembler) from Borland International, is also available.
B. Assembler Directives
Assembler directives, also called pseudo operations, are commands issued to the assembler for many system-related tasks such as variable labeling, memory assignment, reserving memory storage, and identifying the beginning and end of the program. These directives can be used with the Intel macro assembler (ASM-86), the Borland Turbo Assembler (TASM) and the Microsoft Macro Assembler (MASM). Assembler directives are classified according to the type of function they perform.
    Code SEGMENT
    START: MOV AX, BX
    Code ENDS

b) ASSUME directive: It is used to inform the assembler of the name of the logical segment it should use for a specified physical segment. That is, it tells the assembler how to link the logical segments to the actual segment definitions. An 8086 program may have several logical segments. ASSUME tells the assembler what names have been chosen for the Code, Data, Extra and Stack segments. For example, it informs the assembler that the register CS will hold the address allotted by the loader to the label, say CODE, and that DS will similarly hold the address of the label, say DATA.
Ex:
    DATA_HERE SEGMENT
    N1 DB 10, 0AH, 20
    DATA_HERE ENDS
    CODE_HERE SEGMENT
    ASSUME CS:CODE_HERE, DS:DATA_HERE
    ...
    MOV AX, 2222          ; program instructions
    CODE_HERE ENDS
Here, for example, the statement ASSUME CS:CODE_HERE tells the assembler that the instructions for the program are in a logical segment named CODE_HERE. Note that the ASSUME directive does not load the segment starting address into the corresponding segment register of the CPU; the program must do that itself.

c) ORG directive: As the assembler assembles the program statements or data declarations, it uses a location counter to keep track of how many bytes it is from the start of a segment at any time. The location counter is automatically set to 0000h when the assembler starts reading a segment. The ORG directive allows the user to set the location counter to any desired value. For example, if you write ORG 100h within the data segment, then the first data item declared after it will be at an offset of 100h from the start of the data segment. In other words, ORG changes the starting offset address of the data in the data segment.
Ex:
    ORG 100H

d) .EXIT: Used to exit to DOS when the program finishes (it can be used before the END directive).

e) END directive: It is put after the last statement of a program to tell the assembler that this is the end of the program module. There should be only one END directive in your program.
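The way the location counter and ORG interact can be modelled with a short Python sketch. This is a simplified model, not the assembler's actual algorithm; the function assign_offsets and the variable names A, B and C are hypothetical.

```python
def assign_offsets(statements):
    """Return {name: offset} for a list of data-segment statements.

    Each statement is either ("ORG", value) or (name, size_in_bytes).
    """
    offsets = {}
    location = 0                  # the location counter starts at 0000h
    for first, second in statements:
        if first == "ORG":
            location = second     # ORG sets the location counter directly
        else:
            offsets[first] = location
            location += second    # advance past the bytes just reserved
    return offsets

offsets = assign_offsets([
    ("A", 1),          # A DB ?
    ("ORG", 0x100),    # ORG 100h
    ("B", 2),          # B DW ?
    ("C", 1),          # C DB ?
])
for name, offset in offsets.items():
    print(f"{name}: {offset:04X}h")
```

In this model A stays at offset 0000h because it was declared before ORG, while B and C are placed from 0100h onward.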
a) DB, DW, DD (Define Byte, Define Word, Define Double Word): These directives are used to assign names to variables in your programs. The DB directive is used to declare a byte-type variable and to set aside one or more storage locations of type byte in memory. The DW directive is used to declare a variable of type word and to reserve one or more storage locations of type word in memory. The DD directive is used to declare a variable of type double word (32 bits or 4 bytes) and to reserve one or more storage locations of type double word in memory.
Ex:
    NUM DB 23H          ; name a memory location NUM and initialize it with 23h
    LOC DB ?            ; reserve a memory location LOC, but leave its value uninitialized
    N1  DB 11, 22, 33   ; reserve 3 locations starting from N1 and store the values
    msg DB 'HELLO'      ; store the ASCII codes of the string in memory starting from msg
    SUM DW ?            ; reserve two memory locations (one word) at SUM, uninitialized
Note: To reserve or assign a large number of locations, use the DUP operator.
Ex:
    DATA1 db 50d Dup (?)   ; reserve 50 locations starting from DATA1 in memory
    LOCN  dw 20d Dup (0)   ; reserve 40 locations (20 words), all initialized to 0
Note: If you want to access a variable as a specific data type, use the PTR attribute operator. You can use either BYTE PTR (for byte operations) or WORD PTR (for word operations).
Ex:
    NUM1 dw 0A345h           ; a hexadecimal number must start with a digit (0 to 9), hence the leading 0
    MOV AL, NUM1             ; illegal, since NUM1 is of type word and AL is a byte register
    MOV AL, BYTE PTR NUM1    ; AL is loaded with the low byte, 45h

b) DQ, DT (Define Quad word, Define Ten bytes): These work like DB, DW and DD. They are used to declare and reserve memory locations of 8 and 10 bytes respectively and to initialize their values.

c) EQU (Equate): This directive is used to give a name to some value or symbol. Each time the assembler finds the given name in the program, it replaces the name with the value or symbol. It can equate a numeric value, an ASCII value or a label to another label.
Ex:
    Data SEGMENT
    Num1 EQU 50H
    Num2 EQU 66H
    Data ENDS
Here the numeric values 50H and 66H are assigned to Num1 and Num2.
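To see why BYTE PTR on a word holding 0A345h yields 45h, it helps to look at how the bytes sit in memory: the 8086 stores a word with its low byte at the lower address. The Python sketch below models DB, DW and DUP as byte strings. The helper names db, dw and dup are hypothetical, and uninitialized (?) storage is modelled as zero bytes purely for convenience.

```python
import struct

def db(*values):
    """Model DB: one byte per value."""
    return bytes(values)

def dw(*values):
    """Model DW: two bytes per value, low byte first (little-endian)."""
    return struct.pack("<%dH" % len(values), *values)

def dup(count, item):
    """Model 'count DUP (item)': repeat the item's bytes count times."""
    return item * count

n1 = db(11, 22, 33)          # N1 DB 11, 22, 33
num1 = dw(0xA345)            # NUM1 dw 0A345h
data1 = dup(50, db(0))       # DATA1 db 50d Dup (?), with ? modelled as 0
locn = dup(20, dw(0))        # LOCN dw 20d Dup (0)

al = num1[0]                 # byte at the variable's own address: BYTE PTR NUM1
print(num1.hex(), hex(al), len(data1), len(locn))
```

The byte at the address of NUM1 is 45h and the next byte is A3h, and the 20-word array occupies 40 bytes, in agreement with the comments in the examples above.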
3. Attribute Operators:
a) OFFSET: It is an operator which tells the assembler to determine the offset or displacement of a named data item or variable from the start of the segment which contains it. It is used to load the offset of a variable into a register, so that the variable can be accessed using one of the indirect or indexed addressing modes.
Ex: If you declared a variable as Num db 20h in the data segment, then in the code segment you can write
    mov bx, OFFSET Num   ; bx = address (offset) of Num
    mov al, [bx]         ; now al = 20h
Note: If you write mov al, Num, the contents of Num (20h) are loaded into al. If you use the OFFSET operator, you get the address of the variable instead.

b) LENGTH: It is an operator which tells the assembler to determine the number of elements in a named data item such as a string or array. That is, it returns the number of units assigned to a variable.
Ex: Suppose you declare Sum db 50 Dup (?) in the data segment. Then, in the code segment, if you write
    mov cx, LENGTH Sum   ; cx = 50d
Similarly, if you declare Array dw 50 dup (?), then mov cx, LENGTH Array again gives cx = 50d. That is, LENGTH simply gives the number of elements, irrespective of the data type.

c) SIZE: It is an operator which tells the assembler to determine the number of bytes in the given string or array. Considering the above examples,
    Sum db 50 Dup (?)
    mov cx, SIZE Sum     ; cx = 50d (the same as LENGTH, since each element is one byte)
But if Array dw 50 dup (?) is declared, then
    mov cx, SIZE Array   ; cx = 100d (since Array is of type DW, it occupies 100 bytes)

d) PTR (Pointer): It is used to assign a specific type to a variable or to a label. For example, if you write instructions like
    MOV bx, 2000h        ; point bx to address 2000h
    INC [bx]
the assembler will not know whether to increment the byte at 2000h or the word at 2000h and 2001h. In such cases, the PTR operator is used, and you rewrite the instruction as INC BYTE PTR [bx] or INC WORD PTR [bx].
The PTR operator can also be used to override the declared type of a variable. For example, if you declare Num dw 1223h and write the instruction
    mov al, BYTE PTR Num
then only one 8-bit value (23h) is accessed and loaded into the al register.
The PTR operator is also useful for indirect jump instructions. If you want a near jump, you can specify Jmp WORD PTR [bx], provided the memory word pointed to by bx holds the target offset.

e) TYPE: The TYPE operator tells the assembler to determine the type of a specified variable, expressed as the number of bytes associated with that variable. For a byte-type variable, the assembler gives the value 1; for a word-type variable, the value is 2. It is useful for stepping a pointer through an array.
Ex: If you write an instruction like
    Add si, TYPE Array
then 1 is added to SI if Array is defined with the DB directive, 2 if Array is defined with the DW directive, and so on.
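The three operators TYPE, LENGTH and SIZE are related by SIZE = LENGTH x TYPE for an array declared with DUP. The small Python model below reproduces the values from the examples above; the function name attributes is hypothetical, and the table of element sizes follows the directive definitions (DB = 1 byte, DW = 2, DD = 4, DQ = 8, DT = 10).

```python
# Bytes per element for each data-definition directive.
TYPE = {"DB": 1, "DW": 2, "DD": 4, "DQ": 8, "DT": 10}

def attributes(directive, count):
    """Model TYPE, LENGTH and SIZE for 'name <directive> count DUP (?)'."""
    element_bytes = TYPE[directive]
    return {
        "TYPE": element_bytes,            # bytes per element
        "LENGTH": count,                  # number of elements
        "SIZE": element_bytes * count,    # total bytes reserved
    }

print(attributes("DB", 50))   # Sum db 50 Dup (?)
print(attributes("DW", 50))   # Array dw 50 dup (?)
```

For the byte array LENGTH and SIZE coincide, while for the word array SIZE is twice LENGTH.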
4. Miscellaneous Directives:
a) PROC and ENDP directives: The PROC directive is used to identify the start of a procedure. PROC is preceded by a procedure name given by the user. After the PROC directive, the term NEAR or FAR is used to identify the type of the procedure. If the procedure is written within the same code segment where the main program resides, it is called a NEAR procedure. If the procedure is written in a different code segment, it is called a FAR procedure. The ENDP directive is used to indicate the end of the procedure. The procedure is called from the main program using the CALL instruction.
In short:
PROC and ENDP: indicate the start and end of the procedure. They require a label giving the name of the procedure.
NEAR: the procedure resides in the same code segment (local).
FAR: the procedure may reside at any location in memory.
Ex:
    Fact PROC Near      ; definition of a procedure
    ...                 ; body of the procedure
    Fact ENDP
Ex:
    AddRegs PROC NEAR   ; ADD is an instruction mnemonic, so the procedure needs a different name
    ADD AX, BX
    MOV CX, AX
    RET
    AddRegs ENDP
Note that it is the CALL instruction that stores the return address on the stack, and RET that retrieves it.

b) EVEN (Align on Even Memory Address): As discussed, the assembler uses a location counter to keep track of the statements in the program. The EVEN directive tells the assembler to increment the location counter to the next even address if it is not already at an even address. That is, the variable or instruction that follows EVEN is aligned at an even address. This is done to increase the speed of accessing memory, since the 8086 can read a word at an even address in a single bus cycle. In a code segment, a NOP instruction is inserted where needed to achieve this.

c) GROUP (Group the Related Segments): It is used to tell the assembler to form a logical group of all the segments mentioned after the word GROUP. That is, all such segments and labels can be addressed using one name and the same group segment base address.
Ex:
    Main GROUP Code, Data, Stack
This directs the linker to prepare an EXE file such that the Code, Data and Stack segments lie within one 64 KB memory segment named Main. So, you can use a statement like
    ASSUME cs:Main, ds:Main, ss:Main

d) LABEL: During the assembly process, whenever the assembler comes across the LABEL directive, it assigns the current contents of the location counter to the declared label. The LABEL directive must be followed by a term which specifies the type you want to associate with that name. If the label is used as the destination for a jump or call, then the label must be specified as type NEAR or FAR. If the label references a data item, then the label must be specified as type byte, word or double word.

e) SEG (Segment of a Label): The SEG operator is used to determine the segment address of a label, variable or procedure.
Ex:
    mov bx, SEG Array   ; copy the segment address of label Array and
    mov ds, bx          ; store it in the DS register

f) EXTRN (External): The EXTRN directive is used to tell the assembler that the names or labels following this directive are in some other assembly module. For example, if you want to call a procedure assembled at a different time in another program module, you must tell the assembler in your program that the procedure you are using is external. The assembler will then put information in the object code file so that the linker can connect the two modules together. For a reference to a label, you must specify whether the label is Near or Far.
Ex: If you write EXTRN Division:FAR in your program, it tells the assembler that Division is a label of type far in another assembly module.
Ex: If you want to call a Factorial procedure of Module1 from Module2, it must be declared as PUBLIC in Module1.
Note: Names or labels referred to as external in one module must be declared public, with the PUBLIC directive, in the module in which they are actually defined.

g) PUBLIC: Large programs are generally written as several separate modules. Each module is individually assembled, tested and debugged. When all the modules are working correctly, their object code files are linked together to form the complete program. For the modules to link together correctly, any variable name or label referred to in other modules must be declared public in the module in which it is defined. The PUBLIC directive is used to tell the assembler that a specified name or label will be accessed from other modules. For example, if you write PUBLIC Divisor, Dividend, then these two variables are available to other assembly modules.
Note: The PUBLIC and EXTRN directives are used between the SEGMENT and ENDS directives.
Ex:
    Mod1 SEGMENT              Mod2 SEGMENT
    PUBLIC Fact               EXTRN Fact:FAR
    Mod1 ENDS                 Mod2 ENDS
Note: The linker will verify that every identifier appearing in an EXTRN statement is matched by a PUBLIC statement.
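The check that the linker performs on EXTRN and PUBLIC names can be pictured with a short Python sketch. It is only a model of the idea, not the real LINK algorithm. The module names Mod1 and Mod2 and the symbol Fact come from the example above, Division comes from the EXTRN example, and the function name unresolved_externals is hypothetical.

```python
def unresolved_externals(modules):
    """Return the sorted names declared EXTRN somewhere but PUBLIC nowhere."""
    public = set()
    external = set()
    for module in modules.values():
        public.update(module["public"])
        external.update(module["extrn"])
    # Every external name must be matched by a public definition.
    return sorted(external - public)

modules = {
    "Mod1": {"public": {"Fact"}, "extrn": set()},
    "Mod2": {"public": set(), "extrn": {"Fact", "Division"}},
}

missing = unresolved_externals(modules)
print(missing)
```

In this model Fact is resolved because Mod1 makes it public, while Division would be reported as unresolved because no module declares it PUBLIC.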
    MOV NUM1, AX        ; store low word of result into memory
    MOV AH, 4CH
    INT 21H
    END Start           ; end of program

The program may be written in a different form as below.
    .CODE
    .STARTUP            ; indicates the start of the code segment; also
                        ; loads the DS register with the base address of the data segment
    MOV AX, NUM
    MOV BX, AX
    MUL BX
    MOV NUM1, AX
    MOV AX, 4C00h
    INT 21H
    .EXIT
    END

2. Program without using .MODEL directive
; Program to illustrate the use of SEGMENT, ENDS and ASSUME directives
    DATA_HERE SEGMENT                       ; define all the variables here
    NUM  DW 20, 20H                         ; declared as words, since NUM is loaded into AX
    NUM1 DW ?                               ; one word reserved for the result
    DATA_HERE ENDS
    CODE_HERE SEGMENT
    ASSUME CS:CODE_HERE, DS:DATA_HERE
    ; write the program code here
    Main: MOV AX, DATA_HERE                 ; a segment register can't be loaded with an
          MOV DS, AX                        ; immediate operand, so use any general-purpose register
          MOV AX, NUM
          MOV BX, AX
          MUL BX                            ; (DX)(AX) <- (AX) * (BX)
          MOV NUM1, DX                      ; move upper word of product into memory
          MOV AH, 4CH                       ; DOS INT 21H function to return to DOS
          INT 21H
    CODE_HERE ENDS
    END Main                                ; Main label is optional

Different ways of initializing the segment base address in the DS register:
(1) MOV AX, @DATA
    MOV DS, AX
(2) MOV AX, SEG_DATA
    MOV DS, AX
(3) MOV BX, _DATA
    MOV DS, BX
(4) MOV CX, SEG NUM
    MOV DS, CX
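Both programs rely on the fact that a 16-bit MUL BX leaves a 32-bit product split across two registers: the upper word in DX and the lower word in AX. The Python sketch below models just that split; the function name mul16 and the operand value 20h are assumptions made for illustration.

```python
def mul16(ax, bx):
    """Model the 8086 'MUL BX' with a 16-bit operand: return (DX, AX)."""
    product = (ax & 0xFFFF) * (bx & 0xFFFF)   # unsigned 32-bit product
    dx = (product >> 16) & 0xFFFF             # upper word goes to DX
    ax = product & 0xFFFF                     # lower word goes to AX
    return dx, ax

dx, ax = mul16(0x20, 0x20)            # squaring a small value: DX stays 0
print(f"DX={dx:04X} AX={ax:04X}")

dx_big, ax_big = mul16(0xFFFF, 0xFFFF)  # largest case: both words are needed
print(f"DX={dx_big:04X} AX={ax_big:04X}")
```

This is why one version of the program stores AX (the low word) and the other stores DX (the high word): for small operands the whole result fits in AX, and DX is zero.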
Summary:
An assembler is a program that accepts an assembly language program as input, converts it into an object module, and prepares it for loading into memory for execution. The loader (linker) further converts the object module prepared by the assembler into executable form by linking it with other object modules and library modules. The final executable map of the assembly language program is prepared by the loader at the time of loading into primary memory for actual execution. The assembler prepares the relocation and linkage information (subroutine, ISR) for the loader. The operating system, which actually has control of the memory, passes to the loader the memory address at which the program is to be loaded for execution and the map of the available memory. Based on this information and the information generated by the assembler, the loader generates an executable map of the program, physically loads it into memory, and transfers control to it for execution. Thus the basic task of an assembler is to generate the object module and prepare the loading and linking information.
Operation of an Assembler:
Assembling a program proceeds statement by statement, sequentially. The first phase of assembling is to analyze the program to be converted. This phase, called Pass 1, defines and records the symbols, pseudo operations and directives. It also analyzes the segments used by the program, the types and labels, and their memory requirements. The second phase (Pass 2) looks up the addresses and data assigned to the labels. It also finds the codes of the instructions from the instruction machine-code database and the program data, and it processes the pseudo operations and directives. It is the task of the assembler designer to select suitable strings for use as directives, pseudo operations or reserved words, and to decide the syntax.