Sunteți pe pagina 1din 12

Assembly language

See the terminology section below for information regarding inconsistent use of the terms assembly and assembler.

Assembly language uses a mnemonic to represent each


low-level machine instruction or operation. Typical operations require one or more operands in order to form
a complete instruction, and most assemblers can therefore take labels, symbols and expressions as operands to
represent addresses and other constants, freeing the programmer from tedious manual calculations. Macro assemblers include a macroinstruction facility so that (parameterized) assembly language text can be represented
by a name, and that name can be used to insert the expanded text into other code. Many assemblers oer additional mechanisms to facilitate program development,
to control the assembly process, and to aid debugging.

1 Key concepts
1.1 Assembler
An assembler is a program which creates object code
by translating combinations of mnemonics and syntax
for operations and addressing modes into their numerical equivalents. This representation typically includes an
operation code ("opcode") as well as other control bits.[1]
The assembler also calculates constant expressions and
resolves symbolic names for memory locations and other
entities.[2] The use of symbolic references is a key feature
of assemblers, saving tedious calculations and manual address updates after program modications. Most assemblers also include macro facilities for performing textual
substitution e.g., to generate common short sequences
of instructions as inline, instead of called subroutines.
Some assemblers may also be able to perform some simple types of instruction set-specic optimizations. One
concrete example of this may be the ubiquitous x86 assemblers from various vendors. Most of them are able
to perform jump-instruction replacements (long jumps
replaced by short or relative jumps) in any number of
passes, on request. Others may even do simple rearrangement or insertion of instructions, such as some assemblers
for RISC architectures that can help optimize a sensible
instruction scheduling to exploit the CPU pipeline as efciently as possible.

Motorola MC6800 Assembly listing, showing original assembly


language and the assembled form

An assembly language is a low-level programming language for a computer, or other programmable device, in
which there is a very strong (generally one-to-one) correspondence between the language and the architectures
machine code instructions. Each assembly language is
specic to a particular computer architecture, in contrast to most high-level programming languages, which
are generally portable across multiple architectures, but Like early programming languages such as Fortran,
require interpreting or compiling.
Algol, Cobol and Lisp, assemblers have been available
Assembly language is converted into executable machine since the 1950s and the rst generations of text based
code by a utility program referred to as an assembler; the computer interfaces. However, assemblers came rst as
conversion process is referred to as assembly, or assem- they are far simpler to write than compilers for high-level
languages. This is because each mnemonic along with the
bling the code.
1

1 KEY CONCEPTS

addressing modes and operands of an instruction translates rather directly into the numeric representations of
that particular instruction, without much context or analysis. There have also been several classes of translators and semi automatic code generators with properties
similar to both assembly and high level languages, with
speedcode as perhaps one of the better known examples.
There may be several assemblers with dierent syntax
for a particular CPU or instruction set architecture. For
instance, an instruction to add memory data to a register in a x86-family processor might be add eax,[ebx], in
original Intel syntax, whereas this would be written addl
(%ebx),%eax in the AT&T syntax used by the GNU Assembler. Despite dierent appearances, dierent syntactic forms generally generate the same numeric machine
code, see further below. A single assembler may also
have dierent modes in order to support variations in syntactic forms as well as their exact semantic interpretations
(such as FASM-syntax, TASM-syntax, ideal mode etc., in
the special case of x86 assembly programming).
1.1.1

Number of passes

the program load if the assembler directly produces executable code) faster.[3]
1.1.2 High-level assemblers
More sophisticated high-level assemblers provide language abstractions such as:
Advanced control structures
High-level procedure/function declarations and invocations
High-level abstract data types, including structures/records, unions, classes, and sets
Sophisticated macro processing (although available
on ordinary assemblers since the late 1950s for
IBM 700 series and since the 1960s for IBM/360,
amongst other machines)
Object-oriented programming features such as
classes, objects, abstraction, polymorphism, and
inheritance[4]

There are two types of assemblers based on how many


passes through the source are needed to produce the ex- See Language design below for more details.
ecutable program.
One-pass assemblers go through the source code
once. Any symbol used before it is dened will
require errata at the end of the object code (or,
at least, no earlier than the point where the symbol is dened) telling the linker or the loader to go
back and overwrite a placeholder which had been
left where the as yet undened symbol was used.

1.2 Assembly language

A program written in assembly language consists of


a series of (mnemonic) processor instructions and
meta-statements (known variously as directives, pseudoinstructions and pseudo-ops), comments and data. Assembly language instructions usually consist of an opcode mnemonic followed by a list of data, arguments or
parameters.[5] These are translated by an assembler into
Multi-pass assemblers create a table with all symbols
machine language instructions that can be loaded into
and their values in the rst passes, then use the table
memory and executed.
in later passes to generate code.
For example, the instruction below tells an x86/IAIn both cases, the assembler must be able to determine the 32 processor to move an immediate 8-bit value into a
size of each instruction on the initial passes in order to cal- register. The binary code for this instruction is 10110
culate the addresses of subsequent symbols. This means followed by a 3-bit identier for which register to use.
that if the size of an operation referring to an operand de- The identier for the AL register is 000, so the followned later depends on the type or distance of the operand, ing machine code loads the AL register with the data
[6]
the assembler will make a pessimistic estimate when rst 01100001.
encountering the operation, and if necessary pad it with 10110000 01100001
one or more "no-operation" instructions in a later pass or
the errata. In an assembler with peephole optimization, This binary computer code can be made more humanaddresses may be recalculated between passes to allow readable by expressing it in hexadecimal as follows.
replacing pessimistic code with code tailored to the exact B0 61
distance from the target.
Here, B0 means 'Move a copy of the following value into
The original reason for the use of one-pass assemblers
was speed of assembly often a second pass would require rewinding and rereading a tape or rereading a deck
of cards. With modern computers this has ceased to be
an issue. The advantage of the multi-pass assembler is
that the absence of errata makes the linking process (or

AL', and 61 is a hexadecimal representation of the value


01100001, which is 97 in decimal. Intel assembly language provides the mnemonic MOV (an abbreviation of
move) for instructions such as this, so the machine code
above can be written as follows in assembly language,
complete with an explanatory comment if required, after

3
the semicolon. This is much easier to read and to remem- so dier; the corresponding assembly languages reect
ber.
these dierences.
MOV AL, 61h ; Load AL with 97 decimal (61 hex)

Multiple sets of mnemonics or assembly-language syntax


may exist for a single instruction set, typically instantiated
in dierent assembler programs. In these cases, the most
In some assembly languages the same mnemonic such as
MOV may be used for a family of related instructions popular one is usually that supplied by the manufacturer
and used in its documentation.
for loading, copying and moving data, whether these are
immediate values, values in registers, or memory locations pointed to by values in registers. Other assemblers
may use separate opcode mnemonics such as L for move 2 Language design
memory to register, ST for move register to memory,
LR for move register to register, MVI for move im2.1 Basic elements
mediate operand to memory, etc.
The Intel opcode 10110000 (B0) copies an 8-bit value
into the AL register, while 10110001 (B1) moves it into
CL and 10110010 (B2) does so into DL. Assembly language examples for these follow.[6]

There is a large degree of diversity in the way the authors


of assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything
other than a machine mnemonic or extended mnemonic
MOV AL, 1h ; Load AL with immediate value 1 MOV as a pseudo-operation (pseudo-op). A typical assembly
CL, 2h ; Load CL with immediate value 2 MOV DL, 3h language consists of 3 types of instruction statements that
are used to dene program operations:
; Load DL with immediate value 3
The syntax of MOV can also be more complex as the
following examples show.[7]
MOV EAX, [EBX] ; Move the 4 bytes in memory at the
address contained in EBX into EAX MOV [ESI+EAX],
CL ; Move the contents of CL into the byte at address
ESI+EAX
In each case, the MOV mnemonic is translated directly
into an opcode in the ranges 88-8E, A0-A3, B0-B8, C6
or C7 by an assembler, and the programmer does not have
to know or remember which.[6]

Opcode mnemonics
Data denitions
Assembly directives
2.1.1 Opcode mnemonics and extended mnemonics
Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages.
Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode),
and there is at least one opcode mnemonic dened for
each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more
operands. Most instructions refer to a single value, or a
pair of values. Operands can be immediate (value coded
in the instruction itself), registers specied in the instruction or implied, or the addresses of data located elsewhere
in storage. This is determined by the underlying processor architecture: the assembler merely reects how this
architecture works. Extended mnemonics are often used
to specify a combination of an opcode with a specic
operand, e.g., the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP
(NO OPeration do nothing for one step) for BC with
a mask of 0.

Transforming assembly language into machine code is the


job of an assembler, and the reverse can at least partially
be achieved by a disassembler. Unlike high-level languages, there is usually a one-to-one correspondence between simple assembly statements and machine language
instructions. However, in some cases, an assembler may
provide pseudoinstructions (essentially macros) which expand into several machine language instructions to provide commonly needed functionality. For example, for a
machine that lacks a branch if greater or equal instruction, an assembler may provide a pseudoinstruction that
expands to the machines set if less than and branch
if zero (on the result of the set instruction)". Most fullfeatured assemblers also provide a rich macro language
(discussed below) which is used by vendors and programmers to generate more complex code and data sequences. Extended mnemonics are often used to support specialEach computer architecture has its own machine lan- ized uses of instructions, often for purposes not obvious
guage. Computers dier in the number and type of op- from the instruction name. For example, many CPUs
erations they support, in the dierent sizes and numbers do not have an explicit NOP instruction, but do have inof registers, and in the representations of data in storage. structions that can be used for the purpose. In 8086 CPUs
While most general-purpose computers are able to carry the instruction xchg ax,ax is used for nop, with nop being
out essentially the same functionality, the ways they do a pseudo-opcode to encode the instruction xchg ax,ax.
Some disassemblers recognize this and will decode the

2 LANGUAGE DESIGN

xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended
mnemonics NOP and NOPR for BC and BCR with zero
masks. For the SPARC architecture, these are known as
synthetic instructions.[8]

structures, and assign labels that refer to literal values or


the result of simple computations performed by the assembler. Labels can also be used to initialize constants
and variables with relocatable addresses.

Assembly languages, like most other computer languages,


allow comments to be added to program source code that
will be ignored during assembly. Judicious commenting is essential in assembly language programs, as the
meaning and purpose of a sequence of binary machine
instructions can be dicult to determine. The raw (uncommented) assembly language generated by compilers
Mnemonics are arbitrary symbols; in 1985 the IEEE pub- or disassemblers is quite dicult to read when changes
lished Standard 694 for a uniform set of mnemonics to be must be made.
used by all assemblers. The standard has since been withdrawn.
Some assemblers also support simple built-in macroinstructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b.[9] These are sometimes known as pseudoopcodes.

2.2 Macros

2.1.2

Data directives

There are instructions used to dene data elements to


hold data and variables. They dene the type of data,
the length and the alignment of data. These instructions
can also dene whether the data is available to outside
programs (programs assembled separately) or only to the
program in which the data section is dened. Some assemblers classify these as pseudo-ops.
2.1.3

Assembly directives

Assembly directives, also called pseudo-opcodes,


pseudo-operations or pseudo-ops, are instructions that
are executed by an assembler at assembly time, not by a
CPU at run time. The names of pseudo-ops often start
with a dot to distinguish them from machine instructions.
Pseudo-ops can make the assembly of the program
dependent on parameters input by a programmer, so that
one program can be assembled dierent ways, perhaps
for dierent applications. Or, a pseudo-op can be used
to manipulate presentation of a program to make it
easier to read and maintain. Another common use of
pseudo-ops is to reserve storage areas for run-time data
and optionally initialize their contents to known values.

Many assemblers support predened macros, and others support programmer-dened (and repeatedly redenable) macros involving sequences of text lines in
which variables and constants are embedded. This sequence of text lines may include opcodes or directives.
Once a macro has been dened its name may be used
in place of a mnemonic. When the assembler processes
such a statement, it replaces the statement with the text
lines associated with that macro, then processes them
as if they existed in the source code le (including, in
some assemblers, expansion of any macros existing in
the replacement text). Macros in this sense date to IBM
autocoders of the 1950s.
This denition of macro is slightly dierent from the
use of the term in other contexts, like the C programming
language. C macros created through the #dene directive are typically just one line, or a few lines at most. Assembler macro instructions can be lengthy programs by
themselves, executed by interpretation by the assembler
during assembly.

Since macros can have 'short' names but expand to several


or indeed many lines of code, they can be used to make
assembly language programs appear to be far shorter, requiring fewer lines of source code, as with higher level
languages. They can also be used to add higher levels of
structure to assembly programs, optionally introduce emSymbolic assemblers let programmers associate arbitrary bedded debugging code via parameters and other similar
names (labels or symbols) with memory locations and var- features.
ious constants. Usually, every constant and variable is Macro assemblers often allow macros to take parameters.
given a name so instructions can reference those locations Some assemblers include quite sophisticated macro lanby name, thus promoting self-documenting code. In ex- guages, incorporating such high-level language elements
ecutable code, the name of each subroutine is associated as optional parameters, symbolic variables, conditionals,
with its entry point, so any calls to a subroutine can use its string manipulation, and arithmetic operations, all usname. Inside subroutines, GOTO destinations are given able during the execution of a given macro, and allowing
labels. Some assemblers support local symbols which are macros to save context or exchange information. Thus
lexically distinct from normal symbols (e.g., the use of a macro might generate numerous assembly language in10$" as a GOTO destination).
structions or data denitions, based on the macro arguSome assemblers, such as NASM provide exible sym- ments. This could be used to generate record-style data
bol management, letting programmers manage dierent structures or unrolled loops, for example, or could gennamespaces, automatically calculate osets within data erate entire algorithms based on complex parameters. An

2.3

Support for structured programming

organization using assembly language that has been heavily extended using such a macro suite can be considered
to be working in a higher-level language, since such programmers are not working with a computers lowest-level
conceptual elements.

2.3 Support for structured programming


Some assemblers have incorporated structured programming elements to encode execution ow. The earliest example of this approach was in the Concept-14 macro set,
originally proposed by Dr. H.D. Mills (March 1970),
and implemented by Marvin Kessler at IBMs Federal
Systems Division, which extended the S/360 macro assembler with IF/ELSE/ENDIF and similar control ow
blocks.[11] This was a way to reduce or eliminate the use
of GOTO operations in assembly code, one of the main
factors causing spaghetti code in assembly language. This
approach was widely accepted in the early '80s (the latter
days of large-scale assembly language use).

Macros were used to customize large scale software systems for specic customers in the mainframe era and
were also used by customer personnel to satisfy their employers needs by making specic versions of manufacturer operating systems. This was done, for example,
by systems programmers working with IBM's Conversational Monitor System / Virtual Machine (VM/CMS)
and with IBMs real time transaction processing addons, Customer Information Control System CICS, and
ACP/TPF, the airline/nancial system that began in the A curious design was A-natural, a stream-oriented as1970s and still runs many large computer reservations sembler for 8080/Z80 processors from Whitesmiths Ltd.
systems (CRS) and credit card systems today.
(developers of the Unix-like Idris operating system, and
what
was reported to be the rst commercial C compiler).
It was also possible to use solely the macro processing
The
language
was classied as an assembler, because
abilities of an assembler to generate code written in comit
worked
with
raw machine elements such as opcodes,
pletely dierent languages, for example, to generate a
registers,
and
memory
references; but it incorporated an
version of a program in COBOL using a pure macro asexpression
syntax
to
indicate
execution order. Parenthesembler program containing lines of COBOL code inside
ses
and
other
special
symbols,
along with block-oriented
assembly time operators instructing the assembler to genstructured
programming
constructs,
controlled the seerate arbitrary code.
quence of the generated instructions. A-natural was built
This was because, as was realized in the 1960s, the con- as the object language of a C compiler, rather than for
cept of macro processing is independent of the concept hand-coding, but its logical syntax won some fans.
of assembly, the former being in modern terms more
word processing, text processing, than generating object There has been little apparent demand for more sophistidecline of large-scale assemcode. The concept of macro processing appeared, and cated assemblers since the[12]
bly
language
development.
In spite of that, they are still
appears, in the C programming language, which supports
being
developed
and
applied
in
cases where resource conpreprocessor instructions to set variables, and make
straints
or
peculiarities
in
the
target
systems architecture
conditional tests on their values. Note that unlike cerprevent
the
eective
use
of
higher-level
languages.[13]
tain previous macro processors inside assemblers, the C
preprocessor was not Turing-complete because it lacked
the ability to either loop or go to, the latter allowing
programs to loop.
3 Use of assembly language
Despite the power of macro processing, it fell into disuse in many high level languages (major exceptions being 3.1 Historical perspective
C/C++ and PL/I) while remaining a perennial for assemblers.
Assembly languages date to the introduction of the
Macro parameter substitution is strictly by name: at stored-program computer. The Electronic Delay Storage
macro processing time, the value of a parameter is tex- Automatic Calculator (EDSAC) (1949) had an assembler
tually substituted for its name. The most famous class of called initial orders featuring one-letter mnemonics.[14]
bugs resulting was the use of a parameter that itself was an Nathaniel Rochester wrote an assembler for an IBM 701
expression and not a simple name when the macro writer (1954). SOAP (Symbolic Optimal Assembly Program)
expected a name. In the macro: foo: macro a load a*b the (1955) was an assembly language for the IBM 650 comintention was that the caller would provide the name of a puter written by Stan Poley.[15]
variable, and the global variable or constant b would be
Assembly languages eliminated much of the errorused to multiply a. If foo is called with the parameter
prone and time-consuming rst-generation programming
a-c, the macro expansion of load a-c*b occurs. To avoid
needed with the earliest computers, freeing programmers
any possible ambiguity, users of macro processors can
from tedium such as remembering numeric codes and calparenthesize formal parameters inside macro denitions,
culating addresses. They were once widely used for all
or callers can parenthesize the input parameters.[10]
sorts of programming. However, by the 1980s (1990s on
microcomputers), their use had largely been supplanted
by higher-level languages, in the search for improved
programming productivity. Today assembly language is

6
still used for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues. Typical uses are device drivers, lowlevel embedded systems, and real-time systems.
Historically, numerous programs have been written entirely in assembly language. Operating systems were entirely written in assembly language until the introduction of the Burroughs MCP (1961), which was written
in Executive Systems Problem Oriented Language (ESPOL), an Algol dialect. Many commercial applications
were written in assembly language as well, including a
large amount of the IBM mainframe software written by
large corporations. COBOL, FORTRAN and some PL/I
eventually displaced much of this work, although a number of large organizations retained assembly-language application infrastructures well into the 1990s.

3 USE OF ASSEMBLY LANGUAGE


strings. It also allowed address expressions which could
be combined with addition, subtraction, multiplication,
division, logical AND, logical OR, and exponentiation
operators.[17]

3.2 Current usage


There have always been debates over the usefulness and
performance of assembly language relative to high-level
languages. Assembly language has specic niche uses
where it is important; see below. Assembler can be
used to optimize for speed or optimize for size. In the
case of speed optimization, modern optimizing compilers
are claimed[18] to render high-level languages into code
that can run as fast as hand-written assembly, despite the
counter-examples that can be found.[19][20][21] The complexity of modern processors and memory sub-systems
makes eective optimization increasingly dicult for
compilers, as well as assembly programmers.[22][23]
Moreover increasing processor performance has meant
that most CPUs sit idle most of the time,[24] with delays
caused by predictable bottlenecks such as cache misses,
I/O operations and paging. This has made raw code execution speed a non-issue for many programmers.

Most early microcomputers relied on hand-coded assembly language, including most operating systems and large
applications. This was because these systems had severe
resource constraints, imposed idiosyncratic memory and
display architectures, and provided limited, buggy system services. Perhaps more important was the lack of
rst-class high-level language compilers suitable for microcomputer use. A psychological factor may have also
played a role: the rst generation of microcomputer pro- There are some situations in which developers might
grammers retained a hobbyist, wires and pliers attitude. choose to use assembly language:
In a more commercial context, the biggest reasons for us A stand-alone executable of compact size is required
ing assembly language were minimal bloat (size), minithat must execute without recourse to the run-time
mal overhead, greater speed, and reliability.
components or libraries associated with a high-level
Typical examples of large assembly language programs
language; this is perhaps the most common situafrom this time are IBM PC DOS operating systems and
tion. For example, rmware for telephones, autoearly applications such as the spreadsheet program Lotus
mobile fuel and ignition systems, air-conditioning
1-2-3. Even into the 1990s, most console video games
control systems, security systems, and sensors.
were written in assembly, including most games for the
Mega Drive/Genesis and the Super Nintendo Entertain Code that must interact directly with the hardware,
ment System. According to some industry insiders, the
for example in device drivers and interrupt handlers.
assembly language was the best computer language to use
Programs that need to use processor-specic into get the best performance out of the Sega Saturn, a constructions not implemented in a compiler. A comsole that was notoriously challenging to develop and pro[16]
mon example is the bitwise rotation instruction at
gram games for.
The arcade game NBA Jam (1993)
the core of many encryption algorithms.
is another example.
Assembly language has long been the primary development language for many popular home computers of the
1980s and 1990s (such as the Sinclair ZX Spectrum,
Commodore 64, Commodore Amiga, and Atari ST). This
was in large part because interpreted BASIC dialects on
these systems oered insucient execution speed, as
well as insucient facilities to take full advantage of the
available hardware on these systems. Some systems even
have an integrated development environment (IDE) with
highly advanced debugging and macro facilities.
The Assembler for the VIC-20 was written by Don French
and published by French Silk. At 1,639 bytes in length,
its author believes it is the smallest symbolic assembler
ever written. The assembler supported the usual symbolic
addressing and the denition of character strings or hex

Programs that create vectorized functions for programs in higher-level languages such as C. In the
higher-level language this is sometimes aided by
compiler intrinsic functions which map directly to
SIMD mnemonics, but nevertheless result in a oneto-one assembly conversion specic for the given
vector processor.
Programs requiring extreme optimization, for example an inner loop in a processor-intensive algorithm. Game programmers take advantage of the
abilities of hardware features in systems, enabling
games to run faster. Also large scientic simulations
require highly optimized algorithms, e.g. linear algebra with BLAS[19][25] or discrete cosine transformation (e.g. SIMD assembly version from x264[26] )

3.3

Typical applications

Situations where no high-level language exists, on a


new or specialized processor, for example.
Programs that need precise timing such as
real-time programs such as simulations, ight
navigation systems, and medical equipment.
For example, in a y-by-wire system, telemetry must be interpreted and acted upon within
strict time constraints. Such systems must
eliminate sources of unpredictable delays,
which may be created by (some) interpreted
languages, automatic garbage collection, paging operations, or preemptive multitasking.
However, some higher-level languages incorporate run-time components and operating
system interfaces that can introduce such delays. Choosing assembly or lower-level languages for such systems gives programmers
greater visibility and control over processing
details.

7
programmers today regularly work with assembly language as a tool, the underlying concepts remain very important. Such fundamental topics as binary arithmetic,
memory allocation, stack processing, character set encoding, interrupt processing, and compiler design would be
hard to study in detail without a grasp of how a computer
operates at the hardware level. Since a computers behavior is fundamentally dened by its instruction set, the
logical way to learn such concepts is to study an assembly
language. Most modern computers have similar instruction sets. Therefore, studying a single assembly language
is sucient to learn: I) the basic concepts; II) to recognize
situations where the use of assembly language might be
appropriate; and III) to see how ecient executable code
can be created from high-level languages. [28] This is analogous to children needing to learn the basic arithmetic
operations (e.g., long division), although calculators are
widely used for all except the most trivial calculations.

cryptographic algorithms that must always 3.3 Typical applications


take strictly the same time to execute, preventing timing attacks.
Assembly language is typically used in a systems
boot code, the low-level code that initializes and
Situations where complete control over the environtests the system hardware prior to booting the opment is required, in extremely high security situaerating system and is often stored in ROM. (BIOS
tions where nothing can be taken for granted.
on IBM-compatible PC systems and CP/M is an example.)
Computer viruses, bootloaders, certain device
drivers, or other items very close to the hardware
or low-level operating system.
Some compilers translate high-level languages into
assembly rst before fully compiling, allowing the
Instruction set simulators for monitoring, tracing
assembly code to be viewed for debugging and optiand debugging where additional overhead is kept to
mization purposes.
a minimum
Reverse-engineering and modifying program les
such as
existing binaries that may or may not have
originally been written in a high-level language, for example when trying to recreate
programs for which source code is not available or has been lost, or cracking copy protection of proprietary software.
Video games (also termed ROM hacking),
which is possible via several methods. The
most widely employed is altering program
code at the assembly language level.
Self-modifying code, to which assembly language
lends itself well.
Games and other software for graphing calculators.[27]
Assembly language is still taught in most computer science and electronic engineering programs. Although few

Relatively low-level languages, such as C, allow the


programmer to embed assembly language directly in
the source code. Programs using such facilities, such
as the Linux kernel, can then construct abstractions
using dierent assembly language on each hardware
platform. The systems portable code can then use
these processor-specic components through a uniform interface.
Assembly language is useful in reverse engineering. Many programs are distributed only in machine code form which is straightforward to translate
into assembly language, but more dicult to translate into a higher-level language. Tools such as the
Interactive Disassembler make extensive use of disassembly for such a purpose.
Assemblers can be used to generate blocks of data,
with no high-level language overhead, from formatted and commented source code, to be used by other
code.

7 EXAMPLE LISTING OF ASSEMBLY LANGUAGE SOURCE CODE

Related terminology
Assembly language or assembler language is
commonly called assembly, assembler, ASM, or
symbolic machine code. A generation of IBM
mainframe programmers called it ALC for Assembly Language Code or BAL[29] for Basic Assembly
Language. Calling the language assembler might
be considered potentially confusing and ambiguous,
since this is also the name of the utility program that
translates assembly language statements into machine code. However, this usage has been common among professionals and in the literature for
decades.[30] Similarly, some early computers called
their assembler their assembly program.[31]

5 List of assemblers for dierent


computer architectures
Main article: List of assemblers

6 Further details
For any given personal computer, mainframe, embedded
system, and game console, both past and present, at least
one possibly dozens of assemblers have been written.
For some examples, see the list of assemblers.

On Unix systems, the assembler is traditionally called as,


although it is not a single body of code, being typically
The computational step where an assembler is run, written anew for each port. A number of Unix variants
including all macro processing, is termed assembly use GAS.
time. The assembler is said to be assembling the Within processor groups, each assembler has its own disource code.
alect. Sometimes, some assemblers can read another
assemblers dialect, for example, TASM can read old
MASM code, but not the reverse. FASM and NASM
The use of the word assembly dates from the early have similar syntax, but each support dierent macros
years of computers (cf. short code, speedcode).
that could make them dicult to translate to each other.
The basics are all the same, but the advanced features will
[34]
A cross assembler (see also cross compiler) is an dier.
assembler that is run on a computer or operating sys- Also, assembly can sometimes be portable across diftem of a dierent type from the system on which the ferent operating systems on the same type of CPU.
resulting code is to run (the target system). Cross- Calling conventions between operating systems often difassembling may be necessary if the target system fer slightly or not at all, and with care it is possible to gain
cannot run an assembler itself, as is typically the some portability in assembly language, usually by linkcase for small embedded systems. The computer on ing with a C library that does not change between operatwhich the cross assembler is run must have some ing systems. An instruction set simulator can process the
means of transporting the resulting machine code to object code/ binary of any assembler to achieve portathe target system. Common methods involve trans- bility even across platforms with an overhead no greater
mitting an exact byte-by-byte copy of the machine than a typical bytecode interpreter. This is similar to use
code or an ASCII representation of the machine of microcode to achieve compatibility across a processor
code in a portable format (such as Motorola or Intel family.
hexadecimal) through a compatible interface to the
Some higher level computer languages, such as C and
target system for execution.
Borland Pascal, support inline assembly where sections
of assembly code, in practice usually brief, can be em An assembler directive or pseudo-opcode is a com- bedded into the high level language code. The Forth lanmand given to an assembler directing it to perform guage commonly contains an assembler used in CODE
operations other than assembling instructions.[2] words.
Directives aect how the assembler operates and An emulator can be used to debug assembly-language
may aect the object code, the symbol table, the programs.
listing le, and the values of internal assembler parameters. Sometimes the term pseudo-opcode is reserved for directives that generate object code, such
7 Example listing of assembly lanas those that generate data.[32]

guage source code


A meta-assembler is a program that accepts the
syntactic and semantic description of an assembly The following is a partial listing generated by the NASM,
language, and generates an assembler for that lan- an assembler for 32-bit Intel x86 CPUs. The code is for
a subroutine, not a complete program.
guage. [33]

9
1. ;-----------------------------------------------------------

8 See also

2. ; zstr_count:

Compiler

3. ; Counts a zero-terminated ASCII string to determine its size

Disassembler

4. ; in: eax = start address of the zero terminated string


5. ; out: ecx = count = the length of the string

High Level Assembly


Instruction set

6.

Little man computer an educational computer


model with a base-10 assembly language

7. zstr_count: ; Entry point

Microassembler

8. 00000030 B9FFFFFFFF mov ecx, 1 ; Init the loop


counter, pre-decrement

Typed assembly language

9. ; to compensate for the increment


10. .loop:

Comparison of assemblers

9 References

11. 00000035 41 inc ecx ; Add 1 to the loop counter


12. 00000036 803C0800 cmp byte [eax + ecx], 0 ;
Compare the value at the strings

[1] Whether these bitgroups are orthogonal, or to what extent


they are, depends on the CPU and instruction set design
at hand.

13. ; [starting memory address Plus the

[2] David Salomon (1993). Assemblers and Loaders

14. ; loop oset], to zero

[3] Beck, Leland L. (1996). 2. System Software: An Introduction to Systems Programming. Addison Wesley.

15. 0000003A 75F9 jne .loop ; If the memory value is


not zero,
16. ; then jump to the label called '.loop',
17. ; otherwise continue to the next line
18. .done:
19. ; We don't do a nal increment,
20. ; because even though the count is base 1,
21. ; we do not include the zero terminator in the
22. ; strings length
23. 0000003C C3 ret ; Return to the calling program

[4] Hyde, Randall. Chapter 12 Classes and Objects. The


Art of Assembly Language, 2nd Edition. No Starch Press.
2010.
[5] Intel Architecture Software Developers Manual, Volume 2:
Instruction Set Reference. Intel Corporation. 1999. Retrieved 18 November 2010.
[6] Intel Architecture Software Developers Manual, Volume 2:
Instruction Set Reference. Intel Corporation. 1999. pp.
442 and 35. Retrieved 18 November 2010.
[7] Evans, David (2006). x86 Assembly Guide. University
of Virginia. Retrieved 18 November 2010.
[8] The SPARC Architecture Manual, Version 8. SPARC,
International. 1992.
[9] Z80 Op Codes for ZINT. Z80.de. Retrieved on 2013-07-

21.
The rst column (from the left) is simply the line number
in the listing and is otherwise meaningless. The second
[10] Macros (C/C++), MSDN Library for Visual Studio
column is the relative address, in hex, of where the code
2008. Microsoft Corp. Retrieved 2010-06-22.
will be placed in memory. The third column is the actual
compiled code. For instance, B9 is the x86 opcode for [11] Concept 14 Macros. MVS Software. Retrieved May
25, 2009.
the MOV ECX instruction; FFFFFFFF is the value 1 in
twos-complement binary form.
[12] Answers.com. assembly language: Denition and Much

Names suxed with colons (:) are symbolic labels; the


More from Answers.com. Retrieved 2008-06-19.
labels do not create code, they are simply a way to tell
the assembler that those locations have symbolic names. [13] Provinciano, Brian. NESHLA: The High Level, Open
Source, 6502 Assembler for the Nintendo Entertainment
The .done label is only present for clarity of where the
System.
program ends, it does not serve any other purpose. Prexing a period (.) on a label is a feature of the assembler, [14] Salomon. Assemblers and Loaders. p. 7. Retrieved 201201-17.
declaring the label as being local to the subroutine.

10

11

[15] The IBM 650 Magnetic Drum Calculator. Retrieved


2012-01-17.
[16] Eidolons Inn: SegaBase Saturn
[17] Jim Lawless (2004-05-21). Speaking with Don French
: The Man Behind the French Silk Assembler Tools.
Archived from the original on 21 August 2008. Retrieved
2008-07-25.
[18] Rusling, David A. The Linux Kernel. Retrieved Mar 11,
2012.
[19] Writing the Fastest Code, by Hand, for Fun: A Human
Computer Keeps Speeding Up Chips. New York Times,
John Marko. 2005-11-28. Retrieved 2010-03-04.
[20] Bit-eld-badness.
hardwarebug.org.
2010-01-30.
Archived from the original on 5 February 2010. Retrieved
2010-03-04.
[21] GCC makes a mess. HardwareBug.org. 2009-05-13.
Archived from the original on 16 March 2010. Retrieved
2010-03-04.
[22] Randall Hyde. The Great Debate. Archived from the
original on 16 June 2008. Retrieved 2008-07-03.
[23] Code sourcery fails again. hardwarebug.org. 2010-0130. Archived from the original on 2 April 2010. Retrieved
2010-03-04.
[24] Click, Cli. A Crash Course in Modern Hardware. Retrieved May 1, 2014.
[25] BLAS Benchmark-August2008. eigen.tuxfamily.org.
2008-08-01. Retrieved 2010-03-04.
[26] "x264.git/common/x86/dct-32.asm".
2010-09-29. Retrieved 2010-09-29.

git.videolan.org.

EXTERNAL LINKS

10 Further reading
Yurichev, Dennis, An Introduction To Reverse
Engineering for Beginners. Online book: http:
//yurichev.com/writings/RE_for_beginners-en.pdf
ASM Community Book An online book full of helpful ASM info, tutorials and code examples by the
ASM Community
Jonathan Bartlett: Programming from the Ground
Up. Bartlett Publishing, 2004. ISBN 0-97528384-7
Also available online as PDF
Robert Britton: MIPS Assembly Language Programming. Prentice Hall, 2003. ISBN 0-13-142044-5
Paul Carter: PC Assembly Language. Free ebook,
2001.
Website
Je Duntemann: Assembly Language Step-by-Step.
Wiley, 2000. ISBN 0-471-37523-3
Randall Hyde: The Art of Assembly Language. No
Starch Press, 2003. ISBN 1-886411-97-2
Draft versions available online as PDF and HTML
Peter Norton, John Socha, Peter Nortons Assembly
Language Book for the IBM PC, Brady Books, NY:
1986.
Michael Singer, PDP-11. Assembler Language Programming and Machine Organization, John Wiley &
Sons, NY: 1980.

[27] 68K Programming in Fargo II. Archived from the original on 2 July 2008. Retrieved 2008-07-03.

Dominic Sweetman: See MIPS Run. Morgan Kaufmann Publishers, 1999. ISBN 1-55860-410-3

[28] Hyde, Randall (1996-09-30). Foreword (Why would


anyone learn this stu?"), op. cit.". Archived from the
original on 25 March 2010. Retrieved 2010-03-05.

John Waldron: Introduction to RISC Assembly Language Programming. Addison Wesley, 1998. ISBN
0-201-39828-1

[29] Technically BAL was only the assembler for BPS; the others were macro assemblers.
[30] Stroustrup, Bjarne (1986). The C++ Programming Language. Addison-Wesley. ISBN 0-201-12078-X. C++ was
primarily designed so that the author and his friends would
not have to program in assembler, C, or various modern
high-level languages. [use of the term assembler to mean
assembly language]
[31] Saxon, James; Plette, William (1962). Programming the
IBM 1401. Prentice-Hall. LoC 62-20615. [use of the
term assembly program]
[32] Microsoft Corporation. MASM: Directives & PseudoOpcodes. Retrieved March 19, 2011.
[33] (John Daintith, ed.) A Dictionary of Computing: metaassembler
[34] Randall Hyde. Which Assembler is the Best?". Archived
from the original on 18 October 2007. Retrieved 200710-19.

11 External links
Unix Assembly Language Programming
Linux Assembly
PPR: Learning Assembly Language
NASM The Netwide Assembler (a popular assembly language)
Assembly Language Programming Examples
Authoring Windows Applications In Assembly Language
Iczelions Win32 Assembly Tutorial at the Wayback
Machine
Assembly Optimization Tips by Mark Larson

11

12
12.1

Text and image sources, contributors, and licenses


Text

Assembly language Source: http://en.wikipedia.org/wiki/Assembly%20language?oldid=653903113 Contributors: Damian Yerrick, Derek


Ross, Wesley, Bryan Derksen, Robert Merkel, Zundark, 0, Andre Engels, Scipius, XJaM, Christian List, SimonP, Alan_d, Hirzel, Edward,
Patrick, Pnm, Lousyd, Ixfd64, Karada, Alo, Ahoerstemeier, Ronz, Nanshu, Angela, Nikai, Andres, Emperorbma, Popsracer, Dcoetzee,
Greenrd, Dtgm, Furrykef, Gutsul, Wernher, Bevo, AnthonyQBachler, Robbot, Murray Langton, Lowellian, Stewartadcock, Pingveno,
Rfc1394, Auric, Blainster, Rasmus Faber, Mendalus, Wikibot, Blashyrk, Wereon, Fuelbottle, Superm401, Tobias Bergemann, Filemon,
Giftlite, Graeme Bartlett, DavidCary, Giddie, Ferkelparade, Everyking, Mellum, VampWillow, Darrien, Just Another Dan, Quadell, Bgbot, Beland, Sanbec, Bumm13, Red Prince, Rdnk, Tzarius, Abdull, Shadanan, Discospinster, Rich Farmbrough, Guanabot, Michal Jurosz,
Wrp103, Paul August, Bender235, Uli, CanisRufus, Kwamikagami, Shadow demon, Pneuhaus, Femto, Nigelj, Spalding, Quarryman, R.
S. Shaw, SpeedyGonsales, Guy Harris, Atlant, Wtshymanski, Suruena, Mozillaman, Scott Gall, Sleigh, SteinbDJ, Alex.g, A D Monroe III,
Feezo, Marasmusine, Mindmatrix, RHaworth, LOL, Capi crimm, Pol098, Ruud Koot, Apokrif, Kglavin, Easyas12c, Wikiklrsc, Kaster,
ThomasHarte, Toussaint, Ashmoo, Graham87, Bilbo1507, Kbdank71, Rjwilmsi, Vegaswikian, DouglasGreen, Nandesuka, Fred Bradstadt,
Ysangkok, Wknight8111, RexNL, Swtpc6800, Toksyuryel, TeaDrinker, Chobot, Garas, Mr.Do!, Wavelength, Crotalus horridus, Wolfmankurd, RussBot, Wengier, SpuriousQ, Akamad, Gaius Cornelius, Wiki alf, Msikma, Grafen, Zarel, JoeBruno, Mkouklis, Cheeselet, Dasnov, Jeh, Tonymec, Jth299, Simon80, Schultkl, Tcsetattr, SilentC, Arthur Rubin, Cedar101, Th1rt3en, StealthFox, Mustafazamany, Teply,
Oldhamlet, Chris Chittleborough, Locke Cole, True Pagan Warrior, SmackBot, Elving, Mmernex, Slashme, Henriok, Pgk, LeeZ, Monz,
Gilliam, Ohnoitsjamie, Cquarksnow, Anwar saadat, KD5TVI, Michele.alessandrini, Dmbrunton, MalafayaBot, Mdwh, Jerome Charles
Potts, Nbarth, ZyMOS, DHN-bot, Konstable, Audriusa, Can't sleep, clown will eat me, Frap, Kindall, An-chan, Jukrat, Fmiddleton, Binrapt,
Coolv, A.R., DMacks, Autodmc, Wizardman, Daniel Santos, Chopin1810, Spinality, MCWNT, SashatoBot, SpareHeadOne, Derek farn,
Vanished user 9i39j3, Darktemplar, RCX, Soumyasch, MonstaPro, Beefball, Fuzzbox, Ben Moore, 16@r, Loadmaster, Hvn0413, Paresthesias, CUTKD, Gondooley, EricR, Mdanh2002, Accatagon, Blakegripling ph, Drlegendre, Beno1000, Az1568, Anger22, MP64, Tawkerbot2, SkyWalker, CRGreathouse, Ahy1, Ale jrb, Stmrlbs, Velle, HenkeB, SpooK, Funnyfarmofdoom, Inzy, JohnCJWood, Zonination,
Vwollan, Chasingsol, Solidpoint, Tawkerbot4, DumbBOT, Starionwolf, Surturz, Mtpaley, PamD, Epbr123, Kubanczyk, Ultimus, Hotdogger, James086, Mellamokb, NERIUM, Rotundo, Brage, Rivanov, AntiVandalBot, Gioto, Okane, DekuDekuplex, Dragon DASH, Kdakin,
JAnDbot, MER-C, IanOsgood, Andonic, Greensburger, Nmcclana, VoABot II, JamesBWatson, Sanoj1234, BroodKiller, Stuart Morrow,
Tedickey, Srice13, Vobis132, Wwmbes, Martynas Patasius, JaGa, Matt B., GermanX, Wre2wre, Gwern, Dsavi.x4, Eagle246, MartinBot, GoldenMeadows, FDD, ItsProgrammable, Mike Field, Trusilver, Kjp993, Hans Dunkelberg, NightFalcon90909, Jesant13, Athaenara,
Cpiral, Starnestommy, DeanHarding, Emuguru, Bigdumbdinosaur, Mufka, JavierMC, 28bytes, Owl3638, VolkovBot, Philip Trueman,
TXiKiBoT, Jpsowin, Vanished user ikijeirw34iuaeolaseric, Anna Lincoln, Lkopeter, Vid512, CanOfWorms, Akyprian, LeaveSleaves,
Stormy Ordos, Jorgon, Slaryn, KP-Adhikari, BotKung, Doug, Andy Dingley, Sploonie, Zx-man, Praefectorian, Struway, Haiviet, TheStarman, Kbrose, Wjl2, SieBot, Whitehatnetizen, BotMultichill, Gerakibot, Eagleal, Xymmax, Jerryobject, Keilana, Flyer22, Masgatotkaca,
Reinderien, Harry, Chieancypants, Nusumareta, Capitalismojo, CultureDrone, Anchor Link Bot, Jacob.jose, Mykk, ArchiSchmedes,
AMRAN AL-AGHBARI, Startswithj, Martarius, ClueBot, The Thing That Should Not Be, LizardJr8, Angusus, System86, Rbakels, Excirial, Tim32, Goodone121, Manatee0, Jack O exiled, Aitias, Versus22, Moonlit Knight, SoxBot III, Ospix, HumphreyW, Sanpnr, DumZiBoT, Meera123, Khunglongcon, Galzigler, Subversive.sound, Addbot, Ramu50, 3Nigma, PJonDevelopment, Tanhabot, Wilky DiFendare,
Areebmajeed, Jiveshwar, Scientus, NjardarBot, MrOllie, AndersBot, AnnaFrance, Herzleid, Tide rolls, Another-anomaly, Tomasz Tybulewicz, Gannimo, Luckas-bot, Yobot, JackPotte, Fraggle81, Utvik, Denispir, Pcap, Peter Flass, Backslash Forwardslash, AnomieBOT,
Galoubet, Piano non troppo, TParis, Flewis, Materialscientist, Citation bot, Aleph Innity, Xqbot, Edward.in.Edmonton, Koektrommel,
Jerey Mall, Tarikes, Arbabarehman, Miguelito Vieira, GrouchoBot, Poolisfun, Shadowjams, Chatul, FrescoBot, ChazZeromus, Tsetsee yugi, DrilBot, Pinethicket, Trijnstel, Mikellewellyn, Skyerise, Hoo man, Reaper Eternal, Tbhotch, Dexter Nextnumber, RjwilmsiBot,
ButOnMethItIs, Regancy42, Mario Blunk, Eleolar, DASHBot, Dead Horsey, Bison bloke, BillyPreset, Evilshy1, Slightsmile, Casascius,
Dawkcodave, Donner60, Colejohnson66, 594rk, TS133, Petrb, ClueBot NG, Cwmhiraeth, Chester Markel, Giobe2000, Btzh2o, Rezabot,
Mcandre, MasterProf, Compfreak7, Syjsoc, ChrisGualtieri, Longtt89, Rezonansowy, OldCodger2, Frosty, Jamesx12345, Geau, EvilKeyboardCat, FallingGravity, The Editors United, Franois Robere, Jcherman, Cognoscent, Tusonchaz, Peetz1, Ththrthtrh, Halfhat, Macofe,
Ankitapasricha and Anonymous: 611

12.2

Images

File:8bit-dynamiclist.gif Source: http://upload.wikimedia.org/wikipedia/commons/1/1d/8bit-dynamiclist.gif License: CC-BY-SA-3.0


Contributors: Own work Original artist: Seahen
File:Commons-logo.svg Source: http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: ? Contributors: ? Original
artist: ?
File:Internet_map_1024.jpg Source: http://upload.wikimedia.org/wikipedia/commons/d/d2/Internet_map_1024.jpg License: CC BY
2.5 Contributors: Originally from the English Wikipedia; description page is/was here. Original artist: The Opte Project
File:Motorola_6800_Assembly_Language.png Source:
http://upload.wikimedia.org/wikipedia/commons/f/f3/Motorola_6800_
Assembly_Language.png License: Public domain Contributors: Own work Original artist: Swtpc6800 en:User:Swtpc6800 Michael Holley
File:Wikibooks-logo.svg Source: http://upload.wikimedia.org/wikipedia/commons/f/fa/Wikibooks-logo.svg License: CC BY-SA 3.0
Contributors: Own work Original artist: User:Bastique, User:Ramac et al.
File:Wikinews-logo.svg Source: http://upload.wikimedia.org/wikipedia/commons/2/24/Wikinews-logo.svg License: CC BY-SA 3.0
Contributors: This is a cropped version of Image:Wikinews-logo-en.png. Original artist: Vectorized by Simon 01:05, 2 August 2006 (UTC)
Updated by Time3000 17 April 2007 to use ocial Wikinews colours and appear correctly on dark backgrounds. Originally uploaded by
Simon.
File:Wikiversity-logo-Snorky.svg Source: http://upload.wikimedia.org/wikipedia/commons/1/1b/Wikiversity-logo-en.svg License: CC
BY-SA 3.0 Contributors: Own work Original artist: Snorky
File:Wiktionary-logo-en.svg Source: http://upload.wikimedia.org/wikipedia/commons/f/f8/Wiktionary-logo-en.svg License: Public domain Contributors: Vector version of Image:Wiktionary-logo-en.png. Original artist: Vectorized by Fvasconcellos (talk contribs), based
on original logo tossed together by Brion Vibber

12

12

12.3

Content license

Creative Commons Attribution-Share Alike 3.0

TEXT AND IMAGE SOURCES, CONTRIBUTORS, AND LICENSES

S-ar putea să vă placă și