Fall 2009
By David Wurmfeld
Today, the domain of an embedded system is almost limitless, ranging from a full-blown Linux system deployed on a single Virtex-4 FPGA chip with a PowerPC microprocessor core and custom integrated peripherals, to a 4-bit data security chip glued onto the front of a smart bank card. Indeed, today's embedded systems may not have any "pins" to speak of; they may be pre-compiled "cores" or "software templates" of hardware architectures designed to implement and complement common computer resources. These cores are purely software in nature, describing hardware architecture using a "Hardware Description Language" or HDL, and only take on a physical manifestation when implemented within a particular ASIC or FPGA scheme. These cores are often referred to as Intellectual Property or IP. The domain of the typical embedded system, however, is dominated by single-chip microcontrollers with fewer than a dozen Input-Output (I/O) pins.
This paper will confine the domain to 8-bit and 16-bit microcontrollers, concentrating on how they compare with one another, from the 30,000-foot view down to the register level3. In addition to the "anatomy" or architecture of the embedded systems, the "physiology" or behaviour, from high-level constructs down to bits in silicon, will be outlined and compared. To synthesize the disparate facts and processes into meaningful information, the embedded systems outlined will be compared using a simple performance metric, with a "Gedanken experiment"4 to explore the performance of three hypothetical embedded systems.
Like its predecessors, the computer needed all the elements of a traditional Finite State Machine with a new twist: the ability to change states by following bit patterns; bit patterns found in configurable structures, structures not hard-wired into the design.
3. For the purposes of this paper, we take it on faith that the silicon topologies and processes used to implement registers work; they are well described in other tomes.
4. Thought Experiment: http://en.wikipedia.org/wiki/Thought_experiment
5. For the initial part of this discussion, we will refer to a chip as the fundamental embedded system building block. Later we will expand that definition to include the concept of a microcontroller "core".
6. About.com: Inventors - http://inventors.about.com/library/blcoindex.htm
7. Technical Institute of Berlin; http://user.cs.tu-berlin.de/~zuse/Konrad_Zuse/index.html
The Guts of the Processor: the Program Counter, ALU and Control Unit.
Keeping with the comparative anatomy theme, every computer is built using these two memory structures, ROM8 and RAM9, in one form or another. They are used to provide long- and short-term storage for data and instructions. The third element necessary to the operation of the computer is the control unit. It is a multi-function module that controls and synchronizes the flow of data between the memory elements and the outside world.
Like a policeman directing traffic, the control unit directs when and where data will move. It also records state information for use by other operations. The key to the control module is the program counter or PC. It is a special purpose register10 that holds the memory address of the next instruction to be executed. The width in bits of the PC corresponds to the maximum number of instructions that can be addressed by the computer: an n-bit PC can address 2^n locations. The control module "fetches" the instruction from ROM "pointed" to by the PC. It then translates the instruction into a sequence of control signals that route the data from and to the appropriate location.
8. ROM is the acronym for Read Only Memory, meaning not writable but readable. In practice, these memories are writable at least once, to configure the memory. Typically they are implemented using FLASH technology, allowing multiple write cycles using proper programming equipment.
9. RAM is the acronym for Random Access Memory, which is a misnomer as all addressable memory is random access by definition. It traditionally refers to memory that loses all data when the power is off, and is typically of a static or dynamic nature.
10. A "register" is a fixed-width, volatile memory element, used to store intermediate information. This "information" may be the next address to execute, or the flag bits used to configure the built-in A/D converter.
The following structures are the building blocks of all embedded computer systems:
• Program memory – read-mostly; stores instructions and constant data (data that does not change over time). Non-volatile: data is retained after the power is turned off. Typically it is organized as an addressable matrix of α × β bytes, where α is the memory address width and represents 2^α locations, and β represents 1 or 2 bytes of memory width.
• Data memory – read/write; stores the results of instructions and state interactions. Volatile: loses all data when power is turned off. Typically it is organized as an addressable matrix of α × β bytes, where α is the memory address width and represents 2^α locations, and β represents 1 or 2 bytes of memory width.
• Program counter – a special purpose volatile memory element (usually a dedicated register) that holds the address of where the processor is in its instruction sequence, usually the address of the next instruction to be "fetched"12 and executed. The width of the PC corresponds to the maximum number of instructions the computer can address.
• Control unit – a dedicated Finite State Machine (FSM) that takes as its inputs the instruction from the program memory, translating the bit pattern into actions manifested as synchronized control signals and states to the other modules in the computer.
• Arithmetic-Logic-Unit – a dedicated FSM that takes as inputs control signals and "chunks" of data13 (usually whole bytes or words14) and gives as outputs the results of the operation, in similar chunks of data.
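For illustration, the building blocks listed above can be collected into a single data structure. The sizes below are invented placeholders, simply small powers of two, and are not the dimensions of any real part.

```c
#include <stdint.h>

/* A generic model of the building blocks listed above.
   All sizes are illustrative, not taken from any real part. */
#define PROG_WORDS 4096   /* 2^12 program-memory locations (alpha = 12) */
#define DATA_BYTES 256    /* 2^8 data-memory locations (alpha = 8)      */

struct embedded_machine {
    uint16_t program[PROG_WORDS]; /* program memory: read-mostly, non-volatile */
    uint8_t  data[DATA_BYTES];    /* data memory: read/write, volatile          */
    uint16_t pc;                  /* program counter: address of next fetch     */
    uint8_t  control_state;       /* control unit: current FSM state            */
    uint8_t  alu_result;          /* ALU: output of the last operation          */
};
```

Here β is 2 bytes for program memory and 1 byte for data memory, mirroring the α × β description above.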
11. It will be seen that embedded processors are "theme" oriented; that is, as a motor controller, communications controller or sensor controller, the architecture correspondingly includes "special functionality" (read: dedicated registers/operations) that makes those features efficient compared to doing it manually in software.
12. The term "fetch", normally associated with a ball and the family dog, is a good analogy in this context as the verb describing the action of retrieving the instruction from program memory. It involves looking it up, getting it physically and bringing it back.
13. Historically, the term "data word" refers to the fundamental width of the registers and program memory native to the microprocessor. In the domain of processors we are outlining, it is dependent on the architecture. Wikipedia: http://en.wikipedia.org/wiki/Word_%28computing%29
14
Other types of embedded processors have enough pins to support accessing memory off chip. Starting out, we will address those topologies that have memory built into the chip. The following figures illustrate three different embedded processors: the ARM-7, the Atmel 89C2051 and the Microchip PIC18F1330 8-bit microcontrollers. To start, let's look at the "anatomy" of these processors.
In Figure 4, the PIC18 processor, it is easy to identify the modules we have described so far: ALU, Program Counter, Control Unit ("Instruction Decode and Control"), Program Memory and Data Memory. You will also notice there are many other modules in the processor we haven't discussed yet but may be able to guess at the
15. The word "chip" is loosely used to mean those devices built from "chips" of silicon wafer, mounted onto a leaded carrier, providing the pins that allow connection to the circuit.
16. Image copyright © 2009 Micro Control Journal. All rights reserved. (http://www.mcjournal.com/articles/arc105/arc105.htm)
So far we have outlined where data exists and what is used to access and modify it. Now that we understand what these modules are, from generic concept to actual examples in real machines, the next big step is to understand how these modules interact. The next question to answer is: exactly how does an embedded processor execute an instruction?
19. Image © 2009, Microchip, http://www.microchip.com/wwwproducts/Devices.aspx?dDocName=en022957
For the time being we will focus on these simple but vital tasks. Somehow, the control unit is "smart" enough to know the address from which to fetch the next instruction. The key to understanding how something works is to "walk a mile in its shoes" as it were: to follow it step by step as it does its job. Let's consider a simple scenario with two questions: what actually happens when an embedded processor20 is powered up, and how does the control unit orchestrate these events?
As we are dealing with events that take place in time, it is traditional to illustrate these time-linked or synchronized event relationships with a "waveform" chart.
20. From this point forward, the terms "processor", "embedded processor" and "computer" will all refer to the same thing.
The following is a generic power-up sequence that could apply to almost any embedded processor. It is organized as several rows, each representing a particular signal as it changes in the time domain. The "signal" may represent an actual voltage or a logical state, for example 0V to 3.3V, or "asserted" or "not asserted"21. The rows of the waveform are linked in time; that is, they all start at the same time, and important events are usually labelled. In this example, the first row represents a logic condition of power being applied (the lower line illustrates the zero or off condition, and the upper line represents the high or on condition) rather than its actual voltage value(s). Important here is the idea that not all signals are valid at all times.
[Waveform chart: rows for the POWER, RESET, CLOCK, ADDRESS, DATA, INST and FETCH signals, with events labelled "power stable", "start reset", "clock stable" and "Fetch 1st Instruction".]
Power-on sequence:
1. Power is applied to the chip (the beginning of time, as the chip sees it).
2. The reset signal is asserted, holding the chip in a reset state.
3. In the reset state, nothing happens within the computer, but the computer cycle clocks start oscillating and everything is poised, just waiting for the reset to be released. This is one of the most important times for a computer; without it, the control unit and program counter would be in unknown states22, and the computer could go haywire, not knowing what state it is in, or where to go next.
21. It is more accurate to use the term "asserted/not asserted" to indicate the value of a particular state. "1 or 0", or "true" or "false", can all imply an implementation of a state. A logic 0 may be represented by anything less than 0.9 VDC in a 3.3V system, and may represent the asserted or enabled state of a processor reset signal, which would be logically "true" for its value.
22. By unknown state, consider what is physically happening in time when power is first applied to a transistor circuit. This all happens on the time scale of pico- and nanoseconds, but when dozens of transistors are linked together, it can take hundreds or thousands of nanoseconds to settle down into a known state.
23. For this overview, we are playing fast and loose with the necessity of system synchronism. Assume on faith that every processor cycle is executed in time and in sync with a clock, or clocks, or portions of a clock, to ensure the data is taken or arrives where it belongs when it is valid to do so.
24. The "pieces" referred to will be described in detail later; suffice it for now that the pieces may be an offset from the current location and the previous location, along with any increment pending.
[Block diagram: PC, Program Memory, Data Memory, ALU and I/O Port connected by buses.]
The next part of our discussion of embedded processor "physiology" is how the control unit "knows" what to do with the instruction it fetched from the program memory.
The Control Unit is the very heart of any embedded processor. Ultimately it is responsible for knowing what instruction to fetch next, how to fetch it, how to set up for the next instruction, and how to decode and execute the instruction just fetched. It is a relatively complex FSM designed specifically to control the inner workings of the computer according to basic cycle specifications, like the fetch cycle mentioned previously, or in real time by decoding the cycle information contained within the instruction.
So far so good; before we continue, let's do a little Boolean algebra review. Recall that the number of permutations a particular binary number has is equal to 2^n, where n is the number of bits in the binary number. For example, a 4-digit binary number has 2^4 or 16 possible combinations. An 8-bit byte has 2^8 or 256 possible combinations.
This organization is used extensively in computers to allow us to select one from many,
or address one memory location from the tens of thousands of memory locations
available to us. Like the ubiquitous “Apartment Number” analogy, for every memory
location, there is a unique address, just as there is a unique physical address or number
for every apartment.
This is what is being done with the so-called "instructions": bit patterns are being used to represent places and actions we wish the computer to access or execute. Now we can say it: when a computer "executes" an instruction, it means that particular instruction has a physical meaning associated with its unique bit pattern. That meaning is used to enable the sequence of events required to "execute" the meaning or command.
Time for a real example. Let's say our embedded processor has an instruction called "Add". Its function is to cause the contents of some register (let's call it "a") to be added to the fixed value 0x14 and the result stored back in register a, wherever that is. The operation code (opcode25) for this instruction could be "110001" in binary, and the fixed (immediate) value might be 0x1426, "00010100". The entire
25. See definition: http://en.wikipedia.org/wiki/Opcode
26. The traditional prefix for a hexadecimal number is the two-character pair "0x". Each hexadecimal digit is four bits wide, thus having 16 values, from 0 (0000) to F (1111). http://en.wikipedia.org/wiki/Hexadecimal
The first string28 of binary digits is traditionally called "machine language" (computer baby talk) and the second statement is called "assembly language", a pseudo-English patois of suggestive verbs and nouns loosely cobbled together to garner meaning.
Here is the beginning of what is called the “Tool Chain”, a very important concept in
understanding how computers work. There are tools (actually applications that run on a
separate development computer system) that help us translate language a human can
understand into machine language a computer can execute; the actual, physical binary
pattern stored in program memory.
In this simple example, we would create a program using a stand-alone text editor or an editor within an IDE29, containing among other things the "add" statement above, and use that human-readable text file as the input to an application called an "assembler". The assembler translates the assembly language "source code" into the appropriate bit pattern. To complete the chain, that bit pattern is then combined with other bit patterns to form an executable bit image. This bit image is then "programmed" or "burned30" into the computer program memory ROM. We will be discussing the tool chain concept in more detail later. At this point in our tutorial, accept it on faith that there is indeed a way that humans can create programs that are ultimately physically manifested as bit patterns or instructions inside the processor, ready to execute when the power is turned on.
27. Derived from Wikipedia definition: http://en.wikipedia.org/wiki/Mnemonic
28. Be careful: this is not a binary number; it is a composite representation of opcode and data.
29. "Integrated Development Environment", a computer application that streamlines the creation of computer programs by integrating the editor, compiler, assembler and linker into a single user interface.
30. "Burning" a ROM is a throwback to when physical metal fuses integral to the memory were burned away using a high-current pulse, permanently setting the state for that memory location. The specific mechanisms for memory are beyond the scope of this paper; see http://www.howstuffworks.com/rom.htm/printable for more details.
[Block diagram: Control Unit (CU), PC, Program Memory, Data Memory, ALU and I/O Port connected by buses.]
As you can see from the illustration above, most computer architectures have at least
some sort of program memory, data memory, program counter and control unit. We are
almost ready to start looking at particular embedded processors. It is first necessary to
understand the relationship between the program memory, data memory and dedicated
volatile memory elements or registers.
Some embedded processors, for example, have a special dedicated register for everything. This is where the uniqueness of a processor manifests itself: how the functions and data are organized physically on the processor. A register may be a general-purpose scratch pad holding any value (say, the intermediate result of a logic operation) or a special function register holding a binary value that corresponds to the artefacts of the last instruction31.
As embedded processors contain more and more functionality (timers, serial ports, A/D converters…), it is necessary to have volatile memory elements to keep track of all their settings and status. In some processors, there are over two dozen separate special function registers just for this purpose. Here is where the similarities end and individual architectures begin to diverge from the generic model. How does the computer organize the needs of program memory, data memory and special function registers? Keep that question close to mind as we continue our exploration of the computer's last generic element, the ALU.
Consider the addition of two 8-bit numbers. The sum could be larger than an 8-bit number can hold, so the ALU must be able to accommodate that possibility. The ALU must also be able to provide some sort of floating-point functionality (or at least mechanisms to support such operations), usually incorporated partly as hardware and partly as custom math libraries for that processor.32 The full complement of math operations takes up a lot of processor real estate, and compromises need to be made to get the maximum functionality in the minimum space with the best performance possible. It is possible to multiply or divide any two numbers using successive additions (or subtractions), but that would take a long time. Time, then, would be the compromise over the real estate33 needed to have a hardware multiplier integral to the ALU.
Recall that it is the control unit that calculates the address for the next instruction to
fetch. In the case of the PIC18, this is an address that can accommodate a maximum of
8192 memory locations. As a review, how many bits are needed (the minimum
32. It is mathematically possible to do any math operation using just two bits and a lot of RAM; it would just take a lot of instructions to orchestrate even a simple 16-bit integer addition. On the other hand, you could dedicate three separate registers, two 16-bit and one 32-bit, to hold the operands and the sum respectively. Controller architecture is a balance of what space you physically have and what operations can be done in software.
33. It physically takes space in silicon to do anything, store a bit or make a control unit. Each processor designer is faced with the problem of trying to find space for everything the marketing people want in the new version. Compromises are made in performance or size (and power) when design decisions (architectures) are made.
34. Off-hand homage to the so-called bit-slicers of old.
35. As is often the case, the choice of one particular architecture over another has "religious" implications, with each ideology having its priests, each believing in their brand of "the truth". More often than not, the choice of processor comes down to cost, or number of pins, or "what chip did we use last?" or "how much do the tools cost?", and not some idealized architecture philosophy.
This common control structure is not by accident. It is to ensure instructions written for the least capable member of the family will work on the most capable member. In fact, the address latch for the program memory is a full 20 bits wide, allowing up to 2^20 or roughly one million memory locations (can we expect future versions of the PIC18 family with more program memory?). For our PIC18, however, we have more than enough address bits to accommodate the 8192 locations (16 kBytes) of instructions and constants37.
As we mentioned earlier, the program counter is much more than a simple register; in this example it is almost a mini ALU in the operations it can perform to assemble the correct address for the next instruction. Keeping with the program memory theme, look at the Atmel 89C2051 memory. Although difficult to read from the simple block diagram, the literature specifies the program memory to be byte wide, with 16 address bits for a maximum of 65536 locations (64 k, 1k = 1024 locations). Our variant, the 89C2051, has 2k of program memory, or 2048 locations, each one byte wide.
Consider this simple fact for a moment. Each program memory access of a PIC18F1330 processor returns 16 bits. Each program memory access of the 89C2051 returns half as much data. If both processors are running at the same speed, which one moves more data per unit time? We can't answer that right now, but keep it in mind when we compare performance between our three embedded processors.
To complete the program memory tour, the ARM-7 processor addresses 32-bit wide program memory, unlike the PIC18, which fetches 16 bits at a time. Very similar processors, but different approaches to how the program instructions are addressed.
36. 2^n = 8192; log2(8192) = n = 13 bits.
37. We include "and constants" on purpose when describing "program memory", as it is the ideal place to store values that are known when the code is assembled and would overwhelm the limited data memory space. This convenience, however, comes at a cost, as we will soon see.
This ability begs to be used in a parallel fashion, and not simply in the serial “follow
the recipe” concept of a computer program. The tricky part is, how does the computer
keep track of all these independent operations? More on that later.
Going back to our real life examples, consider the monolithic memory architecture of
the ARM-7. To fetch an instruction, the control unit updates the program counter, then
In this simple example, there is some “dead time”, that time between subsequent
operations that could have been used doing things in parallel.
The organization or architecture of the computer includes how the various volatile memory elements (registers) are organized and controlled. Some architectures use individual, separate registers for everything: see Figure 11. An alternative to having separate physical registers is the model used by the RC8 and PIC18, in which the registers for the whole computer are contained in data memory as a set of registers, addressed like any other memory element and often organized as "files" or "blocks" of memory.
This significantly reduces the complexity of the control unit while maintaining the flexibility of added functionality. For example, consider two processors from the same family: the PIC18F8722 and the PIC18F1330. Using the same register file architecture, it is possible to accommodate the former's five timers and 12 A/D modules using the same control unit that the PIC18F1330 uses to maintain its 2 timers and 5 A/D modules.
In addition to input and output functionality, most microcontrollers these days have some sort of built-in timer capability. These timer modules operate independently once set and started, and provide a much-needed function to count events or time in an embedded application. Again, there is a trade-off between chip real estate and software overhead. Any timer function can be implemented in software using loops and tests, at the cost of having to execute in linear time with the program. No matter how fast a program is, it can only be doing one thing at a time. Implementing a timer in hardware, however, relieves the software of the burden of maintaining a count.
Timers, I/O ports, A/D modules, indeed most if not all computer special function modules, need registers to hold configuration parameters (in the case of the timer, it may be the flag bit that controls whether the timer restarts after an overflow). As we saw in the PIC18, these registers are part of the computer data memory area. In the case of the PPC405, they are individual registers, peppered all over the die. Figure 15 illustrates what is arguably the quintessence of microcontroller technology to date, the Philips LPC2114.
Using what we have learned, let's examine this "animal" closely. The key to evaluating a microcontroller is in answering fundamental structure (anatomy) questions first, and then, if the structure is appropriate to the task, taking a closer look at the functionality (physiology) of the beast.
First the "bones": how are the Program Counter, Control Unit and ALU arranged? How does the chip get/put data to the outside world? What other goodies are available (timers, USARTs, A/D…)?
At first blush, the block diagram of the LPC2114 seems to be missing many essential elements. They are there; it just takes a little digging. This chip represents a new trend in embedded processors: a common core of functionality, surrounded by the I/O that makes that particular processor special. Here the core is illustrated in the block named "ARM7-TDMI". The actual "guts" of the core are illustrated in Figure 14. The Program Counter is implemented by two elements, the "Address Incrementer" and the "Address Register", which makes sense considering how the program counter normally functions. The other basic processor elements are present: the ALU, the Control Unit, and a data register. Some interesting additions are a 32 x 8 hardware multiplier and a barrel shifter interfaced with the ALU. These could help floating-point operations by speeding up common math tasks in hardware. (Remember the real-estate versus performance trade-off discussion? Clearly, this chip is built with the building blocks of speed.) Another interesting element is the 31 x 32-bit register bank, a good place to hold intermediate results or condition flags; at this point in the investigation it is not clear, but it should not come as a surprise, when looking at the "physiology" of this beast, to discover there are such register locations in the bank.
38. Philips LPC2124 Datasheet.
39. Timers are characterized by how many bits they use to "count" with.
40. A timer typically starts at some preloaded value and counts up, one count at a time, until the maximum value is reached. Then, depending on the mode used, an "overflow" flag is set, the initial value is reloaded and the cycle starts all over again.
In this initial look, it is clear this chip has been designed to accommodate almost any
I/O scenario you would encounter with an embedded processor.
Summary.
In this paper we have explored how any embedded processor is made up of the same elements, and how each individual element has basically the same behaviour. These elements are mixed and matched to create the animal known as an embedded processor, and, along with the software applications in the tool chain, make up a development environment the designer can use to solve real-world problems using embedded processors.
The heart of the embedded processor is the concept of an "infinite state machine", that is, dedicated hardware that can reconfigure its functionality by following a set of instructions. This innovation has enabled designers to move away from directly manipulating bits in hardware to looking at problems from a modular perspective.
As important as the hardware advances is the tool chain used to design the control
software, the ultimate arbiter of any embedded system.