Microprocessor Course
A CPU has two typical components: the arithmetic logic unit (ALU, also called ALSU, arithmetic logic shift
unit) and the control unit (CU). The ALU performs arithmetic and logical operations, while the CU fetches
instructions from memory, decodes them and executes them, calling on the ALU when necessary.
A microprocessor incorporates all of the functions of a computer's CPU on a single integrated circuit (IC).
Initially, more than one circuit was required to house the CPU; as the technology became more advanced, the
number of circuits needed went down. Today a single microprocessor can house several CPUs (cores), for
example four in quad-core technologies. In addition to the CPU, the microprocessor is also said to pack BIOS
and memory access circuits (cache; some say all of the RAM as well). It is a programmable device that receives
data, processes it according to stored instructions and then gives results in the form of output. Its primary
language is binary code: 0 and 1.
Because the two are used so similarly, it is easy to understand why these words have become synonymous.
Referring to a microprocessor as a CPU, or vice versa, is acceptable.
Computer Models: many exist, depending on what you want to stress.

Secondary Memory:
Secondary memory (or secondary storage) consists of all permanent or persistent storage devices, such as
read-only memory (ROM), flash drives, hard disk drives (HDD), magnetic tapes and other types of
internal/external storage media. In computing operations, data in secondary memory is first copied into the
primary (main) memory and only then transferred to the CPU.
Secondary memory is slower than primary memory but retains data even when the computer is not
connected to electrical power. It also offers substantial storage capacity, ranging from some MBs to several
TBs in a single device.
Main Memory:
Also known as physical memory, it is internal to the computer. The word main is used to distinguish it from
external mass storage devices such as disk drives. Another term for main memory is RAM.
The computer can manipulate only data that is in main memory. Therefore, every program you execute and
every file you access must be copied from a storage device into main memory. The amount of main
memory on a computer is crucial because it determines how many programs can be executed at one time and
how much data can be readily available to a program.
Because computers often have too little main memory to hold all the data they need, computer engineers
invented a technique called swapping, in which portions of data are copied into main memory as they are
needed. Swapping occurs when there is no room in memory for needed data. When one portion of data is
copied into memory, an equal-sized portion is copied (swapped) out to make room.
RAM is constructed from integrated circuits and needs electrical power to maintain its information. When
power is lost, the information is lost too! RAM can be directly accessed by the CPU. The access time to read or
write any particular byte is independent of where in the memory that byte is, and is currently approximately
50 nanoseconds (a nanosecond is a thousand-millionth of a second). This is broadly comparable with the speed
at which the CPU needs to access data. Main memory is expensive compared to external memory, so it has
limited capacity. The capacity available for a given price is increasing all the time. For example, many home
personal computers now have a capacity of 16 megabytes (million bytes), while 64 megabytes is commonplace
on commercial workstations. The CPU will normally transfer data to and from the main memory in groups of
two, four or eight bytes, even if the operation it is undertaking only requires a single byte.
RAM and CPU timing:

The CPU is synchronized by its own clock pulses; the clock rate is known as the CPU speed and is given in MHz
or GHz. The memory, however, does not use the CPU clock (SDRAM does, see later). Instead it has a read time
(access time, at) and a write time (write cycle time, wct), which are not necessarily equal. Each is the
maximum time needed from the moment CS becomes active and the address is applied until the read or write
operation is fully complete. These times must be synchronized with the CPU speed. The number of clock pulses
required for a memory request is the larger of at and wct divided by the clock period, rounded up to the next
integer.
Example:
CPU clock: 50 MHz, period = 1 / (50 * 10^6) s = 20 ns (1 nanosecond (ns) = 10^-9 s)
at: 65 ns, wct: 75 ns (generally wct >= at)
So N = ceil(75 / 20) = 4. This means the CPU will devote four clock cycles to each memory request (R or W).
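The rule above can be checked with a few lines of Python. This is only a sketch: the function name
memory_wait_cycles and its parameters are chosen here for illustration and are not part of the course.

```python
import math

def memory_wait_cycles(cpu_clock_hz, access_time_ns, write_cycle_time_ns):
    """Clock cycles the CPU must allow for one memory request (read or write)."""
    period_ns = 1e9 / cpu_clock_hz                          # clock period in ns
    slowest_ns = max(access_time_ns, write_cycle_time_ns)   # worst case of at and wct
    return math.ceil(slowest_ns / period_ns)                # round up to whole cycles

# Values of the example: 50 MHz clock, at = 65 ns, wct = 75 ns
print(memory_wait_cycles(50e6, 65, 75))   # 4
```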
RAM Kinds: SRAM and DRAM

SRAM (Static): the word static indicates that it does not need to be periodically refreshed, as SRAM uses
latching circuitry (i.e., flip-flops) to store each bit. Each bit is stored as a voltage. Each memory cell requires
six transistors, which gives the chip low density but high speed. However, SRAM is still volatile in the
(conventional) sense that data is lost when it is powered down.
Its disadvantages are that it is more expensive and consumes more power than DRAM.
SRAM is used for cache memory, which is included on the processor chip.
DRAM (Dynamic): its advantage over SRAM is its structural simplicity: only one transistor (a MOSFET) and
a capacitor (to store a bit as a charge) are required per bit, compared to six transistors in SRAM. This allows
DRAM to reach very high density. It also consumes less power and is cheaper than SRAM (except when
the system size is less than 8 K). DRAM is used for the main computer memory.
Its disadvantage is that the bit is stored as a charge, which leaks; the information therefore needs to be
read and written again every few milliseconds. This is known as refreshing the memory, and it requires
extra circuitry, adding to the cost of the system.
DRAM evolved to FPM DRAM (Fast Page Mode), then to EDO DRAM (Extended Data-Out), to SDRAM
(synchronous) and to DDR SDRAM (Double Data Rate). The improvements were in refresh speed, access time
and write cycle time.
Classic DRAM has an asynchronous interface, which means that it responds as quickly as possible to changes in
control inputs. SDRAM has a synchronous interface, meaning that it waits for a clock signal before responding
to control inputs and is therefore synchronized with the computer's system bus. The clock is used to drive an
internal circuit that pipelines incoming commands. The data storage area is divided into several banks,
allowing the chip to work on several memory access commands at a time, interleaved among the separate
banks. This allows higher data access rates than an asynchronous DRAM.
Pipelining means that the chip can accept a new command before it has finished processing the previous one.
In a pipelined write, the write command can be immediately followed by another command, without waiting
for the data to be written to the memory array. In a pipelined read, the first requested data appears a
fixed number of clock cycles after the read command. During these clock cycles, additional commands can be
sent. This delay is called the latency and is an important performance parameter to consider when purchasing
SDRAM for a computer. SDRAM has the same read and write cycle time, as both are synchronized by the
clock. Generally we use the term latency when talking about SDRAM, and access time when talking about
normal DRAM.
SDRAM is widely used in computers; after the original SDRAM, further generations of double data rate
(DDR) RAM have entered the mass market: DDR (also known as DDR1), DDR2, DDR3 and DDR4, with the
latest generation (DDR4) released in the second half of 2014.
SDRAM is efficient for burst reads and burst writes, that is, reading or writing multiple words from or to
consecutive addresses. After the first word, only one clock cycle is needed for each following word.
SDRAM has a memory bandwidth, which is the rate at which data can be read from or written into it; it is
generally quoted for burst reads and writes.
SDRAM Example:
Memory data path width: 1 word = 4 bytes
Burst size: 8 words = 32 bytes
Memory clock period: 5 ns (clock frequency 200 MHz)
Latency (from application of row address until first word available): 4 clock cycles
Single-word latency time: 4 * 5 ns = 20 ns
Complete burst time: (4 + 7) * 5 ns = 55 ns (+7, not +8, as the first word already appeared at 20 ns)
Memory bandwidth: 32 / (55 * 10^-9) = 582 Mbytes/sec
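The same burst calculation can be reproduced in Python. This is a minimal sketch; the function name
sdram_burst and its parameter names are assumptions made for this illustration.

```python
def sdram_burst(word_bytes, burst_words, clock_period_ns, latency_cycles):
    """Return (first-word time in ns, full burst time in ns, bandwidth in bytes/s)."""
    first_word_ns = latency_cycles * clock_period_ns
    # After the first word, each remaining word costs one more clock cycle.
    burst_ns = (latency_cycles + burst_words - 1) * clock_period_ns
    bandwidth = (word_bytes * burst_words) / (burst_ns * 1e-9)
    return first_word_ns, burst_ns, bandwidth

# Values of the example: 4-byte words, 8-word burst, 5 ns clock period, latency 4 cycles
first_ns, burst_ns, bandwidth = sdram_burst(4, 8, 5, 4)
print(first_ns, burst_ns, round(bandwidth / 1e6))   # 20 55 582
```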
RAM Modules:
1- Single Inline Memory Module (SIMM): a memory module with 30 or 72 pins. SIMMs can be
found in older machines. 72-pin SIMMs support 32-bit transfers and 30-pin SIMMs support 8-bit
transfers (4B or 1B).
2- Dual Inline Memory Module (DIMM): a memory module with 168 pins. DIMMs are commonly
used today and support 64-bit transfers (8B).
3- Rambus Inline Memory Module (RIMM): a 184-pin, advanced high-speed memory module that uses only
Rambus DRAM (RDRAM).
The transfer rate here is the number of bits a RAM module can read or write simultaneously on the bus.
RAM Block
Seen in CO.
[Figure: RAM block with a address lines, d data lines, CS and R/W inputs; capacity 2^a * d]
a: number of address lines
d: number of data lines; d is a power of two, the smallest being 8 bits = 1 Byte (B)
Capacity of the block: 2^a * d
2 Bytes = 16 bits = 1 Word (W)
4 Bytes = 32 bits = 1 Long (L)
8 Bytes = 64 bits = 1 Double Long (D)
On some computers, we say the word is 16 bits, 32 bits or 64 bits.
2^10 B = 1024 Bytes = 1 KB, 2^20 B = 2^10 KB = 1 MB
2^30 B = 2^20 KB = 2^10 MB = 1 GB
2^40 B = 2^30 KB = 2^20 MB = 2^10 GB = 1 TB
Generally Chip Select (CS) is active low.
Array of RAM Blocks
Seen in CO.
Expansion by data: new d / old d
Expansion by address: A = (new 2^a / old 2^a)
Decoder needed: a log2(A), A decoder. If A is 2, we just need an inverter.
Example: We have 128K * 8 blocks, and we want 1M * 16.
Old a = log2(128K) = log2(2^17) = 17 lines; new a = log2(1M) = log2(2^20) = 20 lines
Expansion by data: new d / old d = 16 / 8 = 2 blocks
Expansion by address: A = (new 2^a / old 2^a) = (1M / 128K) = (2^20 / 2^17) = 2^3 = 8 blocks
So 2 * 8 = 16 blocks are needed.
Decoder needed: log2(8) = 3, so a 3,8 decoder
[Figure: of the 20 address lines, the lower 17 go to all blocks and the upper 3 drive a 3,8 decoder with
enable E; the decoder outputs go to the CS inputs, 2 by 2; R/W goes to all memory blocks; the data path is
16 bits wide]
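The expansion rules above can be sketched in Python as follows. The function expansion_plan and the keys of
the dictionary it returns are hypothetical names used only for this illustration; sizes are assumed to be
powers of two, as in the course.

```python
K = 1024

def expansion_plan(old_words, old_bits, new_words, new_bits):
    """Blocks and decoder needed to build a new_words * new_bits memory from old blocks."""
    by_data = new_bits // old_bits                  # blocks side by side (wider data)
    by_address = new_words // old_words             # rows of blocks (more addresses)
    select_lines = by_address.bit_length() - 1      # log2(A) for a power of two
    return {
        "old_address_lines": old_words.bit_length() - 1,
        "new_address_lines": new_words.bit_length() - 1,
        "blocks": by_data * by_address,
        "decoder": (select_lines, by_address),      # (inputs, outputs)
    }

# Example: build 1M * 16 from 128K * 8 blocks
print(expansion_plan(128 * K, 8, 1024 * K, 16))
```

For the example, this gives 17 and 20 address lines, 16 blocks and a (3, 8) decoder.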
With a larger memory expansion, the decoder becomes large, and a larger decoder has many gates in it.
A good technique called coincident selection splits the decoder in two: instead of one N,2^N decoder we use
two N/2,2^(N/2) decoders, a row decoder and a column decoder. To be selected, a memory block must have
both its row and its column decoder outputs active. For each pair of one row output and one column output,
we add a gate whose output is connected to the chip select of the corresponding RAM block. If all signals are
active low, we use OR gates.
Note that each RAM memory block is itself designed internally with coincident selection decoders to save
space and power dissipation.
Theoretical Example: Construct a 1GB * 8 memory from 2MB * 8 blocks
1GB / 2MB = 2^30 / 2^21 = 2^9
So a 9,2^9 decoder is needed; this decoder contains at least 2^9 = 512 AND gates with 9 inputs each.
Using one 4,16 and one 5,32 decoder instead is much more efficient: the decoders have just 16 + 32 = 48 AND
gates, each with fewer inputs (plus one two-input gate per block to combine a row and a column output).
Imagine how much you will save for larger expansions.
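The gate counts of the theoretical example can be reproduced with a short Python sketch. It assumes, as the
text does, one AND gate per decoder output; the function names are chosen here for illustration only.

```python
def decoder_and_gates(n_inputs):
    """An n,2^n decoder has one n-input AND gate per output."""
    return 2 ** n_inputs

def coincident_selection(n_inputs, row_inputs):
    """Compare one big decoder with a row decoder plus a column decoder."""
    column_inputs = n_inputs - row_inputs
    single = decoder_and_gates(n_inputs)
    split = decoder_and_gates(row_inputs) + decoder_and_gates(column_inputs)
    combining = 2 ** n_inputs      # one two-input gate per selectable block
    return single, split, combining

print(coincident_selection(9, 4))   # (512, 48, 512)
print(coincident_selection(4, 2))   # (16, 8, 16)
```

The third number counts the two-input combining gates, which are much simpler than the 9-input AND gates
of the single decoder.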
Practical Example: Construct a 256K * 8 memory from 16K * 8 blocks
256K / 16K = 16, so a 4,16 decoder is needed. 256K has an 18-bit address, 16K has a 14-bit address.
Splitting it into two 2,4 decoders, we will have 16 connections with 16 two-input OR gates.
[Figure: of the 18 address lines, the lower 14 go to all blocks; the upper 4 feed two 2,4 decoders (first and
second) whose outputs 0 to 3 are combined to drive the CS inputs. The accompanying table lists the 4-bit
select codes 0000, 0001, 0010, 0011, 0100, 0101, ..., 1110, 1111.]
Exercise: Complete the following Drawing.

ROM (Seen in CO.)
Read-only memory (ROM) is an integrated circuit programmed with specific data when it is manufactured.
ROM chips are used not only in computers, but in most other electronic items as well.
There are five basic ROM types:

• ROM
• PROM
• EPROM
• EEPROM
• Flash memory
Each type has unique characteristics, but they are all types of memory with two things in common:
• Data stored in these chips is nonvolatile: it is not lost when power is removed.
• Data stored in these chips is either unchangeable or requires a special operation to change (unlike
RAM, which can be changed as easily as it is read).
This means that removing the power source from the chip will not cause it to lose any data.
[Figure: example 4*8 ROM built around a 2,4 decoder]
ROM Properties:
The basic ROM structure contains a decoder and an array of gates that collect the outputs.
The ROM may be active high (AH) or active low (AL): an AH decoder followed by an array of OR gates, or an
AL decoder followed by an array of NAND gates.
The ROM has a size equal to 2^(number of inputs) * (number of outputs). In the above example, we have a
4*8 ROM.
The ROM has a truth table which indicates the 0 or 1 bit value for each row and column.
ROM Applications:
There are two major ROM applications:
• In computers (forming the BIOS, Basic Input Output System, whose primary role is booting) or in
computer-based applications (modern cameras, modern cars, planes…)
• Generating mathematical functions (seen in CO: squarer, comparator…)
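As an illustration of the second application, a ROM can be modeled as a lookup table with one stored word
per address. The sketch below builds the truth table of a 3-input squarer; the helper build_rom and the
choice of 6 output bits are assumptions made for this example, not a circuit from the course.

```python
def build_rom(n_inputs, n_outputs, function):
    """Truth table of a 2^n_inputs * n_outputs ROM: one stored word per address."""
    mask = (1 << n_outputs) - 1                     # keep only n_outputs bits per word
    return [function(address) & mask for address in range(2 ** n_inputs)]

# 8*6 ROM that squares a 3-bit number (7 * 7 = 49 fits in 6 bits)
squarer = build_rom(3, 6, lambda x: x * x)
for address, word in enumerate(squarer):
    print(f"{address:03b} -> {word:06b}")
```

Reading the ROM is then just indexing: squarer[5] returns 25.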
Standard ROM
ROM chips are fundamentally different from RAM chips. While RAM uses transistors (or FFs) to turn on or off
access to a capacitor to set or reset a bit value, ROM uses a diode to connect the lines if the value is 1. If the
value is 0, then the lines are not connected at all.
A diode normally allows current to flow in only one direction and has a certain threshold, known as the
forward breakover, that determines how much voltage is required before the diode will conduct; it is
approximately 0.6 volts. If the diode is present, the bit is a 1; otherwise it is a 0.
As you can see, the way a ROM chip works necessitates the programming of perfect and complete data when
the chip is created. You cannot reprogram or rewrite a standard ROM chip. If it is incorrect, or the data needs
to be updated, you have to throw it away and start over. Creating the original template for a ROM chip is
often a laborious process full of trial and error. But the benefits of ROM chips outweigh the drawbacks. Once
the template is completed, the actual chips can cost as little as a few cents each. They use very little power,
are extremely reliable and, in the case of most small electronic devices, contain all the necessary programming
to control the device.
PROM
Creating ROM chips totally from scratch is time-consuming and very expensive in small quantities. For this
reason, mainly, developers created a type of ROM known as programmable read-only memory (PROM). Blank
PROM chips can be bought inexpensively and coded by anyone with a special tool called a programmer.
PROM chips (see above) have a grid of columns and rows just as ordinary ROMs do. The difference is that
every intersection of a column and row in a PROM chip has a fuse connecting them. A charge sent through a
column will pass through the fuse in a cell to a grounded row indicating a value of 1. Since all the cells have a
fuse, the initial (blank) state of a PROM chip is all 1s. To change the value of a cell to 0, you use a programmer
to send a specific amount of current to the cell. The higher voltage breaks the connection between the column
and row by burning out the fuse. This process is known as burning the PROM.
PROMs can only be programmed once. They are more fragile than ROMs. Some static electricity can easily
cause fuses in the PROM to burn out, changing essential bits from 1 to 0. But blank PROMs are inexpensive
and are great for prototyping the data for a ROM before committing to the costly ROM fabrication process.
EPROM
Working with ROMs and PROMs can be a wasteful business. Even though they are inexpensive per chip, the
cost can add up over time. Erasable programmable read-only memory (EPROM) addresses this issue. EPROM
chips can be rewritten many times. Erasing an EPROM requires a special tool that emits a certain frequency
of ultraviolet (UV) light. EPROMs are configured using an EPROM programmer that provides voltage at
specified levels depending on the type of EPROM used. To reprogram an EPROM, I have to remove it from
where it is, place it on the EPROM programmer, erase it all, and then rewrite it.
EEPROM
Electrically erasable programmable read-only memory (EEPROM) is based on a similar semiconductor
structure to EPROM, but allows its entire contents (or selected banks, an advantage over EPROM) to be
electrically erased and then rewritten electrically, so that the chip need not be removed from the computer
(or camera, MP3 player, etc.).
Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be modified one bit at a time.
Writing is a very slow process and again needs higher voltage (usually around 12 V) than is used for read
access. EAROMs are intended for applications that require infrequent and only partial rewriting.
Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory can be erased
and rewritten faster than ordinary EEPROM, and newer designs feature very high speed. Modern flash makes
efficient use of silicon chip area, resulting in individual ICs with a capacity as high as 32 GB as of 2007; this
feature, along with its endurance and physical durability, has allowed modern flash to be used frequently for
backups. Flash memory is sometimes called flash ROM or flash EEPROM, and it is the memory used inside
USB flash drives.
USB Flash Drive

A USB flash drive is a data storage device that includes flash memory with an integrated Universal Serial
Bus (USB) interface. USB flash drives are typically removable and rewritable, and physically very small. Most
weigh less than 30 grams. As of January 2013, drives of up to 512 GB were available. A one-terabyte (TB)
drive was unveiled at the 2013 Consumer Electronics Show and became available later that year. Storage
capacities as large as 2 TB are planned, with steady improvements in size and price per capacity expected.
CD ROM: Compact Disc ROM

The CD-ROM is a compact disc that is read-only: the data is written to the disc once, when it is made, and
afterwards it can only be read. It was used for decades to supply ready-made software such as operating
systems, applications and games. The audio CD is used for music only and has a different format. Some CD
drives read both formats. The CD-ROM was later extended to the CD-RW, a disc that a suitable drive can write
files to and delete files from. To some extent the CD-ROM belongs to the ROM family, even though some may
argue it is not a ROM because of its optical structure (light, mirror). It is now becoming obsolete.
DVD
Digital Versatile Disc or Digital Video Disc (DVD) is a type of optical disc technology similar to the CD-ROM.
A DVD holds a minimum of 4.7 GB of data, enough for a full-length movie. DVDs are commonly used as a
medium for digital representation of movies and other multimedia presentations that combine sound with
graphics.
The DVD specification supports disks with capacities of from 4.7GB to 17GB and access rates of 600KBps to
1.3 MBps. One of the best features of DVD drives is that they are backward-compatible with CD-ROMs,
meaning they can play old CD-ROMs.
Computer Bus Architecture
The bus connects all computer parts together. It is made of three sets of parallel lines: the address bus,
the data bus, and the control bus.
The data bus consists of 8, 16, 32 or 64 parallel signal lines. The data bus lines are bidirectional. Many
devices in a system will have their outputs connected to the data bus, but only one device at a time will have
its outputs enabled. Any device connected to the data bus must have three-state outputs (1, 0, or high
impedance, i.e. effectively not connected) so that its outputs can be disabled when it is not being used to put
data on the bus.
The address bus consists of 16, 20, 24 or 32 parallel signal lines. On these lines the CPU sends out the address
of the memory location that is to be written to or read from. The number of memory locations that the CPU
can address is determined by the number of address lines. If the CPU has N address lines, then it can directly
address 2^N memory locations. When the CPU reads data from or writes data to a port, it sends the port
address out on the address bus.
The control bus consists of many parallel signal lines depending on the computer complexity. The CPU sends
out signals on the control bus to enable the outputs of addressed memory devices or port devices. Typical
control bus signals are Memory Read, Memory Write, Keyboard Read, and Printer Write. To read a byte of
data from a memory location, the CPU sends out the memory address of the desired byte on the address bus
and then sends out a Memory Read signal on the control bus. The Memory Read signal enables the addressed
memory device to output a data word onto the data bus. The data word from memory travels along the data
bus to the CPU.
Today all computers use two types of buses, an internal (or local) bus and an external bus. An internal bus
enables communication between internal components such as a computer video card and memory (e.g. ISA,
EISA, PCI, AGP, etc.), and an external bus communicates with external components (e.g. SCSI bus, USB, etc.).
A computer's or device's bus speed or throughput is measured in bits per second or megabytes per
second.
The bus is not only a cable connection but also hardware (bus architecture), protocol, software, and a bus
controller, including decoders and MUXs. In general, a bus system multiplexes k registers of n bits each to
produce an n-line common bus. The number of MUXs needed is n, equal to the number of bits in each
register. Each bit of a register passes through a MUX before going to the bus. The size of each MUX is k,1.
Alternatively, a decoder with 3-state buffers can deliver data to the bus. An n,2^n decoder selects one out of
2^n data bits to pass to a single bus line. Each buffer has an enable input: if it is enabled, its data passes;
otherwise its output is disconnected.
Example of BUS Multiplexing:

If we have 8 registers of 16 bits each, then k = 8 and n = 16. The bus is 16 bits wide and uses 16 MUXs of size
8,1 each. Three selection bits, applied to all MUXs in parallel, select one register out of 8. Total MUX size:
128,16 with 3 selection lines.
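The multiplexed bus of this example can be simulated bit by bit in Python. This is a sketch under the
assumptions of the example (k = 8 registers, n = 16 bits); the function bus_mux and the register contents
are made up for the illustration.

```python
def bus_mux(registers, select, n_bits):
    """Common bus built from n_bits multiplexers of size k,1 (k = len(registers))."""
    bus = 0
    for bit in range(n_bits):                                  # one MUX per bus line
        mux_inputs = [(reg >> bit) & 1 for reg in registers]   # this bit of every register
        bus |= mux_inputs[select] << bit                       # shared select lines pick one
    return bus

registers = [0x1111 * i for i in range(8)]        # 8 registers of 16 bits
select_lines = (len(registers) - 1).bit_length()  # 3 selection lines for 8 registers
print(select_lines, hex(bus_mux(registers, 5, 16)))   # 3 0x5555
```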
Example of a Decoder with 3-state buffers:
[Figure: data bits A0, B0, C0 and D0 each pass through a 3-state buffer onto the bus line for bit 0; a 2,4
decoder with Select and Enable inputs drives the buffer enables]
The above decoder ensures that only one data bit (1 out of 4) passes to the bus line. If the decoder is disabled,
no data passes, and another circuit connected to this bus line may then place its data on it. The Enable bit of
this decoder is one example of a control bus line.
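The behavior of this bus line can be sketched in Python, with None standing for the high-impedance
(disconnected) state. The function name bus_line and the use of None are assumptions of this illustration.

```python
def bus_line(data_bits, select, enable):
    """One bus line driven through 3-state buffers controlled by a decoder."""
    if not enable:
        return None                 # decoder disabled: every buffer is in high impedance
    # The decoder activates exactly one output, so exactly one buffer drives the line.
    decoder_outputs = [index == select for index in range(len(data_bits))]
    driven = [bit for bit, active in zip(data_bits, decoder_outputs) if active]
    return driven[0]

a0, b0, c0, d0 = 1, 0, 1, 1
print(bus_line([a0, b0, c0, d0], select=1, enable=True))    # 0 (B0 is on the line)
print(bus_line([a0, b0, c0, d0], select=1, enable=False))   # None (line released)
```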
Internal BUSES
ISA BUS
Introduced by IBM, ISA (Industry Standard Architecture) was originally an 8-bit bus that was expanded to a
16-bit bus in 1984. It was a widely used standard for two decades and served as both an internal and an
external bus.
In 1993, Intel and Microsoft introduced a PnP (Plug and Play) ISA bus that allowed the computer to
automatically detect and set up ISA peripherals such as a modem or sound card.
Using PnP technology, an end user can connect a device without having to configure it using jumpers (which
open or short-circuit two pins) or DIP switches (two settings each).
Many manufacturers are trying to eliminate ISA slots; however, for backwards compatibility you may still
find 1 or 2 ISA slots alongside newer technology slots.
An ISA Computer main board contained 8 slots with two usually used for
1. VGA display controller adapter
2. Multi I/O Port adapter with
- 2 serial ports (mouse…)
- a parallel port (printer)
- floppy disk drive interface (1.44 MB)
- IDE (Integrated Drive Electronics) interface for hard disk drives, and CDROM
- a Game Port (joystick)
The other slots were initially empty; the user may add any additional card like a network card.
MCA BUS & EISA BUS

Short for Micro Channel Architecture, MCA was introduced by IBM in 1987. The MCA bus offered several
additional features over ISA, such as a 32-bit bus. It never became widely used.
Short for Extended Industry Standard Architecture, EISA was announced in September 1988. EISA is a
computer bus designed by nine of IBM's competitors to compete with the MCA bus. It provided 32-bit slots.
It never became widely used.
PCI BUS
Introduced by Intel in 1992, PCI (Peripheral Component Interconnect) is a 32-bit computer bus that is also
available as a 64-bit bus today. The PCI bus is the most commonly found bus in computers today. Mini PCI is
a new standard developed by leading notebook manufacturers. This technology could allow manufacturers to
lower their prices, as the motherboards would be simpler to design. PCI-X is a high-performance bus
designed to meet the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet and
Ultra3 SCSI.
PCI-X capabilities include:
1. Up to 133 MHz bus speed
2. 64-Bit bandwidth with 1GB/sec throughput (compare with SDRAM bandwidth before)
Furthermore, the PCI Bus has taken over as the dominant high-performance local bus due to its openness,
performance, and support within the industry.
External Buses
AGP Bus
Introduced by Intel in 1997, AGP (Accelerated Graphics Port) is a 32-bit bus designed for the high demands of
3-D graphics. The AGP bus began to appear with Pentium II machines. It is used alongside PCI.
PCMCIA Bus
The Personal Computer Memory Card International Association (PCMCIA) was founded to provide a standard
bus for laptop computers, so this bus is mainly used in small computers.
SCSI Bus
Small Computer System Interface (SCSI) is a parallel interface standard used by Apple Macintosh computers,
PCs and Unix systems for attaching peripheral devices to a computer.
Universal Serial Bus (USB)

Universal Serial Bus is a new external bus developed by Intel, Compaq, DEC, IBM, Microsoft, NEC and Northern
Telecom and released to the public in 1996 with the Intel 430HX Triton II Mother Board.
This is an external bus standard that supports data transfer rates of 12 Mbps. A single USB port can be used to
connect up to 127 peripheral devices, such as mice, modems, and keyboards. The USB also supports hot
plugging/insertion (ability to connect a device without turning the PC off) and plug and play (pnp: You connect
a device and start using it without configuration).
We have two versions of USB:
USB 1X
First released in 1996, the original USB 1.0 standard offered data rates of 1.5 Mbps. The USB 1.1 standard
followed with two data rates: 12 Mbps for devices such as disk drives that need high-speed throughput and
1.5 Mbps for devices such as joysticks that need much less bandwidth.
USB 2X
In 2002 a newer specification USB 2.0, also called Hi-Speed USB 2.0, was introduced. It increased the data
transfer rate for PC to USB device to 480 Mbps, which is 40 times faster than the USB 1.1 specification. With
the increased bandwidth, high throughput peripherals such as digital cameras, CD burners and video
equipment could now be connected with USB. USB 3X is under development.
ATA BUS
ATA (Advanced Technology Attachment; also SATA, S for Serial, and PATA, P for Parallel) has also been a
widely used standard to connect external devices. It is constantly being revised. It was an improvement over
the standard ISA and is widely used nowadays. SATA and USB can coexist. (Research this topic.)
Other Buses, external connectors

HDMI (High-Definition Multimedia Interface) is a compact audio/video interface for transferring data from an
HDMI-compliant source device, such as a display controller, to a compatible computer monitor, video
projector, digital television, or digital audio device. HDMI is a digital replacement for existing analog
video standards. It is available on most modern computers and laptops, although it is not truly a bus.
Registers
Seen in CO.
Registers are storage circuits generally composed of D FFs; they are part of the CPU. They are used to store
information for different operations and computations. When talking about a CPU we specify its registers:
their names, number, jobs, number of bits… More on this later.
Control Unit
The Control Unit is the circuitry that controls the flow of data through the processor, and coordinates the
activities of the other units within it. In a way, it is the "brain within the brain", as it controls what happens
inside the processor, which in turn controls the rest of the computer. The Control Unit receives external
instructions or commands which it converts into a sequence of control signals that it applies to the bus to
implement a sequence of register-transfer level operations. It supervises the transfer of information among
the registers and Memory, and instructs the ALU which operation to perform.
The Control Unit (CU) is generally a sizable collection of complex digital circuitry interconnecting and
controlling the many execution units contained within a CPU. The CU accepts an instruction stored in memory
and decodes it into individual sequential steps (fetching addresses/data from registers/memory, managing
execution [i.e. data sent to the ALU or I/O], and storing the resulting data back into registers/memory). It
controls and coordinates the CPU's inner workings. These detailed steps dictate which of the CPU's numerous
interconnecting hardware control signals to enable or disable, which CPU units are selected or deselected,
and the proper order of execution required by the instruction's operation. The CU makes use of the control
bus for this. It manages the translation of instructions (but not the data-containing portion) into several
micro-instructions at the machine level. There exist two different CU structures: hardwired and
microprogrammed.
Hardwired control units are implemented with sequential logic, featuring a finite number of gates that
generate specific results based on the instructions that are used. Hardwired control units are generally
faster than microprogrammed ones, but have become less popular as computers have evolved. (One has been
seen in CO.) Microprogrammed control units use microprograms, which are sequences of micro-instructions
(control words) stored in a special control memory.
Hardwired Control Unit: (Seen in CO Course)
[Figure: a hardwired CU composed of gates, decoders and MUXs that generate the control lines; on the right,
the generated input controls for AR]
Microprogrammed Control Unit:
We will study it through an example. Suppose a control word of 14 bits. We want to study the micro-operations
(mics) performed by the ALSU. A maximum of 3 registers may be involved in an operation.
Explanation of the control word fields, from left to right:
3 bits: Sel A | 3 bits: Sel B | 3 bits: Sel D | 5 bits: Operation ALSU
Sel A: first operand register; Sel B: second operand register; Sel D: result register
Each is 000 when not applicable, otherwise the register number from 1 to 7.
Operation ALSU: the code, from the operation table, of the operation happening in the ALSU

The following operations are handled by the ALSU:
Operation Bits   Explanation        Symbol
00000            Transfer A         TSF A
00001            Increment A        INC A
00010            Add: A + B         ADD
00101            Subtract: A - B    SUB
00110            Decrement A        DEC A
01000            AND: A . B         AND
01010            OR: A ∨ B          OR
01100            XOR: A ⊕ B         XOR
01110            NOT: A'            COM A
10000            Shift Right A      SHR A
11000            Shift Left A       SHL A
More operations can be added: up to 32 different ALSU mics for this structure. Imagine a larger structure.
Example of Micro operations:
R1 ← R2 - R3 is: 010 011 001 00101
R7 ← R1 is: 001 000 111 00000 (000: no second operand register; 00000 for TSF)
R4 ← SHL R4 is: 100 000 100 11000 (D and A are the same, no B)
R3 ← R3 ∨ R5 is: 011 101 011 01010 (D and A are the same; B is R5)
R6 ← 0 is: 110 110 110 01100 (XORing a register with itself gives 0; this is one way to clear a register)
R2 ← R2' is: 010 000 010 01110
R1 ← R5 - 1 is: 101 000 001 00110 (decrement a register and store the result in another one)
It is apparent from these examples that many other mics can be generated in the CPU. The most efficient way
to generate control words like the above ALSU mics is to store them in a memory unit referred to as a control
memory. By reading consecutive control words from memory, it is possible to initiate the desired sequence of
mics for the CPU.
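The encoding of these control words can be sketched in Python. The field order (Sel A, Sel B, Sel D,
operation) and the operation codes are taken from the example above; the names ALSU_OPS, control_word and
show are hypothetical helpers written for this illustration.

```python
ALSU_OPS = {
    "TSF": 0b00000, "INC": 0b00001, "ADD": 0b00010, "SUB": 0b00101,
    "DEC": 0b00110, "AND": 0b01000, "OR": 0b01010, "XOR": 0b01100,
    "COM": 0b01110, "SHR": 0b10000, "SHL": 0b11000,
}

def control_word(op, dest, a, b=0):
    """Pack Sel A, Sel B, Sel D (3 bits each) and the 5-bit operation into 14 bits."""
    for register in (dest, a, b):
        if not 0 <= register <= 7:
            raise ValueError("register numbers go from 1 to 7 (0 = not applicable)")
    return (a << 11) | (b << 8) | (dest << 5) | ALSU_OPS[op]

def show(word):
    """Format a control word as 'AAA BBB DDD OOOOO'."""
    bits = f"{word:014b}"
    return " ".join([bits[0:3], bits[3:6], bits[6:9], bits[9:14]])

print(show(control_word("SUB", dest=1, a=2, b=3)))   # R1 <- R2 - R3: 010 011 001 00101
print(show(control_word("DEC", dest=1, a=5)))        # R1 <- R5 - 1:  101 000 001 00110
```

A control memory would then simply be a list of such words read one after the other.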
The Stack
A useful feature included in most CPUs is a stack, or last in, first out (LIFO) list. It is a storage device in
which the last item inserted (pushed) is the first one removed or retrieved (popped). Think of a stack of
plates. In computers, it is a reserved area of memory together with a special register, the stack pointer (SP),
which holds the address of the last item pushed: the top of the stack. The SP is decremented on a push and
incremented on a pop. The stack is initially empty (empty flag = 1, no popping is possible), and it has a
maximum size to avoid overrunning the rest of memory. If the maximum size is reached (full flag = 1), no
more pushing is possible. A major application of stacks in computers is saving register values upon a
subroutine call.
Example of a Memory Unit with a reserved stack
4 hex digits memory: compute the total memory and the memory blocks assigned for each part.
We don't care about the number of bits per location.
4 hex digits * 4 bits = 16 bits address, so 2^16 = 65536 = 64K locations.

Addresses      Content                  Pointer   Size
0000 - 2FFF    Not assigned (part 1)              3000 hex = 12288 locations (12K)
3000 - 4FFF    Program (Instructions)   PC        2000 hex = 8192 locations (8K)
5000 - 7FFF    Data (Operands)          AR        3000 hex = 12288 locations (12K)
8000 - CFFF    Not assigned (part 2)              (D – 8 = 5) 5000 hex = 20480 locations (20K)
D000 - FFFF    Stack                    SP        (F + 1 – D = 3) 3000 hex = 12288 locations (12K)

The PC points to the instruction to fetch, decode, then run.
The AR points to the operand we have to get from memory.
The SP points to the top of the stack.
The first element of the stack is at address FFFF, the last element of the stack at D000.
When popping, we have the following mics happening:
If (empty = 1) return (no popping, stack empty)
DR ← M[SP] (DR, any Data Register, or AC)
SP ← SP + 1
If (SP = 0000) empty ← 1 (because the first element was at FFFF: FFFF + 1 = 0000)
Else full ← 0
When pushing, we have the following mics happening:
If (full = 1) return (no pushing, stack full)
SP ← SP – 1
M[SP] ← DR
If (SP = D000) full ← 1 (no more room, D000 was the last location)
Else empty ← 0
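The push and pop mics above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name MemoryStack is hypothetical, memory is modelled as a dictionary, and the SP is kept on 16 bits so that FFFF + 1 wraps to 0000 as in the example.

```python
class MemoryStack:
    """Stack growing downward from FFFF to D000, as in the memory map example."""

    LIMIT = 0xD000  # last location reserved for the stack

    def __init__(self):
        self.mem = {}      # hypothetical memory model: address -> value
        self.sp = 0x0000   # FFFF + 1 on 16 bits: the stack is empty
        self.empty = 1
        self.full = 0

    def push(self, dr):
        if self.full:
            return False                   # no pushing, stack full
        self.sp = (self.sp - 1) & 0xFFFF   # SP <- SP - 1
        self.mem[self.sp] = dr             # M[SP] <- DR
        if self.sp == self.LIMIT:
            self.full = 1                  # D000 was the last location
        else:
            self.empty = 0
        return True

    def pop(self):
        if self.empty:
            return None                    # no popping, stack empty
        dr = self.mem[self.sp]             # DR <- M[SP]
        self.sp = (self.sp + 1) & 0xFFFF   # SP <- SP + 1
        if self.sp == 0x0000:
            self.empty = 1                 # the element at FFFF was the last one
        else:
            self.full = 0
        return dr


stack = MemoryStack()
stack.push(0x1234)
stack.push(0x5678)
print(hex(stack.sp))     # 0xfffe
print(hex(stack.pop()))  # 0x5678, last in, first out
```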
Another Stack Application: Evaluating Expressions: The Postfix Notation
1- Consider the expression F = A*B + C*D
Written normally, this is the infix notation.
Placing the operator before the operands gives the Prefix Notation.
Placing the operator after the operands gives the Postfix Notation. This is how a computer evaluates
mathematical expressions using its internal stack, taking into consideration the operators' precedence:
() first, then * and /, then + and -.
In PostFix, F can be rewritten: F = AB*CD*+
2- Consider F2 = (A+B) * (C* (D+E) + F)
PostFix: AB+CDE+*F+*
3- Consider F3 = A + B * (C*D + E* (F+G))
PostFix: ABCD*EFG+*+*+
4- Consider F4 = (A* (B +C* (D+E))) / (F*(G+H))
PostFix: ABCDE+*+*FGH+*/
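The conversion from infix to postfix can itself be sketched with a stack of operators. The Python function below follows the approach generally known as the shunting-yard algorithm, which the course does not name. The names PRECEDENCE and infix_to_postfix are illustrative assumptions, operands are limited to single letters or digits, and only + - * / and parentheses are handled.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def infix_to_postfix(expression):
    """Convert an infix expression with single-character operands to postfix."""
    output = []      # postfix result, built from left to right
    operators = []   # stack of pending operators and open parentheses
    for ch in expression.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)                 # operands go straight to the output
        elif ch == "(":
            operators.append(ch)
        elif ch == ")":
            while operators[-1] != "(":       # unstack until the matching "("
                output.append(operators.pop())
            operators.pop()                   # discard the "("
        else:
            # Pending operators of higher or equal precedence are applied first.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[ch]):
                output.append(operators.pop())
            operators.append(ch)
    while operators:
        output.append(operators.pop())
    return "".join(output)


print(infix_to_postfix("A*B + C*D"))                         # AB*CD*+
print(infix_to_postfix("(A* (B +C* (D+E))) / (F*(G+H))"))    # ABCDE+*+*FGH+*/
```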
Stack Idea:
When converting from Infix to PostFix, you write each operand as it appears, then you add the operator once
its two values are available, either two operands directly or a current result with a previous one.
To evaluate a PostFix expression with a stack:
We have an operand, we push it.
We have an operator, we pop the last 2 values, we execute the operation using this operator, and then we
push the result.
We repeat this procedure until we finish. The stack will just contain the final result at the end (to pop).
Example: (3*11) + (15*6) = 3 11 * 15 6 * +
Value being processed   Stack content (bottom to top)
3                       3
11                      3 11
*                       33
15                      33 15
6                       33 15 6
*                       33 90
+                       123
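The evaluation procedure can be sketched in a few lines of Python. The function name eval_postfix and the space-separated token format are assumptions made for this illustration.

```python
import operator

# Mapping from operator symbol to the function that applies it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(expression):
    """Evaluate a postfix expression whose tokens are separated by spaces."""
    stack = []
    for token in expression.split():
        if token in OPERATIONS:
            right = stack.pop()    # the last value pushed is the right operand
            left = stack.pop()
            stack.append(OPERATIONS[token](left, right))
        else:
            stack.append(int(token))   # an operand: push it
    return stack.pop()                 # only the final result is left


print(eval_postfix("3 11 * 15 6 * +"))   # 123
```

The order of the two pops matters for – and /: the value popped first is the right operand.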
Reverse Work: Converting from PostFix to Infix:
5- Consider F5 = A B C D E + * – /
D+E then C*(D+E) then B – C*(D+E) then A / (B – C*(D+E))
6- Consider F6 = A B C * / D – E F / +
B*C then A / (B*C) then A / (B*C) – D then E / F then A / (B*C) – D + E / F
Instruction Formats:
A computer will usually have a variety of instruction code formats, which are depicted in a rectangular box
symbolizing the bits of the instruction. These bits are divided into groups called fields.
In CO, we have seen an instruction format:
16 bits: 1 indirect bit, 3 bits opcode, 12 bits address or type.
All operations were carried out in the AC. For example ADD X: AC ← AC + M[X]
This format is known as a Single Accumulator Organization Instruction Format (Small Computer).
In a more general Register Organization, all registers have the same importance, meaning much more hardware
must be added.
ADD R1, R2, R3 (R1 ← R2 + R3)
ADD R1, R2 (R1 ← R1 + R2)
ADD R1, X (R1 ← R1 + M[X])
ADD X, R1, R2 (M[X] ← R1 + R2)
If a CPU implements stack operations (as in most cases; it is then called a stack organized CPU), we have two
general instructions, PUSH X and POP.
PUSH X will push the memory word at address X onto the stack.
POP will retrieve the top value of the stack and store it in an assigned register.
The SP will be updated automatically.
Saying ADD in a stack organized CPU means: pop 2 values from the stack, add them, then push the answer back.
So we generally have three CPU organizations, each with its instruction format: single accumulator, general
register, and stack. Most CPUs combine the three features, like the Intel 8080, the father of the Pentium,
which has 7 CPU registers, one of them being the AC; it also has a stack pointer register and an assigned
memory stack.
When writing Assembly instructions, we must take into account the number of registers or addresses an
instruction can have. Sometimes one or two or all of them can be available.
Evaluation Example:
Compute X ← (A+B) * (C+D) using 0, 1, 2, 3 address instructions.
A, B, C, D are memory addresses, meaning A stands for M[A]: we want to add M[A] and M[B], then multiply
by M[C] + M[D], and store the result in M[X]. The point is that we work with the content of an address, not
with the address itself.
Use the following Assembly Symbols: ADD, SUB, MUL, DIV, MOV, LOAD, and STORE
3 Address Instructions:
ADD R1, A, B      R1 ← M[A] + M[B]
ADD R2, C, D      R2 ← M[C] + M[D]
MUL X, R1, R2     M[X] ← R1 * R2
The advantage is short programs.
The disadvantage is that the instructions need many bits to specify 3 addresses → complex hardware
2 Address Instructions:
MOV R1, A         R1 ← M[A]
ADD R1, B         R1 ← R1 + M[B] (we cannot add 2 memory locations directly into a register)
MOV R2, C         R2 ← M[C]
ADD R2, D         R2 ← R2 + M[D]
MUL R1, R2        R1 ← R1 * R2
MOV X, R1         M[X] ← R1
This type of instruction is the most common in commercial computers.
One Address Instructions:
It uses an implied AC for all data manipulation (Assembly like that of CO). T is a temporary memory location.
LOAD A            AC ← M[A]
ADD B             AC ← AC + M[B]
STORE T           M[T] ← AC
LOAD C            AC ← M[C]
ADD D             AC ← AC + M[D]
MUL T             AC ← AC * M[T]
STORE X           M[X] ← AC
For short computations, this one is the most efficient in terms of hardware and number of instructions. For
more complex computations, 2 Address Instructions are better.
Zero Address Instructions:
Here we are dealing with a stack. PUSH and POP have an address field to specify the operand that
communicates with the stack. All other instructions have no address field; they just use the stack.
PUSH A
PUSH B
ADD               M[A] and M[B] are added and the result is pushed automatically onto the stack
PUSH C
PUSH D
ADD               M[C] and M[D] are added and the result is pushed automatically onto the stack
MUL               We have two addition results in the stack; we multiply them, then we push the product
POP X             Get the final result into M[X]
Assembly Programming using a stack is not easy, but it is efficient.
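To see the zero address program at work, here is a small Python sketch of a stack machine. The function run_zero_address, the ALU table and the sample values stored in memory are assumptions made for the illustration; memory is modelled as a dictionary indexed by symbolic address.

```python
import operator

# Operations without an address field: they only use the stack.
ALU = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_zero_address(program, memory):
    """Run a list of zero address instructions and return the updated memory."""
    stack = []
    for instruction in program:
        parts = instruction.split()
        op = parts[0]
        if op == "PUSH":
            stack.append(memory[parts[1]])    # push M[address]
        elif op == "POP":
            memory[parts[1]] = stack.pop()    # M[address] <- top of stack
        else:
            right = stack.pop()               # pop 2 values,
            left = stack.pop()
            stack.append(ALU[op](left, right))  # operate, push the result
    return memory


program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
memory = {"A": 2, "B": 3, "C": 4, "D": 5}    # sample values, chosen arbitrarily
print(run_zero_address(program, memory)["X"])   # (2 + 3) * (4 + 5) = 45
```

Note that the program is exactly the postfix form AB+CD+* of the expression, followed by the final POP.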
Addressing Modes
The way the operands are chosen while executing an instruction depends on the addressing mode.
It specifies a rule for interpreting the address field of the instruction before the operand is actually referenced.
It gives versatility to the programmer by providing such facilities as pointers to memory, counters for loops,
array indexing… A good assembly programmer will choose between the different addressing modes to achieve the
best performance. Of course, most of the time a specific assembly programming task can be achieved in
different ways using different addressing modes.
In CO, we saw this Instruction Format:
I (1 bit) | Opcode (3 bits) | Address in MRI, or Kind in Register and I/O instructions (12 bits)
I: Indirect Bit
MRI: Memory Reference Instruction
We saw also in CO that an Instruction Cycle is divided into 3 major phases:
1. Fetch the Instruction from Memory
2. Decode the Instruction
3. Execute the Instruction
The PC holds the address of the instruction (step 1). The decoding done in step 2 determines the operation to
be performed, the addressing mode of the instruction and the location of the operands.
The addressing mode may be specified with a distinct binary code:
Opcode | Mode | Address or Register number or …
The opcode (operation code) specifies the operation to perform (ADD, MOVE…).
The mode tells how to locate the operands needed for the operation.
The address field gives an address if it is an MRI, or a register number, or the operand itself if immediate.
Mode Kinds:
Mode 0: zero address instruction, we are using a stack; the address field holds the address of the value being
popped or pushed. If the instruction is not a push or a pop, this address field is ignored.
Mode 1: we are using an AC, operating like what we saw in CO. This mode accepts direct and indirect
addressing, as well as the Register and I/O kinds, but always works through the AC.
Immediate Mode: the operand is specified in the instruction itself, and is located in the address field.
Register Mode: the operand is in a register that is within the CPU. The particular register is selected by the
address field of the instruction.
Register Indirect Mode: the instruction specifies a register whose content gives the address of the operand in
memory.
Autoincrement / Autodecrement Mode: like register indirect, but the register value is either incremented or
decremented directly after execution of the instruction, to access contiguous memory locations. This is useful
for tables of data: arrays.
Direct Access Mode (seen in CO): the address of the operand is equal to the address part of the instruction;
the difference with mode 1 is that the operand can go to any data register, not only the AC.
Indirect Access Mode (seen in CO): the address part of the instruction gives the memory location where the
address of the operand is stored; the difference with mode 1 is that the operand can go to any data register,
not only the AC.
Relative Address Mode: in this mode, the content of the PC is added to the address part of the instruction to
obtain the address of the operand. Useful for processing large amounts of data, also useful in subroutines.
Indexed Addressing Mode: we have an index register (XR) whose value is added to the address part of the
instruction to find the address of the operand.
Effective Address: after seeing all these modes, we define the effective address to be the final address of the
operand.
Example for all addressing modes: (Numbers in Hex)
Suppose PC = 200, R1 = 400, XR = 100, M[200] = Load AC + mode, M[201] = B00, M[202] = next instruction
Question: Find the effective address of the operand and what to load in the AC.
Immediate Mode: the AC gets what comes directly after the opcode: 201 is the effective address, B00 is the operand
Direct Mode: effective address B00, AC ← M[B00]
Indirect Mode: effective address M[B00], AC ← M[M[B00]]
Relative Address Mode: effective address is B00 + 202 = D02, AC ← M[D02]
(202 and not 201, because the PC is incremented two times to point to the next instruction)
Indexed Addressing Mode: effective address is B00 + XR = B00 + 100 = C00, AC ← M[C00]
Register Mode: AC ← R1, no effective address here, AC ← 400
Register Indirect Mode: effective address 400, AC ← M[400]
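The same example can be checked with a short Python sketch. The register values and M[201] come from the example; the other memory contents are invented so that the loads return something, and the function names effective_address and load_ac are assumptions.

```python
PC, R1, XR = 0x200, 0x400, 0x100
ADDRESS_FIELD = 0xB00   # content of M[201], the address part of the instruction
NEXT_PC = PC + 2        # the PC after fetching both words of the instruction

# Only M[201] is given in the example; every other value here is made up.
memory = {0x201: 0xB00, 0xB00: 0x900, 0x900: 0x111,
          0xD02: 0x222, 0xC00: 0x333, 0x400: 0x444}


def effective_address(mode):
    """Return the effective address for a mode, or None when there is none."""
    if mode == "immediate":
        return PC + 1                   # the operand sits in the instruction itself
    if mode == "direct":
        return ADDRESS_FIELD
    if mode == "indirect":
        return memory[ADDRESS_FIELD]
    if mode == "relative":
        return NEXT_PC + ADDRESS_FIELD
    if mode == "indexed":
        return XR + ADDRESS_FIELD
    if mode == "register":
        return None                     # the operand is in R1, not in memory
    if mode == "register_indirect":
        return R1
    raise ValueError(f"unknown mode: {mode}")


def load_ac(mode):
    """Return the value loaded in the AC for the given addressing mode."""
    ea = effective_address(mode)
    return R1 if ea is None else memory[ea]


print(hex(effective_address("relative")))   # 0xd02
print(hex(effective_address("indexed")))    # 0xc00
print(hex(load_ac("immediate")))            # 0xb00
```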
Number Representations:
1- Whole Numbers: Integers (Review from CO)
For an n bits register:
Unsigned goes from 0 to 2^n – 1
Signed goes from –2^(n–1) to 2^(n–1) – 1
Examples:
Unsigned short int, 16 bits: 0 to 2^16 – 1, that is 0 to 65535 (largest unsigned short)
Increase it by 1, you go back to 0
Unsigned int, 32 bits: 0 to 2^32 – 1, that is 0 to 4,294,967,295 (largest unsigned int)
In signed, we have a sign bit which is the most significant bit (MSB) in the register.
If it is 0, the stored number is positive. If it is 1, the stored number is negative and already in two's complement.
By default, a number is signed.
Examples:
Short int, 16 bits: –2^15 to 2^15 – 1, that is –32768 to 32767
The negative side has one more element because zero is counted with the positives.
Int (32 bits): from –2^31 to 2^31 – 1 (See Figure for examples and overflows)
Long (64 bits) (in C++ long long): from –2^63 to 2^63 – 1
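These ranges and the wrap-around behaviour can be reproduced with a few lines of Python. Python integers are not limited to n bits, so the register width is imitated with a mask; the function names are assumptions made for this sketch.

```python
def unsigned_range(n):
    """Smallest and largest values of an n bits unsigned register."""
    return 0, 2**n - 1


def signed_range(n):
    """Smallest and largest values of an n bits two's complement register."""
    return -(2**(n - 1)), 2**(n - 1) - 1


def as_signed(pattern, n):
    """Interpret an n bits pattern as a two's complement signed number."""
    pattern &= (1 << n) - 1          # keep only n bits
    if pattern >> (n - 1):           # sign bit (MSB) is 1: negative
        return pattern - (1 << n)
    return pattern


print(unsigned_range(16))        # (0, 65535)
print(signed_range(16))          # (-32768, 32767)
print((65535 + 1) & 0xFFFF)      # 0, the unsigned short wraps around
print(as_signed(32767 + 1, 16))  # -32768, signed overflow
```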
2- Floating Point Computations
Used for real numbers.
A floating point number has two parts: the mantissa (fraction) and the exponent. The radix is the base that is
raised to the exponent.
Number = Mantissa × Radix^Exponent, where the mantissa is always < 1
This is a form of scientific notation; all numbers must be converted to it first.
Both mantissa and exponent have a fixed number of bits. For example, on typical (IEEE 754) implementations,
float in C++ has 24 bits of precision for the mantissa and 8 bits for the exponent; double has 64 bits in
total, with 53 bits of precision for the mantissa and 11 bits for the exponent.
Examples:
+6132.789 → +.6132789 × 10^+04 (M × 10^E)
In binary, 9.25 = 1001.01 → .100101 × 2^100 (the exponent 100 is written in binary, it is 4)
The floating number is said to be normalized if the MSB of the mantissa is 1 (not 0).
Exact fractional values need many negative powers of 2:
.5 .25 .125 .0625 .03125 .015625 .0078125 .00390625 (2^-1 down to 2^-8)
0.45 = .011101 around (exactly 0.453125)
18.45 = 10010.011101 = .10010011101 × 2^101
Negative power (number smaller than 1): I am choosing
.03125 + .0078125 = .0390625, in binary it is .0000101 = .101 × 2^-100
Negative number: (mantissa negative)
Not included in Program
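The normalized binary form used in the examples above (mantissa smaller than 1 with its first bit equal to 1) can be obtained in Python with the standard function math.frexp, which returns a mantissa between 0.5 and 1 and a power of 2. The helper normalize and its 24 bits limit are assumptions made for this sketch; it only handles positive numbers.

```python
import math


def normalize(x, max_bits=24):
    """Return (mantissa bits as a string, exponent) with x = 0.bits * 2**exponent."""
    mantissa, exponent = math.frexp(x)   # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
    bits = []
    while mantissa and len(bits) < max_bits:
        mantissa *= 2                    # shift the next fractional bit out
        bit = int(mantissa)
        bits.append(str(bit))
        mantissa -= bit
    return "." + "".join(bits), exponent


print(normalize(9.25))        # ('.100101', 4)
print(normalize(0.0390625))   # ('.101', -4)
```

The exponents are printed in decimal here: 4 is 100 in binary, matching the examples.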
Computer Instructions:
Most computer instructions can be classified into three categories:
• Data Transfer: moving data between registers, or between registers and memory
• Data Manipulation: ALSU operations
• Program Control: decision making capabilities (comparing values, branching…)
Most computers have the same basic set of instructions, sometimes formulated differently.
Data Transfer Instructions:
Name        Mnemonic   Alternative
LOAD        LD         LDA
STORE       ST         STA
MOVE        MOV
EXCHANGE    XCH
INPUT       INP        IN
OUTPUT      OUT
PUSH        PUSH
POP         POP
We may see differences of writing between different Assembly languages.
LD, ST: between Register and Memory
MOV: between Registers
XCH: swap Register and Register, or Register and Memory
These instructions are often associated with a variety of addressing modes.
Example: Addressing modes used with Load and AC (or any Ri)
Mode                           Assembly      Register transfer
Direct Addressing              LD ADR        AC ← M[ADR]
Indirect Addressing            LD @ADR       AC ← M[M[ADR]]
Relative Addressing            LD $ADR       AC ← M[PC + ADR]
Immediate Operand              LD #NBR       AC ← NBR
Index Addressing               LD ADR(X)     AC ← M[ADR + XR]
Register Addressing            LD R1         AC ← R1
Register Indirect Addressing   LD (R1)       AC ← M[R1]
Autoincrement                  LD (R1)+      AC ← M[R1], R1 ← R1 + 1
In each mode, we can see a special character that implies it.
The same thing applies to ST and other instructions.
Data Manipulation Instructions
They perform operations on data.
They fall into 3 basic types:
• Arithmetic
• Logic and bit manipulation
• Shift
They were seen in CO and are performed in the ALU or ALSU.
Typical Arithmetic Instructions
Description and Mnemonic (abbreviated Assembly Symbol, generally 2 to 4 letters, mostly 3)
Description            Mnemonic   Comments
Increment              INC
Decrement              DEC
Add                    ADD
Subtract               SUB
Multiply               MUL        MUL and DIV are available in more advanced microprocessors
Divide                 DIV
Add with Carry         ADDC
Subtract with Borrow   SUBB
Negate                 NEG        A ← –A (Two's Complement)
We can have more specific versions of several instructions, for example for ADD:
ADDI   Add 2 integers
ADDF   Add 2 floating point numbers
ADDD   Add 2 BCD numbers
Logical and Bit Manipulation instructions
Description         Mnemonic   Comments
Clear               CLR        Logic or arithmetic
Complement          COM        Logic or arithmetic
Anding              AND        Pure logic
Oring               OR         Pure logic
Xoring              XOR        Pure logic
Clear Carry         CLRC
Set Carry           SETC
Complement Carry    COMC
Enable Interrupt    EI         The interrupt enable is a one bit flip-flop (FF)
Disable Interrupt   DI
Shift Instructions: mostly seen in CO
Description              Mnemonic       Comments
Logical Shift Right      SHR            Insert a 0
Logical Shift Left       SHL            Insert a 0
Arithmetic Shift Right   SHRA or ASHR   Sign bit at MSB
Arithmetic Shift Left    SHLA or ASHL
Rotate Right             ROR            Circular Shift
Rotate Left              ROL            Circular Shift
Rotate Right + Carry     RORC           Circular Shift through Carry (CO)
Rotate Left + Carry      ROLC           Circular Shift through Carry (CO)
Program Control
Instructions are stored in successive memory locations. A program control instruction may change the PC, thus
breaking the sequence of instruction execution. This gives the capability of branching to different
program segments.
Name      Mnemonic    Comments
Branch    BR or BRA   BR and JMP are the same, but are generally used with different addressing modes
Jump      JMP
Skip      SKP
Call      CALL        Used with Subroutines (in CO: BSA)
Return    RET         Used with Subroutines
Compare   CMP         Compares 2 values, the result affects the status bits
Test      TST         Tests a single bit (status register)
Status bits: we have four standard status bits, which may change after an arithmetic operation:
• Carry bit (C)
• Sign bit (S)
• Zero bit (Z)
• Overflow bit (V)
These form the status register; we may have more bits.
You can see above an 8 bits ALU with the 4 bits status register. The bits are set or cleared as a result of an
ALU operation.
C8 is the end carry of the operation.
F7 is the MSB of the result: 1 negative, 0 positive.
The check for zero output block above is an 8 inputs NOR gate: if all result bits are zero, its output is 1 and Z = 1.
The Overflow bit V is set to 1 if the last two carries are different, meaning we have an unwanted change of sign.
The status bits can be checked after an ALU Operation to determine certain relationships that exist between
the values of A and B.
The table on next page lists the most common branch instructions. If the stated condition is true, program
control is transferred to the address specified by the instruction. If not, control continues with the instruction
that follows.
These conditions can also be associated with the JMP, SKP, CALL, and RET program control
instructions. Imagine how many instructions we may have, just for comparison.
Numerical Example on 8 bits:
Let A = 11110000, B = 00010100
1- For unsigned: A is 240, B is 20
A – B is computed as A + (two's complement of B) = 11110000 + 11101100 = 1 11011100 → C=1, S=1, V=0, Z=0
The compare instruction CMP updates the status bits as shown above.
The instructions that will cause a branch after this comparison are BHI, BHE, BNE.
2- For signed: A is negative, A = –(00010000) = –16, and B = +20
A – B = –16 – 20: we add both magnitudes then we negate, getting –36, which is the same pattern 11011100.
The instructions that will cause a branch after this comparison are BLT, BLE, BNE.
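The status bits of this example can be recomputed with the Python sketch below. It assumes, as in the example, that A – B is formed by adding the two's complement of B, so that C is the end carry of that addition. The function name compare is an assumption, and so are the two branch conditions shown at the end (C = 1 and Z = 0 for "higher", S xor V = 1 for "less than"), since the branch table itself is not reproduced in the text.

```python
def compare(a, b, n=8):
    """Return (result, flags) of a - b on n bits, computed as a + ~b + 1."""
    mask = (1 << n) - 1
    not_b = ~b & mask                        # one's complement of b on n bits
    total = a + not_b + 1                    # adding 1 gives the two's complement
    result = total & mask
    carry_out = total >> n                   # end carry (C8 for n = 8)
    low = mask >> 1                          # all the bits below the MSB
    carry_in = ((a & low) + (not_b & low) + 1) >> (n - 1)   # carry into the MSB
    flags = {
        "C": carry_out,
        "S": result >> (n - 1),              # MSB of the result
        "Z": int(result == 0),
        "V": carry_out ^ carry_in,           # the last two carries differ
    }
    return result, flags


result, flags = compare(0b11110000, 0b00010100)
print(format(result, "08b"), flags)   # 11011100 {'C': 1, 'S': 1, 'Z': 0, 'V': 0}

unsigned_higher = flags["C"] == 1 and flags["Z"] == 0   # 240 > 20
signed_less = (flags["S"] ^ flags["V"]) == 1            # -16 < 20
print(unsigned_higher, signed_less)   # True True
```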
Subroutines
Recall BSA in CO.
The subroutine call is very important in Assembly Programming.
It is a branch that saves the return address. At the end of the subroutine, we have a branch to the saved
address.
Different computers use different locations to save the return address. It may be a fixed location in memory,
the memory stack, or a register.
The best way is to use a stack, which is useful if we have nested subroutines. In this way, the return is always
to the program that last called a subroutine.
Code mics:
SP ← SP – 1
M[SP] ← PC (save return address)
PC ← effective address (transfer control to the subroutine)
Then to return:
PC ← M[SP] (pop stack and transfer to PC)
SP ← SP + 1
It should be noted that a recursive subroutine will work correctly only through a stack: you just continue
pushing return addresses.
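The call and return mics can be illustrated with a short Python sketch of two nested calls. The class Cpu, its dictionary memory and all the addresses used are assumptions chosen for the illustration.

```python
class Cpu:
    """Minimal model with a PC, an SP and a memory stack for return addresses."""

    def __init__(self, pc):
        self.pc = pc
        self.sp = 0x0000     # empty stack, the first push goes to FFFF
        self.mem = {}

    def call(self, target):
        self.sp = (self.sp - 1) & 0xFFFF    # SP <- SP - 1
        self.mem[self.sp] = self.pc         # M[SP] <- PC (save return address)
        self.pc = target                    # PC <- effective address

    def ret(self):
        self.pc = self.mem[self.sp]         # PC <- M[SP]
        self.sp = (self.sp + 1) & 0xFFFF    # SP <- SP + 1


cpu = Cpu(pc=0x3005)     # return address in the main program
cpu.call(0x3100)         # main calls subroutine 1
cpu.pc = 0x3120          # subroutine 1 has advanced to this point
cpu.call(0x3200)         # subroutine 1 calls subroutine 2 (nested call)
cpu.ret()
print(hex(cpu.pc))       # 0x3120, back in subroutine 1
cpu.ret()
print(hex(cpu.pc))       # 0x3005, back in the main program
```

The returns happen in the reverse order of the calls, which is exactly the LIFO behaviour of the stack.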
Program Interrupt
It refers to the transfer of program control from a currently running program to another service program as a
result of an external or internal generated request. Control returns to the original program after the service
program is executed.
The interrupt procedure is similar to a subroutine call except for some variations:
• It is initiated by an internal or external signal, like I/O, rather than by an instruction.
• The address of the interrupt service program is determined by hardware, not by the address field of an instruction.
• The interrupt procedure usually stores all the information needed to define the state of the CPU (the
registers), rather than only the PC.
The state of the CPU when the interrupt happens is determined by all the registers, including the PC, and by
certain status bits forming the Program Status Word (PSW), which is stored in a specific register and has
several functions:
• It includes the status bits from the last ALU operation.
• It specifies what interrupts are allowed to occur.
• It specifies if the CPU is running in user or supervisor mode.
In Supervisor or Kernel mode, the CPU can run advanced or privileged instructions related to the
Operating System (OS); these are not accessible in the normal or user mode. Switching between the two
modes is done through an interrupt.
The major advantage of having the supervisor mode is to protect the OS. Without this mechanism there
would be no "security" in an OS, as any obscure piece of code could simply access OS (Kernel) memory
to view, delete or change it.
The CPU does not respond to an interrupt until the end of the current instruction. Before fetching a new one,
it checks its PSW for any pending interrupt.
The CPU may disable the processing of interrupts while in supervisor mode, as it may be executing more
privileged instructions. In normal user mode, interrupts are always enabled. (Recall IEN and R in CO)
More than one interrupt may be received simultaneously, or an interrupt may interrupt another interrupt.
Interrupts have priorities assigned to them, and the CPU reacts according to these priorities.
The last instruction in an interrupt service program is a return (like in a subroutine), where the stack is
popped for the old register values and the PC, and the state of the CPU before the interrupt is restored.
Types of Interrupts:
1. External: it comes from I/O, transfer of data, networking. Even a power failure is detected a few
milliseconds in advance, so that the CPU has some time to save critical data, or to stop any save or update
in progress, to avoid memory corruption or hard disk damage.
2. Internal: it comes from illegal or erroneous use of an instruction or data. Also called traps, internal
interrupts are caused by register overflow, divide by zero, stack overflow… The trap determines the
corrective measure to be taken. Note that we are not talking about what happens in a small student program
(like in C++), but rather during the running of a large program (Office, AutoCAD, Game…).
3. Software: it is initiated by an instruction, for example to switch between user and supervisor mode. Many
assemblers offer a special instruction to interrupt the CPU and perform I/O (trap #15 in 68K).
Instruction Set Design:

Two major approaches to Instruction Set Design exist: CISC and RISC
Complex Instruction Set Computer (CISC) characteristics
• A large number of instructions
• Specialized instructions that perform infrequent tasks
• A large variety of addressing modes
• Variable length instruction formats
• Instructions that manipulate operands in memory
• Instructions may take several clock cycles to execute
The essential goal of the CISC architecture is to attempt to provide a single machine instruction for each
statement written in a high level language, thus making software such as compilers easier to design, at the
expense of more complex hardware. One of the primary advantages of this system is that the compiler has to do
very little work to translate a high-level language statement into assembly. Because the length of the code
will be relatively short, very little RAM is required to store instructions. The emphasis is put on building
complex instructions directly into the hardware.
Examples of CISC instruction set architectures are PDP-11, VAX, Motorola 68k, and x86.
Reduced Instruction Set Computer (RISC) characteristics
• Relatively few instructions
• No specialized instructions, just general ones
• Relatively few addressing modes
• Fixed length, easily decoded instruction formats
• Memory access limited to Load and Store instructions
• Single cycle instruction execution
By simplifying the instructions and their formats, the control logic will be less complex, and so will the
hardware as a whole. We may just need more registers, to use instead of memory, as they are faster.
The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of
cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of
instructions per program.
To achieve one instruction per clock cycle, RISC microprocessors overlap the fetch, decode and execute
phases of two or three instructions by using what is called Pipelining.
In Pipelining, we have several segments doing different work. In one segment an instruction is being fetched,
in another one the previous instruction is being decoded, and so on. Segments communicate between
themselves. Sometimes a segment has to wait for a result from a previous segment, thus lowering pipeline
efficiency, but this is not always the case.
Well-known RISC families include DEC Alpha, AMD 29k, ARC, ARM, Atmel AVR, Blackfin, Intel
i860 and i960, MIPS, Motorola 88000, PA-RISC, Power (including PowerPC), SuperH, and SPARC.
Hybrid Instruction Set Computer (HISC)

Most of today's CPUs are HISC, meaning they are not fully CISC or RISC: they take some features from RISC,
like pipelining, while still having some complex instructions. Examples include the famous Intel Pentium 1 to 4,
and the multi-core microprocessors.
Multi-core Microprocessor
A multi-core processor is a single computing component with two or more independent actual CPUs (called
"cores"). The multiple cores can run multiple instructions at the same time, increasing overall efficiency.
Processors were originally developed with only one core. Multi-core processors were developed in the early
2000s by Intel, AMD and ARM. Multicore processors may have two cores (dual-core CPUs, for example AMD
Phenom II X2 and Intel Core Duo), four cores (quad-core CPUs, for example AMD Phenom II X4,
Intel's i5 and i7 processors), six cores (hexa-core CPUs, for example AMD Phenom II X6 and Intel Core i7
Extreme Edition 980X), eight cores (octo-core CPUs, for example Intel Xeon E7-2820 and AMD FX-8350), ten
cores (for example, Intel Xeon E7-2850), or more.
A multi-core processor implements multiprocessing in a single physical package. Multi-core processors are
widely used across many application domains including general-purpose, embedded, network, digital signal
processing (DSP), and graphics.
The improvement in performance gained by the use of a multi-core processor depends very much on the
software algorithms used and their implementation. In particular, possible gains are limited by the fraction of
the software that can be run in parallel. Running parts of a program at the same time on several cores is called
parallel computing, and the high level language must support it. An example is multithreading, available in
modern C++ and Java.
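The limit set by the parallel fraction is commonly quantified with Amdahl's law, which the course does not name: if a fraction p of the run time can be spread over n cores, the speedup is 1 / ((1 – p) + p / n). The Python function below is a minimal sketch of that formula; its name and the sample fractions are assumptions.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only parallel_fraction of the work uses all the cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Half of the program can run in parallel on a quad-core processor.
print(amdahl_speedup(0.5, 4))               # 1.6
# Even with 1000 cores, the serial half keeps the speedup below 2.
print(round(amdahl_speedup(0.5, 1000), 3))  # 1.998
```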
Major Multicore Processor Companies

i. Intel
If there is a single semiconductor chip maker the average consumer is aware of, it is likely Intel. It is
the premier chip maker for personal computers: companies such as Apple, Dell, HP, Samsung, and Sony have
product lines that depend on the processors that Intel produces. Intel's processors generally offer the best
performance for all-around usage. This has been especially the case in the last several years, with the
introduction and evolution of Intel's Core series product line. Currently, Intel's flagship consumer product line
consists of mobile and desktop-grade Core i3, Core i5 and Core i7 processors, now in their second generation
(dubbed "Sandy Bridge"). The third and latest generation of these processors (dubbed "Ivy Bridge") began to
roll out for release in late April 2012.
ii. AMD
Though not considered the behemoth in the personal computing space that Intel is, AMD (Advanced Micro
Devices) is a decisive runner-up, and arguably the only true competitor Intel has in this domain. After
spending much of the early to middle 2000s as the performance and value leader with its Athlon 64
line of personal computing processors, AMD, unable to repeat this success in more recent years, has shifted
its focus towards both enthusiast and budget-oriented system configurations. As a result, AMD is
considered to be a viable alternative to Intel. Its current offerings are headed by the Phenom series
processors and the Fusion APU processors. The Fusion APU (AMD A-Series) is a relatively new platform (as of
2011 and ongoing) that attempts to merge high-end graphical capabilities onto the same chip as the processor.
This means that if your work or play requires a powerful graphics card, then AMD can potentially offer a cost
effective alternative.
iii. ARM
The increased need for mobile productivity and entertainment has given rise to a relatively new class of
devices: smartphones and tablets. ARM (Advanced RISC Machines) is well-known for its mobile,
power-efficient processor designs. In recent years it has seen its technology used in the products of many
prominent electronics companies. Apple's A4/A5/A5X, Nvidia's Tegra, Samsung's Exynos and Texas
Instruments' OMAP products all integrate ARM processors into what is known as a system-on-a-chip (SoC).
SoCs merge many of the essential components of a computer (such as the CPU, RAM, ROM etc.) on a single
chip, which allows devices that use them to be lightweight and compact. These SoCs have gone on to be
implemented in blockbuster products such as Apple's iPhone and iPad or Samsung's series of Galaxy phones.
ARM's presence as the CPU and architecture of choice on many mobile devices cannot be overstated, as
estimates put their numbers in the billions.
