
EC6013-ADVANCED

MICROPROCESSOR AND
MICROCONTROLLER
Objectives
• Study the fundamentals of microprocessor architecture.
• Learn the advanced features of microprocessors and microcontrollers.
• Study the architecture of various microcontrollers.
Syllabus
• UNIT I - HIGH PERFORMANCE CISC ARCHITECTURE – PENTIUM (9)
• CPU Architecture – Bus Operations – Pipelining – Branch prediction –
Floating point unit – Operating Modes – Paging – Multitasking – Exception
and Interrupts – Instruction set – Addressing modes – Programming the
Pentium processor.

• UNIT II - HIGH PERFORMANCE RISC ARCHITECTURE – ARM (9)
• Acorn RISC Machine – Architectural Inheritance – Core &
Architectures – Registers – Pipeline – Interrupts – ARM organization –
ARM processor family – Co-processors – ARM instruction set – Thumb
Instruction set – Instruction cycle timings – The ARM Programmer's
model – ARM Development tools – ARM Assembly Language
Programming – C programming – Optimizing ARM Assembly Code –
Optimized Primitives.
• UNIT III - ARM APPLICATION DEVELOPMENT (9)
• Introduction to DSP on ARM – FIR filter – IIR filter – Discrete Fourier
transform – Exception handling – Interrupts – Interrupt handling schemes –
Firmware and bootloader – Embedded Operating systems – Integrated Development
Environment – STDIO Libraries – Peripheral Interface – Application of
ARM Processor – Caches – Memory Protection Units – Memory Management
Units – Future ARM Technologies.

• UNIT IV - MOTOROLA 68HC11 MICROCONTROLLERS (9)
• Instruction set addressing modes – Operating modes – Interrupt system – RTC –
Serial Communication Interface – A/D Converter, PWM and UART.

• UNIT V - PIC MICROCONTROLLER (9)
• CPU Architecture – Instruction set – Interrupts – Timers – I2C Interfacing –
UART – A/D Converter – PWM and introduction to C-Compilers.

TOTAL: 45 PERIODS
Text Books

• [1] Andrew N. Sloss, Dominic Symes and Chris Wright, “ARM System Developer's Guide:
Designing and Optimizing System Software”, First edition, Morgan Kaufmann Publishers, 2004.
References
• 1. Steve Furber, “ARM System-on-Chip Architecture”, Addison Wesley, 2000.
• 2. Daniel Tabak, “Advanced Microprocessors”, McGraw Hill Inc., 1995.
• 3. James L. Antonakos, “The Pentium Microprocessor”, Pearson Education, 1997.
• 4. Gene H. Miller, “Micro Computer Engineering”, Pearson Education, 2003.
• 5. John B. Peatman, “Design with PIC Microcontroller”, Prentice Hall, 1997.
• 6. James L. Antonakos, “An Introduction to the Intel Family of Microprocessors”, Pearson Education, 1999.
UNIT I - HIGH PERFORMANCE CISC ARCHITECTURE – PENTIUM (9)

Objective
• Study the architecture of the Pentium processor.
• Program the Pentium processor.
• Overview
– Introduction to Pentium
– Pentium Architecture
– Addressing Modes
– Instruction Set
– Assembly Language Programming
– Bus Operations
– Pipelining
– Branch Prediction
– Exception and Interrupts
– Floating point unit
– Operating Modes
– Paging and Multitasking
INTRODUCTION
MICROPROCESSOR
• A microprocessor is a computer processor which incorporates the
functions of a computer's central processing unit (CPU) on a
single integrated circuit (IC), or at most a few integrated circuits.

• A microprocessor might include only an arithmetic logic unit (ALU) and
a control logic section. The ALU performs arithmetic operations such as addition
and subtraction, and logical operations such as AND or OR.
MICROPROCESSOR
MICROCONTROLLER
• A microcontroller (or MCU, short for microcontroller unit) is a 
small computer (SoC) on a single integrated circuit containing a 
processor core, memory, and programmable input/output peripherals.

• Microcontrollers are used in automatically controlled products and 
devices
BLOCK DIAGRAM
DIFFERENCE BETWEEN μP & μC
1. Microprocessor contains only a CPU. In contrast Microcontroller 
contains few other components apart from CPU, which includes 
RAM, ROM and other peripherals like ports, clock, timer, UART 
(Universal Asynchronous Receiver Transmitter), ADC (Analog to 
digital converter), DAC (Digital to analog converter), Drivers for LCD, 
etc.,
2. Microprocessor can be considered as just the processor, while 
microcontroller can be seen as a small computer which is 
embedded on a single IC (Eg. 8051).
So to summarize, we can state the difference between both as:

“Microprocessor is present inside a Microcontroller”.

This is valid to some extent because:

Microcontroller = Microprocessor + Few Extra components


Then, what does ADVANCED MICROPROCESSOR & MICROCONTROLLER mean?
A microprocessor or microcontroller with added features such as:
• High memory capacity
• More number of I/O pins
• High performance
• More external interfacing options etc.,
Example for microprocessor:
        Intel Pentium, Pentium-I, Pentium-II…,i3,i5,i7,8085 & 8086
Example for microcontroller:
        8051,PIC, ARM, Arduino…
MICROPROCESSOR DEVELOPMENT CYCLE
INTEL MICROPROCESSOR DEVELOPMENT CYCLE
MICROCONTROLLER DEVELOPMENT CYCLE
Pentium is a brand used for a series of x86-compatible microprocessors produced by Intel since 1993.

In its current form, Pentium processors are considered entry-level products that Intel rates as "two stars", meaning that they are above the low-end Atom and Celeron series but below the faster Core i3, i5 and i7.
• Pentium-branded processors
  • P5 microarchitecture based
    • Pentium
  • P6 microarchitecture based
    • Pentium Pro
    • Pentium II
    • Pentium III
  • Netburst microarchitecture based
    • Pentium 4
    • Pentium D
  • Pentium M microarchitecture based
    • Pentium M
    • Pentium Dual-Core
  • Core microarchitecture based
    • Pentium Dual-Core
    • Pentium (2009)
CPU ARCHITECTURE
The two integer pipelines, the U pipeline and the V pipeline, are responsible
for executing the 80x86 instructions.
The floating point unit is included on the chip to execute mathematical
functions.
The Pentium communicates with the outside world via a 32-bit address bus
and a 64-bit data bus.
An 8KB instruction cache provides quick access to frequently
used instructions. When an instruction is not found in the cache, it is read
over the external data bus and copied into the instruction cache for
future reference.
Branch target buffer and prefetch buffers: these work together with the instruction
cache to fetch instructions as fast as possible.
The prefetch buffers maintain a copy of the next 32 bytes of prefetched
instruction code.
Branch prediction: a technique to maintain a steady flow of instructions
into the pipeline.
To support branch prediction, the branch target buffer maintains a
copy of instructions from a different part of the program, located at an
address called the branch address.
Example:
    CALL XYZ    ; the branch target buffer stores a copy of the code at memory location XYZ
A separate 8KB data cache stores a copy of the most frequently
accessed memory data.
The Pentium: A CISC Architecture
What is CISC?
• CISC stands for  Complex Instruction Set Computer
• CISC takes its name from the very large number of instructions 
(typically hundreds) and addressing modes.
History: CISC
• The first PC microprocessors developed were CISC chips, because all 
the instructions the processor could execute were built into the chip. 
• Memory was expensive in the early days of PCs, and CISC chips saved 
memory because their programming could be fed directly into the 
processor. 
• CISC chips were improved mainly by adding more instructions to 
the processor design. This also meant that programming 
changed with new CISC designs. CISC designs grew complex and 
somewhat bulky
Examples of CISC Processors
Examples of CISC processors are
•  VAX
• PDP-11
• Motorola 68000 family
• Intel x86/Pentium CPUs
Advantages of CISC
• CISC instructions have varying lengths, which reduces wasted space in memory.
• Has developed a process to manage power which adjusts clock speed
and voltage.
• Uses fewer instructions than RISC to perform similar tasks.
Disadvantages of CISC
• CISC chips are relatively slow (compared to RISC chips) per instruction.
• CISC chips require many more transistors than comparable RISC 
designs .
• Harder to pipeline using CISC architecture.
• Expensive to produce.
RISC vs CISC
• RISC puts a greater burden on the software. Software needs to
become more complex, and software developers need to write more
lines of code to perform similar tasks.
• But by doing this, the RISC architecture takes the burden away from the
hardware, resulting in an increase in performance (mainly speed).
OPERATING MODES
Real mode and Protected mode
Real mode: The advanced microprocessors, including the Pentium,
simply operate like an 8086 with its associated 1MB of memory. Real mode is
automatically selected upon power-up, so the Pentium boots into the DOS
operating system in real mode.
Protected mode: The full 4 GB of memory is available to the processor,
as are special privileged instructions and architectural features, including
multitasking, virtual memory addressing, memory management and
control over the internal data and instruction caches. Writing programs in
protected mode requires special knowledge.
 
Software model of Pentium
Pentium Microprocessor: Registers
• Registers
– Registers are in the CPU and are referred to by specific
names
– Data registers
 Hold data for an operation to be performed
 There are 4 data registers (EAX,EBX, ECX, EDX)
 All are 32 bit wide.
 Lower 16 bit registers are called AX,BX,CX,DX.
 May be Split up into halves of 8 bits each.
– Address registers
 Hold the address of an instruction or data element
 Segment registers (CS, DS, ES, SS,FS,GS)
 Pointer registers (ESP, EBP, EIP)
 Index registers (ESI, EDI)
– Status register
 Keeps the current status of the processor
 The status register is called the FLAG register
Data Registers: EAX,EBX, ECX,EDX
• Instructions execute faster if the data is in a
register. (The E prefix stands for Extended.)
• Data Registers are general purpose registers but
they also perform special functions
• AX, BX, CX, DX are the 16 bit data registers.
• Low and High bytes of the data registers can be
accessed separately
– AH, BH, CH, DH are the high bytes
– AL, BL, CL, DL are the low bytes

8086 Architecture (continued…) 39


 AX
– Accumulator Register
– Used in Arithmetic, Logic and Data Transfer instructions
– Used in Multiplication and Division operations
– Used in I/O operations
 BX
– Base Register
– Also serves as an address register
– Used in array operations
– Used in Table Lookup operations (XLAT)
 CX
– Count register
– Used as a Loop Counter
– Used in shift and rotate operations
 DX
– Data register
– Used in Multiplication and Division
– Also used in I/O operations
Pointer and Index Registers
• Contain the offset addresses of memory locations
• Can also be used in arithmetic and other operations
• SP: Stack Pointer
– Used with SS to access the stack segment
• BP: Base Pointer
– Primarily used to access data on the stack
– Can be used to access data in other segments


• SI: Source Index register
– Is required for some string operations
– SI is associated with DS in string operations
• DI: Destination Index register
– Is also required for some string operations
– DI is associated with ES in string operations
• The SI and DI registers may also be used to access data stored in arrays


Segment Registers - CS, DS, SS and ES
• CS: Code Segment. Used during instruction fetches.
• DS: Data Segment. Used when reading or writing data.
• SS: Stack Segment. Used during stack operations such as subroutine calls and returns.
• ES: Extra Segment. Used for anything the programmer wishes.
• GS and FS: Used for anything the programmer wishes.


• Are address registers
• Store the memory addresses of instructions and data
• Memory organization
– A 20-bit address bus addresses 1 MB of memory
– Each byte in memory has a 20-bit address
– Addresses are expressed as 5 hex digits from 00000 to FFFFF
– Problem: 20-bit addresses are too big to fit in 16-bit registers
– Solution: memory segments
  • A segment number is a 16-bit number
  • Segment numbers range from 0000 to FFFF
  • Each segment is a block of 64K (65,536, i.e. 2^16) consecutive memory bytes
  • Within a segment, a particular memory location is specified with an offset
Segmented memory addressing:
Absolute Address = 16-bit segment value shifted left by four bits,
added to a 16-bit offset
1 MB Memory Space (figure): the memory map is drawn as sixteen 64 KB blocks with starting addresses 00000, 10000, ..., F0000. Within the segment starting at 50000, the SegAddr:Offset logical addresses 5000:0000, 5000:0250 and 5000:FFFF are marked.


Physical Memory Address Generation
• The BIU has a dedicated adder for determining physical memory addresses.
Figure: the 16-bit segment register value, with four zero bits (0 0 0 0) appended on the right, and the 16-bit offset value (effective address) feed the adder, which produces the 20-bit physical address.


• A logical address is specified as Segment:Offset.
• The physical address is obtained by shifting the segment
address 4 bits to the left and adding the offset address.
• Thus the physical address of the logical address A4FB:4872
is
    A4FB0    1010 0100 1111 1011 0000
  +  4872         0100 1000 0111 0010
    A9822    1010 1001 1000 0010 0010
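The shift-and-add rule can be checked with a few lines of Python. This is only a sketch: the function names `physical_address` and `segment_range` are made up for illustration, and the result is kept to 20 bits on the assumption of an 8086-style address bus.

```python
from typing import Tuple


def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit real-mode physical address for segment:offset."""
    # Shift the 16-bit segment left by 4 bits (multiply by 10H), add the
    # 16-bit offset, and keep 20 bits (assumed 8086-style address bus).
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


def segment_range(segment: int) -> Tuple[int, int]:
    """Return the lowest and highest physical address of a 64K segment."""
    return physical_address(segment, 0x0000), physical_address(segment, 0xFFFF)


if __name__ == "__main__":
    # Logical address A4FB:4872 from the slide.
    print(f"{physical_address(0xA4FB, 0x4872):05X}")
```

Calling `segment_range(0x5000)` gives the first and last byte of the segment that starts at 50000.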


Advantages of using Segment Registers
1. Even though the addresses associated with
instructions are only 16 bits, segmentation allows the memory
capacity to be 1MB.
2. Permits a program and/or its data to be put into
different areas of memory each time the program is
executed.


Figure: flag register layout. It shows the 6 status flags (Carry, Parity, Auxiliary, Zero, Sign, Overflow) and the 3 control flags (Trap, Interrupt enable, Direction), together with the fields giving the priority level of the current task and indicating whether the current task is nested.
Flags
• Flags:
– 32-bit flag register.
– Used only in Protected mode.
• Status or conditional flags:
– These are set according to the results of arithmetic or logic operations.
– Need not be altered by the user.
• Control flags:
– Used to control some operations of the MPU.
– These flags are set by the user in order to achieve some specific purpose.
Status or Conditional or Condition Code Flags
• CF (carry): Contains the carry out of the leftmost bit following arithmetic; also contains the last bit shifted out by a shift or rotate operation.
• PF (parity): Indicates whether the number of 1 bits in the low byte of the result is even (1 = even).
• AF (auxiliary carry): Contains the carry out of bit 3 into bit 4, used for specialized arithmetic (BCD).
• ZF (zero): Indicates when the result of an arithmetic operation or a comparison is zero (1 = yes).
• SF (sign): Contains the resulting sign of an arithmetic operation (1 = negative).
• OF (overflow): Indicates overflow of the leftmost bit during arithmetic.
Control flags:
• DF (direction): Indicates left or right for moving or comparing string data.
• IF (interrupt): Indicates whether external interrupts are being processed or ignored.
• TF (trap): Permits operation of the processor in single-step mode.


Example
• Assume that the previous instruction performed one of the following additions:

    0010 0011 0100 0101
  + 0011 0010 0001 1001
    0101 0101 0101 1110      SF=0 ZF=0 AF=0 PF=0 CF=0 OF=0

    0101 0100 0011 1001
  + 0100 0101 0110 1010
    1001 1001 1010 0011      SF=1 ZF=0 AF=1 PF=1 CF=0 OF=1
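A short Python sketch can recompute the status flags for a 16-bit addition. The function name `add16_flags` and the dictionary layout are illustrative assumptions; the flag rules follow the definitions given above (parity is taken over the low byte of the result).

```python
def add16_flags(a: int, b: int) -> dict:
    """Add two 16-bit values and return the result with the six status flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF

    def sign(value: int) -> int:
        # Bit 15 is the sign bit of a 16-bit value.
        return (value >> 15) & 1

    return {
        "result": result,
        "CF": int(total > 0xFFFF),                          # carry out of bit 15
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity of low byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),             # carry from bit 3 to bit 4
        "ZF": int(result == 0),
        "SF": sign(result),
        # Overflow: both operands share a sign and the result's sign differs.
        "OF": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }


if __name__ == "__main__":
    print(add16_flags(0x2345, 0x3219))
    print(add16_flags(0x5439, 0x456A))
```

For 5439h + 456Ah the sketch reports OF=1, because two positive operands produce a result with the sign bit set.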


Addressing Modes
• The various methods used to access instruction operands are called addressing modes.
• General instruction format:
    OPCODE  Operand  Operand
• Operands may be contained in
– Registers
– Memory
– I/O ports
• The three basic modes of addressing are
– Immediate
– Register
– Memory
Example:
If CS=24F6h and IP=634Ah, show:
1- The logical address
2- The offset address
3- The physical address
4- The lower range of the segment
5- The upper range of the segment
Solution:
1- The logical address is the CS:IP content, which is 24F6:634A
2- The offset address is the content of the IP register, which is 634A
3- The physical address is 24F60 + 634A = 2B2AA
4- The lower range of the segment is 24F6:0000, i.e. 24F60
5- The upper range of the segment is 24F6:FFFF, i.e. 24F60 + FFFF = 34F5F
Addressing Modes (continued...)
• Addressing modes are classified according to the flow of instruction execution
A. Sequential flow instructions
– Arithmetic
– Logical
– Data transfer
– Processor control
B. Control transfer instructions
– INT
– CALL
– RET
– JUMP
A. Sequential flow instructions
1. Implied Addressing mode
2. Immediate addressing mode
3. Direct addressing mode
4. Register addressing mode
5. Register Indirect addressing mode
6. Indexed addressing mode
7. Register Relative addressing mode
8. Based Indexed addressing mode
9. Relative Based Indexed addressing mode
B. Control transfer instructions
1. Intersegment Direct addressing mode
2. Intersegment Indirect addressing mode
3. Intra segment Direct addressing mode
4. Intra segment Indirect addressing mode

Sequential Flow Instructions
1. Implied Addressing – The data value/data address is implicitly associated with the instruction.
◦ AAA
◦ AAS
◦ AAM
◦ AAD
◦ DAA
◦ DAS
◦ XLAT
2. Immediate Addressing – Data/operand is part of the instruction
• MOV AX, 25BFH ; AX ← 25BFH (16-bit data; AX is the destination, 25BFH the source)
• MOV AL, 8EH ; AL ← 8EH (8-bit data)

3. Direct Addressing – Data is pointed to by the 16-bit offset value specified in the instruction
• MOV AX, [5000H]
Effective Addr = 5000H
PhyAddr = 10H*DS + 5000H
4. Register Addressing – Data is in the register specified in the instruction
• MOV BX, AX
No PhyAddr, since the data is in a register
• 16-bit operand registers – AX, BX, CX, DX, SI, DI, SP, BP
• 8-bit operand registers – AL, AH, BL, BH, CL, CH, DL, DH
5. Register Indirect Addressing – Data is pointed to by the offset value in the register specified in the instruction
• MOV AX, [BX]
Default segment – DS or ES
Offset – BX or SI or DI
PhyAddr = 10H * (DS or ES) + (BX or SI or DI)
If DS=5000H and BX=10FFH,
then EffectiveAddr = 10FFH
and PhyAddr = 10H*5000H + 10FFH = 510FFH
6. Indexed Addressing
Data is pointed to by the offset in the index register specified in the instruction
DS is the default segment register for SI and DI
• MOV AX, [SI] ; data is available at the logical address DS:SI
Effective Addr = [SI]
PhyAddr = 10H * DS + (SI or DI)
7. Register Relative Addressing
Data is pointed to by the sum of the 8-bit or 16-bit displacement specified in the instruction
plus the offset held in one of the registers BX, BP, SI, DI
Default segment registers – DS, ES
• MOV AX, 50H [BX]
EffectiveAddr = 50H + [BX]
PhyAddr = 10H * (DS or ES) + (BX or BP or SI or DI) + displacement
8. Based Indexed Addressing
Data is pointed to by the content of the base register specified in the instruction
plus the content of the index register specified in the instruction
Default segment registers – DS, ES
• MOV AX, [BX] [SI]
EffectiveAddr = (BX or BP) + (SI or DI)
PhyAddr = 10H * (DS or ES) + (BX or BP) + (SI or DI)
9. Relative Based Indexed Addressing
Data is pointed to by the sum of the 8-bit or 16-bit displacement specified in the instruction,
plus the offset specified in the base register (BX or BP),
plus the offset specified in the index register (SI or DI)
Default segment registers – DS, ES
EffectiveAddr = (8-bit or 16-bit displacement) + (BX or BP) + (SI or DI)
PhyAddr = 10H * (DS or ES) + (8-bit or 16-bit displacement) + (BX or BP) + (SI or DI)
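The memory addressing modes above all reduce to one formula: effective address = displacement + base + index, with unused terms set to zero. The following Python sketch illustrates this; the function names and the register contents (other than DS=5000H and BX=10FFH, taken from the slide) are assumptions chosen for illustration.

```python
def effective_address(base: int = 0, index: int = 0, disp: int = 0) -> int:
    """16-bit offset: displacement + base register + index register."""
    return (disp + base + index) & 0xFFFF


def phys_addr(segment: int, base: int = 0, index: int = 0, disp: int = 0) -> int:
    """PhyAddr = 10H * segment + effective address, kept to 20 bits."""
    return ((segment << 4) + effective_address(base, index, disp)) & 0xFFFFF


# DS and BX follow the register indirect example; SI is a hypothetical value.
DS, BX, SI = 0x5000, 0x10FF, 0x0020

direct = phys_addr(DS, disp=0x5000)                        # MOV AX, [5000H]
register_indirect = phys_addr(DS, base=BX)                 # MOV AX, [BX]
register_relative = phys_addr(DS, base=BX, disp=0x50)      # MOV AX, 50H[BX]
based_indexed = phys_addr(DS, base=BX, index=SI)           # MOV AX, [BX][SI]
relative_based_indexed = phys_addr(DS, base=BX, index=SI, disp=0x50)

if __name__ == "__main__":
    for name, value in [
        ("direct", direct),
        ("register indirect", register_indirect),
        ("register relative", register_relative),
        ("based indexed", based_indexed),
        ("relative based indexed", relative_based_indexed),
    ]:
        print(f"{name:24s} {value:05X}")
```

The sketch always uses the segment passed in by the caller; it does not model default segment selection or segment overrides.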
BUS OPERATION
• The Pentium processor performs a number of different operations over
its address and data buses.
• These include data transfers, interrupt acknowledgement, inquire cycles for examining
the internal code and data caches, and I/O operations.

Decoding a bus cycle:
The Pentium bus logic indicates the type of bus cycle currently in progress
through its cycle definition signals.
            The signals are M/IO, D/C, W/R, CACHE, KEN
• Special bus cycles require additional decoding and use the byte
enable outputs for selection.
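Decoding can be pictured as a table lookup on the three basic cycle definition signals. The sketch below assumes the encoding of M/IO, D/C and W/R as it is commonly tabulated for the Pentium; it should be checked against the processor data sheet. It does not model CACHE, KEN or the byte-enable decoding of special cycles, and the names `BUS_CYCLES` and `decode_bus_cycle` are illustrative.

```python
# Keys are (M/IO, D/C, W/R) logic levels; the mapping is an assumed encoding
# to be verified against the Pentium data sheet.
BUS_CYCLES = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "special cycle",
    (0, 1, 0): "I/O read",
    (0, 1, 1): "I/O write",
    (1, 0, 0): "code read",
    (1, 0, 1): "reserved",
    (1, 1, 0): "memory read",
    (1, 1, 1): "memory write",
}


def decode_bus_cycle(m_io: int, d_c: int, w_r: int) -> str:
    """Return the bus cycle type for the given signal levels (0 or 1)."""
    return BUS_CYCLES[(m_io, d_c, w_r)]


if __name__ == "__main__":
    print(decode_bus_cycle(1, 1, 0))
```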
Bus cycle states:
• There are six possible states the Pentium bus may be in, depending on
what type of cycle is being processed.
• The states are Ti, T1, T2, T12, T2P and TD.
• Ti: This is the bus idle state. In this state, no bus cycles are being run.
The processor may or may not be driving the address and status pins.
• T1: This is the first clock of a bus cycle. Valid address and status are
driven out.
• T2: This is the second and subsequent clock of the first outstanding
bus cycle. In state T2, data is driven out (if the cycle is a write), or data
is expected (if the cycle is a read).
• T12: This state indicates there are two outstanding bus cycles, and
that the processor is starting the second bus cycle at the same time
that data is being transferred for the first. In T12, the processor drives
the address and status of the second bus cycle.
• T2P: This state indicates there are two outstanding bus cycles, and
that both are in their second and subsequent clocks. In T2P, data is
being transferred.
• TD: This state indicates there is one outstanding bus cycle whose
address and status have already been driven sometime in the past (in state
T12). The bus waits one dead clock in this state.
Processor bus control state machine:
0: No bus cycle requested
1: New bus cycle started. ADS is taken low.
2: Second clock cycle of current bus cycle.
3: Stay in T2 until BRDY is active or a new bus cycle is
requested
4: Go back to T1 if a new request is pending.
5: Bus cycle complete; go back to idle state.
6: Begin second bus cycle
7: Current cycle is finished and no dead clock is needed.
8: A dead clock is needed after the current cycle is finished.
9: Go to T2P to transfer data
10: Wait in T2P until data is transferred.
11: Current cycle is finished and no dead clock is needed.
12: A dead clock is needed after the current cycle is finished.
13: Begin a pipelined bus cycle if NA is active
14: No new bus cycle is pending
SINGLE TRANSFER CYCLE:
This cycle transfers up to 8 bytes of non-cacheable data between the processor
and memory.
The cycle begins during clock cycle T1, when ADS goes low. CACHE is taken
high to indicate to external circuitry that the data is not going to, or coming
from, the internal cache.
If BRDY goes low during the T2 clock cycle, the data is transferred and
the operation completes during clock cycle Ti.
If BRDY is not low during T2, additional T2 clock cycles are generated; these
extra clock cycles are called WAIT CYCLES.
BURST CYCLE:
Supports burst reads and writes of 32 bytes.
The cache uses burst cycles for line loads and write-backs.
During a burst operation, a new eight-byte chunk can be transferred every clock
cycle.
LOCKED OPERATION:
Many operating system processes depend on what is called atomic access to data
stored in memory.
An atomic operation cannot be broken down into smaller sub-operations (it is uninterruptible).
The data accessed during the atomic operation often comes in the form of a
semaphore.
Example:
XCHG instruction
BOFF:
The BOFF input provides a way for other processors in a multiprocessor
system to instantly take over the Pentium buses.
When BOFF is low, the Pentium puts its bus into the high-impedance state,
interrupting any bus cycle in progress, and allows the other processor to use the bus.
When BOFF is high, the Pentium is allowed to use the bus.
BUS HOLD:
The HOLD input provides a second way for a different bus master to take
control of the Pentium's buses.
Unlike BOFF, HOLD lets the current bus cycle complete.
INTERRUPT ACKNOWLEDGE:
The processor runs two interrupt acknowledge cycles in response to an INTR
request. Both cycles are locked.
To maintain hardware compatibility with earlier 80x86 machines, the data is
ignored by the processor during the first interrupt acknowledge cycle and accepted during
the second.
SHUTDOWN:
If the Pentium detects an internal parity error, a shutdown cycle is run. Execution is
suspended in shutdown until the processor receives an NMI, INIT or RESET request.
HALT:
Similar to shutdown, except that the INTR signal may also be used to resume
execution.
PIPELINED CYCLE:
The processor starts the second bus cycle before the current one is completed. It does so
through pipelined read and write logic, in response to a request on the NA
input.
INQUIRE CYCLE:
Inquire cycles maintain cache coherency in a multiprocessor system. The Pentium
processor is able to watch the system bus in a multiprocessor system. This is
called BUS SNOOPING.
If the Pentium detects a memory read/write operation being performed
by another CPU, it runs an internal inquire cycle to determine whether the
address on the bus is stored in one of its internal caches. If so, the cache
may need to be updated.
PIPELINING
Integer Pipeline
• The pipelines are called “u” and “v” pipes.
• The u-pipe can execute any instruction, while the v-pipe can execute 
“simple” instructions as defined in the “Instruction Pairing Rules”.
•  When instructions are paired, the instruction issued to the v-pipe is
always the next sequential instruction after the one issued to
u-pipe.
Integer Instruction Pairing Rules
• To issue two instructions simultaneously, they must
satisfy the following conditions:
• Both instructions in the pair must be “simple”.
• There must be no read-after-write (RAW) or write-after-write (WAW)
register dependencies between them.
RAW:
i1. R2 ← R1 + R3
i2. R4 ← R2 + R3
WAW:
i1. R2 ← R4 + R7
i2. R2 ← R1 + R3
• The following integer instructions are considered simple 
and may be paired:
1. mov reg, reg/mem/imm
2. mov mem, reg/imm
3. alu reg, reg/mem/imm
4. alu mem, reg/imm
5. inc reg/mem
6. dec reg/mem
7. push reg/mem
8. pop reg
9. lea reg,mem
10. jmp/call/jcc near
11. nop
12. test reg, reg/mem
13. test acc, imm
Instruction Issue Algorithm
• Decode the two consecutive instructions I1 and I2
• If the following are all true
– I1 and I2 are simple instructions
– I1 is not a jump instruction
– Destination of I1 is not a source of I2
– Destination of I1 is not a destination of I2
• Then issue I1  to u pipeline and  I2 to v pipeline
• Else issue I1 to u pipeline
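The issue algorithm above can be sketched in Python. The `Instr` record and the `issue` function are hypothetical names introduced for illustration; the sketch models only the four conditions listed and none of the other pairing restrictions a real Pentium applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    name: str
    simple: bool = True
    is_jump: bool = False
    dests: frozenset = frozenset()   # registers written
    srcs: frozenset = frozenset()    # registers read


def issue(i1: Instr, i2: Instr) -> dict:
    """Return which instruction goes to the u pipe and which (if any) to the v pipe."""
    can_pair = (
        i1.simple and i2.simple          # both instructions are simple
        and not i1.is_jump               # I1 is not a jump
        and not (i1.dests & i2.srcs)     # no read-after-write dependency
        and not (i1.dests & i2.dests)    # no write-after-write dependency
    )
    return {"u": i1.name, "v": i2.name if can_pair else None}


if __name__ == "__main__":
    a = Instr("R2 <- R1 + R3", dests=frozenset({"R2"}), srcs=frozenset({"R1", "R3"}))
    b = Instr("R4 <- R2 + R3", dests=frozenset({"R4"}), srcs=frozenset({"R2", "R3"}))
    c = Instr("R5 <- R6 + R7", dests=frozenset({"R5"}), srcs=frozenset({"R6", "R7"}))
    print(issue(a, b))  # RAW dependency on R2
    print(issue(a, c))  # independent instructions
```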
PIPELINE STAGES:

• Prefetch. During Prefetch, the next instruction to be executed is copied 
from cache memory to the CPU.
• Instruction Decode, Part 1 
• Instruction Decode, Part 2 
• Execution.
• Write Back. Registers and memory locations are updated.
• The integer pipeline stages are as follows:
1. Prefetch (PF):
– Instructions are prefetched from the on-chip instruction cache or memory.
2. Decode1 (D1):
– Two parallel decoders attempt to decode and issue the next two sequential instructions.
– It determines whether the current pair of instructions can execute together.
3. Decode2 (D2):
– Decodes the control word.
– Addresses of memory-resident operands are calculated.
4. Execute (EX):
– The instruction is executed in the ALU.
– The data cache is accessed at this stage.
– Instructions that require both an ALU operation and a data cache access need more than one clock.
5. Writeback (WB):
– The CPU stores the result and updates the flags.
Figure: pipeline timing diagram over clock cycles C1 to C9.
Pipeline Stalls:
• When paired instructions reach the EX stage, it is possible that one or the other will stall and require
additional cycles to execute. A pipeline stall lowers performance, since no work is done during the stall.
• Instructions stall for various reasons, most notably when their operands are not available in the data
cache.
• If the instruction in the U pipeline stalls, the V pipeline does the same.
• If the V pipeline stalls, the instruction in the U pipeline may continue executing. Both instructions
must progress to the WB stage before another pair may enter the EX stage.
Branch Prediction Logic
Flushing of pipeline problem
• The performance gain through pipelining can be reduced
by the presence of program transfer instructions
(such as JMP, CALL, RET and conditional jumps).
• They change the instruction sequence, making all the instructions
that entered the pipeline after the program transfer
instruction invalid.
• Suppose instruction I3 is a conditional jump to I50 at
some other address (the target address). Then the
instructions that entered after I3 are invalid, and a new
sequence beginning with I50 needs to be loaded.
• This causes bubbles in the pipeline, where no work is
done as the pipeline stages are reloaded.
• To avoid this problem, the Pentium uses a scheme
called Dynamic Branch Prediction.
• In this scheme, a prediction is made concerning the
branch instruction currently in the pipeline.
• The prediction will be either taken or not taken.
• If the prediction turns out to be true, the pipeline is
not flushed and no clock cycles are lost.
• If the prediction turns out to be false, the pipeline is flushed and
restarted with the correct instruction.
• This results in a 3-cycle penalty if the branch is executed in the
u-pipeline and a 4-cycle penalty in the v-pipeline.
Dynamic Branch Prediction Mechanism
• It is implemented using a 4-way set associative cache with 256
entries. This is referred to as the Branch Target Buffer (BTB).
• The directory entry for each line contains the following
information:
– Valid bit: indicates whether or not the entry is in use
– History bits: track how often the branch has been taken
– Source memory address that the branch instruction was fetched from
(address of I3)
• If its directory entry is valid, the target address of the branch is stored in the
corresponding data entry in the BTB.
• The first time that a branch instruction enters either pipeline, the BTB
uses its source memory address to perform a lookup in the cache.
• Since the instruction has not been seen before, this results in a BTB
miss.
• This means the prediction logic has no history on the
instruction.
• It then predicts that the branch will not be taken, and
program flow is not altered.
• Even unconditional jumps will be predicted as not
taken the first time that they are seen by the BTB.
• When the instruction reaches the execution stage, the branch
will be either taken or not taken.
• If taken, the next instruction to be executed should be the one
fetched from the branch target address.
• If not taken, the next instruction is the one at the next sequential memory
address.
• When the branch is taken for the first time, the execution unit
provides feedback to the branch prediction logic.
• The branch target address is sent back and recorded in the BTB.
• A directory entry is made containing the source memory
address, with the history bits set to strongly taken.
Figure: state diagram of the four history states: Strongly Taken, Weakly Taken, Weakly Not Taken and Strongly Not Taken.
History Bits | Resulting Description | Prediction Made  | If branch is taken           | If branch is not taken
11           | Strongly Taken        | Branch Taken     | Remains Strongly Taken       | Downgrades to Weakly Taken
10           | Weakly Taken          | Branch Taken     | Upgrades to Strongly Taken   | Downgrades to Weakly Not Taken
01           | Weakly Not Taken      | Branch Not Taken | Upgrades to Weakly Taken     | Downgrades to Strongly Not Taken
00           | Strongly Not Taken    | Branch Not Taken | Upgrades to Weakly Not Taken | Remains Strongly Not Taken
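The history-bit table behaves like a 2-bit saturating counter. A minimal Python sketch of that behaviour follows; the names `STATE_NAMES`, `predict` and `update` are illustrative, and the BTB lookup itself is not modelled.

```python
STATE_NAMES = {
    0b11: "Strongly Taken",
    0b10: "Weakly Taken",
    0b01: "Weakly Not Taken",
    0b00: "Strongly Not Taken",
}


def predict(history: int) -> bool:
    """Predict taken when the history bits are 10 or 11."""
    return history >= 0b10


def update(history: int, taken: bool) -> int:
    """Move one step toward 11 if taken, one step toward 00 if not taken."""
    return min(history + 1, 0b11) if taken else max(history - 1, 0b00)


if __name__ == "__main__":
    history = 0b11  # a new entry is created as strongly taken
    for outcome in (True, False, False, True):
        print(f"{STATE_NAMES[history]:18s} predict taken={predict(history)} actual taken={outcome}")
        history = update(history, outcome)
    print(STATE_NAMES[history])
```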
FLOATING POINT UNIT(FPU)
Floating-Point Pipeline
• The floating point pipeline has 8 stages as follows:
1. Prefetch (PF):
– Instructions are prefetched from the on-chip instruction cache.
2. Instruction Decode (D1):
– Two parallel decoders attempt to decode and issue the next two sequential instructions.
– It decodes the instruction to generate a control word.
3. Address Generate (D2):
– Decodes the control word.
– Addresses of memory-resident operands are calculated.
4. Memory and Register Read (Execution Stage) (EX):
– Register read, memory read or memory write is performed as required by the instruction to access an operand.
5. Floating Point Execution Stage 1 (X1):
– Information from a register or memory is written into an FP register.
– Data is converted to floating point format before being loaded into the floating point unit.
6. Floating Point Execution Stage 2 (X2):
– The floating point operation is performed within the floating point unit.
7. Write FP Result (WF):
– Floating point results are rounded and the result is written to the target floating point register.
8. Error Reporting (ER):
– If an error is detected, an error reporting stage is entered where the error is reported and the FPU status word is updated.
Instruction Issue for Floating
Point Unit
• The rules of how floating-point (FP) instructions get issued 
on the Pentium processor are :
1. FP instructions do not get paired with integer instructions.
2. When a pair of FP instructions is issued to the FPU, only
the FXCH instruction can be the second instruction of the 
pair. 
The first instruction of the pair must be one of the set F, where F =
{FLD, FADD, FSUB, FMUL, FDIV, FCOM, FUCOM, FTST, FABS,
FCHS}.
3. FP instructions other than FXCH and instructions
belonging to set F, always get issued singly to the FPU.
4. FP instructions that are not directly followed by an FXCH
instruction are issued singly to the FPU.
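The pairing rules above can be expressed as a small check. This is a sketch that assumes the input is a list of floating-point mnemonics only (rule 1 already excludes integer instructions); the function names are hypothetical.

```python
# Set F from rule 2: instructions allowed as the first of an FP pair.
SET_F = {"FLD", "FADD", "FSUB", "FMUL", "FDIV",
         "FCOM", "FUCOM", "FTST", "FABS", "FCHS"}

def can_pair(first, second):
    """Rule 2: the pair is (member of F, FXCH); anything else issues singly."""
    return first in SET_F and second == "FXCH"

def issue_groups(instructions):
    """Split an FP instruction stream into issue groups of one or two."""
    groups = []
    i = 0
    while i < len(instructions):
        if i + 1 < len(instructions) and can_pair(instructions[i], instructions[i + 1]):
            groups.append((instructions[i], instructions[i + 1]))
            i += 2
        else:
            groups.append((instructions[i],))
            i += 1
    return groups
```

Under this model `["FADD", "FXCH", "FSQRT", "FMUL"]` issues as one pair followed by two single instructions.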
FPU Register File
[Figure: floating-point registers ST(0) to ST(7), 80 bits wide, with read ports 1 and 2, write ports 1 and 2, the EX, X1 and WF stages, and the bypass paths Bypass1 and Bypass2]
PAGING
Paging
• The Pentium supports translation of virtual (linear) addresses into physical addresses 
through the use of special tables that map portions of the virtual address into actual 
physical memory locations. 

• Physical memory is divided into fixed-size page frames of 4KB each. 

• Paging is controlled by three flags in the processor’s control registers:
• PG (paging) flag, bit 31 of CR0. Paging is enabled by setting PG = 1 (required for multitasking in virtual 8086 mode).
• PSE (page size extensions) flag, bit 4 of CR4 (when set, the page size is 2MB or 4MB).
• PAE (physical address extension) flag, bit 5 of CR4.
• The Pentium has no mode bit to disable segmentation.
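As a small illustration, the three flags can be read out of the register values with bit masks. The function name and the sample values are hypothetical; the bit positions are the ones given above (PG is bit 31 of CR0).

```python
CR0_PG = 1 << 31   # paging enable, bit 31 of CR0
CR4_PSE = 1 << 4   # page size extensions, bit 4 of CR4
CR4_PAE = 1 << 5   # physical address extension, bit 5 of CR4

def paging_flags(cr0, cr4):
    """Return the state of the three paging-control flags as booleans."""
    return {
        "PG": bool(cr0 & CR0_PG),
        "PSE": bool(cr4 & CR4_PSE),
        "PAE": bool(cr4 & CR4_PAE),
    }
```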
Paging

• Page directory—An array of 32-bit page-directory entries contained in a 4-KByte page. Up to 1024 page-
directory entries can be held in a page directory.
• Page table—An array of 32-bit page-table entries contained in a 4-KByte page. Up to 1024 page-table 
entries can be held in a page table. (Page tables are not used for 2-MByte or 4-MByte pages. These page 
sizes are mapped directly from one or more page directories.)
• Page—A 4-KByte, 2-MByte, or 4-MByte flat address space.
Paging
32-bit virtual (linear) addresses generated by a running task select entries in the
system's page directory and page table, which translate the upper 20 bits of the virtual
address into the actual physical address where a page frame is located.
The lower 12 bits of the virtual address are not translated and point to one of 4,096
byte locations within a page frame.
• How is a 32-bit virtual address translated into a physical address?
• The upper 10 bits of the virtual address select one of 1,024 entries in the page directory.
• The base address of the page directory is stored in the page directory base register (PDBR).
• Each entry in the page directory is 4 bytes wide and contains the base address of a page table.
• The next 10 bits from the virtual address select one of 1,024 entries in the page table pointed to by 
the page directory entry. 
• This entry is also 4 bytes wide and contains the base address of the actual physical memory page 
frame. 
• This address is combined with the lower 12 bits of the virtual address to access the desired location 
in memory.
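The translation steps above can be sketched as follows for 4-KByte pages. This is a simplified model: `read_dword` is a hypothetical callback that returns the 32-bit value stored at a physical address, and the present and protection bits are ignored.

```python
def split_linear(addr):
    """Split a 32-bit linear address into its 10 + 10 + 12 bit fields."""
    directory_index = (addr >> 22) & 0x3FF   # upper 10 bits
    table_index = (addr >> 12) & 0x3FF       # next 10 bits
    offset = addr & 0xFFF                    # lower 12 bits, not translated
    return directory_index, table_index, offset

def translate(addr, pdbr, read_dword):
    """Walk the page directory and page table to form the physical address."""
    directory_index, table_index, offset = split_linear(addr)
    # Each entry is 4 bytes wide; bits 31-12 of an entry hold a base address.
    pde = read_dword((pdbr & 0xFFFFF000) + 4 * directory_index)
    pte = read_dword((pde & 0xFFFFF000) + 4 * table_index)
    return (pte & 0xFFFFF000) | offset
```

With a page directory at 0x1000 whose entry 1 points to a page table at 0x2000, and that table's entry 2 pointing to the frame at 0x5000, the linear address 0x402345 maps to physical address 0x5345.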
Paging
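[Figure: two-level address translation; the lower 12 bits of the virtual address form the displacement (offset) within the page frame]

PDE and PTE format: bits 31-12 hold the page table or page frame address; bits 11-0 hold control and status flags.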


Paging
Translation lookaside buffers(TLBs)
• To improve performance, the internal instruction and data caches
of the Pentium contain small, special caches called TLBs that
automatically translate the upper 20 bits of the virtual address into the
upper 20 bits of the physical address.
• When the translation is found in a TLB, it requires only one clock cycle to process.
• TLBs contain only the addresses of the most recently used pages.
• If the required translation is not available in a TLB, the processor
reads the page directory and page table entries from RAM and stores the translation in the TLB.
• Prior to doing this, it may be necessary to invalidate existing contents of the
TLB.
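A TLB can be modeled as a small cache from virtual page numbers to physical frame numbers. The sketch below is only an illustration: the capacity, the least-recently-used replacement policy, and the `walk` callback (standing in for the page directory and page table lookup) are assumptions, not details of the Pentium's TLBs.

```python
from collections import OrderedDict

class TLB:
    """Toy translation lookaside buffer with LRU replacement (assumed policy)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> physical frame number

    def lookup(self, linear, walk):
        """Return (physical address, hit flag) for a 32-bit linear address."""
        page, offset = linear >> 12, linear & 0xFFF
        if page in self.entries:
            self.entries.move_to_end(page)        # mark as most recently used
            hit = True
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # drop the least recently used entry
            self.entries[page] = walk(page)       # slow path: consult the page tables
            hit = False
        return (self.entries[page] << 12) | offset, hit
```

The first access to a page misses and performs the table walk; later accesses to the same page hit until the entry is displaced.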
PDE:
Page table address (31-12) | Avail. | 0 | 0 | 0 | A | PCD | PWT | U | W | P

PTE:
Page frame address (31-12) | Avail. | 0 | 0 | D | A | PCD | PWT | U | W | P
• D-Dirty: It is set if a write has been performed to the page pointed to by the
PTE.
• A-Accessed: It is set if a read or write was performed to the page selected
by the PTE and PDE.
• PCD-Cache disable: This bit determines whether caching is disabled for the
page being accessed.
• PWT-Writethrough: This bit enables writethrough operations between
cache and memory.
• U-User: This bit selects user-level or supervisor-level access and is used in protection checks.
• W-Writable: This bit determines whether the page may be written to and is
also used in protection checks.
• P-Present: This bit indicates whether the page is actually stored in memory. If it is
clear, an access to the page causes a page fault (interrupt 14) so that the page can be brought into memory.
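The flag bits can be decoded from a 32-bit entry as sketched below. The bit positions follow the standard layout (P in bit 0 up to D in bit 6); the function name is hypothetical.

```python
# Bit position of each control/status flag in a page-table entry.
ENTRY_FLAGS = {"P": 0, "W": 1, "U": 2, "PWT": 3, "PCD": 4, "A": 5, "D": 6}

def decode_entry(entry):
    """Return (base address from bits 31-12, dict of flag states)."""
    flags = {name: bool((entry >> bit) & 1) for name, bit in ENTRY_FLAGS.items()}
    return entry & 0xFFFFF000, flags
```

For a PDE the D position is always 0, so the same function can be used for both entry types.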
Paging
Summary….

• Page translation allows the physical memory used by a system to be much 
smaller than the linear addressing space. 
• For instance, the Pentium’s 4GB linear addressing space may be mapped to a 
physical memory of only 512MB. 
• The pages used by a program do not need to be stored consecutively. 
• A program’s code and data may be spread out all over physical memory, and 
even moved around (with help from the hard disk) while the program is 
executing! 
• This helps to explain why the linear addresses are also called virtual addresses, 
since they have no relation to the actual physical memory address used, except 
for the lower 12 bits.
MULTITASKING
Multitasking VS Multithreading
• Tasks are like jobs, so multitasking means doing multiple jobs at the
same time.
• Threads run within a process or task, so multithreading means many
subtasks being done within a main task.
• For example, using Microsoft Word and PowerPoint together is multitasking, while
typing with the grammar and spell check active means you are running
2 threads within Microsoft Word.
MULTITASKING
• Ability to support execution of multiple programs (tasks)
simultaneously.
• Actually only one program is running at any point in time, but the ability to
switch from task to task at very high speed gives the impression of
simultaneous execution.
The processor defines four data structures for handling task-related
activities:

Task state segment (TSS).
TSS descriptor.
Task register.
Task gate descriptor.
• Each task executes for a period of time called a TIME SLICE.
• A TASK SWITCH is used to switch from one task to another. Rapidly
switching from task to task gives the impression that all tasks
are running at the same time.
1. Task State Segment:
During a task switch, the contents of all processor registers, as well
as other state information, are saved for the task being suspended, and new
information is loaded for the next task.
This information is not saved on the stack, but in a special
memory structure called the TASK STATE SEGMENT (TSS).
It contains storage areas for all of the Pentium's registers, segment
selectors, and stack pointers.
• When a task is created, the task’s
LDTR, PDBR, privilege-level stack pointers,
T bit, and I/O permission bit map base are
filled in.

• During a task switch, these items
are not altered. Only the register
portion, EIP to GS, is modified
during task switching.
2. TSS descriptor
Defines the various characteristics that the segment exhibits. The TSS utilizes this descriptor.
• B (busy) – the task is currently running or waiting to run.
• P (present) – indicates whether the segment is in memory (the task may be suspended if a page fault
occurs).
• G (granularity) – determines how the limit field is interpreted.
        Clear: segment size from 1 byte to 1MB.
        Set: segment size from 4KB to 4GB (in chunks of 4KB).
• AVL – available for use by system software.
• DPL – indicates the privilege level of the segment and is used in protection
checks.
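The effect of the G bit on the 20-bit limit field can be worked out as below. This is a minimal sketch; the function name is hypothetical.

```python
def segment_size(limit, granularity):
    """Segment size in bytes for a 20-bit limit field and the G bit."""
    limit &= 0xFFFFF               # the limit field is 20 bits wide
    if granularity:
        return (limit + 1) * 4096  # G set: limit counts 4KB chunks
    return limit + 1               # G clear: limit counts bytes
```

This reproduces the ranges in the list: 1 byte to 1MB with G clear, and 4KB to 4GB with G set.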
3. Task Register (TR)

1. The task register holds the 16-bit segment selector, 32-bit base address, 32-bit
segment limit, and descriptor attributes for the TSS of the current task.
2. The TSS actually in use is accessed through TR (using the STR and LTR instructions).

A TSS descriptor may only be placed in the
GDT (global descriptor table). When
multiple TSS descriptors are stored in the GDT, the one currently
in use is accessed through TR.
• The task register may be loaded with a new TSS selector with the
LTR (Load Task Register) instruction. LTR requires a 16-bit register or
memory operand and may only be executed in protected mode.
4. Task Gate Descriptor
1. A task switch may result in a privilege violation if the new task has a
lower priority than the currently executing task. A task gate provides a way
to facilitate task switching.
2. A task gate descriptor provides an indirect, protected reference to a
task. A task gate descriptor can be placed in the GDT, an LDT, or the IDT.
3. It allows a single busy bit to be used for a segment (contained in the TSS
descriptor).
4. In this way it safeguards the processor while facilitating
multitasking, using the DPL and busy bit.
 
TASK SWITCH
• The following steps take place during a task switch:
• The new TSS descriptor or task gate must have sufficient privilege
to allow a task switch.
• The new TSS descriptor must have its present bit set.
• The state of the current task is saved.
• The task register is loaded with the selector of the new TSS
descriptor.
• The state of the new task is loaded from its TSS and execution is
resumed.
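The steps above can be modeled in a few lines. This is a deliberately simplified sketch: the class names, the register dictionary, and the Python exceptions standing in for processor exceptions are all assumptions for illustration, and the privilege check is reduced to comparing the current privilege level (CPL) with the descriptor's DPL.

```python
from dataclasses import dataclass, field

@dataclass
class TSS:
    registers: dict = field(default_factory=dict)  # saved register state

@dataclass
class TSSDescriptor:
    selector: int
    tss: TSS
    dpl: int = 0
    present: bool = True

@dataclass
class CPU:
    registers: dict
    task_register: TSSDescriptor
    cpl: int = 0

def task_switch(cpu, new):
    """Apply the listed steps in order; raise if a check fails."""
    if cpu.cpl > new.dpl:                      # step 1: sufficient privilege
        raise PermissionError("insufficient privilege for task switch")
    if not new.present:                        # step 2: present bit must be set
        raise LookupError("TSS not present")
    cpu.task_register.tss.registers = dict(cpu.registers)  # step 3: save old state
    cpu.task_register = new                    # step 4: load the task register
    cpu.registers = dict(new.tss.registers)    # step 5: load new state and resume
```

Switching from task A to task B leaves A's register values in A's TSS, so a later switch back to A resumes where it stopped.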
TASK ADDRESSING SPACE

• If paging is not enabled, the linear addresses generated by a task are 
the same as the physical addresses sent to the memory system.
• When paging is enabled it is possible for each task to have its own 
separate, protected addressing space, through the use of PDBR(Page 
Directory Base Register) stored in TSS.
INTERRUPTS AND EXCEPTIONS
• Interrupts typically occur at random times during the execution of a program, 
in response to signals from hardware. They are used to handle events 
external to the processor, such as requests to service peripheral devices. 
• Software can also generate interrupts by executing the INT n instruction.
• Exceptions occur when the processor detects an error condition while 
executing an instruction, such as division by zero. 
• The processor detects a variety of error conditions including protection 
violations, page faults, and internal machine faults.
• When an interrupt is received or an exception is detected, the 
currently running procedure or task is automatically suspended while 
the processor executes an interrupt or exception handler.
• When execution of the handler is complete, the processor resumes 
execution of the interrupted procedure or task. The resumption of the 
interrupted procedure or task happens without loss of program 
continuity
INTERRUPTS
• Non-maskable interrupts (NMIs). These interrupts are received on
the processor’s NMI input pin. The processor does not provide a
mechanism to prevent non-maskable interrupts.
• Maskable interrupts. These interrupts are received either at the
processor's INTR (interrupt) pin from an external, system-based
interrupt controller (8259A) or as a serial message on the LINT[1:0]
pins from a system-based I/O APIC. The processor does not act on
maskable interrupts unless the IF (interrupt-enable) flag in the EFLAGS
register is set.
• Software-generated interrupts. These are generated by the INT n
instruction. The processor does not provide a mechanism for masking
interrupts generated in this manner.
EXCEPTIONS
• Processor-detected exceptions. These are generated when the 
processor detects program and machine errors. They are further 
classified as faults, traps, and aborts.
• Software-generated exceptions. The INTO, INT3, BOUND, and INT n
instructions generate exceptions. (The INT n instruction generates an
exception when an exception vector number is used as its operand.)
• The processor associates an identification number, called a vector, 
with each interrupt and exception.
• The NMI interrupt and the exceptions are assigned vectors in the 
range 0 through 31. Not all of these vectors are currently used by the 
processor. Unassigned vectors in this range are reserved for possible 
future uses.
• The vectors in the range 32 to 255 are provided for maskable 
interrupts, generated either by asserting the INTR pin or by sending 
interrupt messages over the APIC bus. (Advanced Programmable 
interrupt controller)
• External interrupt controllers (such as Intel's 8259A Programmable 
Interrupt Controller) deliver one of these vectors to the processor on 
the system bus during its interrupt-acknowledge cycle.
INTERRUPT DESCRIPTOR TABLE
(IDT)
• Real mode uses a 1KB Interrupt Vector Table (IVT) beginning at address
00000H. Each 4-byte entry in the IVT holds the segment and offset of an interrupt handler.
• Protected mode relies on an Interrupt Descriptor Table (IDT) to support
interrupts and exceptions.
• The IDT comprises 8-byte gate descriptors for task, trap, or interrupt gates. The
IDT has a maximum size of 256 descriptors. The size of the IDT is controlled by
a 16-bit limit value stored in the Interrupt Descriptor Table Register (IDTR).
• The IDTR is a 48-bit register that contains the 32-bit base address of the IDT and
the 16-bit size limit.
• The IDT can be placed anywhere in physical memory.
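Locating an entry for a given vector is simple arithmetic in both modes. The sketch below assumes the 48-bit IDT register value is packed with the 16-bit limit in the low bits and the 32-bit base above it (the layout of the LIDT memory operand); the function names are hypothetical, and the Python `IndexError` merely stands in for the processor's response to a limit violation.

```python
def ivt_entry_address(vector):
    """Real mode: 4-byte entries starting at address 00000H."""
    return 4 * vector

def unpack_idtr(idtr):
    """Split a 48-bit IDT register value into (base, limit)."""
    limit = idtr & 0xFFFF
    base = (idtr >> 16) & 0xFFFFFFFF
    return base, limit

def idt_entry_address(idtr, vector):
    """Protected mode: 8-byte gate descriptors, checked against the limit."""
    base, limit = unpack_idtr(idtr)
    offset = 8 * vector
    if offset + 7 > limit:   # the whole descriptor must lie within the limit
        raise IndexError("vector lies beyond the IDT limit")
    return base + offset
```

A full table of 256 descriptors corresponds to a limit of 256 x 8 - 1 = 7FFH.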
IDT DESCRIPTORS
The IDT may contain any of three kinds of gate descriptors:
• Task gate descriptor
• Interrupt gate descriptor
• Trap gate descriptor
• The P-bit in each descriptor stands for present, and indicates whether 
the segment is present in memory.
• The DPL field specifies the descriptor privilege level.
• When fewer interrupts/exceptions are required, the limit field of the 
IDTR is used to specify the addressable limit within the IDT. The 
Pentium will enter shutdown mode if the limit is exceeded.
Interrupt 0—Divide Error Exception
• Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the 
result cannot be represented in the number of bits specified for the 
destination operand.
Interrupt 1—Debug Exception
• Indicates that one or more of several debug-exception conditions has been 
detected. Whether the exception is a fault or a trap depends on the 
condition
• Trap or Fault. The exception handler can distinguish between traps or 
faults by examining the contents of the DR6 register and other debug 
registers.
Interrupt 2—NMI Interrupt
• The non-maskable interrupt (NMI) is generated externally by asserting the 
processor’s NMI pin. This interrupt causes the NMI interrupt handler to be 
called.
Interrupt 3—Breakpoint Exception
• Indicates that a breakpoint instruction (INT3) was executed, causing a 
breakpoint trap to be generated. Typically, a debugger sets a breakpoint by 
replacing the first opcode byte of an instruction with the opcode for the 
INT3 instruction.
• The breakpoint handler is responsible for restoring the original byte of the
modified instruction.
Interrupt 4—Overflow Exception
Indicates that an overflow trap occurred when an INTO instruction was 
executed. If the OF flag is set, an overflow trap is generated.
Interrupt 5—BOUND Range Exceeded Exception
Indicates that a BOUND-range-exceeded fault occurred when a BOUND 
instruction was executed. It detects the array subscript out of range.
Interrupt 6—Invalid Opcode Exception
Attempted to execute an invalid or reserved opcode.
Interrupt 7—Device Not Available Exception
On earlier 80x86 machines, this exception was used to indicate that there
was no external floating point coprocessor interfaced to the CPU.
Interrupt 8—Double Fault Exception
• Indicates that the processor detected a second exception while calling an 
exception handler for a prior exception.
Interrupt 9—CoProcessor Segment Overrun
• On earlier processors this signaled a page or segment violation during a
coprocessor operand transfer; it is not generated by the Pentium.
Interrupt 10—Invalid TSS Exception
Indicates that a task switch was attempted that referenced an invalid TSS.
Interrupt 11—Segment Not Present
Indicates that the present flag of a segment or gate descriptor is clear. It 
indicates segment is not present in memory.
Interrupt 12—Stack Fault Exception
• A limit violation is detected during an operation that refers to the SS 
register. Operations that can cause a limit violation include stack-oriented 
instructions
Interrupt 13—General Protection Exception
• Indicates that the processor detected one of a class of protection 
violations called “general protection violations.”
Violations like
• Exceeding the segment limit when accessing the CS, DS, ES, FS, or GS 
segments.
• Writing to a code segment or a read-only data segment.
• Reading from an execute-only code segment.
Interrupt 14—Page Fault Exception
It occurs when processor attempts to access a page that is not in memory
Interrupt 16—Floating-Point Error Exception
Indicates that the FPU has detected a floating-point-error exception.
Interrupt 17—Alignment Check Exception
• Indicates that the processor detected an unaligned memory operand 
when alignment checking was enabled.
Interrupt 18—Machine Check Exception
Indicates that the processor detected an internal machine error.
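The vector assignments listed above can be collected into a lookup table. This is a sketch for reference only: the function name is hypothetical, vector 15 and the other unlisted vectors below 32 are reported as reserved, and vectors 32 to 255 are labeled according to the vector ranges described earlier in this section.

```python
# Vector numbers and names as listed in this section.
EXCEPTIONS = {
    0: "Divide Error",
    1: "Debug",
    2: "NMI Interrupt",
    3: "Breakpoint",
    4: "Overflow",
    5: "BOUND Range Exceeded",
    6: "Invalid Opcode",
    7: "Device Not Available",
    8: "Double Fault",
    9: "CoProcessor Segment Overrun",
    10: "Invalid TSS",
    11: "Segment Not Present",
    12: "Stack Fault",
    13: "General Protection",
    14: "Page Fault",
    16: "Floating-Point Error",
    17: "Alignment Check",
    18: "Machine Check",
}

def describe_vector(vector):
    """Name an interrupt/exception vector in the range 0..255."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0..255")
    if vector in EXCEPTIONS:
        return EXCEPTIONS[vector]
    if vector <= 31:
        return "Reserved"
    return "Maskable interrupt"
```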
