
Intel 8086/8088 Microprocessors

Intel 8086 and 8088 Microprocessors are the basis of all IBM-PC compatible computers
(8086 introduced in 1978, first IBM-PC released in 1981)

All Intel, AMD and other advanced microprocessors are based on and are compatible with the original 8086/8
At Power Up and Reset time, Pentiums, Athlons etc. all look like 8086 processors
06/03/2005 ET4508_p2 (KR) 1

Intel 8086/8088 Microprocessors




Intel 8086 is a 16b microprocessor:
    16b data registers, 16b ALU
Width of external data bus:
    8086: 16b
    8088: 8b
Width of external address bus: 16b+4b=20b
Some techniques to optimise the CPU performance when it's executing programs
Segment:Offset memory model
Little-Endian Data Format
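The little-endian data format listed above means that the least significant byte of a multi-byte value sits at the lowest memory address. A minimal sketch in Python; the value 0x1234 and the function name are illustrative assumptions, not taken from the slides.

```python
def to_little_endian_word(value: int) -> bytes:
    """Lay out a 16-bit value in memory order, low byte first."""
    return value.to_bytes(2, "little")

word = to_little_endian_word(0x1234)
# The byte at the lower address is the low byte (0x34),
# the byte at the next address is the high byte (0x12).
print([hex(b) for b in word])
```

Reading the two bytes back with `int.from_bytes(word, "little")` recovers the original value.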

8086/8088 (1)
 

Original IBM PC used the 8088 microprocessor
8088 is similar to the 8086, but it has an external 8b data bus and only a 4B-deep queue
    For cost reduction reasons
We can consider 8086 and 8088 together
PC clones often used the 8086 for better performance
8-bit bus reduces performance, but meant cheaper computers

8086/8088 (2)
  

Remember the Fetch-Decode-Execute cycle?
Fetching from EXTERNAL MEMORY is SLOW
The 8086/8 used an instruction queue to speed up performance
While the processor is decoding and executing an instruction, its bus interface can be reading new instructions, since at that time the bus is not otherwise in use
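The benefit of overlapping fetch with execution can be estimated with a very rough model. The sketch below assumes every instruction needs the same number of clocks to fetch and to execute, and it ignores branches, queue depth and operand accesses that compete for the bus. The cycle counts are illustrative assumptions only.

```python
def cycles_without_queue(n: int, fetch: int, execute: int) -> int:
    """Each instruction is fetched, then executed; nothing overlaps."""
    return n * (fetch + execute)

def cycles_with_queue(n: int, fetch: int, execute: int) -> int:
    """Idealised overlap of fetching and executing.

    The first fetch cannot be hidden and the last instruction still has
    to execute; in between, progress is limited by the slower activity.
    """
    return fetch + (n - 1) * max(fetch, execute) + execute

print(cycles_without_queue(10, 4, 4))  # 80
print(cycles_with_queue(10, 4, 4))     # 44
```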

8086/8088 Functional Units

Execution Unit (EU)

Bus Interface Unit (BIU): fetches opcodes, reads operands, writes data

8086/8088 MPU

8086/8088 (3)


8086/8088 consists of two internal units
    The execution unit (EU) - executes the instructions
    The bus interface unit (BIU) - fetches instructions, reads operands and writes results
The 8086 has a 6B prefetch queue
The 8088 has a 4B prefetch queue

8086/8088 Internal Organisation


[Block diagram: 8086/8088 internal organisation]
EU: general registers AH/AL, BH/BL, CH/CL, DH/DL; SP, BP, DI; temporary registers; ALU; EU control; flags
BIU: segment registers CS, DS, SS, ES; internal communications registers; address summation; instruction queue; bus control
External buses: address bus (20 bits), data bus (8088 bus)

BIU Elements


Instruction Queue: the next instructions or data can be fetched from memory while the processor is executing the current instruction
    The memory interface is slower than the processor execution time, so this speeds up overall performance
Segment Registers:
    CS, DS, SS and ES are 16b registers
    Used with the 16b Base registers to generate the 20b address
    Allow the 8086/8088 to address 1MB of memory
    Changed under program control to point to different segments as a program executes
Instruction Pointer (IP) contains the Offset Address of the next instruction: the distance in bytes from the segment base address given by the current CS register

8086/8088 20-bit Addresses


CS: 16-bit Segment Base Address, with 0000 (four zero bits) appended
+ IP: 16-bit Offset Address
= 20-bit Physical Address
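The address calculation in the diagram above can be sketched in a few lines of Python. The function name and the example values are illustrative assumptions. Masking to 20 bits models the 20 address lines, so a sum past 1MB wraps around.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    # Multiplying by 16 appends the four zero bits to the segment value.
    # The mask keeps only the 20 bits that reach the address bus.
    return (segment * 16 + offset) & 0xFFFFF

print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

Different Segment:Offset pairs can name the same physical address; for example, F000h:FFF0h and FFFFh:0000h both give FFFF0h.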

Exercise: 20-bit Addressing


1. CS contains [...], IP contains CE24h. What is the resulting physical address?
2. CS contains 5[...]h, IP contains [...]24h. What is the resulting physical address?

8086/8 In Circuit (1)




8086/8 microprocessors need support circuits in a microcomputer system
8086/8 multiplex the address and data buses on the same pins
This saves pins but at a price:
    Demultiplexing logic is needed to build up separate address and data buses to interface [...]s and [...]s with

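8086 pin assignments (40-pin package)

Pin  Signal    Pin  Minimum mode   Maximum mode
 1   GND       40   Vcc            Vcc
 2   AD14      39   AD15           AD15
 3   AD13      38   A16,S3         A16,S3
 4   AD12      37   A17,S4         A17,S4
 5   AD11      36   A18,S5         A18,S5
 6   AD10      35   A19,S6         A19,S6
 7   AD9       34   /BHE,S7        /BHE,S7
 8   AD8       33   MN,/MX         MN,/MX
 9   AD7       32   /RD            /RD
10   AD6       31   HOLD           /RQ,/GT0
11   AD5       30   HLDA           /RQ,/GT1
12   AD4       29   /WR            /LOCK
13   AD3       28   IO/M           /S2
14   AD2       27   DT/R           /S1
15   AD1       26   /DEN           /S0
16   AD0       25   ALE            QS0
17   NMI       24   /INTA          QS1
18   INTR      23   /TEST          /TEST
19   CLK       22   READY          READY
20   GND       21   RESET          RESET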

[Pin diagrams: 8086 and 8088 side by side, with MINIMUM MODE and MAXIMUM MODE signal names]

8086/8 In Circuit (2)




In Maximum Mode the 8086/8 needs at least the following:
    8288 Bus Controller, 8284A Clock Generator, 74HC373s and 74HC245s
With the aid of these devices the 8086 begins to look like the ideal microprocessor we looked at earlier

i8086 Circuit - Maximum Mode


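[Circuit diagram: i8086 in maximum mode]
8284A Clock Generator: RDY input; supplies CLK, READY and RESET to the 8086 CPU
8086 CPU: MN/MX#, INTR, BHE#, AD15:AD0, A19:A16; status outputs S0#, S1#, S2# go to the 8288
8288 Bus Controller: CLK, S0#, S1#, S2# in; MRDC#, MWTC#, AMWC#, IORC#, IOWC#, AIOWC#, INTA#, DEN, DT/R# and ALE out
74LS373 x3 (LE, OE#): latch the multiplexed ADDR/DATA lines to give A19:A0, BHE#
74LS245 x2 (DIR, EN#): buffer the multiplexed ADDR/DATA lines to give D15:D0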

8086/8 Maximum Mode




In maximum mode, the 8288 uses a set of status signals (S0, S1, S2) to rebuild the normal bus control signals of the microprocessor
    MRDC#, MWTC#, IORC#, IOWC# etc.
    Equivalent to MEMR# etc.
Look at some special signals briefly
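How the 8288 turns the three status lines into command signals can be sketched as a lookup table. The table below follows the usual 8086 maximum-mode status encoding, written in S2, S1, S0 order; it should be checked against the 8288 data sheet before being relied on. The dictionary and function names are illustrative assumptions.

```python
# (S2#, S1#, S0#) read as a 3-bit number -> (bus cycle, 8288 command output)
BUS_CYCLES = {
    0b000: ("Interrupt acknowledge", "INTA#"),
    0b001: ("Read I/O port", "IORC#"),
    0b010: ("Write I/O port", "IOWC#, AIOWC#"),
    0b011: ("Halt", None),
    0b100: ("Code access (instruction fetch)", "MRDC#"),
    0b101: ("Read memory", "MRDC#"),
    0b110: ("Write memory", "MWTC#, AMWC#"),
    0b111: ("Passive (no bus cycle)", None),
}

def decode_status(s2: int, s1: int, s0: int):
    """Return (description, command signal) for one status combination."""
    return BUS_CYCLES[s2 * 4 + s1 * 2 + s0]

print(decode_status(1, 0, 1))  # ('Read memory', 'MRDC#')
```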

RESET Signal

The RESET signal puts the 8086/8 into a defined state (RESET is active high at the CPU pin; the 8284A derives it from an active-low RES# input)
Clears the flags register, IP and the DS, SS and ES segment registers
Sets the effective program address to 0FFFF0h (CS=0FFFFh, IP=0000h)
8086/8 programs always start at 0FFFF0h after Reset has been asserted and removed
Continues into latest generation CPUs

BHE# Signal (8086 Only)




 

The 8086 processor can address memory a byte at a time
Its data bus is 16b wide
It uses the BHE# signal and A0 (sometimes called BLE#) to address bytes using its 16b bus

Use of BHE#/A0(BLE#)
[Memory bank diagram]
Byte-wide addressing (8088): one bank holding FFFFF, FFFFE, FFFFD, FFFFC ... 00002, 00001, 00000
ODD addresses (8086): FFFFF, FFFFD, FFFFB, FFFF9 ... 00005, 00003, 00001; addressed by A19..A1, selected by BHE#, connected to D15:D8
EVEN addresses (8086): FFFFE, FFFFC, FFFFA, FFFF8 ... 00004, 00002, 00000; addressed by A19..A1, selected by A0/BLE#, connected to D7:D0

Use of BHE#/BLE#
BHE#  A0/BLE#  Selection
0     0        Whole word (16 bits)
0     1        High byte to/from odd address
1     0        Low byte to/from even address
1     1        No selection
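The selection table above can be expressed as a pair of small functions. This is a sketch under the assumption that only byte accesses and aligned word accesses are modelled; a word at an odd address needs two bus cycles on the 8086 and is rejected here. All names are illustrative.

```python
def byte_lanes(bhe_n: int, a0: int) -> str:
    """Decode the active-low BHE# and A0/BLE# signals into a bus selection."""
    if bhe_n == 0 and a0 == 0:
        return "whole word (D15:D0)"
    if bhe_n == 0 and a0 == 1:
        return "high byte (D15:D8), odd address"
    if bhe_n == 1 and a0 == 0:
        return "low byte (D7:D0), even address"
    return "no selection"

def signals_for_access(address: int, size: int):
    """Return (BHE#, A0) for a byte (size=1) or aligned word (size=2) access."""
    a0 = address & 1
    if size == 2:
        if a0:
            raise ValueError("misaligned word: needs two separate bus cycles")
        return (0, 0)
    return (0, 1) if a0 else (1, 0)

print(byte_lanes(*signals_for_access(0x00003, 1)))
```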

ALE and Address/data Bus Multiplexing




 

8086/8 multiplexes the Address and Data signals onto the same set of pins
Need off-chip logic to separate the signals
Transparent latches designed just for address demultiplexing

ALE and 74HC373 Transparent Latch


[Timing diagram: Clock; Address/Data Bus (Address Time, then Data Time); ALE; Output of 74HC373 (Microcomputer Address Bus)]
[Circuit: Address/Data Bus to In0:In7 of a 74HC373 or equivalent; Q0:Q7 drive the System Address Bus; ALE drives LE]
Tri-State Control signal, OE#, shown connected to GND for simplicity

Use of ALE (Address Latch Enable)




ALE is used with an external latch (74HC373) to demultiplex the address and data lines
74HC373 is transparent when its LE input (connected to ALE) is high
When ALE goes low, the 373 holds the last data until ALE goes high again
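The latch behaviour described above can be modelled in a few lines. This is a behavioural sketch only: it assumes OE# is tied low, ignores timing, and treats the latch as wide enough for the whole address, whereas a real 74HC373 is 8 bits wide. The class name and bus values are illustrative.

```python
class TransparentLatch:
    """Behavioural model of a 74HC373-style transparent latch."""

    def __init__(self):
        self.q = 0

    def update(self, le: int, d: int) -> int:
        if le:
            # LE high: the latch is transparent, the output follows the input.
            self.q = d
        # LE low: the output keeps the last value seen while LE was high.
        return self.q

latch = TransparentLatch()
print(hex(latch.update(le=1, d=0x2345)))  # address time, ALE high
print(hex(latch.update(le=0, d=0xBEEF)))  # data time, ALE low: still 0x2345
```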

8288 Bus Controller and Bus Transceivers


8288 Bus Controller also generates Direction and Enable signals for bidirectional transceivers
Supports buffering the System Data Bus

[Circuit: 8288 Bus Controller DEN# and DT/R# drive EN# and DIR of two 74HC245 transceivers; CPU [D15:D8] to Buffered [D15:D8]; CPU [D7:D0] to Buffered [D7:D0]; buffered bus goes to Memory and I/O Systems]

8086 Read Cycle


[Timing diagram: 8086 read cycle over T1, T2, T3, T4]
CLK
/S0, /S1, /S2: status 001 or 101
A16..A19, /BHE: address, then status S3..S6
ALE
AD0..AD15: address, float, valid data, float
A0..A19: valid address
DT/R, DEN
/MRDC or /IORC

8086 Write Cycle


[Timing diagram: 8086 write cycle over T1, T2, T3, T4]
CLK
/S0, /S1, /S2: status 010 or 110
A16..A19, /BHE: address, then status S3..S6
ALE
AD0..AD15: address, then valid data
A0..A19: valid address
DT/R, DEN
/MWTC or /IOWC

8086 Read Cycle (1 Wait State)

[Timing diagram: 8086 read cycle over T1, T2, T3, Tw, T4]
CLK
/S0, /S1, /S2: status 001 or 101
A16..A19, /BHE: address, then status S3..S6
ALE
8284 RDY, READY
AD0..AD15: address, float, valid data, float
A0..A19: valid address
DT/R, DEN
/MRDC or /IORC
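The cost of a wait state follows directly from the diagrams: a basic bus cycle takes four clock periods (T1 to T4), and each wait state (Tw) adds one more. A minimal sketch, assuming a 5MHz clock purely as an example value:

```python
def bus_cycle_ns(clock_mhz: float, wait_states: int = 0) -> float:
    """Duration of one bus cycle in nanoseconds."""
    t_state_ns = 1000.0 / clock_mhz       # one clock period (T-state)
    return (4 + wait_states) * t_state_ns  # T1..T4 plus any Tw states

print(bus_cycle_ns(5))     # 800.0
print(bus_cycle_ns(5, 1))  # 1000.0
```

With these example numbers, one wait state makes every access to that device 25% longer.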

8086/8088 Summary
 

   

First Generation (introduced June 1978)
One of the first 16b processors on the market
16b internal registers
16/8b external data bus
20b address bus (1MB addressable)
Used in 1st generation IBM PCs (1981)

80186/80188
  

 

Evolution of 8086/8088 to 80186/80188
Increased instruction set
On-chip system components (Clock generator, DMA, Interrupt, Timers)
Unsuccessful in PCs
Popular in embedded systems

2nd Generation Processor 286

P2 (286) = 2nd Generation Processor
Introduced in 1982
CPU behind IBM AT
Throughput of original IBM AT (6MHz) was about 500% of IBM PC (4.77MHz)
Level of integration: 134k transistors (vs 29k in 8086)
Still a 16b processor
Available in higher clock frequencies: up to 25MHz

2nd Generation Processors 286

Fully backwards compatible to 8086
    80286 runs 8086 software without modification
Improved instruction execution
    Average instruction takes 4.5 cycles vs. 12 cycles (8086)
Improved instruction set
Real mode and Protected Mode
    Multitasking support. What happens in one area of memory doesn't affect other programs. Protected mode supported by Windows 3.0.
16MB addressable physical memory
On-chip MMU (1GB virtual memory)
Non-multiplexed address bus and data bus
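The effect of clock rate and cycles per instruction on throughput can be worked through with the figures quoted on these slides (6MHz at 4.5 cycles versus 4.77MHz at 12 cycles). This is a rough sketch: the 12-cycle figure is quoted for the 8086, and the "about 500%" system figure presumably also reflects effects not modelled here, such as the 8088's 8b data bus. Function names are illustrative.

```python
def instructions_per_second(clock_hz: float, cycles_per_instruction: float) -> float:
    """Average instruction rate for a given clock and average cycle count."""
    return clock_hz / cycles_per_instruction

def speedup(new_clock_hz, new_cpi, old_clock_hz, old_cpi) -> float:
    """Ratio of the two instruction rates."""
    return (instructions_per_second(new_clock_hz, new_cpi)
            / instructions_per_second(old_clock_hz, old_cpi))

ratio = speedup(6e6, 4.5, 4.77e6, 12)
print(round(ratio, 2))  # 3.35
```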

Improving Computer Performance




We've seen how 16b computer technology based on the 8086 and 80286 processors developed
These computers are not powerful enough for today's applications
How do you improve the performance of your computer?
Let's start with the CPU

CPU Performance (1)


 

 

MOST OBVIOUS: Processor Clock Frequency
Increased frequency -> increased execution rate
State of the Art: >4GHz (03/2005)
Memory and I/O access times can be a performance bottleneck unless you take some special measures

CPU Performance (2)




ALU register width
    A processor is an n-bit processor, where n represents the precision of the ALU
    n can be 4, 8, 16, 32, or 64
    The wider the registers, the more processing per clock
Data bus width
    The wider the data bus, the faster we can transfer data
    Since the memory and I/O device access times are finite, the more bits transferred per cycle the better

CPU Performance (3)


 

 

Address bus width
Increased address width doesn't provide a speed increase as such
CPU can directly address more memory
PCs use big programs, which would not fit in a smaller address space
Overcoming small address space takes time
    Impacts on overall system performance
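The link between address bus width and directly addressable memory is a power of two. A short sketch using the bus widths of the processors in this deck; the 24b width for the 80286 is inferred from its 16MB figure rather than stated on the slides.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address bus width."""
    return 2 ** address_bits

MB = 2 ** 20
for name, bits in [("8086/8088", 20), ("80286", 24), ("80386", 32)]:
    print(name, bits, "bits:", addressable_bytes(bits) // MB, "MB")
```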

3rd Generation Processor 386

P3 (386) = 3rd Generation Processor
Introduced: 10/1985
Full 32b processor
    (32b registers. 32b internal and external data bus. 32b address bus)
275k transistors. CMOS. 132-pin PGA package.
    (Supply current Icc=400mA. Roughly the same as 8086!)
Clock speeds: 16-33MHz
P3 processors were far ahead of their time:
    It took 10 years before 32b operating systems became mainstream!
First 386 PCs early 1987
    (COMPAQ)

3rd Generation Processor 386

Modes of operation:
    Real. Protected. Virtual Real.
Protected mode of 386 is fully compatible with 286
    Protected mode = native mode of operation. Chips are designed for advanced operating systems such as Windows NT
New virtual real mode
    Processor can run with hardware memory protection while simulating the 8086's real-mode operation. Multiple copies of e.g. DOS can run simultaneously, each in a protected area of memory. If a program in one memory area crashes, the rest of the system is protected.

Intel 32-bit Architecture:IA-32


[Block diagram: Bus Unit (BU) with Address and Data connections, Prefetch Queue, Instruction Unit (IU), Execution Unit (EU) with ALU, Control Unit (CU) and Registers, Addressing Unit (AU)]

The 80386 includes a Bus Interface Unit for reading and providing data and instructions, with a Prefetch Queue, an IU for controlling the EU with its registers, as well as an AU for generating memory and I/O addresses

80386 Features
  

    

32b general and offset registers
16B prefetch queue
Memory management unit with segmentation unit and paging unit
32b address and data bus
4GB physical address space
64TB virtual address space
i387 numerical coprocessor
Implementation of real, protected and virtual 8086 modes
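The 64TB virtual address space figure can be reproduced from the protected-mode segment selector layout. The sketch assumes the standard layout, in which 14 bits of the 16-bit selector pick a segment (a 13-bit descriptor index plus one table-indicator bit); that detail is not on the slides. The same arithmetic with 16b offsets gives the 1GB figure quoted for the 80286.

```python
SELECTABLE_SEGMENTS = 2 ** 14  # 13-bit index + 1 table-indicator bit (assumed layout)

def virtual_space_bytes(offset_bits: int) -> int:
    """Virtual address space: number of segments times maximum segment size."""
    return SELECTABLE_SEGMENTS * 2 ** offset_bits

TB = 2 ** 40
GB = 2 ** 30
print(virtual_space_bytes(32) // TB, "TB")  # 80386, 32b offsets: 64 TB
print(virtual_space_bytes(16) // GB, "GB")  # 80286, 16b offsets: 1 GB
```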

80386 Operating Modes


 

Protected Mode for multitasking support
Real Mode (native 8086 mode)
    Processor powers up in Real Mode
System Management Mode
    Power management or system security
    Processor switches to separate address space, while saving the entire context of the currently running program or task

80386 Register Set


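[Register diagram]
General-Purpose Registers (bits 31..0): EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
    Bits 15..8: AH, BH, CH, DH; bits 7..0: AL, BL, CL, DL
    Bits 15..0: SI, DI, BP, SP
Instruction Pointer: EIP (bits 31..0), IP (bits 15..0)
EFLAGS Register: EFLAGS (bits 31..0), FLAGS (bits 15..0)
Segment Registers (bits 15..0): CS, SS, DS, ES, FS, GS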
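The register diagram shows that the 16b and 8b register names are views onto parts of the 32b registers. A small sketch for EAX follows; the name AX for the low 16 bits is the standard one and is assumed here, and the example value is arbitrary.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its AX, AH and AL views."""
    ax = eax & 0xFFFF                   # bits 15..0
    return {
        "AX": ax,
        "AH": (ax // 256) & 0xFF,       # bits 15..8
        "AL": ax & 0xFF,                # bits 7..0
    }

views = sub_registers(0x12345678)
print({name: hex(value) for name, value in views.items()})
```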

80386 Prefetch Queue

Execution Unit

16-byte deep Instruction Queue

Bus Interface Unit

32-bit Data Bus

Fetching from on-chip Queue is fast

Reading from off-chip Memory is slow


80386 Prefetch Queue



80386 Prefetch queue is 16B deep
1. The instruction fetch can read from the prefetch queue faster than from memory
2. The prefetcher can do some work while the execution unit is doing other tasks in parallel

Coprocessor: i387


The hardware implementation of floating point processing in the i387 means floating point operations run at much higher speed. The i386 can execute all mathematical expressions using software emulation of the i387.


80386: Classic CISC Processor


    

CISC = Complex Instruction Set Computer
Complex instructions ...but code-size efficient
Micro-encoding of the machine instructions
Extensive addressing capabilities for memory operations
Few, but very useful CPU registers

80386 Execution Sequence


[Block diagram: CISC Processor with Bus Interface, Prefetch Queue, Decoding Unit, Microcode ROM, Microcode Queue, Control Unit, Execution Unit (ALU and Registers), and Coprocessor]

In a microprogrammed CISC the processor fetches the instructions via the bus interface into a prefetch queue, which transfers them to a decoding unit. The decoding unit breaks the machine instruction into many elementary micro-instructions and applies them to a microcode queue. The micro-instructions are transferred from the microcode queue to the control and execution unit, which drives the ALU and the registers

80386 Complex Instructions




   

CISC drawback:
    Most instructions are so complicated, they have to be broken into a sequence of micro-steps
    These steps are called Micro-Code
    Stored in a ROM in the processor core
    Micro-code ROM: Access-time and size...
    They require extra ROM and decode logic

RISC: Less is More


 

RISC = Reduced Instruction Set Computer
20/80 Rule: 20% of the instructions take up 80% of the time
Sometimes executing a sequence of simple instructions runs quicker than a single complex machine instruction that has the same effect

RISC Ideas (1)




Reduce the instruction set to simplify the decoding
    Smaller Instruction Set -> Simpler Logic -> Smaller Logic -> Faster Execution
Eliminate microcode: hardwire all instruction execution
Pipeline instruction decoding and executing: do more operations in parallel

RISC Ideas (2)




Load/Store Architecture: only the load and store instructions can access memory
    All other instructions work with the processor internal registers
    This is necessary for single-cycle execution: the execution unit can't wait for data to be read/written

RISC Ideas (3)




Increased number of internal registers due to Load/Store Architecture
Also registers are more general purpose and less associated with specific functions
Compiler designed along with the RISC processor design.
    Compiler has to be aware of the processor architecture to produce code that can be executed efficiently

Instruction Pipelining - Operations Can Be Carried Out in Parallel




 

 

Read the instruction from memory or the prefetch queue (instruction fetch phase)
Decode the instruction (decode phase)
Where necessary, fetch the operands (operand fetch phase)
Execute the instruction (execute phase)
Write back the result (write-back phase)

Pipelined Execution
Instruction occupying each stage, and result produced, per cycle:

Cycle  Instruction Fetch  Decode  Operand Fetch  Execution  Write-back  Result
n      k                  k-1     k-2            k-3        k-4         k-4
n+1    k+1                k       k-1            k-2        k-3         k-3
n+2    k+2                k+1     k              k-1        k-2         k-2
n+3    k+3                k+2     k+1            k          k-1         k-1
n+4    k+4                k+3     k+2            k+1        k           k
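The pipelined execution pattern above, in which each instruction advances one stage per cycle, can be sketched as follows. The model is idealised: it assumes no stalls or flushes, and that instruction number i enters Instruction Fetch in cycle i. Function names are illustrative.

```python
STAGES = ["Instruction Fetch", "Decode", "Operand Fetch", "Execution", "Write-back"]

def stage_contents(cycle: int) -> list:
    """Instruction number held by each stage during the given cycle."""
    # The instruction in stage number `depth` entered the pipeline
    # `depth` cycles before the one currently being fetched.
    return [cycle - depth for depth in range(len(STAGES))]

def total_cycles(n_instructions: int, pipelined: bool = True) -> int:
    """Cycles needed to complete n instructions, one cycle per stage."""
    depth = len(STAGES)
    if pipelined:
        # Fill the pipeline once, then one instruction completes per cycle.
        return depth + n_instructions - 1
    return depth * n_instructions

print(stage_contents(10))                  # [10, 9, 8, 7, 6]
print(total_cycles(100), total_cycles(100, pipelined=False))  # 104 500
```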

Superscalar Architecture


The processor may have more than one pipeline (Pentium)
Where possible each pipeline works independently
    Not always possible
May achieve average completed execution of more than one instruction per clock cycle

Pipeline Challenges


More logic per pipeline stage: same resource can't be used twice
    E.g. can't re-use ALU for computing implied addresses
Synchronisation Problems
    Delayed Jump/Branch
    Data and Register dependency, e.g.
        ADD reg1, reg2, reg7
        AND reg6, reg1, reg3
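The register dependency in the ADD/AND example can be detected mechanically. The sketch below assumes the operand order is destination first, then the two sources; the slide does not state this, but the example only shows a dependency under that reading. With that assumption, AND reads reg1 before ADD has written it back (a read-after-write hazard). The helper names are illustrative.

```python
def parse(instruction: str):
    """Split 'OP dest, src1, src2' into (op, dest, [sources])."""
    op, operands = instruction.split(None, 1)
    dest, *sources = [reg.strip() for reg in operands.split(",")]
    return op, dest, sources

def raw_hazard(first: str, second: str) -> bool:
    """True if `second` reads the register that `first` writes."""
    _, dest, _ = parse(first)
    _, _, sources = parse(second)
    return dest in sources

print(raw_hazard("ADD reg1, reg2, reg7", "AND reg6, reg1, reg3"))  # True
```

A pipeline that finds such a hazard has to stall the second instruction or forward the result, which is one source of the synchronisation problems listed above.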

Getting the Benefits of Pipelining




 

Simplified instruction decoding -> simpler, faster logic
On-chip cache memories -> local memory on-chip to avoid memory access bottlenecks
Floating Point pipeline for FP coprocessor
Speculative Execution to get around pipeline flushes

Software Implications of RISCs




Optimising Compiler must know how pipeline works
    (Compiler must be aware of pipeline delays, and insert NOPs if need be)
Lower code density in RISC because instructions are less efficient
    PowerPC takes up to 30% more code to do the same tasks as an x86 CPU
    More memory accesses, potential performance impact...

80486: IA-32 with RISC elements


  

   

 

Introduced 04/1989
Greatly improved 80386 CPU
Hard-wired implementation of frequently used instructions (as in RISCs). On average 2 clock cycles/instruction.
5 stage instruction pipeline
Internal L1 Cache Memory (8kB) and cache controller
On-chip Floating Point coprocessor (FPU)
Longer Prefetch Queue (32 bytes as opposed to 16 on the 80386)
Higher frequency operation: up to 120MHz
>1.2M transistors, 0.8µm CMOS. 168-pin PGA.

80486 Block Diagram

A31-A0 D31-D0 Control and Status Signals Bus Interface

Cache (8K bytes)

Paging Unit

Segmentation Unit

Register and ALU

Prefetcher (32-byte queue)

Decoding Unit

Control Unit

Floating Point Unit

i486 CPU


80486 Pipeline
Cycle  Stage                     Activity
n      Instruction Fetch         ADD eax, mem32
n+1    Decode 1 (memory access)  Decode ADD, fetch mem32
n+2    Decode 2                  Decode ADD (continued)
n+3    Execution                 Add eax and mem32
n+4    Write-back                Write result into eax
