
I/O Systems II:

CPU I/O Interaction


Busses: Need for diversity

Example: Intel Pentium II Xeon
[Figure: block diagram of the system busses. Labels recovered from the diagram:]
 CPU: two 400 MHz CPU cores, each with a 512KB cache (3.2 GB/s)
 System bus: 800 MB/s
 Intel 440 AGPset
 Memory: 2GB SDRAM, 800MB/s
 Graphics: AGP 2x, 533MB/s
 PCI: 133MB/s
 SCSI (through a PCI-SCSI bridge): 40 MB/s
 IDE: 33 MB/s
 ISA (through a PCI-ISA bridge): 16.7MB/s
 USB: 1.5 MB/s
 Devices shown: HDD 1, HDD 2, camera, mouse, Ethernet, keyboard, audio
What do I/O devices look like?

 Example: Keyboard controller
 Keyboard status register: stores whether a new character has arrived
 Data register: stores the typed character
 Input buffer (FIFO): stores a number of characters (don't want to miss any)
 How to get these values into the computer?
How Does CPU Talk to I/O Devices?

 The Mechanics
 The Algorithmics
First, the mechanics:
 Two methods are used to address the device:
 Special I/O instructions
 Memory-mapped I/O

Issues:
 Preventing unauthorized access to devices
 Extensibility / Flexibility
Special I/O instructions:

 Format: device # and word


 Device number: separate device space
 Example:
 Keyboard input buffer <-> device #3

 Command word:
 sent on the I/O bus data lines
 or on the memory bus, with an IO signal indicating the IO address space
 taken from a register or memory

 x86:
 64KB IO space (16-bit port addresses)
 Instruction examples (Intel syntax):

IN AL, 33
OUT 33, AL
Memory Mapped I/O

 Portions of address space are assigned to I/O device


 Reads and writes to those addresses are interpreted as commands to the I/O devices
 User programs are prevented from issuing I/O operations directly:
 The I/O address space is protected by address translation
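As a minimal sketch of the idea, the following models a bus that decodes each address either to RAM or to a device register. The class names, the register addresses, and the read-only behavior of the registers are assumptions chosen for illustration; they do not describe any real machine.

```python
class KeyboardController:
    """Hypothetical keyboard controller with a status and a data register."""

    def __init__(self):
        self.fifo = []  # input buffer (FIFO) of typed characters

    def key_pressed(self, ch):
        self.fifo.append(ch)

    def read_status(self):
        # 1 means a new character has arrived, 0 means nothing is waiting.
        return 1 if self.fifo else 0

    def read_data(self):
        # Reading the data register removes the oldest character.
        return ord(self.fifo.pop(0)) if self.fifo else 0


class Bus:
    """Decodes every load/store address to RAM or to a device register."""

    KBD_STATUS = 0xFFFF0000  # assumed addresses, for illustration only
    KBD_DATA = 0xFFFF0004

    def __init__(self, keyboard):
        self.ram = {}
        self.keyboard = keyboard

    def load(self, addr):
        if addr == self.KBD_STATUS:
            return self.keyboard.read_status()
        if addr == self.KBD_DATA:
            return self.keyboard.read_data()
        return self.ram.get(addr, 0)

    def store(self, addr, value):
        if addr in (self.KBD_STATUS, self.KBD_DATA):
            return  # both registers are read-only in this sketch
        self.ram[addr] = value
```

An ordinary load instruction is all the program needs: `bus.load(Bus.KBD_STATUS)` checks for input and `bus.load(Bus.KBD_DATA)` fetches the character.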
Comparisons

 Memory Mapped Pros:


 Easy to integrate, no special instructions
 Leverage all the power of instructions to access memory
 Protection implemented through virtual memory
 Physical memory locations for I/O are mapped in the kernel's virtual space

 I/O Instruction Pros:
 Device space does not consume address space
 Makes address and IO extensibility independent
 Protection implemented by making the instructions privileged
 Does not prevent memory mapping
Hardware for Memory or I/O mapping

[Figure: the CPU drives Address, Data, and Control lines shared by two I/O interfaces. Labels inside an interface: address decoders; in, out, cmd, status. Each interface connects to its own I/O device.]
The Algorithmics

 The OS needs to know when:


 The I/O device has completed an operation
 The I/O operation has encountered an error

 This can be accomplished in two different ways:


 Polling:
 The I/O device puts information in a status register
 The OS periodically checks the status register
 I/O Interrupt:
 Whenever an I/O device needs attention from the processor,
it interrupts the processor from what it is currently doing.
Polling: Programmed I/O

[Figure: CPU, memory, IOC, and device. Flowchart: "Is the data ready?" If no, ask again; if yes, read data, store data, then "done?" If no, repeat; if yes, finish.]

 The busy wait loop is not an efficient way to use the CPU unless the device is very fast!
 But checks for I/O completion can be dispersed among computation-intensive code

 Advantage:
 Simple: the processor is totally in control and does all the work
 Disadvantage:
 Polling overhead can consume a lot of CPU time
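The polling flowchart can be sketched as two nested loops. The `Device` class below is a hypothetical stand-in that becomes ready after a fixed number of status checks; the wasted-poll counter is there only to make the busy-wait cost visible.

```python
from collections import deque


class Device:
    """Hypothetical device whose data becomes ready on every `latency`-th status check."""

    def __init__(self, words, latency):
        self.words = deque(words)
        self.latency = latency
        self.countdown = latency

    def status_ready(self):
        # Each status read counts down toward the next ready word.
        self.countdown -= 1
        return self.countdown <= 0

    def read_data(self):
        self.countdown = self.latency
        return self.words.popleft()


def programmed_io_read(device, count):
    """Read `count` words by polling; return (memory, number of wasted polls)."""
    memory = []
    wasted_polls = 0
    while len(memory) < count:             # done?
        while not device.status_ready():   # is the data ready?
            wasted_polls += 1              # busy wait: the CPU does nothing useful
        memory.append(device.read_data())  # read data, store data
    return memory, wasted_polls
```

With `latency=3`, two of every three status checks find nothing, which is the overhead the slide warns about.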
Overhead Examples for Polling

 Assumptions:
 500 MHz processor
 400 clock cycles for a polling operation

 Mouse: polled 30 times per second:
 Clock cycles per second for polling: 30 x 400 = 12,000 cycles per second
 Fraction of processor cycles: 12K / 500M = 0.0024%

 Hard disk:
 transfers data in four-word (16-byte) chunks
 transfer rate = 4MB/s
 Need to poll: (4MB/s) / 16 bytes per transfer = 250K times per second
 Clock cycles per second for polling: 250K x 400 = 100M
 Fraction of processor cycles: 100M / 500M = 20%
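The arithmetic can be checked with a few lines. This sketch assumes MB means 10^6 bytes and that each disk transfer moves 16 bytes, which is what the figure of 250K polls per second implies; the function name is illustrative.

```python
def polling_overhead(clock_hz, cycles_per_poll, polls_per_second):
    """Fraction of processor cycles spent polling."""
    return polls_per_second * cycles_per_poll / clock_hz


CLOCK_HZ = 500e6        # 500 MHz processor
CYCLES_PER_POLL = 400   # cost of one polling operation

mouse = polling_overhead(CLOCK_HZ, CYCLES_PER_POLL, 30)

disk_polls = 4e6 / 16   # 4 MB/s moved in 16-byte chunks = 250K polls per second
disk = polling_overhead(CLOCK_HZ, CYCLES_PER_POLL, disk_polls)

print(f"mouse: {mouse:.4%}, disk: {disk:.0%}")  # prints "mouse: 0.0024%, disk: 20%"
```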
Interrupt Motivation
 Polling is inefficient for highly active components
 Even for low rate devices, use the time between polling periods

 Polling: "Are you done yet? Are you done yet? Are you..."
 Interrupt: "Just tell me when you are done; meanwhile I'm going to do something else."

 An I/O interrupt is like an exception, except:


 An I/O interrupt is asynchronous
 Further information needs to be conveyed

 I/O interrupt is asynchronous with respect to instruction execution:


 I/O interrupt is not associated with any instruction
 I/O interrupt does not prevent any instruction from completion
Interrupt Driven Data Transfer

[Figure: the CPU runs a user program (add, sub, and, or, nop); the device is attached through the IOC; the interrupt service routine (read, store, ..., rti) resides in memory.]
 (1) I/O interrupt
 (2) Save PC
 (3) Interrupt service address
 (4) Interrupt service routine runs and ends with rti

 Advantage:
 User program progress is only halted during actual transfer
 Disadvantage: special hardware is needed to:
 Cause an interrupt (I/O device)
 Detect an interrupt (processor)
 Save the proper states to resume after the interrupt (processor)
More on Interrupts
 Convey identity of the device generating the interrupt
 Vectored interrupt
 Status register
 Determines proper ISR to perform
 Convey different urgencies:
 If two things arrive at the same time, which to do first?
 Interrupt request needs to be prioritized
 Dealing with multiple interrupts
 Example: keyboard interrupt followed by disk drive interrupt
 Two techniques
 Masking: Disable interrupts when you enter an ISR
 Nesting: Take an interrupt within an interrupt
 Nested interrupts:
 Disable interrupts with lower priority, service higher priority
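A minimal sketch of prioritized, nested interrupts follows. The controller class, the convention that a lower number means a higher priority, and the device names are assumptions made for illustration.

```python
import heapq


class InterruptController:
    """Hypothetical controller: a lower number means a higher priority."""

    def __init__(self):
        self.pending = []  # min-heap of (priority, device) requests
        self.active = []   # stack of priorities of the ISRs in progress

    def request(self, priority, device):
        heapq.heappush(self.pending, (priority, device))

    def dispatch(self):
        """Start the most urgent pending ISR if it may preempt, else return None."""
        if not self.pending:
            return None
        priority, device = self.pending[0]
        # Nesting rule: only a strictly higher priority preempts the running ISR.
        if self.active and priority >= self.active[-1]:
            return None
        heapq.heappop(self.pending)
        self.active.append(priority)
        return device

    def rti(self):
        """Return from the innermost ISR."""
        self.active.pop()
```

In the keyboard-then-disk example, a disk request with higher priority is dispatched while the keyboard ISR is still active, and a lower-priority request waits until both have returned.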
Hardware for Interrupts

[Figure: the CPU connects to three I/O interfaces, each with its own I/O device, through Ireq, Data, and Ack lines.]
 Wire-OR request
 Daisy-chained ack (take the ack if you requested, otherwise pass it on)
 When a requestor receives the ack, it provides a data vector (what caused this interrupt?)
Delegating I/O Responsibility from the CPU: DMA

 Direct Memory Access (DMA):
 External to the CPU
 Acts as a master on the bus
 Transfers blocks of data to or from memory without CPU intervention

 The CPU sends a starting address, direction, and length count to the DMAC, then issues "start".
 The DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory.

[Figure: CPU, memory, DMAC, and IOC on the bus; the device is attached to the IOC.]
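The hand-off to the DMAC can be sketched as follows. The class, its `start` parameters, and the completion callback (standing in for the interrupt raised at the end of the transfer) are hypothetical names used only for illustration.

```python
class DMAController:
    """Hypothetical DMAC that copies a block between a device buffer and memory."""

    def __init__(self, memory, on_done):
        self.memory = memory    # bytearray standing in for main memory
        self.on_done = on_done  # callback standing in for the completion interrupt

    def start(self, device_data, start_addr, length, to_memory=True):
        # The CPU supplies only the address, direction, and length; the copy
        # loop below runs inside the DMAC, not on the CPU.
        for i in range(length):
            if to_memory:
                self.memory[start_addr + i] = device_data[i]
            else:
                device_data[i] = self.memory[start_addr + i]
        self.on_done(length)
```

A transfer from device to memory is then `dmac.start(b"abcd", 4, 4)`; passing `to_memory=False` with a writable buffer copies in the other direction.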
Issues with DMA and Memory Hierarchy

 DMA: Physical or Virtual addresses?


 Virtual:
 DMA now has to do address translation
 If replacement is necessary, needs to inform the CPU
 Physical:
 Difficult to cross page boundaries

 What if data is in the cache, but DMA overwrites the copy in memory?
 Stale data
 Coherency problems
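The stale-data problem can be reproduced with a toy model. The class below is an assumption made for illustration, and invalidating the cached copy on a DMA write is shown as one possible remedy, not as the approach these slides prescribe.

```python
class CachedMemory:
    """Toy model: CPU reads go through a cache, DMA writes go straight to memory."""

    def __init__(self, size):
        self.memory = [0] * size
        self.cache = {}

    def cpu_read(self, addr):
        if addr not in self.cache:          # miss: fill from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]             # hit: memory is not consulted

    def dma_write(self, addr, value, invalidate=False):
        self.memory[addr] = value           # bypasses the cache
        if invalidate:
            self.cache.pop(addr, None)      # drop the now-stale cached copy
```

After `cpu_read(0)` caches the old value, a plain `dma_write(0, 7)` leaves the CPU reading the stale copy; with `invalidate=True` the next read misses and sees the new data.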
Summary of Mechanics

 IO device mapping & communication

 Interrupts

 Data Transfer Mechanisms


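Summary of Algorithmics (1)

Programmed IO:
1. Issue read command to IO module
2. Read status of IO module
3. Status? If not ready, read the status again; if ready, continue
4. Read word from IO module
5. Write word to memory
6. Done? If not, repeat for the next word

Interrupt driven IO:
1. Issue read command to IO module, then do something else
2. On interrupt: read status of IO module
3. Error? If OK, continue
4. Read word from IO module
5. Write word to memory
6. Done? If not, repeat for the next word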
Summary of Algorithmics (2)

DMA I/O:
1. Issue read command to DMAC, then do something else
2. On interrupt: read status of DMAC
3. Error? If not, continue
Responsibilities of the Operating System

 The operating system acts as the interface between:


 The I/O hardware and the program that requests I/O
 Three characteristics of the I/O systems:
 The I/O system is shared by multiple programs using the processor
 I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations.
 Interrupts must be handled by the OS because they cause a transfer to
supervisor mode
 The low-level control of an I/O device is complex:
 Managing a set of concurrent events
 The requirements for correct device control are very detailed
Operating System Requirements

 Provide protection to shared I/O resources


 Guarantees that a user's program can only access the portions of an I/O device to which the user has rights

 Provides abstraction for accessing devices:


 Supply routines that handle low-level device operation

 Handles the interrupts generated by I/O devices

 Provide equitable access to the shared I/O resources


 All user programs must have equal access to the I/O resources

 Schedule accesses in order to enhance system throughput


OS abstractions

 UNIX: everything is a File:


 Don't see disks at all
 Even other devices (keyboard, floppy, terminal) are files in /dev

 fopen: tell the OS that the program wants access to a file/device
 fread: get data from the device
 fwrite: send data to the device
 fseek: go to a known location in the file (no meaning for a device?)
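The same calls work on a regular file and on a device node. This sketch uses Python's built-in `open`, `read`, `write`, and `seek` in place of the C stdio functions named above, and uses the null device (`os.devnull`) as the example device; the helper name `roundtrip` is illustrative.

```python
import os
import tempfile


def roundtrip(path, payload, offset=0):
    """Write payload to path, then read it back starting at offset."""
    with open(path, "wb") as f:   # fopen: ask the OS for access
        f.write(payload)          # fwrite: send data
    with open(path, "rb") as f:
        if offset:
            f.seek(offset)        # fseek: only meaningful for a real file
        return f.read()           # fread: get data


with tempfile.TemporaryDirectory() as tmp:
    from_file = roundtrip(os.path.join(tmp, "data.bin"), b"hello", offset=2)

# The null device accepts the same calls but discards whatever is written.
from_device = roundtrip(os.devnull, b"hello")
```

The program does not need to know whether the path names a disk file or a device; the OS routes the calls to the right driver.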
System-Level View - More Bandwidth

[Figure: system-level block diagram. Labels recovered: System bus 422 MB/s (memory); PCI 133 MB/s; SCSI 40 MB/s (labelled twice); SCSI disks 10 MB/s each.]
 Multiple disks, multiple busses
Problem: Disk Bandwidth

 Transfer rate:
 Rotational speed
 Number of sectors per track

 Bandwidth hurt by single arm


 Only one access at a time
 Most new accesses will require seek time

 Solution: build virtual disk out of array of disks


 Advantage: multiple arms
 Advantage: greater peak bandwidth
 Advantage: cost
Disk Arrays
 Interleave data across multiple disks
 striping provides aggregate bandwidth
 stripe unit depends on application

 Disadvantage: availability

[Figure: disk array delivering 80 MB/s in aggregate from disks at 10 MB/s each]
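A minimal sketch of round-robin striping is shown below. The function names are illustrative, and the choice of eight disks is an assumption inferred from the 80 MB/s and 10 MB/s figures.

```python
def locate(block, num_disks, stripe_unit=1):
    """Map a logical block to (disk, block offset on that disk)."""
    stripe_index, within = divmod(block, stripe_unit)
    disk = stripe_index % num_disks                           # round-robin over the disks
    offset = (stripe_index // num_disks) * stripe_unit + within
    return disk, offset


def aggregate_bandwidth(num_disks, per_disk_mb_s):
    """Peak bandwidth when every disk transfers at once."""
    return num_disks * per_disk_mb_s
```

With a stripe unit of one block, consecutive blocks land on consecutive disks, so a large sequential read keeps all arms busy. A larger stripe unit keeps small requests on a single disk, which is why the best unit depends on the application.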
Reliability and Availability
 Two terms that are often confused:
 Reliability: Is anything broken?
 Availability: Is the system still available to the user?
 Availability can be improved by adding hardware:
 Example: adding ECC on memory
 Reliability can only be improved by:
 Bettering environmental conditions
 Building more reliable components
 Building with fewer components
 Improving availability may come at the cost of lower reliability
RAID - Redundant Arrays of Inexpensive Disks
 Write one unit per data drive (eight drives)
 Compute the parity and store it on a ninth drive
 Cheaper than mirroring
 Reduces overhead to 1/9

[Figure: a row of drives, the last one labelled "parity"]
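Parity here is a bytewise XOR across the units of a stripe, so any single lost unit can be rebuilt from the survivors. The sketch assumes eight data units and one parity unit, which matches the 1/9 overhead; the function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])


# Eight data units (one per drive) and one parity unit.
data = [bytes([i, (i * 3) % 256, 255 - i]) for i in range(8)]
parity = xor_blocks(data)

# Pretend drive 5 failed and rebuild its contents.
rebuilt = reconstruct(data[:5] + data[6:], parity)
```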
Different Levels of RAID

 RAID 1 - mirroring
 uses twice as many disks to shadow the data (better performance, high cost)
 RAID 3 - bit interleaved
 reduces cost to 1/N, where N is the number of disks in a group
 RAID 4 - block interleaved
 RAID 5 - block-interleaved, distributed parity
 parity is interleaved across disks in the array to balance load

RAID 4 - Block Interleaved Parity

 0  1  2  3 P0
 4  5  6  7 P1
 8  9 10 11 P2
12 13 14 15 P3
16 17 18 19 P4
20 21 22 23 P5

RAID 5 - Distributed Parity

 0  1  2  3 P0
 4  5  6 P1  7
 8  9 P2 10 11
12 P3 13 14 15
P4 16 17 18 19
20 21 22 23 P5
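The RAID 5 placement in the diagram follows a simple rule: the parity block moves one disk to the left on each successive row and wraps around. A sketch that regenerates the layout for five disks is below; the function name is illustrative.

```python
def raid5_layout(rows, disks=5):
    """Return rows of block labels with parity rotating right to left."""
    layout = []
    block = 0
    for row in range(rows):
        parity_disk = (disks - 1) - (row % disks)  # last disk first, then leftwards
        stripe = []
        for disk in range(disks):
            if disk == parity_disk:
                stripe.append(f"P{row}")
            else:
                stripe.append(str(block))
                block += 1
        layout.append(stripe)
    return layout


for stripe in raid5_layout(6):
    print(" ".join(f"{label:>2}" for label in stripe))
```

Because every disk holds some parity, parity updates are spread over the whole array instead of queueing on one dedicated disk as in RAID 4.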


RAID 5 Functions

[Figure: four panels over a stripe of three data disks (D) and one parity disk (P): Fault-Free Read, Fault-Free Write (four numbered accesses, 1 to 4), Degraded Read, Degraded Write]
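The four numbered accesses in the fault-free write panel are consistent with the usual read-modify-write update of parity, sketched below. Which access carries which number in the figure is an assumption, and the function names are illustrative.

```python
def xor(a, b):
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe, parity, index, new_data):
    """Update one data block and return the new parity, using four disk accesses."""
    old_data = stripe[index]   # access: read old data
    old_parity = parity        # access: read old parity
    # Remove the old data's contribution from the parity, then add the new one.
    new_parity = xor(xor(old_parity, old_data), new_data)
    stripe[index] = new_data   # access: write new data
    return new_parity          # access: write new parity
```

Only the target data disk and the parity disk are touched, so the other data disks in the stripe remain free for other requests.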
Internet: Just another bus?

 Not by a long shot

 Dynamic: nodes are added and subtracted


 Robust: machines go down, noisy environment
 HUGE...

 If you want to know about these issues, take a network course

 We'll look at Ethernet


 Serial bus, one coaxial cable up to 500m long
 10Mbps (originally)
Ethernet: A serial bus

 There's no clock!!!
 Manchester encoding (example bit stream: 0 0 1 0 1)
 Zero bit transmitted as 0,1
 One bit transmitted as 1,0
 Clock can be recovered
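A minimal sketch of the encoding, using the convention stated on this slide (zero as 0,1 and one as 1,0). Every bit produces a transition in the middle of its bit time, which is what lets the receiver recover the clock. The function names are illustrative.

```python
def manchester_encode(bits):
    """Encode each bit as two half-bit signal levels."""
    halves = []
    for bit in bits:
        halves.extend((1, 0) if bit else (0, 1))
    return halves


def manchester_decode(halves):
    """Decode pairs of half-bit levels back into bits."""
    if len(halves) % 2:
        raise ValueError("expected an even number of half-bit levels")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if (first, second) == (1, 0) else 0)
    return bits


signal = manchester_encode([0, 0, 1, 0, 1])
```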

 Packet (field widths in bits):

Preamble   Destination Address   Source Address   Type   Body       CRC   Postamble
64         48                    48               16     variable   32    8
Ethernet: Sharing the link

 Shared link: how do we do arbitration?
 There is no arbiter. Arbiters require some knowledge of other bus masters
 Collision detection and roll-back
1. Detect whether there is no other node using the link
2. Begin transfer
3. But what if some other node began a transaction at the same time?
   This is a collision
   Each node must listen and detect collisions while it is sending data
4. If a collision is detected:
   Continue transferring for 51.2 microseconds (want all nodes to see the collision)
   Stop the transfer and wait a random period of time
   Restart transmission
 Poorly utilized link
 The more data that needs to get transferred, the more collisions there will be
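The slide only says to wait "a random period of time". One common way to choose that period is truncated binary exponential backoff, sketched here as an assumption: the random range doubles after each successive collision, up to a cap. The 51.2 microsecond slot is taken from the step above; the cap of 10 doublings and the function name are illustrative choices.

```python
import random

SLOT_TIME_US = 51.2   # slot length taken from the collision step above
MAX_EXPONENT = 10     # assumed cap on how far the range doubles


def backoff_delay_us(collisions, rng):
    """Pick a random wait, in microseconds, after the given number of collisions."""
    exponent = min(collisions, MAX_EXPONENT)
    slots = rng.randrange(2 ** exponent)   # uniform in 0 .. 2^exponent - 1
    return slots * SLOT_TIME_US


rng = random.Random(0)  # seeded only to make the example repeatable
delays = [backoff_delay_us(n, rng) for n in range(1, 6)]
```

Two nodes that collide are unlikely to pick the same delay again, and widening the range after repeated collisions spreads retries out as the link gets busier.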
Ethernet: Getting data to the right place

 No memory mapping

 Each Ethernet device has a unique ID (address)


 Every manufacturer has different prefix
 Total size is 48 bits = 281 trillion addresses

 Network adapters only take packets addressed with their ID


 How do you know the Ethernet address of your destination?
 How do machines inform other machines where to find them?

 Not questions for this class...
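The address arithmetic and the adapter's filtering rule can be sketched as follows. The colon-separated text format, the example addresses, and the function names are assumptions made for illustration; the manufacturer prefix is taken to be the first 24 bits of the address.

```python
TOTAL_ADDRESSES = 2 ** 48   # 281,474,976,710,656, about 281 trillion


def parse_mac(text):
    """Parse 'aa:bb:cc:dd:ee:ff' into 6 bytes (48 bits)."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError("expected six colon-separated bytes")
    return bytes(int(part, 16) for part in parts)


def manufacturer_prefix(mac):
    """First 24 bits of the address, assigned per manufacturer."""
    return mac[:3]


def accepts(adapter_mac, destination_mac):
    """An adapter takes only packets addressed with its own ID."""
    return adapter_mac == destination_mac


adapter = parse_mac("00:1a:2b:3c:4d:5e")   # made-up example address
other = parse_mac("00:1a:2b:00:00:01")     # same prefix, different adapter
```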


Summary:

 Talking to IO devices:
 Memory and IO spaces
 Polling (Programmed IO)
 Interrupt driven

 Data transfer via DMA


 Reduce the burden on the CPU
 Disk Arrays
 Higher performance, availability
 Ethernet as a serial bus
 Clock recovery
 Packetization
 Addressing
 Arbitration
