Sunteți pe pagina 1din 90

CS6303 COMPUTER

ARCHITECTURE.

8 Great Ideas in Computer


Architecture
. ideas that computer architects have invented in the last
The following are eight great
60 years of computer design.
1. Design for Moores Law.
2. Use Abstraction to Simplify Design
3. Make the common case fast
4. Performance via parallelism
5. Performance via pipelining
6. Performance via prediction
7. Hierarchy of memories
8. Dependability via redundancy

1. Design for Moores Law.


The Number of Transistors in a integrated circuit
doubles approximately every two years. (Gordon Moore, one of the
founders of Intel.)

As computer designs can take years, the resources available per


chip can easily double or quadruple between the start and finish
of the project.
Computer architects must anticipate where the technology will
be when the design finishes rather than design for where it

8 Great Ideas in Computer Architecture.


2. Use Abstraction to Simplify Design
-Incomputer architecture, a computer system is usually represented as
consisting of five abstraction
levels:hardwarefirmware,assembler,operating systemandprocesses.
-Incomputer science, an abstraction level is a generalization of a model
oralgorithm.
-The simplification provided by a good abstraction layer facilitates easy
reuse.
-Both computer architects and programmers had to invent techniques to
make themselves
more productive, for otherwise design time would lengthen as
dramatically as resources
grew by Moore's Law.
-A major productivity technique for hardware and soft ware is to use

3. Make the common case fast

-Making the common case fast will tend to enhance


performance better than optimizing the rare case.
- It implies that you know what the common case is, which is
only possible with careful experimentation and measurement.

4. Performance via parallelism

-Computer architects have offered designs that get more


performance by performing operations in parallel.
-Parallel computing is a form of computation in which many
calculations are carried out simultaneously, operating on the
principle that large problems can often be divided into smaller
ones which are then solved concurrently
( Parallel).
- Parallelism has been employed for many years, mainly in
high performance computing.

5. Performance via pipelining


-Pipelining is a technique used in the design of
computersto increase the instructionthroughput(the
number of instructions that can be executed in a unit of
time).
-The basicinstruction cycleis broken up into a series of
pipeline stages.
--Rather than processing each instruction sequentially (one
at a time, finishing one instruction before starting the next),
each instruction is split up into a sequence of steps so that
different steps can be executedconcurrently(by different
circuitry) and inparallel(at the same time).
--Pipelining increases instruction throughput by performing
multiple operations at the same time (in parallel), but does
not reduce instructionlatency(the time to complete a single
instruction from start to finish) as it still must go through all
steps.

6. Performance via prediction

--To improve the flow and throughput in a instruction


pipeline,Branch predictors play a critical role in achieving
high effective performancein many modernpipelined
microprocessorarchitectures.
--Without branch prediction, the processor would have to
wait until the conditional jump instruction has passed the
execute stage before the next instruction can enter the
fetch stage in the pipeline.
--The branch predictor attempts to avoid this waste of
time by trying to guess whether the conditional jump is
most likely to be taken or not taken.

7. Hierarchy of memories
--Programmers want memory to be fast, large, and cheap,
as memory speed often shapes performance, capacity limits
the size of problems that can be solved, and the cost of
memory today is often the majority of computer cost.
--Architects have found that they can address these
conflicting demands with a hierarchy of memories, with the
fastest, smallest, and most expensive memory per bit at the
top of the hierarchy and the slowest, largest, and cheapest
per bit at the bottom.
--Caches give the programmer the illusion that main
memory is nearly as fast as the top of the hierarchy and
nearly as big and cheap as the bottom of the hierarchy.

Memory Hierarchy.

Levels of Memory Hierarchy


Upper Level

Capacity
Access Time
Cost

faster

CPU Registers
100s Bytes
1s ns

Registers

Cache
K Bytes
4 ns
1-0.1 cents/bit

Cache

Main Memory
M Bytes
100ns- 300ns
$.0001-.00001 cents /bit
Disk
G Bytes, 10 ms
(10,000,000 ns)
-5 -6
10 - 10 cents/bit
Tape
infinite
sec-min
-8
10

Instr. Operands

Blocks
Memory
Pages
Disk
Files
Tape

Larger
Lower Level

8. Dependability via redundancy

-- Computers not only need to be fast; they need to be


dependable.
-- Since any physical device can fail, we make systems dependable
by including redundant components that can take over when a
failure occurs and to help detect failures.
-Examples: Systems designersusually provide failover capability in
servers, systems or networks requiring continuous availability -the used term ishigh availability-- and a high degree ofreliability.

System Software: Software that provides services that are


commonly useful, including operating systems, compilers,
loaders and assemblers.
Operating System: Supervising Program that manages the
resources of a computer for the benefit of the programs that
run on that computer.
Compiler: A Program that translates high-level language
statements into assembly language statements.
binary digit :Also called a bit. One of the two numbers in base
2 (0 or 1)
that are the components of information.
Instruction : A command that computer hardware
understands and obeys.
assembler : A program that translates a symbolic version of
instructions(Assembly language program) into the binary
version.

High level Programming Language: A portable


language such as C, C++, Java or Visual Basic that is
composed of words and algebraic notation that can be
translated by a compiler into assembly language.
Input Device : A mechanism through which the
computer is fed information, such as a keyboard.
output device: A mechanism that conveys the result of a
computation to a user, such as a display, or to another
computer.
liquid crystal display: A display technology using a thin
layer of liquid
polymers that can be used to transmit or block light
according to whether a charge is applied.
active matrix display: A liquid crystal display using a
transistor to
control the transmission of light at each individual pixel.
pixel :The smallest individual picture element. Screens
are composed of hundreds of thousands to millions of

Integrated circuit: Also called a chip. A device combining


dozens to millions of transistors.
central processor unit(CPU) :Also called processor. The
active part of the computer, which contains the datapath and
control and which adds numbers, tests numbers, signals I/O
devices to activate, and so on.
Datapath: The component of the processor that performs
arithmetic operations.
control :The component of the processor that commands the
datapath,
memory, and I/O devices according to the instructions of the
program.
Memory: The storage area in which programs are kept when
they are
running and that contains the data needed by the running
programs.
dynamic random access memory (DRAM): Memory built as

cache memory : A small, fast memory that acts


as a buffer for a slower, larger memory.
static random access memory (SRAM): Also
memory built as an integrated circuit, but faster
and less dense than DRAM.
main memory :Also called primary memory.
Memory used to hold programs while they are
running; typically consists
of DRAM in todays computers.
secondary memory: Nonvolatile memory used to
store programs and data between runs; typically
consists of flash
memory in PMDs and magnetic disks in servers.
magnetic disk :Also called hard disk. A form of
nonvolatile secondary memory composed of
rotating platters coated
with a magnetic recording material. Because they
are rotating mechanical devices, access times are
about 5 to 20 milliseconds and cost per gigabyte in

flash memory: A nonvolatile semiconductor memory. It is cheaper


and slower
than DRAM but more expensive per bit and faster than magnetic
disks.
Access times are about 5 to 50 microseconds and cost per gigabyte
in 2012
was $0.75 to $1.00.
local area network (LAN) :A network designed to carry data within
a geographically
confined area, typically within a single building.
wide area network(WAN): A network extended over hundreds of
kilometers that can
span a continent.

Components of a Computer System.


Five Classic components are:
1.Input
Control.

2. Output

3. Memory

4. Datapath

5.

The processor gets instructions and data from memory. Input


writes data to memory, and output reads data from memory.
Control sends the signals that determine the operations of the
datapath, memory, input, and output.
LCD :The most fascinating I/O device is probably the graphics
display. Most personal mobile devices use liquid crystal
displays (LCDs) to get a thin, low-power display.
Touch screen: While PCs also use LCD displays, the tablets and
smart phones of the Post PC era have replaced the keyboard
and mouse with touch sensitive displays, which has the
wonderful user interface advantage of users pointing directly

Components cont
--While there are a variety of ways to implement a touch
screen, many tablets today use capacitive sensing. Since
people are electrical conductors, if an insulator like glass
is covered with a transparent conductor, touching
distorts the electrostatic field of the screen, which results
in a change in capacitance. This technology can allow
multiple touches simultaneously, which allows gestures
that can lead to
attractive user interfaces.
--The list of I/O devices includes a capacitive multitouch
LCD display, front facing camera, rear facing camera,
microphone, headphone jack, speakers, accelerometer,
gyroscope, Wi-Fi network, and Bluetooth network.
Integrated Circuits (ICs): nicknamed Chips.

Components cont
---The processor is the active part of the computer,
following the instructions of a program. It adds numbers,
tests numbers, signals I/O devices to activate, and so on.
Occasionally, people call the processor the CPU, central
processing unit.
--The datapath performs the arithmetic operations, and
control tells the datapath, memory, and I/O devices what to
do according to the wishes of the instructions of the
program.
--The memory is where the programs are kept when they
are running; it also contains the data needed by the
running programs. The memory is built from DRAM chips.
DRAM stands for dynamic random access memory.
--Cache memory consists of a small, fast memory that acts
as a buffer for the DRAM memory. Cache is built using a
different memory technology, static random access
memory(SRAM). SRAM is faster but less dense, and hence
more expensive, than DRAM. SRAM and DRAM are two
layers of the memory hierarchy.

Components cont
ISA: Instruction set Architecture.
--It is the interface between the hardware and the lowest
level software.
--The ISA includes anything the Programmers need to
know to make a binary language program work
correctly including instructions, I/O devices and so on.
-- The Operating system will encapsulate the details of
doing I/O, allocating memory, and other low level system
functions so that application programmers do not need
to worry about such details.
-- The combination of the basic instruction set and the
OS interface provided for application programmers is
called the Application binary interface.(ABI).

A Safe place to save Data.


--Volatile and Non volatile Memory.
--Main Memory or Primary Memory.
--Secondary Memory
--Flash Memory.
Communicating with other Computers.
--Communication: Information is exchanged between
computers at high speeds.
--Resource sharing: Rather than each computer
having its own I/O devices, computers on the network
can share I/O devices.
--Nonlocal access: By connecting computers over long
distances, users need not be near the computer they
are using.
-- LAN, WAN,
--Bandwidth.

Technologies for building


Processors and Memory.

Technology Cont
--The IC manufacturing process starts with a silicon
crystal ingot.
-- The Ingots are 8-12 inches in diameter and about
12 to inches
long.
--An Ingot is finely sliced into wafers no more than
0.1 inches
thick.
--These wafers then go through a series of
processing steps, during which patterns of
chemicals are placed on each wafer, creating the
Transistors, conductors, and insulators .
-- In figure, One wafer produced 20 dies ,of which
17 passed testing.( X means the die is bad). The
yield of good dies in this case was 17/20. or 85 %.
--These good dies are then bonded into
packages(connected to the input/output pins of a

die :The individual rectangular sections that are cut


from a wafer, more informally known as chips.
yield :The percentage of good dies from the total
number of dies on the wafer.
Transistor: An on/ off switch controlled by an electric
signal.
VLSI : A device(IC) containing millions of transistors.
Silicon: A natural element that is a semiconductor.
Semiconductor: A substance that does not conduct
electricity well.

Performance:
Response time or execution time:The total time required
for the computer to complete a task, including disk accesses,
memory
accesses, I/O activities, operating system overhead, CPU
execution
time, and so on.
Throughput: The total amount of work done in a given
time.
Bandwidth: The amount of data that can be carried from
one point to another in a given time period (usually a
second). This kind of bandwidth is usually expressed in bits
(of data) per second (bps). Occasionally, it's expressed as
bytes per second (Bps).
clock cycles per instruction (CPI): Average number of
clock cycles per
instruction for a program or program fragment.

Performance:
we can relate performance and execution time for a computer
X:

Measuring Performance:
clock cycle: Also called tick, clock tick, clock period,
clock, or cycle.
The time for one clock period, usually of the processor clock,
which
runs at a constant rate.
clock period : The length of each clock cycle.

Example:

Instruction Performance:
Clock cycles per instruction (CPI): Average number of
clock cycles per
instruction for a program or program fragment.

Example:
Suppose we have two implementations of the same
instruction set architecture. Computer A has a clock
cycle time of 250 ps and a CPI of 2.0 for some
program, and computer B has a clock cycle time of
500 ps and a CPI of 1.2 for the same program. Which
computer is faster for this program and by how much?

The Classic CPU Performance


Equation.

Example:

The basic Components of Performance and how


each is Measured.

Addressing Modes: The


method used to identify the
location
ofare
anthe
operand.
The Following
MIPS Addressing
Modes.
1.Immediate addressing
2.Register addressing
3.Base or displacement addressing
4. PC-relative addressing
5.Pseudodirect addressing

Addressing Modes: Cont..

1. Immediate addressing: The operand is a constant


within the instruction itself. i.e. The operand is specified in
the instruction itself.

2. Register addressing: The operand is in a CPU


register. The register is specified in the
instruction.

Addressing Modes: Cont..


3. Base or displacement addressing: The operand is at
the memory location whose address is the sum of a
register and a constant in the instruction.

4. PC-relative addressing: The branch address is


the sum of the PC and a constant in the instruction.

5. Pseudo-direct addressing: The jump


address is the 26 bits of the instruction
concatenated with the upper Four bits of the
PC.

Pseudo-direct addressing

Cont

Addressing Modes Summary


The method used to identify the location of an operand.
The Following are the MIPS Addressing Modes.
1. Immediate addressing, where the operand is a
constant within the instruction itself.
2. Register addressing, where the operand is a register.
3. Base or displacement addressing, where the operand is
at the memory location whose address is the sum of a
register and a constant in the instruction.
4. PC-relative addressing, where the branch address is the
sum of the PC and a constant in the instruction.
5. Pseudodirect addressing, where the jump address is the
26 bits of the
instruction concatenated with the upper Four bits of the
PC.

Uniprocessors To
1. Increasing the clock speed of Uniprocessor has
Multiprocessors:
2.

3.
4.
5.
6.

7.

reached saturation and cannot be increased beyond


a certain limit because of power consumption and
heat dissipation issues.
As the physical size of chip decreased, while the
number of transistors/chip increased, clock speed
increased, which boosted the heat dissipation across
the chip to a dangerous level. Cooling & heat sink
requirement issues were there.
There were limitations in the use of silicon surface
area.
There were limitations in reducing the size of
individual gates further.
To gain Performance within a single core, many
techniques like pipelining, super pipelined,
superscalar architectures are used .
Most of the early dual core processors were running
at lower clock speeds, the rational behind is that a
dual core processor with each running at 1Ghz
should be equivalent to a single core processor
running at 2 Ghz.
The Problem is that this does not work in practice

Uniprocessors To Multiprocessors Cont..


8. In Multi-core processors, the benefit is more on
throughput than on response time.
9. In the past, programmers could rely on innovations in
the hardware, Architecture and compilers to double
performance of their programs every 18 months
without having to change a line of code.
10. Today, for programmers to get significant
improvement in response time, they need to rewrite
their programs to take advantage of multiple
processors and also they have to improve performance
of their code as the number of core increases.
The need of the hour is..
11. Ability to write Parallel programs
12. Care must be taken to reduce Communication and
Synchronization overhead. Challenges in Scheduling,
load balancing have to be addressed.

Power Wall: Power & Energy in


Integrated circuits.
1.Power is the biggest challenge facing the computer
designer for every class of computer.
2. First, power must be brought in and distributed
around the chip which includes hundreds of pins and
multiple interconnection layers just for power and
ground.
3. Second, power is dissipated as heat and must be
removed.
How should a system architect think about
performance, power and energy?
There are three primary concerns.
What is the maximum power a processor ever
requires?
--If it attempts to draws more Power than a Power
supply can provide, by drawing more current, the
voltage will eventually drop which can cause the
device to malfunction
--Modern processors can vary widely in power
consumption with high peak currents. Hence, they

Power & Energy Cont


What is the sustained Power consumption?
--This Metric is called Thermal design power
(TDP), since it determines the cooling
requirement.
--Power supply is usually designed to match or
exceed TDP.
--Failure to provide adequate cooling will allow
the temperature to exceed the maximum value
resulting in device failure.
--Modern processors provide two features to
manage heat.
1. Reduce clock rate, thereby reducing the
power.
2. Thermal overload trip is activated to power
down the chip.
Which metric is the right one for comparing
processors: energy or power?

Power & Energy Cont.


---The dominant technology for integrated circuits is called
CMOS (complementary metal oxide semiconductor). For
CMOS, the primary source of energy consumption is socalled dynamic energythat is, energy that is consumed
when transistors switch states from 0 to 1 and vice versa.
--The dynamic energy depends on the capacitive loading of
each transistor and the voltage applied:

Power & Energy Cont.


Frequency switched is a function of the clock rate. The
capacitive load per transistor is a function of both the
number of transistors connected to an output (called the
fan out) and the technology, which determines the
capacitance of both wires and transistors.
Although dynamic energy is the primary source of energy
consumption in CMOS, static energy consumption occurs
because of leakage current that flows even when a
transistor is off. In servers, leakage is typically
responsible for 40% of the energy consumption. Thus,
increasing the number of transistors increases power
dissipation, even if the transistors are always off. A
variety of design techniques and technology innovations
are being deployed to control leakage, but its hard to
lower voltage further.

Representing instructions in the


Computer.
--Difference between the way humans
instruct computers and the way computers
see instructions.
--Instructions are kept in computer as a
series of high and low electronic signals and
may be represented as numbers.
--Each piece of an instruction can be
considered as an individual number, and
placing these numbers side by side forms the
instruction.
--

Representing instructions in the


Computer Cont..

Instruction format :A form of representation of an


instruction composed of fields of binary numbers.

Memory Operands:

Logical Operations:

Instructions for making decisions:

ALU: Arithmetic and Logic


Unit.
Hardware that performs addition,
subtraction and other Logical operations
such as AND and OR.

Exception : Also called Interrupt .


An unscheduled event that disrupts
program execution; used to detect
overflow.

Addition and Subtraction:

Subtraction:

Four Bit Adder-Subtractor:

Checking Overflow
Note that in the previous slide if the
numbers considered to be signed V
detects overflow. V=0 means no overflow
and V=1 means the result is wrong
because of overflow
Overflow can be happened when adding
two numbers of the same sign (both
negative or positive) and result can not be
shown with the available bits. It can be
detected by observing the carry into sign
bit and carry out of sign bit position. If
these two carries are not equal an
overflow occurred. That is why these two
carries are applied to exclusive-OR gate to
generate V.

Multiplication:

An Improved Version of the


Multiplication Hardware:

MIPS Multiplication Instructions:

Division:

An Improved version of the


division Hardware:

MIPS Division Instructions:

MIPS Multiply & Division Instructions:

S-ar putea să vă placă și