Sunteți pe pagina 1din 37

Scaling Power and the Future of CMOS

Mark Horowitz, EE/CS Stanford University


A Long Time Ago

In a building far away

A man made a prediction

On surprisingly little data

That has defined an industry

2
Moores Law

3
CMOS Computer Performance
100.00
intel 486
Specint 2006 intel pentium
intel pentium 2
intel pentium 3
intel pentium 4
intel itanium
10.00 Alpha 21064
Alpha 21164
Alpha 21264
Sparc
SuperSparc
Sparc64
1.00 M ips
HP PA
Power PC
AM D K6
AM D K7
AM D x86-64
IBM Power
0.10 SUN UltraSPARC
Intel Core 2
AM D Opteron
AM D Phenom

0.01
88 90 92 94 96 98 00 02 04 0
4
CMOS Computer Performance
100.00
intel 486
Specint 2006 intel pentium
intel pentium 2
intel pentium 3
intel pentium 4
intel itanium
10.00 Alpha 21064
Alpha 21164
Alpha 21264
Sparc
SuperSparc
Sparc64
1.00 M ips
HP PA
Power PC
AM D K6
AM D K7
AM D x86-64
IBM Power
0.10 SUN UltraSPARC
Intel Core 2
AM D Opteron
AM D Phenom

0.01
88 90 92 94 96 98 00 02 04 06 08 10
5
Moores Original Issues

Design cost
Power dissipation
What to do with all the functionality possible

ftp://download.intel.com/research/silicon/moorespaper.pdf
6
Scaling MOS Devices

JSSC Oct 74, pg 256


In this ideal scaling
V scales to V, L scales to L
So C scales to C, i scales to i (i/ is stable)
Delay = CV/I scales as
Energy = CV2 scales as 3
7
Processor Power

1000

Watts

100

10

1
88 90 92 94 96 98 00 02 04 06 08 10
8
Power Density
1.00

Watts/mm2

0.10

0.01
85 87 89 91 93 95 97 99 01 03 05 07 09
9
Why Power Increased
10000

Clock Frequency (MHz)

1000

100

10

85 87 89 91 93 95 97 99 01 03 05 07 09

10
Good News

Die growth & super frequency scaling have stopped


100
Cycle in FO4

10
85 87 89 91 93 95 97 99 01 03 05
11
Processor Power

They were high power too


100

10

1
85 87 89 91 93 95 97 99 01 03
12
Bad News

Voltage scaling has


stopped as well
kT/q does not scale
Vth scaling has power
consequences

If Vdd does not scale


Energy scales slowly

Ed Nowak, IBM
13
Technology Scaling Today

Device sizes are still scaling


Cost/device is still scaling down
This is what is driving scaling

Voltages are not scaling very fast


Threshold voltages set by leakage
Gate oxide thickness is set by leakage
This means that the channel lengths are not scaling
Current is increasing by stressing silicon

Now Vdd and Vth are set by optimization

14
Other Technologies

For computing, I am not optimistic

Current problems are set by Physics:


Vdd set by kT/q
Sets the on-off ratio
Wire energy by CVdd2

To get around these limitations


Need to create something very different!

15
Problem with Different Technologies

Design processes have been optimized for silicon


Working on making it better for over 30 years

Silicon has set:


Notions of logic (binary signals), digital design styles
Computing (distinct memory and logic)
Relative size and speed of memory logic

No new technology will fit this mold well


Changing the world is hard
If you build it, generally they dont come
Unless they absolutely have to
16
Maturing of Silicon

Silicon will not disappear


It will still be a huge business
Growth rate is slower, Eventually very slow scaling

Silicon will become like concrete and steel


Basis of a huge industry
Critical to nearly everything
But fairly stable and predictable

Will remain the dominate substrate for computing


And performance be limited by power dissipation

17
Optimizing Energy

Every design is a point on a 2-D plane


Energy

Performance
18
Optimizing Energy

Every design is a point on a 2-D plane


Energy

Performance
19
Optimizing Energy

Every design is a point on a 2-D plane


Energy

Performance
20
Years of Low Power Research

Shown only one design technique to reduce power


Reduce waste

Can waste
Energy (clock gating, leakage control, etc)
Performance
Adding additional constraints to operation flow

If technology scaling has stalled


Need to focus on reducing waste in our systems

Increase in efficiency in our designs will set performance


21
Future Systems

Some simple math


Assume scaling continues
Dies dont shrink in size
Average power/gate must decrease by 2x / gen
Or need to build systems that increase in power

Since gates are shrinking in size


Get 1.4x from capacitive reduction
Where is the other factor of 1.4x ?

22
The Push for Parallelism

i ntel 386
i ntel 486
Watts/Spec*L*Vdd^2

i ntel penti um
i ntel penti um2
i ntel penti um3
i ntel penti um4
0.1 i ntel i tani um
Al pha 21064
Al pha 21164
Al pha 21264
Spar c
Super Spar c
Spar c64
Mi ps
10 parallel HP PA
Power PC
processors AMD K6
0.01 AMD K7
AMD x86-64
1 10 100 1000
Spec2000 *L
23
Exploit Parallelism / Scale Vdd

If you have parallelism

Energy/op
Add more function units
Fill up new die (2x)
Lower energy/op
E/P will decrease
Vdd, sizes, etc will reduce
Build simpler architectures
Performance
Works well when E/P is large
But what happens when that runs out?

24
Problem Reformulation

Best way to save energy is to do less work


Energy directly reduced by the reduction in work
But required time for the function decreases as well
Convert this into extra power gains
Shifts the optimal curve down and to the right
Energy/op

User Performance
25
Exploit Specialization

Optimize execution units for specific applications


Reformulate the hardware to reduce needed work
Can improve energy efficiency for a class of applications

DSP/Vector engines are more efficient than CPUs


Exploit locality, reuse
High compute density

ASICs are more efficient than DSP/Vector engines


If we want efficiency, we need more application optimization

26
ASIC/SOC Design Trends

Rising non-recurring engineering costs


Few
Increasing design complexity
ma
rke
Growing verification complexity
ts ca
Challenging physical design
n ju
Rising mask costs stify
A SIC
NR
E

27
28
ASIC Future Depends on Your Religion

Believe in correct by construction?


Believe in a generic high-level design language?

Historically both have not worked


I believe history is correct

Allowing people to connect complex blocks


Yields a complex validation problem, and a $20M+ design
General SoC, SiP will never be cheap

29
Computings Future:

Create a new universal computing platform


That is more efficient that today/tomorrows CMP
Bill Dally is working on this one

Leverage existing large volume processors for


other applications
GPUs moving into general processing
OMAP being distributed as Unix system

30
Can We Do Better?

Chip design is expensive since chips are complex

But the building blocks are well known


Many of the optimizations are well known too
Designer often do many of the same steps
Part of the reason for off-shoring
Dont need experience

Getting the system to work is hard


There is a lot of turning the crank that is needed

Can we automate some of the crank turning?

31
Chip Generator Idea
Application

$$$
Configure a Configure /
ASIC design
Process: programmable program a virtual
chip chip

Generate
optimized chip
Not
Efficient
Final Custom Configured Semi-custom
System System System
Product:

32
Another Way To Put It

ASIC
CMP Excel perf/watt per app
Performance

Generator Amortized costs


per Watt

Wide app domain


Best perf/watt per app
Highest costs
Specific for a single app
Worse perf/watt per app
Program-
Amortized costs
able Wider app domain

Per App Amortized


Cost Structure

33
Smart Memories - A Pretend Generator

Data Instruction
Cache Cache

M M M M M M M M T D D T D D M M

M M M M M M M M T D D M M M M M

Crossbar Crossbar

Processor Processor Processor Processor


FU FU

Smart Memories Architecture: Single Tile Chip Generator Derivative

34
Looks Promising

Large energy / performance gains are possible:


Use H.264 as example application
Use a GP-CMP chip generator
400-600X initial perf. gap
New And Exciting Challenges

Potential Benefits Process


(Wajahat, Rehan, Megan) (Ofer)
How do the user use a
Is it useful? How much
generator? How would a
better than CMP? How
generator be built? How
much worse than ASIC?
is the chip specified?

Chip
Multiprocessor
Generator
Optimization Verification
(Omid, Pete) (Megan, Ofer)
Given an application
Verification of a chip is
and a target (power /
difficult. How do we
perf.) How do we find
verify a generator?
the best chip plan?

36
Conclusions

The technology engine driving IT is slowing down


Power efficiency is the real problem

Application optimization leads to efficiency


But design is too expensive today to do this

Need to rethink design


Build chip generators not chips
These are virtual programmable chips
Have tools generates the real chips that customers want

37

S-ar putea să vă placă și