
ASICS: THE HEART OF MODERN ROUTERS

Chang-Hong Wu
Distinguished Engineer, Juniper Networks
THE INTERNET EXPLOSION

[Chart, 1988 to 2008, plotted at 1988, 1993, 1998, 2003, and 2008: # Web Sites, Internet Capacity, # Connected Devices, Total Digitized Information, # Google Searches/Month]

Labeled 2008 values: # Web Sites 162M; Internet Capacity 130EB/yr; # Connected Devices 1B; Total Digitized Information 420EB; # Google Searches/Month 31B/mo

Exponential growth, no matter how you measure it!


The clearest indication of value delivered to end-users

2 Copyright 2010 Juniper Networks, Inc.


DRIVING FORCE BEHIND EXPONENTIAL GROWTH

[Diagram: an information system combines computing (C), storage (S), and networking (N), each advancing through a sequence of technologies]

Computing: Digital Computing, Stored Program, Pipelining, Microprocessor, Multi-core Computing
Networking: Digital Transmission, Circuit Switching, Packet Switching, TCP/IP Networking, HPN
Storage: Digital Storage, Core Memory, Disk, DRAM, Flash Storage


COMPUTER PERFORMANCE: 1988-2008

500,000x over 20 years

[Chart: Megahertz / MFlops on a log scale from 2^0 to 2^28, by year from 1988 to 2008. Supercomputers, system CAGR: 1.9x/year. Microprocessor CAGR: 1.3x/year.]


ROUTER PERFORMANCE: 1988-2008

1,000,000x over 20 years (2x/year)

[Chart: Megabits per second on a log scale from 2^0 to 2^24, by year from 1988 to 2008. Pre-ASIC era: 1.6x/year. Post-ASIC era: 2.2x/year, with the M40, M160, T640, T1600, and TX marked. Interface CAGR: 1.7x/year.]
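The growth figures on the two performance slides are compound annual rates. A minimal Python sketch of that arithmetic follows; the function names are illustrative, and because the slide values are rounded, the compounded results only approximate the headline multiples.

```python
def total_growth(rate_per_year: float, years: int) -> float:
    """Compound a per-year growth multiplier over a number of years."""
    return rate_per_year ** years


def annual_rate(total: float, years: int) -> float:
    """Per-year multiplier that compounds to `total` over `years`."""
    return total ** (1.0 / years)


YEARS = 20  # 1988 to 2008, as on the slides

# Rates quoted on the slides.
RATES = {
    "Supercomputers (system CAGR)": 1.9,
    "Microprocessor CAGR": 1.3,
    "Routers, overall": 2.0,
    "Routers, pre-ASIC era": 1.6,
    "Routers, post-ASIC era": 2.2,
    "Interface CAGR": 1.7,
}

if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"{label}: {rate}x/year -> {total_growth(rate, YEARS):,.0f}x over {YEARS} years")
    # Going the other way: the rate implied by a 1,000,000x improvement.
    print(f"1,000,000x over {YEARS} years = {annual_rate(1_000_000, YEARS):.2f}x/year")
```

Doubling every year for 20 years gives 2^20 = 1,048,576, which is the "1,000,000x" quoted for routers.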
SILICON
THE FOUNDATION OF PERFORMANCE

[Chart: silicon types ordered by instructions/packet]

Fabric Engine (F): 1-10
Core Engine (C): 10-100
Edge Engine (E): 100-1000
Services Engine (S): 1000-100,000
General-Purpose Microprocessor (G): 100,000 and up
COMPARISON OF SILICON TECHNOLOGIES

Technology: General-Purpose CPUs
Advantages: Very flexible
Disadvantages: Poor performance, density, and power
Use cases: Flexibility is more important than performance

Technology: Field Programmable Gate Arrays
Advantages: Smaller up-front development cost; field upgrades
Disadvantages: Lower performance, density, and power; high per-part price
Use cases: Volume is low; changes are expected

Technology: Network Processors
Advantages: Off-the-shelf; flexible; jump straight into software design
Disadvantages: Can fall short of performance, power, and functionality targets
Use cases: Differentiation is not important

Technology: ASICs
Advantages: Tailor to your specification; high performance; low production cost
Disadvantages: High upfront cost; long development cycle
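The cost trade-off between a part with low upfront cost and high per-part price (an FPGA) and one with high upfront cost and low production cost (an ASIC) comes down to a break-even volume. The sketch below works through that arithmetic. All dollar figures are hypothetical placeholders chosen for round numbers, not data from the presentation.

```python
def total_cost(upfront: float, unit_cost: float, volume: int) -> float:
    """Upfront (NRE) cost plus per-part cost times the number of parts."""
    return upfront + unit_cost * volume


def break_even_volume(upfront_a: float, unit_a: float,
                      upfront_b: float, unit_b: float) -> float:
    """Volume above which option A (higher upfront, lower unit cost)
    becomes cheaper than option B."""
    if unit_b <= unit_a:
        raise ValueError("option A must have the lower unit cost")
    return (upfront_a - upfront_b) / (unit_b - unit_a)


# Hypothetical numbers, for illustration only.
ASIC_UPFRONT, ASIC_UNIT = 10_000_000.0, 100.0
FPGA_UPFRONT, FPGA_UNIT = 500_000.0, 2_000.0

if __name__ == "__main__":
    volume = break_even_volume(ASIC_UPFRONT, ASIC_UNIT, FPGA_UPFRONT, FPGA_UNIT)
    print(f"ASIC becomes cheaper above {volume:,.0f} parts")
    for n in (1_000, 5_000, 20_000):
        print(n, total_cost(ASIC_UPFRONT, ASIC_UNIT, n), total_cost(FPGA_UPFRONT, FPGA_UNIT, n))
```

With these placeholder numbers the crossover is at 5,000 parts, which is consistent with the table's guidance that FPGAs fit when volume is low.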


SYSTEM ARCHITECTURE
Market requirements
Performance, density, feature, cost targets

Software/hardware interactions
Functional partitioning

Silicon process technology evaluation


Cost/performance tradeoffs

Memory choices
Stores configuration, FIB tables, etc.
Temporary working buffers

Chip partitioning
IO and logic ratio, die size, interface simplicity



ASIC PROCESS TECHNOLOGY
Greater density allows more features/functionality for the same price
Moore's Law: transistor density doubles every 18 months
Holding up remarkably well. But how much longer?
While density is increasing, performance is starting to level off
The decrease in operating voltage, and hence in dynamic power, has also slowed
Static power is becoming an issue
NRE costs associated with newer processes are increasing dramatically
Architectural innovations are needed to continue to provide value to customers
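Two relationships sit behind the points above: density doubling every 18 months, and the dependence of dynamic power on operating voltage. The sketch below uses the standard CMOS approximation P = activity x C x V^2 x f for dynamic power; the function names and sample values are illustrative assumptions, not figures from the presentation.

```python
def density_multiplier(years: float, doubling_period_years: float = 1.5) -> float:
    """Relative transistor density after `years`, doubling every 18 months."""
    return 2.0 ** (years / doubling_period_years)


def dynamic_power_watts(activity: float, capacitance_farads: float,
                        voltage: float, frequency_hz: float) -> float:
    """Standard CMOS dynamic power approximation: activity * C * V^2 * f."""
    return activity * capacitance_farads * voltage ** 2 * frequency_hz


if __name__ == "__main__":
    for years in (1.5, 3, 6, 9):
        print(f"{years} years -> {density_multiplier(years):.0f}x density")

    # Hypothetical operating point, used only to show the V^2 dependence.
    base = dynamic_power_watts(0.2, 1e-9, 1.2, 1e9)
    scaled = dynamic_power_watts(0.2, 1e-9, 0.9, 1e9)
    print(f"1.2 V -> 0.9 V cuts dynamic power to {scaled / base:.0%} of the original")
```

Because power scales with the square of voltage, voltage reductions were the main lever on dynamic power; once voltage stops dropping, that lever is gone, which is the slowdown the slide refers to.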


NETWORKING ASICS AND MEMORIES

[Diagram: ASICs sit between an Input/Output interface and an Input/Output/Fabric interface, and attach to three kinds of memory: Queues / Link Memory, Packet Buffers, and Control Memories (FIB, ACL, configs, etc.)]


MEMORY TECHNOLOGY CHARACTERISTICS
Technology        Capacity  Frequency  Latency  Power  Cost
Embedded SRAM     L         H          L        M      H
Embedded DRAM     M         M          L+       L      M
Embedded TCAM     L         M          L+       H      H
External SRAM     M         L          M        H      H
External RLDRAM   H         M          M        L      H
External SDRAM    H+        M          H        L      L-
External TCAM     L         L          H        H      H

(L = low, M = medium, H = high; + and - mark slightly higher or lower)
MEMORY CHOICES WITH NETWORKING ASICS

Packet buffering
Need high throughput, high density
Long bursts ok
SDRAM or RLDRAM (Reduced Latency DRAM)

Queuing/Link memory
Need high throughput, low latency
Shorter bursts
SRAM, RLDRAM, or SDRAM

Control memory
Need high throughput, low latency
Even smaller access quantum
SRAM, TCAM, or RLDRAM
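
The pairing of each memory role with candidate technologies can be read as a filter over the memory characteristics table. The sketch below encodes that table and filters it by capacity and latency. The numeric ranking of the L/M/H grades and the `candidates` helper are assumptions made for illustration; the real selection also weighs burst length, throughput, power, and cost.

```python
# Numeric ordering for the slide's grades (an assumption; the slide gives only letters).
LEVEL = {"L-": 0.7, "L": 1.0, "L+": 1.3, "M": 2.0, "H": 3.0, "H+": 3.3}

FIELDS = ("capacity", "frequency", "latency", "power", "cost")

# Ratings copied from the memory technology characteristics table.
MEMORIES = {
    "Embedded SRAM":   ("L",  "H", "L",  "M", "H"),
    "Embedded DRAM":   ("M",  "M", "L+", "L", "M"),
    "Embedded TCAM":   ("L",  "M", "L+", "H", "H"),
    "External SRAM":   ("M",  "L", "M",  "H", "H"),
    "External RLDRAM": ("H",  "M", "M",  "L", "H"),
    "External SDRAM":  ("H+", "M", "H",  "L", "L-"),
    "External TCAM":   ("L",  "L", "H",  "H", "H"),
}


def candidates(min_capacity: str = "L-", max_latency: str = "H+") -> list:
    """Technologies with at least `min_capacity` and at most `max_latency`."""
    result = []
    for name, ratings in MEMORIES.items():
        row = dict(zip(FIELDS, ratings))
        if (LEVEL[row["capacity"]] >= LEVEL[min_capacity]
                and LEVEL[row["latency"]] <= LEVEL[max_latency]):
            result.append(name)
    return result


if __name__ == "__main__":
    # Packet buffering needs high density.
    print("High capacity:", candidates(min_capacity="H"))
    # The lowest-latency options in the table.
    print("Low latency:", candidates(max_latency="L+"))
```

Filtering for high capacity alone leaves external RLDRAM and SDRAM, which matches the technologies listed for packet buffering.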



ARCHITECTURE: CHIP PARTITIONING
Fewer chips do not necessarily mean lower overall cost
Chips get very expensive once they cross a certain die size
Economics of silicon is all about fabrication yield

Goals
Balance the size of each chip within the packet forwarding engine
Minimize pin count on each chip
Minimize overall component cost
Flexibility to support different configurations with the same chipset

[Chart: cost ($) versus total # of transistors for a 1-chip and a 2-chip design. Annotations: "Chip yield & PCB layers dominate" and "Pins, packaging, power, PCB area dominate". The crossover point X depends on technology.]
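The crossover between a 1-chip and a 2-chip design can be illustrated with a simple yield model. The sketch below uses the Poisson yield approximation, yield = exp(-defect density x die area), which is one common textbook simplification and not necessarily the model behind the slide. Wafer cost, defect density, and per-chip overhead are hypothetical values; only the 300mm wafer size is taken from the presentation.

```python
import math


def good_die_cost(die_area_mm2: float,
                  wafer_cost: float = 5000.0,       # hypothetical
                  wafer_diameter_mm: float = 300.0,
                  defects_per_mm2: float = 0.002    # hypothetical
                  ) -> float:
    """Cost per good die under a Poisson yield model (edge loss ignored)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    dies_per_wafer = wafer_area / die_area_mm2
    yield_fraction = math.exp(-defects_per_mm2 * die_area_mm2)
    return wafer_cost / (dies_per_wafer * yield_fraction)


def chipset_cost(total_area_mm2: float, n_chips: int,
                 per_chip_overhead: float = 40.0    # hypothetical: package, pins, board area
                 ) -> float:
    """Cost of splitting the same total silicon area evenly across n_chips."""
    die_area = total_area_mm2 / n_chips
    return n_chips * (good_die_cost(die_area) + per_chip_overhead)


if __name__ == "__main__":
    for area in (100, 200, 400, 800):
        one = chipset_cost(area, 1)
        two = chipset_cost(area, 2)
        print(f"{area} mm^2 total: 1 chip = ${one:.0f}, 2 chips = ${two:.0f}")
```

For small designs the fixed per-chip overhead dominates and one chip is cheaper; for large designs the exponential yield loss dominates and splitting the design wins. Where the two curves cross depends on the process parameters.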


EXAMPLES OF SILICON PROCESS IMPROVEMENT,
CHIP PARTITIONING, AND MEMORY USAGE

Year  Chipset    Process        Chips  Transistors  IO       Memory
1998  IP1, 2     250nm          4      18M          47Gbps   SRAM/SDRAM
2002  IP3        180nm (90nm)   10     446M         412Gbps  SRAM/RDRAM (RLDRAM)
2006  I-Chip     90nm           1      160M         219Gbps  RLDRAM/DDR2 SDRAM
2009  Trio/NISP  65nm           4      1.2B         604Gbps  RLDRAM/DDR3 SDRAM


EXAMPLES: BENEFITS OF ASIC EVOLUTIONS

                      M40      M160     T640     T1600
Slot Capacity, Gbps   3.0      10       40       100
System Capacity       40Gbps   160Gbps  640Gbps  1600Gbps
Max System Draw       1.5 kW   3.15 kW  4.52 kW  8.35 kW
EER (Gbps/kW)         13       25       71       96
FRS                   1998     2000     2002     2007
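The slide does not say how EER is computed. The tabulated values happen to match half the system capacity divided by the maximum system draw, which would be consistent with counting throughput in one direction only. That reading is an inference from the numbers, and the sketch below checks it; the `duplex_divisor` parameter is a name introduced here for illustration.

```python
# (system capacity in Gbps, max system draw in kW), copied from the table.
SYSTEMS = {
    "M40":   (40.0, 1.5),
    "M160":  (160.0, 3.15),
    "T640":  (640.0, 4.52),
    "T1600": (1600.0, 8.35),
}


def eer(capacity_gbps: float, power_kw: float, duplex_divisor: float = 2.0) -> float:
    """Gbps per kW, assuming the quoted capacity counts both directions."""
    return capacity_gbps / duplex_divisor / power_kw


if __name__ == "__main__":
    for name, (capacity, power) in SYSTEMS.items():
        print(f"{name}: {eer(capacity, power):.1f} Gbps/kW")
```

Under this assumption the rounded results reproduce the EER row of the table, roughly a sevenfold efficiency gain from the M40 to the T1600.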


MICRO ARCHITECTURE

Take each subsystem, divide into blocks, divide each block into sub-blocks, design down to the basic logic elements
Document both functionality and architecture
Rigorous peer reviews of all documents

[Diagram: example hierarchy. Xchip divides into X_in and X_out; X_in divides into X_in_a and X_in_b; X_in_b divides into X_in_b_cntl (control logic) and X_in_b_dp (datapath).]


REGISTER TRANSFER LEVEL CODING

Translate micro architecture for all blocks to Register Transfer Level code.

[Diagram: 2-to-1 multiplexer with data inputs a and b, select input sel, and output out]

    // Procedural form: out must be declared as reg
    always @ (sel or a or b)
    begin
        if (sel == 1)
            out = a;
        else
            out = b;
    end

OR

    // Continuous assignment form: out is a wire
    assign out = sel ? a : b;

A large chip will have hundreds of thousands of lines of RTL code

Must always keep in mind physical placement and timing during the micro architecture phase
You pay now or you pay more later
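The two RTL forms of the multiplexer are meant to describe the same logic. As a rough illustration, the Python model below mirrors each form and compares them over every combination of 0/1 inputs. It is a software analogy only: it does not model Verilog's x and z values, and the function names are invented for this sketch.

```python
from itertools import product


def mux_always(sel: int, a: int, b: int) -> int:
    """Mirrors the procedural if/else form."""
    if sel == 1:
        out = a
    else:
        out = b
    return out


def mux_assign(sel: int, a: int, b: int) -> int:
    """Mirrors the continuous-assignment (ternary) form."""
    return a if sel else b


def equivalent(f, g, n_inputs: int = 3) -> bool:
    """Exhaustively compare two functions over all 0/1 input combinations."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n_inputs))


if __name__ == "__main__":
    print("Forms agree on all inputs:", equivalent(mux_always, mux_assign))
```

Exhaustive comparison works here because there are only eight input combinations; for real blocks the input space is far too large, which is why dedicated equivalence and verification tools exist.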


SYNTHESIS & TIMING
Synthesis is the exercise of mapping RTL to gates in the technology of choice

INPUT
RTL code
Specification of clocks and cycle-time (frequency)
Input and output constraints for the module being synthesized
Wire-load models as basis to model interconnect effects on gates
Recent trends: physical synthesis
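The clock specification given to synthesis sets the time budget for every register-to-register path. A minimal sketch of the usual setup-slack check follows: slack is the clock period minus the sum of clock-to-output delay, logic delay, and setup time. The delay values are hypothetical, and real timing analysis also accounts for clock skew, uncertainty, and interconnect delay.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Cycle time in nanoseconds for a clock frequency in MHz."""
    return 1000.0 / freq_mhz


def setup_slack_ns(freq_mhz: float, clk_to_q_ns: float,
                   logic_delay_ns: float, setup_ns: float) -> float:
    """Positive slack means the path meets timing at this frequency."""
    return clock_period_ns(freq_mhz) - (clk_to_q_ns + logic_delay_ns + setup_ns)


if __name__ == "__main__":
    # Hypothetical paths at a 500 MHz target clock.
    for logic_delay in (1.0, 1.5, 2.0):
        slack = setup_slack_ns(500.0, 0.1, logic_delay, 0.1)
        status = "meets timing" if slack >= 0 else "violates timing"
        print(f"logic delay {logic_delay} ns: slack {slack:+.2f} ns, {status}")
```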
VERIFICATION

Goal: First-time-right silicon
Avoid expensive ASIC respins
Simulations are far easier to debug than real chips

Recipe: At least as many verification engineers as design engineers per chip

Performed at multiple levels
Block level
Chip level
Sub-system level
System level
Software/hardware co-simulation

TOOLS
Test-bench tool
SystemVerilog
C/C++, Verilog
Coverage tools
Equivalency checkers
Simulators
Waveform viewers


PHYSICAL DESIGN

Power and clock planning


Perform high-level floor-planning
Place I/O, SRAMs, & Register Arrays
Random logic placements
Perform congestion analysis
Wire up all the logic and IOs
Run timing with physical placement
Many iterations of all of the above



PHYSICAL DESIGN EXAMPLE

1) Memory placement
2) Logic placement & clocks
3) M1 routing
4) M2 routing
5) M3 routing
6) M4 routing
7) M5 routing
8) M6 routing
9) M2/M4/M6 routing
10) M1/M3/M5 routing



ASIC TAPEOUT

Criteria for ASIC Tapeout


All functionality complete
All verification complete
Performance simulations meet goals
Chip is error free from a testability perspective
Chip meets timing under all process, temperature and
voltage conditions
Design and verification database is archived
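
Meeting timing "under all process, temperature and voltage conditions" means checking every combination of corners. The sketch below enumerates such combinations and reports the ones that fail. The corner values and the `toy_slack_ns` delay model are invented for illustration; a real flow takes slack from static timing analysis with characterized libraries.

```python
from itertools import product

# Hypothetical corner values.
PROCESS = ("slow", "typical", "fast")
VOLTAGE = (0.9, 1.0, 1.1)      # volts
TEMPERATURE = (-40, 25, 125)   # degrees C


def corners() -> list:
    """All process/voltage/temperature combinations."""
    return list(product(PROCESS, VOLTAGE, TEMPERATURE))


def toy_slack_ns(process: str, voltage: float, temp_c: int,
                 period_ns: float = 2.0, base_delay_ns: float = 1.4) -> float:
    """Toy delay model: slower process, lower voltage, and higher
    temperature all lengthen the path delay."""
    process_factor = {"slow": 1.2, "typical": 1.0, "fast": 0.85}[process]
    delay = base_delay_ns * process_factor / voltage * (1 + 0.001 * (temp_c - 25))
    return period_ns - delay


def failing_corners(slack_fn) -> list:
    """Corners where the slack function returns a negative value."""
    return [corner for corner in corners() if slack_fn(*corner) < 0]


if __name__ == "__main__":
    print(f"{len(corners())} corners checked")
    for corner in failing_corners(toy_slack_ns):
        print("Timing violation at", corner)
```

In this toy model only the slow-process, low-voltage, high-temperature corner fails, which shows why a design that passes at typical conditions is not ready for tapeout.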



MANUFACTURING

After the ASIC is taped out


Masks are generated for photolithography
ASICs are then built layer-by-layer on a silicon substrate wafer

Once the ASIC wafer is complete


Each die is tested in wafer test
Only good die are laser cut for packaging

Once cut die are available


They are put in a package
The packaged devices are then tested again

Tested packaged parts are put on system boards


Test with other hardware and software



MANUFACTURING CONTINUED

[Images: copper layers; 300mm wafer; 300mm wafer fab; packaging]
SUMMARY

ASIC technology has transformed the network industry
Silicon process technology is evolving at an impressive pace, but architectural innovations are required to keep up with the demand for increasing performance at lower power
A rigorous architecture, design, and verification process is required to implement complex networking ASICs
There is a vast number of architectural and design tradeoffs to be made, so the user community should provide feedback early and often
