
DDR4: Designing for Power and Performance

Agenda

Comparison between DDR3 and DDR4
Designing for power
  DDR4 power savings
Designing for performance
  Creating a data valid window
  Good layout practices for DDR4
  Board debug tools to minimize issues
Looking ahead and conclusion

Comparison Between DDR3 and DDR4

DRAM Technology Comparison

Feature                      | DDR3                        | DDR4                                     | GDDR5
Voltage                      | 1.5 V / 1.35 V              | 1.2 V                                    | 1.5 V / 1.35 V
Strobe                       | Bi-directional differential | Bi-directional differential              | Free-running differential WRITE clock
Strobe configuration         | Per byte                    | Per byte                                 | Per word
READ data capture            | Strobe based                | Strobe based                             | Clock data recovery
Data termination             | VDDQ/2                      | VDDQ                                     | VDDQ
Address/command termination  | VDDQ/2                      | VDDQ/2                                   | VDDQ
Burst length                 | BC4, 8                      | BC4, 8                                   | 8
Bank grouping                | No                          | 4                                        | 4
On-chip error detection      | No                          | Command/address parity; CRC for data bus | CRC for data bus
Configuration                | x4, x8, x16                 | x4, x8, x16                              | x16, x32
Package                      | 78-ball / 96-ball FBGA      | 78-ball / 96-ball FBGA                   | 170-ball FBGA
Data rate (Mbps/pin)         | 800–2,133                   | 1,600–3,200+                             | 4,000–7,000
Component density            | 1 Gb–8 Gb                   | 2 Gb–16 Gb                               | 512 Mb–2 Gb
Stacking options             | DDP, QDP                    | Up to 8H (128 Gb stack); single load     | No

DDR4 Power Savings

DDR4 Power Savings Features

DDR4 voltage is 1.2 V (up to 40% savings)
  Lower voltage than DDR3 (1.5 V)
  On-die VREF
  Pseudo-open-drain (POD) I/Os
Manages refreshes (up to 20% savings)
  Based on temperature
    New DDR4 low-power auto self-refresh (LPASR) capability
    Changes the refresh rate based on temperature
  Refreshes only the parts of the array that are in use
    Controller must allow fine-granularity refresh based on memory utilization
Supports data bus inversion (DBI)
  Limits the number of DQ lines driven low in each byte, reducing simultaneous switching output (SSO) noise and saving power
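In DDR4, DBI is applied per byte lane. Because a POD driver draws termination current only when it drives low, the transmitter inverts any byte that has more than four 0 bits and signals the inversion on the active-low DBI_n pin. The Python sketch below models only that encoding rule; the function names are illustrative, and DBI enable settings, CRC interaction, and timing are ignored.

```python
from typing import Tuple

def low_bits(byte: int) -> int:
    """Number of DQ lines that would be driven low for this byte."""
    return 8 - bin(byte & 0xFF).count("1")

def dbi_encode(byte: int) -> Tuple[int, int]:
    """Return (byte_on_bus, dbi_n) for one DDR4 byte lane; dbi_n = 0 means inverted."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    if low_bits(byte) > 4:
        # More than four lows would draw POD termination current on most lines,
        # so transmit the complement and assert DBI_n (active low).
        return (~byte) & 0xFF, 0
    return byte, 1

def dbi_decode(byte_on_bus: int, dbi_n: int) -> int:
    """Receiver side: undo the inversion when DBI_n is low."""
    return (~byte_on_bus) & 0xFF if dbi_n == 0 else byte_on_bus

for data in (0x00, 0x0F, 0x01, 0xFF):
    bus, dbi_n = dbi_encode(data)
    print(f"data 0x{data:02X} -> bus 0x{bus:02X}, DBI_n={dbi_n}")
```

Across the nine lines of a lane (eight DQ plus DBI_n), at most four are ever driven low with this rule, which is where the power and SSO benefit comes from.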

Creating a Data Valid Window

Timing Margins Are Shrinking

Shrinking timing margins in picoseconds (data valid window divided into DRAM, package/board, and chip margin):

Generation          | Data valid window | DRAM margin | Package/board margin | Chip margin
DDR1 (400 Mbps)     | 2,500             | 900         | 800                  | 800
DDR2                | 938               | 425         | 256                  | 256
DDR3                | 469               | 188         | 140                  | 140
DDR4 (3,200 Mbps)   | 313               | 125         | 93                   | 93
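The data valid window in this chart is one unit interval (UI), and UI = 1 / data rate, so 400 Mbps gives 2,500 ps and 3,200 Mbps gives 312.5 ps (shown rounded to 313). The sketch below checks that the three margin components add up to each window, to within rounding, and back-computes the implied data rates. The DDR2 and DDR3 rates it prints (about 1,066 and 2,132 Mbps) are inferred from the window widths, not stated in the chart.

```python
def unit_interval_ps(rate_mbps: float) -> float:
    """Bit time in picoseconds for a data rate in Mbps per pin."""
    return 1e6 / rate_mbps

def rate_from_window_mbps(window_ps: float) -> float:
    """Inverse of unit_interval_ps: data rate implied by a one-UI window."""
    return 1e6 / window_ps

# Values in ps, as listed in the chart: (data valid window, DRAM, package/board, chip)
MARGINS = {
    "DDR1": (2500, 900, 800, 800),
    "DDR2": (938, 425, 256, 256),
    "DDR3": (469, 188, 140, 140),
    "DDR4": (313, 125, 93, 93),
}

for gen, (window, dram, board, chip) in MARGINS.items():
    parts = dram + board + chip
    print(f"{gen}: window {window} ps, components sum to {parts} ps, "
          f"implied rate ~{rate_from_window_mbps(window):.0f} Mbps")
```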

Shrinking the Window Even More: DDR4 VREF Training (1/2)

DDR4 VREF training
  Training: sweep the VREF setting and find the maximum passing window
    Lump sum of DCD, RX offset, etc.
    Resolution error is the combination of VREF, PI, or delay-chain resolution
  Margin loss calculation
    VREF step size: 0.5% VDDQ to 0.8% VDDQ
    VREF set tolerance: ±1.625% or ±0.15% VDDQ
    Calibration error: 1 step size
      0.8% × VDDQ = 0.8% × 1.2 V = 9.6 mV
    Margin loss due to VREF calibration error
      9.6 mV × 2 / slew_rate = 4.8 ps (assuming slew rate = 4 V/ns)
    Calibration error = half step size

Parameter           | Symbol       | Min     | Typ   | Max    | Unit | Notes
VREF step size      | Vref_step    | 0.50%   | 0.65% | 0.80%  | VDDQ |
VREF set tolerance  | Vref_set_tol | -1.625% | 0.00% | 1.625% | VDDQ | 3, 4, 6
                    |              | -0.15%  | 0.00% | 0.15%  | VDDQ | 3, 5, 7
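The training step (sweep VREF, find the maximum passing window) and the margin-loss arithmetic above can be sketched as follows. The pass/fail list stands in for whatever read or write test the controller runs at each VREF step, which is an assumption since the slide does not specify the test, and choosing the center of the longest passing run is one common choice, not necessarily what a given controller does. The margin-loss function reproduces the slide's 9.6 mV × 2 / slew rate calculation, including its factor of 2.

```python
from typing import List, Tuple

def best_vref_window(passes: List[bool]) -> Tuple[int, int, int]:
    """Return (start, end, center) of the longest run of passing VREF steps.

    passes[i] is the pass/fail result of a test run at VREF step i.
    """
    best = (0, -1)                 # empty run
    run_start = None
    for i, ok in enumerate(passes + [False]):   # sentinel closes a trailing run
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - 1 - run_start > best[1] - best[0]:
                best = (run_start, i - 1)
            run_start = None
    if best[1] < best[0]:
        raise ValueError("no passing VREF setting found")
    start, end = best
    return start, end, (start + end) // 2

def vref_margin_loss_ps(step_pct: float, vddq_v: float, slew_v_per_ns: float,
                        error_steps: float = 1.0) -> float:
    """Timing margin lost to VREF calibration error, following the slide's arithmetic."""
    error_mv = error_steps * step_pct / 100.0 * vddq_v * 1000.0
    # 1 V/ns equals 1 mV/ps, so mV divided by V/ns gives ps; factor 2 as on the slide.
    return error_mv * 2.0 / slew_v_per_ns

sweep = [False, False, True, True, True, True, False, True, True, False]
print(best_vref_window(sweep))                       # (2, 5, 3)
print(f"{vref_margin_loss_ps(0.8, 1.2, 4.0):.1f} ps")  # 4.8 ps
```

Passing error_steps=0.5 models the half-step calibration error case, which halves the loss to 2.4 ps.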

Shrinking the Window Even More: DDR4 VREF Training (2/2)

Discussion with JEDEC members
  DDR4 specification section 13.4: any DRAM component-level variation must be accounted for within the DRAM RX mask. This means that the VREF calibration error is included in VdiVW_total.
  Training aligns the internal VREF_DQ to VCENT_DQ. Because VCENT_DQ itself varies, the VREF_DQ training error should increase with this variation and with internal voltage noise.

Shrinking the Window Even More: Duty Cycle Error

DDR4 specification is ±2% tCK = ±0.04 UI
  IPD current budget: ±3% tCK
Margin loss is 4% tCK
  With proper link timing calibration: 2% tCK margin loss

[Figure: DQS and DQ each with ±2% duty-cycle error, giving a 2% tCK margin loss; the same is assumed for reads.]
Timing Parameters by Speed Bin for DDR4-2400 to DDR4-3200

Parameter                                | Symbol        | DDR4-2400 Min / Max | DDR4-2666 Min / Max | DDR4-3200 Min / Max | Units     | Note
Clock timing
Minimum clock cycle time (DLL off mode)  | tCK (DLL_OFF) | TBD                 | TBD                 | TBD                 |           | 22
Average clock period                     | tCK (avg)     | TBD                 | TBD                 | TBD                 |           |
Average high pulse width                 | tCH (avg)     | 0.48 / 0.52         | 0.48 / 0.52         | 0.48 / 0.52         | tCK (avg) |
Average low pulse width                  | tCL (avg)     | 0.48 / 0.52         | 0.48 / 0.52         | 0.48 / 0.52         | tCK (avg) |
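The tCH(avg)/tCL(avg) limits of 0.48 to 0.52 tCK are the ±2% tCK duty-cycle specification. Because DDR transfers two bits per clock, 1 tCK = 2 UI, which is where ±0.04 UI comes from. This small sketch converts those limits, and the 4% and 2% tCK margin losses, into picoseconds per speed bin; at DDR4-3200, tCK = 625 ps, so the 4% tCK loss is 25 ps and the calibrated 2% tCK loss is 12.5 ps.

```python
def tck_ps(rate_mtps: float) -> float:
    """Clock period in ps: DDR transfers two bits per clock, so tCK = 2 UI."""
    return 2 * 1e6 / rate_mtps

def duty_cycle_error(rate_mtps: float, tch_min: float = 0.48, tch_max: float = 0.52):
    """Return (error as a fraction of tCK, error in UI, error in ps) for the tCH/tCL limits."""
    frac = max(tch_max - 0.5, 0.5 - tch_min)   # 0.02 tCK for limits of 0.48 to 0.52
    ui = frac * 2                               # 1 tCK = 2 UI
    ps = frac * tck_ps(rate_mtps)
    return frac, ui, ps

for rate in (2400, 2666, 3200):
    frac, ui, ps = duty_cycle_error(rate)
    print(f"DDR4-{rate}: +/-{frac:.0%} tCK = +/-{ui:.2f} UI = +/-{ps:.1f} ps; "
          f"4% tCK loss = {0.04 * tck_ps(rate):.1f} ps, "
          f"2% tCK loss = {0.02 * tck_ps(rate):.1f} ps")
```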

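Shrinking the Window Even More: Calculating the PLL Jitter

[Figure: the current profile I(f), PDN impedance Z(f), jitter sensitivity S(f), and PLL power-supply rejection (PSRR) P(f) combine into the jitter spectrum J(f); an inverse FFT gives the time-interval-error (TIE) jitter j(t), from which the peak-to-peak jitter is taken.]

$$J(f) = I(f)\,Z(f)\,S(f)\,P(f), \qquad j_{\mathrm{TIE}}(t) = \mathrm{iFFT}\{J(f)\}$$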
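The flow in the figure, multiplying the spectra and taking an inverse FFT, can be sketched with NumPy. The component models (a series R-L PDN, a flat jitter sensitivity, a first-order low-pass supply-noise transfer for the PLL) and all numeric values are illustrative assumptions, not data from the slides; in practice each term would come from measurement or simulation.

```python
import numpy as np

def tie_jitter_ps(i_t, dt_s, z_ohm, s_ps_per_v, p_psrr):
    """TIE jitter j(t) in ps from one period of a periodic supply-current waveform.

    i_t        : current samples in A
    dt_s       : sample spacing in s
    z_ohm      : f -> PDN impedance Z(f) in ohm (complex allowed)
    s_ps_per_v : f -> jitter sensitivity S(f) in ps per volt of PLL supply noise
    p_psrr     : f -> PLL supply-noise transfer P(f), dimensionless
    """
    n = len(i_t)
    f = np.fft.rfftfreq(n, d=dt_s)
    spectrum_i = np.fft.rfft(i_t)                                      # I(f)
    spectrum_j = spectrum_i * z_ohm(f) * s_ps_per_v(f) * p_psrr(f)     # J(f)
    return np.fft.irfft(spectrum_j, n=n)                               # j(t)

# Hypothetical example: 10 mOhm + 50 pH PDN, 1000 ps/V sensitivity, 20 MHz PSRR corner.
R, L_PDN, F_C = 0.01, 50e-12, 20e6
dt, n = 1e-9, 1000
t = np.arange(n) * dt
i_t = 0.5 * np.sin(2 * np.pi * 10e6 * t)      # 0.5 A current ripple at 10 MHz

j = tie_jitter_ps(i_t, dt,
                  z_ohm=lambda f: R + 1j * 2 * np.pi * f * L_PDN,
                  s_ps_per_v=lambda f: np.full_like(f, 1000.0),
                  p_psrr=lambda f: 1.0 / (1.0 + 1j * f / F_C))
print(f"peak-to-peak TIE jitter: {j.max() - j.min():.2f} ps")
```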

DDR4 Bank Group Timing

Different timing within a group and between groups (tCCD, tWTR, tRRD)
  Long timing: bank-to-bank within a group
  Short timing: access to different bank groups
Maintain array timing requirements within a bank group
Maintain speed between different bank groups

[Figure: bank groups 0–3, each containing banks 0–3; long timings apply between banks in the same group, short timings between banks in different groups.]
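The long/short distinction can be illustrated with a toy scheduler for back-to-back column commands. tCCD_S = 4 clocks is the DDR4 short value; the tCCD_L of 6 clocks used here is an assumed example, since the long value depends on the speed bin. The sketch models only tCCD, not tWTR, tRRD, or any other constraint.

```python
from typing import Dict, List

def schedule_column_commands(bank_groups: List[int], tccd_s: int = 4,
                             tccd_l: int = 6) -> List[int]:
    """Earliest issue cycle for each column command, given its target bank group.

    Consecutive commands are always at least tccd_s apart; commands to the
    same bank group must also be at least tccd_l apart.
    """
    issue: List[int] = []
    last_in_group: Dict[int, int] = {}
    for bg in bank_groups:
        t = issue[-1] + tccd_s if issue else 0
        if bg in last_in_group:
            t = max(t, last_in_group[bg] + tccd_l)
        issue.append(t)
        last_in_group[bg] = t
    return issue

print(schedule_column_commands([0, 0, 0, 0]))   # [0, 6, 12, 18]: one bank group
print(schedule_column_commands([0, 1, 0, 1]))   # [0, 4, 8, 12]: alternating groups
```

Interleaving accesses across bank groups lets the controller issue at the short spacing, which is how the speed between different bank groups is maintained.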

Calibration Is Critical to Shrinking Margins

[Chart: margin (ns, from -0.1 to 0.5) broken down into FPGA effects, external effects, calibration effects, and calibration uncertainty. No margin without calibration.]

What Is Calibration?

Capture calibration (de-skew)
  [Figure: DQ0–DQ7 versus DQS on a 0–180 phase axis. Before de-skew the valid capture window is small; after de-skew the valid capture window is maximized.]
  Benefit: reduces skew within the data group, giving more capture margin
Resync calibration
  [Figure: DQ0–DQ71 valid data window on a 0–360 phase axis.]
  Benefit: accurate strobe placement, giving more resync margin
VT compensation
  [Figure: data shifts due to voltage and temperature (VT) variations; voltage and temperature tracking.]
  Benefit: dynamic phase adjustment to match the shifting data valid window, robust over VT
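Capture de-skew can be sketched as aligning each bit's passing window. The windows below are hypothetical phase ranges (degrees of DQS phase) that a read test might find for each DQ bit; aligning window centers with non-negative per-bit delays is one simple strategy, assumed here rather than taken from any particular calibration algorithm.

```python
from typing import List, Tuple

Window = Tuple[float, float]   # (earliest, latest) passing DQS phase for one DQ bit

def common_window(windows: List[Window]) -> Window:
    """Intersection of all per-bit windows (empty if start > end)."""
    return max(w[0] for w in windows), min(w[1] for w in windows)

def deskew(windows: List[Window]) -> Tuple[List[float], Window]:
    """Delay each bit so every window center lines up with the latest center.

    Delays are non-negative because delay chains can only add delay. Once the
    centers are aligned, the common window is as wide as the narrowest bit window.
    """
    centers = [(lo + hi) / 2 for lo, hi in windows]
    target = max(centers)
    delays = [target - c for c in centers]
    shifted = [(lo + d, hi + d) for (lo, hi), d in zip(windows, delays)]
    return delays, common_window(shifted)

bits = [(10, 40), (20, 50), (5, 35)]            # hypothetical per-bit windows
print("before:", common_window(bits))           # (20, 35): small capture window
delays, after = deskew(bits)
print("delays:", delays, "after:", after)       # [10.0, 0.0, 15.0] (20.0, 50.0)
```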

High-Level Output Topology

[Figure: output path. A ptap control derives clock phases X and X+90 from CLK. DQS passes through the DQS OUT1 and DQS OUT2 delay chains (DQS out dtap1/dtap2 controls); DQ passes through the DQ OUT1 and DQ OUT2 delay chains (DQ out dtap1/dtap2 controls).]

Calibration knobs
  DQ-out1 and DQ-out2 delay: control the delay applied to outgoing DQ pins
  DQS-out1 and DQS-out2 delay: control the delay applied to outgoing DQS pins
  Write leveling output: changes the delay on both DQ and DQS relative to the memory clock, in phase taps
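The knobs pair a coarse phase-tap (ptap) setting with fine delay-tap (dtap) chains. A minimal sketch of splitting a desired delay between the two, assuming eight phase taps per clock and a 5 ps delay tap; both tap sizes are hypothetical, since real values depend on the device and its PLL configuration.

```python
import math
from typing import Tuple

def split_delay(delay_ps: float, tck_ps: float, phases_per_cycle: int = 8,
                dtap_ps: float = 5.0) -> Tuple[int, int]:
    """Split a requested delay into (phase taps, delay taps).

    Phase taps step in tCK / phases_per_cycle; delay taps cover the remainder,
    rounded to the nearest tap. Tap sizes are hypothetical placeholders.
    """
    ptap_ps = tck_ps / phases_per_cycle
    ptaps = math.floor(delay_ps / ptap_ps)
    dtaps = round((delay_ps - ptaps * ptap_ps) / dtap_ps)
    return ptaps, dtaps

print(split_delay(200, 625))   # (2, 9) at DDR4-3200 (tCK = 625 ps)
```

Here two 78.125 ps phase taps plus nine 5 ps delay taps give 201.25 ps, the closest reachable setting to the 200 ps request.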

High-Level Input Topology

[Figure: input path with the DQ IN and DQS IN delay chains (DQ in / DQS in dtap controls), the DQS delay chain, DQS enable logic (dqs_en ptap control at X phase, DQS En delay with DQS en dtap control, VFIFO with vfifo control), DDIOin capture registers, and the LFIFO (LFIFO control).]

Calibration knobs
  DQ-in delay: controls the delay applied to incoming DQ pins
  DQS-in delay: controls the delay applied to incoming DQS pins
  LFIFO: controls how many cycles after the read command the data is read out of the LFIFO
  DQS-En phase: controls the delay on DQS enable, in phase taps
  DQS-En delay: controls the delay on DQS enable, in delay taps (dtaps)
  VFIFO: adjusts the delay, in cycles, applied to the controller-provided DQS burst signal to generate DQS enable
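LFIFO training, described in the knobs above as the number of cycles after the read command at which data is read out, can be sketched as a sweep that picks the smallest latency returning a known pattern. The read_burst callback, the pattern, and the 11-cycle arrival time are hypothetical stand-ins for the controller's real read path.

```python
from typing import Callable, List, Sequence

def train_lfifo(read_burst: Callable[[int], Sequence[int]],
                expected: Sequence[int], max_latency: int = 32) -> int:
    """Return the smallest LFIFO latency (controller cycles) that returns the expected burst."""
    for latency in range(max_latency + 1):
        if list(read_burst(latency)) == list(expected):
            return latency
    raise RuntimeError("LFIFO training failed: no latency returned the expected pattern")

# Hypothetical read path: the burst is valid in the LFIFO only from cycle 11 onward.
PATTERN: List[int] = [0xA5, 0x5A, 0xFF, 0x00]

def fake_read(latency: int) -> List[int]:
    return PATTERN if latency >= 11 else [0] * len(PATTERN)

print(train_lfifo(fake_read, PATTERN))   # 11
```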

Calibration Stages

Calibrate DQS relative to read command (read leveling)
Calibrate DQ versus DQS (per-bit deskew) for reads
LFIFO training
Calibrate DQ versus DQS (per-bit deskew) for writes
Address/command training (leveling and deskew)
  Calibrate CS, CAS, RAS, and ODT versus memory clock
VREF training (FPGA and memory)

[Flowchart: calibration sequence.
Start
Wait for PLL/DLL locking
Initialize INST/AC ROM
Calibration loop, for all pins on this memory interface:
  Initialize the memory (mode registers, etc.)
  Calibrate the memory interface:
    DQS-enable calibration: calibrate DQS enable (delayed read data valid) relative to DQS
    Read data deskew
    Calibrate LFIFO delay cycles (read latency)
    Write leveling: calibrate DQS and DM to write command
    Write data deskew
    Calibrate the receiver voltage threshold (for DDR4 with pseudo-open-drain DQs)
  All memory interfaces calibrated? If not, continue the calibration loop; if yes, enter the user mode loop
User mode loop:
  User command found in DPRIO? Process DPRIO user command
  User command found in RAM? Process RAM user command
  Track DQS enable across temperature variation
  Post-amble tracking]
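The calibration loop in the flowchart, with a failure reported by stage as the debug toolkit does, can be sketched as an ordered list of stage callbacks per interface. The stage functions are trivial placeholders, and the stage order is one reading of the slide rather than a definitive sequence.

```python
from typing import Callable, Dict, List, Optional, Tuple

Stage = Tuple[str, Callable[[], bool]]

def calibrate_interface(stages: List[Stage]) -> Optional[str]:
    """Run stages in order; return the name of the first failing stage, or None on success."""
    for name, run in stages:
        if not run():
            return name
    return None

def calibrate_all(interfaces: Dict[str, List[Stage]]) -> Dict[str, Optional[str]]:
    """Calibrate every memory interface and record where each one failed, if anywhere."""
    return {iface: calibrate_interface(stages) for iface, stages in interfaces.items()}

def passing() -> bool:          # placeholder stage that always succeeds
    return True

def failing() -> bool:          # placeholder stage that always fails
    return False

ORDER = ["DQS-enable calibration", "read data deskew", "LFIFO (read latency)",
         "write leveling", "write data deskew", "VREF calibration"]

results = calibrate_all({
    "if0": [(s, passing) for s in ORDER],
    "if1": [(s, failing if s == "write leveling" else passing) for s in ORDER],
})
print(results)   # {'if0': None, 'if1': 'write leveling'}
```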


Good Layout Practices for DDR4


DDR4 Output Driver

[Figure: DDR3 push-pull output driver versus DDR4 pseudo-open-drain (POD) output driver.]

Content Courtesy of Micron

Unadjusted, Non-Terminated Data Eye

[Figure: unterminated data eye showing overshoot above VDD, undershoot below VSS, and jitter.]

Content Courtesy of Micron

Terminated Data Eye

[Figure: terminated data eye marked with overshoot, VIHdc, VIHac, high ringback, Vref, low ringback, VILdc, VILac, and undershoot.]

Content Courtesy of Micron

OCT from the Controller Standpoint

DQ and CA pins are terminated differently in DDR4

Category           | Specification                        | DDR3                                       | DDR4
Interface          | Density / speed                      | 512 Mb–8 Gb; 1.6–2.1 Gbps                  | 2 Gb–16 Gb; 1.6–3.2 Gbps
Interface          | Voltage (VDD / VDDQ / VPP)           | 1.5 V / 1.5 V / NA (1.35 V / 1.35 V / NA)  | 1.2 V / 1.2 V / 2.5 V
Interface          | VREF                                 | External VREF (VDD/2)                      | Internal VREF (requires training)
Interface          | Data I/Os                            | CTT (34 ohm)                               | POD (34 ohm)
Interface          | CMD/ADDR I/Os                        | CTT                                        | CTT
Interface          | Strobe                               | Bi-directional / differential              | Bi-directional / differential
Core architecture  | Number of banks                      | 8                                          | 16 (4 Gb)
Core architecture  | Page size (x4 / x8 / x16)            | 1 KB / 1 KB / 2 KB                         | 512 B / 1 KB / 2 KB
Core architecture  | Prefetch                             | 8 bits                                     | 8 bits
Core architecture  | Added functions                      | RESET / ZQ / Dynamic ODT                   | + CRC / DBI / Multi preamble
Physical           | Package type / balls (x4, x8 / x16)  | 78 / 96 BGA                                | 78 / 96 BGA
Physical           | DIMM type                            | R, LR, U, SoDIMM                           | + ECC SoDIMM
Physical           | DIMM pins                            | 240 (R, LR, U) / 204 (So)                  | 288 (R, LR, U) / 260 (So)

OCT Calibration Scheme to Support DDR4

OCT can calibrate twice, with two sets of pins (DQ/CA)
  DQ and CA pins will have two different sets of codes in DDR4

[Figure: OCT calibration scheme, DDR3 versus DDR4.]
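OCT calibration is commonly done by successive approximation against a reference. A sketch under stated assumptions: the termination is modeled as identical parallel legs of a hypothetical 960 ohm each, a 6-bit code sets how many legs are enabled, and two separate runs produce one code for DQ and one for CA, mirroring the two code sets above. The 34 ohm and 40 ohm targets are illustrative values only.

```python
import math

R_LEG_OHM = 960.0   # hypothetical resistance of one termination leg
CODE_BITS = 6

def termination_ohm(code: int) -> float:
    """Resistance with `code` identical legs in parallel (no legs enabled = open)."""
    return math.inf if code == 0 else R_LEG_OHM / code

def sar_calibrate(target_ohm: float) -> int:
    """Successive approximation, MSB first: largest code whose resistance is still >= target."""
    code = 0
    for bit in reversed(range(CODE_BITS)):
        trial = code | (1 << bit)
        if termination_ohm(trial) >= target_ohm:   # comparator: still at or above target
            code = trial
    return code

codes = {"DQ": sar_calibrate(34.0), "CA": sar_calibrate(40.0)}
print(codes, {k: round(termination_ohm(v), 2) for k, v in codes.items()})
# {'DQ': 28, 'CA': 24} {'DQ': 34.29, 'CA': 40.0}
```

Successive approximation works here because resistance falls monotonically as more legs are enabled.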

General Layout Concerns

Avoid crossing splits in the power plane
SSO on the controller can collapse strobes and clocks
  Separate supplies and/or flip-chip packaging help
  Low-pass VREF filtering on the controller helps
Minimize VREF noise
Minimize intersymbol interference (ISI)
Minimize crosstalk

Content Courtesy of Micron

Layout and Termination (1/12)

Signal integrity review
  Importance of transmission line theory
    Today's clock rates are too fast to ignore it
    A matched-impedance line is important for good signaling
    Mismatched impedance lines result in reflections
    Termination schemes are used to reduce or eliminate reflections
  Good power bussing is paramount to reducing SSO
    SSO reduces voltage and timing margins
  Decoupling capacitor needs and requirements

Content Courtesy of Micron
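Why mismatched impedances cause reflections is captured by the reflection coefficient, gamma = (ZL - Z0) / (ZL + Z0): a matched termination reflects nothing, and any mismatch sends a fraction of the incident voltage back toward the driver. The 40 ohm line and the termination values below are illustrative, not recommendations.

```python
def reflection_coefficient(z_load_ohm: float, z0_ohm: float) -> float:
    """Fraction of an incident voltage wave reflected at a load: (ZL - Z0) / (ZL + Z0)."""
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)

for rtt in (40.0, 60.0, 34.0):   # hypothetical terminations on a 40-ohm line
    gamma = reflection_coefficient(rtt, 40.0)
    print(f"RTT {rtt:4.0f} ohm on a 40-ohm line: gamma = {gamma:+.3f}")
```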

Layout and Termination (2/12)

Signal integrity analysis is paramount to developing cost-effective high-speed memory systems
  Develop a timing budget for proof of concept
  Use models to simulate
  Board skews are important and should be accounted for
    ISI, crosstalk, VREF noise, path length matching, Cin and RTT mismatch
  Employ industry practices and assumptions
Model vias too
  Eliminate return path discontinuities (RPDs)
  Minimize SSO effects
    Difficult to model

Content Courtesy of Micron
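A timing budget, as recommended above for proof of concept, subtracts each uncertainty from the unit interval. In this sketch the DDR4-3200 UI (312.5 ps), the 4.8 ps VREF calibration loss, and the calibrated 2% tCK duty-cycle loss follow the arithmetic earlier in this deck; every other entry is a hypothetical placeholder to be replaced by simulated or measured values.

```python
from typing import Dict

def timing_margin_ps(ui_ps: float, budget: Dict[str, float]) -> float:
    """Remaining margin after subtracting every budget item (all values in ps)."""
    return ui_ps - sum(budget.values())

UI_PS = 1e6 / 3200  # DDR4-3200 unit interval: 312.5 ps

# Only the first two entries follow the deck's arithmetic; the rest are
# hypothetical placeholders for values obtained from simulation.
budget = {
    "duty-cycle error (2% tCK, calibrated)": 0.02 * 2 * UI_PS,  # tCK = 2 UI -> 12.5 ps
    "VREF calibration error": 4.8,
    "DRAM RX mask (placeholder)": 120.0,
    "board skew, ISI, crosstalk (placeholder)": 60.0,
    "controller capture uncertainty (placeholder)": 80.0,
}

margin = timing_margin_ps(UI_PS, budget)
for item, ps in budget.items():
    print(f"{item:45s} {ps:6.1f} ps")
print(f"{'remaining margin':45s} {margin:6.1f} ps")
```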

Layout and Termination (3/12)

DRAM and controller package parasitics are fixed
  SSO effects are already contained in their specified timings
  However, those timings apply to test conditions with specific decoupling
The power delivery network (PDN) for the controller and DRAM needs to be properly designed
Lowering power supply inductance minimizes signaling variations between devices
  Use power and ground planes wherever possible
  Make all power and ground traces as wide as possible
  Couple power and ground as much as possible
    Lowers inductance (mutual effects)

Content Courtesy of Micron

Layout and Termination (4/12)

SSO
  Timing and noise issues caused by rapid changes in voltage and current when multiple circuits switch simultaneously in the same direction
Problems caused by SSO
  False triggers due to power/ground bounce
  Reduced timing margin due to SSO-induced skew
  Reduced voltage margin due to power/ground noise
  Slew rate variation

Content Courtesy of Micron

Layout and Termination (5/12)

Good power bussing is paramount to reducing SSO

$V = L\,\frac{dI}{dt}$

Reduce L (power delivery effective inductance)
  Use planes for power and ground distribution
  Route power and ground traces to devices properly
  Use decoupling capacitance properly
    Locate capacitors as close as possible to the component pins
Reduce dI/dt (switching current slew rate)
  Use the slowest drive edge that will work
  Use reduced drive strength instead of full drive where possible

Content Courtesy of Micron
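The levers on this slide map directly onto V = L dI/dt: halving the effective inductance, doubling the switching time, or halving the current step each halves the induced supply noise. The values below are hypothetical, chosen only to show the scaling.

```python
def inductive_noise_v(l_h: float, di_a: float, dt_s: float) -> float:
    """Supply noise V = L * dI/dt for a current change di_a over time dt_s."""
    return l_h * di_a / dt_s

# Hypothetical values: 0.5 nH effective inductance, 50 mA step, 200 ps edge.
cases = {
    "baseline": (0.5e-9, 0.05, 200e-12),
    "half the inductance (planes, close decoupling)": (0.25e-9, 0.05, 200e-12),
    "twice the edge time (slower drive edge)": (0.5e-9, 0.05, 400e-12),
    "half the current step (reduced drive strength)": (0.5e-9, 0.025, 200e-12),
}
for name, (l_h, di_a, dt_s) in cases.items():
    print(f"{name:48s} {inductive_noise_v(l_h, di_a, dt_s) * 1e3:6.1f} mV")
```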

Layout and Termination (6/12)

RPDs induce board noise and are difficult to model
  Splits/holes in reference planes
  Connector discontinuities
  Layer changes
Avoid RPDs if at all possible
  Avoid crossing holes/splits in the reference plane
  Route signals so they reference the proper domain
  Add power/ground vias to the board
    Especially in dense layer-change areas
  Place decoupling capacitors near connectors

[Figure: split return path versus solid return path.]

Content Courtesy of Micron

Layout and Termination (7/12)

VREF noise
  Induces strobe-to-data skew and reduces voltage margins
  Power/ground plane noise
  Crosstalk
Minimize VREF noise
  Use the widest trace practical to route from chip to decoupling capacitor
  Use large spacing between VREF and neighboring traces

Content Courtesy of Micron

Layout and Termination (8/12)

ISI
  Occurs when data is random
    Clocks do not have ISI
  Multiple bits are on the bus at the same time
    The bus cannot settle from bit #1 before bit #2, etc.
    Signal edges jitter because energy from previous bits is still on the bus
  Ringing due to impedance mismatches
  Low-pass structures can cause ISI
Minimize ISI
  Optimize layout
  Keep board/DIMM impedances matched
    Drive impedance should be the same as Zo of the transmission line
  Terminate nets
    Termination values should be the same as Zo of the transmission line
  Select a high-quality connector
    Matched to board/DIMM impedance
    Low mutual coupling

Content Courtesy of Micron

Layout and Termination (9/12)

Crosstalk
  Coupling on board, package, and connector from other signals, including RPDs
  Inductive coupling is typically stronger than capacitive coupling
  When aggressors fire at the same time as the victim (e.g., data-to-data coupling)
    Victim edge speeds up or slows down, causing jitter
  When aggressors do not fire at the same time as the victim (e.g., data-to-command/address coupling)
    Noise couples onto the victim at the time of aggressor switching

Content Courtesy of Micron

Layout and Termination (10/12)

Minimize crosstalk
  Keep bits that switch on the same clock edge routed together
    Route data bits next to other data bits; never next to CMD/ADDR bits
  Isolate sensitive bits (strobes)
    If need be, route next to signals that rarely switch
  Separate traces by at least two to three (preferred) conductor widths
    More accurately, spacing is defined by trace pitch and height above the reference plane
    Example: a 5-mil trace located 5 mils from a reference plane should have a 15-mil gap to its nearest neighbors to minimize crosstalk
  Choose a high-quality connector
  Run traces as stripline (as opposed to microstrip)
    Not at the cost of additional vias
  Maintain good references for signals and their return paths
    Avoid RPDs
  Keep driver, board Zo, and ODT selections well matched

Content Courtesy of Micron

Layout and Termination (11/12)

Cin mismatch
  Differing input capacitances on receiver pins
    Adds skew to input timings
RTT mismatch
  Termination resistors not at nominal value
    Internal ODT on data pins has smaller variation than on DDR2
    It is calibrated (as is the DRAM's RON)
    External termination resistor variation must be accounted for
    Consider 1% resistors

Content Courtesy of Micron

Layout and Termination (12/12)

High-speed signals must maintain a solid reference plane
  The reference plane may be either VDD or ground
  For DDR3 UDIMM systems, the DQ buses are referenced to ground while the ADDR/CMD and clock are referenced to VDD
    All signals may be referenced to ground if the layout allows
Best signaling is obtained when a constant reference plane is maintained
  If this is not possible, try to make the transitions near decoupling capacitors

[Figure: signal, power plane, and ground plane stack-up with a decoupling capacitor (Cap).]

Content Courtesy of Micron

Board Debug Tools to Minimize Issues

TimeQuest DDR Timing: Read Capture

[Chart: read-capture margin in TimeQuest, broken into: margin before calibration (the output of standard timing analysis); calibrating some of the process variation in the FPGA (deskew + pessimism removal); calibrating to the memory variations; errors in the calibration algorithm; effects of temperature and voltage changes on the calibration; total margin after calibration.]

EMIF Debug Toolkit Features

Reports results of the last calibration to the user
  Reports interface details, margins observed before calibration, settings made during calibration, and post-calibration margins
  In the case of a calibration failure, reports the stage and the group at which calibration failed
Provides eye monitor support
  Eye monitor support of the data valid window
Provides loopback support
  Loopback support for bit error rate (BER) testing
Allows user interaction with the memory interface
  Send commands to the memory interface to recalibrate and to mask groups and ranks

TimeQuest-Like GUI

[Screenshot: Reports section and Tasks section; commands run are shown in the console.]

On-Chip EMIF Debug Toolkit

Core access to calibration data
  Accesses the same calibration data as the EMIF Debug Toolkit, now via FPGA logic
  Through an Avalon Memory-Mapped (Avalon-MM) interface

Looking Ahead and Conclusion


Will There Be a DDR5?

Very unlikely
  SI for a parallel bus at 2 GHz and above would be very difficult
  The timing budget would be consumed in the package
    PDN noise
    Package skew
Transition to stacked memory
  Hybrid Memory Cube and serialized memory
  3D memories integrated into ASICs

Conclusion

DDR4 has many ways to reduce overall system power
  ~50% lower power than DDR3 at 1.5 V
DDR4-3200 has a 50% higher data rate than DDR3-2133 (a 33% shorter unit interval)
But there are challenges
  Shrinking data valid window
  Increased signal integrity and power integrity concerns
These can be overcome by good controller design
  Innovative calibration
  Good ODT
  Careful board design
  Good board debug tools
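A quick check of the headline numbers, assuming switching power scales as C·V²·f with C and f held fixed: the move to 1.2 V alone accounts for roughly a 36% reduction versus 1.5 V, so the rest of the ~50% figure presumably comes from the other features (refresh management, DBI, POD I/O). The same arithmetic relates the data-rate gain to the shorter unit interval.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Switching-power ratio under P ~ C * V^2 * f, with C and f held fixed."""
    return (v_new / v_old) ** 2

for v_old in (1.5, 1.35):
    saving = 1 - dynamic_power_ratio(1.2, v_old)
    print(f"1.2 V vs {v_old} V: {saving:.0%} less switching power")

rate_gain = 3200 / 2133 - 1     # DDR4-3200 vs DDR3-2133 data rate
ui_shrink = 1 - 2133 / 3200     # corresponding unit-interval reduction
print(f"data rate +{rate_gain:.0%}, unit interval -{ui_shrink:.0%}")
```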

Thank You
