
The Rambus Memory System

James A. Gasbarro
Rambus Inc.

Abstract

This paper describes a revolutionary new technology for building high-performance DRAM memory systems that operate up to 10 times faster than conventional systems. With only a 9-bit wide interface, devices are capable of transferring data at over 500 MBytes per second. This technology is implemented using standard CMOS process, package, and printed circuit fabrication techniques and is suitable for cost sensitive volume applications.

1.0 Introduction

During the last two decades, DRAM technology has changed dramatically. While device densities have increased by nearly six orders of magnitude, DRAM access times have only improved by a factor of 10. Over the same period microprocessor performance has jumped by a factor of 100. This has created a significant performance gap between computing elements and their associated memory devices. This gap has traditionally been filled by relatively expensive application-specific memories, such as SRAM caches for processors, or multiport VRAMs for graphics subsystems. Rambus Inc. has developed a new high-bandwidth DRAM interface that can supply the data rates needed for advanced multimedia and graphics computing applications with substantial cost savings.

2.0 The Rambus Channel

The Rambus™ Channel is a high-speed memory interface that operates up to 10 times faster than conventional DRAM interfaces. The Channel uses a small number of very high speed signals to carry all address, data, and control information, greatly reducing the pin count and hence cost, while maintaining high performance levels. The Channel is a byte-wide bus that operates at over 500 MBytes/sec. Multiple Channels can be operated in parallel to achieve even higher throughput.

A Rambus Channel contains 13 controlled impedance, matched transmission lines: 9 data, 2 clock, and 2 control. As shown in Figure 1, the Channel has a bus topology with the master device (microprocessor or ASIC controller) at one end, terminators at the other end, and slaves (RDRAMs™) in between.

Figure 1: Master and Slaves Connect to Terminated Transmission Lines (signals shown: BusData[8:0], BusCtrl, BusEnable, ClkFromMaster, ClkToMaster)

0-8186-7102-5/95 $4.00 © 1995 IEEE

Several key features of the system can be noted from the figure. The first important feature is the master, which is located at one end of the bus. In the Rambus protocol, direct data transfers occur only between master and slave devices, not between slaves. This allows signals to be terminated at only one end of the Channel, resulting in considerable savings in I/O power dissipation. In operation, data driven by the master propagates past all slaves, allowing all slaves to correctly sense the data. The matched terminator prevents any reflections. Data driven by a slave initially propagates in both directions along the bus at half the voltage swing. The wavefront travelling toward the terminator stops when it reaches the terminating resistance. The signal travelling toward the master, on the other hand, encounters an open circuit. This causes the wavefront to double in amplitude, supplying full logic level inputs to the master.

Output drivers on the Channel are current source drivers, rather than the more common voltage source drivers. Current source drivers present a high impedance to the waveform reflecting from the master end of the bus. This prevents secondary reflections from occurring due to the active slave driver. Thus, the worst case bus settling time is 2 Tf (Tf is the time-of-flight on the bus) when the slave nearest the terminator is transmitting. The worst case data delivery time, however, between any master-slave pair is only 1 Tf.

A second feature of the Channel topology is the clock distribution scheme. The clock routing begins at the slave end of the Channel and propagates to the master end as ClkToMaster, where it loops back as ClkFromMaster to the slave end and terminates. This clock topology allows clock and data to always travel in the same direction to minimize skew. A slave always sends data to the master synchronously with ClkToMaster, and the master always sends data to the slaves synchronously with ClkFromMaster. Because the transmission lines are matched, the clock and data signals remain synchronized as they travel to their destination.

In a Rambus Channel, both edges of the clock are used for data transfer. Thus 500 MByte/sec operation is attained with only a 250 MHz clock source. This limits the maximum toggle rate for any signal, data or clock, to 250 MHz. All high-speed signals on the Channel operate with voltage swings of about 800 mV, and are driven with controlled rise-time drivers. These features combine to minimize electromagnetic interference problems.

3.0 Packaging

The slave packaging is crucial to maintaining a uniform transmission line environment for the Channel. Since there can be many RDRAMs in a system, the stub introduced by the leads of the device must be kept as small as possible. RDRAMs are designed such that the internal bonding pads of the die are pitch matched to the Channel traces on the printed circuit board (PCB). This allows the leadframe to have a uniform length of about 2 mm, which in turn enables pin parasitics of less than 2 pF and 3 nH.

PCBs carrying Rambus Channel signals are designed using standard FR-4 construction techniques. The microstrip transmission lines are targeted for 50 ohm nominal impedance. Due to the capacitive loading effect of the RDRAMs, the loaded impedance can be as small as 20 ohms.

The Rambus Interface is a special logic block contained in any device that connects to the Channel. The interface is implemented in the same standard CMOS processes that are in common use for today's ASICs and DRAMs. Contained in the interface are two delay-locked loops for deskewing transmit and receive clocks, as well as level converters and shift register logic to convert the external low-swing, byte-wide 500 MHz bus to an internal 8-byte-wide CMOS bus that operates at 62.5 MHz. In an ASIC, the interface appears as a diffused cell in the pad ring. The ASIC designer only has to deal with the lower speed bus, and does not need to know the details of the low-speed to high-speed transformation.

In the RDRAM, the interface is located along one edge of the die and replaces the existing DRAM I/O structures. The DRAM core is of conventional design, with an internal CAS cycle time of 16 ns. In addition to the bus transformation function, the RDRAM interface also contains the registers and logic necessary to implement the bus protocol.

4.0 Transactions

Channel transactions are initiated with a Request Packet by the master, as shown in Figure 2. The Request Packet contains the address of the RDRAM and memory location to be accessed, as well as byte-count and opcode fields. For a write request, the write data immediately follows the request. For reads, the access time from request to the first data word is 28 ns. Up to 256 bytes of data can be streamed in a single transaction.

Figure 2: Channel Transactions (Read Rq followed by Read Data; Write Rq followed by Write Data)

Each RDRAM is broken down into two independent banks of memory. Each of these banks has a 2 KByte cache line associated with it that is built out of large sense amplifier arrays. These caches work by holding the last accessed row of their associated bank in the sense amplifiers, allowing further accesses to the same row of memory to result in cache hits. With the row already stored in the cache, data can be accessed with very low latency. Each RDRAM added to a system adds two cache lines to the memory system, helping to increase cache hit rates.

A cache miss results when a row is accessed that is not currently stored in one of the cache lines. When this happens, the requesting master is sent a Negative Acknowledge packet indicating the requested row is not yet available. The RDRAM then loads the requested row into the cache line and waits for the master to submit a retry of the previous request.

Address mapping hardware is provided to increase cache hit rates by allowing system designers to easily perform n-way RDRAM interleaving. In a non-interleaved memory system, contiguous blocks of addresses follow each other in sequence in one RDRAM, which is then followed by the next RDRAM. By using address mapping, contiguous blocks of addresses are split across several RDRAMs, and therefore across several cache lines. In a typical system containing, for example, eight RDRAMs, miss rates could be expected to be as low as 5%.

5.0 Concurrency

Read and write transactions to RDRAMs are not limited to simple sequential operations. Non-contiguous blocks of memory can be accessed through the use of the read and write non-sequential operations. With these commands, multiple eight-byte blocks of data within an RDRAM cache line can be accessed in a non-sequential fashion. The address for the next data block is transmitted as a serial address packet on one of the control signals. Successive serial address packets continue to specify new addresses within the cache while data is continuously transferred until the access is complete. Non-sequential accesses are useful in applications such as graphics, where data is often accessed in a localized but non-linear fashion, or in main memory applications when performing functions such as write gathering.

Concurrent transactions can be used to optimize RDRAM utilization in high performance applications by taking advantage of available Channel bandwidth during cache miss latency periods. When a miss in one RDRAM takes place, that device will be busy loading a new row into one of its cache lines. Other than that, the Channel and all other RDRAMs will still be available for use. Instead of waiting for the first RDRAM to finish loading its cache, a transaction to another RDRAM can be initiated.

This can be used in various ways. In systems where memory accesses can be queued, a transaction can take place for any pending access residing in a different RDRAM. When that transaction is complete, the first transaction can be retried.

Pretouching can be used in systems where memory accesses are predictable, such as video applications. This is done when an application is finished with a particular RDRAM and is about to access a different one. If the next access to the original RDRAM is known in advance, a dummy transaction can first be generated to cause a row miss and prepare it for its next access. Transactions to other devices can then take place while the cache fill is taking place. When the original device is next accessed, the required row of data will already be loaded in the cache line and a cache hit will occur.

6.0 System Level

At the system level, the Rambus architecture has a number of important advantages over conventional memory system approaches. In the graphics arena, for example, only one 16Mb RDRAM is needed for building a high resolution frame buffer. Commodity DRAMs cannot supply the bandwidth needed with only one device. A single RDRAM, though, can supply all the bandwidth needed for display refresh and still have a large fraction of the total available for display update. In main memory applications, memory granularity is a big issue due to system cost. Conventional memory systems use multiple devices in parallel to meet the bandwidth needs of the processor. This implies that incremental memory cost is two to four times more than what can be had using RDRAMs, which can be added in units of a single RDRAM. With 8 MByte chips just a few years away, this will soon become a critical issue.

7.0 Conclusion

The new technology introduced by the Rambus Channel and the RDRAM represents a substantial leap forward in processor-to-memory interconnect. This technology will enable a host of new applications where high bandwidth and low cost are the driving factors.

Rambus and RDRAM are trademarks of Rambus Inc.
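As a closing illustration, the n-way interleaving described above can be sketched with a toy address mapping. The mapping functions below are hypothetical (the paper does not specify the actual address mapping hardware); they only show why spreading contiguous 2 KByte blocks across eight RDRAMs keeps more cache lines warm:

```python
# Toy model (not Rambus's actual mapping hardware) of n-way RDRAM
# interleaving at 2 KByte cache-line granularity.

N_DEVICES = 8       # e.g. eight RDRAMs in the system
LINE_BYTES = 2048   # 2 KByte cache line per bank

def non_interleaved(addr, device_bytes):
    # Contiguous addresses fill one device completely before the next.
    return addr // device_bytes

def interleaved(addr):
    # Consecutive 2 KB blocks rotate across the eight devices.
    return (addr // LINE_BYTES) % N_DEVICES

# Eight consecutive 2 KB blocks of a sequential access pattern:
addrs = [i * LINE_BYTES for i in range(8)]

# Without interleaving (assuming 1 MByte per device), every block hits
# the same device, so one cache line is repeatedly displaced:
print([non_interleaved(a, 2**20) for a in addrs])  # all device 0

# With interleaving, each block lands on a different device and
# therefore a different cache line:
print([interleaved(a) for a in addrs])  # devices 0 through 7
```

The second pattern is the source of the higher hit rates claimed in the text: a sequential sweep touches eight distinct cache lines instead of churning through one.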

