Memory Technologies
Broadly speaking, there are Static, Dynamic, and Non-volatile memory technologies.

Non-Volatile Memory
What all Non-volatile Memories have in common is that they retain their contents, i.e.,
the stored data, even if the power supply is switched off. They are randomly accessible
and the memory cannot be changed (there are a few caveats to this statement).

[Diagram: non-volatile memory family tree. ROM divides into Bipolar devices (mask ROMs, PROMs) and MOS devices (mask ROMs, PROMs, EPROMs, EEPROMs, Flash).]
ROM or mask programmable ROM
This is a semiconductor type of read-only memory whose stored information is
programmed during the manufacturing process. In essence this type of ROM is an array
of possible storage cells with a final layer of metal forming the interconnections to
determine which of the cells will hold 0s and which will hold 1s. Although the contents may be read, they
may not be written. It is programmed by the manufacturer at the time of production using
a specially constructed template. All masked ROMs are therefore very similar, since most
of their layers are the same with the differences only being introduced in the final layer
metal mask.
Masked ROMs are typically manufactured in high volumes in order to minimise costs.
They can cram a lot of data onto a relatively small chip area; unfortunately, because the
information is programmed by the design of a mask which is used during the
manufacturing process, it cannot be erased. Also, as the final metal layer is deposited at
the chip factory, any repair of a masked ROM requires a long turnaround time. The
process of repair consists of identifying the error(s), notifying the chip firm, and altering
the mask before finally fabricating new chips. Any detected bug also has the unwanted
effect of leaving the user with lots of worthless parts. Most commentators refer to mask-programmable ROMs simply as ROMs.

PROM A programmable read-only memory (PROM) is part of the read-only memory (ROM) family. It is a programmable chip which, once written, cannot be erased and rewritten, i.e. it is a read-only-after-writing memory. To implant programs, or data, into a PROM, a programming machine (called a PROM programmer) is used to apply the correct voltage for the proper time to the appropriate addresses selected by the programmer. As the PROM is simply an array of fusible links, the programming machine essentially blows the various unwanted links within the PROM, leaving the correct data patterns, a process which clearly cannot be reversed. Like the ROM, the PROM is normally used as the component within a computer that carries any permanent instructions the system may require.

EPROM An erasable programmable read-only memory (EPROM) is a special form of semiconductor read-only memory that can be completely erased by exposure to ultraviolet light. The device is programmed in a similar way to the programmable read-only memory (PROM); however, it does not depend on a permanent fusible link to store information, but instead relies on charges stored on capacitors in the memory array. The capacitors determine the on/off state of transistors, which in turn determine the presence of 1s or 0s in the array.
The EPROM is so arranged that the information programmed into it can be erased, if
required, by exposing the top surface of the package to ultraviolet radiation. This brings
about an ionizing action within the package, which causes each memory cell to be
discharged. EPROMs are easily identified physically by the clear window that covers
the chip to admit the ultraviolet light. Once an EPROM has been erased, it can be
reprogrammed with the matrix being used again to store new information. The user can
then completely erase and reprogram the contents of the memory as many times as
desired.
Intel first introduced the EPROM in 1971; however, the storage capacity has
increased dramatically with improving IC technology. Current EPROMs can store
multiple megabytes of information.

EEPROM An electrically erasable programmable ROM (EEPROM) is a closely
related device to the erasable programmable ROM (EPROM) in that it is programmed in
a similar way, but the program is erased not with ultraviolet light but by the use of
electricity. Erasure of the device is achieved by applying a strong current pulse, which
removes the entire program, thus leaving the device ready to be reprogrammed. The
voltages necessary to erase the EEPROM can either be applied to the device externally or (more often) generated from within the host system, thereby allowing systems to be reprogrammed regularly without disturbing the EEPROM chips. In this way electrical erasability does yield certain benefits; however, this comes at the cost of fewer memory cells per chip and lower density than on a standard ROM or EPROM.


Flash Memory
A characteristic of Flash Memories is that individual bytes can be addressed and read out, whereas write and erase operations act on blocks of addresses at a time. Read access times, currently about 100 ns, are about double those of Dynamic Memories. The number of programming and erase cycles is limited to about 100,000. In general, the retention of
data is guaranteed over a period of 10 years. Among the various forms of Flash Memories
available are SIMM, PC Card (PCMCIA), Compact Flash (CF) Card, Miniature Card
(MC), and Solid State Floppy Disc Card (SSFDC). Over and above their exterior
appearance, there are two main types of Flash Memory modules:
Linear Flash and ATA Flash. Linear Flash modules have a linear address space and any
address can be directly accessed from outside. On the other hand, for the ATA Flash
cards address conversion takes place internally, so that the addressing procedure is
similar to that of a disk drive, a fact that may for instance simplify driver programming.
Examples of the application of Flash modules are mass or program memories in
notebooks, network routers, printers, PDAs, and digital cameras.





RAM (Random Access Memory)
Static RAM
This memory is based on transistor technology and does not require refreshing. It is random access and volatile, i.e. it loses its data if the power is removed. It consumes more power (and thus generates more heat) than the dynamic type, but is significantly faster.
It is often used in high speed computers or as cache memory. Another disadvantage is
that the technology uses more silicon space per storage cell than dynamic memory, thus
chip capacities are a lot less than dynamic chips. Access times of less than 15ns are
currently available whereas dynamic RAM has access times of greater than 30ns.

Dynamic Random Access Memory (DRAM)

Basic DRAM operation

In order to store lots of small things we can divide the storage space up into small bins
and stick one item in each bin. If each item we're storing is unique and we're ever going
to be able to retrieve a specific item, we need an organisational scheme to order the
storage space. Sticking a unique, numerical address on each bin is the normal approach.
The addresses will start at some number and increment by one for each bin. If we wanted
to search the entire storage space, we'd start with the lowest address and step through
each successive one until we get to the highest address.
Now, once we've got the storage space organised properly, we'll need a way to get the
items into and out of it. For RAM storage, the data bus is what allows us to move stuff
into and out of storage. And of course, since the storage space is organised, we need a
way to tell the RAM exactly which location contains the exact data that we need; this is
the job of the address bus. To the CPU, the RAM looks like one long, thin line of storage
cells, each with a unique address. If the CPU wants a piece of data from RAM, it first
places the address of the location on the address bus. It then waits a few cycles and listens
on the data bus for the requested information to show up.




Figure 1 Simple Model of DRAM


The round dots in the middle are memory cells, and each one is hooked into a unique
address line. The address decoder takes the address off of the address bus and identifies
which cell the address is referring to. It then activates that cell, and the data in it drops
down into the data interface where it's placed on the data bus and sent back to the CPU.
The CPU sees those cells as a row of addressable storage spaces that hold 1 byte each, so
it understands memory as a row of bytes. The CPU usually grabs data in 32-bit or 64-bit
chunks, depending on the width of the data bus. So if the data bus is 64-bits wide and the
CPU needs one particular byte, it'll go ahead and grab the byte it needs along with the 7
bytes that are next to it. It grabs 8 bytes at a time because:

a) it wants to fill up the entire data bus with data every time it makes a request and
b) it'll probably end up needing those other 7 bytes shortly.
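
As a rough illustration of this, the C sketch below (the address and sizes are invented for the example) rounds a byte address down to the start of its 8-byte group, which is the whole chunk the CPU would pull in across a 64-bit data bus.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t byte_addr  = 0x12345;           /* the byte the CPU wants        */
        uint32_t group_base = byte_addr & ~7u;   /* round down to a multiple of 8 */
        printf("fetch bytes 0x%X to 0x%X\n", group_base, group_base + 7);
        return 0;
    }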

Memory designers organise the cells in a grid of individual bits and split up the address
into rows and columns, which can be used to locate the individual bits needed. This way,
if you wanted to store, say, 1024 bits, you can use a 32 x 32 grid to do so. RAM chips
don't store whole bytes, but rather they store individual bits in a grid, which can be
addressed one bit at a time.

When the CPU requests an individual bit it would place an address in the form of a string
of 22 binary digits (for the x86) on the address bus. The RAM interface would then break
that string of numbers in half, and use one half as an 11 digit row address and one half as
an 11 digit column address. The row decoder would decode the row address and activate
the proper row line so that all the cells on that row become active. Then the column
decoder would decode the column address and activate the proper column line, selecting which particular cell on the active row is going to have its data sent back out over the data bus by the data interface. Also, note that the grid does not have to be square; it is usually a rectangle where the number of rows is less than the number of columns.
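
A small C sketch of this address split is shown below. The 22-bit address and the choice of putting the row in the upper 11 bits are illustrative assumptions only; which half of the address becomes the row is a convention of the particular memory interface.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t addr = 0x2ABCDE;              /* an example 22-bit cell address */
        uint32_t row  = (addr >> 11) & 0x7FF;  /* assume upper 11 bits = row     */
        uint32_t col  = addr & 0x7FF;          /* lower 11 bits = column         */
        printf("cell 0x%06X -> row %u, column %u\n", addr, row, col);
        return 0;
    }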



Figure 2 Row and Column Addressing

The cells are comprised of capacitors and are addressed via row and column decoders,
which in turn receive their signals from the RAS and CAS clock generators. In order to
minimise the package size, the row and column addresses are multiplexed into row and
column address buffers. For example, if there are 11 address lines, there will be 11 row
and 11 column address buffers. Sense amplifiers (sense amps) are connected to each column and provide the read and restore operations of the chip.






DRAM Read

1) The row address is placed on the address pins via the address bus.

2) The /RAS pin is activated, which places the row address onto the Row
Address Latch.

3) The Row Address Decoder selects the proper row to be sent to the sense
amps.

4) The Write Enable (not pictured) is deactivated, so the DRAM knows that
it's not being written to.

5) The column address is placed on the address pins via the address bus.

6) The /CAS pin is activated, which places the column address on the Column
Address Latch.

7) The /CAS pin also serves as the Output Enable, so once the /CAS signal has
stabilised the sense amps place the data from the selected row and column on the
Data Out pin so that it can travel the data bus back out into the system.

8) /RAS and /CAS are both deactivated so that the cycle can begin again.


Figure 3 DRAM Read
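
The sequence above can be mimicked in software. The toy C model below is only a sketch (array sizes and names are invented): activating a row with /RAS is modelled as copying that whole row into the sense amps, and /CAS as picking one column out of that row.

    #include <stdint.h>
    #include <stdio.h>

    #define ROWS 2048
    #define COLS 2048

    static uint8_t cell[ROWS][COLS];   /* one bit per cell, stored in a byte */
    static uint8_t sense_amps[COLS];   /* the row currently held by the amps */

    /* Toy model of steps 1-8: /RAS copies a whole row into the sense amps,
       /CAS then selects one column of that row for the data pins.          */
    static int dram_read(uint32_t row, uint32_t col)
    {
        for (uint32_t c = 0; c < COLS; c++)   /* steps 1-3: row -> sense amps */
            sense_amps[c] = cell[row][c];
        int data = sense_amps[col];           /* steps 5-7: column selected   */
        return data;                          /* step 8 (precharge) not shown */
    }

    int main(void)
    {
        cell[100][200] = 1;
        printf("bit at row 100, column 200 = %d\n", dram_read(100, 200));
        return 0;
    }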
One of the problems with DRAM cells is that they leak their charge away over time, so that charge has to be replenished periodically. Reading from or writing to a DRAM cell refreshes its charge, so the most common way of refreshing a DRAM is to read periodically from each cell. This
isn't quite as bad as it sounds for a couple of reasons. First, you can sort of cheat by only
activating each row using /RAS, which is how refreshing is normally done. Second, the
DRAM controller takes care of scheduling the refreshes and making sure that they don't
interfere with regular reads and writes. So to keep the data in a DRAM chip from leaking
away the DRAM controller periodically sweeps through all of the rows by cycling RAS
repeatedly and placing a series of row addresses on the address bus.
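
A minimal sketch of such a /RAS-only refresh sweep is shown below; activate_row() is a hypothetical stand-in for pulsing /RAS with the row address on the address pins.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_ROWS 2048u   /* 11 row-address bits, as in the example above */

    /* Hypothetical stand-in for pulsing /RAS with `row` on the address pins;
       in hardware this recharges every cell on that row via the sense amps. */
    static void activate_row(uint32_t row) { (void)row; }

    int main(void)
    {
        /* One full refresh sweep: every row must be visited within the cell
           retention period (typically a few tens of milliseconds).          */
        for (uint32_t row = 0; row < NUM_ROWS; row++)
            activate_row(row);
        printf("refreshed %u rows\n", NUM_ROWS);
        return 0;
    }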
A RAM grid is always organised as a rectangle, and not a perfect square. With DRAMs,
it is advantageous to have fewer rows and more columns because the fewer rows you
have, the less time it takes to refresh all the rows.
Even though the DRAM controller handles all the refreshes and tries to schedule them for
maximum performance, having to go through and refresh each row every few
milliseconds can seriously get in the way of reads and writes and thus impact the
performance of DRAM. EDO, Fast Page, and the various other types of
DRAM are mainly distinguished by the ways in which they try to get around this
potential bottleneck.

Each of the cells in an SRAM or DRAM chip traps only a 1 or a 0. Also, the early
DRAM and SRAM chips only had one Data In and one Data Out pin apiece. Now, the
CPU actually sees main memory as a long row of 1-byte cells, not 1-bit cells.
Therefore, to store a complete byte, just stack eight 1-bit RAM chips together, and have each chip store one bit of the final byte. This involves feeding the same address to all eight chips, and having each chip's 1-bit output go to one line of the data bus. The following diagram should help you visualise the layout. (To save space, a 4-bit configuration is shown; this could be extended to eight bits by just adding four more chips and four more data bus lines. Just imagine that the picture below is twice as wide, and that there are eight chips on the module instead of four.)



Figure 4 DRAM Organisation



Combining eight chips on one printed circuit board (PCB) with a common address and data bus makes an 8-bit RAM module. In the above picture, it is assumed that the address bus is 22 bits wide and the data bus is 8 bits wide. This means that each single chip in the module holds 2^22 or 4194304 bits. When the eight chips are put together on the module, with each of their 1-bit outputs connected to a single line of the 8-bit data bus, the module appears to the CPU to hold 4194304 cells of 8 bits (1 byte) each, i.e. as a 4MB module. So the CPU asks the module for data in 1-byte chunks from one of the 4194304 virtual 8-bit locations. In RAM notation, we say that this 4MB module is a 4194304 x 8 module (or alternatively, a 4M x 8 module; note that the M in 4M is not equal to MB or megabyte, but to Mb or megabit).
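
The way eight 1-bit chips combine into a byte-wide module can be sketched in C as below (a toy model with invented names): every chip receives the same address, and chip n supplies bit n of the byte.

    #include <stdint.h>
    #include <stdio.h>

    #define DEPTH (1u << 22)            /* 4194304 one-bit locations per chip */

    static uint8_t chip[8][DEPTH];      /* eight 4M x 1 chips, one bit each   */

    /* All eight chips receive the same 22-bit address; chip n stores bit n
       of the byte, so together they behave as a single 4M x 8 module.        */
    static void module_write(uint32_t addr, uint8_t byte)
    {
        for (int n = 0; n < 8; n++)
            chip[n][addr] = (byte >> n) & 1;
    }

    static uint8_t module_read(uint32_t addr)
    {
        uint8_t byte = 0;
        for (int n = 0; n < 8; n++)
            byte |= (uint8_t)(chip[n][addr] << n);
        return byte;
    }

    int main(void)
    {
        module_write(123456, 0xA5);
        printf("read back 0x%02X\n", (unsigned)module_read(123456));
        return 0;
    }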

Figure 5 SIMM Module

The CPU likes to fill up its entire 32-bit or 64-bit data bus when it fetches data. So, instead of stacking the outputs of multiple chips together on one module, the outputs of multiple modules are stacked together into one RAM bank. Figure 6 shows one bank of four 8-bit modules. Assume that each chip in each module is a 4194304 x 1 chip, making each module a 4194304 x 8 (4 MB) module. The resulting bank, with the 8-bit data buses from each module combined, gives a bus width of 32 bits.



Figure 6 RAM Bank


The 16MB of memory that the above bank represents is broken up between the modules so that each module stores every fourth byte. So, module 1 stores byte 1, module 2 stores byte 2, module 3 stores byte 3, module 4 stores byte 4, module 1 stores byte 5, module 2 stores byte 6, and so on up to byte 16,777,216. This is done so that when the CPU needs a particular byte, it can not only grab the byte it needs but also put the adjacent bytes on the data bus and bring them all in at the same time.
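
A short C sketch of this byte-to-module mapping (numbering bytes from 1, as the text does) is given below.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Four 8-bit modules in the bank; consecutive bytes rotate around
           the modules, so each module stores every fourth byte.            */
        for (uint32_t byte_no = 1; byte_no <= 8; byte_no++) {
            uint32_t module = (byte_no - 1) % 4 + 1;  /* which module       */
            uint32_t offset = (byte_no - 1) / 4;      /* location inside it */
            printf("byte %u -> module %u, offset %u\n", byte_no, module, offset);
        }
        return 0;
    }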
To add memory to a system like this, you can do one of two things. The first option
would be to increase the size of the bank by increasing the size of each individual module
by the same amount. Say you wanted 32MB of memory; you'd increase the amount of
storage on each module from 4MB to 8MB. The other option would be to add more
banks. The example above shows what a RAM bank on some i486 systems would
actually have looked like, with each of the modules being a 30-pin, single-sided SIMM.

While 8 bits worth of data pins in a DRAM bank actually makes the memory
organisation of a single SIMM a bit simpler and easier to understand, putting 16 or more
bits worth of data pins on a single chip can actually make things more confusing.
The DIMM in Figure 7 is the Texas Instruments TM124BBJ32F. The TM124BBJ32F is a 4MB, 32-bit wide DRAM module which has only two RAM chips on it. This means that each chip is 16 bits wide and holds 2 MB. Externally, however, to the system as a whole, the module appears to be made up of four 1M x 8-bit DRAM chips. Each of those 1M x 16-bit chips is almost like a mini DRAM module, with an upper and lower half of 1M x 8 apiece, where each half has its own CAS and RAS signals.
Memory Latency
There are two main types of delays that we have to take into account. The first type
includes the delays that have to take place between successive DRAM reads. You can't
just fire off a read and then fire off another one immediately afterwards. Since a DRAM
read involves charging and recharging capacitors, and various control signals have to
propagate hither and thither so that the chip will know what it's doing, you have to stick
some space in between reads so that all the signals can settle back down and the
capacitors can recharge.
Of this first type of in-between-reads delay, there's only one that's going to concern us
really, and that's the /RAS and /CAS precharge delay. After /RAS has been active and
you deactivate it, you've got to give it some time to charge back up before you can
activate it again. Figure 8 shows this.




Figure 8 Asynchronous DRAM timing
The same goes for the /CAS signal as well, and in fact to visualise the /CAS precharge
delay just look at the above picture and replace the term RAS with CAS.
The /RAS and /CAS precharge delays can be thought of in light of the list of DRAM read steps; it is this rest period which limits the number of reads that can occur in a given period of time. Specifically, step 8 dictates that you've got to deactivate /RAS and /CAS
at the end of each cycle, so the fact that after you deactivate them you've got to wait for
them to precharge before you can use them again means you have to wait a while in
between reads (or writes, or refreshes, for that matter).

This precharge time in between reads isn't the only thing that limits DRAM operations
either. The other type of delay that concerns us is internal to a specific read. Just like the
in-between-reads delay is associated with deactivating /RAS and /CAS, the inside-the-
read delay is associated with activating /RAS and /CAS. For instance, the row access time (tRAC) is the minimum amount of time you have to wait between the moment you activate /RAS and the moment the data you want can appear on the data bus. Likewise,
the column access time (tCAC) is the minimum delay between the moment you activate
/CAS and the moment the data can appear on the data bus.
Think of tRAC and tCAC as the amount of time it takes the chip to fill an order you just
placed at the drive-in window. You place your order (the row and column address of the
data you want), and it has to go and fetch the data for you so it can place it on the data
pins. Figure 8 shows how the two types of delays work.


Figure 8 Row Access Time

Figure 9 shows both types of delay in action in a series of DRAM reads.


Figure 9 Complete DRAM timing diagram
Latency
There are two important types of latency ratings for DRAMs: access time and cycle time,
where access time is related to the second type of delays we talked about (those internal
to the read cycle) and cycle time is related to the first (those in between read cycles).
Both ratings are given in nanoseconds.
For asynchronous DRAM chips, the access time describes the amount of time it takes between when you drop that row address on the address pins and when you can expect the data to show up at the data pins. Going back to our drive-in analogy, the access time is
the time in between when you place your order and when your food shows up at the
window. So a DIMM with a 60ns latency takes at least 60ns to get your data to you after
you've placed the row address (which is of course followed by the column address) on the
pins.

Cycle time is the amount of time you have to wait between successive read operations. Minimising both cycle time and access time is what the next two types of DRAM are designed to do.


FPM DRAM
Fast Page Mode DRAM is so called because it squirts out data in 4-word bursts (a word
is whatever the default memory chunk size is for the DRAM, usually a byte), where the
four words in each burst all come from the same row, or page. For the read that fetches
the first word of that four word burst, everything happens like a normal read--the row
address is put on the address pins, /RAS goes active, the column address is put on the
address pins, /CAS goes active, etc. It's the next three successive reads that look kind of strange. At the end of that initial read, instead of deactivating /RAS and then reactivating
it to take the next row address, the controller just leaves /RAS active for the next three
reads. Since the four words all come from the same row but different columns, there's no
need to keep sending in the same row address. The controller just leaves /RAS active so
that to get the next three words all it has to do is send in three column addresses.

To sum up, you give the FPM DRAM the row and column addresses of the initial word
required, and then access three more words on that same row by simply providing three
column addresses and activating /CAS three times for each new column.

Figure 10 FPM Timing

It can be seen from Figure 10 that FPM is faster than a regular read because it takes the delays associated with both /RAS (tRAC and the /RAS precharge) and the row address out of the equation for three of the four reads. All you have to deal with are /CAS-related delays for those last three reads, which makes for less overhead and faster access and cycle times. The first read takes a larger number of CPU cycles to complete (say, 6), and the next three take a smaller number of cycles (say, 3); an FPM DRAM with this behaviour is described as having 6-3-3-3 timing.
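
A quick illustrative calculation in C, assuming the 6-3-3-3 figures above, shows why page mode pays off.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative 6-3-3-3 pattern from the text: the first access pays the
           full /RAS + /CAS cost, the next three pay only the /CAS-related cost. */
        int plain = 4 * 6;          /* four completely independent reads */
        int fpm   = 6 + 3 + 3 + 3;  /* one page-mode burst of four words */
        printf("plain reads: %d cycles, FPM burst: %d cycles\n", plain, fpm);
        return 0;
    }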
One important thing to notice in the FPM DRAM diagram is that you can't latch the
column address for the next read until the data from the previous read is gone. Notice that
the Column 2 block doesn't overlap with the Data 1 block, nor does the Column 3 block
overlap with the Data 2 block, and so on. The output for one read has to be completely
finished before the next read can be started by placing the column address on the bus, so
there's a small delay imposed, as depicted in Figure 11.


Figure 11


EDO RAM

EDO RAM, unlike FPM, can start a read before the previous read's data is gone. With
EDO DRAM, you can hold output data on the pins for longer, even if it means that the
data from one read is on the pins at the same time that you're latching in the column
address for the next read.


Figure 12

A new access cycle can be started while keeping the data output of the previous cycle
active. This allows a certain amount of overlap in operation (pipelining).

When EDO first came out, there were claims of anywhere from 20% to 40% better
performance.
Since EDO can put out data faster than FPM, it can be used with faster bus speeds. With
EDO, you could increase the bus speed up to 66MHz without having to insert wait states.


SDRAM

A major technology change occurred in around 1997, when SDRAM (Synchronous
DRAM) first entered the marketplace. This was a completely new technology, which
operates synchronously with the system bus.
Data can (in burst mode) be fetched on every clock pulse. Thus the module can operate
fully synchronised with (at the same beat as) the bus without so-called wait states
(inactive clock pulses). Because they are linked synchronously to the system bus,
SDRAM modules can run at much higher clock frequencies.

Synchronous dynamic random access memory (SDRAM) is made up of multiple arrays
of single-bit storage sites arranged in a two-dimensional lattice structure formed by the
intersection of individual rows (Word Lines) and columns (Bit Lines). These grid-like
structures, called banks, provide an expandable memory space allowing the host control
process and other system components with direct access to main system memory to
temporarily write and read data to and from a centralised storage location.
When associated in groups of two (DDR), four (DDR2) or eight (DDR3), these banks
form the next higher logical unit, known as a rank.


Figure 13

Figure 13 shows the typical functional arrangement of SDRAM memory space. It shows
a dual-sided dual-rank 2GB SDRAM, which contains a total of 16 ICs, eight per side.
Each IC contains eight banks of addressable memory space comprising 16K pages and
1K column address starting points with each column storing a single 8-bit word. This
brings the total memory space to 128MB (16,384 rows/bank x 1,024 columns
addresses/row x 1 byte/column address x 8 stacked banks) per IC. And since there are
eight ICs per rank, Rank 1 is 1GB (128MB x 8 contiguous banks) in size, with the same
for Rank 2, for a grand total of 2GB per module.
In all the types of RAM covered so far, /RAS and /CAS have to be precharged before they can be used again after being deactivated. In an SDRAM module with two banks, you can have one bank busy precharging while the other bank is being used. This is
known as interleaving. Interleaving allows banks of SDRAM to alternate their refresh
and access cycles. One bank will undergo its refresh cycle while another is being
accessed. This improves performance of the SDRAM by masking the refresh time of each
bank.

SDRAM control

Not only does an SDRAM's organisation into banks distinguish it from other types of
DRAMs, but so does the way it's controlled. Since asynchronous DRAM doesn't share
any sort of common clock signal with the CPU and chipset, the chipset has to manipulate
the DRAM's control pins based on all sorts of timing considerations. SDRAM, however,
shares the bus clock with the CPU and chipset, so the chipset can place commands (or,
certain predefined combinations of signals) on its control pins on the rising clock edge.



Figure 14

Figure 14 shows the steps required, broken down by clock cycle.

Clock 1: ACTIVATE the row by turning on /CS and /RAS. The row address is placed on
the address bus to determine which row to activate.

Clock 3: READ the column required from the activated row by turning on /CAS while
placing the column's address on the address bus.

Clocks 5-10: The data from the row and column that you gave the chip goes out onto the
Data Bus, followed by a BURST of other columns, the order of which depends on which
BURST MODE has been set.

While asynchronous DRAMs like EDO and FPM are designed to allow you to burst data onto the bus by keeping a row active and selecting only columns, SDRAM takes this a step further by giving the facility to program a chip to deliver data bursts in predefined sequences.

SDRAM CAS timing
The last aspect of SDRAM that bears looking at is CAS latency.

When looking at memory data sheets, a series of numbers separated by dashes gives the latency of the device; e.g. a data sheet may quote 9-9-9-24 (2T) for a memory chip. These refer to CAS-tRCD-tRP-tRAS and the command rate (CMD) respectively, and these values are measured in clock cycles. To understand what these mean:

CAS Latency (1st number)
Since data is often accessed sequentially (same row), the CPU only needs to select
the next column in the row to get the next piece of data. In other words, CAS
Latency is the delay between the CAS signal and the availability of valid data on the
data pins. Therefore, the latency between column accesses (CAS), plays an
important role in the performance of the memory. The lower the latency, the better
the performance. However, the memory modules must be capable of supporting low
latency settings.
tRCD (2nd number)
There is a delay from when a row is activated to when the cell (or column) is
activated via the CAS signal and data can be written to or read from a memory cell.
i.e. the RAS-to-CAS delay. This delay is called tRCD. When memory is accessed
sequentially, the row is already active and tRCD will not have much impact.
However, if memory is not accessed in a linear fashion, the current active row must
be deactivated and then a new row selected/activated. It is in this situation that low tRCD values can improve performance.
tRP (3rd number)
tRP is the time required to switch between rows. Therefore, in conjunction with
tRCD, the time required (or clock cycles required) to switch banks (or rows) and
select the next cell for either reading, writing or refreshing is a combination of tRP
and tRCD.

tRAS (4th number)
Memory architecture is like a spreadsheet with row upon row and column upon
column with each row being 1 bank. In order for the CPU to access memory, it must
first determine which Row or Bank in the memory that is to be accessed and activate
that row via the RAS signal. Once activated, the row can be accessed over and over
until the data is exhausted. This is why tRAS has little effect on overall system
performance but could impact system stability if set incorrectly.
Command Rate
The Command Rate is the time needed between the chip select signal and when commands can be issued to the RAM module IC. Typically, this is either 1 clock
or 2.
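
To turn these clock-cycle figures into time, a small illustrative calculation can be made in C; the 666 MHz clock below is only an assumed example, and treating a row miss as roughly tRP + tRCD + CAS is a simplification rather than an exact model.

    #include <stdio.h>

    int main(void)
    {
        double t_clk = 1.5;              /* assumed clock period in ns (~666 MHz) */
        int cas = 9, trcd = 9, trp = 9;  /* the 9-9-9 part of 9-9-9-24            */

        double row_hit  = cas * t_clk;                 /* row already open        */
        double row_miss = (trp + trcd + cas) * t_clk;  /* close row, open new     */
                                                       /* row, then read          */
        printf("row hit: %.1f ns, row miss: %.1f ns\n", row_hit, row_miss);
        return 0;
    }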


Bank interleaving
SDRAM divides memory into two to four banks for simultaneous access to more data.
This division and simultaneous access is known as interleaving. Using a notebook
analogy, two-way interleaving is like dividing each page in a notebook into two parts and
having two assistants to each retrieve a different part of the page. Even though each
assistant must take a break (be refreshed), breaks are staggered so that at least one
assistant is working at all times. Therefore, they retrieve the data much faster than a
single assistant could get the same data from one whole page, especially since no data can
be accessed when a single assistant takes a break. In other words, while one memory
bank is being accessed, the other bank remains ready to be accessed. This allows the
processor to initiate a new memory access before the previous access completes and
results in continuous data flow.

In an interleaved memory system, there are still two physical banks of DRAM, but
logically the system sees one bank of memory that is twice as large. In the interleaved
bank, the first long word of bank 0 is followed by the first long word of bank 1, which is
followed by the second long word of bank 0, which is followed by the second long word
of bank 1, and so on. Figure 3 shows this organisation for two physical banks of N long
words. All even long words of the logical bank are located in physical bank 0 and all odd
long words are located in physical bank 1.
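
A minimal C sketch of this even/odd mapping for two-way interleaving is shown below.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Two-way interleave of long words: even logical words live in
           physical bank 0, odd logical words in physical bank 1.        */
        for (uint32_t word = 0; word < 6; word++) {
            uint32_t bank   = word & 1;    /* lowest bit of the word address */
            uint32_t offset = word >> 1;   /* word address within that bank  */
            printf("logical word %u -> bank %u, word %u\n", word, bank, offset);
        }
        return 0;
    }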

DDR DRAM
DDR DRAM is basically just a more advanced version of SDRAM, with an added twist
at the data pins. Now SDRAM transfers its commands, addresses, and data on the rising
edge of the clock. Like regular SDRAM, DDR DRAM transfers its commands and
addresses on the rising edge of the clock, but unlike SDRAM it contains special circuitry
behind its data pins that allows it to transfer data on both the rising and falling edges of
the clock. So DDR can transfer two data words per clock cycle, as opposed to SDRAM's
one word per clock cycle, effectively doubling the speed at which it can be read from or
written to under optimal circumstances. Thus the DDR in DDR DRAM stands for Double Data Rate, a name it gets from this ability to transfer twice the data per clock cycle as an SDRAM.
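
As an illustrative calculation (the DDR-400 figures below are just an example), the peak transfer rate follows directly from the bus clock, the two transfers per clock, and the bus width.

    #include <stdio.h>

    int main(void)
    {
        /* Peak transfer rate = bus clock x 2 transfers per clock x bus width.
           The DDR-400 numbers here are purely illustrative.                  */
        double clock_mhz = 200.0;   /* 200 MHz bus clock          */
        double bus_bytes = 8.0;     /* 64-bit data bus            */
        double peak_mb_s = clock_mhz * 2.0 * bus_bytes;
        printf("peak bandwidth: %.0f MB/s\n", peak_mb_s);   /* prints 3200 */
        return 0;
    }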



There are presently three generations of DDR memories:
1. DDR1 memory, with a maximum rated clock of 400 MHz and a 64-bit (8-byte) data bus, is now becoming obsolete and is not being produced in massive quantities.
2. DDR2 memory is the second generation of DDR memory. DDR2 starts at 400 MHz as its lowest speed, whereas 400 MHz is the highest speed for DDR1, so DDR2 takes over where DDR1 leaves off. Somewhat counter-intuitively, because of its higher latencies a 400 MHz DDR1 module can outperform a 400 MHz DDR2 module; the advantage shifts back to DDR2 at the next speed grade, 533 MHz, which DDR1 cannot reach.
3. DDR3 is the third generation in DDR memory. DDR3 memory provides a
reduction in power consumption of 30% compared to DDR2 modules due to
DDR3's 1.5 V supply voltage, compared to DDR2's 1.8 V or DDR's 2.5 V. The
main benefit of DDR3 comes from the higher bandwidth made possible by
DDR3's 8-burst-deep prefetch buffer, in contrast to DDR2's 4-burst-deep or
DDR's 2-burst-deep prefetch buffer.
DDR3 modules can transfer data at a rate of 800-2133 MT/s (megatransfers per second) using both rising and falling edges of a 400-1066 MHz I/O clock.

Memory Voltages
The industry-standard operating voltage for computer memory components was
originally 5 volts. However, as cell geometries decreased, memory circuitry became
smaller and more sensitive. Likewise, the industry-standard operating voltage decreased.
Today, computer memory components can operate at as low as 1.5 volts, which allows them to
run faster and consume less power.

Bandwidth
The bandwidth capacity of the memory bus increases with its width (in bits) and its
frequency (in MHz). By transferring 8 bytes (64 bits) at a time and running at 100 MHz,
SDRAM increases memory bandwidth to 800 MB/s, 50 percent more than EDO DRAMs
(533 MB/s at 66 MHz).


DIMM error detection/correction technologies
Memory modules used in servers are inherently susceptible to memory errors. Memory
cells must be continuously recharged (refreshed) to preserve the data. The operating
voltage of the memory device determines the level of the electrical charge. However, if a
capacitor's charge is affected by some external event, the data may become incorrect.
Depending on the cause, a memory error is referred to as either a hard or soft error. A
hard error is caused by a broken or defective piece of hardware, so the device consistently
returns incorrect results. For example, a memory cell may be stuck so that it always
returns a 0 bit, even when a 1 bit is written to it. Hard errors can be caused by DRAM
defects, bad solder joints, connector issues, and other physical issues. Soft errors are
more prevalent. They occur randomly when an electrical disturbance near a memory cell
alters the charge on the capacitor. A soft error does not indicate a problem with a memory
device because once the stored data is corrected (for example, by a write to a memory
cell), the same error does not recur.
Two trends increase the likelihood of memory errors in servers:

expanding memory capacity and
increasing storage density.

Also two parameters of DRAM are inextricably tied together:

the storage density of the DRAM chips and
the operating voltage of the memory system.

As the size of memory cells decreases, both DRAM storage density and the memory-cell
voltage sensitivity increase. Initially, industry-standard DIMMs operated at 5 volts.
However, due to improvements in DRAM storage density, the operating voltage
decreased first to 3.3 V, then 2.5 V, and then 1.8 V to allow memory to run faster and
consume less power. Because memory storage density is increasing and operating voltage
is shrinking, there is a higher probability that an error may occur.

Basic ECC Memory
Parity checking detects only single-bit errors. It does not correct memory errors or detect
multi-bit errors. Every time data is written to memory, ECC (Error Correction Codes)
uses a special algorithm to generate values called check bits. The algorithm adds the
check bits together to calculate a checksum, which it stores with the data. When data is
read from memory, the algorithm recalculates the checksum and compares it with the
checksum of the written data. If the checksums are equal, then the data is valid and
operation continues. If they are different, the data has an error and the ECC memory logic
isolates the error and reports it to the system. In the case of a single-bit error, the ECC
memory logic can correct the error and output the corrected data so that the system
continues to operate.
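
As a rough illustration of the idea of check bits, the C sketch below implements a classic Hamming(7,4) code, which protects 4 data bits with 3 check bits and can correct any single-bit error. Real ECC DIMMs use a wider code (typically 8 check bits over 64 data bits), so this is an analogy for the principle rather than the actual scheme used.

    #include <stdio.h>
    #include <stdint.h>

    /* Encode a 4-bit value into a 7-bit Hamming(7,4) codeword.
       Bit positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.            */
    static uint8_t hamming74_encode(uint8_t data)
    {
        uint8_t d1 = (data >> 0) & 1, d2 = (data >> 1) & 1;
        uint8_t d3 = (data >> 2) & 1, d4 = (data >> 3) & 1;
        uint8_t p1 = d1 ^ d2 ^ d4;   /* checks positions 1,3,5,7 */
        uint8_t p2 = d1 ^ d3 ^ d4;   /* checks positions 2,3,6,7 */
        uint8_t p3 = d2 ^ d3 ^ d4;   /* checks positions 4,5,6,7 */
        return (uint8_t)(p1 | (p2 << 1) | (d1 << 2) | (p3 << 3)
                            | (d2 << 4) | (d3 << 5) | (d4 << 6));
    }

    /* Recompute the parity checks; a non-zero syndrome is the 1-based
       position of a single flipped bit, which is then corrected.       */
    static uint8_t hamming74_correct(uint8_t cw)
    {
        uint8_t s1 = ((cw >> 0) ^ (cw >> 2) ^ (cw >> 4) ^ (cw >> 6)) & 1;
        uint8_t s2 = ((cw >> 1) ^ (cw >> 2) ^ (cw >> 5) ^ (cw >> 6)) & 1;
        uint8_t s3 = ((cw >> 3) ^ (cw >> 4) ^ (cw >> 5) ^ (cw >> 6)) & 1;
        uint8_t syndrome = (uint8_t)(s1 | (s2 << 1) | (s3 << 2));
        if (syndrome)                          /* single-bit error found  */
            cw ^= (uint8_t)(1u << (syndrome - 1));
        return cw;
    }

    int main(void)
    {
        uint8_t cw  = hamming74_encode(0xB);   /* store the 4-bit value 1011 */
        uint8_t bad = cw ^ (uint8_t)(1u << 4); /* a soft error flips one bit */
        printf("corrected ok: %s\n",
               hamming74_correct(bad) == cw ? "yes" : "no");
        return 0;
    }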



In addition to detecting and correcting single-bit errors, ECC detects (but does not
correct) errors of two random bits and up to four bits within a single DRAM chip.

