SEMINAR REPORT
ON
Modern Era of Computing
(22nm Process)
By
Sonam Kumari
Guided by
Ashish Sharma
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
FACULTY OF ENGINEERING & TECHNOLOGY
JODHPUR NATIONAL UNIVERSITY
2013 - 2014
CERTIFICATE
This is to certify that the seminar entitled Modern Era of Computing has been
carried out by Sonam Kumari under my guidance in partial fulfillment of the
degree of Master of Technology in Computer Science & Engineering of Jodhpur
National University, Jodhpur during the academic year 2013-2014. To the best of
my knowledge and belief, this work has not been submitted elsewhere for the
award of any other degree.
Guide
Examiner
HOD
ACKNOWLEDGEMENT
Even a burning desire can't be sustained in the absence of a proper
infrastructure. One has to have a direction to proceed in, a correct path to
follow. I attempt to thank all those people who have helped me in this small
endeavor.
I would like to articulate my sincere gratitude to Dr. V. P. Gupta, Dean,
FE&T, Jodhpur National University, and Prof. D. K. Mehta, Chairperson,
CSE Department, for providing me the right environment along with all the
required facilities.
I am fortunate enough to have worked under the able guidance of Ashish
Sharma, and I wish to express my sincere sense of gratitude to her. Her
painstaking guidance despite a very busy schedule, her inspiring supervision
and keen interest, invaluable and tireless devotion, scientific approach and
brilliant technological acumen have been a source of tremendous help.
Finally, I extend my thanks to all those sources which provided me
information related to my topic and proved very helpful for my seminar
report.
Sonam Kumari
PAGE INDEX
S.No.  Topics                                             Page No.
       ABSTRACT                                           VI
1.     INTRODUCTION                                       1
       1.1 Moore's Law
       1.2 History
2.     GENERATIONS                                        16
3.     45 nm Process                                      23
4.     32 nm Process                                      33
5.     Carbon Nanotubes                                   36
       5.1 Introduction                                   36
       5.2 Applications                                   40
6.     22 nm Process                                      49
       6.1 Introduction                                   49
       6.2 Comparison between 32 nm and 22 nm Process     50
       6.3 Microarchitecture                              54
       6.4 Facts                                          64
7.     References                                         65
FIGURE INDEX
Figure No.  Figure Caption                                      Page No.
1.1         Moore's Law chart                                   1
1.2         Design Rule                                         10
1.3         Transistor Count by Year Graph                      12
1.4         Node Size vs. nm Process Graph                      13
3.1         NMOS and PMOS Transistor                            26
3.2         Stress Graph of Transistor                          27
3.3         iOn & iOff Graph of 10% iDsat Benefit               27
3.4         iOn & iOff Graph of 6% iDsat Benefit                28
3.5         iOn & iOff Graph of Various iDsat Benefits          29
3.6         RO Gain Data                                        29
3.7         RO Gain Data vs. 65 nm Results                      30
3.8         Microscopic Cross Section of 65 nm Process          30
5.1         Carbon Nanotubes Structure                          38
5.2         Carbon Nanotubes Structure                          46
5.3         Carbon Nanotubes as Transistors                     47
6.1         Traditional Planar Transistor                       50
6.2         22 nm Tri-gate Transistor                           50
6.3         22 nm Tri-gate Transistor: 2D                       51
6.4         22 nm Tri-gate Transistor: 3D                       51
6.5         Microscopic Image of 32 nm Planar Transistor        52
6.6         Microscopic Image of 22 nm Planar Transistor        52
6.7         Microarchitecture                                   53
Abstract:
Intel has deployed a fundamentally different technology for future
microprocessor families: 3D transistors manufactured at 22 nm. These new
transistors enable Intel to continue to relentlessly pursue Moore's Law and to
ensure that the pace of technology advancement consumers expect can
continue for years to come.
Previously, transistors, the core of microprocessors, were 2D (planar) devices.
Intel's 3D Tri-Gate transistor, and the ability to manufacture it in high volume,
mark a dramatic change in the fundamental structure of the computer chip.
This also means Intel can continue to lead in powering products, from the
world's fastest supercomputers to very small mobile handhelds.
Smaller is Better
Transistor size and structure are at the very center of delivering the benefits of
Moore's Law to the end user. The smaller and more power efficient the
transistor, the better. Intel continues to predictably shrink its manufacturing
technology in a series of "world firsts": 45 nm with high-k/metal gate in 2007;
32 nm in 2009; and now 22 nm with the world's first 3D transistor in a high
volume logic process beginning in 2011.
With a smaller, 3D transistor, Intel can design even more powerful processors
with incredible power efficiency. The new technology enables innovative
microarchitectures, System on Chip (SoC) designs, and new products, from
servers and PCs to smart phones and innovative consumer products.
The new transistors are so impressively efficient at low voltages they allow the
Intel Atom processor design team to innovate new architectural
approaches for the 22 nm Intel Atom microarchitecture. The new design
specifically maximizes the benefit of the extremely low-power 3D Tri-Gate
transistor technology. And Intel's future SoC products based on the 22 nm 3D
Tri-Gate transistors will hit sub-1 mW idle power, for incredibly low-power
SoCs.
CHAPTER-1
INTRODUCTION
Moore's law
Moore's law is the observation that, over the history of computing hardware,
the number of transistors in a dense integrated circuit doubles approximately
every two years. The law is named after Gordon E. Moore, co-founder of the
Intel Corporation, who described the trend in his 1965 paper. His prediction
has proven to be accurate, in part because the law is now used in the
semiconductor industry to guide long-term planning and to set targets for
research and development. The capabilities of many digital electronic
devices are strongly linked to Moore's law: quality-adjusted microprocessor
prices, memory capacity, sensors and even the number and size of pixels in
digital cameras. All of these are improving at roughly exponential rates as
well.
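To make the doubling rate concrete, here is a minimal sketch in Python. The 1971 Intel 4004 baseline of roughly 2,300 transistors is well documented, but the clean two-year doubling is the idealized assumption of the law, not a measured fit:

    # Moore's law as an idealized two-year doubling, using the Intel 4004
    # (1971, ~2,300 transistors) as a baseline.
    def transistors(year, base_year=1971, base_count=2300, doubling_years=2):
        """Projected transistor count for a given year."""
        return base_count * 2 ** ((year - base_year) / doubling_years)

    for year in (1971, 1981, 1991, 2001, 2011):
        print(f"{year}: ~{transistors(year):,.0f} transistors")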
History
For the 35th anniversary issue of Electronics Magazine, published on April
19, 1965, Gordon E. Moore, then working as the Director of R&D at
Fairchild Semiconductor, was asked to predict what was going to happen in
the semiconductor components industry over the next 10 years. His response
was a brief article entitled "Cramming more components onto integrated
circuits". In his editorial, he speculated that by 1975 it would be possible to
contain as many as 65,000 components on a single quarter-inch
semiconductor.
The complexity for minimum component costs has increased at a rate of
roughly a factor of two per year. Certainly over the short term this rate can
be expected to continue, if not to increase. Over the longer term, the rate of
increase is a bit more uncertain, although there is no reason to believe it
will not remain nearly constant for at least 10 years.
His reasoning was a log-linear relationship between device complexity
(higher circuit density at reduced cost) and time:
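The relation itself is not reproduced in this copy; a common way to write the log-linear trend (a reconstruction under the two-year-doubling assumption, not Moore's original notation) is:

    N(t) = N_0 \cdot 2^{(t - t_0)/T}, \qquad T \approx 2\ \text{years}
    \iff \log_2 N(t) = \log_2 N_0 + \frac{t - t_0}{T}

where N(t) is the transistor count at time t and N_0 the count at a baseline time t_0.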
In 1975 Moore slowed his forecast regarding the rate of density-doubling,
stating that circuit density-doubling would occur every 24 months. The same
style of extrapolation has since offered an opportunity to predict future
trends in digital camera prices and in LCD and LED screen sizes and
resolutions.
The great Moore's law compensator (TGMLC), generally referred to as
bloat, and also known as Wirth's law, is the principle that successive
generations of computer software acquire enough bloat to offset the
performance gains predicted by Moore's law. In a 2008 article in InfoWorld,
Randall C. Kennedy, formerly of Intel, introduces this term using successive
versions of Microsoft Office between the years 2000 and 2007 as his premise.
Despite the gains in computational performance during this time period
according to Moore's law, Office 2007 performed the same task at half the
speed on a prototypical year 2007 computer as compared to Office 2000 on a
year 2000 computer.
Library expansion was calculated in 1945 by Fremont Rider to double in
capacity every 16 years, if sufficient space were made available. He
advocated replacing bulky, decaying printed works with miniaturized
microform analog photographs, which could be duplicated on-demand for
library patrons or other institutions. He did not foresee the digital technology
that would follow decades later to replace analog microform with digital
imaging, storage, and transmission mediums. Automated, potentially lossless
digital technologies allowed vast increases in the rapidity of information
growth in an era that is now sometimes called an Information Age.
The Carlson Curve is a term coined by The Economist to describe the
biotechnological equivalent of Moore's law, and is named after author Rob
Carlson. Carlson accurately predicted that the doubling time of DNA
sequencing technologies (measured by cost and performance) would be at
least as fast as Moore's law. Carlson Curves illustrate the rapid (in some
cases hyperexponential) decreases in cost, and increases in performance, of a
variety of technologies, including DNA sequencing, DNA synthesis and a
range of physical and computational tools used in protein expression and in
determining protein structures.
As a target for industry and a self-fulfilling prophecy
Although Moore's law was initially made in the form of an observation and
forecast, the more widely it became accepted, the more it served as a goal
for an entire industry. This drove both marketing and engineering
departments of semiconductor manufacturers to focus enormous energy
aiming for the specified increase in processing power that it was presumed
one or more of their competitors would soon actually attain. In this regard, it
can be viewed as a self-fulfilling prophecy.
Moore's second law
As the cost of computer power to the consumer falls, the cost for producers
to fulfill Moore's law follows an opposite trend: R&D, manufacturing, and
test costs have increased steadily with each new generation of chips. Rising
manufacturing costs are an important consideration for the sustaining of
Moore's law. This has led to the formulation of Moore's second law, also
called Rock's law, which is that the capital cost of a semiconductor fab also
increases exponentially over time.
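Rock's law is commonly quoted as fab capital cost doubling roughly every four years; a minimal sketch in Python (the baseline year and cost are illustrative assumptions, not figures from this report):

    # Rock's law sketch: fab capital cost doubling every ~4 years.
    # The baseline figure is illustrative only.
    def fab_cost(year, base_year=1995, base_cost_usd=1e9, doubling_years=4):
        """Projected fab cost under exponential growth from a baseline."""
        return base_cost_usd * 2 ** ((year - base_year) / doubling_years)

    for year in (1995, 2003, 2011):
        print(f"{year}: ~${fab_cost(year) / 1e9:.0f} billion")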
Major enabling factors and future trends
Numerous innovations by a large number of scientists and engineers have
helped significantly to sustain Moore's law since the beginning of the
integrated circuit (IC) era. While assembling a detailed list of such
significant contributions would be as desirable as it would be difficult, a
few innovations are listed below as examples of breakthroughs that have
played a critical role in advancing integrated circuit technology by more
than seven orders of magnitude in less than five decades:
The foremost contribution, which is the raison d'être for Moore's law, is the
invention of the integrated circuit itself, credited contemporaneously to Jack
Kilby at Texas Instruments and Robert Noyce at Fairchild Semiconductor.
The invention of the complementary metal-oxide-semiconductor (CMOS)
process by Frank Wanlass in 1963. A number of advances in CMOS
technology by many workers in the semiconductor field since the work of
Wanlass have enabled the extremely dense and high-performance ICs that
the industry makes today.
The invention of the dynamic random access memory (DRAM) technology
by Robert Dennard at IBM in 1967, which made it possible to fabricate
single-transistor memory cells, and the invention of flash memory by Fujio
Masuoka at Toshiba in the 1980s, leading to low-cost, high-capacity memory
in diverse electronic products.
The invention of chemically amplified photoresist by C. Grant Willson,
Hiroshi Ito and J. M. J. Fréchet at IBM c. 1980, which was 10-100 times more
sensitive to ultraviolet light. IBM introduced chemically amplified
photoresist for DRAM production in the mid-1980s.
In 2010, researchers at the Tyndall National Institute in Cork, Ireland
announced a junctionless transistor design: a control gate around a silicon
nanowire can tighten around the wire to the point of closing down the
passage of electrons without the use of junctions or doping. The researchers
claim that the new junctionless transistors can be produced at 10-nanometer
scale using existing fabrication techniques.
In April 2011, a research team at the University of Pittsburgh announced the
development of a single-electron transistor 1.5 nanometers in diameter made
out of oxide based materials. According to the researchers, three "wires"
converge on a central "island" which can house one or two electrons.
Electrons tunnel from one wire to another through the island. Conditions on
the third wire result in distinct conductive properties including the ability of
the transistor to act as a solid state memory.
In February 2012, a research team at the University of New South Wales
announced the development of the first working transistor consisting of a
single atom placed precisely in a silicon crystal (not just picked from a large
sample of random transistors). Moore's law predicted this milestone to be
reached in the lab by 2020.
In April 2014, bioengineers at Stanford University developed a new circuit
board modeled on the human brain. 16 custom designed "Neurocore" chips
simulate 1 million neurons and billions of synaptic connections. This
Neurogrid is claimed to be 9,000 times faster and more energy efficient than
a typical PC. The cost of the prototype was $40,000; however with current
technology a similar Neurogrid could be made for $400.
The advancement of nanotechnology could spur the creation of microscopic
computers and restore Moore's Law to its original rate of growth.
Ultimate limits of the law
[Figure: Atomistic simulation of the formation of an inversion channel
(electron density) and attainment of threshold voltage (around 0.45 V for
this device) in a nanowire MOSFET. Nanowire MOSFETs lie toward the end of
the ITRS roadmap for scaling devices below 10 nm gate lengths.]
On 13 April 2005, Gordon Moore stated in an interview that the law cannot
be sustained indefinitely: "It can't continue forever. The nature of
exponentials is that you push them out and eventually disaster happens". He
also noted that transistors would eventually reach the limits of
miniaturization at atomic levels:
In terms of size [of transistors] you can see that we're approaching the size
of atoms which is a fundamental barrier, but it'll be two or three generations
before we get that far, but that's as far out as we've ever been able to see.
We have another 10 to 20 years before we reach a fundamental limit. By
then they'll be able to make bigger chips and have transistor budgets in the
billions.
In January 1995, the Digital Alpha 21164 microprocessor had 9.3 million
transistors. This 64-bit processor was a technological spearhead at the time,
even if the circuit's market share remained average. Six years later, a state of
the art microprocessor contained more than 40 million transistors. There are
predictions that Moore's law will collapse in the next few decades [20-40
years].
One could also limit the theoretical performance of a rather practical
"ultimate laptop" with a mass of one kilogram and a volume of one litre.
This is done by considering the speed of light, the quantum scale, the
gravitational constant and the Boltzmann constant, giving a performance of
5.4258 × 10^50 logical operations per second on approximately 10^31 bits.
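The figure follows from the Margolus-Levitin bound used in Seth Lloyd's "ultimate laptop" analysis; a worked form (standard physics, not derived in this report) is:

    \nu \le \frac{2E}{\pi\hbar}, \qquad E = mc^2
    \nu \le \frac{2 \times (1\,\text{kg})\,c^2}{\pi\hbar}
        \approx \frac{2 \times 8.99 \times 10^{16}\,\text{J}}
                     {\pi \times 1.055 \times 10^{-34}\,\text{J s}}
        \approx 5.4 \times 10^{50}\ \text{operations per second.}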
Then again, the law has often met obstacles that first appeared
insurmountable but were indeed surmounted before long. In that sense,
Moore says he now sees his law as more beautiful than he had realized:
"Moore's law is a violation of Murphy's law. Everything gets better and
better."
Futurists and Moore's law
[Figure: Kurzweil's extension of Moore's law from integrated circuits to
earlier transistors, vacuum tubes, relays and electromechanical computers.
If the current trend continues to 2020, the number of transistors would
reach 32 billion.]
Futurists such as Ray Kurzweil, Bruce Sterling, and Vernor Vinge believe
that the exponential improvement described by Moore's law will ultimately
lead to a technological singularity: a period where progress in technology
occurs almost instantly.
Although Kurzweil agrees that by 2019 the current strategy of ever-finer
photolithography will have run its course, he speculates that this does not
mean the end of Moore's law:
Moore's law of Integrated Circuits was not the first, but the fifth paradigm to
forecast accelerating price-performance ratios. Computing devices have
been consistently multiplying in power (per unit of time) from the
mechanical calculating devices used in the 1890 U.S. Census, to the
[Newman] relay-based "[Heath] Robinson" machine that cracked the Lorenz
cipher, to
the CBS vacuum tube computer that predicted the election of Eisenhower, to
the transistor-based machines used in the first space launches, to the
integrated-circuit-based personal computer.
Kurzweil speculates that it is likely that some new type of technology (e.g.
optical, quantum computers, DNA computing) will replace current
integrated-circuit technology, and that Moore's Law will hold true long after
2020.
11
3.4% per year in 1997-2004, outpacing the 1.6% per year during both 19721996 and 2005-2013. As economist Richard G. Anderson notes, Numerous
studies have traced the cause of the productivity acceleration to
technological innovations in the production of semiconductors that sharply
reduced the prices of such components and of the products that contain them
(as well as expanding the capabilities of such products).
[Figure: Intel transistor gate length trend; transistor scaling has slowed
down significantly at advanced (smaller) nodes.]
While physical limits to transistor scaling such as source-to-drain leakage,
limited gate metals, and limited options for channel material have been
reached, new avenues for continued scaling are open. The most promising of
these approaches rely on using the spin state of the electron (spintronics),
tunnel junctions, and advanced confinement of channel materials via nanowire
geometry. A comprehensive list of available device choices shows that a
wide range of device options is open for continuing Moore's law into the
next few decades. Spin-based logic and memory options are actively being
developed in industrial labs as well as academic labs.
Another source of improved performance is in microarchitecture techniques
exploiting the growth of available transistor count. Out-of-order execution
and on-chip caching and prefetching reduce the memory latency bottleneck
at the expense of using more transistors and increasing the processor
complexity. These increases are empirically described by Pollack's Rule,
which states that performance increases due to microarchitecture techniques
are proportional to the square root of the number of transistors or the area
of a processor.
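A minimal sketch of Pollack's Rule in Python (the ratios are illustrative; the rule is an empirical rule of thumb, not an exact law):

    import math

    # Pollack's Rule sketch: single-core performance ~ sqrt(transistor budget).
    def relative_performance(transistor_ratio):
        """Performance gain from growing the transistor budget by this ratio."""
        return math.sqrt(transistor_ratio)

    # Doubling the transistors yields only ~1.41x performance:
    for ratio in (2, 4, 16):
        print(f"{ratio}x transistors -> {relative_performance(ratio):.2f}x performance")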
For years, processor makers delivered increases in clock rates and
instruction-level parallelism, so that single-threaded code executed faster on
newer processors with no modification. Now, to manage CPU power
dissipation, processor makers favor multi-core chip designs, and software
has to be written in a multi-threaded manner to take full advantage of the
hardware. Many multi-threaded development paradigms introduce overhead,
and will not see a linear increase in speed vs number of processors. This is
particularly true while accessing shared or dependent resources, due to lock
contention. This effect becomes more noticeable as the number of processors
increases. There are cases where a roughly 45% increase in processor
transistors has translated to roughly a 10-20% increase in processing power.
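This sub-linear scaling is often modeled with Amdahl's law, a standard model named here for illustration rather than taken from this report; the serial fraction stands in for effects such as lock contention:

    # Amdahl's law sketch: speedup limited by the serial fraction of a
    # workload (e.g., time spent contending for shared resources).
    def speedup(cores, serial_fraction):
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

    # With 10% of the work serialized, even many cores give modest gains:
    for n in (2, 4, 8, 16):
        print(f"{n:2d} cores -> {speedup(n, 0.10):.2f}x speedup")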
On the other hand, processor manufacturers are taking advantage of the 'extra
space' that transistor shrinkage provides to add specialized processing
units to deal with features such as graphics, video and cryptography. For one
example, Intel's Parallel JavaScript extension not only adds support for
multiple cores, but also for the other non-general processing features of their
chips, as part of the migration in client side scripting towards HTML5.
A negative implication of Moore's law is obsolescence, that is, as
technologies continue to rapidly "improve", these improvements can be
significant enough to rapidly render predecessor technologies obsolete. In
situations in which security and survivability of hardware or data are
paramount, or in which resources are limited, rapid obsolescence can pose
obstacles to smooth or continued operations. Because of the toxic materials
used in the production of modern computers, obsolescence, if not properly
managed, can lead to harmful environmental impacts.
Moore's law has significantly impacted the performance of other
technologies: Michael S. Malone wrote of a Moore's War following the
apparent success of shock and awe in the early days of the Iraq War.
Progress in the development of guided weapons depends on electronic
technology. Improvements in circuit density and low-power operation
associated with Moore's law have also contributed to the development of
Star Trek-like technologies including mobile phones and replicator-like 3D
printing.
CHAPTER-2
GENERATIONS
8086
Introduced June 8, 1978
Number of transistors 29,000 at 3 µm
The first x86 CPU.
Later renamed the iAPX 86
8088
Introduced June 1, 1979
External bus width 8 bits data, 20 bits address
Number of transistors 29,000 at 3 µm
Addressable memory 1 megabyte
80186
Introduced 1982
Number of transistors ~78,999 at 2 µm
80286
Introduced February 2, 1982
Number of transistors 134,000 at 1.5 µm
80386DX
Introduced October 17, 1985
Number of transistors 275,000 at 1 µm
80386SX
Introduced June 16, 1988
Number of transistors 275,000 at 1 µm
Later renamed the Intel386 SX
80386SL
Introduced October 15, 1990
Number of transistors 855,000 at 1 µm
80486DX
Introduced April 10, 1989
Number of transistors 1.2 million at 1 µm; the 50 MHz version was at 0.8 µm
80486SX
Introduced April 22, 1991
Number of transistors 1.185 million at 1 µm and 900,000 at 0.8 µm
80486SL
Introduced November 9, 1992
Number of transistors 1.4 million at 0.8 µm
80486DX4
Introduced March 7, 1994
Number of transistors 1.6 million at 0.6 µm
P5 0.8 µm process technology
Introduced March 22, 1993
Number of transistors 3.1 million
The only Pentium running on 5 volts
P54 0.6 µm process technology
Number of transistors 3.2 million
Introduced October 10, 1994
P54CQS 0.35 µm process technology
Number of transistors 3.2 million
Introduced March 27, 1995
P54CS 0.35 µm process technology
Number of transistors 3.3 million
Introduced June 12, 1995
Pentium 4F
Prescott-2M built on 0.09 µm (90 nm) process technology
Introduced February 20, 2005
Cedar Mill built on 0.065 µm (65 nm) process technology
Introduced January 16, 2006
Pentium D
Smithfield 90 nm process technology (2.66-3.2 GHz)
Introduced May 26, 2005
Presler 65 nm process technology (2.8-3.6 GHz)
Introduced January 16, 2006
Pentium D 945
Smithfield 90 nm process technology (3.2 GHz)
Presler 65 nm process technology (3.46, 3.73 GHz)
Xeon
Dempsey 65 nm process (2.67-3.73 GHz)
Introduced May 23, 2006
Intel Core 2
Conroe 65 nm process technology
Desktop CPU (SMP support restricted to 2 CPUs)
Two cores on one die
Introduced July 27, 2006
SSSE3 SIMD instructions
Number of transistors: 291 million
Pentium Dual-Core
Allendale 65 nm process technology
Desktop CPU (SMP support restricted to 2 CPUs)
Two cores on one die
Introduced January 21, 2007
SSSE3 SIMD instructions
Number of transistors 167 million
Intel Pentium
Clarkdale 32 nm process technology
Introduced January 2010
Core i3
Clarkdale 32 nm process technology
Introduced January, 2010
Core i5
Lynnfield 45 nm process technology
4 physical cores
Introduced January, 2010
Core i7
Bloomfield 45 nm process technology
4 physical cores
781 million transistors
Introduced November 17, 2008
Celeron
Sandy Bridge 32 nm process technology
2 physical cores/2 threads (500 series), 1 physical core/1 thread (model
G440) or 1 physical core/2 threads (models G460 & G465)
2 MB L3 cache (500 series), 1 MB (model G440) or 1.5 MB (models G460
& G465)
Introduced 3rd quarter, 2011
Pentium
Sandy Bridge 32 nm process technology
2 physical cores/2 threads
624 million transistors
Introduced May, 2011
Core i3
Sandy Bridge 32 nm process technology
2 physical cores/4 threads
624 million transistors
Introduced January, 2011
Core i3
Ivy Bridge 22 nm Tri-gate transistor process technology
2 physical cores/4 threads
32+32 KB (per core) L1 cache
Core i5
Sandy Bridge 32 nm process technology
4 physical cores/4 threads (except for i5-2390T which has 2 physical cores/4
threads)
995 million transistors
Introduced January, 2011
Core i7
Sandy Bridge 32 nm process technology
4 physical cores/8 threads
995 million transistors
Introduced January, 2011
Sandy Bridge-E 32 nm process technology
2,270 million transistors
Introduced November, 2011
Core i7 Haswell
Haswell 22 nm Tri-gate process technology
Coming Processors
8,000+ million transistors
To be introduced in 2015
8-4 nm carbon nanotube process technology
CHAPTER-3
45 nm PROCESS
Per the International Technology Roadmap for Semiconductors, the 45
nanometer (45 nm) technology node should refer to the average half-pitch of
a memory cell manufactured at around the 2007-2008 time frame.
Matsushita and Intel started mass-producing 45 nm chips in late 2007, and
AMD started production of 45 nm chips in late 2008, while IBM, Infineon,
Samsung, and Chartered Semiconductor have already completed a common
45 nm process platform. At the end of 2008, SMIC was the first China-based
semiconductor company to move to 45 nm, having licensed the bulk 45 nm
process from IBM.
Many critical feature sizes are smaller than the wavelength of light used for
lithography (i.e., 193 nm and 248 nm). A variety of techniques, such as
larger lenses, are used to make sub-wavelength features. Double patterning
has also been introduced to assist in shrinking distances between features,
especially if dry lithography is used. It is expected that more layers will be
patterned with 193 nm wavelength at the 45 nm node. Moving previously
loose layers (such as Metal 4 and Metal 5) from 248 nm to 193 nm
wavelength is expected to continue, which will likely further drive costs
upward, due to difficulties with 193 nm photoresists.
High-k dielectrics
Chipmakers initially voiced concerns about introducing new high-k
materials into the gate stack, for the purpose of reducing leakage current
density. As of 2007, however, both IBM and Intel have announced that they
have high-k dielectric and metal gate solutions, which Intel considers to be a
fundamental change in transistor design. NEC has also put high-k materials
into production.
Key Points
At IEDM 2007, more technical details of Intel's 45 nm process were
revealed.
Since immersion lithography is not used here, the lithographic patterning is
more difficult. Hence many lines have been lengthened rather than
shortened. A more time-consuming double patterning method is used
explicitly for this 45 nm process, resulting in potentially higher risk of
product delays than before. Also, the use of high-k dielectrics is introduced
for the first time, to address gate leakage issues. For the 32 nm node,
immersion lithography will begin to be used by Intel.
1. 160 nm gate pitch (73% of 65 nm generation)
2. 200 nm isolation pitch (91% of 65 nm generation) indicating a
slowing of scaling of isolation distance between transistors
3. Extensive use of dummy copper metal and dummy gates
4. 35 nm gate length (same as 65 nm generation)
5. 1 nm equivalent oxide thickness, with 0.7 nm transition layer
6. Gate-last process using dummy polysilicon and damascene metal gate
7. Squaring of gate ends using a second photoresist coating
8. 9 layers of carbon-doped oxide and Cu interconnect, the last being a
thick "redistribution" layer
9. Contacts shaped more like rectangles than circles for local
interconnection
10. Lead-free packaging
11. 1.36 mA/µm nFET drive current
12. 1.07 mA/µm pFET drive current, 51% faster than 65 nm generation,
with higher hole mobility due to an increase from 23% to 30% Ge in
embedded SiGe stressors
In a recent Chipworks reverse-engineering, it was disclosed that the trench
contacts were formed as a "Metal-0" layer in tungsten serving as a local
interconnect. Most trench contacts were short lines oriented parallel to the
gates covering diffusion, while gate contacts were even shorter lines
oriented perpendicular to the gates.
It was recently revealed that both the Nehalem and Atom microprocessors
used SRAM cells containing eight transistors instead of the conventional six,
in order to better accommodate voltage scaling. This resulted in an area
penalty of over 30%.
Process
Two key process features that are used to make 45nm generation metal gate
+ high-k gate dielectric CMOS transistors are highlighted here. The
first feature is the integration of stress-enhancement techniques with the dual
metal-gate + high-k transistors. The second feature is the extension of
193nm dry lithography to the 45nm technology node pitches. Use of these
features has enabled industry-leading transistor performance and the first
high volume 45nm high-k + metal gate technology.
High-k + metal gate transistors have been incorporated into our 45nm logic
technology to provide improved performance and significantly reduced gate
leakage. High-k + metal gates have also been shown to have improved
variability at the 45nm node. The transistors in this work feature 1.0nm EOT
high-k gate dielectrics with dual workfunction metal gate electrodes and
35nm gate lengths. The addition of new gate materials is complicated by the
need to mesh the process requirements of the metal gate process with the
uniaxial strain-inducing components that have become central to the
transistor architecture. The resultant process flow needs to ensure that the
performance benefits of both elements are fully realized.
The standard scaling requirements for the strained silicon components and
for the gate and contact pitches also need to be addressed at the 45nm node.
Using 193nm dry lithography for critical layers at the 45nm technology node
is preferred over moving to 193nm immersion lithography due to lower cost
and greater maturity of the toolset. In order to achieve the 160nm gate and
contact pitch requirements, unique gate and contact patterning process flows
have been implemented.
Strain + Metal Gate: Key process considerations/results
The most commonly used techniques for implementing strain in the
transistors include embedded SiGe in the PMOS S/D, stress memorization
for the NMOS and a nitride stress capping layer for NMOS and PMOS
devices. The two common methods for introducing a metal gate to the
standard CMOS flow include either a gate-first or a gate-last process. Most
comparisons of these two process flows focus on the ability to select the
appropriate workfunction metals, the ease of integration or the ability to
scale, but typically fail to comprehend the interaction with the
strain-inducing techniques.
[Figure: PMOS transistor stress before and after a 50% increase to 1.2 GPa;
red sections show the highest stress areas (~1.2 GPa) and black areas the
lowest (~0.8 GPa).]
The Ge concentration of the SiGe stressors was increased from 22% in our
65nm technology to 30% in 45nm. The combined impact of the increased
Ge fraction and the strain enhancement from the gate last process allow for
1.5x higher hole mobility compared to 65nm despite the scaling of the
transistor pitch from 220nm to 160nm.
Two methods of stress enhancement have been employed on the NMOS in
this technology. First, the loss of the nitride stress layer benefit due to
scaling the pitch from 65nm has been overcome by the introduction of
trench contacts and tailoring the contact fill material to induce a tensile
stress in the channel. The NMOS response to tensile vs. compressive contact
fill materials is shown in figure.
The trench contact fill material impact on the PMOS device is mitigated by
use of the raised S/D inherent to the embedded SiGe S/D process. The S/D
component of stress memorization is compatible with the gate-last flow
while the poly gate component would be compromised. The poly gate
component is replaced by Metal Gate Stress (MGS): modifying the metal-gate
fill material to directly induce stress in the channel. By introducing a
compressive stress gate fill material the performance of the NMOS device is
enhanced and additive to the contact fill technique.
By use of a dual-metal process with PMOS 1st, the stress of the NMOS gate
is decoupled from the PMOS gate through optimization of the PMOS gate
stack to buffer the stress. Through the strain enhancement and elimination of
poly depletion both the saturation and linear drive currents improved.
The table in the figure breaks out the RO gains among Idsat, Idlin and the
gate and junction capacitances.
193nm Dry Patterning @ 45nm.
The gate patterning process uses a double patterning scheme. Initially the
gate stack is deposited including the polysilicon and hardmask deposition.
The first lithography step patterns a series of parallel, continuous lines. Only
discrete pitches are allowed, with the smallest at 160nm. A second masking
step is then used to define the cuts in the lines. The 2-step process enables
abrupt poly endcap regions, allowing tight CTG design rules.
CHAPTER-4
32 nm PROCESS
The 32 nanometer (32 nm) node is the step following the 45 nanometer
process in CMOS semiconductor device fabrication. "32 nanometer" refers
to the average half-pitch (i.e., half the distance between identical features) of
a memory cell at this technology level. Intel and AMD both produced
commercial microchips using the 32 nanometer process in the early 2010s.
IBM and the Common Platform also developed a 32 nm high-k metal gate
process. Intel began selling its first 32 nm processors using the Westmere
architecture on 7 January 2010. The 32 nm process was superseded by
commercial 22 nm technology in 2012.
Technology demos
Prototypes using 32 nm technology first emerged in the mid-2000s. In 2004,
IBM demonstrated a 0.143 µm² SRAM cell with a poly gate pitch of 135
nm, produced using electron-beam lithography and photolithography on the
same layer. It was observed that the cell's sensitivity to input voltage
fluctuations degraded significantly at such a small scale.
CHAPTER-5
CARBON NANO TUBES
Introduction
The Amazing and Versatile Carbon: Chemical Basis for Life
With an atomic number of 6, carbon is the 4th most abundant
element in the Universe by mass, after hydrogen, helium and
oxygen. It forms more compounds than any other element, with
almost 10 million pure organic compounds. This abundance, together
with the unique diversity of organic compounds and their unusual
polymer-forming ability at the temperatures commonly
encountered on Earth, makes the element the chemical basis of all
known life.
Carbon Nanotubes
Batteries
Electronic Heterogeneity
One problem with nanotube production for electronics is that batches of
nanotubes are heterogeneous mixtures of metallic and semiconducting tube
types. Electrical devices typically require these types to be separated, but so
far it has been difficult to tune production in this regard. There also remain
issues with doping, or tuning conductivity, and electrical behavior at contact
points. (Rogers, UIUC).
Orientation
A problem with nanotubes where recent progress has been made is
controlling their orientation. Nanotubes are commonly grown in a chaotic
organization (affectionately known as a "rat's nest"), which is difficult to
use in microprocessors. Recently John Rogers and his team at the University
of Illinois at Urbana-Champaign discovered that carefully growing nanotubes
on quartz wafers can lead to a highly organized configuration.
Size and Density
The size of manufactured nanotubes typically varies widely. For commercial
use, nanotube manufacturers will need to make size more consistent. Though
nanotubes are very narrow, nanotube matrices typically have quite large
(~100 nm) spacing between tubes.
Export Policy
As with other multi-use technologies, nanotubes may be subject to export
controls. Finding information about this was difficult; multiple sources
were unaware of restrictions, but in at least one case a foreign researcher
was denied access to nanotubes. Adelina Santos, a Brazilian nuclear
scientist, says a U.S.-based supplier refused to ship her nanotubes due to
federal regulations. However, restrictions seem difficult to enforce; Santos
had a friend smuggle a gram of nanotubes to her. Current customs protocols
probably do not place a priority on detecting nanotubes.
Export restrictions could slow adoption of nanotube technologies and
prevent standardization. Regulation of commerce in nanotube technology
will increase costs.
Environmental concerns
The environmental risks of nanotubes are still unclear. Naturally
occurring carbon is fairly benign, and is largely unregulated, but nanotubes
interact with the environment differently. There have been several studies
performed to test the effects of carbon nanotubes on living systems.
1. Fruit fly larvae fed a diet containing nanotubes appeared to develop
normally.
embedded in the holes and start to form nanotubes that are "templated"
from the shape of the tunnel. It turns out that the carbon nanotubes grow
very long and very well aligned, in the angle of the tunnel.
The advantages of this method are that the yield is very high, the alignment
of the nanotubes is consistent (which is crucial for creating particular types
of nanotubes, e.g. semiconductor or metallic), and the size of the growth
area is theoretically arbitrary.
The main disadvantage is that, though the size of the growth area is basically
arbitrary, large sized areas (several millimeters) tend to crack, shrink, and
otherwise warp. The substrates need to be dried very thoroughly to prevent
this.
n-hexane Pyrolysis
Researchers developed a method to synthesize large, long single walled
nanotube bundles in a vertical furnace by pyrolyzing hexane molecules.
These n-hexane molecules are mixed with certain other chemicals that have
been shown independently to help with growth of nanotubes. These are
burned (pyrolyzed) at a very high temperature in a flow of hydrogen and
other optional gases. According to the paper, using a different hydrocarbon
or using a different gas prevented the formation of long nanotubes.
The primary advantage of this method is that it produces macroscopic
nanotube bundles ("microtubes"): their diameters are typically larger than
that of a human hair, and their length is several centimeters. The
disadvantage is that the alignment is not as good as that produced by other
methods, making it viable for creating "microcables", but not nanotubes with
precise electrical properties.
CHAPTER-6
22 nm PROCESS
The 22 nanometer (22 nm) node is the process step following 32 nm in CMOS
semiconductor device fabrication. The typical half-pitch (i.e., half the
distance between identical features in an array) for a memory cell using the
process is around 22 nm. It was first introduced by semiconductor
companies in 2008 for use in memory products, while first consumer-level
CPU deliveries started in April 2012.
Microarchitecture
Aspects of microarchitecture
[Figure: Intel 80286 microarchitecture]
The pipelined datapath is the most commonly used datapath design in
microarchitecture today. This technique is used in most modern
microprocessors, microcontrollers, and DSPs. The pipelined architecture
allows multiple instructions to overlap in execution, much like an assembly
line. The pipeline includes several different stages which are fundamental in
microarchitecture designs. Some of these stages include instruction fetch,
instruction decode, execute, and write back. Some architectures include
other stages such as memory access. The design of pipelines is one of the
central microarchitectural tasks.
Execution units are also essential to microarchitecture. Execution units
include arithmetic logic units (ALU), floating point units (FPU), load/store
units, branch prediction, and SIMD. These units perform the operations or
calculations of the processor. The choice of the number of execution units,
their latency and throughput is a central microarchitectural design task. The
size, latency, throughput and connectivity of memories within the system are
also microarchitectural decisions.
Microarchitectural design considerations include:
Chip area/cost
Power consumption
Logic complexity
Ease of connectivity
Manufacturability
Ease of debugging
Testability
Microarchitectural concepts
Instruction cycle
Memory has historically been slower than the processor itself. Step (2),
reading from memory, often introduces a lengthy (in
CPU terms) delay while the data arrives over the computer bus. A
considerable amount of research has been put into designs that avoid these
delays as much as possible. Over the years, a central goal was to execute
more instructions in parallel, thus increasing the effective execution speed of
a program. These efforts introduced complicated logic and circuit structures.
Initially, these techniques could only be implemented on expensive
mainframes or supercomputers due to the amount of circuitry needed for
these techniques. As semiconductor manufacturing progressed, more and
more of these techniques could be implemented on a single semiconductor
chip. See Moore's law.
Instruction pipelining
One of the first, and most powerful, techniques to improve performance is
the use of the instruction pipeline. Early processor designs would carry out
all of the steps above for one instruction before moving onto the next. Large
portions of the circuitry were left idle at any one step; for instance, the
instruction decoding circuitry would be idle during execution and so on.
Pipelines improve performance by allowing a number of instructions to
work their way through the processor at the same time. In the same basic
example, the processor would start to decode (step 1) a new instruction
while the last one was waiting for results. This would allow up to four
instructions to be "in flight" at one time, making the processor look four
times as fast. Although any one instruction takes just as long to complete
(there are still four steps) the CPU as a whole "retires" instructions much
faster.
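A toy timing model makes the latency/throughput distinction concrete; the four equal one-cycle stages are an idealized assumption, not any real CPU's timing:

    # Toy 4-stage pipeline: fetch, decode, execute, write-back.
    # Per-instruction latency stays 4 cycles, but once the pipeline fills,
    # one instruction completes every cycle.
    STAGES = 4

    def unpipelined_cycles(n_instructions):
        return n_instructions * STAGES        # one instruction at a time

    def pipelined_cycles(n_instructions):
        return STAGES + (n_instructions - 1)  # fill once, then 1 per cycle

    n = 100
    print(f"unpipelined: {unpipelined_cycles(n)} cycles")  # 400
    print(f"pipelined:   {pipelined_cycles(n)} cycles")    # 103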
RISC designs make pipelines smaller and much easier to construct by cleanly
separating each stage of the instruction process and making them take the
same amount of time: one cycle. The processor as a whole operates in an
assembly line fashion, with instructions coming in one side and results out
the other. Due to the reduced complexity of the Classic RISC pipeline, the
pipelined core and an instruction cache could be placed on the same size die
that would otherwise fit the core alone on a CISC design. This was the real
reason that RISC was faster. Early designs like the SPARC and MIPS often
ran over 10 times as fast as Intel and Motorola CISC solutions at the same
clock speed and price.
Pipelines are by no means limited to RISC designs. By 1986 the
top-of-the-line VAX implementation (VAX 8800) was a heavily pipelined design,
slightly predating the first commercial MIPS and SPARC designs. Most
modern CPUs (even embedded CPUs) are now pipelined, and microcoded
CPUs with no pipelining are seen only in the most area-constrained
embedded processors. Large CISC machines, from the VAX 8800 to the
modern Pentium 4 and Athlon, are implemented with both microcode and
pipelines. Improvements in pipelining and caching are the two major
microarchitectural advances that have enabled processor performance to
keep pace with the circuit technology on which they are based.
Cache
It was not long before improvements in chip manufacturing allowed for even
more circuitry to be placed on the die, and designers started looking for
ways to use it. One of the most common was to add an ever-increasing
amount of on-die cache memory.
Branch prediction
One barrier to achieving higher performance through instruction-level
parallelism stems from pipeline stalls and flushes due to branches. Normally,
whether a conditional branch will be taken isn't known until late in the
pipeline as conditional branches depend on results coming from a register.
From the time that the processor's instruction decoder has figured out that it
has encountered a conditional branch instruction to the time that the deciding
register value can be read out, the pipeline needs to be stalled for several
cycles, or if it's not and the branch is taken, the pipeline needs to be flushed.
As clock speeds increase the depth of the pipeline increases with it, and
some modern processors may have 20 stages or more. On average, every
fifth instruction executed is a branch, so without any intervention, that's a
high amount of stalling.
Techniques such as branch prediction and speculative execution are used to
lessen these branch penalties. Branch prediction is where the hardware
makes educated guesses on whether a particular branch will be taken. In
reality one side or the other of the branch will be called much more often
than the other. Modern designs have rather complex statistical prediction
systems, which watch the results of past branches to predict the future with
greater accuracy. The guess allows the hardware to prefetch instructions
without waiting for the register read. Speculative execution is a further
enhancement in which the code along the predicted path is not just
prefetched but also executed before it is known whether the branch should
be taken or not. This can yield better performance when the guess is good,
with the risk of a huge penalty when the guess is bad because instructions
need to be undone.
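A classic textbook scheme, the two-bit saturating counter, illustrates the "educated guess": it tolerates a single surprise (such as a loop exit) without flipping its prediction. This is a generic sketch, not Intel's actual predictor:

    # Two-bit saturating counter branch predictor.
    # States 0-1 predict not-taken; states 2-3 predict taken.
    class TwoBitPredictor:
        def __init__(self):
            self.state = 2  # start weakly taken

        def predict(self):
            return self.state >= 2

        def update(self, taken):
            # Move toward the observed outcome, saturating at 0 and 3.
            self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

    p = TwoBitPredictor()
    outcomes = [True] * 8 + [False] + [True] * 8  # a loop branch with one exit
    hits = 0
    for taken in outcomes:
        hits += (p.predict() == taken)
        p.update(taken)
    print(f"accuracy: {hits}/{len(outcomes)}")  # mispredicts only the loop exit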
Superscalar
Even with all of the added complexity and gates needed to support the
concepts outlined above, improvements in semiconductor manufacturing
soon allowed even more logic gates to be used.
In the outline above the processor processes parts of a single instruction at a
time. Computer programs could be executed faster if multiple instructions
were processed simultaneously. This is what superscalar processors achieve,
by replicating functional units such as ALUs. The replication of functional
units was only made possible when the die area of a single-issue processor
no longer stretched the limits of what could be reliably manufactured. By the
late 1980s, superscalar designs started to enter the marketplace.
In modern designs it is common to find two load units, one store (many
instructions have no results to store), two or more integer math units, two or
more floating point units, and often a SIMD unit of some sort. The
instruction issue logic grows in complexity by reading in a huge list of
instructions from memory and handing them off to the different execution
units that are idle at that point. The results are then collected and re-ordered
at the end.
Out-of-order execution
The addition of caches reduces the frequency or duration of stalls due to
waiting for data to be fetched from the memory hierarchy, but does not get
rid of these stalls entirely. In early designs a cache miss would force the
cache controller to stall the processor and wait. Of course there may be some
other instruction in the program whose data is available in the cache at that
point; out-of-order execution allows such ready instructions to proceed while
an earlier instruction waits, with the results re-ordered afterward so the
program appears to have run in order.
Register renaming
Register renaming refers to a technique used to avoid unnecessary serialized
execution of program instructions because of the reuse of the same registers
by those instructions. Suppose we have two groups of instructions that will
use the same register. One set of instructions is executed first to leave the
register to the other set, but if the other set is assigned to a different,
similar register, both sets of instructions can be executed in parallel (or
in series).
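A minimal sketch of the idea: each write to an architectural register gets a fresh physical register, so the reuse dependency disappears (illustrative only, not a real renamer):

    # Register renaming sketch: writes to the same architectural register
    # are mapped to distinct physical registers, removing false dependencies.
    def rename(instructions):
        mapping, next_phys, renamed = {}, 0, []
        for dest, srcs in instructions:
            srcs = [mapping.get(s, s) for s in srcs]  # read current mappings
            mapping[dest] = f"p{next_phys}"           # fresh physical register
            next_phys += 1
            renamed.append((mapping[dest], srcs))
        return renamed

    # Two instruction groups both write r1; after renaming they use p0 and
    # p1 and can execute in parallel.
    prog = [("r1", ["a", "b"]), ("r1", ["c", "d"])]
    print(rename(prog))  # [('p0', ['a', 'b']), ('p1', ['c', 'd'])]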
References:
[1] http://www.intel.com : Photos and process details.
[2] http://www.wikipedia.com : Definitions.