Device Technology Innovation For Exascale Computing

1-3 (Invited)

Tze-chiang (T.C.) Chen
IBM Fellow, VP of Science and Technology

IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
chentz@us.ibm.com
Abstract
For the past 40 years, the scaling of CMOS device technology has
enabled system performance to double every two years. However,
emerging classes of applications for which network-speed
processing and data-intensive modeling are integral components will
demand a much faster rate of improvement, such as 2x/year in order
to reach exaflop capabilities (100x-1000x over present systems) by
the end of the next decade. These applications represent a significant
growth opportunity and require continued innovation in silicon
device scaling to increase intrinsic transistor performance/power and
density. In addition, new system architectures will take advantage of
3D chip technology to enable a higher level of hybrid integration,
new memory technology such as Phase Change Memory (PCM) will
allow implementation of a new level of memory architecture, and
silicon photonics on the processor will meet ultra-low power, low
cost and high density communications needs. These and other
innovations will lead to significant improvement in systems
integration, performance, and power efficiency.
Introduction
The continuous and systematic increase in transistor density and
performance, as described in Moore's Law [1] and guided by
CMOS scaling theory [2], has been remarkably successful in the last
four decades. The IT industry enjoyed system-level performance
increases of 2X every two years from the scaling of devices.
Innovations in the areas of channel mobility increase from stress and
high-k/metal gate (HKMG) continue to play a vital role in device
scaling. However, as technology is approaching the 22nm node and
beyond, silicon scaling is slowing down due to several fundamental
limitations in the transistor structure. Despite the slowdown in
device performance enhancement, system-level performance
demand is continuing to increase and is expected to accelerate to 4X
every two years (Figure 1). To meet this increased demand for
system-level performance, new architectures will emerge:
solution-optimized systems, which are an aggregation of
compute-intensive, network-optimized, mainstream commercial,
application-specific appliances, and storage sub-systems. These new
architectures that will lead to exascale computing will be enabled by
new technology innovations. Examples of these new technologies
and the architectural issues that they address include:
Phase Change Memory (PCM) which addresses issues with memory
capacity and power consumption; Three Dimensional (3D)
integration which addresses issues with memory bandwidth, latency
and capacity, and modular/heterogeneous integration; and Optical
Interconnects (OI) which addresses issues with communication data
rates and power. A combination of innovations in device scaling
and newer technology elements necessitated by system architecture
changes will help accelerate system-level performance enhancement.
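The gap between the historic and the projected growth rates can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the 10-year horizon are assumptions chosen to match the "+10 yrs" time frame of Figure 1, and the rates are treated as simple compound growth.

```python
def cumulative_improvement(factor: float, period_years: float, horizon_years: float) -> float:
    """Compound a `factor`-per-`period_years` growth rate over `horizon_years`."""
    return factor ** (horizon_years / period_years)


HORIZON_YEARS = 10  # assumed planning horizon ("+10 yrs")

historic = cumulative_improvement(2, 2, HORIZON_YEARS)     # 2x every two years
accelerated = cumulative_improvement(4, 2, HORIZON_YEARS)  # 4x every two years

print(f"historic trend:    {historic:.0f}x over {HORIZON_YEARS} years")
print(f"accelerated trend: {accelerated:.0f}x over {HORIZON_YEARS} years")
print(f"gap between the two trends: {accelerated / historic:.0f}x")
```

Under these assumptions the historic trend compounds to about 32x in a decade, while 4X every two years compounds to about 1024x, which is the order of the 1000x improvement needed for exascale systems.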
Silicon CMOS Scaling: Continued Device Improvement
The history of CMOS development is marked by a series of
challenges which have been overcome by ingenuity and hard work.
Scaling of silicon transistor technology progressed relentlessly until
the 90nm node, as governed by the scaling rules proposed by
Dennard et al. four decades ago. The basic concept of scaling is to
reduce the physical dimensions as well as the power supply voltage,
Vdd, by a scaling factor. Miniaturization of devices following these
scaling rules continued for many generations while increasing
intrinsic performance and reducing power per circuit. In doing so,
scaling of the transistor gate dielectric and the channel length was at
the heart of this process, enabled by continual advances in
semiconductor processing technology. However, as the gate
dielectric approaches the atomistic and quantum-mechanical limit of
one nanometer in physical thickness, the inability to further reduce
gate insulator thickness has prevented channel length scaling,
leading to a crisis in the control of common device phenomena such
as static leakage and short-channel effects. Inevitably, this inability
to scale led to continued growth of active and static power,
making chip power and power density a major challenge.
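The scaling rules described above can be sketched to first order as follows. This is an illustrative sketch of idealized constant-field scaling in the spirit of [2], assuming a simple square-law drain current and ignoring leakage; the function name and dictionary keys are hypothetical.

```python
def constant_field_scaling(k: float) -> dict:
    """Relative first-order changes when dimensions and Vdd shrink by 1/k."""
    dimension = 1.0 / k             # L, W and t_ox all shrink by 1/k
    voltage = 1.0 / k               # Vdd shrinks by 1/k
    area = dimension ** 2           # circuit area ~ L * W
    capacitance = area / dimension  # C ~ area / t_ox
    # Assumed square-law current: I ~ (W / L) * (1 / t_ox) * V^2
    current = (dimension / dimension) * (1.0 / dimension) * voltage ** 2
    delay = capacitance * voltage / current  # gate delay ~ C * V / I
    power = voltage * current                # power per circuit ~ V * I
    return {
        "delay": delay,
        "power_per_circuit": power,
        "power_density": power / area,
    }


for k in (1.0, 1.4, 2.0):
    print(k, constant_field_scaling(k))
```

In this idealized model, delay falls as 1/k and power per circuit as 1/k², so power density stays constant. Once the gate dielectric and Vdd can no longer shrink with the other dimensions, that balance is lost, which is the power challenge described above.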
Mobility enhancement techniques such as stress
engineering have been attractive for performance enhancements
beyond those derived from device scaling alone. The electron and
hole mobility in silicon can be modulated by strain in the channel.
The stress can be compressive or tensile, and it can be biaxial or
uniaxial, either parallel or perpendicular to the direction of current
flow. This allows for many combinations, several of which have
been realized since 90nm technology to improve device performance.
Dual stress liner (DSL), with compressive stress for p-FET and
tensile stress for n-FET, has been implemented in IBM's 90nm
technology node [3] as shown in Figure 2. Local strain can also be
applied to the channel through the Stress Memorization Technique
(SMT) [4]. Epitaxially-grown strained SiGe (e-SiGe) can be
embedded in the p-FET S/D regions to induce uniaxial compressive
strain in the channel. This structure can produce significant p-FET
hole mobility improvement [5]. Most of the stress elements,
however, are geometry dependent and do not scale well with
shrinking dimension [6]. In addition, the geometry dependence of
the stress effect necessitates accurate modeling to capture layout
dependent stress effects on device performance. Though challenging
to scale and complex to model, Si channel stress modulation to
increase transistor performance has found a permanent space in
silicon technology, and any future change in device structure is
expected to incorporate channel stress.
Though semiconductor technology was able to achieve
device performance gains without gate length scaling for a few
generations, gate length scaling restarted with the
introduction of high-k/metal gate technology at the 45nm node. The
key challenge was to achieve the desired electrical properties while
maintaining compatibility with standard CMOS fabrication
processes. Serious challenges in materials and engineering were
overcome by innovations ranging from the choice of interface layer,
high-k film composition/deposition techniques, metal gate material
composition/deposition technique, work-function setting materials
and integration in conventional semiconductor process while
meeting or exceeding CMOS technology requirements. Figure 3
depicts a HRTEM cross-section and EELS analysis of the n-FET
band-edge HKMG transistor showing that La-containing films have
intermixed with the SiOx/HfO2 dielectrics to provide the dipole-induced
shift while maintaining structural integrity. The addition of
these dipole-shifting layers did not significantly affect the other key
parameters of the devices and this was part of the final piece of the
puzzle that allowed CMOS devices to simultaneously exhibit
acceptable mobility, Tinv, reliability, and threshold voltage [7].
Figure 4 shows a cross section of a fully processed HKMG transistor.
The AC performance benefit of HKMG (Figure 5) is realized
as a 40% ring-oscillator (RO) delay improvement (at fixed leakage) over
a conventional 45nm SiON/poly-Si process [8], which is a direct
reflection of increased drive current due to Tinv scaling and reduced
capacitance penalty due to Lgate scaling. Next generation HKMG
processes showing further reduction in Tinv targeted for 22nm are
also being reported [9].
As gate length scaling continues to dominate the
requirements for newer technology generations, it may become very
challenging for the high-k/metal gate process alone to provide
sufficient short-channel control for the 15nm technology node and
beyond. Device structures such as the FinFET shown in Figure 6 and
ultra-thin-body SOI provide improved device electrostatics and offer
additional scaling of gate length [10]. A significantly improved
drain-induced barrier lowering (DIBL) characteristic can also be
achieved in a FinFET device, making it superior to a planar device for
the same high-k/metal gate process.
The three dimensional nature of the fin changes the behavior of
typical device parasitics Rext (external resistance) and Cgd (gate to
drain capacitance). Careful engineering is required to minimize this
external resistance and stray capacitance. These alternate device
structures are strong contenders for replacing conventional planar
device structures by the 15nm technology node.
A nanowire FET with a wrapped-around gate conductor
provides the ultimate control over the channel. The coaxial-like
geometry also opens a path to a less aggressive scaling of the gate
dielectric due to the logarithmic dependency of the gate capacitance
on the gate dielectric thickness. Strain is easily coupled into the
channel since the nanowire diameter is of comparable dimension to
the gate dielectric and gate conductor. Both longitudinal and radial
strain need to be considered. Mobility enhancement for n-FETs
and mobility reduction for p-FETs have been demonstrated for
nanowire channels with a constant height and a variable width [11].
Reducing variability in these devices is a key challenge in making
nanowire FETs a viable technology. Figure 7 shows a TEM cross-section
parallel to the FET gate line in nanowires. Although
nanowire sidewalls comprise several crystallographic planes, the
measured Id-Vg characteristics exhibit a low density of interface
states with subthreshold slopes ranging between 63-75 mV/dec. The
carrier transport as a function of the channel diameter has also been
investigated for nanowires fabricated with bottom-up and top-down
methods [12,13].
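The logarithmic dependence of gate capacitance on dielectric thickness mentioned above can be illustrated by comparing an ideal parallel-plate gate with an ideal coaxial gate. This is a minimal sketch using textbook electrostatics; the nanowire radius, dielectric thicknesses, relative permittivity, and function names are assumptions chosen for illustration, not values from this paper.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def planar_cap_per_area(eps_r: float, t_ox: float) -> float:
    """Parallel-plate gate capacitance per unit area (F/m^2)."""
    return eps_r * EPS0 / t_ox


def coaxial_cap_per_length(eps_r: float, radius: float, t_ox: float) -> float:
    """Wrap-around gate capacitance per unit length (F/m) of an ideal coax."""
    return 2 * math.pi * eps_r * EPS0 / math.log(1 + t_ox / radius)


EPS_R = 3.9              # assumed SiO2-like dielectric
RADIUS = 2e-9            # assumed nanowire radius (m)
T_THIN, T_THICK = 2e-9, 4e-9  # assumed dielectric thicknesses (m)

planar_ratio = planar_cap_per_area(EPS_R, T_THICK) / planar_cap_per_area(EPS_R, T_THIN)
coax_ratio = (coaxial_cap_per_length(EPS_R, RADIUS, T_THICK)
              / coaxial_cap_per_length(EPS_R, RADIUS, T_THIN))

print(f"planar capacitance retained when t_ox doubles:  {planar_ratio:.2f}")
print(f"coaxial capacitance retained when t_ox doubles: {coax_ratio:.2f}")
```

With these assumed dimensions, doubling the dielectric thickness halves the planar capacitance but leaves roughly 63% of the coaxial capacitance, which is why the wrapped-around geometry tolerates a thicker gate dielectric.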
Carbon Electronics: Extending beyond Silicon
Graphene is a two-dimensional material comprising a monolayer of
carbon atoms arranged in a honeycomb lattice as shown in Figure
8(a). Since graphene was discovered in 2005 [14,15], its exceptional
electronic properties have attracted much attention. High carrier
mobility up to 200,000 cm2/Vs has been experimentally
demonstrated in suspended graphene [16,17]. Owing to its 2D nature,
ultimate electrostatic short channel control can be expected in
graphene. Graphene has symmetrical energy dispersion for electrons
and holes, so equivalent p-type and n-type FET behaviors are
expected for CMOS type applications. Experiments have already
demonstrated the ability to take advantage of existing lithographical
and patterning techniques to fabricate graphene as shown in Figure
8(b). Perhaps the most interesting and unique point about graphene
is the possibility to have different band gaps on the same graphene
sheet [18,19].
Graphene also provides an opportunity to explore a
previously inaccessible electronic regime: the quantum capacitance
limit. For systems like graphene, the total capacitance will be
determined by a small intrinsic capacitance of the material itself,
the quantum capacitance. Ultimately scaled devices will unavoidably
work in this regime, once the gate oxide capacitance becomes larger
than this intrinsic capacitance. It is believed that operating in the
quantum capacitance limit may actually result in reduced power
consumption. Graphene devices could thus not only operate as fast
switches but could improve the power delay product at the same
time.
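The quantum capacitance limit can be illustrated with the usual series-capacitor picture of a gate stack. The sketch below assumes the gate oxide capacitance and the quantum capacitance simply add in series; the function name and the capacitance ratios are hypothetical and serve only to show the trend.

```python
def total_gate_capacitance(c_ox: float, c_q: float) -> float:
    """Series combination of oxide capacitance and quantum capacitance."""
    return c_ox * c_q / (c_ox + c_q)


C_Q = 1.0  # quantum capacitance, normalized to 1 (assumed)

for ratio in (0.1, 1.0, 10.0, 100.0):
    c_total = total_gate_capacitance(ratio * C_Q, C_Q)
    print(f"C_ox / C_q = {ratio:6.1f}  ->  C_total / C_q = {c_total / C_Q:.3f}")
```

As the oxide capacitance grows past the quantum capacitance, the total saturates at the quantum capacitance, so further dielectric scaling no longer adds switched charge.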
Phase Change Memory: Enabling New Memory Hierarchy
One big problem of exascale computing is the movement of data
among possibly hundreds of millions of processor cores running
massively parallel algorithms. To remove this impediment to
exascale computing, significant improvements in memory device
technologies and memory architectures are necessary. One
emerging memory technology that warrants much attention is Phase
Change Memory (PCM) [22-25]. PCM is a proven technology used
as the storage medium in ubiquitous optical data storage devices.
Figure 9 depicts the exceptional properties of PCM technology as an
electronic memory. PCM stores data as the variable resistance of the
phase change memory element. PCM has been demonstrated to
scale far beyond 22nm [26]. The large dynamic range in the
resistivity of phase change materials positions PCM to provide the
necessary bandwidth and memory capacity to complement main
memory as well as storage class memory.
PCM cells can be configured with a planar MOSFET [27],
PNP bipolar transistor [28], PN diode [24], vertical MOSFET, or
stackable 3D diode as the select device, as shown in Fig. 10. Memory
cell areas range from about 25F² with a planar NMOSFET to 4F² with
a stackable 3D diode. Multiple-bit memory cells [29] increase memory
bandwidth and multiply memory capacity without
the usual expense of power and space. Only one physical memory
cell is accessed to read or store 4 bits, thus reducing the power in
memory access compared to conventional main memory. Progress
in stackable diode PCM [30] will facilitate the integration of
extremely dense monolithic 3-dimensional memories for both
embedded main memory and stand-alone storage class memory
applications. New memory system architectures will have to be
designed to take advantage of the extra density and enhanced
performance.
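The density impact of the select device and of multiple-bit cells can be estimated from the cell areas quoted above. This is a rough sketch: the feature size F of 45 nm is an assumed value for illustration, the function name is hypothetical, and array overhead such as decoders and sense amplifiers is ignored.

```python
def bits_per_um2(feature_nm: float, cell_area_f2: float, bits_per_cell: int = 1) -> float:
    """Raw storage density in bits per square micron for a given cell size."""
    feature_um = feature_nm / 1000.0
    return bits_per_cell / (cell_area_f2 * feature_um ** 2)


FEATURE_NM = 45.0  # assumed feature size F

planar = bits_per_um2(FEATURE_NM, 25)                    # planar MOSFET select device
diode = bits_per_um2(FEATURE_NM, 4)                      # stackable 3D diode select device
diode_4bit = bits_per_um2(FEATURE_NM, 4, bits_per_cell=4)

print(f"25F^2 cell, 1 bit:  {planar:7.1f} bits/um^2")
print(f"4F^2 cell, 1 bit:   {diode:7.1f} bits/um^2 ({diode / planar:.2f}x)")
print(f"4F^2 cell, 4 bits:  {diode_4bit:7.1f} bits/um^2 ({diode_4bit / planar:.0f}x)")
print(f"resistance levels needed for 4 bits per cell: {2 ** 4}")
```

Independent of the assumed F, moving from a 25F² to a 4F² cell gives 6.25x the raw density, and storing 4 bits per cell raises that to 25x per layer, at the cost of resolving 16 resistance levels in each cell.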
Silicon Photonics
Scaling computing systems to exaflops will require a tremendous
increase in communications bandwidth but with greatly reduced
power consumption per communicated bit as compared to today's
petaflop machines. A new class of photonic devices is made of
silicon side-by-side with silicon transistors using the same tools and
processes as CMOS [31]. Substantial progress has been made
recently in demonstrating such ultra-compact micron-scale silicon
nanophotonic devices as wavelength-division multiplexers [32,33],
low-power modulators [34], temperature-insensitive low-latency
switches [35] and photodetectors [36]. These silicon nanophotonic
devices have a strong potential to increase tremendously the IO
capacity for inter- and intra-chip connectivity up to terabit/sec data
rates as well as to significantly lower the overall IO power
consumption below 1mW per gigabit/sec.
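The power target quoted above translates directly into energy per bit. The sketch below is a unit conversion only; the function names are hypothetical, and the 1 terabit/sec link is used as an example of the data rates mentioned in the text.

```python
def energy_per_bit_pj(power_mw_per_gbps: float) -> float:
    """Convert an I/O efficiency in mW per Gb/s into picojoules per bit."""
    watts_per_bit_per_s = power_mw_per_gbps * 1e-3 / 1e9
    return watts_per_bit_per_s * 1e12


def io_power_w(bandwidth_tbps: float, power_mw_per_gbps: float) -> float:
    """Total I/O power in watts for a link of `bandwidth_tbps` terabit/sec."""
    bandwidth_gbps = bandwidth_tbps * 1e3
    return bandwidth_gbps * power_mw_per_gbps * 1e-3


print(f"1 mW per Gb/s = {energy_per_bit_pj(1.0):.1f} pJ/bit")
print(f"1 Tb/s at 1 mW per Gb/s = {io_power_w(1.0, 1.0):.1f} W")
```

An efficiency of 1 mW per gigabit/sec is 1 pJ per bit, so a 1 terabit/sec link at that efficiency dissipates about 1 W.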
Figure 11 illustrates a concept of a 3D integrated [37]
microprocessor chip [31] with a lower layer having hundreds of
processing cores, an intermediate memory layer (or multiple
memory layers) and an optical interconnect layer on top of the 3D
stack. The optical layer consists of 1000s of front-end CMOS
nanophotonic detectors, modulators, wavelength filters and optical
switches as well as corresponding analogue and digital CMOS
circuits. The function of this photonic layer is to provide very
high-bandwidth and ultra-low power conversion from electrical to optical
signaling, transporting data in the form of optical pulses across the
whole chip as well as off the chip, and converting optical signals
back to electrical to deliver messages between chips or between the
distant cores on the same chip.
3-D silicon: TSV-Stacking for High Density Integration
As the advantages of traditional 2D scaling slow, three dimensional
(3D) integration offers a new means by which system-level
performance can continue to increase. The advantages of 3D
integration are particularly important as microprocessors move from
single-core to multi-core architectures. For multi-core architectures,
3D integration can increase the total bandwidth available to cache
memory [38], allow integrated voltage regulation [39], and reduce
overall power consumption by reducing interconnect capacitance
[40]. 3D integration will also be important in future DRAM
applications by increasing memory capacity within a specified form
factor. Finally, 3D offers tremendous potential for new applications
by heterogeneous integration of differing materials and technologies.
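The link between interconnect capacitance and power can be sketched with the standard dynamic power relation P = a·C·V²·f. All numbers below are hypothetical placeholders, including the split between device and wire capacitance and the fraction of wire capacitance removed by stacking; they are not measurements from this paper.

```python
def dynamic_power(activity: float, capacitance_f: float, vdd: float, freq_hz: float) -> float:
    """Dynamic switching power P = a * C * V^2 * f, in watts."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Hypothetical chip: half of the switched capacitance is in the wires.
C_DEVICE = 5e-9         # switched device capacitance (F), assumed
C_WIRE_2D = 5e-9        # switched interconnect capacitance in 2D (F), assumed
WIRE_REDUCTION = 0.30   # assumed fraction of wire capacitance removed by 3D stacking

p_2d = dynamic_power(0.1, C_DEVICE + C_WIRE_2D, 1.0, 3e9)
p_3d = dynamic_power(0.1, C_DEVICE + C_WIRE_2D * (1 - WIRE_REDUCTION), 1.0, 3e9)

print(f"2D dynamic power: {p_2d:.2f} W")
print(f"3D dynamic power: {p_3d:.2f} W ({100 * (1 - p_3d / p_2d):.0f}% lower)")
```

Because dynamic power is linear in switched capacitance, the saving scales with both the share of capacitance in the wires and how much of it shorter vertical connections remove.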
3D integration typically covers a broad definition from
silicon carriers [41] to dense wafer-level 3D integrated circuits [42].
Die-on-die and die-on-wafer processing will likely lead early
adoption (Figure 12), as through-silicon-via (TSV) technology
advances from tens of microns to micron diameters, with TSV pitch
and silicon thickness likely under 50 microns. Memory devices
such as high performance DRAM and SRAM will likely evolve to
die stacks with TSV in leading server and multimedia applications
due to increasing device performance requirements. As 3D
technologies become more established, wafer-level 3D processes
[43,44,45] (Figure 13) will become more widespread, particularly
for high-volume applications, in order to increase throughput and
decrease cost.
Conclusion
Emerging classes of applications demand doubling of system level
performance improvement compared to historical rates. Innovations
in device technology are required as silicon scaling is reaching
fundamental limits. The new system architecture to double
performance improvements will drive innovations in newer
technology areas such as 3D integration, new memory technology
such as Phase Change Memory (PCM) and optics on the processor
to meet ultra-low power, low cost and high density communications
needs. Innovations in areas emerging out of system architecture
changes will help break the historical barrier of 2x/2 years and enable
significantly improved system performance enhancements
(4x/2 years) for exascale computing.
Acknowledgements
The author acknowledges contributions from members of the
Science and Technology Department at IBM Research, in particular
Mukesh Khare, James Stathis, Vijay Narayanan, Timothy Dalton,
Wilfried Haensch, Chung Lam, Supratik Guha, Guy Cohen, Yurij
Vlasov, Steve Koester, and John Knickerbocker, for their valuable
discussions and suggestions.
978-4-86348-009-4
References:
[1] G. E. Moore, Electronics, 38, p. 114 (1965); IEDM Tech. Dig. (1975).
[2] R. H. Dennard, et al., IEEE J. Solid-State Circuits, SC-9, p. 256 (1974).
[3] H. S. Yang, et al., IEDM Tech. Dig., p. 1075 (2004).
[4] C.-H. Chen, et al., Symp. VLSI Tech. Dig., p. 56 (2004).
[5] S. Thompson, et al., IEDM Tech. Dig., p. 61 (2002).
[6] J. Sleight, et al., IEDM Tech. Dig., p. 697 (2006).
[7] V. Narayanan, et al., Symp. VLSI Tech. Dig., p. 224 (2008).
[8] D. G. Park, et al., in press, VLSI-TSA (2009).
[9] K. Choi, et al., in press, Symp. VLSI Tech., Kyoto, Japan (2009).
[10] W. Haensch, et al., IBM J. Res. & Dev., 50, p. 339 (2006).
[11] O. Gunawan, et al., Nano Lett., 8, p. 1566 (2008).
[12] G. M. Cohen, et al., Device Res. Conf., UCSB, p. 187 (2008).
[13] S. D. Suk, IEDM Tech. Dig., p. 891 (2007).
[14] K. S. Novoselov, et al., Nature, vol. 438, p. 197 (2005).
[15] Y. Zhang, Nature, vol. 438, p. 201 (2005).
[16] X. Du, Nature Nanotechnology, vol. 3, p. 491 (2008).
[17] K. I. Bolotin, et al., Solid State Comm., vol. 146, p. 351 (2008).
[18] Z. Chen, et al., Physica E, vol. 40, p. 228 (2007).
[19] M. Y. Han, Phys. Rev. Lett., vol. 98, 206805 (2007).
[20] Z. Chen, et al., IEDM Tech. Dig., p. 509 (2008).
[21] J. Knoch, et al., IEEE EDL-29, p. 372 (2008).
[22] S. R. Ovshinsky, Phys. Rev. Lett., 21, p. 1450 (1968).
[23] S. Lai, IEDM Tech. Dig., 10.1 (2003).
[24] J. H. Oh, et al., IEDM Tech. Dig., 2.6 (2006).
[25] S. Raoux, et al., IBM J. Res. & Dev., 52, p. 465 (2008).
[26] Y. C. Chen, et al., IEDM Tech. Dig., 30.3 (2006).
[27] M. Breitwisch, et al., Symp. VLSI Tech. Dig., p. 100 (2007).
[28] B. Rajendran, et al., in press, VLSI-TSA (2009).
[29] T. Nirschl, et al., IEDM Tech. Dig., 17.5 (2007).
[30] K. Gopalakrishnan, US Patent 7382647.
[31] http://www.research.ibm.com/photonics.
[32] F. Xia, et al., Optics Express, vol. 15, p. 11934 (2007).
[33] S. Assefa, et al., paper OMR4, Optical Fiber Communications Conference, San Diego, March 2009.
[34] W. M. J. Green, et al., Optics Express, vol. 15, p. 17106 (2007).
[35] Y. Vlasov, et al., Nature Photonics, vol. 2, p. 242 (2008).
[36] S. Assefa, et al., paper 1384, Optical Fiber Communications Conference, 2009.
[37] Special issue of the IBM J. Res. & Dev., vol. 52, no. 6 (2008).
[38] P. G. Emma, et al., IBM J. Res. & Dev., 52, no. 6, p. 541 (2008).
[39] M. Shapiro, 5th Int. Conf. on 3-D Architectures for Semiconductor Integration and Packaging, San Francisco, CA, Nov. 17-19, 2008.
[40] J. D. Meindl, et al., IBM J. Res. & Dev., 46, p. 245 (2002).
[41] J. U. Knickerbocker, et al., IBM J. Res. & Dev., 52, p. 553 (2008).
[42] S. J. Koester, et al., IBM J. Res. & Dev., 52, p. 583 (2008).
[43] J. A. Burns, et al., IEEE Trans. Elect. Dev., 53, p. 2507 (2006).
[44] F. Liu, et al., IEDM Tech. Dig., p. 599 (2008).
[45] R. R. Yu, et al., in press, 2009 Symp. on VLSI Tech. Dig. (2009).
Fig. 1 Acceleration in system performance level. [Plot of system improvement (consolidation, integration, performance, power efficiency, cost/performance) for high-performance enterprise systems over time (Now, +5 yrs, +10 yrs); labels: historic trend 2x / 2 years, 30x; 4x / 2 years, 1000x.]
Fig. 2 Stress Engineering applied in 90nm: Dual Stress Liner (DSL) [3].
Fig. 3 HRTEM and EELS of an n-FET band-edge gate stack process [7]. [EELS plot: counts vs. position (nm); labeled elements and layers include La, Ti, N, O, TiN, HfOx and SiOx.]
Fig. 4 TEM of a fully processed HKMG Transistor.
Fig. 5 AC Performance improvement by the use of HKMG [8].
Fig. 6 TEM of a FinFET device.
Fig. 7 TEM of nanowire FET.
Fig. 8 Graphene. A: Two dimensional honeycomb. B: Lithography patterned nano-ribbons.
Fig. 9 Properties of phase change materials and Phase Change Memory.
Fig. 10 Configurations of PCM with select devices and array architectures.
Fig. 11 A concept of a 3D-integrated microprocessor chip with multicore processing layer, memory layer and photonic network layer.
Fig. 12 Schematic cross section for A: 3D silicon package and B: 3D die stacks.
Fig. 13 3DI process on 300mm wafers [44,45].

2009 Symposium on VLSI Technology Digest of Technical Papers
