
Intel Itanium Architecture
28-Jan-2003
Herbert Cornelius
Technical Marketing Manager
Intel EMEA, Munich
herbert.cornelius@intel.com
EMEA HPTC Virtual Team
Intel Itanium Architecture
Copyright 2002-2003 Intel Corporation
*Other brands and names are the property of their respective owners
Useful URLs
Intel Itanium 2 Processor:
- www.intel.com/products/server/processors/server/itanium2/index.htm
Intel Software Products:
- www.intel.com/products/software/
Intel Developer Services:
- www.intel.com/ids/
Intel Technology Journal:
- www.intel.com/technology/itj/index.htm
High-Performance Computing:
- www.intel.com/ebusiness/trends/hpc.htm
Agenda
Intel Itanium Architecture
Intel Itanium Processor
Intel Itanium 2 Processor
Platforms
Software Tools
Some Tuning Tips
Extending Intel Architecture
[Figure: processor roadmap, 2000 to 2003. Vertical axis: performance, scalability, mission critical.]
Extends IA for the Most Demanding Applications: Madison** (Perf), Deerfield** (Price/Perf)
Outstanding Performance for Volume Applications (IA-32): Gallatin**
All dates specified are target dates provided for planning purposes only and are subject to change. (**Codename)
Intel Itanium Processor
First implementation of the Intel Itanium Architecture, using innovative EPIC** technology
**Explicit Parallel Instruction Computing
Intel Itanium 2 Processor
Second generation of the Intel Itanium Architecture, using an enhanced micro-architecture
Product Features
Feature: Description
Available Speeds: 1 GHz, 900 MHz
Cache: Level 1: 32 KB; Level 2: 256 KB; Level 3: integrated 3 MB or 1.5 MB
Features: Based on EPIC architecture; Enhanced Machine Check Architecture (MCA) with extensive Error Correcting Code (ECC); Operating system support: HP-UX*, Linux*, Windows*
Chipset: Intel E8870 chipset; OEM custom chipsets
System Bus: 400 MHz, 128-bit wide; 6.4 GB/s bandwidth
Itanium 2 Block Diagram
(schematic overview)
Itanium 2 Systems
High-end Itanium 2-based systems: >2X more than Itanium!
Racksaver: DP/1U, 1H 2003
Intel: 4P/4U and 2P/2U, Q4 2002/Q2 2003
Unisys: 16P, Q4 2002
NEC: 32P, Shipping
SGI: 64/512P, Early 2003
IBM: 4P/8P/16P, Early 2003
HP: DP/2U, Shipping
HP: 2P WS, Shipping
Initial Itanium 2 Application Areas
Enterprise solutions deployed on Itanium 2-based systems focus on the following:
Applications for Business Intelligence
Mechanical Computer-Aided Engineering (MCAE)
Electronic Design Automation (EDA)
Compute-intensive custom applications
Enterprise Resource Planning (ERP)
Supply Chain Management (SCM)
High Performance Computing (HPC)
Large databases
Security transactions
Itanium Application Areas
Large Memory Needs
(>4GB direct memory access)
Large SMP Systems
Complex high-end F.P. Apps
64-bit Integer Applications
Customized Applications
Vector and Parallel Applications
Enterprise Unix* Needs
Itanium 2 Processor
Micro-Architecture Enhancements
Itanium 2 processor builds on Itanium processor features
Increased Clock Frequency
Shorter Pipeline
Expanded Functional Units
Faster Floating Point
Improved Cache
Greater addressability
Enhanced TLB and ALAT
Improved System Bus
Long Branch Instruction
Enhanced Thermal Management
Performance Data
Benchmark: Performance Number
SPECint*_base2000: 810
SPECfp*_base2000: 1356
Stream TRIAD: 3700 MB/s
Linpack-1000 (single processor): 3534 MFLOPS
Linpack-10K (4-way system): 13940 MFLOPS
Linpack-HPC (32-way system): 101770 MFLOPS
SAP 2-tier SD 4-way server: 600 SD users
2-way server TPC-C transactions: 40,621 tpmC at $5.72/tpmC
4-way server TPC-C transactions: 80,495 tpmC at $4.83/tpmC
32-way server TPC transactions: 308,620 tpmC at $14.96/tpmC
SPECweb99*_SSL: 1520 simultaneous connections
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference http://www.intel.com/procs/perf/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
Itanium 2 Processor
Record-setting Performance
Benchmark: Scale: Result
SAP (2 Tier) (1), Sales and Distribution: 4-way: 600 users (world record)
TPC-C (2), Transaction Processing: 4-way: 80.4K tpmC (world record)
Linpack (3), High Performance Computing: 32-way: 101 GFLOPS (world record)
TPC-C (4), Transaction Processing: 32-way: 308K tpmC (IA SMP record)
Stream (5), Platform Bandwidth: 64-way: 120 GB/sec (world record)
1 Source: Itanium 2 processor results measured on HP Server rx5670 using 4 Itanium 2 processors 1GHz with integrated 3MB L3 cache, 24GB of memory, 528GB disk space, HP-UX 11.23, SAP rev 4.6D, Oracle 9i V.2
2 Source www.tpc.org: Itanium 2 processor measurements done on a HP Server rx5670 using 4 Itanium 2 processors 1GHz with integrated 3MB L3 cache, 48GB memory, HP-UX 11.23, Oracle 9i V.2, at $4.83 per tpmC
3 Source: Itanium 2 processor measurements done on a NEC Server TX7/i9510 using 32 Itanium 2 processors 1GHz with integrated 3MB L3 cache, 128GB memory, Linux OS.
4 Source: Itanium 2 processor measurements done on a NEC TX7/i9510 Server using 32 Itanium 2 processors with integrated 3MB L3 cache, 256GB memory, Windows .NET Server 2003, Datacenter Edition, Microsoft SQL Server 2000 Enterprise Edition (64-bit) beta version, Availability date 12/31/02.
5 Source: Itanium 2 processor measurements done on a SGI Scalable Linux System using 64 Itanium 2 processors, 128GB memory, Linux OS.
Intel Itanium Processor Family
2001: 800MHz, 4MB L3-Cache, 460GX Chip-set, OEM Chip-sets, 180nm
2002: 1GHz, 3MB iL3-Cache, E8870 Chip-set, OEM Chip-sets, 180nm
2003: Madison**, 1.5GHz, 6MB iL3-Cache, E8870 Chip-set, OEM Chip-sets, 130nm
2004: Enhanced Core, >1.5GHz, 9MB iL3-Cache, E8870 Chip-set, OEM Chip-sets, 130nm
2005: Montecito**, >1.5GHz, larger L3-Cache, Enhanced Dual-Core, E8870 Chip-set, OEM Chip-sets, 90nm
[Figure label: common platform]
**codename
All dates specified are target dates, are provided for planning purposes only and are subject to change
A new Architecture for
Business Computing
RISC
Technology
CISC
Technology
New Architectural features
EPIC
Predication
Speculation
Enhanced floating point
performance
Massive Resources
64-bit instruction set, registers
& addressing
Enhanced
reliability
features
IA-32
Enterprise class
OS
64-Bit
Is it new? Is it good or bad?
IA-32 already has 64-bit and more
- 64-bit buses
- 64-bit F.P. with 80-bit registers
- 64-bit Integer
- 64/128-bit MMX/XMM registers
- but only 32-bit address registers
Itanium has 64-bit address HW
- It is one of many features
How fast, and how much data, can you transfer/store?
- 32-bit data items
- 64-bit data items
64-Bit Addressing
32-bit Addressing
- 1 cm
- one CD cover height
64-bit Addressing
- 429496 km
- distance between
Earth and Moon
[Figure: 32-bit vs. 64-bit length comparison]
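The analogy can be checked with a few lines of arithmetic. The sketch below assumes, as the slide states, that the 32-bit address space is drawn as 1 cm, and scales the 64-bit space by the same factor; the function name is illustrative only. Under that assumption the 64-bit length comes out near 42,950 km, so the 429496 km figure on the slide corresponds to a 10 cm baseline rather than 1 cm.

```python
def scaled_length_km(bits, base_bits=32, base_cm=1.0):
    """Length that represents a 2**bits address space when a
    2**base_bits address space is drawn as base_cm centimetres."""
    ratio = 2 ** (bits - base_bits)      # how many times larger the space is
    return base_cm * ratio / 100_000     # 100,000 cm per km

km_with_1cm = scaled_length_km(64)                 # slide's stated baseline
km_with_10cm = scaled_length_km(64, base_cm=10.0)  # baseline matching 429496 km
print(f"1 cm baseline:  {km_with_1cm:,.0f} km")
print(f"10 cm baseline: {km_with_10cm:,.0f} km")
```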
Itanium Processor Architecture
Selected Features
64-bit Addressing Flat Memory Model
Instruction Level Parallelism (6-way)
Large Register Files
Automatic Register Stack Engine
Predication
Software Pipelining Support
Register Rotation
Loop Control Hardware
Sophisticated Branch Architecture
Control & Data Speculation
Powerful 64-bit Integer Architecture
Advanced 82-bit Floating Point Architecture
Multimedia Support (MMX Technology)
User Benefits
More Capacity and Capability
- Big in-memory data structures and DB
- Large file system and data files
- Efficient large integer calculations
- Fast 64-bit F.P. calculations
- Fast Security processing
- More and faster transactions
- More services
- Higher throughput
- Improved availability and manageability
Broad Industry Investment
~20 OEMs worldwide shipping Itanium-based systems today, with 2X growth in high-end systems (8-32P+) expected with Itanium 2 processor
7 operating system versions available today, from Windows* to HP-UX* and Linux, with more versions coming in 03/04
More than 100 applications/tools available today, with 100s more in development for high-end enterprise and technical computing
[Figure: logos of OEMs (including Founder and Langchao), OSVs (including OpenVMS and NonStop Kernel), and ISVs]
Itanium Architecture has established broad industry investment, providing solution choice to high-end computing
Linux* Supercomputer
With 1,400 next-generation Intel Itanium Family Processors that are code-named McKinley and Madison, the new HP supercomputer will have an expected total peak performance of more than 8.3 teraflops.
April 16, 2002
http://www.pnl.gov/news/2002/computer.htm
Performance Scaling
Itanium 2 running Itanium processor binaries
[Figure: bar chart, "Performance Scaling %", Itanium 800MHz/4MB to Itanium 2 1GHz/3MB. Vertical axis from 1.00 to 2.25. Workloads: Linpack 1000, Security 1, Linpack 10000-4P, SpecInt 2000, SpecFp 2000, CAE, ERP, Security 2, SpecJBB 2000, IMDB, GAMESS.]
Itanium 2 delivers an average of 1.5-2X performance improvement
Source: Intel Labs
Itanium Processor Family Value Proposition
Intel Itanium 2 Processor / Intel E8870 Platform Advancements
Performance
- Up to ~1.5-2X performance increase over Itanium processor
- 3X increase in FSB bandwidth
- 2X improvement in cache latencies
Scalability
- E8870 chipset scalability port for 8P+ systems
- Cache line size increased to 128 bytes from 64 bytes
- Support for larger page size (4 GB), addressing (1024 TB)
Availability
- Hot Plug Processor Boards, Memory, I/O
- Fail-over redundancy
- Extensive error detection, correction and logging
Investment Protection
- Platform compatible with future Itanium processors
- Compatible with Itanium-based OS/software
- Common set of S/W tools for Itanium processor family
Choice
- Major OEMs worldwide shipping Itanium-based systems
- Support from broad list of leading OSVs
- S/W application and platform reach expands over time
Intel Itanium Architecture
Fundamental Architecture Challenges
Sequentiality inherent in traditional architectures
Complex hardware needed to (re)extract ILP
Limited ILP available within basic blocks
Branches make extracting ILP difficult
Memory dependencies further limit ILP
Increasing latency exacerbates ILP need
Limited resources : A fundamental constraint
Shared resources create more overhead
Loop ILP extraction costs code size
And the challenges continue ...
Itanium architecture overcomes these fundamental challenges!
Itanium Architecture Performance Features
Parallelism is inherent in Itanium's EPIC architecture
Frees up hardware for parallel execution
Predication reduces branches, enhances ILP
Control Speculation breaks branch barrier, enhances ILP
Data Speculation breaks data dependence, increases ILP
Control and Data Speculation address memory latency
Itanium architecture has abundant register and memory resources
Stack/RSE reduces call overhead and management
Loop support yields performance without overhead
And the performance features continue ...
Itanium Architecture: Beyond RISC
Itanium Processor Block Diagram
(schematic overview)
Itanium Architecture: Explicitly Parallel
128 bits (bundle): Instruction 2 (41 bits) | Instruction 1 (41 bits) | Instruction 0 (41 bits) | Template (5 bits)
Example (MMI): Memory (M), Memory (M), Integer (I)
M=Memory, F=Floating-point, I=Integer, L=Long Immed., B=Branch
Basis for increased parallelism
Template specifies instruction types
MFI, MMI, MII, MLX, MIB, MMF, MFB, MMB, MBB, BBB
Stops specify group breaks (dependencies)
Intra-bundle (M;;MI or MI;;I) and Inter-bundle stop
Most common template combinations covered
Headroom for additional templates
Simplifies hardware requirements
Scales compatibly to future generations
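The bundle layout above (three 41-bit instruction slots plus a 5-bit template in 128 bits) can be sketched as plain bit packing. This is a minimal illustration, assuming the template sits in the low-order 5 bits with Instruction 0, 1 and 2 above it, following the right-to-left order of the slide; the function names and the sample field values are hypothetical and do not encode real instructions.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1

def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    for slot in (slot0, slot1, slot2):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction slot must fit in 41 bits")
    return (template
            | slot0 << TEMPLATE_BITS
            | slot1 << (TEMPLATE_BITS + SLOT_BITS)
            | slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS))

def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, slot0, slot1, slot2)."""
    return (bundle & TEMPLATE_MASK,
            (bundle >> TEMPLATE_BITS) & SLOT_MASK,
            (bundle >> (TEMPLATE_BITS + SLOT_BITS)) & SLOT_MASK,
            (bundle >> (TEMPLATE_BITS + 2 * SLOT_BITS)) & SLOT_MASK)

# Arbitrary sample values, chosen only to show the round trip.
bundle = pack_bundle(0x10, 0x123, 0x456, 0x789)
print(unpack_bundle(bundle))
```

The field widths add up exactly: 5 + 3 x 41 = 128 bits, which is why a bundle with every field at its maximum fills all 128 bits.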
EPIC (Explicit Parallel Instruction Computing)
Source Code -> Compiler -> Instructions -> Instruction Bundles (3 instr. each, 128 bit wide) -> Instruction Groups (series of bundles)
Up to 6 instructions executed per clock
Michael S. Schlansker, B. Ramakrishna Rau: EPIC: Explicit Parallel Instruction Computing; IEEE Computer, February 2000, pp. 37-45
Breakthrough Parallelism
M F I / M F I
- Load 4 DP (8 SP) ops via 2 ld-pair, plus 2 ALU ops (post++)
- 4 DP FLOPS (8 SP FLOPS)
- 2 ALU ops
- 6 instructions provide 12 parallel ops/clock (SP: 20 parallel ops/clock) for digital content creation & scientific computing
M I B / M I B
- 2 Loads + 2 ALU ops (post++)
- 2 ALU ops
- 1 Branch Hint + 1 Branch instr
- 6 instructions provide 8 parallel ops/clock for enterprise & Internet applications
Itanium processor delivers greater ILP than any contemporary processor
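The per-clock totals on this slide are simple sums of the operations listed for each pair of bundles. The short tally below only reproduces that arithmetic; the function and parameter names are made up for illustration.

```python
def parallel_ops(loads, post_increments, flops, alu_ops, branch_ops=0):
    """Total operations per clock for one pair of bundles, as tallied on the slide."""
    return loads + post_increments + flops + alu_ops + branch_ops

# MFI + MFI, double precision: 2 ld-pair = 4 loads, 2 post-increments, 4 FLOPs, 2 ALU ops
mfi_dp = parallel_ops(loads=4, post_increments=2, flops=4, alu_ops=2)
# MFI + MFI, single precision: the loads and FLOPs double
mfi_sp = parallel_ops(loads=8, post_increments=2, flops=8, alu_ops=2)
# MIB + MIB: 2 loads, 2 post-increments, 2 ALU ops, 1 branch hint + 1 branch
mib = parallel_ops(loads=2, post_increments=2, flops=0, alu_ops=2, branch_ops=2)
print(mfi_dp, mfi_sp, mib)
```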
Floating-Point:
High performance and High precision
Floating-Point Architecture
Fused Multiply Add Operation
An efficient core computation unit
Abundant Register resources
128 registers (32 static, 96 rotating)
High Precision Data computations
82-bit unified internal format for all data types
Software divide/square-root
High throughput achieved via pipelining
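What makes a fused multiply-add more than a speed feature is that the product is not rounded before the addition. The sketch below imitates that in double precision using exact rational arithmetic; it does not model the 82-bit register format, and both function names are illustrative.

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """a*b + c evaluated exactly, then rounded once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def separate_multiply_add(a, b, c):
    """a*b rounded to a double first, then c added and rounded again."""
    product = a * b
    return product + c

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the separate version loses the small term completely.
print(separate_multiply_add(a, b, c))
print(fused_multiply_add(a, b, c))
```

This single-rounding property is also what lets divide and square root be built in software from multiply-add sequences, as the slide lists.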
Floating Point Features
- Native 82-bit hardware provides support for multiple numeric models
- 2 extended-precision pipelined FMACs deliver 4 EP/DP FLOPs/cycle
- Performance for security, efficient use of hardware: integer mul-add, s/w divide
- Balanced with plenty of operand bandwidth from registers/memory
[Figure: floating-point datapath. Labels: 128-entry 82-bit RF (odd/even); 6 x 82-bit operands; 2 x 82-bit results; L2 Cache; 4 Mbyte L3 Cache; 2 stores/clk; 2 DP ops/clk; 4 DP ops/clk (2 x fld-pair).]
Itanium Processor Pipeline
Parallel, deep, and dynamic pipeline designed for maximum throughput
6-Wide EPIC hardware under compiler control
Parallel hardware and control for predication & speculation
Efficient mechanism for enabling register stacking & rotation
Software-enhanced branch prediction
10-stage in-order pipeline designed for:
Single cycle ALU (4 ALUs globally bypassed)
Low latency from data cache
Dynamic support for run-time optimization
Decoupled front end with prefetch to hide fetch latency
Aggressive branch prediction to reduce branch penalty
Non-blocking caches, register scoreboard to hide load latency
Predication
Control Flow to Data Flow
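[Figure: Traditional architecture: "if" is a cmp followed by br, with separate then and else blocks joined by a second br. Itanium architecture: "if" is a single cmp p1,p2; the then-instructions are guarded by p1 and the else-instructions by p2, with no branch.]
Removes/Reduces Branches and Enables Parallel Execution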
64 predicate registers
Can be combined with logical ops
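The conversion from control flow to data flow can be imitated in a few lines. In this sketch a single compare produces a predicate and its complement, and each guarded operation only takes effect when its predicate is true; `issue`, the register name `r`, and the absolute-difference example are all invented for illustration.

```python
def abs_diff_branching(a, b):
    """Traditional form: the compare feeds a branch that selects one path."""
    if a > b:
        return a - b
    else:
        return b - a

def abs_diff_predicated(a, b):
    """Predicated form: both paths are issued, predicates decide which one commits."""
    regs = {"r": None}

    def issue(pred, dest, value):
        # An instruction whose qualifying predicate is false has no effect.
        if pred:
            regs[dest] = value

    p1 = a > b          # cmp p1,p2 = a > b : one compare writes both predicates
    p2 = not p1
    issue(p1, "r", a - b)   # (p1) sub r = a, b
    issue(p2, "r", b - a)   # (p2) sub r = b, a
    return regs["r"]

print(abs_diff_predicated(7, 3), abs_diff_predicated(3, 7))
```

Because the two guarded operations do not depend on each other, hardware can execute them in the same cycle, which is the parallel-execution benefit the slide refers to.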
Loop support: ILP+++, Overhead---
Software Pipelining Support
High performance loops without
code size overhead
No prologue/epilogue
Register rotation (rrb)
Predication
Loop control registers (LC, EC)
Loop branches (br.ctop,br.wtop)
Especially valuable for integer loops
with small trip counts
Whole loop computation in parallel
Software Pipelining (cont.)
Traditional architectures use loop unrolling
Results in code expansion and increased cache misses
Itanium-Processor Software Pipelining uses rotating
registers
Allows overlapping execution of multiple loop instances
Predication controls the pipeline stages
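[Figure: Sequential Loop vs. Software-Pipelined Loop, each drawn against a time axis. Every iteration consists of load, compute, and store; the software-pipelined loop overlaps these stages across iterations.]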
Software Pipelining (cont.)
[Figure: stage 1 to stage 4 plotted against loop iteration]
Special loop control and branch registers, also usable for WHILE-loops
Predicate registers rotate as well and define the pipeline stages
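How rotating predicates define the pipeline stages can be shown with a small simulation. This is a simplified sketch, not the exact br.ctop semantics: it assumes a three-stage loop (load, compute, store), uses one counter in place of the LC/EC registers, and uses dictionaries keyed by iteration in place of rotating data registers. All names are hypothetical.

```python
def software_pipelined_scale(src, factor):
    """Multiply every element of src by factor using a 3-stage software pipeline."""
    n = len(src)
    stages = 3
    dst = [None] * n
    loaded, computed = {}, {}      # stand-ins for rotating registers
    preds = [False] * stages       # one predicate per stage
    remaining = n                  # iterations still to start (role of the loop count)
    trace = []
    for cycle in range(n + stages - 1 if n else 0):
        # Rotate the predicates: a new iteration enters stage 0 while any remain,
        # afterwards zeros are shifted in and the pipeline drains.
        preds = [remaining > 0] + preds[:-1]
        if remaining > 0:
            remaining -= 1
        if preds[2]:                                   # stage 3: store
            dst[cycle - 2] = computed.pop(cycle - 2)
        if preds[1]:                                   # stage 2: compute
            computed[cycle - 1] = loaded.pop(cycle - 1) * factor
        if preds[0]:                                   # stage 1: load
            loaded[cycle] = src[cycle]
        trace.append(tuple(preds))
    return dst, trace

result, trace = software_pipelined_scale([1, 2, 3, 4], 2)
print(result)
for cycle, active in enumerate(trace):
    print(cycle, active)
```

The same loop body runs on every pass; the predicates alone switch stages on during ramp-up and off during drain, which is why no separate prologue or epilogue code is needed.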
Register Rotation
GR32-127 and FR32-127 can rotate (specified range)
Separate rotating register base for each set (GR, FR)
Loop branches decrement all register rotating bases (RRB)
Instructions contain a virtual register number
physical register # = RRB + virtual register #
[Figure: iterations i=0 through i=7; the same physical register is addressed by a different virtual number in each iteration.]
Predicate register range also rotates.
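The renaming rule above can be tried out directly. The sketch assumes the whole 96-register range (32 to 127) rotates and wraps around within that range; the constant and function names are illustrative.

```python
ROT_BASE = 32    # first rotating register
ROT_SIZE = 96    # registers 32..127 rotate in this sketch

def physical_register(virtual, rrb):
    """Map a virtual register number to a physical one for a given rotating base."""
    if virtual < ROT_BASE:
        return virtual                       # static registers are never renamed
    return ROT_BASE + (virtual - ROT_BASE + rrb) % ROT_SIZE

rrb = 0
physical_seen = []
for i in range(4):
    # A value written to virtual register 32 in iteration 0 is found
    # under virtual register 32 + i in iteration i.
    physical_seen.append(physical_register(32 + i, rrb))
    rrb -= 1      # the loop branch decrements the rotating register base
print(physical_seen)
```

All four lookups resolve to the same physical register, which is how a value produced in one iteration reaches a later pipeline stage without any copy instruction.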
Control & Data Speculation
Control Speculation moves loads above branches / calls
Data Speculation moves loads above possibly conflicting stores
[Figure: instruction sequence instr. 1, instr. 2, barrier (branch or st[?]), ld r1=, use = r1; with speculation, "ld r1=" is moved above the barrier.]
Speculation reduces the impact of memory latency
Control Speculation
Control Speculation moves loads above branches
Detected exception indicated using NaT bit / NaTVal
Check raises detected exceptions
Branch barrier broken to minimize memory latency
[Figure: Traditional architecture: instr. 1, instr. 2, branch (barrier), ld r1=, use = r1. Itanium: ld.s r1= (detect exception), instr. 1, instr. 2, branch, chk.s r1 (deliver exception), use = r1; the exception is propagated from ld.s to chk.s.]
Hoisting Uses
[Figure: Traditional architecture: instr. 1, instr. 2, branch (barrier), ld r1=, use = r1. Itanium: ld.s r1= and the speculative use = r1 are hoisted above the branch; chk.s r1 stays below it and, on failure, branches to recovery code that repeats ld r1= and use = r1.]
All computation instructions propagate NaTs, which reduces the number of checks and allows a single check on the result
Compares also propagate NaTs when writing predicates
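The ld.s / chk.s pattern with NaT propagation can be modelled in a few lines. This is a toy model under stated assumptions: memory is a dictionary, a missing address stands for a faulting load, and the helper names (`ld_s`, `add`, `chk_s`, `load_plus_one`) are invented for the example rather than taken from any real toolchain.

```python
NAT = object()   # stands in for the NaT ("Not a Thing") marker

def ld_s(memory, addr):
    """Speculative load: a fault is deferred by returning NaT instead of raising."""
    return memory.get(addr, NAT)

def add(x, y):
    """Computation propagates NaT from either operand."""
    return NAT if x is NAT or y is NAT else x + y

def chk_s(value, recovery):
    """Check: run the recovery code only if the speculation failed."""
    return recovery() if value is NAT else value

def load_plus_one(memory, addr, path_taken):
    r1 = ld_s(memory, addr)     # load hoisted above the branch
    r2 = add(r1, 1)             # speculative use, also hoisted
    if not path_taken:
        return None             # result unused, so a deferred fault is never raised
    # Recovery repeats the load non-speculatively; a real fault surfaces here.
    return chk_s(r2, lambda: memory[addr] + 1)

print(load_plus_one({8: 41}, 8, True))
print(load_plus_one({}, 8, False))
```

Only one check is needed even though two instructions were hoisted, because the NaT flows through the add to the value that is finally checked.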
Data Speculation
[Figure: Traditional architecture: instr. 1, instr. 2, st[?] (barrier), ld r1=, use = r1. Itanium: ld.a r1= is hoisted above st[?]; ld.c r1 below the store checks the load before use = r1.]
Data Speculation moves loads above possibly
conflicting stores
- Keeps track of load addresses used in advance (ALAT)
Advanced-loaded data can be used speculatively
Advanced Load Address Table: ALAT
ld.a inserts entries
Conflicting stores remove entries (also ld.c.clr, chk.a.clr)
Presence of entry indicates success
chk.a branches when no entry is found
[Figure: ALAT drawn as a table of reg# / addr pairs, accessed by "ld.a reg# = ...", "st[addr]", and "chk.a reg# ?".]
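The ALAT behaviour listed above amounts to a small lookup table. The class below is a simplified sketch: it tracks one exact address per register, with no size limit, partial address matching, or access-size checks, all of which a real implementation would have. The class and method names are illustrative.

```python
class ALAT:
    """Toy Advanced Load Address Table: register -> address of its advanced load."""

    def __init__(self):
        self.entries = {}

    def ld_a(self, reg, addr, memory):
        """Advanced load: read memory early and record the address."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, addr, value, memory):
        """Store: write memory and drop every entry with a matching address."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def chk_a(self, reg):
        """Check: True if the entry survived, False if recovery is needed."""
        return reg in self.entries

memory = {0x100: 7, 0x200: 0}
alat = ALAT()
r1 = alat.ld_a("r1", 0x100, memory)     # load hoisted above the stores
alat.store(0x200, 5, memory)            # unrelated address: entry survives
survived_unrelated = alat.chk_a("r1")
alat.store(0x100, 9, memory)            # conflicting store: entry removed
survived_conflict = alat.chk_a("r1")
if not survived_conflict:
    r1 = memory[0x100]                  # recovery: reload the current value
print(survived_unrelated, survived_conflict, r1)
```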
Hoisting Uses
[Figure: Traditional architecture: instr. 1, instr. 2, st[?] (barrier), ld r1=, use = r1. Itanium: ld.a r1= and the speculative use = r1 are hoisted above st[?]; chk.a r1 stays below it and, on failure, branches to recovery code that repeats ld r1= and use = r1.]
Data and Control Speculation can be combined
Intel Itanium 2 Processor Architecture
Intel Itanium 2 Processor
Codename McKinley
Target for 2H2002
Enhanced Itanium design
100% Itanium binary compatible
1.0GHz clock-rate
6 Integer units
256KB L2 cache
1.5MB or 3MB iL3 cache
6.4GB/s system bus
1.5-2x performance increase over Itanium-based systems
Itanium 2 Optimizations
Improved dynamic properties
Production frequency is 1 GHz
Reduced L1, L2, L3 latencies
L3 cache has been incorporated on die
Improved L2 cache capacity
Improved FSB bandwidth
Lower branch prediction penalties
Itanium 2 provides significant speed-ups on existing Itanium processor binaries
Itanium 2 Optimizations
Reduced execution paths
More parallelism/resources
More integer, multi-media units and memory ports
Short latencies
Fully bypassed functional units
Very Low L1D/L2/L3 Cache Latencies
Low latency FP execution
Many more ways to issue/execute 6 insts/clk
Itanium 2 provides performance headroom for re-optimized binaries
Itanium Processor vs. Itanium 2 Processor
                  Itanium Processor            Itanium 2 Processor
Core              800 MHz                      1 GHz
System Bus        64 bits wide                 128 bits wide
                  133MHz/266 MT/s              200MHz/400 MT/s
                  2.1 GB/s                     6.4 GB/s
Width             2 bundles per clock          2 bundles per clock
                  4 integer units              6 integer units
                  2 loads or stores per clock  2 loads and 2 stores per clock
                  9 issue ports                11 issue ports
Caches            L1: 2x16KB, 2 clock latency  L1: 2x16KB, 1 clock latency
                  L2: 96K, 9 clock latency     L2: 256K, 5 clock latency
                  L3: 4MB external (BSB), 21 clk  L3: 3MB, 12 clk
                  12.8 GB/s bandwidth          32 GB/s bandwidth
Addressing        44 bit physical addressing   50 bit physical addressing
                  50 bit virtual addressing    64 bit virtual addressing
                  Maximum page size of 256MB   Maximum page size of 4GB
Improvement factors marked in the figure: 2X, 3X, 1.5X, 2X
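The system bus figures follow from bus width times transfer rate. The check below uses only the widths and MT/s values given on the slide; the function name is illustrative.

```python
def bus_bandwidth_gb_per_s(width_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s (10**9 bytes) for a bus of the given width and rate."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

itanium_bus = bus_bandwidth_gb_per_s(64, 266)     # 64 bits at 266 MT/s
itanium2_bus = bus_bandwidth_gb_per_s(128, 400)   # 128 bits at 400 MT/s
print(f"Itanium:   {itanium_bus:.1f} GB/s")
print(f"Itanium 2: {itanium2_bus:.1f} GB/s")
print(f"Ratio:     {itanium2_bus / itanium_bus:.1f}X")
```

Doubling the width and raising the transfer rate by about half together give roughly the 3X bus bandwidth increase quoted elsewhere in the deck.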
Itanium 2 Processor Block Diagram
(schematic overview)
Architectural Changes
Beneficial to compilers
Improved data/control speculation support
- ALAT is fully associative, which minimizes thrashing
- processor directly vectors to recovery code for reduced processor speculation costs
64-bit Long Branch Instruction
Beneficial to OS and System designs
Full 64-bit virtual addressing
Full 2**24 virtual address spaces
4GB virtual pages = reduced TLB pressure
50-bit Physical addressing = very large memory/IO spaces
More flexibility for compiler, OS and system
designs
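Two of the OS-level points above are easy to quantify. The sketch below counts how many page mappings a region needs at the old and new maximum page sizes, and converts 50 physical address bits into a capacity; the 64 GB region is a hypothetical working set chosen only for illustration.

```python
KB, MB, GB, TB = 1 << 10, 1 << 20, 1 << 30, 1 << 40

def pages_needed(region_bytes, page_bytes):
    """Number of pages (and so translations) required to map a region."""
    return -(-region_bytes // page_bytes)    # ceiling division

region = 64 * GB                              # hypothetical working set
with_256mb_pages = pages_needed(region, 256 * MB)
with_4gb_pages = pages_needed(region, 4 * GB)
physical_space_tb = (1 << 50) // TB           # 50-bit physical addressing

print(with_256mb_pages, with_4gb_pages, physical_space_tb)
```

Fewer translations for the same region is what "reduced TLB pressure" means in practice, and the 1024 TB result matches the addressing figure quoted on the value proposition slide.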
Intel Itanium Architecture Processors
                   Itanium            Itanium 2
Technology         180nm              180nm
Clockrate          800MHz             1GHz
- INT Units        4                  6
- MM Units         4                  6
- FP Units         2 (*,+)            2 (*,+)
- ADDR Units       2L, 2S or 1L+1S    2L+2S or 4L
L1-Caches (I/D)    16/16KB            16/16KB
L2-Cache           96KB               256KB
L3-Cache           4MB extern         3MB on die
System Bus         2.1GB/s            6.4GB/s
- Clockrate        266MHz             400MHz
- Width            64 bit             128 bit
Intel Chipset      460GX              E8870
Memory Cache Hierarchy
Itanium 2 Processor (1GHz)
L1D: 16KB, 64B CL, 1 CLK
L1I: 16KB, 64B CL, 1 CLK
L2-Cache: 256KB, 128B CL, 8-way, 5-7 CLKS
L3-Cache: 1.5/3MB, 128B CL, 12-way, 12-15 CLKS
System bus to Memory (Controller): 6.4 GB/s
Itanium Processor (800MHz)
L1D: 16KB, 32B CL, 2 CLK
L1I: 16KB, 32B CL, 2 CLK
L2-Cache: 96KB, 64B CL, 6-way, 6-9 CLKS
L3-Cache: 2/4MB, 64B CL, 4-way, 20 CLKS
System bus to Memory (Controller): 2.1 GB/s
Other labels in the figure: link bandwidths of 25.6 GB/s, 32 GB/s and 12.8 GB/s; 210 CLKS
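Because the two processors run at different clock rates, latencies in clocks are easier to compare once converted to time. The sketch below uses the best-case clock counts from this slide (the low end of each range); the dictionary and function names are illustrative.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz

itanium_clks = {"L1D": 2, "L2": 6, "L3": 20}     # at 800 MHz
itanium2_clks = {"L1D": 1, "L2": 5, "L3": 12}    # at 1 GHz

for level in ("L1D", "L2", "L3"):
    old_ns = cycles_to_ns(itanium_clks[level], 0.8)
    new_ns = cycles_to_ns(itanium2_clks[level], 1.0)
    print(f"{level}: {old_ns:.1f} ns -> {new_ns:.1f} ns ({old_ns / new_ns:.2f}x)")
```

Under these best-case numbers the L1D and L3 latencies improve by a factor of two or more in wall-clock terms, while L2 improves by about 1.5x.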
Itanium 2 Cache Hierarchy
3 level caching on Itanium 2 processor
1st level cache optimized for latency
2nd level cache optimized for bandwidth
3rd level cache optimized for size
Large Register Set
Integer Registers (bits 63..0, plus NaT): GR0 to GR31, 32 static (GR0 = 0); GR32 to GR127, 96 framed, rotating
F.P. Registers (bits 81..0): FR0 to FR31, 32 static (FR0 = +0.0, FR1 = +1.0); FR32 to FR127, 96 rotating
Predicate Registers: PR0 to PR15, 16 static (PR0 = 1); PR16 to PR63, 48 rotating
Branch Registers (bits 63..0): BR0 to BR7
Functional Units
[Diagram comparing Itanium and Itanium 2 functional units by class: Integer, F.P., Multimedia, Load/Store, Branch]
Issue Ports/Units shown:
- 2 x F.P. MAC
- 2 x ALU/INT/MM
- 4 x ALU/MM/MEM
- 3 x BRANCH
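Itanium 2 Dispersal Matrix
[Matrix figure of bundle-template pairs; the cell shading is not recoverable from the extraction]
Bundle templates on one axis: MII, MLI, MMI, MFI, MMF, MIB*, MBB, BBB, MMB*, MFB*
Bundle templates on the other axis (as extracted): MFM MBB BBB MBB MIB MMF MFI MMI MLI MII
Legend:
- Possible Itanium 2 full issue
- Possible Itanium processor and Itanium 2 full issue
- * hint in first bundle
Itanium 2 allows more compiler dispersal options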
A simple Example
.
.
double precision, dimension(10000) :: a,b,c,d
do i=1,10000
a(i)=a(i)*b(i)+c(i)*d(i)
enddo
.
.
A DAXPY-like loop over floating-point vectors
can be optimized differently for Itanium
and Itanium 2
Itanium vs. Itanium 2 Assembly Code
3 clock ticks on Itanium
.b1_2:
{ .mmf
(p16) ldfd f37=[r8],8
(p16) ldfd f45=[r3],8
(p19) fma.d f52=f40,f48,f0 ;;
}
{ .mmi
(p16) ldfd f32=[r33]
(p16) ldfd f40=[r2],8
nop.i 0 ;;
}
{ .mfi
(p23) stfd [r40]=f51
(p20) fma.d f48=f36,f44,f53
nop.i 0
}
{ .mib
(p16) add r32=8,r33
nop.i 0
br.ctop.sptk .b1_2 ;;
}
2 clock ticks on Itanium 2!
.b1_2:
{ .mfi
(p16) ldfd f43=[r8],8
(p19) fma.d f51=f46,f50,f0
nop.i 0
}
{ .mmf
(p16) ldfd f47=[r3],8
(p23) stfd [r32]=f56
(p21) fma.d f54=f37,f42,f53 ;;
}
{ .mii
(p16) ldfd f32=[r33]
nop.i 0
nop.i 0
}
{ .mmb
(p16) ldfd f37=[r2],8
(p16) add r32=8,r33
br.ctop.sptk .b1_2 ;;
}
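The cycle counts quoted for the two loop bodies can be turned into a rough peak throughput estimate. The sketch below is only a back-of-the-envelope calculation: it assumes the 800 MHz and 1 GHz clock rates from the processor comparison, counts 3 floating-point operations per iteration (two multiplies and one add in a(i)*b(i)+c(i)*d(i)), and ignores memory stalls and pipeline fill/drain. The function name is illustrative.

```python
def iterations_per_second(clock_hz, cycles_per_iteration):
    """Steady-state loop iterations per second for a software-pipelined loop."""
    return clock_hz / cycles_per_iteration


# Cycle counts from the slide: 3 clocks/iteration on Itanium, 2 on Itanium 2.
itanium = iterations_per_second(800e6, 3)
itanium2 = iterations_per_second(1e9, 2)

# Assumption: 3 floating-point operations per iteration (2 multiplies, 1 add).
FLOPS_PER_ITERATION = 3

print(f"Itanium:   {itanium * FLOPS_PER_ITERATION / 1e6:.0f} MFLOPS")
print(f"Itanium 2: {itanium2 * FLOPS_PER_ITERATION / 1e6:.0f} MFLOPS")
print(f"Speedup:   {itanium2 / itanium:.3f}x")
```

Under these assumptions the gain comes from both the higher clock rate and the shorter loop schedule.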
Itanium Processor vs. Itanium 2 Processor
                     Itanium Processor                  Itanium 2 Processor
Core frequency       800 MHz                            1 GHz
System bus           2.1 GB/s, 64 bits wide, 266 MHz    6.4 GB/s, 128 bits wide, 400 MHz
Caches               4 MB L3 on board, 96k L2,          3 MB L3, 256k L2, 32k L1,
                     32k L1 on-die                      all on-die
Registers            328 on-board                       328 on-board
Instructions/cycle   6                                  6
Issue ports          9                                  11
Execution units      4 Integer, 3 Branch,               6 Integer, 3 Branch,
                     2 FP, 2 SIMD,                      2 FP, 1 SIMD,
                     2 Load or 2 Store                  2 Load & 2 Store
Pipeline stages      10                                 8
Itanium 2 callouts:
- Large on-die cache, reduced latency
- Increased core frequency
- Additional execution units
- Additional issue ports
- 3X increase in system bus bandwidth
- 221 million transistors total, 25 million in CPU core
McKinley delivers performance through:
- Bandwidth and cache improvements
- Micro-architecture enhancements
- Increased frequency
Architectural Changes
Beneficial to compilers
Improved data/control speculation support
ALAT - fully associative = minimize thrashing
processor directly vectors to recovery code for reduced
speculation costs
64-bit Long Branch Instruction
Beneficial to OS and System designs
Full 64-bit virtual addressing
Full 2**24 virtual address spaces
4GB virtual pages = reduced TLB pressure
50-bit Physical addressing = very large memory/IO spaces
Changes provide more flexibility to compiler,
OS and system designs
Itanium 2 Pipelines
Core pipeline: IPG ROT EXP REN REG EXE DET WB
- IPG: IP Generate, L1I Cache (6 inst) and TLB access
- ROT: Instruction Rotate and Buffer (6 inst)
- EXP: Expand, Port Assignment and Routing
- REN: Integer and FP Register Rename (6 inst)
- REG: Integer and FP Register File read (6)
- EXE: ALU Execute (6), L1D Cache and TLB access + L2 Cache Tag Access (4)
- DET: Exception Detect, Branch Correction
- WB: Writeback, Integer Register update
FPU pipeline: FP1 FP2 FP3 FP4 WB
- FP1-WB: FP FMAC pipeline (2) + reg write
L2 pipeline: L2N L2I L2A L2M L2D L2C L2W
- L2N-L2I: L2 Queue Nominate/Issue (4)
- L2A-W: L2 Access, Rotate, Correct, Write (4)
Short 8-stage in-order main pipeline
- In-order issue, out-of-order completion
- Reduced branch misprediction penalties
- Fully interlocked, no way-prediction or flush/replay mechanism
Pipelines are designed for very low latency
Itanium 2 Issue Ports
Issue ports
- 4 Mem/ALU/Multi-Media
- 2 Integer/ALU/Multi-Media
- 2 FMAC
- 3 branch
4 memory ports
- Integer: allow 2 load AND 2 store per clk
- FP: 2 FP load pairs AND 2 store per clk to feed 2 FMACs
[Diagram labels: L1 instruction cache, two instruction bundles, six arithmetic logic units (ALU/MEM 1-4, ALU/INT 1-2), two load ports, two store ports, L1 data cache (1 cycle latency)]
Substantial performance headroom for FP and integer kernels
Itanium 2 Unit Latencies
                               Consuming Class Instruction
Producing Class Instruction    Integer   Multimedia   Load Address   Store Data
Mem/integer ports ALU          1         2            1              1
Integer only ports ALU         1         2            1              1
Multimedia                     3         2            3              3
Integer Loads (L1D hit)        1         2            2              1
Short latencies and full bypasses improve performance for re-optimized code
Floating Point Latencies
Operation                  Itanium 2 Latency
FP Load (L2 Cache hit)     6
FMAC                       4
FP to INT (getf)           5
FMISC                      4
INT to FP (setf)           6
Short latencies = performance upside for re-optimized FP code
Floating Point Architecture
DIV and SQRT are done in software to enable
- better ILP
- full pipelining
- higher throughput
- more flexibility
Software DIV and SQRT
- support full IEEE 754 compliance
- come in versions optimized for latency and for throughput
- are also available for SIMD F.P. operations
Source: Intel Technology Journal Q4, 1999
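One common way to build division from multiply-add operations is Newton-Raphson refinement of a reciprocal estimate. The sketch below illustrates that general idea only; it is not the instruction sequence Intel's compilers or libraries emit, and the seed generation (rounding 1/b to a few bits) is a hypothetical stand-in for a low-precision hardware reciprocal approximation. It also makes no attempt at IEEE-correct rounding of the final result.

```python
import math


def nr_divide(a, b, seed_bits=8, steps=3):
    """Approximate a / b via Newton-Raphson refinement of 1 / b.

    Illustrative only: the seed stands in for a coarse hardware
    reciprocal estimate with roughly seed_bits bits of precision.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero")
    # Hypothetical seed: 1/b with its mantissa rounded to seed_bits bits.
    mantissa, exponent = math.frexp(1.0 / b)
    scale = 2 ** seed_bits
    y = math.ldexp(round(mantissa * scale) / scale, exponent)
    for _ in range(steps):
        err = 1.0 - b * y   # shaped like one fused multiply-add
        y = y + y * err     # shaped like another; relative error is squared
    return a * y


print(nr_divide(1.0, 3.0))
```

Each refinement step roughly doubles the number of correct bits, and every step is an independent multiply-add chain, which is why such sequences pipeline well and can be interleaved with other work.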
Floating-Point DIV Throughput
Optimized
Floating-Point SQRT Throughput
Optimized
Integer DIV
Itanium 2 Branch Prediction
Zero clock branch prediction
2 level branch prediction hierarchy
- L1IBR - Level 1 Branch Cache
  - Part of the L1 I-cache
  - 1K trigger predictions + 0.5K target addresses
- L2B - Level 2 Branch Cache (12K histories)
- PHT - Pattern History Table (16K counters)
Reduced prediction penalties
- IP-relative branch w/correct prediction - 0 cycle
- IP-relative branch w/wrong target - 1 cycle
- Return branch w/correct prediction - 1 cycle
- Last branch in counted loop prediction - 0 cycle
- Branch misprediction - 6 cycle
Reduced branch penalties speed up existing code
Instruction Prefetching
Streaming prefetching
- Initiated by br.many (hint on branch inst)
- CPU prefetches ahead of the sequential execution stream
- Streaming prefetch is cancelled when:
  - a predicted-taken branch is seen in the front-end
  - a branch misprediction occurs on the back-end
  - software cancels the prefetch with a brp instruction
Branch Prefetching Hints
- Initiated by brp.few, brp.many or mov_to_br
- One-time prefetch for the target
- Two hint prefetches can be initiated per cycle
Software-initiated instruction prefetching improves
performance by lowering instruction fetch penalties
Itanium 2 Caches
                        L1I           L1D           L2               L3
Size                    16K           16K           256K             3M on die
Line Size               64B           64B           128B             128B
Ways                    4             4             8                12
Replacement             LRU           NRU           NRU              NRU
Latency (load to use)   I-Fetch: 1    INT: 1        INT: 5, FP: 6    12
Write Policy            -             WT (RA)       WB (WA + RA)     WB (WA)
Bandwidth               R: 32 GB/s    R: 16 GB/s    R: 32 GB/s       R: 32 GB/s
                                      W: 16 GB/s    W: 32 GB/s       W: 32 GB/s
All caches are physically indexed, pipelined, and non-blocking:
scoreboarded registers allow continued execution until load use
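The load-to-use latencies in the cache table can be combined into an average load latency once hit rates are known. The sketch below uses the integer latencies from the table (1, 5 and 12 clocks); the hit rates and the 200-clock memory latency are hypothetical placeholders chosen only to show the arithmetic, not measured values.

```python
def average_load_latency(levels, memory_latency):
    """Average load-to-use latency in clocks.

    levels: list of (latency_clocks, hit_rate) ordered from L1 outward,
    where each latency is the total load-to-use time for a hit at that level.
    """
    total = 0.0
    reaching = 1.0  # fraction of loads that get as far as this level
    for latency, hit_rate in levels:
        total += reaching * hit_rate * latency
        reaching *= 1.0 - hit_rate
    return total + reaching * memory_latency


# Latencies from the table; hit rates and memory latency are assumptions.
levels = [(1, 0.95), (5, 0.80), (12, 0.50)]
print(average_load_latency(levels, memory_latency=200))
```

Even a small fraction of loads that miss all three levels dominates the average, which is why the slides stress large on-die caches.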
L1D (1 clock Integer Data Cache)
- High performance: 32 GB/s, 2 ld AND 2 st ports
- Write-through: all stores are pushed to the L2
- FP loads force miss, FP stores invalidate
- True dual-ported read access: no load conflicts
- Pseudo-dual store port write access
- 2 store coalescing buffers/port hold data until L1D update
- Store to load forwarding
One clock data cache provides a significant
performance benefit
L2 and L3 Cache
L2: 256KB, 32 GB/s, 5 clk
- Data array is pseudo-4 ported - 16 banks of 16KB each
- Non-blocking/out-of-order
- L2 queue (32 entries) - holds all in-flight loads/stores
- Out-of-order service - smooths over load/store/bank conflicts, fills
- Can issue/retire 4 stores/loads per clock
- Can bypass L2 queue (5, 7, 9 clk bypass) if
  - no address or bank conflicts in same issue group
  - no prior ops in L2 queue want access to L2 data arrays
Large iL3: 3MB, 32 GB/s, 12 clk cache, on die
- Single ported, full cache line transfers
Large on-die L2 and L3 caches provide significant
performance potential
TLBs
2-level TLB hierarchy
DTC/ITC (32/32 entry, fully associative, .5 clk)
Small fast translation caches tied to L1D/L1I
Key to achieving very fast 1-clk L1D, L1I cache accesses
DTLB/ITLB (128/128 entry, fully associative, 1 clk)
All architected page sizes (4K to 4GB)
Supports up to 64/64 ITR/DTRs
TLB miss starts hardware page walker
Small fast TLBs enable low latency caches
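How much memory the 128-entry second-level TLB can map without a miss depends directly on the page size. The short calculation below is a sketch of that "TLB reach" arithmetic using the entry count and the page-size range from the slide; the function name is illustrative.

```python
KB = 1024
GB = 1024 ** 3


def tlb_reach(entries, page_size_bytes):
    """Bytes of address space covered when every TLB entry maps one page."""
    return entries * page_size_bytes


DTLB_ENTRIES = 128  # second-level DTLB size from the slide

reach_small = tlb_reach(DTLB_ENTRIES, 4 * KB)   # smallest architected page
reach_large = tlb_reach(DTLB_ENTRIES, 4 * GB)   # largest architected page

print(f"4 KB pages: {reach_small // KB} KB reach")
print(f"4 GB pages: {reach_large // GB} GB reach")
```

With 4 KB pages the reach is only a fraction of the 3 MB L3 cache, which is one reason large pages reduce TLB pressure for big working sets.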
System Bus Enhancements
Extension of the Itanium processor bus
- Same protocol with minor extensions
- Increased to 6.4 GB/s bandwidth
- Frequency 200MHz, 400MHz data, 128-bit data bus
Bus is non-blocking and out of order
- Most transactions can be deferred for later service
Buffering
- 18 bus requests/CPU are allowed to be outstanding
- 16 Read Line + 6 Write Line + two 128 byte WC buffers
Itanium 2 significantly extends the system bus
performance level
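The quoted bus bandwidths follow from bus width times data transfer rate. The sketch below reproduces that arithmetic for both processors, using the widths and data rates given in the slides; the function name is illustrative.

```python
def bus_bandwidth_gb_s(width_bits, transfers_per_second):
    """Peak bus bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return width_bits / 8 * transfers_per_second / 1e9


itanium2_bus = bus_bandwidth_gb_s(128, 400e6)  # 128-bit bus, 400 MT/s
itanium_bus = bus_bandwidth_gb_s(64, 266e6)    # 64-bit bus, 266 MT/s

print(f"Itanium 2: {itanium2_bus:.1f} GB/s")
print(f"Itanium:   {itanium_bus:.1f} GB/s")
print(f"Ratio:     {itanium2_bus / itanium_bus:.2f}x")
```

Doubling the width and raising the data rate by about half together give the roughly threefold increase cited for the Itanium 2 system bus.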
New Bus Transactions
L3 cast-outs (normally silent L3 replacement (E->I, S->I))
- Reduces snoop traffic in directory-based systems
- Backward inquiry for L2, L1 coherency
Memory read current
- Non-destructive (non-coherent) snoop of CPU lines
- Used in high-bandwidth graphics-based systems
Cache Cleanse: writes all modified lines to memory
- M->E; used in fault-tolerant systems, invoked via PAL
Itanium 2 provides several new bus transactions
to improve performance/reliability
Error Features
Error detection on all major arrays
- Parity coverage on L1D, L1I, and TLBs
- ECC on L2 and L3
  - double bit detection, single bit correction; out-of-path repair
  - all errors are fully contained
Bus is covered with parity/ECC
- double bit detection, single bit correction on transmission
Error isolation (end-to-end error detection)
- From memory: unique FSB 2xECC syndrome encoding can
  tolerate additional single bit errors in transmission
- Error not reported until referenced by a consuming process
Itanium 2 provides extensive error
detection/correction/containment
Thermal Management
Programmable fail-safe thermal trip
- Itanium 2 will reduce power consumption
  - Power consumption reduced to ~60% of peak
  - Execution rate dropped to 1 inst per clock
  - Correct Machine Check notification posted to OS
- Full-speed execution resumes when temperature
  drops
- Never invoked in properly designed and
  operating cooling systems,
  even on worst-case power code
Itanium 2 provides a thermal fail-safe
mechanism in the event of a cooling failure
Itanium 2 Processor
Itanium 2 builds on and extends the Itanium processor
family to meet the needs of the most demanding
enterprise and technical computing environments
- Enhanced Itanium 2 features are a result of the extensible Itanium
  architecture
- Itanium 2 is binary compatible with Itanium processor software
Major enhancements include:
- Increased frequency
- Enhanced micro-architecture: more execution units, issue ports
- Efficient data handling: higher bandwidth and reduced latencies
Intel Itanium 2 Processor
Platforms
Performance Scaling
Scale-Out
(Cluster)
Scale-Up
(SMP, ccNUMA)
Platform sizes shown: DP/1U, DP/2U, 4P/4U, 4P/8P/16P, 16P, 32P, 64-512P
Increase Capacity and Capability
Scaling Out and Scaling Up
Scaling Right
Do more, better and faster at lower costs.
Itanium Processor Family
OEM Server Designs
4P
8-16P
>16P
17 OEMs Shipping
>20 OEM Platforms 10 OEM Designs
4 OEMs Shipping
6 OEM Designs
1 OEM Shipping
Itanium 2
(Madison)
Itanium
Processor
Substantial and growing investment by OEMs in custom high-end platforms
Itanium 2 Systems
High-end Itanium 2-based systems: >2X more than Itanium!
Vendor       Configuration    Availability
Racksaver    DP/1U            1H 2003
Intel        4P/4U, 2P/2U     Q4 2002 / Q2 2003
Unisys       16P              Q4 2002
NEC          32P              Shipping
SGI          64/512P          Early 2003
IBM          4P/8P/16P        Early 2003
HP           DP/2U            Shipping
HP           2P WS            Shipping
Itanium 2-based Servers
Bringing High-End Capabilities to Intel Architecture
Large Memory Capacity
- Ex. 4P node w/48GB
- 512P+ system w/512GB
Scalable to High-End Multi-Processing
- 32P+ SMP systems
- 512P+ clustered configurations
High-Bandwidth, Flexible I/O
- Large qty PCI-X slots
- Dual GbE LAN
- Ultra 320 SCSI
- Remote I/O capabilities
Partitioning
- Multiple system images
- Static/dynamic domains
High-End RAS
- Intelligent Platform Management
- Hardware redundancy for fault tolerance
- Modular and hot-plug capabilities
Selected examples of some high-end OEM platform capabilities. Not all capabilities are found on all platforms.
OEMs will offer datacenter computing capabilities
with their Itanium 2-based servers
The Chipset
[Diagram labels: Processors; Chipset (Memory & I/O Controller, Memory Bridge, I/O Bridge); Memory modules; I/O Devices]
The chipset is a key ingredient to platform design and performance
Hewlett-Packard zx1 Chipset
HP scalable processor chipset zx1: 3 modular components
- HP zx1 memory & I/O controller
- HP zx1 I/O adapter
- HP zx1 scalable memory expander
Optimized for:
- 1-2P workstations
- 2-4P servers
Designed for great cost & performance
- Great developers' desktops
- High-performance clusters
Features:
- 6.4 GB/s processor bandwidth
- 12.8 GB/s memory bandwidth
- 4.0 GB/s I/O bandwidth
- Extremely low latency
[Diagram labels: Intel Itanium 2 processors (1-4), HP zx1 chipset, DIMMs, PCI bus, PCI-X bus, AGP bus]
HP Itanium 2 Processor based Systems
- 1 GHz Itanium 2, 4-way, HP zx1 chipset
- 900MHz/1GHz Itanium 2, 1-2 way, HP zx1 chipset, AGP4X OEM graphics
- 900MHz Itanium 2, 1-way, HP zx1 chipset, AGP4X OEM graphics
Itanium 2 Workstations
HP zx6000 HP zx2000
NEC Itanium 2-based Server
TX7 Series
"TX7/i6010,i6510,i9010/i9510"
LINPACK HPC of 101.77 GFLOPS on 32 CPUs
http://www.nec.co.jp/press/en/0207/0901.html
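The LINPACK figure can be put in context by comparing it with a theoretical peak. The slide does not state the clock rate of the CPUs used, so the sketch below assumes 1 GHz Itanium 2 processors and 4 floating-point operations per clock (two FMAC units, each counted as a multiply plus an add); both values are assumptions made for illustration.

```python
def peak_gflops(cpus, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS for a set of identical CPUs."""
    return cpus * clock_ghz * flops_per_cycle


MEASURED_GFLOPS = 101.77      # from the slide
CPUS = 32                     # from the slide
ASSUMED_CLOCK_GHZ = 1.0       # assumption: not stated on the slide
ASSUMED_FLOPS_PER_CYCLE = 4   # assumption: 2 FMACs x (multiply + add)

peak = peak_gflops(CPUS, ASSUMED_CLOCK_GHZ, ASSUMED_FLOPS_PER_CYCLE)
efficiency = MEASURED_GFLOPS / peak

print(f"Assumed peak: {peak:.0f} GFLOPS")
print(f"Efficiency:   {efficiency:.1%}")
```

If the actual clock rate differed, the peak and the efficiency would change accordingly.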
Shared Memory via ccNUMA
http://www.sgi.com/features/2003/jan/altix/
Intel E8870 Chip-Set
Intel E8870 Block Diagram
System Bus:
- 16 Bytes Wide
- Double Pumped
- 200MHz/400MT/s
- 6.4 GB/sec
Memory:
- Quad Memory Channels
- 6.4 GB/sec peak
- 16 DDR DIMM Sites
- 32 GB max
I/O Busses:
- Hot Plug PCI-X up to 133MHz
- Direct Attached InfiniBand*
Hub Interface 2.0:
- 4 pt-to-pt Busses
- 16 Bits Wide
- A Total of 4 GB/sec
Scalability Ports:
- 2 pt-to-pt Connects
- 16 Bits Wide
- 6.4 GB/sec Full Duplex
[Block diagram labels: Processor (x4), System Bus, Data Bus, 870 SNC, MRHD (x4), 4 Memory Channels, SP0, SP1, 870 SIOH, HL2, 870 P64H2 (x3), 870 VXB, InfiniBand*, 870 ICH, HL1 @ 266MB/s, PCI 32/33, Video, LPC, FWH, BMC, SCSI, LAN; bus speed labels 100 and 133]
Intel E8870 Chipset Architecture
Key Features
- Open platform architecture
  - Efficient use of building blocks
  - End user ease of upgrade value
- Versatile chipset spanning multiple segments
  - 4 and 8 way servers
  - Scalability port building block enables up to 512 way configurations
- Balanced system performance
  - Memory, scalability port, I/O bandwidth
  - Maximizes system throughput
- Persistent/scalable interfaces
  - Reuse spans processor generations
  - Systems scalability headroom
- Robust RAS features
[Diagram labels: Processors, Memory, Scalability Node Controller, Scalability Port Switch, I/O Hub, PCI-(X) Bridge, Legacy I/O]
4-Way, 4U, High Performance, Modular Platform
- 4-way Itanium 2
- Intel E8870 chipset
- 16 DDR DIMMs (32GB)
- PCI-X up to 133MHz
- Lower MTBR
- Tool-less insertion/extraction
- Blind mate modules
- No cable assembly
Intel Itanium 2 Processor
Software Environment
C/C++ Data Model
OS implements the data models
- ILP32: int, long and pointer are 32 bits; used by 32-bit OSs
- LP64: int is 32 bits; long and pointer are 64 bits; used by 64-bit UNIX/Linux OSs
- P64: int and long are 32 bits; pointer is 64 bits; used by Win64* and Modesto*
Size (bits), default settings:
           ILP32   LP64   P64
int        32      32     32
long       32      64     32
pointer    32      64     64
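A practical consequence of these models is whether a pointer can be stored in a long without truncation, a common assumption in older C code. The sketch below encodes the size table above as data and checks that property per model; the dictionary and function names are illustrative, not part of any toolchain.

```python
# Type sizes in bits for each data model, as listed in the table.
DATA_MODELS = {
    "ILP32": {"int": 32, "long": 32, "pointer": 32},
    "LP64": {"int": 32, "long": 64, "pointer": 64},
    "P64": {"int": 32, "long": 32, "pointer": 64},
}


def pointer_fits_in_long(model):
    """True if a pointer value survives a round trip through a long."""
    sizes = DATA_MODELS[model]
    return sizes["long"] >= sizes["pointer"]


for name in DATA_MODELS:
    verdict = "safe" if pointer_fits_in_long(name) else "truncates"
    print(f"{name}: storing a pointer in a long {verdict}")
```

Code that casts pointers to long therefore ports cleanly to LP64 systems but needs a pointer-sized integer type under P64.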
OSV Support for Itanium Processor Family
OpenVMS, NonStop Kernel, Converged Enterprise UNIX
HP-UX*: Fully supported 1.5 release now, Version 1.6 update planned for 2H '02
Red Hat*, SuSE*, Caldera*, Turbolinux* Linux in production today
- Windows* XP 64 bit for 1-2 way workstations in production today
- 64-bit version of Windows* Advanced Server, Limited Edition available for early adopters now
- Windows .Net Server scheduled for 1H03
- Port to Itanium architecture underway
- Developer versions target 03, production versions in 04
High-End Enterprise Applications
(Databases, Business Intelligence, ERP / SCM)
Beta version since Q1 02,
focusing on optimization
Developer version available
since Q4 01
DB2 early adopter release
available since 2H 01
Engaged with early adopter
end-users, strong performance
Production version targets
mid-02, performance for large
data sets
Initial porting work complete,
optimization on-going
Future product plans from: Ariba, Autonomy, BEA, BMC Software, Check Point
Software, Citrix, Commvault, Computer Associates, Covalent, Entrust, IBM WebSphere,
Informix, Intershop, J D Edwards, Manugistics, MigraTEC, Network Associates, Nuance,
Oasis, Oblix, Openshop, TimesTen, Tivoli Systems, Verisign, Veritas, Zeus
Software Solution Support
Comprehensive Itanium Processor Software Development Environment Available Today
Operating Systems
- Microsoft Windows* Advanced Server Limited Edition
- Microsoft Windows* XP 64-Bit Edition
- Red Hat Linux 7.1 with 2.4 Kernel
- Suse Linux 7.2 with 2.4.4 Kernel
- Turbolinux 7 with 2.4 Kernel
- Caldera OpenLinux* Server 64 Release 3.1 with 2.4 Kernel
- HP-UX* 11i
Compilers
- Intel C++ 6.0 compiler
- Intel Fortran 6.0 compiler
- Microsoft Platform SDK
- Linux GCC 2.96-RH compiler
- Linux GCC 3.01 compiler
- HP C, aC++, Fortran 90 compilers
Java Software Tools
- IBM JDK1.3.0* (beta)
- Appeal Jrockit JVM
- GNU Compiler for Java (GCJ)
- HP-UX* SDK and RTE 1.3
Performance Tools & Library
- Intel Integrated Performance Primitives 1.1
- Intel Math Kernel Library 5.1
- VTune Performance Analyzer 6.0
- Intel KAI KAP/PRO* Toolset
Software Solution Developer: Itanium Production Systems, Itanium 2 SDVs (Sep 02) & Additional 3rd Party Tools
Many other tools available that are not captured on this slide.
Intel Software Tools
Optimized for on
Intel Software Development Tools
SW Products:
- Compilers
- Intel Threading Tools
- VTune Performance Analyzer
- Performance Libraries
Developer Services:
www.intel.com/ids
Intel Compilers
- Targeted for Intel Architecture based Windows* and Linux* platforms
- Optimized for the latest Intel microprocessors:
  - Intel Pentium 4 Processor
  - Intel Xeon Processor
  - Intel Itanium Processor
  - Intel Itanium 2 Processor
- Auto-vectorization and OpenMP support
- Integration of CVF technologies in 2003
http://developer.intel.com/software/products/compilers
Development Tools for Windows*
Compilers
- MSFT C/C++ Platform SDK
- Intel C/C++
- Intel Fortran95
Performance Tools
- Intel IPP Library
- Intel MKL Library
- Intel VTune Performance Analyzer
- Intel KAI KAP/Pro* Toolset
Java
- IBM JDK
- BEA JRockit* JDK
- TowerJ*
Development Tools for Linux*
Compilers
- GNU gcc
- Intel C/C++
- Intel Fortran95
Performance Tools
- Intel IPP Library
- Intel MKL Library
- Intel VTune Performance Analyzer Collector
- Intel KAI KAP/Pro* Toolset
- Linux glibc Library
Java
- IBM JDK
Intel Software Toolset
KAI OpenMP       Intel Compilers
                 Intel Performance Libraries
                 Intel VTune Perf. Analyzer
KAI Assure       Intel Thread Checker
KAI GuideView    Intel Thread Profiler
being integrated during 2003
Intel Compiler Architecture
Front ends:
- C/C++ Front End
- FORTRAN 77/95 Front End
Optimization phases:
- Interprocedural analysis and optimizations: inlining, constant prop, whole program detect, mod/ref, points-to
- Loop optimizations: data deps, prefetch, scalar repl, unroll/interchange/fusion/dist, auto-parallel/OpenMP
- Global scalar optimizations: partial redundancy elim, dead store elim, strength reduction, dead code elim
- Code generation: predication, software pipelining, global scheduling, register allocation, code generation
Side blocks in the diagram:
- Disambiguation: types, array, pointer, structure, directives, ld safety
- Machine Model
- Profiler
Intel Compilers Version 7.0
- Released November 2002
- Improved stability and optimization
- Supports Itanium 2 (-tpp2)
- More OpenMP 2.0 support
- Improved C99 standard support
- Improved gcc compatibility
- More and better reporting switches
- New Fortran directives (e.g. PREFETCH)
- Bridge to Version 8.0 (CVF to IVF), improved compatibility with CVF
Performance Counter
Light-weight performance analysis tool to complement VTune
Leverages HP's excellent pfmon on Itanium Architecture Linux64
Intel Performance Libraries
Intel MKL (Math Kernel Library)
- Highly optimized library to provide high performance on critical
  kernel operations in science and engineering
- Parallelism built into the library for automatic SMP support
- Vector Math Library (VML)
Intel IPP (Integrated Performance Primitives)
- Highly optimized functions to provide high performance on
  critical kernel operations for multi-media data types
- Available on multiple platforms to increase the portability of
  performance-based applications
http://developer.intel.com/software/products/perflib
VML Performance
[Chart; series labels: X87, Pentium 4 Processor, Pentium III Processor, Itanium Processor]
Worldwide Support & Solution Centers
OEM
OEM SI/SP
General Optimizations
-O0: disables optimization
-O1: optimizes for speed without increasing code size
-O2: optimizes for speed (default)
-O3: enables -O2 plus more aggressive optimizations, may not improve performance for all programs
-tpp2: Itanium 2 code generation (instruction mix)
-fno-alias: assumes no aliasing in program (may be unsafe)
-align: analyzes and reorders memory layout for variables and arrays (Fortran only)
-pad: enables changing variable and array memory layout (Fortran only)
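Why -fno-alias may be unsafe can be sketched in C. The function names below (add_one, add_one_noalias) are hypothetical and do not come from the slides; the second version uses the C99 restrict qualifier, which makes the same "no overlap" promise for individual pointers rather than for the whole program.

```c
#include <stddef.h>

/* Hypothetical helper: dst and src may overlap, so the compiler has to
   complete each store to dst[i] before it loads src[i + 1]. */
void add_one(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
}

/* Same loop, but restrict promises that dst and src never overlap.
   The compiler is then free to reorder or pipeline loads and stores.
   Calling this with overlapping arrays is undefined behavior. */
void add_one_noalias(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
}
```

With a zero-filled array a of five ints, add_one(a + 1, a, 4) has to produce 0 1 2 3 4, because every store feeds the next load. A switch that assumes no aliasing lets the compiler ignore that dependence, which is why it is only safe when the program never passes overlapping pointers.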
Interprocedural Optimization
Extends optimizations across file boundaries.
Without IPO (or with -ip): file1.c, file2.c, file3.c and file4.c are each compiled and optimized separately.
With IPO: file1.c, file2.c, file3.c and file4.c are compiled and optimized together.
How IPO Works
1. Compile program: icc -c -ipo foo.c
   Output: foo.o (fake object file); foo.il (un-optimized intermediate language files)
2. Link program: icc -o foo -ipo foo.o
   2a. Compiler performs whole-program optimizations
   2b. Compiler invokes linker to produce executable
   Output: foo (optimized executable)
Programs that Benefit from IPO
Many small utility functions
Frequent constructor/destructor invocation
One-liner member functions
Lynx success story
Intel Spice-like circuit simulator
Highly tuned algorithmically
Intel compiler (icc) with -O2 & IPO:
1.2x - 5.2x speedup over gcc -O (2x typical)
1x - 2.4x speedup over icc -O2 (1.2x typical)
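A minimal sketch of the "many small utility functions" case follows. The file names in the comments and the functions clamp and sum_clamped are hypothetical, chosen only for illustration. In a real project the two functions would sit in separate source files, so a compiler working one file at a time could not inline the call inside the loop.

```c
#include <stddef.h>

/* --- imagine this in util.c (hypothetical file name) --- */
/* Small utility function: cheap body, call overhead dominates. */
int clamp(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* --- imagine this in main.c (hypothetical file name) --- */
/* Hot loop whose only call crosses a file boundary. With whole-program
   optimization the compiler can see clamp's body and inline it here. */
long sum_clamped(const int *v, size_t n, int lo, int hi) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += clamp(v[i], lo, hi);
    return total;
}
```

For the input {-5, 3, 12} with bounds 0 and 10, sum_clamped returns 13 regardless of whether the call is inlined; only the cost of getting there changes.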
Profile-Guided Optimization
Benefits:
More accurate branch prediction
Better register allocation
Improved IPO inlining
Basic block movement
status = UtilityFunc (arg1, arg2, arg3);
if (status != 0) // Not expected to fail
HandleErr (status);
Improves I-cache behavior
Available for Itanium and IA-32
Feedback of profile data gathered during program execution to improve subsequent builds.
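The basic block movement fragment on this slide can be expanded into a compilable sketch. The bodies of UtilityFunc and HandleErr below are stand-ins invented for illustration, as is the process function; only the call pattern comes from the slide. A profile of a typical run would show the HandleErr branch as rarely taken, which is the information the compiler needs to move that block out of the hot path.

```c
#include <stdio.h>

static int error_count = 0;  /* illustrative bookkeeping only */

/* Stand-in for the slide's UtilityFunc: fails only on negative input. */
int UtilityFunc(int arg1, int arg2, int arg3) {
    (void)arg2;
    (void)arg3;
    return arg1 < 0 ? -1 : 0;
}

/* Stand-in for the slide's HandleErr: records and reports the failure. */
void HandleErr(int status) {
    error_count++;
    fprintf(stderr, "error: status %d\n", status);
}

/* Hot loop with a cold error branch; returns the number of errors seen. */
int process(const int *items, int n) {
    for (int i = 0; i < n; i++) {
        int status = UtilityFunc(items[i], 0, 0);
        if (status != 0)        /* not expected to fail */
            HandleErr(status);
    }
    return error_count;
}
```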
How PGO Works
1. Compile+link to add instrumentation: icc -o foo -prof_gen foo.c
   Output: foo (instrumented executable)
2. Execute instrumented program: ./foo
   Output: 12345678.dyn (dynamic profile)
3. Compile+link using feedback: icc -o foo -prof_use foo.c
   Input: pgopti.dpi (merged .dyn files)
   Output: foo (optimized executable)
Programs that Benefit from PGO
Consistent hot paths
Many if statements or switches
Nested if statements or switches
[Diagram: scale running from "Significant Benefit" to "Little Benefit"]
Itanium Tuning Tips
Enable the Compiler
Software pipelining of key loops
Pointer disambiguation in C codes
Interprocedural Optimization
Profile guided optimizations
Utilize Cache Hierarchy (spatial & temporal locality)
Use tuned libraries
Use tuning tools
Intel VTune Performance Analyzer
Use Web resources
http://developer.intel.com/itanium
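The cache hierarchy tip can be illustrated with a small C sketch of spatial locality. The function names are hypothetical, and the matrix is assumed to be stored row by row in one flat array. Both functions add up the same elements; they differ only in the order in which memory is touched.

```c
#include <stddef.h>

/* Walks the matrix in storage order: consecutive accesses are adjacent
   in memory, so each cache line fetched is fully used. */
double sum_row_major(const double *m, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += m[i * n + j];
    return s;
}

/* Walks the matrix column by column: consecutive accesses are n
   elements apart, so for large n most of each cache line goes unused. */
double sum_col_major(const double *m, size_t n) {
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += m[i * n + j];
    return s;
}
```

How large the difference is depends on the matrix size and the cache sizes of the target processor, so it is worth measuring rather than assuming.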
Thank You.
www.intel.com
