CS 352: Computer Systems Architecture

Lecture 1: What is Computer Architecture?
January 22, 2007
Doug Burger
Computer Architecture and Technology Laboratory
University of Texas at Austin
dburger@cs.utexas.edu
UTCS Lecture 1 1
Goals
• Understand the “how” and “why” of computer system organization
– Instruction Set Architecture
– System Organization (processor, memory, I/O)
– Microarchitecture
• Learn methods of measuring and improving performance
– Metrics
– Benchmarks
– Performance methods
• Pipelining, ILP, prediction
• Learn to think and program concurrently
Logistics
Lectures: M/W 9:00-10:15 am, GEO 2.102
Instructor: Prof. Doug Burger
TA: Dong Li
Grading:
– Final Exam (1): 25%
– Midterm Exams (2): 15% each
– Homework (~7): 25%
– Project (1): 20%
Texts:
– Patterson & Hennessy, Computer Organization and Design (Third Edition)
– Course Readings (handed out in class)
CS352 Online
URL: Blackboard!
Other stuff off of my home page
(Course materials, research info)
Computer Architecture Seminar Series: www.cs.utexas.edu/users/cart/arch
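Levels of Abstraction (figure, top to bottom)
• Specification: compute the Fibonacci sequence
• Program: for(i=2; i<100; i++) { a[i] = a[i-1]+a[i-2]; }
• ISA (Instruction Set Architecture): load r1, a[i]; add r2, r2, r1;
• Microarchitecture (figure label: registers)
• Logic (figure labels: inputs A, B; output F)
• Transistors (figure labels: D, G, S)
• Physics/Chemistry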
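The Program layer of the stack above can be made concrete with a short sketch. The slide's loop does not show how a[0] and a[1] are initialized, so the seeds 0 and 1 below are an assumption, as is the function name fibonacci_table. Python is used here instead of the slide's C-style syntax so that the larger entries do not overflow a fixed-width integer.

```python
def fibonacci_table(n=100, seed0=0, seed1=1):
    """Fill a table the way the slide's loop does: a[i] = a[i-1] + a[i-2]."""
    if n < 2:
        raise ValueError("need at least two entries to hold the seeds")
    a = [0] * n
    a[0], a[1] = seed0, seed1      # assumed seeds; the slide omits them
    for i in range(2, n):          # mirrors for(i=2; i<100; i++)
        a[i] = a[i - 1] + a[i - 2]
    return a


if __name__ == "__main__":
    print(fibonacci_table()[:10])
```

Each lower layer of the stack implements the same computation in finer detail: the ISA level turns the array update into loads and adds, and the layers below realize those instructions in registers, gates, and transistors.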
CS352 Topics
• Technology Trends
• Instruction set architectures
• Pipelining
• Modern pipelined architectures
– Dynamic ILP machines
– Static ILP machines
• Cache memory systems
• Virtual memory
• Multiprocessors
• Computer system implementation
What is Computer Architecture?
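(Figure: the Computer Architect at the center, surrounded by five concerns)
• Interfaces (figure labels: API, Link, ISA, I/O Chan)
• Technology
• Machine Organization (figure labels: IR, Regs)
• Applications
• Measurement & Evaluation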
Technology Constraints
• Yearly improvement
– Semiconductor technology
• 60% more devices per chip (doubles every 18 months)
• 15% faster devices (doubles every 5 years)
• Slower wires
– Magnetic Disks
• 60% increase in density
– Circuit boards
• 5% increase in wire density
– Cables
• no change
(Figure: chips from 1989, 1992, 1995, 1998, 2002. 100x more devices since 1989, 8x faster devices)
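The doubling periods quoted on this slide follow from the yearly rates by compound growth. A minimal sketch of that arithmetic, with function names chosen here for illustration:

```python
import math


def doubling_time_years(annual_rate):
    """Years needed to double at a compound annual growth rate (0.60 = 60%)."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate, years):
    """Total multiplicative growth after the given number of years."""
    return (1 + annual_rate) ** years


if __name__ == "__main__":
    for label, rate in [("devices per chip", 0.60), ("device speed", 0.15)]:
        print(f"{label}: doubles every {doubling_time_years(rate):.2f} years")
```

At 60% per year the doubling time comes out close to 18 months, and at 15% per year close to 5 years, which matches the parenthetical figures on the slide.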
Changing Technology leads to Changing Architecture
• 1970s (CISC mainframes)
– multi-chip CPUs
– semiconductor memory very expensive
– microcoded control
– complex instruction sets (good code density)
• 1980s (RISC micros)
– single-chip CPUs, on-chip RAM feasible
– simple, hard-wired control
– simple instruction sets
– small on-chip caches
• 1990s (fast clocks)
– lots of transistors
– complex control to exploit instruction-level parallelism
• 2000s (???)
– even more transistors
– slow wires
– BIG SHIFT Here!!!
• Parallelism is focus
• Power now critical
• Open debate
Intel 4004 - 1971
• The first
microprocessor
• 2,300 transistors
• 108 kHz
• 10 µm process
Intel Pentium IV - 2001
• “State of the art”
• 42 million transistors
• 2 GHz
• 0.13 µm process
• Could fit ~15,000 4004s on this chip!
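As a rough cross-check of the scale of change between the two chips, the figures from the two slides can be compared directly. The slide does not say how the ~15,000 estimate was derived, so the ratios below are only an order-of-magnitude comparison; the dictionary and function names are illustrative.

```python
# Figures copied from the Intel 4004 and Pentium IV slides.
I4004 = {"transistors": 2_300, "clock_hz": 108_000, "feature_um": 10.0}
PENTIUM4 = {"transistors": 42_000_000, "clock_hz": 2_000_000_000, "feature_um": 0.13}


def ratio(numerator, denominator, key):
    """Ratio of one chip's figure to the other's for the given key."""
    return numerator[key] / denominator[key]


if __name__ == "__main__":
    print("transistor count ratio:", round(ratio(PENTIUM4, I4004, "transistors")))
    print("clock rate ratio:      ", round(ratio(PENTIUM4, I4004, "clock_hz")))
    print("feature size shrink:   ", round(ratio(I4004, PENTIUM4, "feature_um")))
```

The transistor-count ratio lands in the same order of magnitude as the slide's estimate, and the clock rate grew by a similar factor over the same 30 years.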
Application Constraints
• Applications drive machine ‘balance’
– Numerical simulations
• floating-point performance
• main memory bandwidth
– Transaction processing
• I/Os per second
• integer CPU performance
– Decision support
• I/O bandwidth
– Embedded control
• I/O timing, power
– Media processing
• low-precision ‘pixel’ arithmetic
Interface Design
• A good interface
– lasts through several generations of implementations
• IBM 360 and x86 ISAs, DOS APIs
– is simple - ‘economy of mechanism’
• Interfaces are visible, Implementations generally aren’t
• 3 Types of Interfaces
– Between Layers
• API, ISA
– Between Modules
• Network protocol (Ethernet), I/O channel or bus (SCSI or PCI)
– Standard Representations
• ASCII, IEEE floating-point
Instruction-Set Architecture
Hardware/Software Interface
• Software impact
– support OS functions
• restartable instructions
• memory relocation and protection
– a good compiler target
• simple
• orthogonal
– dense
• Hardware impact
– admits efficient implementation
• across generations
– admits parallel implementation
• no ‘serial’ bottlenecks
• Abstraction without interpretation
(Figure: example instruction formats. One is OP R1 R2 R3 imm; the other is OP M1 R1 M2 R2 im2 ... M3 R3 im2)
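To illustrate why a simple, regular format like OP R1 R2 R3 imm is an easy compiler target and cheap to decode, here is a minimal sketch of packing and unpacking such an instruction. The field widths (6-bit opcode, three 5-bit register fields, 11-bit unsigned immediate, 32 bits in total) are assumptions made for this example; the slide gives only the field names.

```python
# Assumed field widths (not from the slide): 6 + 5 + 5 + 5 + 11 = 32 bits.
FIELDS = (("op", 6), ("r1", 5), ("r2", 5), ("r3", 5), ("imm", 11))


def encode(**values):
    """Pack the named fields into one 32-bit word, most significant field first."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Unpack a word produced by encode() back into its fields."""
    fields = {}
    for name, width in reversed(FIELDS):   # peel fields off the low end
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


if __name__ == "__main__":
    word = encode(op=3, r1=1, r2=2, r3=31, imm=1000)
    print(f"{word:#010x}", decode(word))
```

Because every field sits at a fixed bit position, decoding needs only shifts and masks. A format whose length depends on earlier fields, like the second one in the figure, requires inspecting those fields before the rest can be located.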
System-Level Organization
• Design at the level of processors, memories, and interconnect.
• More important to application performance than CPU design
• Feeds and speeds
– constrained by IC pin count, module pin count, and signaling rates
• System balance
– for a particular application
• Driven by
– performance/cost goals
– available components (cost/perf)
– technology constraints
(Figure labels: P, 800MHz, 4-way Issue; 16Bytes x 200MHz; SW; Display, Net, I/O, Disk; M M M M)
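A "feeds and speeds" check can be sketched from the labels in the figure. The reading assumed here is that "16Bytes x 200MHz" describes a 16-byte-wide path clocked at 200 MHz and that the processor issues up to 4 instructions per cycle at 800 MHz; the figure itself is not available, so treat that interpretation and the function names as assumptions.

```python
def link_bandwidth_bytes_per_s(width_bytes, clock_hz):
    """Peak bandwidth of a path that moves width_bytes on every clock."""
    return width_bytes * clock_hz


def peak_issue_rate_per_s(clock_hz, issue_width):
    """Peak instructions per second for a multiple-issue processor."""
    return clock_hz * issue_width


def bytes_per_instruction(bandwidth, issue_rate):
    """Bytes the path can deliver per instruction at peak rates."""
    return bandwidth / issue_rate


if __name__ == "__main__":
    bw = link_bandwidth_bytes_per_s(16, 200_000_000)
    ips = peak_issue_rate_per_s(800_000_000, 4)
    print(f"{bw / 1e9:.1f} GB/s, {ips / 1e9:.1f} G instr/s, "
          f"{bytes_per_instruction(bw, ips):.1f} bytes per instruction")
```

Under these assumptions the path supplies about one byte per peak instruction, so whether the system is balanced depends on how much data each instruction of a particular application actually needs from memory.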
Microarchitecture
• Register-transfer-level (RTL) design
• Implement instruction set
• Exploit capabilities of technology
– locality and concurrency
• Iterative process
– generate proposed architecture
– estimate cost
– measure performance
• Current emphasis is on overcoming sequential nature of programs
– deep pipelining
– multiple issue
– dynamic scheduling
– branch prediction/speculation
(Figure labels: PC, Instr. Cache, IR, Regs, A, B, C)
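The appeal of pipelining can be seen with an idealized cycle count. The sketch below assumes a k-stage pipeline with no stalls or hazards, and an unpipelined baseline that spends one cycle in each stage per instruction; both are simplifying assumptions for illustration, not a model of any specific machine.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Ideal pipeline: fill once, then complete one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined cycle counts under the ideal model."""
    return (unpipelined_cycles(n_instructions, n_stages)
            / pipelined_cycles(n_instructions, n_stages))


if __name__ == "__main__":
    for n in (10, 100, 1_000_000):
        print(n, "instructions:", round(ideal_speedup(n, 5), 3))
```

As the instruction count grows, the ideal speedup approaches the number of stages. Branches and data dependences break the no-stall assumption, which is where prediction, speculation, and dynamic scheduling come in.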
Performance Measurement and Evaluation
Many Dimensions to Performance

• CPU execution time
– by instruction or sequence
• floating point
• integer
• branch performance
• Cache bandwidth
• Main memory bandwidth
• I/O performance
– bandwidth
– seeks
– pixels or polygons per second
• Relative importance depends on applications
(Figure labels: P (processor), $ (cache), M (memory))
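CPU execution time "by instruction" can be estimated by weighting each instruction class by how often it occurs. The sketch below uses the instruction mix shown on the Evaluation Tools slide (MOVE 39%, BR 20%, LOAD 20%, STORE 10%, ALU 11%); the cycles-per-class values are hypothetical numbers chosen for illustration, not measurements of any machine.

```python
# Mix percentages from the Evaluation Tools slide.
MIX_PERCENT = {"MOVE": 39, "BR": 20, "LOAD": 20, "STORE": 10, "ALU": 11}

# Hypothetical cycles per instruction class (assumed for this example).
ASSUMED_CYCLES = {"MOVE": 1, "BR": 2, "LOAD": 2, "STORE": 2, "ALU": 1}


def average_cpi(mix_percent, cycles):
    """Mix-weighted average cycles per instruction."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    return sum(mix_percent[k] * cycles[k] for k in mix_percent) / 100


def execution_time_s(instruction_count, cpi, clock_hz):
    """CPU time = instruction count x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz


if __name__ == "__main__":
    cpi = average_cpi(MIX_PERCENT, ASSUMED_CYCLES)
    print("average CPI:", cpi)
    print("time for 1e9 instructions at 2 GHz:",
          execution_time_s(1_000_000_000, cpi, 2_000_000_000), "s")
```

Changing the mix changes the answer even on the same hardware, which is one reason the relative importance of each dimension depends on the application.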
Evaluation Tools
• Benchmarks, traces, & mixes
– macrobenchmarks & suites
• application execution time
– microbenchmarks
• measure one aspect of performance
– traces
• replay recorded accesses
• cache, branch, register
• Simulation at many levels
– ISA, cycle accurate, RTL, gate, circuit
• trade fidelity for simulation rate
• Area and delay estimation
• Analysis
– e.g., queuing theory
(Figure: example instruction mix: MOVE 39%, BR 20%, LOAD 20%, STORE 10%, ALU 11%. Example trace: LD 5EA3, ST 31FF, …., LD 1EA2, ….)
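Trace-driven evaluation replays a recorded access stream through a model of the component under study. The sketch below replays the three addresses visible in the slide's trace through a direct-mapped cache. The cache geometry (16 lines of 16 bytes) and the choice to allocate a line on stores as well as loads are assumptions for this example, and the elided parts of the trace are left out rather than guessed.

```python
def replay(trace, n_lines=16, line_bytes=16):
    """Replay (op, address) records through a direct-mapped cache model.

    Returns (hits, misses). Loads and stores are treated alike: a miss
    allocates the line (an assumed policy for this sketch).
    """
    tags = [None] * n_lines
    hits = misses = 0
    for _op, addr in trace:
        block = addr // line_bytes     # which memory block the address is in
        index = block % n_lines        # which cache line that block maps to
        tag = block // n_lines         # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # replace whatever was there
    return hits, misses


# The three records visible on the slide; elided records are not reconstructed.
TRACE = [("LD", 0x5EA3), ("ST", 0x31FF), ("LD", 0x1EA2)]

if __name__ == "__main__":
    print("hits, misses:", replay(TRACE))
```

A model like this runs far faster than a cycle-accurate or RTL simulation but captures only the behavior it models, which is the fidelity-for-speed trade the slide describes.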
Next Time
• Evaluation of Systems
– Performance
• Amdahl’s Law, CPI
– Cost
• Computer system elements
– Transistors and wires
• Reading assignment
– P&H Chapter 1, 2.1-2.4