
CS 352: Computer Systems Architecture

Lecture 1: What is Computer Architecture?

January 22, 2007

Doug Burger
Computer Architecture and Technology Laboratory
University of Texas at Austin
dburger@cs.utexas.edu

UTCS Lecture 1 1
Goals

• Understand the “how” and “why” of computer system organization
– Instruction Set Architecture
– System Organization (processor, memory, I/O)
– Microarchitecture
• Learn methods of measuring and improving performance
– Metrics
– Benchmarks
– Performance methods
• Pipelining, ILP, prediction
• Learn to think and program concurrently

Logistics

Lectures     M/W 9:00-10:15 am, GEO 2.102
Instructor   Prof. Doug Burger
TA           Dong Li

Grading      Item            Count   Weight
             Final Exam      1       25%
             Midterm Exam    2       15% each
             Homework        ~7      25%
             Project         1       20%

Texts        Patterson & Hennessy, Computer Organization and Design (Third Edition)
             Course Readings (handed out in class)

CS352 Online

URL: Blackboard!
Other stuff off of my home page
(Course materials, research info)

Computer Architecture Seminar Series: www.cs.utexas.edu/users/cart/arch

[Figure: levels of abstraction, top to bottom]

Specification: compute the Fibonacci sequence
Program: for (i = 2; i < 100; i++) { a[i] = a[i-1] + a[i-2]; }
ISA (Instruction Set Architecture): load r1, a[i]; add r2, r2, r1;
microArchitecture: registers
Logic: gate with inputs A and B and output F
Transistors: devices with gate (G), drain (D), and source (S) terminals
Physics/Chemistry

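
The Program layer above can be fleshed out into a compilable C function. This is only a minimal sketch: the slide does not give the starting values, so the seeds a[0] = 0 and a[1] = 1 are an assumption, as are the function name `fill_fibonacci` and the choice of 64-bit unsigned elements. With those seeds the terms stop fitting in 64 bits after index 93, so the sketch takes the length as a parameter rather than hard-coding the slide's bound of 100.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a[0..n-1] with the Fibonacci sequence.
   Assumed seeds: a[0] = 0 and a[1] = 1 (not stated on the slide). */
void fill_fibonacci(uint64_t *a, size_t n)
{
    if (n > 0) a[0] = 0;
    if (n > 1) a[1] = 1;
    /* Same recurrence as the slide's loop: each term is the sum of the previous two. */
    for (size_t i = 2; i < n; i++) {
        a[i] = a[i - 1] + a[i - 2];
    }
}
```

Calling `fill_fibonacci(a, 94)` on a 94-element array fills every term that fits in a `uint64_t`; beyond that, unsigned arithmetic wraps around silently.
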
CS352 Topics

• Technology Trends
• Instruction set architectures
• Pipelining
• Modern pipelined architectures
– Dynamic ILP machines
– Static ILP machines
• Cache memory systems
• Virtual memory
• Multiprocessors
• Computer system implementation

What is Computer Architecture?

[Figure: the Computer Architect at the center, surrounded by Interfaces (API, Link, ISA, I/O Chan), Technology, Machine Organization (IR, Regs), Applications, and Measurement & Evaluation]

Technology Constraints

• Yearly improvement
– Semiconductor technology
• 60% more devices per chip (doubles every 18 months)
• 15% faster devices (doubles every 5 years)
• Slower wires
– Magnetic Disks
• 60% increase in density
– Circuit boards
• 5% increase in wire density
– Cables
• no change
[Figure: chips from 1989, 1992, 1995, 1998, and 2002]
100x more devices since 1989
8x faster devices

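
The doubling times quoted above follow from the yearly rates: a quantity that grows by a fraction r each year doubles after ln 2 / ln(1 + r) years. The sketch below checks that arithmetic. The function names are illustrative, and the small series-based logarithm is only there so the example builds without linking a math library; `log` from `<math.h>` would serve equally well.

```c
/* Natural log for x > 0 using ln(x) = 2 * sum over k of z^(2k+1) / (2k+1),
   where z = (x - 1) / (x + 1). Converges quickly for x near 1. */
static double ln_series(double x)
{
    double z = (x - 1.0) / (x + 1.0);
    double z2 = z * z;
    double term = z;
    double sum = 0.0;
    for (int k = 0; k < 60; k++) {
        sum += term / (2 * k + 1);
        term *= z2;
    }
    return 2.0 * sum;
}

/* Years needed to double at a given annual growth rate (0.60 means 60% per year). */
double doubling_time_years(double annual_rate)
{
    return ln_series(2.0) / ln_series(1.0 + annual_rate);
}
```

`doubling_time_years(0.60)` comes out near 1.47 years, which is about 18 months, and `doubling_time_years(0.15)` comes out near 4.96 years, which is about 5 years.
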
Changing Technology leads to Changing Architecture

• 1970s (CISC mainframes)
– multi-chip CPUs
– semiconductor memory very expensive
– microcoded control
– complex instruction sets (good code density)
• 1980s (RISC micros)
– single-chip CPUs, on-chip RAM feasible
– simple, hard-wired control
– simple instruction sets
– small on-chip caches
• 1990s (fast clocks)
– lots of transistors
– complex control to exploit instruction-level parallelism
• 2000s (???)
– even more transistors
– slow wires
– BIG SHIFT Here!!!
• Parallelism is focus
• Power now critical
• Open debate

[Image slides; credits: courtesy Intel, courtesy Troubador]

Intel 4004 - 1971

• The first microprocessor
• 2,300 transistors
• 108 kHz
• 10 µm process
(Image courtesy Troubador)

Intel Pentium IV - 2001

• “State of the art”
• 42 million transistors
• 2 GHz
• 0.13 µm process
• Could fit ~15,000 4004s on this chip!

Application Constraints

• Applications drive machine ‘balance’
– Numerical simulations
• floating-point performance
• main memory bandwidth
– Transaction processing
• I/Os per second
• integer CPU performance
– Decision support
• I/O bandwidth
– Embedded control
• I/O timing, power
– Media processing
• low-precision ‘pixel’ arithmetic

Interface Design

• A good interface
– lasts through several generations of implementations
• IBM 360 and x86 ISAs, DOS APIs
– is simple - ‘economy of mechanism’
• Interfaces are visible, Implementations generally aren’t
• 3 Types of Interfaces
– Between Layers
• API, ISA
– Between Modules
• Network protocol (Ethernet), I/O channel or bus (SCSI or PCI)
– Standard Representations
• ASCII, IEEE floating-point

Instruction-Set Architecture

Hardware/Software Interface

• Software impact
– support OS functions
• restartable instructions
• memory relocation and protection
– a good compiler target
• simple
• orthogonal
– dense
• Hardware impact
– admits efficient implementation
• across generations
– admits parallel implementation
• no ‘serial’ bottlenecks
• Abstraction without interpretation
[Figure: example instruction formats: OP R1 R2 R3 imm; OP M1 R1 M2 R2 im2 ... M3 R3 im2]
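
To make the first format in the figure (OP R1 R2 R3 imm) concrete, here is a sketch of packing and unpacking such an instruction in a 32-bit word. The field widths (6-bit opcode, three 5-bit register numbers, 11-bit immediate) and the names `Instr`, `encode_instr`, and `decode_instr` are assumptions made for illustration; the slide does not specify them.

```c
#include <stdint.h>

/* Hypothetical layout, most significant bits first:
   op:6 | r1:5 | r2:5 | r3:5 | imm:11  (32 bits total) */
typedef struct {
    uint8_t  op;   /* 0..63 */
    uint8_t  r1;   /* 0..31 */
    uint8_t  r2;   /* 0..31 */
    uint8_t  r3;   /* 0..31 */
    uint16_t imm;  /* 0..2047 */
} Instr;

/* Pack the fields into one word; out-of-range bits are masked off. */
uint32_t encode_instr(Instr in)
{
    return ((uint32_t)(in.op  & 0x3Fu) << 26) |
           ((uint32_t)(in.r1  & 0x1Fu) << 21) |
           ((uint32_t)(in.r2  & 0x1Fu) << 16) |
           ((uint32_t)(in.r3  & 0x1Fu) << 11) |
            (uint32_t)(in.imm & 0x7FFu);
}

/* Recover the fields with shifts and masks only. */
Instr decode_instr(uint32_t word)
{
    Instr out;
    out.op  = (uint8_t)((word >> 26) & 0x3Fu);
    out.r1  = (uint8_t)((word >> 21) & 0x1Fu);
    out.r2  = (uint8_t)((word >> 16) & 0x1Fu);
    out.r3  = (uint8_t)((word >> 11) & 0x1Fu);
    out.imm = (uint16_t)(word & 0x7FFu);
    return out;
}
```

Because every field sits at a fixed bit position, decoding needs no sequential parsing, which is one way a format can avoid the ‘serial’ bottlenecks mentioned above.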

System-Level Organization
• Design at the level of processors, memories, and interconnect.
• More important to application performance than CPU design
• Feeds and speeds
– constrained by IC pin count, module pin count, and signaling rates
• System balance
– for a particular application
• Driven by
– performance/cost goals
– available components (cost/perf)
– technology constraints
[Figure: system block diagram with labels P (800MHz, 4-way Issue), 16Bytes x 200MHz, SW, I/O (Display, Net, Disk), and M M M M]
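
A small piece of “feeds and speeds” arithmetic: reading the figure label 16Bytes x 200MHz as a 16-byte-wide path clocked at 200 MHz (an interpretation, since the slide gives only the label), the peak transfer rate is the width times the clock rate. The function name and the one-transfer-per-cycle assumption are illustrative.

```c
#include <stdint.h>

/* Peak transfer rate of a link: bytes moved per transfer times transfers per second.
   Assumes one transfer per clock cycle. */
uint64_t peak_bandwidth_bytes_per_sec(uint64_t width_bytes, uint64_t clock_hz)
{
    return width_bytes * clock_hz;
}
```

For the figure's numbers, `peak_bandwidth_bytes_per_sec(16, 200000000)` gives 3,200,000,000 bytes per second, or 3.2 GB/s.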

Microarchitecture

• Register-transfer-level (RTL) design
• Implement instruction set
• Exploit capabilities of technology
– locality and concurrency
• Iterative process
– generate proposed architecture
– estimate cost
– measure performance
• Current emphasis is on overcoming sequential nature of programs
– deep pipelining
– multiple issue
– dynamic scheduling
– branch prediction/speculation
[Figure: datapath with labels PC, Instr. Cache, IR, Regs, A, B, C]

Performance Measurement and Evaluation

Many Dimensions to Performance


• CPU execution time
– by instruction or sequence
• floating point
• integer
• branch performance
• Cache bandwidth
• Main memory bandwidth
• I/O performance
– bandwidth
– seeks
– pixels or polygons per second
• Relative importance depends on applications
[Figure: hierarchy with labels P, $, M]

Evaluation Tools

• Benchmarks, traces, & mixes
– macrobenchmarks & suites
• application execution time
– microbenchmarks
• measure one aspect of performance
– traces
• replay recorded accesses
• cache, branch, register
• Simulation at many levels
– ISA, cycle accurate, RTL, gate, circuit
• trade fidelity for simulation rate
• Area and delay estimation
• Analysis
– e.g., queuing theory
[Figure: instruction mix: MOVE 39%, BR 20%, LOAD 20%, STORE 10%, ALU 11%]
[Figure: trace: LD 5EA3, ST 31FF, …., LD 1EA2, ….]
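
Replaying a recorded access trace, as listed above, can be sketched with a tiny direct-mapped cache model. Everything specific here is an assumption for illustration: the 16-byte line size, the 64 lines, the type and function names, and the policy of treating loads and stores alike.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 16u   /* assumed line size */
#define NUM_LINES  64u   /* assumed number of lines */

typedef struct {
    char     op;    /* 'L' for load, 'S' for store */
    uint32_t addr;  /* byte address from the trace */
} TraceEntry;

typedef struct {
    unsigned hits;
    unsigned misses;
} CacheStats;

/* Replay a recorded access trace through a direct-mapped cache model.
   Loads and stores are treated alike: both allocate a line on a miss. */
CacheStats replay_trace(const TraceEntry *trace, size_t n)
{
    uint32_t tags[NUM_LINES] = {0};
    int valid[NUM_LINES] = {0};
    CacheStats stats = {0, 0};

    for (size_t i = 0; i < n; i++) {
        uint32_t line  = trace[i].addr / LINE_BYTES;  /* strip the byte offset */
        uint32_t index = line % NUM_LINES;            /* which cache slot */
        uint32_t tag   = line / NUM_LINES;            /* which line occupies that slot */

        if (valid[index] && tags[index] == tag) {
            stats.hits++;
        } else {
            stats.misses++;
            valid[index] = 1;
            tags[index] = tag;
        }
    }
    return stats;
}
```

A trace such as LD 5EA3, ST 31FF, LD 5EA0, LD 1EA2 (the third entry is made up; the others appear in the figure) yields one hit, because 5EA0 falls in the line already fetched for 5EA3, and three misses. A model like this trades fidelity for simulation rate: it ignores timing entirely but runs through long traces quickly.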

Next Time

• Evaluation of Systems
– Performance
• Amdahl’s Law, CPI
– Cost

• Computer system elements
– Transistors and wires

• Reading assignment
– P&H Chapter 1, 2.1-2.4

