Documente Academic
Documente Profesional
Documente Cultură
Todd M. Austin
taustin@ichips.intel.com
Intel MicroComputer Research Labs
January, 1997
Todd M. Austin
Page 1
Tutorial Overview
Computer Architecture Simulation Primer
SimpleScalar Tool Set
q
q
Overview
Users Guide
q
q
Model Microarchitecture
Implementation Details
Hacking SimpleScalar
Looking Ahead
Todd M. Austin
Page 2
System
Inputs
Device
Simulator
System Outputs
System Metrics
Todd M. Austin
Page 3
Functional
Trace-Driven
Performance
Exec-Driven
Interpreters
Inst Schedulers
Cycle Timers
Direct Execution
Todd M. Austin
Page 4
Specification
Arch
Spec
uArch
Spec
Simulation
Arch
Sim
uArch
Sim
Development
q
q
Todd M. Austin
Page 5
q
q
Simulator
execution-driven simulation:
q
q
q
inst trace
program
Simulator
Todd M. Austin
Page 6
q
q
q
cycle-timer simulators
q
q
q
q
Todd M. Austin
Page 7
Pick
Two
Detail
Flexibility
q
q
Todd M. Austin
Page 8
Tutorial Overview
Computer Architecture Simulation Primer
SimpleScalar Tool Set
q
q
Overview
Users Guide
q
q
Model Microarchitecture
Implementation Details
Hacking SimpleScalar
Looking Ahead
Todd M. Austin
Page 9
q
q
q
q
q
q
q
Todd M. Austin
Page 10
F2C
C code
GCC
Assembly code
GAS
Simulators
object files
libf77.a
libm.a
libc.a
GLD
Executables
Bin Utils
Todd M. Austin
Page 11
Primary Advantages
extensible
q
q
portable
q
q
detailed
q
q
q
q
q
Sim-Fast: 4+ MIPS
Sim-OutOrder: 200+ KIPS
Todd M. Austin
Page 12
Sim-Safe
Sim-Profile
- 420 lines
- functional
- 4+ MIPS
- 350 lines
- functional
w/ checks
- 900 lines
- functional
- lot of stats
Sim-Cache/
Sim-Cheetah
Sim-Outorder
Performance
Detail
Todd M. Austin
Page 13
Simulator Structure
User
Programs
Prog/Sim
Interface
SimpleScalar ISA
Functional
Core
Performance
Core
Resource
Cache
Simulator
Core
Loader
Regs
Stats
Dlite!
Memory
Page 14
Tutorial Overview
Computer Architecture Simulation Primer
SimpleScalar Tool Set
q
q
Overview
Users Guide
q
q
Model Microarchitecture
Implementation Details
Hacking SimpleScalar
Looking Ahead
Todd M. Austin
Page 15
Installation Notes
follow the installation directions in the tech report, and
DONT PANIC!!!!
avoid building GLIBC
q
q
q
q
Todd M. Austin
Page 16
Todd M. Austin
Page 17
q
q
q
q
Todd M. Austin
Page 18
Page 19
breakpoints:
code:
q break <addr>
data:
q dbreak <addr> {r|w|x}
code:
q rbreak <range>
DLite! expressions
q
q
q
q
operators: +, -, /, *
literals: 10, 0xff, 077
symbols: main, vfprintf
registers: $r1, $f4, $pc, $fcc, $hi, $lo
Todd M. Austin
Page 20
Execution Ranges
specify a range of addresses, instructions, or cycles
used by range breakpoints and pipetracer (in sim-outorder)
format:
q
q
q
@main:+278
#:1000
:
Todd M. Austin
- main to main+278
- cycle 0 to cycle 1000
- entire execution (instruction 0 to end)
Page 21
Todd M. Austin
Page 22
Todd M. Austin
Page 23
Todd M. Austin
Page 24
example applications:
-pcstat sim_num_insn
-pcstat sim_num_refs
-pcstat il1.misses
-pcstat bpred_bimod.misses
- execution profile
- reference profile
- L1 I-cache miss profile (sim-cache)
- br pred miss profile (sim-outorder)
Page 25
example output:
executed
13 times
never
executed
{
{
{
00401a10: ( 13,
strtod.c:79
00401a18: ( 13,
strtod.c:87
00401a20:
00401a28:
strtod.c:89
00401a30: ( 13,
00401a38: ( 13,
00401a40: ( 13,
Page 26
Todd M. Austin
Page 27
<name>:<nsets>:<bsize>:<assoc>:<repl>
where:
<name> - cache name (make this unique)
<nsets> - number of sets
<assoc> - associativity (number of ways)
<repl> - set replacement policy
l - for LRU
f - for FIFO
r - for RANDOM
examples:
il1:1024:32:2:l
dtlb:1:4096:64:r
Todd M. Austin
il1
dl1
-cache:il1 il1:128:64:1:l -cache:il2 il2:128:64:4:l
-cache:dl1 dl1:256:32:1:l -cache:dl2 dl2:1024:64:2:l
il2
dl2
to unify any level of the hierarchy, point an I-cache level into the
data cache hierarchy:
il1
dl1
-cache:il1 il1:128:64:1:l -cache:il2 dl2
ul2
Todd M. Austin
Page 29
q
q
extra options:
-refs {inst,data,unified}
-C {fa,sa,dm}
-R {lru, opt}
-a <sets>
-b <sets>
-l <line>
-n <assoc>
-in <interval>
-M <size>
-c <size>
Todd M. Austin
Todd M. Austin
Page 31
Todd M. Austin
Page 32
Todd M. Austin
Page 33
Todd M. Austin
bimod
Page 34
where:
size of the first level table
size of the second level table
history (pattern) width
<l1size>
<l2size>
<hist_size>
pattern
history
l2size
l1size
branch
address
2-bit
predictors
branch
prediction
hist_size
Todd M. Austin
Page 35
Sim-Outorder Pipetraces
produces detailed history of all instructions executed, including:
supported in sim-outorder
use the -ptrace option to generate a pipetrace
q
example usage:
-pcstat FOO.trc :
-pcstat BAR.trc 100:5000
-pcstat UXXE.trc :10000
view with the pipeview.pl Perl script, it displays the pipeline for
each cycle of execution traced:
pipeview.pl <ptrace_file>
Todd M. Austin
Page 36
example output:
new cycle
indicator
@ 610
new inst
definitions
gf = 0x0040d098: addiu
gg = 0x0040d0a0: beq
current
pipeline
state
[IF]
gf
gg
[DA]
gb
gc
gd/
ge
[EX]
fy
fz
ga+
inst being
inst being
inst
fetched, or in decoded, or executing
fetch queue awaiting issue
Todd M. Austin
r2,r4,-1
r3,r5,0x30
[WB]
fr\
fs
ft
fu
[CT]
fq
pipeline event:
(mis-prediction
detected), see output
header for event defs
Page 37
Tutorial Overview
Computer Architecture Simulation Primer
SimpleScalar Tool Set
q
q
Overview
Users Guide
q
q
Model Microarchitecture
Implementation Details
Hacking SimpleScalar
Looking Ahead
Todd M. Austin
Page 38
q
q
63
Todd M. Austin
16-annote
16-opcode
8-ru
48
32
24
8-rt
16
16-imm
8-rs
8-rd
0
Page 39
r0 - 0 source/sink
r1 (32 bits)
Unused
PC
r2
HI
.
.
0x00400000
Text
(code)
LO
r30
FCC
r31
Data
(init)
(bss)
f1
f1
f2
f3
.
.
f30
0x10000000
Stack
f31
f31
0x7fffc000
0x7fffffff
Todd M. Austin
Page 40
SimpleScalar Instructions
Control:
Load/Store:
Integer Arithmetic:
j - jump
jal - jump and link
jr - jump register
jalr - jump and link register
beq - branch == 0
bne - branch != 0
blez - branch <= 0
bgtz - branch > 0
bltz - branch < 0
bgez - branch >= 0
bct - branch FCC TRUE
bcf - branch FCC FALSE
lb - load byte
lbu - load byte unsigned
lh - load half (short)
lhu - load half (short) unsigned
lw - load word
dlw - load double word
l.s - load single-precision FP
l.d - load double-precision FP
sb - store byte
sbu - store byte unsigned
sh - store half (short)
shu - store half (short) unsigned
sw - store word
dsw - store double word
s.s - store single-precision FP
s.d - store double-precision FP
addressing modes:
(C)
(reg + C) (w/ pre/post inc/dec)
(reg + reg) (w/ pre/post inc/dec)
Todd M. Austin
Page 41
SimpleScalar Instructions
Floating Point Arithmetic:
Miscellaneous:
nop - no operation
syscall - system call
break - declare program error
Todd M. Austin
Page 42
q
q
bit annotations:
q
q
field annotations:
q
q
Todd M. Austin
Page 43
Simulator
results out
sys_write(fd, p, 4)
args in
q
q
q
q
Todd M. Austin
Page 44
Tutorial Overview
Computer Architecture Simulation Primer
SimpleScalar Tool Set
q
q
Overview
Users Guide
q
q
Model Microarchitecture
Implementation Details
Hacking SimpleScalar
Looking Ahead
Todd M. Austin
Page 45
Simulator Structure
User
Programs
Prog/Sim
Interface
SimpleScalar ISA
Functional
Core
Performance
Core
Resource
EventQ
Simulator
Core
Loader
Regs
Stats
Cache
Memory
Page 46
Dispatch
Scheduler
Exec
Memory
Scheduler
Mem
Writeback
I-Cache
I-TLB
(IL1)
D-Cache
D-TLB
(DL1)
I-Cache
(IL2)
D-Cache
(DL2)
Commit
Virtual Memory
Page 47
Tutorial Overview
Computer Architecture Simulation Primer
SimpleScalar Tool Set
q
q
Overview
Users Guide
q
q
Model Microarchitecture
Implementation Details
Hacking SimpleScalar
Looking Ahead
Todd M. Austin
Page 48
implemented in ruu_fetch()
models machine fetch bandwidth
inputs:
q
q
q
program counter
predictor state (see bpred.[hc])
mis-prediction detection from branch execution unit(s)
outputs:
Todd M. Austin
Page 49
q
q
q
fetch insts from one I-cache line, block until misses are resolved
queue fetched instructions to Dispatch
probe line predictor for cache line to access in next cycle
Todd M. Austin
Page 50
Dispatch
implemented in ruu_dispatch()
models machine decode, rename, allocate bandwidth
inputs:
q
q
q
q
outputs:
Todd M. Austin
Page 51
Dispatch
q
q
q
q
Todd M. Austin
Page 52
Scheduler
to functional units
Memory
Scheduler
inputs:
RUU, LSQ
outputs:
q
q
Todd M. Austin
Page 53
Scheduler
to functional units
Memory
Scheduler
q
q
Todd M. Austin
Page 54
Exec
Mem
memory requests to D-cache
implemented in ruu_issue()
models func unit and D-cache issue and execute latencies
inputs:
q
q
outputs:
q
q
Todd M. Austin
Page 55
Exec
Mem
memory requests to D-cache
q
q
q
q
Todd M. Austin
Page 56
Writeback
implemented in ruu_writeback()
models writeback bandwidth, detects mis-predictions, initiated misprediction recovery sequence
inputs:
q
q
outputs:
q
q
q
Todd M. Austin
Page 57
Writeback
q
q
Todd M. Austin
Page 58
Commit
implemented in ruu_commit()
models in-order retirement of instructions, store commits to the Dcache, and D-TLB miss handling
inputs:
q
q
outputs:
q
q
Todd M. Austin
Page 59
Commit
Todd M. Austin
Page 60
implemented in sim_main()
walks pipeline from Commit to Fetch
Page 61
Tutorial Overview
Computer Architecture Simulation Primer
SimpleScalar Tool Set
q
q
Overview
Users Guide
q
q
Model Microarchitecture
Implementation Details
Hacking SimpleScalar
Looking Ahead
Todd M. Austin
Page 62
Hackers Guide
source code design philosophy:
q
q
section organization:
q
q
Todd M. Austin
Page 63
Todd M. Austin
Page 64
Todd M. Austin
Page 65
q
q
emit a linkage map (-Map mapfile) and then edit the executable in a
postpass
KLINK, from my dissertation work, does exactly this
Todd M. Austin
Page 66
q
q
Todd M. Austin
Page 67
Simulator Structure
User
Programs
Prog/Sim
Interface
SimpleScalar ISA
Functional
Core
Performance
Core
Resource
EventQ
Simulator
Core
Loader
Regs
Stats
Cache
Memory
Page 68
Machine Definition
a single file describes all aspects of the architecture
q
q
q
DEFINST(ADDI,
0x41,
addi,
t,s,i,
assembly
IntALU,
F_ICOMP|F_IMM,
template
GPR(RT),NA,
GPR(RS),NA,NA
FU reqs
SET_GPR(RT, GPR(RS)+IMM))
output deps
inst flags
input deps
semantics
Todd M. Austin
Page 69
(regs_R[N])
(regs_R[N] = (EXPR))
(mem_read_word((SRC))
switch (SS_OPCODE(inst)) {
#define DEFINST(OP,MSK,NAME,OPFORM,RES,FLAGS,O1,O2,I1,I2,I3,EXPR)
case OP:
EXPR;
break;
#define DEFLINK(OP,MSK,NAME,MASK,SHIFT)
case OP:
panic("attempted to execute a linking opcode");
#define CONNECT(OP)
#include "ss.def"
#undef DEFINST
#undef DEFLINK
#undef CONNECT
}
Todd M. Austin
Page 70
\
\
\
\
\
Crafting an Decoder
#define DEP_GPR(N)
(N)
switch (SS_OPCODE(inst)) {
#define DEFINST(OP,MSK,NAME,OPFORM,RES,CLASS,O1,O2,I1,I2,I3,EXPR)
case OP:
out1 = DEP_##O1; out2 = DEP_##O2;
in1 = DEP_##I1; in2 = DEP_##I2; in3 = DEP_##I3;
break;
#define DEFLINK(OP,MSK,NAME,MASK,SHIFT)
case OP:
/* can speculatively decode a bogus inst */
op = NOP;
out1 = NA; out2 = NA;
in1 = NA; in2 = NA; in3 = NA;
break;
#define CONNECT(OP)
#include "ss.def"
#undef DEFINST
#undef DEFLINK
#undef CONNECT
default:
/* can speculatively decode a bogus inst */
op = NOP;
out1 = NA; out2 = NA;
in1 = NA; in2 = NA; in3 = NA;
}
Todd M. Austin
\
\
\
\
\
\
\
\
\
\
Page 71
opt_print_help()
opt_print_options()
opt_reg_header()
opt_reg_note()
Todd M. Austin
Page 72
q
q
stat_print_stats()
q
q
q
Todd M. Austin
Page 73
q
q
q
q
q
q
Todd M. Austin
Page 74
q
q
q
static
BTB w/ 2-bit saturating counters
2-level adaptive
important interfaces:
q
q
q
bpred_create(class, size)
bpred_lookup(pred, br_addr)
bpred_update(pred, br_addr, targ_addr, result)
Todd M. Austin
Page 75
q
q
q
important interfaces:
q
q
q
q
q
Todd M. Austin
Page 76
q
q
important interfaces:
q
q
eventq_queue(when, op...)
eventq_service_events(when)
Todd M. Austin
Page 77
q
q
q
q
important interfaces:
Todd M. Austin
Page 78
q
q
q
q
sim_options(argc, argv)
sim_config(stream)
sim_main()
sim_stats(stream)
Todd M. Austin
Page 79
q
q
important interfaces:
Todd M. Austin
Page 80
q
q
q
q
q
q
q
q
fatal()
panic()
warn()
info()
debug()
getcore()
elapsed_time()
getopt()
Todd M. Austin
Page 81
Todd M. Austin
Page 82
q
q
resource configuration:
{ name, num, { FU_class, issue_lat, op_lat }, ... }
important interfaces:
q
q
Todd M. Austin
Page 83
Tutorial Overview
Computer Architecture Simulation Primer
SimpleScalar Tool Set
q
q
Overview
Users Guide
q
q
Model Microarchitecture
Implementation Details
Hacking SimpleScalar
Looking Ahead
Todd M. Austin
Page 84
Looking Ahead
MP/MT support for SimpleScalar simulators
Linux port to SimpleScalar
Todd M. Austin
Page 85
To Get Plugged In
SimpleScalar public releases available from UW-Madison
Technical Report:
q
q
simplescalar@cs.wisc.edu
contact Doug Burger (dburger@cs.wisc.edu) to join
Todd M. Austin
Page 86
Backups
Todd M. Austin
Page 87
q
q
key insights:
q
q
q
q
q
Todd M. Austin
Page 88
q
q
q
q
q
Todd M. Austin
Page 89
Example Applications
my dissertation: H/W and S/W Mechanisms for Reducing Load
Latency
q
q
q
q
other users:
q
q
q
SCI project
Galileo project
more coming on-line
Todd M. Austin
Page 90
Related Tools
SimOS from Stanford
q
q
q
functional simulators:
q
q
q
Todd M. Austin
Page 91
Opcode Map
ADDI
BNE
...
FADD
Todd M. Austin
Page 92