BEngDSP Notes

1
U H
BEng
School of Engineering & Technology, University of Hertfordshire
Prof. Talib Alukaidey
Digital Signal Processing
2
U H
Table of Contents
Outline of
Digital Signal Processors
Digital vs. Analogue Signal Processing ---------------------------------------------------------------------- Page 3
Why process signals digitally? -------------------------------------------------------------------------------- Page 5
What is Digital Signal Processing? ---------------------------------------------------------------------- Page 6
What are Digital Signal Processors? -------------------------------------------------------------------- Page 7
What are the typical Applications for DSP? ---------------------------------------------------------- Page 9
What do you need to produce a Functional DSP Device? --------------------------------------- Page 15
The Efficiency of the Assemblers & the Goodies of the Simulators ------------------------ Page 20
High Level Languages and Their Advantages ------------------------------------------------------- Page 25
Binary Notation in DSP's ------------------------------------------------------------------------------------ Page 29
Features Of ADSP-2100 Base Architecture ----------------------------------------------------------- Page 41
ADSP-2100 Family Base Internal Architecture ------------------------------------------------------- Page 42
ALU ----------------------------------------------------------------------------------------------------------------- Page 43
MAC ---------------------------------------------------------------------------------------------------------------- Page 55
Shifter -------------------------------------------------------------------------------------------------------------- Page 71
Data Address Generator (DAG) Operations ---------------------------------------------------------- Page 83
Program Sequencer Operations -------------------------------------------------------------------------- Page 93
ADSP-2100 Family Peripherals --------------------------------------------------------------------------- Page 102
The Base Architecture of Floating-Point DSP Processor ---------------------------------------- Page 128
The System Architecture ------------------------------------------------------------------------------------ Page 133
The Complete Architecture --------------------------------------------------------------------------------- Page 134
What is a Real Time Application? ------------------------------------------------------------------------ Page 135
Real Time Operating Systems as an Ideal Environment for Embedded Applications -- Page 136
Compression Techniques and a Compressor and De-Compressor Generator ---------- Page 140
Performance Measures ------------------------------------------------------------------------------------------ Page 145
Data Flow Bottle-necks & Solutions; Pipeline & Parallel Architectures With Examples --- Page 147
High Performance System Classification Scheme ------------------------------------------------- Page 163
SIMD Matrix Multiplication & SIMD FFT ---------------------------------------------------------------- Page 166
How To Design SIMD DSP System From The Off-Shelf Fixed-Point DS Processors? ----------- Page 167
Multiprocessing With The SHARC ------------------------------------------------------------------------ Page 171
VLIW Compiler and the DSP Super Computer Architecture Goes Hand in Hand -------- Page 213
3
U H
Digital vs. Analogue Signal Processing
Filters are typically used to pick out signals of interest from noise by making use of their differing frequency
characteristics.
Filters can be built from analogue components or from digital components. The following figure shows simple
analogue filters:

[Figure: simple analogue filters. An input x(t) carrying data with a broad range of spectral content, X(f), is applied to three simple networks built from R, C and L components, giving a low-pass output Y_LP, a band-pass output Y_BP and a high-pass output Y_HP. The plot of Y(f) against f shows the LP, BP and HP bands.]

4
U H
The following figure shows the components required for digital filters:

[Figure: digital filter chain. Noisy analogue signal -> S/H (sampling at f_s, discrete time) -> A/D (discrete value) -> digital processor (discrete filter processing) -> D/A -> clean analogue signal.]
5
U H
[Figure: bar chart (scale 0 to 90) comparing Analogue and Digital processing for Bandwidth, Aging, Temp Drift, Accuracy, Upgrade and Prediction.]
Why process signals digitally?
6
U H
What is Digital Signal Processing?
Because of the simplicity and flexibility associated with the
binary nature of the electronics, signals are most conveniently
processed digitally. This major area of electronics, information
technology and control engineering is known as
Digital Signal Processing.
Digital Signal Processing is a set of Numerical Techniques To
Extract Information From Discrete Time, Discrete Valued
Signals.
7
U H
The rapid advances being made in the field of digital component technology
are having profound effects on all aspects of digital systems design.
Nowhere are these effects being felt more strongly than in the design of high
performance systems for such applications as digital signal processing.

This part of the DSP
2
course brings together a wide variety of logical
concepts that impact the design of such systems which acknowledge and
take advantage of modern component technology.

The Digital Signal Processors may be interpreted as:
1- The design of VLSI components intended for use in digital signal
processing applications, &
2- The design of digital signal processing systems that utilise VLSI
components.
What are Digital Signal Processors?
8
U H
[Figure: real-time deadlines for three algorithm classes.
FIR: $y(n) = \sum_{i=1}^{N} a(i)\,x(n-i)$, with each of samples 0, 1 and 2 processed in < 300 s.
SPEECH RECOG.: spoken words 0, 1, 2 to recognised word in < 1/2 sec.
IMAGE PROCESSING: real images 0, 1, 2 to recognised image in < frame update period (30 Hz).
Problems: flicker, smearing, missing info., etc.]
The Digital Signal Processors are algorithm-cruncher processors.
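The kind of sum these processors are built to crunch can be sketched in a few lines. This is only an illustration, assuming the common direct-form FIR convention y(n) = sum of a(i) * x(n - i) with zero-based tap indices and zero initial conditions; the name `fir_filter` and the 3-tap coefficients are hypothetical and do not come from the notes.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y(n) = sum_i coeffs[i] * x(n - i), with x(k) = 0 for k < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for i, a in enumerate(coeffs):
            if n - i >= 0:
                acc += a * samples[n - i]  # one multiply-accumulate per tap
        out.append(acc)
    return out


# Hypothetical 3-tap smoothing filter applied to a unit step.
y = fir_filter([0.25, 0.5, 0.25], [1.0, 1.0, 1.0, 1.0])
```

Each output sample costs one multiply-accumulate per tap, which is why a single-cycle MAC unit matters when every sample has a hard deadline.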
9
U H
What are the typical Applications for DSP?

Communications
Echo Cancellation
Scrambler-Descrambler
etc.
Radar
Imaging
Speech
Control
Geology
Medical
and more and more

10
U H
SPEECH
Among The Applications of DSP to Speech are:
. VOCODERS . SYNTHESIS . ANALYSIS . RECOGNITION

One of the Largest Applications is in Voice Synthesis:
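[Figure: voice synthesis model. An impulse train generator (set by the pitch period) or a random number generator provides the excitation, which is multiplied (X) by an amplitude and passed through a time-varying digital filter whose coefficients are the vocal tract parameters. The filter output is the speech samples.]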

11
U H
CONTROL
Control Systems are Finding Applications for DSP
. Lead/Lag Compensators . Transducer Linearisation . Large
Multivariate Systems

For Example: Feedback Control
[Figure: feedback control loop. Digital command -> D/A -> dynamic system, with the system output returned through an A/D and a digital filter as feedback.]
12
U H
COMMUNICATION
Communications Applications of DSP Include:
. PCM Generation . Tone Detection . Adaptive Echo
Cancellers . SSB Generation

For Example: SSB Via Hilbert Filters
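[Figure: SSB generation. X(t) -> A/D, then two branches: a delay multiplied by COS and a Hilbert filter multiplied by SIN, combined to give Y(n). Spectrum plots show X(f) and Y(f) against f.]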
13
U H
IMAGING
Image Processing Applications Include:
. Deblurring . Data Compression . Scene Analysis . 3-D
Reconstruction

For Example: A Moving Camera Blurs a Picture and can
be Modelled as a Low Pass Filter. Deblurring Requires the
Inverse Linear Operation
[Figure: deblurring. Scene -> moving camera -> picture -> 2-D filter (inverse point spread function) -> deblurred picture.]
14
U H
MEDICAL
DSP is Finding New Applications in the Medical Field:
. Patient Monitoring . Tomography . Blood Flow Velocimeters
. EKG Pattern Analysis . XRAY Enhancement

For Example: Micro Based Monitor
[Figure: micro based monitor. Commercial fetal monitor -> MUX -> S/H -> A/D -> micro with DSP -> display and data recorder.]
15
U H
What do you need to produce a
Functional DSP device?

Answer: HARDWARE & SOFTWARE

Real Time DSP applications require
choices In both Hardware &
Software to produce a functional
device

16
U H
[Figure: an APPLICATION becomes a FUNCTIONAL DEVICE through HARDWARE choices (array processor, microprocessor, DSP chip, special device) and SOFTWARE choices (high level, assembly code, micro code), supported by ADVANCED CAD TOOLS.]
17
U H
Design Capture: Draw and Specify
Translator
Analog Devices Design Implementation
CODE GENERATOR
18
U H
Library of DSP Primitive
Functions
[Figure: library symbols with numbered pins for the primitive functions EQ, EXT_IN (IN), GP (G), AEXPAND, ACOMPRES, AFIR (LMS FIR with inputs Xn, Dn and outputs Yn, E), NOISE, DELAY1 (Z^-1), DELAY2 (Z^-2), DELAYN (Z^-n), MULT, GDFT2 (DFT), MINUS and AMP.]
19
U H
Proportional Integral Derivative (PID)
Compensation Filter
$$U(t) = K_p\,e(t) + K_d\,\frac{de(t)}{dt} + K_i \int e(t)\,dt$$

[Figure: block diagram of the PID compensator built from library primitives: SOURCE (PROFILE GEN, zcne=tzl, rate=10000.0, trigger=cp0) and ENCODER (PAR_IN) feeding MINUS1; amplifiers AMP1 to AMP4 (gdn=1.0, gdn=.7, gdn=.5, gdn=1.19); integrator INT1; differentiator DFF1 (DFF_LD1, d/dt); summers SUM1 and SUM3; output OUT1 (SER OUT1, des=port2).]
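A discrete-time form of the PID compensation law can be sketched as below. The rectangular integration, the backward-difference derivative, the name `pid_step` and the gain values are assumptions chosen for illustration; they are not the gains shown in the block diagram.

```python
def pid_step(error, state, kp, ki, kd, dt):
    """One PID update: u = kp*e + kd*de/dt + ki*integral(e)."""
    integral = state["integral"] + error * dt          # rectangular integration
    derivative = (error - state["prev_error"]) / dt    # backward difference
    state["integral"] = integral
    state["prev_error"] = error
    return kp * error + kd * derivative + ki * integral


# Hypothetical gains and sample period, first step of a unit error.
state = {"integral": 0.0, "prev_error": 0.0}
u = pid_step(1.0, state, kp=2.0, ki=0.5, kd=0.25, dt=0.5)
```

On a fixed-point DSP the same three terms would be computed with MAC instructions on scaled fractional values rather than floating point.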
20
U H
The Efficiency of the Assemblers & The
Goodies of the Simulators
The Assembler Tools are:
Assembler and Linker
Simulator
PROM Splitter
21
U H
The Assembler
The Assembler translates source code, written with an
algebraic syntax, into object code. Variables, data buffers,
and symbolic constants are defined with the Assembler
directives.
LCNTR=r15, Do end_bfly until LCE;

f8=f1*f6, f14=f11-f14, dm(i2,m0)=f10, f9=pm(i11,m8);

f11=f1*f7, f3=f9+f14, f9=f9-f14, dm(i2,m0)=f13, f7=pm(i8,m8);

f14=f0*f6, f13=f8+f12, f8=dm(i0,m0), pm(i10,m10)=f9;

end_bfly: f12=f0*f7, f13=f8+f12, f10=f8-f12, f6=dm(i0,m0), pm(i10,m10)=f3;
FFT Butterfly Core Example
22
U H
Due to the following characteristics, highly efficient code
can be achieved when an assembler is used:

Dedicated Purpose

Assembler is Hardware Slave

Moderate Data Size

Instruction Mnemonics, Address Labels

Simple Arithmetic Operations

High Speed

Moderate Ease of Writing and Development

Moderate Ease of Documentation
23
U H
DSP Processor Development Cycle
Cross-software programs:
START
System Builder: Define Target Hardware (.sys -> .ach)
Assemble Module (.dsp -> .obj, .cde, .int)
Link (-> .exe)
SIMULATE / EMULATE (.exe); repeat as necessary
PROM Splitter
Burn PROMs
Prototype Test; repeat as necessary
END
24
U H
Performs interactive, instruction-level simulation of the DSP
processor code within the hardware configuration

Simulates interrupt and I/O handling,

Flags illegal operations

Supports full symbolic assembly and disassembly

Displays the internal operations and status of the processor

Provides an easy-to-use, window oriented, and graphical user
interface with commands accessed from pull-down menus
with a mouse
The Simulator
25
U H
High Level Languages and Their
Advantages

High-Level Languages are:
C Compiler
C++
DSP/C Compiler (Numerical C)
ADA
26
U H
C Compiler and Runtime Library
Complies with the ANSI Specification

Incorporates Optimizing Algorithms to Speed Up the
Execution of Code

Includes an Extensive Runtime Library with
Typically 100 Standard and DSP-Specific Functions

Outputs DSP Processor Assembly Language
Source Code
27
U H
Supports ANSI Standard (X3J11.1) Numerical C as
Defined by the Numerical C Extensions Group (NCEG)

Accepts C Source Input Containing Numerical C
Extensions for:
Array Selection
Vector Math Operations
Complex Data Types
Circular Pointers
Variably Dimensioned Arrays

Outputs DSP Processor Assembly Language Source
Code

DSP/C Compiler

28
U H
DSP HLLs Advantages are:

Hardware Transparent (Portability)

High Level Arithmetic Operations (Complex Math) or
Use Library Routines e.g. sin(), fir(), fft()

Loops, Arrays, Labels, I/O Format

Searching and Sorting

Peripheral Intensive System

Relatively Fast Writing & Development

Ease of Documentation
29
U H
Binary Notation in DSP's
The ADSP-2100 Family of DSP's are fixed point processors that
perform operations using a two's complement binary notation.
Therefore, to efficiently program a DSP it is important to understand
the following concepts:

1) Signed / Unsigned formats

2) Fractional / Integer formats

3) Ranges of Fractional Numbers

4) Hex to Decimal Conversions

5) Decimal to Hex Conversions
30
U H
Binary - Hexadecimal - Decimal Number
Conversion Table
Decimal

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Hexadecimal

0
1
2
3
4
5
6
7
8
9
A
B
C
D
E
F
Binary

0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
31
U H
Signed / Unsigned
Bit layout: S/U U U U U U U U U U U U U U U U
(the most significant bit is either a sign bit or an unsigned bit)

Unsigned
0000    0 V   - FULL SCALE
FFFF    5 V   + FULL SCALE

Signed
8000   -5 V   - FULL SCALE
0000    0 V
7FFF    5 V   + FULL SCALE
32
U H
2's Complement Representation
For 2's complement representation, the scale factor for the sign bit of a number
is -(2^(M-1)), where M is the number of bits to the left of the binary point. For
a 4.2 number, the sign scale is -(2^3) = -8.

Bit weights for a 4.2 number (sign bit first, binary point after the fourth bit):
-(2^3)  2^2  2^1  2^0  .  2^-1  2^-2

Example:
0101.01 = 0 * (-8) + 1 * (4) + 0 * (2) + 1 * (1) + 0 * (1/2) + 1 * (1/4)
        = 5.25
1101.01 = 1 * (-8) + 1 * (4) + 0 * (2) + 1 * (1) + 0 * (1/2) + 1 * (1/4)
        = -2.75
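The weighting rule can be checked with a short script. This is a sketch only; the function name `twos_complement_value` and the dotted bit-string input format are assumptions made for the illustration.

```python
def twos_complement_value(bits):
    """Value of a two's complement bit string such as '1101.01'."""
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    raw = int(digits, 2)
    if digits[0] == "1":
        # A set sign bit weighs -(2^(M-1)) instead of +(2^(M-1)),
        # which is the same as subtracting 2^(total number of bits).
        raw -= 1 << len(digits)
    return raw / (1 << len(frac_part))


positive = twos_complement_value("0101.01")
negative = twos_complement_value("1101.01")
```

The two calls evaluate the 4.2 examples, giving 5.25 and -2.75.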
33
U H
Fractional versus Integer Notation
Fractional format is 1.15 notation
S . F F F F F F F F F F F F F F F
(radix point after the sign bit)

Integer format is 16.0 notation
S I I I I I I I I I I I I I I I .
(radix point after the least significant bit)
34
U H
DSP is optimized for fractional
notation

DSP supports integer notation
35
U H
Ranges for 16 bit Formats

FORMAT   Largest Positive Value    Largest Negative Value   Value of 1 LSB
         (0x7FFF) In Decimal       (0x8000) In Decimal      (0x0001) In Decimal

1.15         0.999969482421875         -1.0                 0.000030517578125   (Fractional)
2.14         1.999938964843750         -2.0                 0.000061035156250
3.13         3.999877929687500         -4.0                 0.000122070312500
4.12         7.999755859375000         -8.0                 0.000244140625000
5.11        15.999511718750000        -16.0                 0.000488281250000
6.10        31.999023437500000        -32.0                 0.000976562500000
7.9         63.998046875000000        -64.0                 0.001953125000000
8.8        127.996093750000000       -128.0                 0.003906250000000
9.7        255.992187500000000       -256.0                 0.007812500000000
10.6       511.984375000000000       -512.0                 0.015625000000000
11.5      1023.968750000000000      -1024.0                 0.031250000000000
12.4      2047.937500000000000      -2048.0                 0.062500000000000
13.3      4095.875000000000000      -4096.0                 0.125000000000000
14.2      8191.750000000000000      -8192.0                 0.250000000000000
15.1     16383.500000000000000     -16384.0                 0.500000000000000
16.0     32767.000000000000000     -32768.0                 1.000000000000000   (Integer)
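Every row of the range table follows from the number of integer and fractional bits. A minimal sketch, with the hypothetical helper name `format_range`:

```python
def format_range(int_bits, frac_bits):
    """Return (largest positive, most negative, LSB value) for a signed
    two's complement int_bits.frac_bits format."""
    scale = 2 ** frac_bits
    total = int_bits + frac_bits
    largest = (2 ** (total - 1) - 1) / scale    # code 0x7FFF for 16 bits
    most_negative = -(2 ** (total - 1)) / scale  # code 0x8000 for 16 bits
    return largest, most_negative, 1 / scale


fractional = format_range(1, 15)
integer = format_range(16, 0)
```

Moving the binary point one place to the right doubles all three values, which is the pattern seen down the table.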
36
U H
Format Example
A -5 V to +5 V input range maps onto the codes 0x8000 (-5 V), 0x0000 (0 V) and 0x7FFF (+5 V).

     Hex       16.0 FORMAT    1.15 FORMAT         Voltage
1)   0x7FFF     32767          0.999969482...      5 V
2)   0x3FFF     16383          0.499969482...      2.5 V
3)   0x0000         0          0.0000000...        0 V
4)   0xCCCD    -13107         -0.399993896...     -2.0 V
5)   0x8000    -32768         -1.0000000...       -5.0 V
37
U H
Hexadecimal to Decimal Conversion
There are two methods for converting Hexadecimal Numbers to Decimal
Numbers. One is easy and one is hard.

HARD WAY: Convert the hexadecimal number to binary. Place the binary
point. Multiply each bit of the binary number by its associated scale factor.

Example: Convert 0x2A00 to a 1.15 twos-complement decimal value

0x2A00 = 0.010 1010 0000 0000
       = 2^-2 + 2^-4 + 2^-6
       = 0.25 + 0.0625 + 0.015625
       = 0.328125 (approximately 0.33, or 1/3)

EASY WAY: Use a calculator to convert the hexadecimal number to decimal.
Divide the decimal number by 2^N, where N is the number of bits to the right
of the binary point.
Example: Convert 0x2A00 to a 1.15 twos-complement decimal value
0x2A00 <=> 10752 / 2^15 = 10752 / 32768 = 0.328125
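The easy way translates directly into code. This sketch uses the hypothetical name `hex_to_decimal` and assumes a 16-bit two's complement word unless told otherwise.

```python
def hex_to_decimal(word, frac_bits=15, bits=16):
    """Interpret a two's complement word with frac_bits fractional bits."""
    raw = word & ((1 << bits) - 1)
    if raw >= 1 << (bits - 1):   # sign bit set: value is negative
        raw -= 1 << bits
    return raw / (1 << frac_bits)  # divide by 2^N


value = hex_to_decimal(0x2A00)
```

Passing `frac_bits=0` gives the 16.0 integer interpretation of the same word.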
38
U H
Decimal to Hexadecimal Conversion
There are two methods for converting Decimal Numbers to
Hexadecimal numbers. One is easy, and one is hard.
HARD WAY: Break the decimal number into its 2^N components.

Example: Convert 0.8125 to a 1.15 twos-complement hexadecimal format
Weight:  -(2^0)  2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7
Value:   -1      1/2   1/4   1/8   1/16  1/32  1/64  1/128
Bit:      0      1     1     0     1     0     0     0
0.8125 = 1/2 + 1/4 + 1/16 => 0110 1000 0000 0000 => 0x6800

EASY WAY: Multiply the decimal number by 2^N, where N is the number of
bits to the right of the binary point. Then use a calculator to convert to hex.

Example: Convert 0.8125 to a 1.15 twos-complement hexadecimal format
0.8125 * 2^15 = 0.8125 * 32768 = 26624 <=> 0x6800
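The reverse conversion is just as short. The name `decimal_to_hex` is hypothetical, and rounding to the nearest code with a range check is an assumption of this sketch.

```python
def decimal_to_hex(value, frac_bits=15, bits=16):
    """Encode value as a two's complement word with frac_bits fractional bits."""
    raw = round(value * (1 << frac_bits))   # multiply by 2^N, nearest integer
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if not lowest <= raw <= highest:
        raise ValueError("value does not fit the format")
    return raw & ((1 << bits) - 1)          # wrap negatives into 16 bits


code = decimal_to_hex(0.8125)
```

Note that +1.0 is rejected in 1.15 format, because the largest positive code is 0x7FFF.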
39
U H
Binary Notation Mini-Quiz

Mini-Quiz

1) What is 0x4000 (1.15 format) in signed decimal notation?

2) What is 0x4000 (16.0 format) in signed decimal notation?

3) What is 0x4000 (0.16 format) in unsigned decimal notation?

4) What is .875 in hex 1.15 Format?

5) What is -.875 in hex 1.15 Format?
40
U H
Binary Notation Mini-Quiz Answer
1) What is 0x4000 in 1.15 signed notation? 0.5

2) What is 0x4000 in 16.0 signed notation? 16384

3) What is 0x4000 in 0.16 unsigned notation? 0.25

4) What is .875 in 1.15 Format? 0x7000

5) What is -.875 in 1.15 Format? 0x9000
41
U H
Features Of ADSP-2100 Base Architecture
Modified Harvard Architecture
2 Data Address Generators
Advanced Program Sequencer
3 Arithmetic Units (ALU/MAC/Shifter)
Result Bus
42
U H
ADSP-2100 Family Base Internal Architecture
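[Figure: base internal architecture. Data Address Generator #1, Data Address Generator #2 and the Program Sequencer supply addresses on the PMA BUS (14 bits) and DMA BUS (14 bits). The ALU, MAC and Shifter, each with its own input registers and output registers, connect to the PMD BUS (24 bits) and DMD BUS (16 bits) and are linked by the R BUS (16 bits).]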
43
U H
ALU
44
U H
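ALU Block Diagram
[Figure: the AX registers (2 x 16) and AY registers (2 x 16) are loaded from the DMD BUS (16 bits) and PMD BUS (24 bits) through multiplexers. The ALU X input is selected from the AX registers or the R-BUS, and the Y input from the AY registers or the AF register. The 16-bit result R is written to the AF register or to the AR register, which drives the R-BUS. CI is the carry input and the status outputs are AZ, AN, AC, AV, AS and AQ.]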
45
U H
ALU Features
4 Input Registers ( AX0, AX1, AY0, AY1 )
Feedback Paths ( AF, AR, MR0, MR1, MR2, SR0, SR1 )
Six Status Flags
Saturation
Provisions For Double Precision
Background Registers
46
U H
ALU Instruction Examples
(Programmer's Quick Reference pgs 4-5)

AR = AX0 + AY0;
AF = MR1 XOR AY1;
AR = AX0 + AF;
IF GE AR = -AR;
IF AV AR = AY1 + 1;
47
U H
ALU Instructions
[IF Condition] dest = xop + yop ;
[IF Condition] dest = xop + C ;
[IF Condition] dest = xop + yop + C ;
[IF Condition] dest = xop - yop ;
[IF Condition] dest = xop - yop + C - 1 ;
[IF Condition] dest = yop - xop ;
[IF Condition] dest = yop - xop + C - 1;

[IF Condition] dest = xop AND yop;
[IF Condition] dest = xop OR yop;
[IF Condition] dest = xop XOR yop;

[IF Condition] dest = PASS xop ;
[IF Condition] dest = PASS yop ;
[IF Condition] dest = PASS 0;
[IF Condition] dest = PASS 1;
48
U H
ALU Instructions
[IF Condition] dest = - xop ;
[IF Condition] dest = - yop ;
[IF Condition] dest = NOT xop ;
[IF Condition] dest = NOT yop ;
[IF Condition] dest = ABS xop ;
[IF Condition] dest = yop +/-1 ;

DIVS yop , xop ;
DIVQ xop ;

XOP = [AR, MR0, MR1, MR2, SR0, SR1, AX0, AX1]
YOP = [AY0, AY1, AF]
dest = [AR, AF]

Examples: AR = AX0 + AY0;
AF = NOT AR;
AF = AX1 + AY0 + C;
49
U H
ALU Status Flags
Flag Name Definition

AZ   Zero       Logical NOR of all bits in the ALU result register. True if the ALU output equals 0

AN   Negative   Sign bit of the ALU result. True if the ALU output is negative

AV   Overflow   XOR of the carry outputs of the 2 most significant adder stages. True if the ALU overflows

AC   Carry      Carry output from the most significant adder stage

AS   Sign       Sign of the ALU X input port. Affected only by the ABS instruction

AQ   Quotient   Quotient bit generated only by DIVS and DIVQ

50
U H
Arithmetic Conditions
EQ: ALU Result = 0
NE: ALU Result ≠ 0
GT: ALU Result > 0
GE: ALU Result ≥ 0
LT: ALU Result < 0
LE: ALU Result ≤ 0
NEG: XOP Input Negative (Absolute Value Instruction Only)
POS: XOP Input Positive (Absolute Value Instruction Only)

AV: ALU Overflow Bit Set
Not AV: ALU Overflow Bit Not Set

AC: ALU Carry Bit Set
Not AC: ALU Carry Bit Not Set

MV: MAC Overflow Bit Set
Not MV: MAC Overflow Bit Not Set

Not CE: Counter Not Expired
51
U H
ALU Saturation
Sets the ALU result to full-scale positive or full-scale negative if overflow or
underflow occurs
Feature enabled by executing ena ar_sat (bit 3 of MSTAT)
Once enabled, affects every ALU operation
Only affects results sent to AR (not AF; the flags still get set)
Overflow or underflow is determined by the following conditions:

Overflow (AV)   Carry (AC)   AR Contents
0               0            ALU Output
0               1            ALU Output
1               0            0x7FFF (full-scale positive)
1               1            0x8000 (full-scale negative)

ALU Overflow Latch Mode
Causes the AV status flag to become sticky; it must be cleared explicitly.
Feature enabled by executing ena av_latch (bit 2 of MSTAT)
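The saturation table can be modelled with a small 16-bit adder. This is a simplified sketch for addition only; the name `alu_add` and the Python-level flag computation are assumptions, not a description of the hardware.

```python
def alu_add(x, y, saturate=False):
    """16-bit two's complement add. Returns (result, AV, AC)."""
    x &= 0xFFFF
    y &= 0xFFFF
    total = x + y
    result = total & 0xFFFF
    ac = (total >> 16) & 1  # carry out of the most significant stage
    # Overflow: both operands share a sign and the result's sign differs.
    av = ((~(x ^ y) & (x ^ result)) >> 15) & 1
    if saturate and av:
        # Table rows: AV=1, AC=0 -> 0x7FFF; AV=1, AC=1 -> 0x8000.
        result = 0x8000 if ac else 0x7FFF
    return result, av, ac


wrapped = alu_add(0x7000, 0x7000)
clamped = alu_add(0x7000, 0x7000, saturate=True)
```

Without saturation the sum of two large positive values wraps to a negative code (0xE000); with saturation it is held at 0x7FFF.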
52
U H
ALU Mini-Quiz
Write The ADSP-2100 Code To Perform The Following Operations:

1) Add 0x0030 to 0x0070 And Store Result in AF.

Hint:
= 0x0070 ;
= 0x0030 ;
AF = + ;

2) Find The Logical AND Of 0x1234 And 0xF00F.
Store The Result In AR.
53
U H
ALU Mini-Quiz
Write The ADSP-2100 Code To Perform The Following Operations:

1) Add 0x0030 to 0x0070 And Store Result in AF.

Hint:
AX0 (or AX1) = 0x0070 ;
AY0 (or AY1) = 0x0030 ;
AF = AX0 + AY0 ;

2) Find The Logical AND Of 0x1234 And 0xF00F.
Store The Result In AR.
AY1 = 0x1234;
AR = 0xF00F;
AR = AR AND AY1;
55
U H
MAC
56
U H
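MAC Block Diagram
[Figure: the MX registers (2 x 16) and MY registers (2 x 16) are loaded from the DMD BUS (16 bits) and PMD BUS (24 bits) through multiplexers and feed the X and Y inputs of the MULTIPLIER; the MF register can also feed the Y input. The 32-bit product P enters a 40-bit ADD / SUBTRACT stage whose result (R2, R1, R0) is held in the MR2 (8 bits), MR1 (16 bits) and MR0 (16 bits) registers, fed back to the adder, and driven onto the R-BUS (16 bits). MV is the overflow status output.]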
57
U H
MAC Features
40 Bit Accumulator
Saturation
Complete Set of Background Registers
Mixed Mode Input Operands for Multiprecision
Feedback Paths
Access to R-Bus, DM and PM
58
U H
MAC Instruction Examples

MR = MX1 * MY0 (SS);
MF = AR * MY1 (SS);
MR = MR + AR * MY1 (SS);
MR = 0;
IF MV SAT MR;
IF EQ MR = MX0 * MY0 (UU);
59
U H
MAC Instructions
[IF condition] dest = xop * yop (format);
[IF condition] dest = MR + xop * yop (format);
[IF condition] dest = MR - xop * yop (format);
[IF condition] dest = 0;
[IF condition] dest = MR [ (RND)];

Where:
condition = arithmetic conditions
dest = {MR, MF}
format = {SS, US, SU, UU, RND}
XOP = {MX0, MX1, MR2, MR1, MR0, AR, SR0, SR1}
YOP = {MY0, MY1, MF}
60
U H
Placement of Binary Point in Multiplication
Binary Integer Multiplication
M Bits x P Bits => M+P Bits
Example: 16.0 x 16.0 => 32.0
Mixed/Fractional Multiplication
M.N Bits x P.Q Bits => (M+P).(N+Q) Bits
Example: 1.15 x 1.15 => 2.30
         4.12 x 1.15 => 5.27
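The format rule is easy to confirm numerically. In this sketch the helper names `multiply_formats` and `fixed_multiply` are hypothetical, and operands are passed as already-signed Python integers.

```python
def multiply_formats(fmt_a, fmt_b):
    """(M, N) x (P, Q) -> (M + P, N + Q)."""
    return fmt_a[0] + fmt_b[0], fmt_a[1] + fmt_b[1]


def fixed_multiply(raw_a, frac_a, raw_b, frac_b):
    """Multiply two fixed-point codes; return (raw product, fractional bits, real value)."""
    product = raw_a * raw_b
    frac = frac_a + frac_b  # fractional bit counts add
    return product, frac, product / (1 << frac)


fmt = multiply_formats((1, 15), (1, 15))
raw, frac, real = fixed_multiply(0x4000, 15, 0x4000, 15)
```

Here 0.5 x 0.5 in 1.15 format gives the raw product 0x1000 0000 with 30 fractional bits, which is 0.25.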

61
U H
Multiplication Modes on the ADSP-21xx
Mode 1: Fractional Mode
Multiplier Assumes all numbers in a 1.15 Format

Multiplier Automatically 1-bit Left Shifts Product
Before Accumulation (Result Forced to 1.31 Format)

Example: MR = MX0 * MY1 (SS);
MX0 = 0x4000, MY1 = 0x4000
MR = 0x00 2000 0000
MR2 (overflow) = 0x00, MR1 = 0x2000, MR0 (underflow) = 0x0000
Result taken from MR1: 0x2000
62
U H
Multiplication Modes on the ADSP-21xx
Mode 2: Integer Mode
Multiplier Assumes all numbers in a 16.0 Format

No automatic left-shift necessary

Example: MR = MX0 * MY1 (SS);
MX0 = 0x4000, MY1 = 0x4000
MR = 0x00 1000 0000
MR2 (overflow) = 0x00, MR1 (overflow) = 0x1000, MR0 = 0x0000
Result taken from MR0: 0x0000
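The two multiplier modes differ only in the one-bit left shift. The sketch below models a signed (SS) multiply and splits the 40-bit result into the MR2, MR1 and MR0 fields; the function names are hypothetical.

```python
def to_signed16(word):
    """Interpret a 16-bit pattern as a two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def multiply(mx, my, fractional=True):
    """Signed 16 x 16 multiply. Returns (MR2, MR1, MR0) of the 40-bit result."""
    product = to_signed16(mx) * to_signed16(my)
    if fractional:
        product <<= 1  # fractional mode: automatic 1-bit left shift
    mr = product & 0xFFFFFFFFFF
    return (mr >> 32) & 0xFF, (mr >> 16) & 0xFFFF, mr & 0xFFFF


frac_mode = multiply(0x4000, 0x4000, fractional=True)
int_mode = multiply(0x4000, 0x4000, fractional=False)
```

For 0x4000 x 0x4000 this gives MR1 = 0x2000 in fractional mode and MR1 = 0x1000 in integer mode, as in the two slides above.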
63
U H
Multiplication on the ADSP-21xx
To Switch Modes: ENA M_MODE; {Select Integer Mode} *
DIS M_MODE; {Select Fractional Mode}

MSTAT Register holds value

Fractional Mode the Default on Reset/Power-up

* Integer Mode Not Available on ADSP-2100A
64
U H
Rounding in the MAC
Rounding can be specified as part of multiply instruction (RND)
Rounding only applies to fixed point fractional results
40-bit results "rounded to nearest" 16 bit value.
Rounded result can be placed in MR1 or MF register

Input: MX0 = 0x7FF9, MY0 = 0xEEEE

Command MR2 MR1 MR0
MR = MX0 * MY0 (SS); FF EEEE EEFC
MR = MX0 * MY0 (RND); FF EEEF 6EFC
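The rounding example can be reproduced by adding 0x8000 (a 1 at bit 15 of MR0) to the 40-bit result. This sketch uses hypothetical helper names and models simple round-to-nearest only; any special handling of the exact midpoint case in the hardware is not modelled.

```python
MASK40 = 0xFFFFFFFFFF


def to_signed16(word):
    """Interpret a 16-bit pattern as a two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def fractional_product(mx, my):
    """Signed 16 x 16 product, left-shifted one bit, as a 40-bit pattern."""
    return ((to_signed16(mx) * to_signed16(my)) << 1) & MASK40


def round_mr(mr):
    """Add 0x8000 so that MR1 holds the nearest 16-bit value."""
    return (mr + 0x8000) & MASK40


def fields(mr):
    """Split a 40-bit value into (MR2, MR1, MR0)."""
    return (mr >> 32) & 0xFF, (mr >> 16) & 0xFFFF, mr & 0xFFFF


plain = fractional_product(0x7FF9, 0xEEEE)
rounded = round_mr(plain)
```

With MX0 = 0x7FF9 and MY0 = 0xEEEE the plain result is FF EEEE EEFC and the rounded result is FF EEEF 6EFC, matching the table.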
65
U H
Saturation and Overflow
Overflow occurs when sign bit is corrupted during accumulation
Overflow Status signal (MV) is updated every time a MAC operation is
executed
MV is set when MSB of MR2 does not equal MSB of MR1
Saturation is performed by following instruction:
IF MV SAT MR

Input: MX0 = 0x7FFF, MY0 = 0x7FFF, MR = 00 7FFE 0002
Command MR2 MR1 MR0
MR = MR + MX0 * MY0 (SS); 00 FFFC 0004
IF MV SAT MR; 00 7FFF FFFF
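The accumulate-then-saturate sequence can be sketched as follows. The overflow test used here (the upper nine bits of MR are not all equal) is an assumption of this sketch; for the example above it gives the same answer as comparing the MSB of MR2 with the MSB of MR1. All function names are hypothetical.

```python
MASK40 = 0xFFFFFFFFFF


def to_signed16(word):
    """Interpret a 16-bit pattern as a two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mac(mr, mx, my):
    """MR = MR + MX * MY (SS) in fractional mode, on a 40-bit accumulator."""
    product = (to_signed16(mx) * to_signed16(my)) << 1
    return (mr + product) & MASK40


def mac_overflow(mr):
    """MV: true when bits 39..31 are not all zeros or all ones."""
    top9 = (mr >> 31) & 0x1FF
    return top9 not in (0, 0x1FF)


def saturate(mr):
    """IF MV SAT MR: clamp to the largest 32-bit positive or negative value."""
    if not mac_overflow(mr):
        return mr
    negative = (mr >> 39) & 1
    return 0xFF80000000 if negative else 0x007FFFFFFF


acc = mac(0x007FFE0002, 0x7FFF, 0x7FFF)
sat = saturate(acc)
```

The accumulator reaches 00 FFFC 0004, MV is set, and saturation replaces it with 00 7FFF FFFF.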
66
U H
MAC Mini-Quiz
Write an ADSP-2101 Program to add the values in AX0 and AY0 and to multiply
the result by 0x20.
AX0 = 0x0020;
AY0 = 0x0010;
AR = _______________
___ = _______________
____=_______ * _________
67
U H
Binary Multiply Mini-Quiz
Fractional Mode Integer Mode
0x1240 * 0x0001

0x4000 * 0x4000

0x4000 * 0x0002
What is the ADSP-21xx Multiplier Output?
(Hint: The Output is 32 Bits Wide)
68
U H
MAC Mini-Quiz
Write an ADSP-2101 Program to add the values in AX0 and AY0 and to multiply
the result by 0x20.
AX0 = 0x0020;
AY0 = 0x0010;
AR = AX0 + AY0;
MY0 = 0x20;
MR = AR * MY0 (SS);
69
U H
Binary Multiply Mini-Quiz
What is the ADSP-21xx Multiplier Output?
(Hint: The Output is 32 Bits Wide)

                    Fractional Mode   Integer Mode
0x1240 * 0x0001     0x0000 2480       0x0000 1240

0x4000 * 0x4000     0x2000 0000       0x1000 0000

0x4000 * 0x0002     0x0001 0000       0x0000 8000
71
U H
Shifter
72
U H
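Shifter Block Diagram
[Figure: the SI register (16 bits), loaded from the DMD BUS, or a value from the R-BUS is selected by a multiplexer as the input to the SHIFTER ARRAY (ports I, C, O) and to the EXPONENT DETECTOR. The 8-bit shift control comes from the SE register (optionally through NEGATE) or from the instruction. The 32-bit array output passes through OR / PASS logic into the SR1 and SR0 registers (16 bits each), which can be fed back or driven onto the R-BUS. BLOCK EXPONENT LOGIC is attached to the exponent detector.]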
73
U H
Shifter Features
16 Bit Input Value Gets Stored Anywhere in a 32 Bit Output Field
All Shift Instructions Execute in a Single Instruction Cycle
Specify Immediate Shift Value within Instruction or indirectly in
the SE register
Normalize, Denormalize, and Exponent Detect Instructions Used
For Block Floating Point and Floating Point Operations
74
U H
Shifter Instruction Examples

SR = ASHIFT SI BY -3 (LO);
SR = LSHIFT AR BY 6 (HI);
SR = SR OR LSHIFT SR1 (LO);
75
U H
Shifter Instructions
Shift Immediate Instructions
SR = [SR OR] ASHIFT xop BY <data> (alignment);
SR = [SR OR] LSHIFT xop BY <data> (alignment);

Shift By Value in SE Register
[IF condition] SR = [SR OR] ASHIFT xop (alignment);
[IF condition] SR = [SR OR] LSHIFT xop (alignment);
Where:
condition = Arithmetic Condition
xop = {SI, SR0, SR1, MR2, MR1, MR0, AR}
alignment = {HI, LO}
data = -32 ... 32
Arithmetic Shift Sign Extends Right Shifts
Logical Shift Zero fills Right Shifts
Left Shifts Are Always Zero Filled
Positive SE or <data> Values Shift Left
Negative SE or <data> Values Shift Right
NO "+" for Positive Shifts
76
U H
Using the Shift Instructions
Placement of Output Depends on HI/LO Modifier, SE Register and <data> Value
Refer to Table 2.4 In ADSP-21xx Users Manual
Example 1: SR = LSHIFT SI BY -12 (LO);
Before:
SI  = 1110 1010 0011 0101
SE  = xxxx xxxx
SR1 = xxxx xxxx xxxx xxxx
SR0 = xxxx xxxx xxxx xxxx
After:
SI  = 1110 1010 0011 0101
SE  = xxxx xxxx
SR1 = 0000 0000 0000 0000
SR0 = 0000 0000 0000 1110
77
U H
Immediate Shift Instructions
Example 2: SR = LSHIFT SI BY -12 (HI);
Before:
SI  = 1110 1010 0011 0101
SE  = xxxx xxxx
SR1 = xxxx xxxx xxxx xxxx
SR0 = xxxx xxxx xxxx xxxx
After:
SI  = 1110 1010 0011 0101
SE  = xxxx xxxx
SR1 = 0000 0000 0000 1110
SR0 = 1010 0011 0101 0000
78
U H
Shift Instructions with SE Register
Example 3: SR = LSHIFT SI (HI);
Before:
SI  = 1110 1010 0011 0101
SE  = 1111 0100 (-12)
SR1 = xxxx xxxx xxxx xxxx
SR0 = xxxx xxxx xxxx xxxx
After:
SI  = 1110 1010 0011 0101
SE  = 1111 0100 (-12)
SR1 = 0000 0000 0000 1110
SR0 = 1010 0011 0101 0000
79
U H
Shift Instructions with OR Functionality
Example 4: SR = SR OR LSHIFT SI (HI);
Before:
SI  = 1110 1010 0011 0101
SE  = 1111 0100 (-12)
SR1 = 0000 0000 0000 0000
SR0 = 0000 0000 0000 0101
After:
SI  = 1110 1010 0011 0101
SE  = 1111 0100 (-12)
SR1 = 0000 0000 0000 1110
SR0 = 1010 0011 0101 0101
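The four shift examples can be reproduced with a small model of the 32-bit output field. This is a simplified sketch: the function name `shift` and its keyword arguments are hypothetical, and shift amounts outside the useful range are not checked.

```python
def shift(value, amount, hi=False, arithmetic=False, sr_or=None):
    """Model SR = [SR OR] LSHIFT/ASHIFT value BY amount (HI or LO).

    Returns the 32-bit SR value (SR1 in the upper half, SR0 in the lower half).
    """
    value &= 0xFFFF
    if arithmetic and value & 0x8000:
        field = value - 0x10000   # ASHIFT: treat the input as signed
    else:
        field = value             # LSHIFT: treat the input as unsigned
    if hi:
        field <<= 16              # HI: input is referenced to SR1
    # Positive amounts shift left, negative amounts shift right.
    field = field << amount if amount >= 0 else field >> -amount
    sr = field & 0xFFFFFFFF
    if sr_or is not None:
        sr |= sr_or & 0xFFFFFFFF  # SR OR: merge with the previous SR contents
    return sr


example1 = shift(0xEA35, -12)                           # (LO)
example2 = shift(0xEA35, -12, hi=True)                  # (HI)
example4 = shift(0xEA35, -12, hi=True, sr_or=0x00000005)
```

The results are 0x0000 000E, 0x000E A350 and 0x000E A355, which correspond to the SR1 and SR0 contents in Examples 1, 2 (and 3) and 4.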
80
U H
Shifter Mini-Quiz
Write ADSP-2101 Code to:

Write 0x0034 into the AR register
Write 0x0012 into the SI register
Shift AR into the MS bits of SR0 (SR0 = 0x3400)
Shift SI into the LS bits of SR0

Hint: 4 Instructions SR1 = 0x0000, SR0 = 0x3412 When Done
81
U H
Shifter Mini-Quiz Answers
Solution 1:
AR = 0x0034;
SI = 0x0012;
SR = ASHIFT AR BY 8 (LO);
SR = SR OR ASHIFT SI BY 0 (LO);

Solution 2:
AR = 0x0034;
SI = 0x0012;
SR = LSHIFT AR BY -8 (HI);
SR = SR OR ASHIFT SI BY -16 (HI);
83
U H
Data Address Generator (DAG) Operations
Registered Indirect Addressing

Automatic Post-Modify of Address

Circular Buffering

DAG 1 Fetches/Stores to Data Memory

DAG 2 Fetches/Stores to Data or Program Memory

Bit-Reverser For FFT Support (DAG 1 Only)
84
U H
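Data Address Generator Block Diagram
[Figure: each DAG contains L REGISTERS (4 x 14), I REGISTERS (4 x 14) and M REGISTERS (4 x 14), loaded from the DMD BUS and selected by 2-bit fields from the instruction. The selected I value is output as the 14-bit ADDRESS (through the BIT REVERSE logic, DAG1 only) and is post-modified by an ADD with the selected M value; the MODULUS LOGIC uses L for circular buffers.]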
85
U H
DAG Features
Data Fetch/Store Executes Simultaneously With an Arithmetic
Instruction

2 DAGS In Processor

4 Index Address Registers Per DAG

4 Modify Registers Per DAG

4 Length Registers Per DAG

Any Modifier Register in DAG can be Used With Any
Index Register in DAG
86
U H
Example DAG Instructions
( Programmer's Quick Reference pgs10)
AX0 = DM(0X3800);

AX0 = DM(I0, M3);

MODIFY (I4, M5);

AX1 = DM(I2,M3), AY0 = PM(I4,M7);

MR=MR+MX0 * MY0 (SS), MX0 = DM(I2,M2), MY0 = PM(I6,M6);

Note: L Registers Must Be 0 If Circular Buffers Are Not Used
87
U H
DAG Instructions
Load / Store Instructions
Data Memory
dreg = DM(ix, mx);
DM(ix,mx) = dreg;
DM(ix,mx) = <data>;
DM(<address>) = dreg;
dreg = DM(<address>);
Program Memory
dreg = PM(ipx, mpx);
PM(ipx,mpx) = dreg;
Where:
ix = {I0, I1, I2, I3} with mx = {M0, M1, M2, M3}
OR
ix = {I4, I5, I6, I7} with mx = {M4, M5, M6, M7}
ipx = {I4, I5, I6, I7}
mpx = {M4, M5, M6, M7}
dreg = {AX0, AX1, AY0, AY1, MX0, MX1, MY0, MY1,
AR, MR0, MR1, MR2, SI, SE, SR0, SR1}
DAG Operations Can Be Combined with ALU, MAC or Shifter Instructions

Length Registers Must Be Initialized. (Set to zero if not used)

All DAG Operations Execute in a Single Instruction Cycle
88
U H
Modulo Addressing Example
Buffer occupies H#0030 to H#0037
I0 = Current Address
M0 = Modify Value (3)
L0 = Buffer Length (8)
Base Address = H#0030
Address Sequence: 30, 33, 36, 31, 34, 37, 32, 35
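The wrap-around in the address sequence can be sketched with ordinary modular arithmetic. The function name `circular_addresses` is hypothetical, and the sketch assumes the buffer starts at the given base address and that the modify value is smaller than the buffer length.

```python
def circular_addresses(base, length, modify, count):
    """Addresses produced by post-modify addressing in a circular buffer."""
    addr = base
    sequence = []
    for _ in range(count):
        sequence.append(addr)
        # Post-modify, wrapping back into base .. base + length - 1.
        addr = base + (addr - base + modify) % length
    return sequence


seq = circular_addresses(0x30, 8, 3, 8)
```

With base H#0030, L = 8 and M = 3 the script reproduces the sequence 30, 33, 36, 31, 34, 37, 32, 35 (hexadecimal).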
89
U H
Modulo Addressing Code Example
.VAR/DM/CIRC/ABS=0X30 Buff [8];   /*Define Buffer */
I0 = ^Buff;                       /*I0 = Start address of Buff */
L0 = %Buff;                       /*L0 = Length of Buff */
M0 = 3;                           /*Modify value = 3 */
AX0 = DM (I0, M0);                /*Fetch data at address 30 */
AY0 = DM (I0, M0);                /*Fetch data at address 33 */
AX1 = DM (I0, M0);                /*Fetch data at address 36 */
AY1 = DM (I0, M0);                /*Fetch data at address 31 */
90
U H
Bit Reversal with the ADSP-2100 Family
Only available with DAG1

Enabled by setting bit 1 of MSTAT register or using the instruction
ENA BIT_REV

Reverses all 14 bits of address

normal order: 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Bit-reversed: 00 01 02 03 04 05 06 07 08 09 10 11 12 13

For an FFT of size 2^N, set M register to 2*2^(14-N)*
* x2 because FFT output has real and imaginary data interleaved
i.e. 256 FFT = 2^8 FFT, M = 2*2^(14-8) = 2*2^6 = 128
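The effect of reversing all 14 address bits can be seen in a small example. This sketch assumes one data word per point, so the index register is stepped by 2^(14-N); the extra factor of 2 for interleaved real and imaginary data is not modelled, and the function names are hypothetical.

```python
def bit_reverse14(address):
    """Reverse the order of the 14 address bits."""
    reversed_address = 0
    for bit in range(14):
        if address & (1 << bit):
            reversed_address |= 1 << (13 - bit)
    return reversed_address


def bit_reversed_sequence(log2_size):
    """Output addresses when the index steps by 2^(14-N) with bit reversal on."""
    modify = 1 << (14 - log2_size)
    return [bit_reverse14(i * modify) for i in range(1 << log2_size)]


order = bit_reversed_sequence(3)
```

For an 8-point transform the addresses come out in the familiar bit-reversed order 0, 4, 2, 6, 1, 5, 3, 7.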
91
U H
DAGS Mini-Quiz
0x1234
0x1234
0x1234
0x1234
0x1234
Data Memory
DM(0x3800)
Write the ADSP-2101 Instructions
to Find the Sum of the N=5 Numbers
Stored in Data Memory
Hint:

Use Multifunction Instructions
Nine Instructions Total
3 Instructions are Repeated
Questions:

1) How Many Instructions Cycles Are
Required?

2) How Many Instruction Cycles are
Required if N=100?

3) Is this an Efficient Use of the Processor?
92
U H
DAGS Mini-Quiz Answer
.module/boot = 0 dags_mini_quiz;
.var/dm/circ data_buf [5];

start:
i0 = ^data_buf; /*Load DAG Registers */
l0 = % data_buf;
m3 = 1;
ar = dm (I0, m3); /*Load first data value */
ay0 = dm (I0, m3); /*Load second data value */
ar = ar + ay0, ay0 = dm (i0, m3); /*Add and load third value */
ar = ar + ay0, ay0 = dm (i0, m3); /*Add and load fourth value */
ar = ar + ay0, ay0 = dm (i0, m3); /*Add and load fifth value */
ar = ar + ay0; /*Last addition */

.endmod;
1) 9 Cycles
2) 104 Cycles
3) No, it would waste program memory
93
U H
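Program Sequencer Block Diagram
[Figure: the NEXT ADDRESS MUX drives the PMA BUS (14 bits), choosing between the PROGRAM COUNTER INCREMENT, the PC STACK (16 x 14), the INTERRUPT CONTROLLER (IRQ inputs) and an address from the INSTRUCTION REGISTER, under control of the NEXT ADDRESS SOURCE SELECT. The LOOP COMPARATOR and LOOP STACK (4 x 18), the COUNTER LOGIC (loaded from the DMD BUS, 16 bits, and producing CE), the CONDITION LOGIC and the STATUS LOGIC provide the inputs to the source select.]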
94
U H
Program Sequencer Operations
Zero Overhead Looping

Conditional/Unconditional Branches

Interrupt Handling

Counter and Status Stacks

Next Instruction Address Generation
Program Sequencer Features
Automatic Operation, Transparent to User

Single Cycle Conditional Branches

4-Deep Loop, Counter Stack

16-Deep PC Stack
95
U H
Sequencer Instructions
( Programmer's Quick Reference pgs 12)
[ IF condition] JUMP <dest>;

[ IF condition] CALL <dest>;

[ IF condition] RTS;

[ IF condition] RTI;

IF <flag_condition> CALL <address>;

IF <flag_condition> JUMP <address>;

SET / TOGGLE / RESET FLAG_OUT;
Where:
condition = Branch Condition
<dest> = {(I4), (I5), (I6), (I7), <address>}
flag_condition = {FLAG_IN, NOT FLAG_IN}
96
U H
Program Loop Example
General Form:
DO LABEL UNTIL CONDITION;
Example:
CNTR=10;
DO ENDLOOP UNTIL CE;
{ First Loop Instruction } ; <- Address Pushed On PC Stack
{ Next Loop Instruction } ;
ENDLOOP: { Last Loop Instruction } ; <- Address Pushed On LOOP Stack
{ First Instruction Outside Loop } ;
97
U H
Interrupt Handling
Interrupts Can Be Generated By An External Interrupt Signal Or By
2100 Family Peripherals (Timer, SPORT, HIP, etc.)
External Interrupts (IRQx) Can Be Level Or Edge Sensitive (ICNTL)
Interrupts Have Priority And Can Be Nested
Interrupts Can Be Masked (IMASK)
Interrupts Can Be Forced Or Cleared Under Software Control (IFC) *
Different Family Members Have Different Interrupt Vector Tables
Interrupt Vector Table Always Begins At PM Address 0x0000
* Except ADSP-2100A
98
U H
Interrupt Latency
Interrupt latency is at least THREE (3) cycles except for the
TIMER interrupt which is only ONE (1) cycle
[Timing diagram. Traces: CLKOUT; Interrupt; Instruction Executing (n-2, n-1, n, NOP, 1st instr of service routine); Address for Instruction Fetch (n-1, n, n+1, interrupt vector i, i+1). The return address is pushed on the PC stack.]
99
U H
Interrupts & Interrupt Vector Addresses
ADSP-2105
Interrupt Source            Interrupt Vector Address
Program startup at RESET    0x0000
IRQ2                        0x0004 (highest priority)
SPORT1 Transmit / IRQ1      0x0010
SPORT1 Receive / IRQ0       0x0014
Timer                       0x0018 (lowest priority)

ADSP-2101
Interrupt Source            Interrupt Vector Address
Program startup at RESET    0x0000
IRQ2                        0x0004 (highest priority)
SPORT0 Transmit             0x0008
SPORT0 Receive              0x000C
SPORT1 Transmit / IRQ1      0x0010
SPORT1 Receive / IRQ0       0x0014
Timer                       0x0018 (lowest priority)
100
U H
Sequencer Mini-Quiz
Modify the answer of the DAGS Mini-Quiz to use a zero-overhead loop.

Assume N=100. Your program should require 9 Instruction Locations.
Write the ADSP-2101 Instructions
to Find the Sum of the N=100
Numbers Stored in Data Memory
[Figure: data memory locations, each holding 0x1234; label DM(0x3800)]

101
U H
Sequencer Mini-Quiz Answer
.module/boot = 0 sequencer_mini_quiz;
.const buf_len = 100;
.var/dm/circ/abs=0x3800 data_buf [buf_len];

start:
i0 = ^data_buf; /*Load address of data buf */
l0 = %data_buf; /*Load length of data buf */
m3 = 1;
cntr = buf_len - 2; /*Load counter */
ar = dm (i0, m3); /*Load first data value */
ay0 = dm (i0, m3); /*Load second data value */
do add_loop until ce;
add_loop: ar = ar + ay0, ay0 = dm (i0, m3); /*Add and load next value */
ar = ar + ay0; /*Last addition */

rts;
.endmod;
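The register-level data flow of the looped program can be traced in software. The sketch below is a Python model, not ADSP-21xx code: `ar`, `ay0`, `i0` and `cntr` mirror the register names in the listing, the function name is an assumption, and 16-bit overflow of AR is deliberately ignored.

```python
def looped_sum(data):
    """Trace the sum-with-prefetch pattern of the sequencer quiz answer."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    i0 = 0
    ar = data[i0]; i0 += 1            # load first data value
    ay0 = data[i0]; i0 += 1           # load second data value
    cntr = len(data) - 2              # cntr = buf_len - 2
    while cntr > 0:                   # do add_loop until ce
        ar = ar + ay0                 # add ...
        ay0 = data[i0 % len(data)]    # ... and load next value (circular buffer)
        i0 += 1
        cntr -= 1
    ar = ar + ay0                     # last addition
    return ar


print(looped_sum([0x1234] * 100) == 100 * 0x1234)  # True
```

The loop body is a single multifunction instruction, so the program size stays fixed while N changes; only the counter value differs.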
102
U H
ADSP-2100 Family Peripherals
Memory Interfacing
Timer
Serial Ports
103
U H
ADSP-21xx Family Memory Interface
104
U H
ADSP-2101 Basic System Configuration
[Figure: an ADSP-2101 driven by a clock or crystal (CLKIN, XTAL). Its ADDRESS (14-bit) and DATA (24-bit) buses, with PMS, DMS, BMS, RD and WR, connect to three external devices: a boot EPROM (27C64, 27C128, 27C256 or 27C512, 150 ns), an optional program memory, and optional data memory & peripherals. An optional serial device connects to each serial port: SERIAL PORT 0 (SCLK, RFS, TFS, DT, DR) and SERIAL PORT 1 (SCLK, RFS or IRQ0, TFS or IRQ1, DT or FO, DR or FI). Other pins shown: CLKOUT, VDD, GND, MMAP, BG, BR, IRQ2, RESET.]
105
U H
ADSP-21xx Family Memory Architecture
Varied Memory Configurations Across Family Members*
Core Can Access PM Twice and DM Once Per Instruction
PM and DM Buses Multiplexed Off Chip*
Can Perform One Off-Chip Access with No Cycle Penalty
On Chip PM Can Be Initialized Through Boot EPROM or
Host Interface Port
External EPROM Can Store 8 Pages of Bootable Code.
Software Programmable Wait States

* Does not apply to ADSP-2100A
106
U H
On Chip Memory Configurations For
ADSP-21xx Processors
Processor       Program Memory RAM   Data Memory RAM   Program Memory ROM
ADSP-2100A      -                    -                 -
ADSP-2101       2k                   1k                -
ADSP-2103       2k                   1k                -
ADSP-2105       1k                   1/2k              -
ADSP-2111       2k                   1k                -
ADSP-2115       1k                   1/2k              -
ADSP-21msp5x    2k                   1k                2k
ADSP-2161/63    -                    1/2k              8k/4k
ADSP-2171/73    2k                   2k                8k
ADSP-2181       16k                  16k               -
107
U H
ADSP-2101 Program Memory Architecture
MMAP = 0 (Boot)
0x0000 - 0x07FF   Internal PM RAM, booted from external boot memory (reset vector at 0x0000)
0x0800 - 0x3FFF   External program memory

MMAP = 1 (No Boot)
0x0000 - 0x37FF   External program memory (reset vector at 0x0000)
0x3800 - 0x3FFF   Internal PM RAM, not booted
108
U H
ADSP-21xx Data Memory Architecture
0x0000 - 0x03FF   1K External, DWAIT0
0x0400 - 0x07FF   1K External, DWAIT1
0x0800 - 0x2FFF   10K External, DWAIT2
0x3000 - 0x33FF   1K External, DWAIT3
0x3400 - 0x37FF   1K External, DWAIT4
0x3800 - 0x3BFF   Internal Data Memory RAM
0x3C00 - 0x3FFF   Memory Mapped and Reserved Registers
[The figure also marks the ADSP-2171 internal data memory RAM region.]
109
U H
ADSP-21xx Memory Control Registers
Data Memory Wait State Control Register DM(0x3FFE)
Fields, high to low: DWAIT4 DWAIT3 DWAIT2 DWAIT1 DWAIT0
Bit values shown: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

System Control Register DM(0x3FFF)
Bit values shown: 0 1 0 0 0 1 1 1 1 0
Fields:
BFORCE   Boot Force Bit
BPAGE    Boot Page Select
BWAIT    Boot Memory Wait States*
PWAIT    Program Memory Wait States
* 7 wait states for Boot Memory on ADSP-2171
110
U H
Memory Mapped Control Registers
vs. Status Registers
Memory Mapped Control Registers
> Physical locations in Data Memory
> Accessed by address
> Addresses 0x3C00 thru 0x3FFF (All Processors)
> Mainly to set up the peripherals (e.g., mode of operation)

Status Registers (or Non-Memory Mapped Registers)
> Physical registers in the DSP
> Accessed by name
> Set up the operation of the DSP core (e.g., MAC, interrupts)
> Provide information about the DSP core (e.g., stacks, status flags)

Initialize Memory Mapped Registers before running (i.e., not on the fly)
Status Registers are meant to be used on the fly
111
U H
0x3FFF System Control Register - Wait states, Enable SPORTs
0x3FFE Data Memory Waitstate Control Register
0x3FFD-0x3FFB Timer Control Registers - Set Timer values
0x3FFA-0x3FF7 SPORT0 Multichannel Word Enable Registers
0x3FF6 SPORT0 Control Register - clock, frame and data modes
0x3FF5 SPORT0 SCLKDIV - Divide down register for SCLK
0x3FF4 SPORT0 RFSDIV - Divide down register for internal RFS
0x3FF3 SPORT0 Autobuffer Control Register
0x3FF2-0x3FEF SPORT1 Control and Setup (same as SPORT0)
0x3FEF-0x3FEC Analog Control Registers (msp5x parts, which have no SPORT1 autobuffer)
0x3FEB-0x3FE9 NO REGISTERS
0x3FE8 HMASK Register - HIP mask for interrupts
0x3FE7-0x3FE6 HIP Status Registers - HSR7 and HSR6
0x3FE5-0x3FE0 HIP Data Registers

112
U H
Status Registers
ASTAT  ALU Status Flags, MAC Overflow Flag, Shifter Input Flag
SSTAT  Stacks Overflow and Empty (Read-Only)
MSTAT  Computation Modes, Miscellaneous Functions
ICNTL  External Interrupt Sensitivity (edge/level) and Nesting
IMASK  Interrupt Enables - Masks the servicing of interrupts
IFC    Interrupt Force/Clear (Write-Only)

IMASK bits (1 = Enable, 0 = Disable; value shown: 0 0 0 0 0 0)
Bit 5  IRQ2
Bit 4  SPORT0 Transmit
Bit 3  SPORT0 Receive
Bit 2  SPORT1 Transmit or IRQ1
Bit 1  SPORT1 Receive or IRQ0
Bit 0  Timer

ICNTL bits
Bit 4  Interrupt Nesting (1 = Enable, 0 = Disable)
Bit 3  0
Bit 2  IRQ2 Sensitivity (1 = Edge, 0 = Level)
Bit 1  IRQ1 Sensitivity (1 = Edge, 0 = Level)
Bit 0  IRQ0 Sensitivity (1 = Edge, 0 = Level)
113
U H
Boot EPROM to Internal PM RAM
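[Figure: boot page 0 occupies boot EPROM addresses 0x0000-0x1FFF (8k x 8) and fills the 2k x 24 internal PM RAM at 0x0000-0x07FF; additional boot pages follow from EPROM address 0x2000. Each 24-bit instruction word is assembled from three 8-bit EPROM bytes (A, B, C); the page length is also stored in the EPROM. The figure numbers the booting order.]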
114
U H
ADSP-2101 Timer Block Diagram
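[Block diagram: TSCALE (8 bits), TPERIOD (16 bits) and TCOUNT (16 bits) registers on the 16-bit DMD bus. CLKOUT and Timer Enable feed the timer enable & prescale logic, which decrements TCOUNT. When TCOUNT reaches zero, the count register load logic reloads it and the timer interrupt is generated.]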
115
U H
ADSP-2100 Family Timer Features
The ADSP-21xx programmable interval timer can generate periodic interrupts
based on multiples of the processor's cycle time. The timer is not available on
the ADSP-2100.

TCOUNT = dedicated count-down register

TPERIOD = reloads TCOUNT at interrupt

TSCALE = (number of clock cycles between TCOUNT decrements) - 1

TCOUNT is decremented every TSCALE+1 cycles. After TCOUNT
expires, it is reloaded with the value in TPERIOD. One interrupt
occurs every (TPERIOD + 1) * (TSCALE + 1) cycles.
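The interrupt-period formula can be evaluated directly. This is a minimal Python sketch of the arithmetic only; the function names are illustrative, and the first-interrupt rule (TCOUNT + 1 decrements) is taken from the comments in the example setup code of these notes.

```python
def timer_interrupt_cycles(tperiod: int, tscale: int) -> int:
    """Cycles between timer interrupts: (TPERIOD + 1) * (TSCALE + 1)."""
    return (tperiod + 1) * (tscale + 1)


def first_interrupt_cycles(tcount: int, tscale: int) -> int:
    """Cycles until the first interrupt, counted from the initial TCOUNT."""
    return (tcount + 1) * (tscale + 1)


def interrupt_period_seconds(tperiod: int, tscale: int, cycle_time_s: float) -> float:
    """Interrupt period in seconds for a given instruction cycle time."""
    return timer_interrupt_cycles(tperiod, tscale) * cycle_time_s


print(first_interrupt_cycles(49, 0))   # 50
print(timer_interrupt_cycles(99, 0))   # 100
```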
116
U H
ADSP-2101 Timer Registers
0x3FFD   TPERIOD   Period Register (bits 15-0)
0x3FFC   TCOUNT    Counter Register (bits 15-0)
0x3FFB   TSCALE    Scaling Register (bits 7-0; bits 15-8 are 0)
117
U H
ENABLING THE TIMER
1. Set values for TCOUNT, TPERIOD, and TSCALE.
2. Set bit 0 in IMASK to 1 to enable interrupt.
3. Execute "ena timer" instruction to start counting down.
(Bit 5 in MSTAT register)
118
U H
Example Setup Code for Timer
i0 = 0x3ffb; /*i0 points to TSCALE*/
m0 = 1; /*modify value is 1 */
l0 = 0; /*not a circular buffer */
dm(i0,m0) = 0; /*set TSCALE to decrement every cycle*/
dm(i0,m0) = 49; /*to generate first interrupt at 50 cycles*/
dm(i0,m0) = 99; /*to reload TCOUNT with 99 at interrupt*/
IMASK = 0x1; /*enables the timer interrupt*/
ena timer; /*starts the count down after executing this*/
119
U H
TIMER MINI-QUIZ
1. Write code to generate a timer interrupt after 50 cycles the first time, and
every 75 cycles thereafter (any decrement that works).

2. Write code to generate a timer interrupt every 300 ms. Assume the clock is
16.67 MHz.

3. What is the longest time you can set the timer for if you have a 12.5 MHz
clock? What would be the values of TSCALE, TCOUNT, and
TPERIOD?
120
U H
TIMER MINI-QUIZ ANSWER
1.
i0 = 0x3ffb; /*same first 3 lines*/
m0 = 1;
l0 = 0;
dm(i0,m0) = 0; /*TSCALE = 0: decrement every cycle*/
dm(i0,m0) = 49; /*TCOUNT = 49: first interrupt after 50 cycles*/
dm(i0,m0) = 74; /*TPERIOD = 74: every 75 cycles thereafter*/
imask = 0x1;
ena timer;

2. /*same first 3 lines*/
/*300 ms = 5,000,000 cycles = 250 * 20,000*/
dm(i0,m0) = 0xF9; /*TSCALE = 249: decrement every 250 cycles*/
dm(i0,m0) = 0x4E1F; /*TCOUNT = 19,999: 20,000 decrements*/
dm(i0,m0) = 0x4E1F; /*TPERIOD = 19,999: 20,000 decrements*/
imask = 0x1;
ena timer;

3. /*same first 3 lines*/
/*A 12.5 MHz processor has an 80 ns instruction cycle time. TCOUNT and TPERIOD are 16-bit
registers, so the largest number they can hold is 65535. TSCALE is an 8-bit register, so
the largest number it can hold is 255. The equation (TSCALE+1)*(TPERIOD+1) then gives
0x1000000 (16,777,216) cycles per timer interrupt.
This number multiplied by 80 ns is 1.3422 seconds*/
dm(i0,m0) = 0xff; /*TSCALE*/
dm(i0,m0) = 0xffff; /*TCOUNT*/
dm(i0,m0) = 0xffff; /*TPERIOD*/
imask = 0x1;
ena timer;
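Register values for a target interval can also be searched for mechanically. The Python sketch below is one possible approach, not the method used in the notes: the function name is an assumption, and it returns the smallest prescale that divides the cycle count exactly, so it may pick a different (equally valid) pair than the quiz answer.

```python
def timer_settings(target_cycles: int):
    """Return (tscale, tperiod) with (tscale+1)*(tperiod+1) == target_cycles, or None."""
    for tscale in range(256):                  # TSCALE is an 8-bit register
        divider = tscale + 1
        if target_cycles % divider == 0:
            tperiod = target_cycles // divider - 1
            if 0 <= tperiod <= 0xFFFF:         # TPERIOD is a 16-bit register
                return tscale, tperiod
    return None


# 300 ms with a 60 ns cycle (nominal for 16.67 MHz): 300,000,000 ns / 60 ns
cycles = 300_000_000 // 60
tscale, tperiod = timer_settings(cycles)
print(cycles, tscale, tperiod)

# Longest interval: both registers at their maximum
max_cycles = (0xFF + 1) * (0xFFFF + 1)
print(max_cycles, max_cycles * 80e-9)          # cycles, seconds at 80 ns per cycle
```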
121
U H
ADSP-21xx Serial Port
122
U H
ADSP-21xx Serial Port = UART
123
U H
ADSP-2101 Serial Port Block Diagram
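[Block diagram: the TXn transmit data register (16 bits) feeds the transmit shift register and the DT pin; the DR pin feeds the receive shift register and the RXn receive data register (16 bits). Companding hardware sits between the data registers and the 16-bit DMD bus. Serial control handles SCLK, TFS and RFS, with an internal serial clock generator.]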
124
U H
ADSP-21xx SPORT Features
ADSP-21xx SPORTs Are Used For Synchronous Communication
Full Duplex
Fully Programmable
Autobuffer Capability
Multi-Channel Capability
Data Rates Up To 13 Mbits/sec
2171 Data Rates Up To 20 Mbits/sec
125
U H
Examples of Serial Port Implementation
Connecting a CODEC to the Serial Port: 2101 <-> TP3053 CODEC
Connecting Two 2101's Together: 2101 <-> 2101
Using the Serial Port as a UART: 2101 (with software UART) <-> AD233 (RS-232 Driver) <-> PC
126
U H
ADSP-21xx SPORT Hardware
SPORT Has 5 Wires
SCLK: Serial Clock
RX: Data Receive
TX: Data Transmit
TFS: Transmit Frame Sync
RFS: Receive Frame Sync
[Figure: the five ADSP-21xx SPORT pins (SCLK, TFS, RFS, RX, TX) wired to the serial clock, transmit frame sync, receive frame sync, receive data and transmit data pins of a serial device.]
127
U H
ADSP-21xx SPORT Software
Access Serial Port Data By Accessing SPORT Data Registers:

TX0, TX1, RX0, RX1

Configure Serial Port Through Memory Mapped Control Registers:

System Control Register **
SPORT Control Register **
SPORT SCLKDIV Register
SPORT RFSDIV Register
SPORT Autobuffer Control Register
SPORT0 Multichannel Enable Registers

** Required to Configure SPORTs

Synchronize SPORT Transfers and Processor Operation With Interrupts

Each SPORT is Allocated a Transmit and Receive Interrupt
128
U H
The Base Architecture of Floating-Point DSP Processor
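[Block diagram: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), program sequencer, cache (32 x 48), timer, JTAG test & emulation, and bus connect, linked by the PMA bus (24 bits), DMA bus (32 bits), PMD bus (48 bits) and DMD bus (40 bits) to the computation units: floating/fixed-point ALU, floating/fixed-point multiplier with fixed-point MAC, 32-bit barrel shifter, and a 16 x 40 register file.]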
129
U H
IEEE Compatibility
(IEEE Floating Point Standard 754/854)

Data Formats
32-Bit Single-Precision IEEE Floating Point
(23-Bit Data or Mantissa, 8-Bit Exponent, & Sign Bit)
40-Bit Extended Single-Precision IEEE Floating Point
(31-Bit Data or Mantissa, 8-Bit Exponent, & Sign Bit)
32-Bit Fixed Point (Integer and Fractional) With 80-Bit
Accumulation

Rounding
Rounding-to-Nearest (Unbiased Rounding)
Round-Toward-Zero (Truncation)

IEEE Exception Handling
Overflow
Underflow
Equals Zeros
Divide-by-Zero
Interrupt on Exception or Latched Status
130
U H
Floating-Point Multiplier/MAC
[Figure: the computation units: floating/fixed-point ALU, floating/fixed-point multiplier with fixed-point MAC, 32-bit barrel shifter, and 16 x 40 register file.]
Example Multiplier/MAC Instructions
F1 = F5 * F7
R2 = R3 * R8 (SSF)
MRF = MRF + R5 * R0 (UUIR)
132
U H
Example Multi-Function Instructions

IF EQ F1 = ABS F8, F9 = DM (I0,M4)

F8 =F1*F6, F3=F9+F14, F9=F9-F14,
DM(I2,M0)=F10, PM(I10,M10)=F3
133
U H

The System Architecture

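[Figure: the DSP processor with separate program-memory and data-memory ports. Program memory port: PMA (24-bit address), PMD (48-bit data), PMS1-0, PMRD, PMWR, PMPAGE, PMACK, PMTS. Data memory port, shared by data memory and peripherals: DMA (32-bit address), DMD (40-bit data), DMS3-0, DMRD, DMWR, DMPAGE, DMACK (driven by the peripherals' ACK), DMTS. Each external device has Selects, OE, WE, ADDR and DATA pins. Other processor inputs: 1x CLOCK on CLKIN, RESET, IRQ3-0.]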
134
U H
The Complete Architecture
[Figure: noisy analogue signal -> S/H (sampled at fs, discrete time) -> A/D (discrete value) -> digital processor (filter processing) -> D/A -> clean analogue signal.]
135
U H
What is a Real Time Application?
"Real time" is a misleading expression. Here
it means that the DSP system can process the
required algorithm within a specified time.
[Figure: radar signal x(t) -> S/H (sampled at fs) -> A/D -> digital processor (Fourier transform) -> display of x(f), with spectral peaks at f1 and f2.]
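One common way to make "within a specified time" concrete is a per-sample cycle budget: the algorithm must finish before the next sample arrives. The Python sketch below illustrates that check; the function names and the example figures (12.5 MHz instruction rate, 8 kHz sampling) are assumptions chosen for illustration, not values from this slide.

```python
def cycles_per_sample(instruction_rate_hz: float, sample_rate_hz: float) -> float:
    """Instruction cycles available between two consecutive samples."""
    return instruction_rate_hz / sample_rate_hz


def meets_real_time(cycles_needed: float, instruction_rate_hz: float,
                    sample_rate_hz: float) -> bool:
    """True if the per-sample workload fits inside one sample period."""
    return cycles_needed <= cycles_per_sample(instruction_rate_hz, sample_rate_hz)


# Hypothetical system: 12.5 MHz instruction rate, 8 kHz sampling
budget = cycles_per_sample(12.5e6, 8_000)
print(budget)                                   # 1562.5 cycles per sample
print(meets_real_time(1_000, 12.5e6, 8_000))    # True
print(meets_real_time(2_000, 12.5e6, 8_000))    # False
```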
136
U H
Real Time Operating Systems as an Ideal
Environment for Embedded Applications
The current DSP processors:

Are more than high-performance signal-processing
engines

Provide a more regular instruction set, with plenty of
address space to run large programs

Come with efficient C compilers and rival general-
purpose microprocessors
137
U H
DSP Embedded Applications
[Figure: a DSP RTOS running on the DSP and hosting fax tasks, telephone tasks, speech recognition tasks, sound generation tasks and answering machine tasks.]
138
U H
ARCHITECTURE
[Figure: the DSP RTOS provides DSP memory management, real-time multi-tasking, DSP stream I/O and DSP event handling on top of memory segments, processor segments and peripheral devices.]
139
U H
Features for Real Time Operating Systems
Operating System Features    BOS      Nucleus  RXTC     SPOX      Helios
Preemptive Task Scheduling   Yes      Yes      Yes      Yes       Yes
Time-Sliced Scheduling       Yes      Yes      Yes      No        Yes
Round-Robin Scheduling       ?        Yes      Yes      No        Yes
Parallel Processing          No       No       No       Optional  Yes
Inter-Task Messages          Yes      Yes      Yes      Yes       Yes
Memory Management            Yes      Yes      Yes      Yes       Yes
Interrupt Management         Yes      No       Yes      Yes       Yes
Timer Management             Yes      Yes      Yes      No        Yes
Device-Independent I/O       No       No       No       Yes       Yes
Stream I/O                   $495*    No       No       Yes       Yes
OS RAM/ROM Size (Bytes)      5K-40K   4K-20K   12K-16K  44K+      80K-200K
Please contact the vendors listed above for the best and most up-to-date information
140
U H
Compression Techniques and a Compressor
and De-Compressor Generator
The CCITT/ISO Joint Photographic Experts Group
(JPEG) and MPEG digital image compression
algorithms are in strong demand for:
Multimedia
Video Editing
Colour Publishing and Graphics Arts
Image-Processing, Storage and Retrieval
Colour Printers, Scanners and Copiers
High-Speed Image Transmission Systems for
LANs, Modem and Colour Facsimile
Digital Cameras
141
U H
These algorithms may be implemented in real time
as:
A) A dedicated Chip (Compressor)
Company       Product name            Compression ratio     DCT Table  Huffman Table  Quantisation Table  Price in
Fast Forward  Outlaw Digital Video    from 4:1 to 10:1      Board: Disc 0.5 GByte                         4700 950
C-Cube        CL 550 En-/Decoder      from 8:1 to 100:1     static     program        program             80
C-Cube        CL 650 En-/Decoder      from 1:1 to 50:1      static     program        program             200
Winbond       W9930 En-/Decoder       from 8:1 to 100:1     static     static         program             29
LSI Logic     L64702                  *                     program    program        program             60
142
U H
B) DSP Processor + Compressor
[Figure: a DSP processor with a DCT block (uncompressed data in, compressed data out) and an IDCT block (compressed data in, uncompressed data out).]
DCT: Discrete Cosine Transform
143
U H
C) Software Solution (DSP C / Assembler code)
Company                   Processor type                Data Bits  Operation frequency  Benchmarks
Optibase                  Motorola 56002                24         40 MHz               *
Atlanta Signal Processor  Texas Instruments TMS320C31   32         16 MHz               64 KB Grey scale: 700 ms
Sonitech International    Texas Instruments TMS320C3x   32         16 MHz               400 Kbytes/s b & W, 540 Kbytes/s Colour
Atlanta Signal Processor  Analog Devices 21020          32         33 MHz               500 Kbytes/s b & W
Zoran Corp                Zoran ZR38000                 16         25 MHz               440 Kbytes/s b & W
144
U H
Compressor-De-Compressor Generator
[Figure: Code & Model Generator. Inputs to the Comp/Decomp Generator: processing rate (n million pixels/second), quantizer & Huffman tables, JPEG parameters, MPEG parameters, compression rate (1:1 to 80:1), and colour space I/O (n-bit gray scale, RGB, CMYK, 4:4:4:4, YUV). Outputs: C and assembly code, and models for Comdisco SPW, Hyperception HW and Momentum FDAS.]
145
U H
Performance Measures
Two measures are used commonly:

MIPS: Millions of Instructions Per Second
This is a measure of raw instruction
execution rate without specifying the nature of the
computations.

MFLOPS: Millions of Floating Point Operations Per Second
This is a measure useful in assessing computations in
floating point format.
146
U H
The difference between MIPS and MFLOPS can be appreciated by
considering a simple DO loop in a high-level language:

DO I = 1 TO 1000000 STEP 1
BEGIN
Z(I) = X(I) * Y(I) + C(I);
END

Each iteration accomplishes two floating point operations, yet depending on the
host computer the compiled assembly language code could occupy many bytes.
The speed of execution of the two floating point operations therefore depends on
the MIPS of the processor; if each iteration could be completed in, say, one
microsecond, the processor would execute at the rate of two MFLOPS.
A system of one million processors could conceivably do all the iterations at
once and attain a performance of two million MFLOPS.

Despite its widespread use, MIPS is perhaps the poorest definition of performance,
since it contains no quantifiable attributes for assessing useful processing.
The term FLOPS is widely used in signal processing applications and is a
common measure of performance in comparing processors.
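The link between time per iteration and MFLOPS is a one-line calculation. The Python sketch below illustrates it; the function name and the `processors` parameter are assumptions, and ideal scaling across processors is assumed.

```python
def mflops(flops_per_iteration: int, seconds_per_iteration: float,
           processors: int = 1) -> float:
    """Millions of floating point operations per second.

    Assumes every processor completes one iteration in seconds_per_iteration
    and that the work scales ideally across processors.
    """
    return processors * flops_per_iteration / seconds_per_iteration / 1e6


# Two floating point operations (one multiply, one add) per loop iteration
print(mflops(2, 1e-6))   # about 2 MFLOPS at one microsecond per iteration
print(mflops(2, 1e-9))   # about 2000 MFLOPS at one nanosecond per iteration
```

The instruction count per iteration does not appear in the formula at all, which is the point of the comparison: MIPS counts instructions, MFLOPS counts useful floating point work.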
147
U H
Data Flow Bottle-necks & Solutions:
Pipeline & Parallel Architectures With Examples
[Figure: data in, instructions and data output all pass over a single memory bus.]
Bottle-neck of a Shared Instruction/Data Bus in a
Von Neumann Machine
148
U H
The First Generation µP Architecture
[Figure: a single memory (instructions and data) connected by the address bus and data bus to a processor containing the ALU, TMP and ACCUM registers, general purpose registers, program counter, address register, and control & timing logic.]
Each instruction is a new event; it is fetched, decoded, and executed.
The assembly language commands help to execute lengthy manipulations
on designated strings of data.
The programmer must code iterative loops or use other mechanisms to
enhance performance while constrained by the basic limitations.
At the algorithmic level, many sequences of operations have little or no
precedence relationships.
149
U H
Overview of the Pipeline Approach
The simplest view of a pipeline is that each stage consists of combinational
logic driven by an input register. The output from a stage is captured by the
input register of the following stage. Each stage has a delay for the initial
data capture and subsequent processing.
It is possible to construct two types of pipeline system:
i) Synchronous Pipeline
If all stages have an equal delay, then a synchronous clock can transfer
results into each input register. This is the simplest control problem.
ii) Asynchronous Pipeline
If there is a large discrepancy between the various delays in each stage,
then an asynchronous data transfer might be in order. Here the intermediate
registers are omitted. The design of such pipes requires careful timing of
data input/output.

The following figure shows a simple Pipeline DSP System.
150
U H
Simple Pipeline DSP System
[Figure: AD converter -> stage j-1 -> stage j -> stage j+1 -> DA converter. Each stage is an input register followed by combinatorial logic; the stages are labelled DSP ADSP-2181.]
151
U H
When can the Pipeline Approach be considered?
In general a pipeline can be considered if:
i) The procedure can be broken into a sequence of discrete steps,
ii) The steady state data flow matches the remainder of the system, &
iii) Components can be found which implement the steps with the
desired response.
How can the performance of the pipeline be measured?
A synchronous pipeline produces a result every clock period t,
i.e. a data-flow rate of 1/t outputs per second. An N-stage pipeline
gives an apparent N-fold increase in performance. If the input to the
pipeline is intermittent, however, then some stages will not be
processing valid data, and this must be accounted for by the control
mechanism. If, on the average, only a fraction P of the total stages
are occupied, then the data flow falls to P/t outputs per second.
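The throughput figures above can be illustrated numerically. The Python sketch below compares an N-stage synchronous pipeline with the same work done serially; the function names and the choice of 1000 items are assumptions made for the illustration.

```python
def pipeline_rate(t: float, occupancy: float = 1.0) -> float:
    """Outputs per second: 1/t when full, P/t when only a fraction P is occupied."""
    return occupancy / t


def pipeline_total_time(n_items: int, n_stages: int, t: float) -> float:
    """Time to push n_items through: fill the pipe once, then one result per t."""
    return (n_stages + n_items - 1) * t


def serial_total_time(n_items: int, n_stages: int, t: float) -> float:
    """Time if every item passes through all stages before the next one starts."""
    return n_items * n_stages * t


items, stages, t = 1000, 8, 1.0
speedup = serial_total_time(items, stages, t) / pipeline_total_time(items, stages, t)
print(speedup)                   # approaches the stage count (8) as items grows
print(pipeline_rate(2.0, 0.5))   # 0.25 outputs per second at half occupancy
```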
152
U H
Question:
In the following figure, each procedure in the sequence is assumed to
process data in time t, except for the FFT procedure, which
consumes 8t. Given that all the mechanisms for increasing
throughput (i.e. for decreasing t) have been exhausted, what are the
alternatives to enhance DSP performance?
[Figure: Sequential Data Flow. P1 (t) -> P2 = FFT (8t) -> P3 (t)]
153
U H
Answer:
The final resort to enhance DSP Performance is in the form of Multiplicity:
a) Pipeline Array of Processing Units
Here the procedure has been partitioned into a sequence of 8 processing steps, each
requiring t seconds (this forms the pipeline), as shown in the following figure.
Input data then progresses through the pipe, being successively modified until it emerges.
The design must be such that a new input can be accepted every t seconds, and also
such that each unit operates independently (no inter-unit data dependencies).
If this is the case, the throughput matches the rest of the system.
[Figure: Pipeline Data-Flow Solution. Units 1, 2, ..., 7, 8 in series under a common control, each taking t. Bandwidth in = 1/t; bandwidth out = 1/t (after the pipe is filled).]
Answer to be continued
154
U H
Overview of the Parallel Approach
The simplest view of a parallel approach is that the input data is fed to the units
sequentially via the input commutator, the processors execute simultaneously, and the
output commutator then collects the result data.

The following figure shows a simple Parallel DSP System.
155
U H
Simple Parallel DSP System
[Figure: AD converter -> input commutator -> parallel DSP units (ADSP-2181) -> output commutator -> DA converter.]
156
U H
When can the Parallel Approach be considered?
In general a parallel approach can be considered if:
i) The procedure cannot be broken into a sequence of discrete steps, &
ii) The steady state data flow does not need to be constrained.
How can the performance of the parallel array be measured?
A parallel array need not have an identical delay in each path, though this
complicates the control problem. If each of N units has a delay $t_i$, then the
average delay could be used to compute the data flow. For N parallel paths the
response will be shown to be the same as that of an N-stage pipeline. If a proportion
P of units is unused then the output rate drops.
Note: The input/output commutation is usually difficult to implement and consumes
some overhead, which lowers the effective throughput.
The overall behaviour is therefore identical to that of a pipeline, although the
implementation issues are widely different.
157
U H
Answer (continued):
The final resort to enhance DSP Performance is in the form of Multiplicity:
b) Parallel Array of Processing Units
In this case the individual processors still operate with a response time of 8t.
The input commutator sequentially allocates input data, and each result is collected
8t seconds later by the output commutator.
[Figure: Parallel Data-Flow Solution. Units 1 to 8, each taking 8t, sit between the input commutator and the output commutator, with a stage of t before and after. Bandwidth in = 1/t; bandwidth out = 1/t.]
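The commutated array can be traced with a short schedule calculation. In this Python sketch, inputs arrive every t and are dealt round-robin to the units; the function name, the round-robin policy and the unit counts tried are assumptions for illustration, and commutation overhead is ignored.

```python
def commutated_finish_times(n_inputs: int, n_units: int = 8,
                            unit_delay: int = 8, t: int = 1):
    """Finish time of each input when inputs are dealt round-robin to the units.

    unit_delay is the response time of one unit in multiples of t (8 for the FFT).
    """
    busy_until = [0] * n_units
    finish_times = []
    for k in range(n_inputs):
        arrival = k * t
        unit = k % n_units                        # input commutator position
        start = max(arrival, busy_until[unit])    # wait if the unit is still busy
        busy_until[unit] = start + unit_delay * t
        finish_times.append(busy_until[unit])     # output commutator collects here
    return finish_times


print(commutated_finish_times(12))               # one result every t after an 8t latency
print(commutated_finish_times(12, n_units=4))    # too few units: results fall behind
```

With eight units, each result appears exactly 8t after its input, so the output bandwidth matches the input bandwidth of 1/t; with fewer units the array cannot keep up.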
158
U H
Example: FFT with Serial, Pipelining and Parallel Butterflies
The FFT provides a good example of the use of alternative
signal-processing architectures to improve throughput.

The key comparison is between:
i) The butterfly time, $t_B$, &
ii) The time, $(N/2)\,t_B \log_2 N$, to cycle through all butterflies of an FFT.

The interval $t_B$ includes the butterfly computation time and any
overhead in address generation or looping.

Realistic alternatives to consider are:
Serial (direct)
Pipeline: $\log_2 N$ stages deep, with N/2 steps
Parallel: N/2 butterfly processors, iterate $\log_2 N$ times
159
U H
The Serial (Classic) Approach
Serial (direct)
A single processor computes each butterfly, one step at a time.
The Computation Flow
DO 20 J = 1, log2(N)
DO 10 I = 1, N/2
10 CONTINUE
20 CONTINUE
[Figure: butterflies computed one after another in time slots t1, t2, ..., t12.]
160
U H
The Pipeline Approach
Pipeline: $\log_2 N$ stages deep, with N/2 steps
[Figure: $\log_2 N$ butterfly processors (B1 - B3) in a pipeline. Each processor runs DO I = 1, N/2 over time slots t1 to t4: B1 t1-t4, then B2 t1-t4, then B3 t1-t4.]
Here there are $\log_2 N$ butterfly processors, corresponding to the number of passes
(3 in the case of 8 data points: B1, B2, B3); each is used to compute the butterflies
pertinent to its pass in series; as each pass is computed, the processors are ready to
accept a pair of inputs for the next pass, and when the pipeline is full (steady state),
a set of outputs will be produced by each pass (N/2 computations).
161
U H
The Parallel Approach
Parallel: N/2 butterfly processors, iterate $\log_2 N$ times
[Figure: N/2 butterfly processors (B1 - B4) in parallel. Each processor runs DO J = 1, log2(N): all of B1 to B4 work in time slot t1, then in t2, then in t3.]
Here there is one processor for each of the N/2 steps per pass; all butterflies for
that pass are computed at the same time; as soon as one pass is completed, all are
ready for the next pass; in the steady state, there will be an output for every
computation cycle.
162
U H
Q. Summarize the differences between the serial, pipeline, and
parallel architectures for the FFT example in terms of the
computation time and the number of butterfly processors.
Consider a 1024-point FFT: what are the time and hardware costs
for the three architectures?
A.
Architecture   Computation Time (butterfly times)   Number of Butterfly Processors
Serial         (N/2) log2(N)                         1
Pipeline       N/2                                   log2(N)
Parallel       log2(N)                               N/2
The 1024-point FFT costing:
Serial         5,120                                 1
Pipeline       512                                   10
Parallel       10                                    512
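The costing table can be reproduced for any power-of-two FFT size. The Python sketch below is an illustration of the table's formulas; the function name and the returned dictionary layout are assumptions. Times are in units of one butterfly time.

```python
def fft_architecture_costs(n: int) -> dict:
    """Return {architecture: (computation time, butterfly processors)} for an n-point FFT."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    passes = n.bit_length() - 1          # log2(n) for a power of two
    per_pass = n // 2                    # butterflies in each pass
    return {
        "serial": (per_pass * passes, 1),
        "pipeline": (per_pass, passes),
        "parallel": (passes, per_pass),
    }


for name, (time_units, processors) in fft_architecture_costs(1024).items():
    print(name, time_units, processors)
```

In every row the product of time and processor count is the same (5,120 for 1024 points): the three architectures trade hardware for time without changing the total number of butterflies.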
163
U H
High Performance System Classification Scheme
There have been many attempts to classify processor architectures. A standard classification
scheme would be exceedingly useful both for discussion purposes and as a guide to processor
designs. The requirements for such a scheme are at least that it be:
Complete (i.e., include all architectures) and
Orthogonal (i.e., differentiate the key attributes).
Unfortunately, despite the attractiveness of the concept, no such scheme exists. Of the many
proposals, one forms the basis of many others. It is neither complete nor orthogonal, yet its
elegance and intrinsic simplicity are attractive, and it does concentrate on data flow and control
in a general way.
The basis of the scheme is that a processor processes data by a sequence of instructions, regardless
of the format and mechanisms whereby each arrives at the point of action. Based on the concept
of a data stream and an instruction stream, four possibilities exist:
SISD - Single Instruction Single Data Stream
SIMD - Single Instruction Multiple Data Stream
MISD - Multiple Instruction Single Data Stream
MIMD - Multiple Instruction Multiple Data Stream
164
U H
Q. With the aid of appropriate diagram(s), show how the four categories in Flynn's taxonomy can be emulated
on a dual processor shared-memory system. Your diagrams must clearly show the IS and DS from and to the
various units.
Answer:
Note that both the Babbage and Von Neumann architectures are SISD, although they differ greatly in
implementation. The performance of such a configuration can be thought of as unity for purposes of comparison.
Examples are shown in the following figures.
[Figures. SISD: one instruction unit I acting on one data unit D (data in, data out). SIMD: one instruction unit I driving data units D1 to D4. MISD (Version 1 and Version 2): several instruction units (I1, I2, I3, ...) acting in turn on a single data stream. MIMD: instruction units I1 to I4 paired with data units D1 to D4.]
165
U H
Discussion on the classification scheme
The SIMD architecture is an example of a parallel array in which each processing unit executes the same
instruction. It can achieve an n-fold increase in data flow bandwidth for each instruction, provided that
the units can be continuously utilized.

The original motivation for developing SIMD array processors was to perform parallel computations on
vector or matrix types of data. Parallel processing algorithms have been developed by many computer
scientists for SIMD computers. Important SIMD algorithms can be used to perform matrix multiplication,
Fast Fourier Transform (FFT), matrix transposition, summation of vector elements, matrix inversion,
parallel sorting, linear recurrence, Boolean matrix operations, and to solve partial differential equations.

The MIMD architecture is implemented by a multiple processor system. Clearly implied is some form of
cooperative network to share a computational task (completely autonomous units being of little interest).
This is an example of a parallel array in which the task assigned to each processor can be different. The
performance enhancement potential is equal to the number of processors.

The MISD architecture is not widely implemented in practice and substantial disagreement exists on its
exact structure. It is considered here as a pipeline in which a single data stream is modified at successive
stages, and its performance enhancement potential equals the number of stages, as shown in the previous
section.

There is a relationship between these classifications and the structure of processing algorithms. An
algorithm may contain a collection of processing tasks which could optimally be assigned to different
processing configurations to achieve an overall higher performance. If components were of sufficiently
low cost, a solution might be to build a conglomerate of different processing architectures and utilize the
optimum one at appropriate points in the algorithm. The task assignment problem here is formidable, and
the physical complexity and lowered reliability of such a conglomerate of components are a major
limiting factor of such a scheme. This will be discussed in more detail later.
166
U H
SIMD Matrix Multiplication & SIMD FFT
To be found in the following references:
*) G. Barnes, et al., "The ILLIAC-IV Computer," IEEE Trans. on Computers,
Aug. 1968, pp. 746-756.

**) K. Hwang & F. Briggs, "Computer Architecture and Parallel Processing,"
McGraw-Hill Book Company, 1985.

***) B. Wilkinson, "Computer Architecture: Design and Performance,"
Prentice-Hall Int. Ltd, 1991.
167
U H
How To Design a SIMD DSP System From
Off-the-Shelf Fixed-Point DS Processors?
Here we will develop a SIMD DSP system with a processor-pair
architecture, based on a dual-port RAM. The design is easy to
implement and provides a significant computational boost over
a single processor.
A processor pair almost doubles the speed of a single processor while
keeping the architecture and the inter-processor co-ordination as
simple as possible.
Hardware Architecture
The off-the-shelf fixed-point DS processors are two ADSP-2101s,
each with its own private memories. The following figure shows
a block diagram of the system hardware architecture.
168
U H
Processor Pair Block Diagram
[Figure: two ADSP-2101s, each with its own program memory and private data memory, sharing a common data memory (dual-port RAM). Labelled signals: DMA, PMD and DMACK on each processor; BUSY L and BUSY R on the dual-port RAM.]
Private memories are accessible to one processor only.
Common memory is accessed by both.
Each processor has a private memory of 32K of 24-bit
program memory and 14K of 16-bit data memory.
In addition, 2K of 16-bit dual-port RAM is shared by both processors.
This area of memory allows inter-processor communication and data
transfers.
169
U H
Software Architecture
To complement the hardware design, a hypothetical application is
presented. Data is input and low-pass filtered by one processor,
then the second processor determines the peak location within a
filtered window.
Although the software implementation is simplistic, it shows a technique for programming
in a multiprocessing environment: alternating buffers and flags.
The alternating buffers in this application are two identical buffers
located in dual-port RAM so both processors can access them:
The first processor fills buffer 1 with information
while the second processor operates on the information in buffer 2.
Each buffer has a flag that indicates completion of operations on
that buffer.
When processor 1 has finished its operations on the buffer data,
it sets the flag, signalling processor 2 to begin operations on that buffer.
The sequence of operations is shown in the following table:

Processor 1 (Filter)                    Processor 2 (Peak Locator)
--------------------------------------  --------------------------------------
Initialise flags, coefficients,         Initialise pointers
delay line, pointers

Perform low-pass filter operation       Check flag 1; wait if not set
on data in buffer 1

Set flag 1

Perform low-pass filter operation       Check flag 1; if set, perform peak
on data in buffer 2                     locating operation on data in buffer 1

                                        Clear flag 1

Set flag 2

Perform low-pass filter operation       Check flag 2; if set, perform peak
on data in buffer 1                     locating operation on data in buffer 2

                                        Clear flag 2

Set flag 1; etc.                        Check flag 1; etc.

The alternating buffer scheme is easier to implement than a single buffer scheme. If only one buffer were used, careful timing analysis or extensive
handshaking would be required to ensure that the processors did not use old or invalid data.
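
The alternating-buffer and flag technique can be tried out on a PC before committing it to DSP code. The following is a minimal Python sketch, not the ADSP-2101 program: two threads stand in for the two processors, two lists stand in for the dual-port RAM, and a condition variable stands in for polling the flags. The block size, the 4-tap moving-average filter and all function names are assumptions made for illustration. The sketch also makes processor 1 wait for a flag to be cleared before refilling that buffer, which is an extra safety step that the table above does not show.

```python
import threading

BLOCK = 8                          # samples per buffer (hypothetical size)
TAPS = [0.25, 0.25, 0.25, 0.25]    # hypothetical 4-tap moving-average low-pass filter


def low_pass(block, delay_line):
    """FIR-filter one block; delay_line keeps the filter state between blocks."""
    out = []
    for x in block:
        delay_line.insert(0, x)    # newest sample enters the delay line
        delay_line.pop()           # oldest sample leaves
        out.append(sum(c * d for c, d in zip(TAPS, delay_line)))
    return out


def run_pair(blocks):
    """Filter each block on 'processor 1' and locate its peak on 'processor 2'."""
    buffers = [[0.0] * BLOCK, [0.0] * BLOCK]   # stands in for the dual-port RAM
    flags = [False, False]                     # flag 1 and flag 2
    cond = threading.Condition()               # stands in for polling the flags
    peaks = []

    def wait_flag(k, value):
        with cond:
            cond.wait_for(lambda: flags[k] == value)

    def write_flag(k, value):
        with cond:
            flags[k] = value
            cond.notify_all()

    def processor1():                          # filter
        delay_line = [0.0] * len(TAPS)
        for n, block in enumerate(blocks):
            k = n % 2                          # alternate between the two buffers
            wait_flag(k, False)                # added safety step (assumption)
            buffers[k][:] = low_pass(block, delay_line)
            write_flag(k, True)                # set flag: buffer k is ready

    def processor2():                          # peak locator
        for n in range(len(blocks)):
            k = n % 2
            wait_flag(k, True)                 # check flag; wait if not set
            data = buffers[k]
            peak = max(range(BLOCK), key=lambda i: data[i])
            peaks.append((peak, data[peak]))
            write_flag(k, False)               # clear flag: buffer k is free

    threads = [threading.Thread(target=processor1),
               threading.Thread(target=processor2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peaks
```

Calling run_pair with a list of 8-sample blocks returns one (peak index, peak value) pair per block. While processor 2 searches one buffer, processor 1 is free to filter the next block into the other buffer.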
The Modified Harvard Architecture
Harvard Architecture:
Simultaneous Access of Data and Instruction

Modified Harvard Architecture:
Simultaneous Access of 2 Data Memories and Instruction from Cache
Gives Three-Bus Performance with Only 2 Buses
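[Figure: DSP processor connected to data storage (DM) through a 32-bit address bus and a 32/40-bit data bus, and to program and data storage (PM) through a 24-bit address bus and a 48-bit data bus. The processor also contains an instruction cache and I/O.]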
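
The benefit of fetching the instruction from cache can be estimated with a simple cycle count. The sketch below is a simplified model under stated assumptions, not a description of the actual hardware timing: an instruction that reads data from both DM and PM is assumed to cost one extra cycle whenever its own fetch must also use the PM bus, and the cache is assumed to hold every such instruction of the loop after the first pass. The function names are hypothetical.

```python
def cycles_without_cache(n_dual, n_other, n_iter):
    """Loop cycles when every dual-data-access instruction stalls one cycle.

    n_dual  : instructions per pass that read both DM and PM data
    n_other : instructions per pass with no PM data access
    n_iter  : number of passes through the loop
    """
    return n_iter * (2 * n_dual + n_other)


def cycles_with_cache(n_dual, n_other, n_iter, cache_entries=32):
    """Loop cycles when conflicting instructions are cached after the first pass."""
    if n_dual > cache_entries:
        raise ValueError("model assumes the conflicting instructions fit in the cache")
    # One extra cycle per conflicting instruction on the first pass only (assumption).
    return n_iter * (n_dual + n_other) + n_dual
```

For a one-instruction multiply-accumulate loop executed 1000 times, this model gives 2000 cycles without the cache and 1001 cycles with it, which is why the two-bus machine approaches three-bus performance.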
Multiprocessing With The SHARC
SHARC: Complete Signal Computer On A Chip
ADSP-21000 Family High Performance Processor Core
- 25 ns Instruction Cycle = 40 MIPS / 120 MFLOPS
Large Efficient On-Chip Memory System
- 4 Megabits on ADSP-21060
- 2 Megabits on ADSP-21062
DMA Controller and I/O Processor
- Allows Flexible, Zero-Overhead, High-Speed Data Transfers
- 240 Mbytes/s
Host Interface
- Efficient Interface to 16- & 32-Bit Microprocessors
Two Serial Ports
- 40 Mbit/s Multichannel Serial Ports
Two Integrated Multiprocessing Interfaces
- Glueless Cluster Interface Transfers at 240 Mbytes/s
- Six Link Ports Allow Point-To-Point Transfers at 40 Mbytes/s Each
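
The headline figures above can be cross-checked with a little arithmetic. The sketch below assumes that the peak MFLOPS figure counts three floating-point operations per cycle (one multiply, one addition and one subtraction) and that the 240 Mbytes/s figure corresponds to one 48-bit word per cycle. Both are assumptions that happen to reproduce the quoted numbers, not statements taken from a data sheet.

```python
CYCLE_NS = 25                       # instruction cycle time quoted on the slide

mips = 1000 // CYCLE_NS             # instructions per microsecond = MIPS
peak_mflops = mips * 3              # assumed: multiply + add + subtract per cycle
bus_mbytes_per_s = mips * 48 // 8   # assumed: one 48-bit word per cycle
link_total_mbytes_per_s = 6 * 40    # six link ports at 40 Mbytes/s each

print(mips, peak_mflops, bus_mbytes_per_s, link_total_mbytes_per_s)
```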
SHARC Internal Architecture
[Figure: SHARC internal architecture block diagram.
Core processor: program sequencer, instruction cache (32 x 48-bit), DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), timer, data register file (16 x 40-bit), multiplier, barrel shifter, ALU.
Buses: PM address bus (24), DM address bus (32), PM data bus (48), DM data bus (40), bus connect (PX).
Dual-ported SRAM: two independent dual-ported blocks (block 0 and block 1), each with a processor port and an I/O port.
External port: address bus mux (32), data bus mux (48), multiprocessor interface, host port.
I/O processor: IOP registers (control, status and data buffers), DMA controller, serial ports (2), link ports (6), IOD bus (48), IOA bus (17).
JTAG test and emulation (7).]
ADSP-210XX Family Features
40 MIPS / 120 MFLOPS Arithmetic Processing
- Parallel Operation of: Multiplier, ALU, 2 Address Generators & Sequencer
- No Arithmetic Pipeline; All Computations Are Single-Cycle
High Precision and Extended Dynamic Range
- 32/40-Bit IEEE Floating-Point Math
- 32-Bit Fixed-Point MACs with 64-Bit Product & 80-Bit Accumulation
Single-Cycle Transfers with Two Memory Structures:
DM and PM
- Supported by Cache Memory and Enhanced Harvard Architecture
Circular Buffer Addressing Supported in Hardware
- Dual DAGs Support Circular Buffer and Modulus Addressing
- 32 Address Pointers Support 32 Circular Buffers (Primary and Secondary)
Zero-Overhead Looping in Hardware, 6 Nested Levels
Rich, Algebraic Instruction Set is Easy-To-Learn and Easy-To-Use
- Conditional Arithmetic, Bit Manipulation, Divide & Square Root
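
The value of the 80-bit accumulator can be illustrated with a worked calculation. The sketch below assumes signed two's-complement 32-bit operands and asks how many worst-case 64-bit products can be summed before an 80-bit signed accumulator could overflow. The variable names are chosen for illustration only.

```python
OPERAND_BITS = 32
PRODUCT_BITS = 64
ACC_BITS = 80

# Largest product magnitude: (-2**31) * (-2**31) = 2**62.
worst_product = (-(2 ** (OPERAND_BITS - 1))) ** 2

# Largest positive value of an 80-bit signed accumulator.
acc_max = 2 ** (ACC_BITS - 1) - 1

guard_bits = ACC_BITS - PRODUCT_BITS     # extra bits above the product width
safe_terms = acc_max // worst_product    # worst-case products that still fit

print(guard_bits, safe_terms)
```

Under these assumptions the 16 guard bits allow more than one hundred thousand worst-case products to be accumulated, so long filters and transforms can run without intermediate scaling.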
[Figure: ADSP-210XX Family Core. Blocks: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), cache memory (32 x 48), program sequencer, timer, JTAG test & emulation, register file (16 x 40), floating- and fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point and fixed-point ALU, bus connect. Buses: PMA bus (24), DMA bus (32), PMD bus (48), DMD bus (40).]
ADSP-210XX Register File
[Figure: Register file of 16 x 2 registers x 40 bits, with each register named r0 to r15 or f0 to f15. The register file connects to the PMD and DMD buses and to the ALU, the shifter and the multiplier. The multiplier has 2 registers x 80 bits, shown as MR0, MR1 and MR2.]
Data Address Generator (1 of 2)
[Figure: Data address generator block diagram. Each DAG contains 8 x N index registers, 8 x N modify registers, 8 x N length registers and 8 x N base registers, loaded from the DMD bus. Other blocks shown: an adder (ADD), modulus logic, two multiplexers, a modify value taken from the instruction, an index update path, and optional bit-reverse (I0 and I8 only). DAG 1 (N = 32) drives the DMA bus; DAG 2 (N = 24) drives the PMA bus.]
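
The index, modify, length and base registers work together to implement circular buffers. The following Python sketch models a post-modify address update with wrap-around and a simple bit-reversal helper. It is an illustration under assumptions, not the DAG hardware: it assumes the modify value is no larger in magnitude than the buffer length, that a length of zero disables wrapping, and that bit reversal is applied over a chosen number of bits rather than over the full address width. The function names are hypothetical.

```python
def post_modify(i, m, b, l):
    """Return (address used now, updated index) for a circular buffer.

    i: index register, m: modify value, b: base address, l: buffer length.
    """
    address = i                 # the current index is output as the address
    nxt = i + m                 # the adder forms index + modify
    if l > 0:                   # modulus logic keeps the index inside the buffer
        if nxt >= b + l:
            nxt -= l
        elif nxt < b:
            nxt += l
    return address, nxt


def bit_reverse(value, nbits):
    """Reverse the lowest nbits of value, as used for FFT data ordering."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def walk(i, m, b, l, count):
    """Collect the sequence of addresses produced by repeated post-modify."""
    addresses = []
    for _ in range(count):
        address, i = post_modify(i, m, b, l)
        addresses.append(address)
    return addresses
```

For example, walk(100, 1, 100, 4, 6) steps through a 4-word buffer at base address 100 and wraps back to the start after the fourth access, with no compare-and-branch instructions in the loop.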
Example Multi-Function
Instruction In FFT Routine
f11 = f1 * f7, f3 = f9 + f14, f9 = f9 - f14, dm (i2, m0) = f13, f7 = pm (i8, m8);

In a Single 25ns Cycle the ADSP-2106X Performs:
- 1 Multiply
- 1 Addition
- 1 Subtraction
- 1 Memory Read
- 1 Memory Write
- 2 Address Pointer Updates

Plus the I/O Processor Performs:
- Active Serial Port Channels (2 Transmit, 2 Receive)
- Active Link Ports (6)
- Memory DMA
- 2 DMA Pointer Updates

The Algebraic Syntax of the Assembly Language Facilitates Coding of
DSP Algorithms
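
The effect of the multi-function instruction can be modelled in a few lines of Python. This is a sketch under assumptions rather than a simulator: it assumes that all source registers are read before any result is written (so the addition and the subtraction both see the old value of f9) and that dm(i2, m0) and pm(i8, m8) access memory at the current pointer and then add the modify value. The dictionaries used for registers and memories are purely illustrative.

```python
def multifunction_step(f, dm, pm, i2, m0, i8, m8):
    """Model: f11 = f1*f7, f3 = f9+f14, f9 = f9-f14, dm(i2,m0) = f13, f7 = pm(i8,m8)."""
    old = dict(f)                          # snapshot: operands are read first
    f["f11"] = old["f1"] * old["f7"]       # 1 multiply
    f["f3"] = old["f9"] + old["f14"]       # 1 addition
    f["f9"] = old["f9"] - old["f14"]       # 1 subtraction (uses the old f9)
    dm[i2] = old["f13"]                    # 1 memory write
    f["f7"] = pm[i8]                       # 1 memory read
    return i2 + m0, i8 + m8                # 2 address pointer updates


regs = {"f1": 2.0, "f7": 3.0, "f9": 5.0, "f14": 1.0, "f13": 7.0, "f3": 0.0, "f11": 0.0}
data_mem = {}
prog_mem = {0: 9.0}
new_i2, new_i8 = multifunction_step(regs, data_mem, prog_mem, 10, 1, 0, 1)
```

After this call, regs holds f11 = 6.0, f3 = 6.0, f9 = 4.0 and f7 = 9.0, data_mem[10] holds 7.0, and both pointers have advanced by one.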
ADSP-210XX Family Code Example
Matrix Times a Matrix: C [MxJ] = A [MxN] * B [NxJ]
.SEGMENT/PM pm_code;
setup:   m1=1; m2=-2; m3=J; m9=-(J*2-1); m10=J;   /* setup modifiers */
         b0=mat_a; l0=@mat_a;                     /* setup pointers */
         b1=mat_c; l1=@mat_c;
         b8=mat_b; l8=@mat_b;

mxnxnxo: lcntr=J, do colrow until lce;
         r8=r8-r8, f0=dm(i0,m1), f4=pm(i8,m10);
         f12=f0*f4, f0=dm(i0,m1), f4=pm(i8,m10);
         lcntr=M, do column until lce;
         lcntr=N, do row until lce;
row:     f12=f0*f4, f8=f8+f12, f0=dm(i0,m1), f4=pm(i8,m10);
column:  f8=pass f15, dm(i1,m3)=f8;
         f0=dm(i0,m2), f4=pm(i8,m9);
colrow:  modify(i1,1);
.ENDSEG;
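
When checking the output of an assembly routine like this one, a plain reference model of C [MxJ] = A [MxN] * B [NxJ] is useful. The Python sketch below is such a model and is not a line-by-line translation of the listing. It assumes the matrices are stored row by row in flat arrays, so that A is stepped with a stride of 1 and B with a stride of J, in the same spirit as the m1 and m10 modifiers. The function name and the storage layout are assumptions.

```python
def mat_mul(a, b, M, N, J):
    """Return C = A * B as a flat row-major list; A is M x N, B is N x J."""
    c = [0.0] * (M * J)
    for row in range(M):
        for col in range(J):
            acc = 0.0                        # cleared accumulator for each element
            ia = row * N                     # pointer into A, stride 1
            ib = col                         # pointer into B, stride J
            for _ in range(N):
                acc += a[ia] * b[ib]         # multiply-accumulate
                ia += 1
                ib += J
            c[row * J + col] = acc
    return c


example = mat_mul([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2)
```

Here example holds the product of the 2 x 2 matrices [[1, 2], [3, 4]] and [[5, 6], [7, 8]] in row-major order.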
ADSP-2106X Internal Architecture
[Figure: ADSP-2106X internal architecture block diagram, showing the core processor, the two independent dual-ported SRAM blocks, the external port and the I/O processor, connected by the PM address bus (24), DM address bus (32), PM data bus (48) and DM data bus (40).]
Thank You
&
Best of Luck
Professor Talib Alukaidey
Tel: 01707-284183
Fax: 01707-284199
Email: T.Alukaidey@herts.ac.uk
UNIVERSITY OF HERTFORDSHIRE
