ECE734 VLSI Arrays for Digital Signal Processing

Chapter 3: Parallel and Pipelined Processing
(C) 1997-2006 by Yu Hen Hu
Basic Ideas
a, b, c, d: different data streams processed
Colors (in the original figure): different types of operations performed

Parallel processing (processor schedule over time):
P1: a1 a2 a3 a4
P2: b1 b2 b3 b4
P3: c1 c2 c3 c4
P4: d1 d2 d3 d4
Less inter-processor communication
Complicated processor hardware

Pipelined processing (processor schedule over time):
P1: a1 b1 c1 d1
P2: a2 b2 c2 d2
P3: a3 b3 c3 d3
P4: a4 b4 c4 d4
More inter-processor communication
Simpler processor hardware
Data Dependence
Parallel processing requires NO data dependence between processors.
Pipelined processing will involve inter-processor communication.
[Figure: processors P1 to P4 plotted against time, once for parallel processing and once for pipelined processing]
Usage of Pipelined Processing
By inserting latches or registers between combinational logic circuits, the critical path can be shortened.
Consequence: reduced clock cycle time, increased clock frequency.
Suitable for DSP applications that have (infinitely) long data streams.
Method to incorporate pipelining: cut-set retiming.
Cut set: a set of edges of a graph such that, if these edges are removed from the original graph, the remaining graph becomes two separate graphs.
Retiming: the timing of an algorithm is re-adjusted while keeping the partial ordering of execution unchanged, so that the results remain correct.
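The cut-set definition above can be checked mechanically. The following Python sketch is an illustration only: the 4-node graph, the node names, and the helper `is_cut_set` are hypothetical and not taken from the slides. It treats edges as undirected when testing connectivity and assumes the original graph is connected.

```python
from collections import defaultdict

def count_components(nodes, edges):
    """Count connected components, ignoring edge direction."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first search from `start`
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components

def is_cut_set(nodes, edges, candidate):
    """True if removing `candidate` edges leaves exactly two separate graphs."""
    remaining = [edge for edge in edges if edge not in candidate]
    return count_components(nodes, remaining) == 2

# Hypothetical graph: one input feeding two multipliers that feed one adder.
nodes = ["in", "m1", "m2", "add"]
edges = [("in", "m1"), ("in", "m2"), ("m1", "add"), ("m2", "add")]

print(is_cut_set(nodes, edges, {("m1", "add"), ("m2", "add")}))  # both adder inputs
print(is_cut_set(nodes, edges, {("m1", "add")}))                 # only one adder input
```

Cutting both adder inputs separates the adder from the rest of the graph, so that set is a cut set; cutting only one input leaves the graph connected.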
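Graphic Transpose Theorem
The transfer function of a signal flow graph remains unchanged if
The direction of each arc is reversed
The input and output labels are switched.
[Figure: a 3-tap FIR signal flow graph with input x[n], two $z^{-1}$ delays, coefficients h[0], h[1], h[2], and output y[n], shown next to its transpose, in which x[n] and y[n] exchange places. The figure also carries the annotation "u[n] = ?".]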
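A quick numerical check of the theorem for a 3-tap FIR filter is sketched below. The function names `fir_direct` and `fir_transposed`, the coefficient values, and the zero initial state are assumptions made for this illustration; they are not part of the slides.

```python
def fir_direct(x, h):
    """Direct form: tapped delay line on the input, then a sum of products."""
    delay = [0] * (len(h) - 1)           # holds x[n-1], x[n-2], ...
    y = []
    for sample in x:
        taps = [sample] + delay          # x[n], x[n-1], x[n-2], ...
        y.append(sum(hk * xk for hk, xk in zip(h, taps)))
        delay = taps[:-1]                # shift the delay line by one sample
    return y

def fir_transposed(x, h):
    """Transposed form: input feeds every multiplier, delays sit between adders."""
    state = [0] * (len(h) - 1)           # register contents between the adders
    y = []
    for sample in x:
        products = [hk * sample for hk in h]
        padded = state + [0]             # the last adder has no register behind it
        y.append(products[0] + padded[0])
        # Each register stores its own product plus the next register's old value.
        state = [products[i + 1] + padded[i + 1] for i in range(len(state))]
    return y

h = [3, 2, 1]                            # hypothetical h[0], h[1], h[2]
x = [1, 2, 3, 4, 5]                      # hypothetical input samples
print(fir_direct(x, h) == fir_transposed(x, h))
```

Both structures compute y[n] = h[0] x[n] + h[1] x[n-1] + h[2] x[n-2]; only the placement of the delays differs.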
Data broadcast structure
An algorithm transformation may lead to a pipelined structure without adding extra delays.
Given an FIR filter SFG:
Critical path: $T_M + 2T_A$ ($T_M$: multiplication time, $T_A$: addition time)
Use the graph transposition theorem:
Reverse all arcs
Reverse input/output
We obtain:
Critical path: $T_M + T_A$
No additional delay added!
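The two critical paths can be compared with a few lines of Python. This is a sketch under stated assumptions: the delay values are hypothetical, and the generalization to `taps - 1` adders in the direct form assumes a linear adder chain, which matches the 3-tap case on the slide.

```python
def critical_path_direct(t_mult, t_add, taps=3):
    """Direct form: one multiplier followed by a chain of (taps - 1) adders."""
    return t_mult + (taps - 1) * t_add

def critical_path_broadcast(t_mult, t_add):
    """Data broadcast (transposed) form: one multiplier followed by one adder."""
    return t_mult + t_add

T_M, T_A = 10.0, 4.0                      # hypothetical delays in ns

direct = critical_path_direct(T_M, T_A)   # T_M + 2*T_A for 3 taps
broadcast = critical_path_broadcast(T_M, T_A)
print(direct, broadcast)
```

With these assumed delays the broadcast structure shortens the critical path from 18 ns to 14 ns, and its critical path no longer grows with the number of taps.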
Fine-grain pipelining
To further reduce $T_M$, split each multiplier into two pipelined stages with delays $T_{M1}$ and $T_{M2}$.
Critical path = $\max\{T_{M1}, T_{M2}, T_A\}$
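As a small numeric illustration, the sketch below evaluates the fine-grain critical path for assumed stage delays. The values and the function name `critical_path_fine_grain` are hypothetical.

```python
def critical_path_fine_grain(t_m1, t_m2, t_add):
    """Longest register-to-register delay once each stage is separated by latches."""
    return max(t_m1, t_m2, t_add)

# Hypothetical delays in ns: a 10 ns multiplier split into 6 ns and 4 ns stages.
T_M1, T_M2, T_A = 6.0, 4.0, 4.0

period = critical_path_fine_grain(T_M1, T_M2, T_A)
max_clock_mhz = 1000.0 / period           # 1 / period, with the period in ns
print(period, max_clock_mhz)
```

The split only helps as far as the slowest stage allows, so balanced stage delays give the largest gain.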
Block Processing
One form of vectorized parallel processing of DSP algorithms (not parallel processing in the most general sense).
Block vector: [x(3k) x(3k+1) x(3k+2)]
Clock cycle: can be 3 times longer
Original (FIR filter):
$$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$
Rewrite 3 equations at a time:
$$\begin{bmatrix} y(3k) \\ y(3k+1) \\ y(3k+2) \end{bmatrix} = a \begin{bmatrix} x(3k) \\ x(3k+1) \\ x(3k+2) \end{bmatrix} + b \begin{bmatrix} x(3k-1) \\ x(3k) \\ x(3k+1) \end{bmatrix} + c \begin{bmatrix} x(3k-2) \\ x(3k-1) \\ x(3k) \end{bmatrix}$$
Define block vector:
$$\mathbf{x}(k) = \begin{bmatrix} x(3k) \\ x(3k+1) \\ x(3k+2) \end{bmatrix}$$
Block formulation:
$$\mathbf{y}(k) = \begin{bmatrix} a & 0 & 0 \\ b & a & 0 \\ c & b & a \end{bmatrix} \mathbf{x}(k) + \begin{bmatrix} 0 & c & b \\ 0 & 0 & c \\ 0 & 0 & 0 \end{bmatrix} \mathbf{x}(k-1)$$
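The block formulation can be checked against the sample-by-sample filter y(n) = a x(n) + b x(n-1) + c x(n-2). The Python sketch below does this for block size 3; the function names, the test coefficients, and the assumption that x(n) = 0 for n < 0 are choices made for this illustration.

```python
def fir_sample_by_sample(x, a, b, c):
    """y(n) = a*x(n) + b*x(n-1) + c*x(n-2), with x(n) = 0 for n < 0."""
    def sample(n):
        return x[n] if n >= 0 else 0
    return [a * sample(n) + b * sample(n - 1) + c * sample(n - 2)
            for n in range(len(x))]

def fir_block3(x, a, b, c):
    """Block form: y(k) = A0 * x(k) + A1 * x(k-1), three samples per block."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size 3")
    A0 = [[a, 0, 0], [b, a, 0], [c, b, a]]   # acts on the current block
    A1 = [[0, c, b], [0, 0, c], [0, 0, 0]]   # acts on the previous block
    previous = [0, 0, 0]                      # x(-1) block is assumed zero
    y = []
    for k in range(len(x) // 3):
        current = x[3 * k:3 * k + 3]
        for row0, row1 in zip(A0, A1):
            y.append(sum(m * v for m, v in zip(row0, current))
                     + sum(m * v for m, v in zip(row1, previous)))
        previous = current
    return y

x = [1, 2, 3, 4, 5, 6]                        # hypothetical input, two blocks
print(fir_block3(x, 1, 2, 3) == fir_sample_by_sample(x, 1, 2, 3))
```

Each pass through the outer loop produces three outputs, which is why the clock cycle of the block structure can be three times longer than the sampling period.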
Block Processing
General approach for block processing
Block Processing for IIR Digital Filter
Original formulation:
$$y(n) = a\,y(n-2) + x(n)$$
Rewrite:
$$y(2k) = a\,y(2k-2) + x(2k)$$
$$y(2k+1) = a\,y(2k-1) + x(2k+1)$$
Define block vectors:
$$\mathbf{x}(k) = \begin{bmatrix} x(2k) \\ x(2k+1) \end{bmatrix}, \quad \mathbf{y}(k) = \begin{bmatrix} y(2k) \\ y(2k+1) \end{bmatrix}$$
Then:
$$\mathbf{y}(k) = a\,\mathbf{y}(k-1) + \mathbf{x}(k)$$
Time indices
n: sampling period
k: clock period (processor)
k = 2n (one clock period spans two sampling periods)
Note:
Pipelining: clock period = sampling period.
Block (parallel): clock period not equal to sampling period.
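The equivalence between the recursion y(n) = a y(n-2) + x(n) and its block form with two samples per block can be verified numerically. The sketch below assumes zero initial conditions; the function names and test values are hypothetical.

```python
def iir_sample_by_sample(x, a):
    """y(n) = a*y(n-2) + x(n), with y(n) = 0 for n < 0."""
    y = []
    for n, sample in enumerate(x):
        y_two_back = y[n - 2] if n >= 2 else 0
        y.append(a * y_two_back + sample)
    return y

def iir_block2(x, a):
    """Block form: y(k) = a*y(k-1) + x(k), where each block holds two samples."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be a multiple of the block size 2")
    previous = [0, 0]                         # y(-1) block is assumed zero
    y = []
    for k in range(len(x) // 2):
        # The two entries do not depend on each other, so they can run in parallel.
        current = [a * previous[0] + x[2 * k],
                   a * previous[1] + x[2 * k + 1]]
        y.extend(current)
        previous = current
    return y

x = [1, 2, 3, 4, 5, 6]                        # hypothetical input, three blocks
print(iir_block2(x, 2) == iir_sample_by_sample(x, 2))
```

Because y(n) depends only on y(n-2), the even and odd output samples form two independent recursions, which is what lets one clock period cover two sampling periods.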
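Block IIR Filter
[Figure: block IIR filter. A serial-to-parallel converter (S/P) splits x(n) into x(2k) and x(2k+1); two adders, each with a delay D in its feedback path, form y(2k) and y(2k+1) from y(2(k-1)) and y(2(k-1)+1); a parallel-to-serial converter (P/S) merges them into y(n).]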
Timing Comparison
[Figure: timing diagrams for pipelining and for block processing. Recoverable labels: input samples x(1) to x(8) and output samples y(1) to y(8) plotted against time steps 1 to 8; the operations MAC, Add, and Mul; the product a y(1); and the two block input streams x(1), x(3), x(5), x(7) and x(2), x(4), x(6), x(8).]
