Figure 1: The algorithm by Chen et al. (flow graph built from butterfly and subtraction blocks, with multiplier constants a to y defined in terms of the cosine coefficients C_i)

3.2 Lee Algorithm

The Lee algorithm [11] is based on the matrix representation. Its first step is nothing more than a butterfly decomposition that yields an even part and an odd part. The even part is simply a 1-D DCT of order N/2, while the odd part is computed through a matrix multiplication. Figure 2 illustrates the 1-D DCT of order 8.
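The even/odd split described for the Lee algorithm can be checked numerically. The following is a minimal sketch rather than the implementation of [11]: it assumes an unnormalized DCT-II (no scaling constants), and the function names `dct_direct` and `dct_even_odd` are illustrative only.

```python
import math

def dct_direct(x):
    """Unnormalized DCT-II: X[m] = sum_n x[n] * cos((2n+1) * m * pi / (2N))."""
    n_pts = len(x)
    return [sum(x[n] * math.cos((2 * n + 1) * m * math.pi / (2 * n_pts))
                for n in range(n_pts))
            for m in range(n_pts)]

def dct_even_odd(x):
    """One butterfly stage, then a half-size DCT (even part) and a matrix product (odd part)."""
    n_pts = len(x)
    half = n_pts // 2
    # Butterfly: sums feed the even outputs, differences feed the odd outputs.
    s = [x[k] + x[n_pts - 1 - k] for k in range(half)]
    d = [x[k] - x[n_pts - 1 - k] for k in range(half)]
    # Even-indexed outputs are exactly a DCT of order N/2 applied to the sums.
    even = dct_direct(s)
    # Odd-indexed outputs are an (N/2 x N/2) matrix times the differences.
    odd = [sum(d[k] * math.cos((2 * k + 1) * m * math.pi / (2 * n_pts))
               for k in range(half))
           for m in range(1, n_pts, 2)]
    out = [0.0] * n_pts
    out[0::2] = even
    out[1::2] = odd
    return out
```

For an order-8 input, the even part is a 4-point DCT and the odd part is a 4x4 matrix product, so the decomposition can be compared entry by entry against `dct_direct`.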
The constant C was chosen to be equal to $\sqrt{2}$, which allows the first DCT coefficient to be evaluated without any multiplication. Figure 4 explains the building blocks of the algorithm.

4. Distributed Arithmetic

5. The Architecture Implemented

From the algorithm of Chen et al., we find that the transform matrix A can be divided into two smaller matrices. Using the distributed arithmetic technique, these two matrices can be simplified further to give the following equations:

$$y_l = \sum_{k=0}^{3} A_{k,l}\,(x_k + x_{7-k}) \quad \text{for } l \text{ even}$$

$$y_l = \sum_{k=0}^{3} A_{k,l}\,(x_k - x_{7-k}) \quad \text{for } l \text{ odd}$$

The architecture of the 1-D DCT is shown in Figure 5.
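Distributed arithmetic replaces the multiplications in an inner product with a ROM lookup and a shift-accumulate, consuming one bit of every input word per cycle. The sketch below is a software model of that idea under stated assumptions, not the VHDL design: the coefficients are quantized to integers with a hypothetical 12-bit scale, the inputs are taken to be 10-bit two's-complement words, and the names `build_rom` and `da_inner_product` are illustrative.

```python
import math

WORD_BITS = 10          # assumed two's-complement width of each input word
COEFF_SCALE = 1 << 12   # assumed fixed-point scale for the cosine coefficients

def build_rom(coeffs):
    """ROM[addr] holds the sum of the coefficients whose address bit is 1."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]

def da_inner_product(rom, inputs, word_bits=WORD_BITS):
    """Bit-serial inner product: one ROM lookup and one accumulate per bit."""
    acc = 0
    for j in range(word_bits):
        # Bit j of every input word forms the ROM address for this cycle.
        addr = 0
        for k, x in enumerate(inputs):
            addr |= ((x >> j) & 1) << k
        partial = rom[addr] * (1 << j)
        # The sign bit of a two's-complement word carries weight -2^(W-1).
        if j == word_bits - 1:
            acc -= partial
        else:
            acc += partial
    return acc

# Row l = 1 of the odd part of an 8-point DCT: cos((2k+1) * pi / 16), k = 0..3.
ODD_ROW_1 = [round(COEFF_SCALE * math.cos((2 * k + 1) * math.pi / 16))
             for k in range(4)]
ROM_1 = build_rom(ODD_ROW_1)
```

With four inputs per inner product the ROM has 16 entries, and a 10-bit word takes 10 lookup-and-accumulate cycles. The result equals the ordinary integer inner product of the quantized coefficients with the inputs.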
It is clear that distributed arithmetic greatly reduces the hardware required. This directly decreases the delay and the area required while increasing the throughput. The following flow chart briefly outlines our hardware implementation of the 1-D DCT using distributed arithmetic.

8 input registers (8 bits each)
Add/subtract units
Accumulators

We have 8 inputs, each 8 bits wide. First the inputs are registered. Then they are added and subtracted according to the matrix of Chen et al. The bit-serial architecture is primarily used in the context of multipliers: a single bit of each input word is transmitted during each processing cycle. This reduces I/O; however, an n-bit word requires n processing cycles for transmission. The input word to the bit-serial architecture is 10 bits wide, so 10 cycles are needed. Lookup tables (ROMs) contain partial product terms that are indexed using the bit-serial input from the multiplier. An accumulator adds the partial product terms. The VHDL code was written using FPGA Advantage and implemented on a Xilinx Spartan-II FPGA, which is built from look-up tables and should therefore implement the design efficiently.

References

[1] G. K. Wallace, "The JPEG still picture compression standard", Communications of the ACM, vol. 34, pp. 31-44, April 1991.
[2] "CCITT Recommendation H.261", 1990.
[3] D. Le Gall, "MPEG: a video compression standard for multimedia applications", Communications of the ACM, vol. 34, pp. 46-58, April 1991.
[4] Woon Hau Chin and B. Farhang-Boroujeny, "Subband Adaptive Filtering with Real-valued Subband Signals for Acoustic Echo Cancellation", conf. 1996.
[5] K. R. Rao and P. Yip, "Discrete Cosine Transform: Algorithms, Advantages, Applications", Academic Press, San Diego, California, 1990.
[6] A. Madisetti and A. Willson, "A 100 MHz 2-D 8x8 DCT/IDCT processor for HDTV applications", IEEE Trans. Circuits, Systems for Video Tech., vol. 5, no. 2, April 1995.
[7] S. Uramoto, Y. Inoue, A. Takabatake, J. Takeda, Y. Yamashita, H. Terane, and M. Yoshimoto, "A 100 MHz 2D discrete cosine transform core processor", IEEE Journal of Solid State Circuits, vol. 27, pp. 492-499, April 1992.
[8] M. Matsui, M. Hara, Y. Uetani, L. Kim, T. Nagamatsu, Y. Watanabe, K. Masuda, T. Sakurai, "A 200 MHz 13 mm² 2-D DCT macrocell using sense-amplifying pipeline flip-flop scheme", IEEE Jour. Solid State Circuits, vol. 29, no. 12, Dec. 1994.
[9] Y. Jang, J. Kao, J. Yang, and P. Huang, "A 0.8 µ 100 MHz 2D DCT core processor", IEEE Transactions on Consumer Electronics, vol. 40, pp. 703-709, August 1994.
[10] W. Chen, C. H. Smith, and S. Fralick, "A fast computation algorithm for the discrete cosine transform", IEEE Transactions on Communications, vol. 25, pp. 1004-1009, September 1977.
[11] Y. P. Lee et al., "A cost-effective architecture for 8x8 two-dimensional DCT/IDCT using direct method", IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 3, June 1997.
[12] C. Loeffler, A. Ligtenberg and G. S. Moschytz, "Practical Fast 1-D DCT algorithm with 11 multiplications", Proceedings of ICASSP, vol. 2, pp. 988-991, 1989.
[13] K. Ray-Liu and C. T. Chiu, "Unified parallel lattice structures for time-recursive discrete cosine/sine/Hartley transforms", IEEE Transactions on Signal Processing, vol. 41, pp. 1357-1377, March 1993.
[14] V. Srinivasan and K. Ray-Liu, "VLSI design of high-speed time-recursive 2D DCT/IDCT processor for video applications", IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 87-96, February 1996.
[15] V. Srinivasan and K. Ray-Liu, "Full custom VLSI implementation of high-speed 2-D DCT-IDCT chip", IEEE Proceedings ICIP-94, vol. 3, pp. 606-610, November 1994.