
A FAST ALGORITHM FOR THE CONVERSION OF DCT COEFFICIENTS TO H.264 TRANSFORM COEFFICIENTS

Chan Yul Park and Nam Ik Cho
Seoul National University
San 56-1, Shilim-dong, Kwanak-gu, Seoul, 151-742, Korea
kchanyul@ispl.snu.ac.kr, nicho@snu.ac.kr
ABSTRACT

This paper proposes a fast algorithm that converts DCT coefficients into integer transform coefficients, for transform domain transcoding from MPEG-x to H.264. For transcoding at the same resolution, the 8×8 DCT coefficients are converted to four 4×4 integer transform coefficient blocks by decomposing the conversion matrix into sparse ones. For the reduction of resolution by half, we also propose an algorithm that converts the DCT coefficients in the lower band into 4×4 integer transform coefficients. The sparse matrices derived in this paper require fewer computations than the direct and conventional conversion matrices, and thus the overall transcoding using the proposed algorithm requires less computational complexity.

1. INTRODUCTION

With the advent of the H.264 video coding standard and its potential to be a major coding scheme in many applications, the need for the conversion of existing MPEG-2 video into this new standard is ever growing. However, unlike the conversion between the former video coding standards, the conversion between H.264 and the former standards is not easy because the new standard uses a different transform, i.e., the integer transform (IT) [1]. More precisely, the source video stream need not be fully decoded to pixel level in the case of transcoding between MPEGs, because the DCT coefficients can be reused. Hence transform domain transcoding is generally faster than pixel domain transcoding. However, for the conversion of DCT coefficients into IT coefficients, additional transform matrix multiplications are needed. As a result, transform domain transcoding is not efficient at all. Hence, a fast conversion algorithm between the two kinds of transform coefficients is required for an efficient transform domain transcoder.

We could find only two algorithms related to the conversion between DCT and IT. One is the algorithm proposed by Xin [2] that multiplies the DCT coefficient matrix by a conversion matrix. The cascade of this conversion matrix and the DCT matrix is similar to the integer DCT, and the number of computations is not reduced. The other is proposed by Shen, which uses the factorized form of the 8×8 DCT matrix [3]. Multiplications in the matrix products are replaced by additions and shifts by using Merhav's method in [4]. But this conversion method has the disadvantage of introducing considerable computational error because of the approximation.

So, we propose a more efficient algorithm for the conversion from DCT coefficients into IT coefficients. The algorithm is based on the decomposition of the conversion matrix into cascades of sparse matrices that need fewer computations. We also propose a modified 4×4 DCT and use it instead of the IT in order to reduce the computational complexity. Two cases of transcoding are considered based on these basic ideas, i.e., transcoding into the same and half resolutions. For the former, the 8×8 DCT coefficients are converted into four 4×4 IT coefficient blocks; for the latter, the lower band coefficients among the 8×8 DCT coefficients are converted into 4×4 IT coefficients. The number of computations and the resulting video quality of each algorithm for each case are compared, and it is shown that the proposed algorithm has very little approximation error while considerably reducing the computational complexity.

2. CONVENTIONAL ALGORITHMS FOR CONVERTING DCT INTO IT

In the case of keeping the image size while transcoding, an 8×8 DCT coefficient matrix $X$ is converted to four 4×4 IT coefficient matrices $Y_{ij}$ as

$$\begin{pmatrix} Y_{00} & Y_{01} \\ Y_{10} & Y_{11} \end{pmatrix} = \begin{pmatrix} H_4 & O_4 \\ O_4 & H_4 \end{pmatrix} T_8^t \, X \, T_8 \begin{pmatrix} H_4^t & O_4 \\ O_4 & H_4^t \end{pmatrix} \tag{1}$$

where $H_4$ is the integer transform matrix, $O_4$ is the 4×4 zero matrix and $T_8$ is the 8×8 DCT matrix. Hence, the overall transform matrix from DCT coefficients to IT coefficients can be denoted as

$$S = \begin{pmatrix} H_4 & O_4 \\ O_4 & H_4 \end{pmatrix} T_8^t . \tag{2}$$
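As a minimal sketch of equations (1) and (2), the following Python builds $S$ and checks that $S X S^t$ matches the pixel domain path (inverse DCT, then $H_4$ on each 4×4 sub-block). Two things here are assumptions rather than content of the paper: $T_8$ is taken to be the orthonormal DCT-II matrix, and $H_4$ is the H.264 forward core transform as given in [1], which the paper does not print. All function names are illustrative.

```python
import math

def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (assumed definition of T_n)."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

# H.264 4x4 forward core transform from [1] (assumed; not printed in the paper).
H4 = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def block_diag2(h):
    """Return the block-diagonal matrix [[h, O], [O, h]]."""
    n = len(h)
    return [row + [0] * n for row in h] + [[0] * n + row for row in h]

T8 = dct_matrix(8)
S = matmul(block_diag2(H4), transpose(T8))        # equation (2)

def convert_same_resolution(X):
    """Equation (1): returns an 8x8 array holding Y00, Y01, Y10, Y11."""
    return matmul(matmul(S, X), transpose(S))

def pixel_domain_reference(X):
    """Inverse DCT to pixels, then H4 applied to each 4x4 sub-block."""
    pixels = matmul(matmul(transpose(T8), X), T8)
    out = [[0.0] * 8 for _ in range(8)]
    for bi in (0, 4):
        for bj in (0, 4):
            blk = [row[bj:bj + 4] for row in pixels[bi:bi + 4]]
            y = matmul(matmul(H4, blk), transpose(H4))
            for r in range(4):
                out[bi + r][bj:bj + 4] = y[r]
    return out
```

The two functions agree up to floating-point rounding for any input block, which is the sense in which $S$ is "the overall transform matrix". The direct product is dense, which is what the fast algorithms try to avoid.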

0-7803-9134-9/05/$20.00 2005 IEEE

Thus the purpose of fast conversion algorithms is to reduce the computations required for the multiplication by this matrix. J. Xin et al. [2] proposed an integer form of the conversion matrix to perform this matrix computation using 32-bit arithmetic as $\hat{S} = \mathrm{round}\{128\,S\}$. Because the computational complexity of a 32-bit integer multiplication is not less than that of a floating-point multiplication in most processors today, this method seems to be useful only when the processor (or hardware) has no floating-point multiplier. As a result, it requires the same number of computations as the pixel domain transcoding.

More recently Shen [3] extended Merhav's method [4] to reduce the computational complexity of $S$ in equation (2). The algorithm in [4] is an efficient conversion method between DCT coefficients, which is applied to the conversion of an MPEG-x sequence to another one with a different resolution. The reduction of computations is achieved by using the factorized DCT of Arai et al. [5] (AAN algorithm) and replacing each multiplication with one shift (1s), or with two additions and two shifts (2a2s) [4]. More specifically, the DCT matrix $T_8$ is factorized as [5]

$$T_8 = D_8 P B_1 B_2 M A_1 A_2 A_3 . \tag{3}$$

By using these factorized matrices, the conversion from DCT to IT in equation (1) can be rewritten as

$$Y_{ij} = E_i X E_j^t \tag{4}$$

where

$$E_0 = H_4 e_0 T_8^t = H_4 e_0 A_3^t A_2^t A_1^t M^t B_2^t B_1^t P^t D_8$$
$$E_1 = H_4 e_1 T_8^t = H_4 e_1 A_3^t A_2^t A_1^t M^t B_2^t B_1^t P^t D_8 . \tag{5}$$

Here $e_0 = (I_4 \;\; O_4)$ and $e_1 = (O_4 \;\; I_4)$, where $I_4$ is the 4×4 identity matrix. $E^{+} = E_0 + E_1$ and $E^{-} = E_0 - E_1$ are defined to reduce redundancies in the expression of the matrices, like $F^{+}$ and $F^{-}$ of Merhav's algorithm [4]. In order to reduce the multiplications in equation (4), Shen defined further matrices $E_d^{+}$ and $E_d^{-}$ as [3]

$$E_d^{+} = H_4 (e_0 + e_1) A_3^t A_2^t A_1^t M^t$$
$$E_d^{-} = H_4 (e_0 - e_1) A_3^t A_2^t A_1^t M^t . \tag{6}$$

The multiplications needed to compute these matrices are replaced by 1s or 2a2s.

3. PROPOSED ALGORITHM

The proposed algorithm is also based on the decomposition of $S$ as in [3]. The main difference is that we use a new matrix instead of $H_4$ and further decompose the matrices, which results in fewer computations and smaller approximation errors. More specifically, it is noted that $H_4$ is related to the 4×4 DCT matrix $T_4$ as $H_4 \approx D_4 T_4$, where $D_4$ is a diagonal matrix [1]. The multiplication by $D_4$ is absorbed into the quantization process, and $H_4$ is just considered as a new transform in H.264. But, since the pattern of the elements in $H_4$ is somewhat random when compared to other conventional transform matrices such as the DCT and DFT, a fast algorithm and further decomposition of $E_d^{\pm}$ is very difficult. So, we consider $T_4$ again instead of the $H_4$ used in [3]. For this, $H_4$ is replaced by $D_4^{1} T_4$ in this paper (because $H_4 \approx D_4 T_4$ as stated above), where $D_4^{1} = \mathrm{diag}\{2, 3.1623, 2, 3.1623\}$. Note that this $D_4^{1}$ can also be absorbed into the quantization process of the transcoding. Then, the multiplication of $T_4$ and a part of $T_8^t$ in the computation of equation (4) results in very sparse matrices, and thus the computation can be further reduced. More detailed examples are shown in the following subsections, for the cases of conversion into the same and half resolutions.

3.1. Conversion in the Same Resolution

By using $D_4^{1} T_4$ instead of $H_4$, each $Y_{ij}$ in equation (4) is changed as

$$Y_{ij} = D_4^{1} T_4 e_i T_8^t \, X \, T_8 e_j^t T_4^t D_4^{1} . \tag{7}$$

Note that $D_4^{1}$ can be absorbed into the quantization process. Also, $E_d^{+}$ and $E_d^{-}$ in (6) are changed as

$$E_d^{+} = \begin{pmatrix} 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & a_1 & 0 & 1 & a_2 \\ 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & a_3 & a_4 \end{pmatrix} \tag{8}$$

$$E_d^{-} = \begin{pmatrix} 0 & 0 & 0 & 0 & b_1 & b_2 & b_3 & 1 \\ 0 & 0 & b_3 & b_3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & b_2 & 0 & 1 \\ 0 & 0 & b_1 & b_1 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{9}$$

where $a_1 = -2.4143$, $a_2 = 1.3066$, $a_3 = -0.4142$, $a_4 = 0.5412$, $b_1 = 1.0824$, $b_2 = 1.4142$, and $b_3 = 2.6131$. It can be easily shown that these matrices can also be expressed as $E_d^{+} = \tilde{D}_4 \tilde{E}_d^{+}$ and $E_d^{-} = \tilde{D}_4 \tilde{E}_d^{-}$, where $\tilde{D}_4$ is the diagonal matrix $\mathrm{diag}\{1, a_2, 1, a_4\}$. Since this diagonal matrix can also be absorbed into the quantization process, we need to consider only $\tilde{E}_d^{+}$ and $\tilde{E}_d^{-}$, which are expressed as

$$\tilde{E}_d^{+} = \begin{pmatrix} 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -c_1 & 0 & c_2 & 1 \\ 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & c_1 & 0 & -c_2 & 1 \end{pmatrix} \tag{10}$$

$$\tilde{E}_d^{-} = \begin{pmatrix} 0 & 0 & 0 & 0 & b_1 & b_2 & b_3 & 1 \\ 0 & 0 & 2 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & b_2 & 0 & 1 \\ 0 & 0 & 2 & 2 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{11}$$

where $c_1 = 1.8478$ and $c_2 = 0.7654$. These matrices need fewer multiplications and additions than the original ones, and thus a saving in computational complexity is achieved. In summary, the overall computations are 80 multiplications, 400 additions, and 64 shifts, whereas the pixel domain transcoding requires 80 multiplications, 720 additions, and 64 shifts. Table 1(a) compares the computations required for the pixel domain transcoding implemented by a fast IDCT and ITs (AAN IDCT + 4 IT), the two existing algorithms [2][3], and the proposed algorithm. The table shows that the proposed algorithm requires less computational complexity than the others. Table 1(b) shows the number of operations required for the implementation of the algorithms on a typical DSP, which gives similar results.
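Two relations used in this subsection can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: $T_4$ is taken to be the orthonormal 4-point DCT-II matrix, $H_4$ is the H.264 core transform from [1] (not printed in the paper), and the matrix of equation (8) is entered as it appears in the text. It measures how closely $\mathrm{diag}\{2, 3.1623, 2, 3.1623\}\,T_4$ matches $H_4$, and divides the rows of (8) by $\mathrm{diag}\{1, a_2, 1, a_4\}$ to see which constants remain.

```python
import math

# H.264 4x4 forward core transform from [1] (assumed; not printed in the paper).
H4 = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
D1 = [2.0, 3.1623, 2.0, 3.1623]      # diag{2, 3.1623, 2, 3.1623} from the text

def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (assumed definition of T_n)."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

T4 = dct_matrix(4)

# Row-scaled DCT that stands in for H4 in the proposed algorithm.
D1T4 = [[D1[k] * T4[k][i] for i in range(4)] for k in range(4)]

# Largest element-wise gap between H4 and its stand-in (roughly 0.14,
# located in the odd rows; the even rows match H4 almost exactly).
max_dev = max(abs(D1T4[k][i] - H4[k][i]) for k in range(4) for i in range(4))

# Constants quoted in Section 3.1.
a1, a2, a3, a4 = -2.4143, 1.3066, -0.4142, 0.5412
b1, b2, b3 = 1.0824, 1.4142, 2.6131
c1, c2 = 1.8478, 0.7654

# Equation (8) as given in the text.
Ed_plus = [
    [4, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, a1, 0, 1, a2],
    [0, 4, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, a3, a4],
]
diag = [1.0, a2, 1.0, a4]

def factor_out_rows(matrix, d):
    """Return M' with matrix = diag(d) * M' (each row divided by its d entry)."""
    return [[v / d[r] for v in row] for r, row in enumerate(matrix)]

Ed_plus_reduced = factor_out_rows(Ed_plus, diag)
```

With the quoted four-digit constants, the second and fourth rows of `Ed_plus_reduced` come out as $(-c_1, 0, c_2, 1)$ and $(c_1, 0, -c_2, 1)$ in their last four columns, and $b_3/a_2$ and $b_1/a_4$ both come out as 2 to within rounding. This is why only $c_1$ and $c_2$ remain as multipliers after the diagonal factor is moved into quantization.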

4. COMPARISON OF PICTURE QUALITY

Since the conversion matrices of Shen's and the proposed algorithms are approximations of the original one, the picture qualities of the converted videos need to be compared. For the comparison, Foreman and Akiyo sequences encoded by MPEG-2 Main profile and their down-sampled sequences in the DCT domain are used as references. In Figure 1, the pixel domain transcoding, Shen's algorithms that use 1s and 2a2s instead of a multiplication, Xin's algorithm and the proposed algorithm are compared in terms of frame number vs. PSNR. It can be observed that the proposed and Xin's algorithms show slight degradation from the pixel domain transcoding result, whereas Shen's approximations result in larger errors. Figure 2 shows the result in the case of 8×8 to 4×4 conversion. It can be observed that the proposed method shows almost the same result as the pixel domain transcoding.

5. CONCLUSION

We have proposed a fast algorithm that converts DCT coefficients into IT coefficients for the transcoding of MPEG-x into H.264. By matrix decomposition and approximation of the IT matrix by diagonal and DCT matrices, the computational complexity of the conversion is much reduced. Also, it has been shown that the proposed algorithm provides almost the same picture quality as the pixel domain transcoding.

6. REFERENCES

[1] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-complexity Transform and Quantization in H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598-603, July 2003.

[2] J. Xin, A. Vetro, and H. Sun, "Converting DCT Coefficients to H.264/AVC Transform Coefficients," Technical Report of Mitsubishi Electric Research Lab., TR2004-058, June 2004.

[3] B. Shen, "From 8-Tap DCT to 4-Tap Integer-Transform for MPEG to H.264/AVC Transcoding," Proc. 2004 IEEE ICIP, pp. 115-118, Oct. 2004.

[4] N. Merhav and V. Bhaskaran, "Fast Algorithms for DCT-Domain Image Down-Sampling and for Inverse Motion Compensation," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 3, June 1997.

[5] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, 1993.

Table 1. The number of computations required for the conversion into the same resolution. NOC: the number of total computations.

(a) General purpose processor

METHOD                  MUL   ADD   SHIFT    NOC
AAN IDCT [5] + 4 IT      80   720      64    864
Xin's [2]               352   352       0    704
Shen's [3]              144   544     112    800
Proposed                 80   400      64    544

(b) TMS320C6000 class DSP

METHOD                  MUL   ADD   SHIFT    NOC
AAN IDCT [5] + 4 IT      80   720      64   1104
Xin's [2]               352   352       0   1056
Shen's (1s) [3]           0   576     224    800
Shen's (2a2s) [3]         0   704     368   1072
Proposed                 80   400      64    784

3.2. Conversion in the Half Resolution

If the target resolution is half the original, the 4×4 coefficients in the low band of the 8×8 DCT coefficients are converted into IT coefficients as

$$Z = H_4 T_4^t X_{00} T_4 H_4^t / 2 \tag{12}$$

where $X_{00}$ is the lower part of the given DCT coefficient matrix $X$, and $Z$ is the target IT coefficient matrix. As in the previous case, if $H_4$ is replaced by $D_4^{1} T_4$, all operations can be absorbed into the quantization process. Note that the pixel domain transcoding implemented by using Cho and Lee's fast 4×4 IDCT algorithm [6] needs 16 multiplications, 138 additions, and 16 shifts, whereas no computation is required in the proposed method.
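The reason the half-resolution case collapses to pure scaling can be sketched as follows. Under the assumption that $T_4$ is the orthonormal 4-point DCT-II matrix, $T_4 T_4^t = I_4$, so substituting $D_4^{1} T_4$ for $H_4$ in equation (12) leaves $Z \approx D_4^{1} X_{00} D_4^{1} / 2$, an element-wise scaling by $d_i d_j / 2$. The code below is illustrative: the function names are hypothetical, and $H_4$ is the H.264 core transform from [1], which the paper does not print.

```python
import math

# H.264 4x4 forward core transform from [1] (assumed; not printed in the paper).
H4 = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
D1 = [2.0, 3.1623, 2.0, 3.1623]      # diag{2, 3.1623, 2, 3.1623} from the text

def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (assumed definition of T_n)."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

T4 = dct_matrix(4)

def exact_half_resolution(X00):
    """Equation (12): Z = H4 T4^t X00 T4 H4^t / 2."""
    pixels = matmul(matmul(transpose(T4), X00), T4)
    z = matmul(matmul(H4, pixels), transpose(H4))
    return [[v / 2 for v in row] for row in z]

def proposed_half_resolution(X00):
    """H4 replaced by D1*T4: Z[i][j] = D1[i] * D1[j] * X00[i][j] / 2."""
    return [[D1[i] * D1[j] * X00[i][j] / 2 for j in range(4)] for i in range(4)]
```

In a transcoder these per-coefficient factors would presumably be merged into the quantization tables rather than applied as a separate pass, which is the sense in which the conversion itself costs no operations. For a block containing only a DC coefficient the two functions agree exactly; for general blocks they differ by the $H_4 \approx D_4 T_4$ approximation.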

[Figure 1: PSNR (dB) vs. frame number, H.264/AVC QP = 28, for (a) the Foreman QCIF sequence and (b) the Akiyo QCIF sequence. Curves: Pixel Domain, Jun Xin, Bo Shen 1s, Bo Shen 2s2a, Proposed.]

Fig. 1. Accumulated Errors in the case of conversion into the same resolution.

[Figure 2: PSNR (dB) vs. frame number, H.264/AVC QP = 28, for (a) the Foreman QCIF sequence and (b) the Akiyo QCIF sequence. Curves: Pixel Domain, Proposed.]

Fig. 2. Accumulated Errors in the case of reducing the resolution by half.

[6] N. I. Cho and S. U. Lee, "Fast Algorithm and Implementation of 2-D Discrete Cosine Transform," IEEE Transactions on Circuits and Systems, vol. 38, no. 3, pp. 297-305, March 1991.
