pipeline operation. Statement (3) takes the outputs of statements (1) and (2) as its inputs, so the three statements must execute one after another. Furthermore, each MUL instruction occupies two cycles, and one multiply plus one add operation take five cycles on ARM. In the sub-band synthesis filter, multiply-add is the main operation, and each multiply-add consumes several cycles. NEON can help in this situation. The NEON VMUL instruction performs a vector multiplication in one cycle, which is equivalent to two multiply operations. The multiplications are converted into NEON code as follows:
VMUL.I32 D1, D2, D3
D1 to D3 are independent 64-bit NEON registers, each holding a vector. D2 contains the values of r2 and r5, D3 contains the values of r3 and r6, and the result is stored in D1. This single NEON instruction therefore performs two multiplications. Likewise, the NEON VMLA instruction is equivalent to two multiply-add operations. NEON optimization thus reduces the time spent on multiply-add operations and, with it, the computing time of the module.

IMDCT is the module with the second-largest computing time in the MP3 decoder, accounting for about 25 percent of the total. IMDCT processes 32 frequency sub-bands. Each sub-band contains either one long window or three consecutive short windows. A long window consists of 18 frequency lines and a short window of six. The IMDCT is defined as:

$$x_i = \sum_{k=0}^{n/2-1} X_k \cos\left(\frac{\pi}{2n}\left(2i + 1 + \frac{n}{2}\right)(2k + 1)\right), \qquad i = 0, 1, \ldots, n-1$$

where $n = 36$ for a long window and $n = 12$ for a short window, so the $n/2$ inputs $X_k$ are the 18 or 6 frequency lines. After algorithm-level optimization, IMDCT is transformed into a form that consists mainly of multiply-add operations. As with the sub-band synthesis filter, the NEON VMUL and VMLA instructions can efficiently replace the multiply-add instructions of the ARM code. This substantially reduces the computing time of the IMDCT module.

Common audio decoders such as WMA, AAC and OGG also contain a large number of discrete cosine transforms, so the same NEON optimization method can be applied to them. The method is therefore general. Furthermore, for multimedia processing, the NEON instruction set provides a range of instructions optimized for media processing, such as saturating vector operations and vector load/store. Used properly, these instructions can improve performance significantly.
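The following C sketch makes the VMUL/VMLA conversion for multiply-add loops concrete. It models a 64-bit D register as two 32-bit integer lanes. It then compares a scalar multiply-add loop with a two-lane version that issues one VMUL followed by VMLAs. This is a portable emulation, not NEON code. The type `d_s32x2` and the functions `load_s32x2`, `vmul_s32x2`, `vmla_s32x2`, `mac_scalar` and `mac_two_lane` are hypothetical names chosen for illustration. The sketch also assumes 32-bit lanes that wrap on overflow. On an ARM target, the real intrinsics `vld1_s32`, `vmul_s32` and `vmla_s32` from `<arm_neon.h>` play the corresponding roles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit NEON D register viewed as two signed
 * 32-bit lanes (the int32x2_t shape). Portable emulation, not real NEON. */
typedef struct {
    int32_t lane[2];
} d_s32x2;

/* Models VLD1.32 {Dd}, [Rn]: load two consecutive 32-bit values. */
static d_s32x2 load_s32x2(const int32_t *p) {
    d_s32x2 d = {{p[0], p[1]}};
    return d;
}

/* Models VMUL.I32 Dd, Dn, Dm: d[i] = n[i] * m[i], keeping the low 32 bits.
 * Unsigned arithmetic makes overflow wrap instead of being undefined. */
static d_s32x2 vmul_s32x2(d_s32x2 n, d_s32x2 m) {
    d_s32x2 d;
    for (int i = 0; i < 2; i++)
        d.lane[i] = (int32_t)((uint32_t)n.lane[i] * (uint32_t)m.lane[i]);
    return d;
}

/* Models VMLA.I32 Dd, Dn, Dm: d[i] = d[i] + n[i] * m[i] (wrapping). */
static d_s32x2 vmla_s32x2(d_s32x2 d, d_s32x2 n, d_s32x2 m) {
    for (int i = 0; i < 2; i++)
        d.lane[i] = (int32_t)((uint32_t)d.lane[i] +
                              (uint32_t)n.lane[i] * (uint32_t)m.lane[i]);
    return d;
}

/* Scalar reference: one multiply and one add per element, as in ARM code. */
static int32_t mac_scalar(const int32_t *a, const int32_t *b, size_t len) {
    uint32_t acc = 0;
    for (size_t k = 0; k < len; k++)
        acc += (uint32_t)a[k] * (uint32_t)b[k];
    return (int32_t)acc;
}

/* Two-lane version: VMUL for the first pair, VMLA for every later pair,
 * then the two lanes are added together. Requires an even, non-zero length. */
static int32_t mac_two_lane(const int32_t *a, const int32_t *b, size_t len) {
    assert(len >= 2 && len % 2 == 0);
    d_s32x2 acc = vmul_s32x2(load_s32x2(a), load_s32x2(b));
    for (size_t k = 2; k < len; k += 2)
        acc = vmla_s32x2(acc, load_s32x2(a + k), load_s32x2(b + k));
    return (int32_t)((uint32_t)acc.lane[0] + (uint32_t)acc.lane[1]);
}
```

Take `a = {3, -7, 11, 2, -5, 9}` and `b = {4, 6, -2, 8, 10, -1}`. Both functions return -95. The two-lane version needs three vector multiply instructions (one VMUL and two VMLAs) instead of six scalar multiplies and adds. Wrapping 32-bit lanes only keep the sketch small. A real fixed-point decoder may need a widening accumulate such as VMLAL (`vmlal_s32`) to avoid overflowing the lanes.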
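In an audio decoder, one typical use of the saturating vector operations mentioned above is clamping samples to the 16-bit PCM range. The C sketch below assumes 16-bit PCM samples and models a D register as four signed 16-bit lanes. It contrasts a wrapping add (VADD.I16 semantics) with a saturating add (VQADD.S16 semantics). The type `d_s16x4` and the helpers `clamp_s16`, `vadd_s16x4` and `vqadd_s16x4` are hypothetical, portable stand-ins. On ARM, the corresponding intrinsics are `vadd_s16` and `vqadd_s16` from `<arm_neon.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit D register as four signed 16-bit lanes
 * (the int16x4_t shape), e.g. four PCM samples. Portable emulation only. */
typedef struct {
    int16_t lane[4];
} d_s16x4;

/* Clamp a 32-bit intermediate value into the int16_t range. */
static int16_t clamp_s16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Models VADD.I16 Dd, Dn, Dm: lane-wise add that wraps modulo 2^16.
 * The final narrowing to int16_t is implementation-defined in C but
 * wraps on common compilers. */
static d_s16x4 vadd_s16x4(d_s16x4 n, d_s16x4 m) {
    d_s16x4 d;
    for (int i = 0; i < 4; i++)
        d.lane[i] = (int16_t)(uint16_t)((uint16_t)n.lane[i] + (uint16_t)m.lane[i]);
    return d;
}

/* Models VQADD.S16 Dd, Dn, Dm: lane-wise add that saturates to
 * [INT16_MIN, INT16_MAX] instead of wrapping. */
static d_s16x4 vqadd_s16x4(d_s16x4 n, d_s16x4 m) {
    d_s16x4 d;
    for (int i = 0; i < 4; i++)
        d.lane[i] = clamp_s16((int32_t)n.lane[i] + (int32_t)m.lane[i]);
    return d;
}
```

Take `a = {30000, -30000, 100, 32767}` and `b = {5000, -5000, -200, 1}`. The wrapping add yields `{-30536, 30536, -100, -32768}`, which flips the sign of every out-of-range sample. The saturating add yields `{32767, -32768, -100, 32767}`. One NEON instruction performs this clamping for all four lanes, with no per-sample compare and branch.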