
SubjectiveEffectsofMP3compression


Subjective Effects of MP3 Compression


Shaunak De
Rizvi College of Engineering

ABSTRACT
Most audio recorded, produced and distributed today is in digital format. However, converting an analog sound signal into digital data generates large quantities of data, and various compression techniques are used to reduce the cost of storage. The most popular compression format in recent years has been MPEG-1/MPEG-2 Audio Layer III (Moving Picture Experts Group), better known as MP3. This paper examines the subjective effects of MP3 compression on three fundamental tones that make up music: sine, square and sawtooth. It also examines the effects of compression on real-world samples. The aim has been to determine the set of MP3 compression parameters that best balances space requirements against faithful reproduction of the original audio.

Keywords: MP3, bitrate, subjective effects

INTRODUCTION
Audio is almost exclusively produced, recorded, reproduced and distributed on digital media today, largely due to its advantages in processing, manipulation, post-production, storage and distribution. However, to record sound, which is primarily an analog signal, in discrete digital form, the signal must be sampled repeatedly over time at a sufficiently high rate and resolution. This produces massive amounts of data. Storing and handling this data would be expensive and impractical were it not for various compression techniques, both lossy and lossless. One format that has emerged as the de facto standard for compressed audio in recent years is MP3 (MPEG-1/MPEG-2 Audio Layer III). MP3 is a patented digital audio encoding format that uses a form of lossy data compression. It is a common format for consumer audio storage, transmission over the internet and playback on mobile devices.
The format uses a lossy compression algorithm that reduces or removes information from the sound track in such a manner that an average person's perception of the sound remains unchanged. The compression works by reducing the accuracy of those parts of the sound that are considered to be beyond the auditory resolution of most people. This method is commonly referred to as perceptual coding. At a fundamental level, it can be understood as removing some of the overtone and harmonic spectral components that are required to fully reproduce a sound but play only a secondary role in human perception.[1] It is well known that the accurate reproduction of a square wave requires several harmonic components to be represented successfully; this fact is used in testing amplifier performance.[2]
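To make the harmonic argument concrete, the sketch below sums the Fourier series of an ideal unit square wave, (4/pi) * sum over odd n of sin(2*pi*n*f0*t)/n, keeping only the harmonics at or below a chosen cutoff frequency. The cutoff values are hypothetical stand-ins for band limiting; a real MP3 encoder combines band limiting with perceptual quantization, so this is only a crude illustration of why losing overtones reshapes a square wave, not a model of LAME.

```python
import math


def odd_harmonics_below(f0_hz, cutoff_hz):
    """Return the odd harmonic numbers n (1, 3, 5, ...) with n * f0_hz <= cutoff_hz."""
    return list(range(1, int(cutoff_hz // f0_hz) + 1, 2))


def bandlimited_square(t, f0_hz, cutoff_hz):
    """Partial Fourier sum of a unit-amplitude square wave at time t (seconds).

    An ideal square wave equals (4/pi) * sum over odd n of sin(2*pi*n*f0*t) / n;
    here only the harmonics at or below cutoff_hz are kept.
    """
    return (4.0 / math.pi) * sum(
        math.sin(2.0 * math.pi * n * f0_hz * t) / n
        for n in odd_harmonics_below(f0_hz, cutoff_hz)
    )


# The ideal square wave is exactly 1.0 a quarter period into each cycle.
quarter_period = 1.0 / (4 * 440.0)
for cutoff in (2000.0, 16000.0):
    kept = len(odd_harmonics_below(440.0, cutoff))
    value = bandlimited_square(quarter_period, 440.0, cutoff)
    print(f"cutoff {cutoff:>7.0f} Hz: {kept:2d} harmonics, value {value:.3f}")
```

With only the 1st and 3rd harmonics (a 2 kHz cutoff), the flat top of a 440 Hz square wave peaks at about 0.85 instead of 1.0; keeping harmonics up to 16 kHz brings it much closer to the ideal shape.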

It is reasonable to assume that the loss of overtone and harmonic data would cause a misrepresentation of sound once compressed. We investigated this idea and attempted to examine its subjective effect on compressed music and sound through a series of experiments.

EXPERIMENT
The experiment comprises three tests:
1. Pure Tone Test: Three tones, sine, square and sawtooth, each of 440 Hz, 0.8 amplitude and 30 second duration, were generated in a software synthesizer (Audacity). They were exported both to an uncompressed format and to the MP3 format at various bitrates. Each sample was then compared, first on a speaker and then on earphones, and carefully observed for compression artifacts.
2. Sample Music Test: The ultimate goal of any sound compression algorithm is to provide a faithful representation of musical works for storage, distribution and playback. Three tracks were selected for the test: The Good, the Bad and the Ugly [3], A New Hope [4] and The Heart Asks Pleasure First [5]. These tracks provide a good mix of the three basic tones tested above, and each presents a different challenge in terms of content.
3. Speech Test: A section of John F. Kennedy's speech [6] of September 12, 1962 at Rice University Stadium, on the decision to go to the moon, was selected and compressed at various bitrates using MP3 compression. The objective of this test was to demonstrate the ability of human speech to remain coherent and retain quality in spite of severe compression.

APPARATUS AND SET-UP
Tone generation, editing and file handling were carried out in Audacity¹, a cross-platform digital audio editor and recording application. Encoding to the MP3 format was done using the open-source encoding utility LAME². All software was installed on a computer running Fedora Linux with the ALSA³ drivers installed. Playback was done in the VideoLAN (VLC) media player.
Complete technical specifications of the system and sound subsystem are as follows:

System and Hardware
  Operating System     Fedora 13
  Linux Kernel         2.6.33.3-85-fc13
  Hardware             AMD Phenom 9650 x64 / 2.7 GB RAM
  ALSA drivers         alsa-driver-1.0.23

Synthesizer and Encoder
  Audacity             Build 1.3.12-beta
  FLAC importing       libFLAC
  LAME encoder         LAME 32-bit, version 3.98.3
  VLC player           1.0.6 Goldeneye

¹ http://audacity.sourceforge.net
² http://lame.sourceforge.net
³ http://www.alsa-project.org

These drivers and software versions were chosen for their known stability and performance, to ensure that the results are not influenced in any way by software-specific bias. All software used is freely distributed under open-source licenses, so the test can be verified with ease. Microphone hardware was disabled during playback to prevent any chance of interference. Playback was first carried out on an Altec Lansing 2.1 speaker system, then on a pair of Creative EP-630 earphones. The earphones are of the in-ear, noise-isolating type, and hence most comparisons were done with them, to prevent background noise from influencing the test. For improved accuracy and reduced ambient noise, all testing was done at night.

PROCEDURE
Pure Tone Test
The pure tones (sine, square and sawtooth) were generated using Audacity's built-in synthesizer with the following parameters:

  Tone Shape    Sine, Square and Sawtooth respectively
  Frequency     440 Hz
  Amplitude     0.8
  Duration      30 s
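The tones in the study came from Audacity's generator. As a hedged illustration of what the parameters above describe, the sketch below produces the same three waveforms using only the Python standard library and writes them as 16-bit linear PCM WAV files. The 44,100 Hz sample rate and two channels are assumptions (the paper does not state them), the function names are hypothetical, and the square and sawtooth are naive, non-band-limited versions that may differ in detail from Audacity's own generator.

```python
import array
import math
import sys
import wave


def tone_sample(shape, t, freq_hz=440.0, amplitude=0.8):
    """Value of the tone at time t seconds; square and sawtooth are naive (not band-limited)."""
    phase = (freq_hz * t) % 1.0  # position within the current cycle, 0 <= phase < 1
    if shape == "sine":
        value = math.sin(2.0 * math.pi * phase)
    elif shape == "square":
        value = 1.0 if phase < 0.5 else -1.0
    elif shape == "sawtooth":
        value = 2.0 * phase - 1.0  # rises linearly from -1 to +1 each cycle
    else:
        raise ValueError(f"unknown shape: {shape!r}")
    return amplitude * value


def write_tone_wav(path, shape, duration_s=30.0, rate_hz=44100, channels=2):
    """Write a 16-bit linear PCM WAV file containing the requested tone."""
    samples = array.array("h")  # signed 16-bit integers
    for i in range(int(duration_s * rate_hz)):
        pcm = int(round(tone_sample(shape, i / rate_hz) * 32767))
        samples.extend([pcm] * channels)  # same signal on every channel
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate_hz)
        wav.writeframes(samples.tobytes())
```

For example, looping `write_tone_wav(f"{shape}_440.wav", shape)` over `"sine"`, `"square"` and `"sawtooth"` would produce the three 30-second reference files; the pure-Python loop is slow but keeps the sketch dependency-free.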

These tone samples were then exported to uncompressed and compressed formats using the following parameters:

  Format   Parameters                              Type
  WAV      16-bit linear PCM                       Uncompressed
  MP3      64 Kbps, constant bitrate, stereo       Compressed
  MP3      128 Kbps, constant bitrate, stereo      Compressed
  MP3      256 Kbps, constant bitrate, stereo      Compressed
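The study exported through Audacity's MP3 export, which uses LAME. For readers who want to repeat the export outside Audacity, a roughly equivalent batch encode with the standalone `lame` command-line tool might look like the sketch below. Mapping "Stereo" to LAME's plain left/right stereo mode (`-m s`) rather than joint stereo, the helper names, and the output file naming are all assumptions; Audacity's exporter may choose different defaults.

```python
import subprocess

BITRATES_KBPS = (64, 128, 256)  # the three MP3 settings listed in the export table


def lame_cbr_command(wav_path, mp3_path, bitrate_kbps):
    """Build a LAME command line for constant-bitrate, stereo MP3 encoding."""
    return [
        "lame",
        "-b", str(bitrate_kbps),  # target bitrate in kbps
        "--cbr",                  # enforce constant bitrate
        "-m", "s",                # plain left/right stereo (not joint stereo)
        wav_path,
        mp3_path,
    ]


def encode_all(wav_path, stem):
    """Encode one WAV file at every bitrate; requires the lame executable on PATH."""
    for kbps in BITRATES_KBPS:
        subprocess.run(lame_cbr_command(wav_path, f"{stem}_{kbps}.mp3", kbps), check=True)
```

Building the argument list separately from running it makes the exact command easy to print and check before any encoding happens.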

Sample Music Test
The source for the music samples was lossless FLAC files [7]. Short segments, ranging from 30 to 45 seconds, were selected such that the passages contained no key changes and no sudden amplitude changes. Segments were selected to represent, as fully as possible, all the instruments used in the particular song. The export formats were as specified in table 3 (the export parameters of the Pure Tone Test).

Analysis of instrument waveforms reveals that most string instruments, such as the acoustic guitar, harp, viola and sitar, produce primarily sinusoidal oscillations. The same applies to tube-based instruments such as trumpets, horns and saxophones. Organs and reed-based instruments such as the harmonium produce fundamentally square waves. Sawtooth waves are produced only by electronically synthesized instruments and overdriven electric guitars. Each instrument, no doubt, has its own timbre, fluctuations and characteristic attack, sustain, reverb and fading pattern. The songs were chosen to allow an even representation of all these waveforms, dynamics, qualities and timbres. The three songs chosen for comparison were:

  The Good, The Bad and The Ugly (Main Title)   Predominantly acoustic string instrument sections.
  A New Hope                                    Trumpet and horn based sections.
  The Heart Asks Pleasure First                 Piano based piece. Good test of dynamics and brightness.

Speech Test
A section of John F. Kennedy's speech at the Rice University Stadium on September 12, 1962 was taken. The objective of this test was to demonstrate the compressibility of the human voice. The sample clip was severely compressed and tested for coherence; the aim was to find the smallest possible file size that still allowed the section of speech to be coherent.

All comparisons were done on a pair of Creative EP-630 earphones. The comparisons were conducted at night to reduce ambient noise. Each sound clip was played using the default VLC media player settings, with post-processing disabled. To reduce human error, the clips were scrutinized three times, with 1-minute breaks between tests. Points of interest were noted and replayed to confirm artifacts.

OBSERVATIONS TEST I
Sine Wave
Subjective Feel: Sinusoidal waves are soothing to the human ear. The tone is 'easy' to listen to and sounds 'sweet' and 'refreshing'.

  Format           Size       Compression    Observations
  16 Bit PCM WAV   1.7 MB     Uncompressed   - Very accurate representation of original tone
  64 Kbps MP3      79.3 KB    95.335 %       - Little or negligible perceivable loss of quality
                                             - Slight 'muting' and band limiting observed
  128 Kbps MP3     158.5 KB   90.676 %       - Indistinguishable from original track
                                             - No muting or loss of quality observed
  256 Kbps MP3     316.8 KB   81.364 %       - Indistinguishable from original

Comments: Sinusoidal oscillations are pure tones that occupy a single frequency in the frequency domain. No overtones are required for their reconstruction, so they appear unaffected even by relatively severe compression. A bitrate of 128 Kbps is sufficient for a good-quality representation of the sine tone; if storage space is a concern, 64 Kbps also provides a highly acceptable representation. String, wind and brass instruments typically produce sine waves.

Square Wave
Subjective Feel: Very annoying to listen to. Causes feelings of pain, irritation, anger and frustration. Difficult to endure even for relatively short periods. Lower-frequency square waves are more painful to hear.

  Format           Size       Compression    Observations
  16 Bit PCM WAV   1.7 MB     Uncompressed   -
  64 Kbps MP3      79.3 KB    95.335 %       - Pronounced 'whooshing' sound heard in background
                                             - Poor representation of original sound
                                             - Severe loss of quality
                                             - Pronounced periodic fading and Leslie effect⁴
                                             - Very distorted output
  128 Kbps MP3     158.5 KB   90.676 %       - Close approximation
                                             - Slight periodic fading observed due to removal of overtones
                                             - Faint whooshing sounds observed
  256 Kbps MP3     316.8 KB   81.364 %       - Slight sweetening of tone
                                             - Lack of ringing in ear, as caused by original tone
                                             - Compression may be beneficial for human perception

Comments: Square waves require several overtones and harmonics for accurate representation. The compression process removes many of these components and causes heavy distortion, especially at low bitrates. 128 and 64 Kbps are not able to represent the sound sufficiently, with the tone distorting severely at the lowest bitrate. Even at 256 Kbps, the representation shows some 'rounding' and 'sweetening'. Given the annoying nature of the tone, this may actually be beneficial. Some types of organs and reed-based instruments typically produce square-shaped waves.

⁴ http://en.wikipedia.org/wiki/Leslie_speaker


Sawtooth Wave
Subjective Feel: The sound feels 'synthesized', more like a duck's 'quack'. Not a very 'natural' sound. Slightly uncomfortable, but not annoying.

  Format           Size       Compression    Observations
  16 Bit PCM WAV   1.7 MB     Uncompressed   -
  64 Kbps MP3      78.7 KB    95.370 %       - Poor reproduction
                                             - Clear 'whooshing' sounds
  128 Kbps MP3     157.2 KB   90.752 %       - Faithful representation
                                             - Slightly muted, band compressed
  256 Kbps MP3     314.2 KB   81.517 %       - Indistinguishable from original

Comments: At 64 Kbps, the representation is poor and not usable. 128 Kbps provides a fair representation. A slight band compression is observed, which can likely be compensated by equalization in most playback systems. 256 Kbps provides a faithful representation. Electronic instruments such as synthesizers and processed electric guitars typically produce sawtooth waves.

OBSERVATIONS TEST II
The Good, The Bad and The Ugly
The song relies heavily on acoustic string instruments such as violas, harps, violins and guitar. No synthesized effects are noticeable.

  Format           Size       Compression    Observations
  16 Bit PCM WAV   7.7 MB     Uncompressed   -
  64 Kbps MP3      359.9 KB   95.326 %       - Fairly good approximation, very usable
                                             - Low whistle ringing [18 s, 28 s]
                                             - Guitar acceptable [45 s]
  128 Kbps MP3     719.7 KB   90.653 %       - Practically indistinguishable from original
                                             - Band-limited nature apparent only under side-by-side comparison
  256 Kbps MP3     1.4 MB     81.818 %       - Indistinguishable from original
                                             - Very faithful

Comments: 64 Kbps provides a very usable representation of the track, although a slight loss of quality is noticeable at some points under comparison. At 128 Kbps and above, the track is practically indistinguishable from the original.
Note: The compression ratios show negligible deviation from those observed with pure tones.

A New Hope
The song uses thick brass instruments for the melody and rhythm patterns. The sound is very layered and complex.

  Format           Size       Compression    Observations
  16 Bit PCM WAV   10.1 MB    Uncompressed   -
  64 Kbps MP3      471.7 KB   95.329 %       - Negligible loss in quality, due to brass instruments being predominantly sine waves
                                             - Some bandwidth limitation flaws [50 s]
  128 Kbps MP3     943.4 KB   90.659 %       - Indistinguishable from original
  256 Kbps MP3     1.8 MB     82.178 %       - Indistinguishable from original

Comments: 64 Kbps provides a very usable representation of the track. This could be because the track is 'sine heavy', being predominantly composed of sine waves. The track is indistinguishable from the original at 128 Kbps and above.

The Heart Asks Pleasure First
This is an acoustic grand piano based song, complete with piano dynamics and accidentals. It makes a good test of the brightness and 'feel' of the compressed clip.

  Format           Size       Compression    Observations
  16 Bit PCM WAV   10.1 MB    Uncompressed   -
  64 Kbps MP3      471.7 KB   95.329 %       - Slightly rounded and fuzzy tone
                                             - Slight band limitation
                                             - Flaw during complex phrases [14 s]
                                             - Very usable
  128 Kbps MP3     943.4 KB   90.659 %       - Very close to original
                                             - Slight loss of dynamics and brightness [9 s]
  256 Kbps MP3     1.8 MB     82.178 %       - Indistinguishable from original

Comments: 64 Kbps provides a very usable representation of the track, probably because the piano is ultimately a string instrument and is thus sine-wave based. Although all three bitrates provide a good representation, the complex dynamics of the piano are unfortunately not fully captured even at 256 Kbps, a testament to the dynamic variation of the instrument and the skill of the musician.
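The Compression column in the Test I and Test II tables is not defined in the text. It appears to be the space saved relative to the WAV file, (1 - MP3 size / WAV size) * 100, with 1 MB counted as 1000 KB. That reading is an inference, not something the paper states, but as the sketch below checks, it reproduces the reported figures to within rounding.

```python
def compression_percent(compressed_kb, uncompressed_kb):
    """Space saved relative to the uncompressed file, as a percentage."""
    return (1.0 - compressed_kb / uncompressed_kb) * 100.0


# (WAV size in MB, MP3 size in KB, reported compression %) taken from the tables.
# Assumption: 1 MB = 1000 KB, which reproduces the reported values.
REPORTED = [
    (1.7, 79.3, 95.335),    # sine, 64 Kbps
    (1.7, 158.5, 90.676),   # sine, 128 Kbps
    (7.7, 359.9, 95.326),   # The Good, The Bad and The Ugly, 64 Kbps
    (10.1, 943.4, 90.659),  # A New Hope, 128 Kbps
]

for wav_mb, mp3_kb, reported in REPORTED:
    computed = compression_percent(mp3_kb, wav_mb * 1000.0)
    print(f"{mp3_kb:7.1f} KB of {wav_mb:5.1f} MB -> {computed:.3f} % (reported {reported} %)")
```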

OBSERVATIONS TEST III

Speech Test
The extract used is from John F. Kennedy's address at the Rice University Stadium on September 12, 1962, titled "The Decision to Go to the Moon".⁵ The objective of the experiment was to establish the maximum compression ratio achievable before the speech becomes incoherent.

  Format           Size       Compression    Observations
  16 Bit PCM WAV   19.7 MB    Uncompressed   - Perfectly clear
  64 Kbps MP3      916.0 KB   95.350 %       - Very usable and coherent
                                             - Audience noise slightly distorted [1:09]
  24 Kbps MP3      343.5 KB   98.256 %       - Coherent and usable
  16 Kbps MP3      229.0 KB   98.837 %       - Echo and phaser-like effects observed
                                             - Heavy distortion of both speech and crowd noise
  8 Kbps MP3       114.5 KB   -              - Incoherent

Comments: Even under severe compression (98.837 %), the speech is highly coherent; words can be clearly distinguished even at 16 Kbps. However, given the small difference in storage space and the large increase in quality, 24 Kbps is recommended.
Note: Due to algorithm limitations, the track was resampled at ~22 kHz for the 24 and 16 Kbps encodings.

Transcript of the extract used from the speech:
... theater of war. I do not say that we should or will go unprotected against the hostile misuse of space any more than we go unprotected against the hostile use of land or sea, but I do say that space can be explored and mastered without feeding the fires of war, without repeating the mistakes that man has made in extending his writ around this globe of ours. There is no strife, no prejudice, no national conflict in outer space as yet. Its hazards are hostile to us all. Its conquest deserves the best of all mankind, and its opportunity for peaceful cooperation may never come again. But why, some say, the moon? Why choose this as our goal? And they may well ask why climb the highest mountain? Why, 35 years ago, fly the Atlantic? Why does Rice play Texas? We choose to go to the moon. We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win, and the others, too. It is for these reasons that I regard the decision last year to shift our efforts in space ...
⁵ http://history.nasa.gov/moondec.html
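Because every MP3 in the speech test was encoded at a constant bitrate, its size should be roughly bitrate * duration / 8. The sketch below assumes 1 KB = 1000 bytes and ignores header and tag overhead; under those assumptions, all four reported sizes imply the same clip length of about 114.5 s. It also shows that moving from 16 to 24 Kbps costs only about 114.5 KB for this clip, consistent with the small difference in storage space cited in the comments. The function names are illustrative.

```python
def cbr_size_kb(bitrate_kbps, duration_s):
    """Approximate size of a constant-bitrate stream: kilobits/s * seconds / 8 bits per byte."""
    return bitrate_kbps * duration_s / 8.0


def implied_duration_s(size_kb, bitrate_kbps):
    """Invert cbr_size_kb: how many seconds of audio a file of this size holds."""
    return size_kb * 8.0 / bitrate_kbps


# (bitrate in Kbps, reported size in KB) from the speech table
SPEECH_FILES = [(64, 916.0), (24, 343.5), (16, 229.0), (8, 114.5)]

for kbps, size_kb in SPEECH_FILES:
    print(f"{kbps:2d} Kbps, {size_kb:6.1f} KB -> {implied_duration_s(size_kb, kbps):.1f} s")

# Extra storage needed to move from 16 to 24 Kbps for the same clip
print(cbr_size_kb(24, 114.5) - cbr_size_kb(16, 114.5), "KB")
```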


CONCLUSIONS, OBSERVATIONS AND RECOMMENDATIONS
From the above results, we conclude that most real-life audio tracks can be represented sufficiently well using MP3 compression at a bitrate of 128 Kbps for most everyday playback situations. For relatively high-quality storage, where space is not of primary concern, 256 Kbps is recommended; in these tests, 256 Kbps was found to be indistinguishable from uncompressed CD-quality audio. However, piano dynamics and nuances can only be wholly represented in uncompressed or lossless formats such as WAV and FLAC. Human speech was found to be perfectly coherent even at a severe 98.837 % compression (16 Kbps); however, a minimum of 24 Kbps is recommended.
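The recommendations can also be put in terms of storage. Uncompressed CD audio (44.1 kHz, 16-bit, stereo) runs at 1411.2 kbps, so a constant-bitrate MP3 saves a fixed fraction of space regardless of content. This is consistent with the Note in Test II that compression ratios barely differ between pure tones and music. The sketch below assumes CD-format audio as the uncompressed reference and ignores file headers, so its percentages (about 95.5, 90.9 and 81.9 % at 64, 128 and 256 Kbps) only approximate the measured ones.

```python
CD_RATE_HZ = 44100   # CD audio sample rate
CD_BITS = 16         # bits per sample
CD_CHANNELS = 2      # stereo

CD_BITRATE_KBPS = CD_RATE_HZ * CD_BITS * CD_CHANNELS / 1000.0  # 1411.2 kbps


def saving_vs_cd_percent(mp3_kbps):
    """Space saved by a constant-bitrate MP3 relative to uncompressed CD audio."""
    return (1.0 - mp3_kbps / CD_BITRATE_KBPS) * 100.0


def mb_per_hour(kbps):
    """Storage for one hour of audio at the given bitrate (1 MB = 1000 KB)."""
    return kbps * 3600 / 8 / 1000.0


for kbps in (64, 128, 256, CD_BITRATE_KBPS):
    print(f"{kbps:7.1f} kbps: {mb_per_hour(kbps):6.1f} MB/hour, "
          f"{saving_vs_cd_percent(kbps):5.2f} % smaller than CD audio")
```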


References
[1] Brain, M. (2001). How MP3 Files Work. HowStuffWorks.com.
[2] Square Wave Testing for Frequency Response of Amplifiers. http://www.kennethkuhn.com/students/ee351/text/square_wave_testing.pdf
[3] The Good, the Bad and the Ugly (soundtrack). http://en.wikipedia.org/wiki/The_Good,_the_Bad_and_the_Ugly_%28soundtrack%29
[4] A New Hope (soundtrack). http://en.wikipedia.org/wiki/Star_Wars_Episode_IV:_A_New_Hope_%28soundtrack%29
[5] The Piano (soundtrack). http://en.wikipedia.org/wiki/The_Piano_%28soundtrack%29
[6] Kennedy, J. F. (1962). The Decision to Go to the Moon.
[7] Free Lossless Audio Codec. http://www.flac.org/
