Lossy Audio Compression Fidelity

Introduction

Several years ago I began a project to automate the evaluation of the fidelity of lossy audio encoders. This turned out to be a much more complicated task than I anticipated. For each compression format or codec (COmpression and DECompression software: mp3, m4a, Ogg Vorbis, etc.) there are many encoders available (for m4a there are Nero, iTunes, and FAAC, to name a few).

The goal was to remove the hard work and subjectivity from audio listening tests such as those conducted at HydrogenAudio or SoundExpert. The task is difficult because the way we perceive sound depends on physiology and sensory perception, which makes a simple mathematical comparison of two bit streams a poor way to judge audio quality. A summary of these issues and the nature of perceived quality is in the Background Section.

The solution is to analyze the audio signals in a way analogous to the human ear and brain. My argument is that most of the elements of the psycho-acoustic models used by lossy audio compression can be duplicated by modeling the physical processes occurring in the human ear. By modeling the action of the stereocilia (on the hair cells) within the inner ear as damped harmonic oscillators, many of the features of psycho-acoustic models are reproduced. For example, the frequency dependence of time (temporal) masking is a natural consequence of the decay rates of the oscillators. Details of my analysis can be found in the Calibration Section.
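As a rough illustration of that last point, the sketch below drives a single damped harmonic oscillator with a tone burst at its resonant frequency and measures how long its energy takes to fall by 20 dB once the burst stops. It is written in Python purely for illustration and is not the code from the Software Section. The function name ring_down_time, the constant quality factor Q = 10, the 20 dB threshold, and the two test frequencies are all assumptions chosen for the example, not calibrated values.

```python
import math


def ring_down_time(f0, q=10.0, burst_periods=20, drop_db=20.0,
                   steps_per_period=400):
    """Seconds for the oscillator energy to fall by drop_db after a
    tone burst at the resonant frequency f0 (Hz) ends.

    Assumption: the quality factor q is the same at every frequency.
    """
    w0 = 2.0 * math.pi * f0
    gamma = w0 / q                      # damping rate of the oscillator
    dt = 1.0 / (f0 * steps_per_period)  # fixed number of steps per period

    def step(x, v, force):
        # Semi-implicit Euler step of x'' + gamma*x' + w0^2*x = force
        v += dt * (force - gamma * v - w0 * w0 * x)
        x += dt * v
        return x, v

    def energy(x, v):
        return 0.5 * v * v + 0.5 * w0 * w0 * x * x

    # Drive the oscillator with a sine burst until it is near steady state.
    x = v = 0.0
    for n in range(burst_periods * steps_per_period):
        x, v = step(x, v, w0 * w0 * math.sin(w0 * n * dt))

    # Switch the drive off and time the free decay.
    target = energy(x, v) * 10.0 ** (-drop_db / 10.0)
    n = 0
    while energy(x, v) > target:
        x, v = step(x, v, 0.0)
        n += 1
    return n * dt


for f0 in (250.0, 4000.0):
    print(f"{f0:6.0f} Hz: ring-down of {1000.0 * ring_down_time(f0):.2f} ms")
```

With a constant Q the energy decays as exp(-w0 t / Q), so the ring-down time scales as 1/f0: the 250 Hz oscillator rings about 16 times longer than the 4 kHz one. That is the qualitative frequency dependence described above. A real model would use calibrated damping for each frequency band.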

Using these results, I tested 12 encoders on 14 audio clips at VBR quality settings nominally corresponding to 128, 160, and 192 kbits/s. Note: these values are closer to lower limits for VBR encoders; the average bit rates are significantly higher.

For reference, lame -V 2 encoding (the old --preset standard) is often said to be transparent to most users and corresponds to 192 kbits/s. Some sources claim that Advanced Audio Coding (AAC) encoding at 160 kbits/s is transparent.

The results of this test are in the Multi-Format Test Section. The short answer is that four encoders were statistically indistinguishable from each other: the iTunes AAC encoder in constrained mode, the iTunes AAC encoder in unconstrained mode, Nero AAC, and the Winamp AAC encoder. The other 8 encoders tested were inferior. The two best encoders appeared to be iTunes in unconstrained mode and Nero, but ANOVA (ANalysis Of VAriance) showed no significant difference among the top four.
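For readers unfamiliar with ANOVA, the sketch below shows the basic idea in Python: it compares the spread between encoder means with the spread of scores within each encoder. It is a plain one-way ANOVA, which may differ from the design actually used in the Multi-Format Test Section (for instance, that analysis may account for the audio clip as a factor). The function name one_way_anova_f, the encoder labels, and every score in the example are hypothetical placeholders, not results from the test.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists,
    one list per encoder."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the encoder means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own encoder mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical fidelity scores (made up for illustration only).
scores = {
    "encoder_a": [4.1, 3.9, 4.3, 4.0],
    "encoder_b": [4.0, 4.2, 3.8, 4.1],
    "encoder_c": [3.9, 4.1, 4.2, 4.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A small F means the differences between encoder means are no larger than the scatter within each encoder, which is what "statistically indistinguishable" means here. To turn F into a p-value, compare it with the F distribution for the same two degrees of freedom.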

For those interested in conducting their own tests, the code, with instructions on how to compile it, can be found in the Software Section.