Audio Perception Background

Traditionally, these kinds of evaluations are done with double-blind listening tests. Anywhere from one to dozens of listeners compare different versions of a track to the original without knowing which codec produced each version. Special software and randomly assigned file names are required to run such a test. If a listener can tell the compressed version from the original with some statistical confidence (say, 8 correct identifications), they then enter a score from 1 to 5 rating the quality of the encoder.
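The "statistical confidence" mentioned above can be made concrete with a one-sided binomial test: how likely is it that a listener would get at least that many trials right by pure guessing? The sketch below is only an illustration. The function name abx_p_value and the choice of 10 trials are assumptions made for the example, not part of any listening-test standard or of the tools discussed here.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials`
    when every answer is a coin flip (guessing probability 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the upper tail of a Binomial(trials, 0.5) distribution.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    trials = 10  # assumed number of trials, chosen only for illustration
    for correct in (8, 9, 10):
        p = abx_p_value(correct, trials)
        print(f"{correct}/{trials} correct: p = {p:.4f}")
    # 8/10 correct: p = 0.0547
    # 9/10 correct: p = 0.0107
    # 10/10 correct: p = 0.0010
```

Under these assumptions, 8 correct answers out of 10 would happen by chance about 5.5% of the time, so whether "8 times correctly" counts as convincing depends heavily on how many trials were run.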

These methods go by several different names. ABX refers to comparing known samples A and B with an unknown sample X. ABC/hr refers to comparing A and B to C with a hidden reference. The plain term "listening test" is also used to describe the process.

The International Telecommunication Union (ITU) has defined two different systems for conducting these listening tests:

ITU-R BS.1116-1 (Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems)

ITU-R BS.1534-1 (Methods for the Subjective Assessment of Intermediate Quality Levels of Coding Systems), also known as MUSHRA (Multiple Stimuli with Hidden Reference and Anchor)

Even though bias and subjectivity are screened for by hiding the sources from the tester, the final score of 1 to 5 is still a purely subjective choice. Bias can also creep in if the listener is able to identify artifacts associated with a particular encoder.

My motivation, then, was really twofold:

1) Remove subjectivity from the assessment process by using a computer program to measure fidelity.
2) Save the time and effort required to assemble a group of testers and perform the listening tests according to one of the above protocols.