Audio Compression

For in-depth information on audio compression, refer to this Wikipedia page. The following is a summary of what can be found on that page.

Raw audio data is complex and rich. Because of this complexity, even short recordings of sound take up a lot of storage when saved in an uncompressed format (e.g. WAV). As sound (or audio) makes up such a big part of digital media today, many methods have evolved to compress it into a more usable form so it can be stored more efficiently and transmitted faster. Typical applications that use compressed audio are: digital radio, digital video (e.g. YouTube), DVDs and Blu-ray discs, and perhaps the best known of all, personal music players (e.g. iPod).

Raw audio data can be large in size. For example, a one-minute recording of classical music using Audacity results in a folder containing 20 MB of project files. See image below.

The folder is large because Audacity samples the audio stream at a rate of 44,100 Hertz (44,100 times a second) and stores each sample point using 4 bytes (32 bits) of memory. See images below.
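The arithmetic behind that folder size can be sketched as below. This is only an illustration: the function name is made up for this example, and the stereo case assumes a two-channel recording, which the text above does not state.

```python
def raw_audio_bytes(sample_rate_hz, bytes_per_sample, channels, seconds):
    """Size in bytes of uncompressed audio, ignoring any file headers."""
    return sample_rate_hz * bytes_per_sample * channels * seconds

# One minute at 44,100 Hz with 32-bit (4-byte) samples.
mono = raw_audio_bytes(44_100, 4, 1, 60)
stereo = raw_audio_bytes(44_100, 4, 2, 60)  # assumption: two channels

print(f"mono:   {mono / 1_000_000:.1f} MB")    # mono:   10.6 MB
print(f"stereo: {stereo / 1_000_000:.1f} MB")  # stereo: 21.2 MB
```

Under the stereo assumption the result is close to the 20 MB of project files mentioned earlier.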

The sound wave as recorded by Audacity is shown in the images below at two levels of magnification. In the right-hand zoomed image sample points are clearly visible.

Normal zoom	Maxi zoom

For more information on Audacity sound formats click on this link: Audacity help manual - the basics

There are two types of audio compression: lossless and lossy. The screenshots below show the Audacity export settings for the WAV, OGG and MP3 formats. As can be seen: quality settings for OGG can be selected using a slider; different WAV formats can be chosen (the default is Microsoft 16-bit PCM on Windows computers); and the playback bit rate for MP3s can be chosen, with 128 kbps (kilobits per second) as the default, a rate often described as near CD quality. For example, an MP3 file of size 3 MB (roughly 3,000,000 × 8 = 24,000,000 bits) encoded at 128 kbps will take 24,000,000 / 128,000 seconds to play. This is 24,000 / 128 = 187.5 seconds, or just over 3 minutes.
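The playing-time calculation above can be written as a small helper. The function name is hypothetical, and the sketch assumes a constant bit rate and ignores any metadata stored in the file.

```python
def playback_seconds(file_size_bytes, bit_rate_bps):
    """Playing time of a constant-bit-rate file: total bits / bits per second."""
    return file_size_bytes * 8 / bit_rate_bps

seconds = playback_seconds(3_000_000, 128_000)
minutes, remainder = divmod(seconds, 60)
print(f"{seconds} s = {int(minutes)} min {remainder:.1f} s")  # 187.5 s = 3 min 7.5 s
```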

OGG	WAV

MP3

For more information on audio file formats click on this link: audio file format

Lossless audio compression

Lossless compression systems are able to reduce the file size of raw recorded audio data without losing any of it in the process. This means the decompressed version of the audio is identical to the 'before compression' version.
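The "nothing is lost" property can be checked directly. The sketch below uses Python's general-purpose zlib compressor as a stand-in, since it is in the standard library; it is not an audio codec such as FLAC, and the sample values are invented for the example.

```python
import struct
import zlib

# A handful of made-up 16-bit sample values.
samples = [0, 1200, 2350, 3100, 2350, 1200, 0, -1200, -2350, -3100]

# Pack the samples as little-endian signed 16-bit integers, as a PCM WAV would.
raw = struct.pack(f"<{len(samples)}h", *samples)

compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)
decoded_samples = list(struct.unpack(f"<{len(samples)}h", restored))

# Lossless means the round trip gives back exactly the same bytes.
print(restored == raw)  # True
```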

Sound waveforms are complex by nature, and this means lossless compression systems can typically reduce files only to around 50-60% of their original size. There are two reasons for this:

General-purpose compression methods look for patterns and repetition, but audio streams are chaotic in nature.
Values of audio samples change quickly, so long runs of identical consecutive bytes rarely occur.
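A rough way to see both points is to feed a general-purpose compressor some audio-like data and some highly repetitive data of the same length. The sketch below is an illustration only: the "audio" is a synthetic 440 Hz tone with added random noise rather than a real recording, and zlib stands in for "normal" compression methods.

```python
import math
import random
import struct
import zlib

def compressed_fraction(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)

random.seed(0)  # fixed seed so the run is repeatable
sample_count = 44_100  # one second at 44,100 Hz

# Synthetic stand-in for audio: a tone plus noise, as 16-bit samples.
audio_like = struct.pack(
    f"<{sample_count}h",
    *(
        int(10_000 * math.sin(2 * math.pi * 440 * t / 44_100) + random.gauss(0, 1_000))
        for t in range(sample_count)
    ),
)

# Same number of bytes, but an obvious repeating pattern.
repetitive = b"abc" * (len(audio_like) // 3)

print(f"audio-like: {compressed_fraction(audio_like):.0%} of original size")
print(f"repetitive: {compressed_fraction(repetitive):.0%} of original size")
```

The repetitive data shrinks to almost nothing, while the audio-like data stays well over half its original size. Dedicated lossless audio codecs do better than this on real recordings because they model the waveform rather than searching for repeated byte strings.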

Lossless files are large by nature, so they are best suited for specialist purposes such as: archiving, editing (audio engineers), high quality playback, and the creation of master copies for reproduction.

Common lossless formats are "Free Lossless Audio Codec (FLAC - see middle image above), Apple's Apple Lossless, MPEG-4 ALS, Windows Media Audio 9 Lossless (WMA Lossless)". Wikipedia - lossless formats. For a full list of formats go to lossless codecs.

Note: the word codec is short for coder/decoder (sometimes expanded as compressor/decompressor) and refers to the method or standard used to compress and decompress data.

Lossy audio compression

This form of compression is much more common than lossless compression. Well-known formats are MP3 and AAC. For a comprehensive comparison of formats click on the following link: comparison of audio formats

Depending on the quality setting selected, an uncompressed WAV file can be converted to an MP3 that is 10% or less of the original size. Below is a screenshot of the same file saved in four different formats using the (now free) Spider Player Professional music player. The player can be downloaded from http://spider-player.com/. As can be seen, the uncompressed WAV file is the largest and the compressed WMA file is the smallest. Note that default settings were used for the conversion, and lower (and higher) figures could have been obtained.
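The "10% or less" figure can be checked by comparing bit rates. The sketch below assumes a 16-bit stereo WAV sampled at 44,100 Hz and an MP3 encoded at 128 kbps; other settings give other ratios.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second needed for uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

wav_bps = pcm_bit_rate(44_100, 16, 2)  # assumption: 16-bit stereo WAV
mp3_bps = 128_000                      # assumption: 128 kbps MP3

fraction = mp3_bps / wav_bps
print(f"WAV: {wav_bps:,} bits per second")  # WAV: 1,411,200 bits per second
print(f"MP3 is {fraction:.1%} of the WAV size")  # MP3 is 9.1% of the WAV size
```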

How lossy audio compression works
Not all audio data/sounds can be detected by the human ear. Those that cannot be heard may be safely removed. In addition, some sounds can be masked by louder ones. This means the softer sounds can be removed without their loss being detectable. Music audio streams are complex in nature when compared to ones featuring the human voice. Because of the lower level of complexity, human voice recordings can be compressed more than music audio. Also, very high-frequency sounds are hard to hear, so they can be removed too.
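The idea of discarding content the listener is unlikely to miss can be illustrated with a toy example. The sketch below is not how MP3 or AAC work internally: real codecs use a psychoacoustic model to decide what to drop, whereas this example uses a plain discrete Fourier transform and a fixed cutoff. The signal, the cutoff and all function names are made up for the illustration.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def inverse_dft(bins):
    """Inverse transform, returning real-valued samples."""
    n = len(bins)
    return [
        sum(bins[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def drop_high_frequencies(samples, keep_bins):
    """Zero every frequency component above keep_bins, then rebuild the signal."""
    n = len(samples)
    bins = dft(samples)
    for k in range(n):
        # For real signals, bin k and bin n - k describe the same frequency.
        if min(k, n - k) > keep_bins:
            bins[k] = 0
    return inverse_dft(bins)

N = 64
low_tone = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
quiet_high_tone = [0.05 * math.sin(2 * math.pi * 25 * t / N) for t in range(N)]
original = [a + b for a, b in zip(low_tone, quiet_high_tone)]

decoded = drop_high_frequencies(original, keep_bins=10)
max_error = max(abs(a - b) for a, b in zip(original, decoded))
print(f"largest difference from the original: {max_error:.3f}")
```

Only the components at or below the cutoff need to be stored, so there is less data to keep. The decoded signal is no longer identical to the original, but the difference is limited to the quiet high tone that was thrown away.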

In all compression schemes, parameters can be adjusted for best sound quality or, conversely, smallest file size - see image below. Note that with any lossy method there is a loss in the quality of the reproduced sound. However, the advantages of smaller file sizes generally outweigh any sound quality loss.

Programs to use to learn about audio compression

Audacity - an excellent open source (and multi-platform) program that allows for audio track creation, editing and inspection; the manual gives good background information; information on the LAME MP3 encoder can be found at the bottom of this page; note version 1.3 of Audacity has significantly increased ability and is well worth experimenting with.
Spider Player Professional - an excellent free audio player that allows the recording and ripping of audio, and conversion between many audio formats.
