Data Compression

Introduction

It could be said that the main job of a computer is to make sense of and organise data so it can be understood by the user. The data has to be organised and stored in an efficient manner. Today's desktop and laptop computers have relatively fast processors and large amounts of RAM and hard disk space compared to the computers of ten years ago, so it could be thought that the amount of space data takes up is not an issue. Following are some of the reasons it still is:
  • New technologies call for greater amounts of storage, e.g. DVDs (4.7GB) and Blu-ray discs (25-50GB) compared to CDs (700MB).
  • New technologies call for more compact storage due to low system specifications, e.g. smart phones, which have limited amounts of memory, relatively slow processors and limited battery capacity.
  • More data is being used (and transmitted) by applications, e.g. media-rich web pages, which can contain Flash animations/videos and weather, news and advertisement feeds.
  • Data is being transmitted over costly media, e.g. mobile phone broadband networks.
Below is an image of some of the specifications for a popular smart phone, the IDEOS U8150. Its internal memory and processor speed have been highlighted to show their low values.

Compression and Decompression

There will always be a need to store and transmit data in the most efficient form, no matter how much memory computing devices have. For a long time the answer has been data compression. To put it plainly, data compression involves taking a certain amount of raw data (for example a sound file) and processing it so that it can be stored in a much more compact form. For instance, the size of an uncompressed WAV sound file may be 40MB, whereas the compressed MP3 version could be 4MB, a compression factor of 10 (40/4=10). The processing is done by computer applications that use specific compression algorithms. There are many of these, each with its own strengths, weaknesses and typical areas of use.
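The compression factor calculation above can be tried out with a short Python sketch. It uses the standard library's zlib module as a stand-in compression algorithm (not the MP3 method from the example), and the sample text and the helper name compression_factor are made up for illustration.

```python
import zlib

# Hypothetical sample data: repetitive text, which compresses well.
raw = b"It was the best of times, it was the worst of times. " * 200

# Compress with zlib at its highest compression level (9).
compressed = zlib.compress(raw, 9)


def compression_factor(original_size, compressed_size):
    """Return the original size divided by the compressed size."""
    return original_size / compressed_size


print(len(raw), "bytes uncompressed")
print(len(compressed), "bytes compressed")
print("factor:", round(compression_factor(len(raw), len(compressed)), 1))

# The WAV/MP3 figures from the text: 40MB down to 4MB.
print("WAV to MP3 factor:", compression_factor(40, 4))
```

The factor achieved depends heavily on the data: highly repetitive input like this sample compresses far better than typical files would.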

Compressed data must be decompressed before it can be used. The decompression process is usually a reversal of the compression process. For example, JPEG images are commonly found on web pages. JPEG is a compression format, so before an image can be shown (or rendered) on a page the browser must decompress it into a bitmap image. Fortunately the JPEG format is a universal standard and is well understood by all browsers.
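As a rough illustration of decompression reversing compression, the sketch below compresses some bytes and then restores them. It assumes Python's zlib module rather than JPEG, and the sample data is invented.

```python
import zlib

# Hypothetical data standing in for any file that needs storing or sending.
original = b"Compressed data must be decompressed before it can be used. " * 50

compressed = zlib.compress(original)    # compact form, not directly usable
restored = zlib.decompress(compressed)  # reverse the process before use

print("original:  ", len(original), "bytes")
print("compressed:", len(compressed), "bytes")
print("restored:  ", len(restored), "bytes")
print("identical after round trip:", restored == original)
```

With a method like this one the restored data matches the original byte for byte.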

It is not commonly known that web servers and web browsers routinely compress and decompress the data traffic that flows between them to speed up web page loading. The images below show a header sent by the Firefox browser requesting a specific page and the Apache server's response header. Notice the compression methods each is using or can potentially use. The browser is telling the server it can use the gzip and deflate methods, and the server is saying it can use gzip.

This web page allows you to test a URL to see what, if any, compression method is used: GIDZipTest

Browser request header
Server response header
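The exchange between browser and server can be sketched in Python without any network traffic. This is a simplified model, not how Firefox or Apache actually implement it: the function choose_encoding, the sample page body and the plain dictionaries used for headers are all hypothetical, and real servers also take into account details (such as preference weights) that are ignored here. Accept-Encoding and Content-Encoding are the standard HTTP header names for this negotiation.

```python
import gzip


def choose_encoding(accept_encoding, server_supported=("gzip",)):
    """Pick the first method offered by the browser that the server supports."""
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    for method in offered:
        if method in server_supported:
            return method
    return None  # no common method: send the body uncompressed


# Browser side: the request says which methods it can decompress.
request_headers = {"Accept-Encoding": "gzip, deflate"}

# Server side: compress the page if a common method exists.
body = b"<html><body>" + b"<p>Hello, compression!</p>" * 100 + b"</body></html>"
method = choose_encoding(request_headers["Accept-Encoding"])
response_headers = {}
payload = body
if method == "gzip":
    payload = gzip.compress(body)
    response_headers["Content-Encoding"] = "gzip"

# Browser side again: undo whatever the response header says was applied.
if response_headers.get("Content-Encoding") == "gzip":
    received = gzip.decompress(payload)
else:
    received = payload

print(response_headers)
print(len(body), "bytes of page,", len(payload), "bytes sent")
print("page intact:", received == body)
```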

Also not commonly known is that Morse code is a form of data compression. Notice in the image below that letters likely to appear more often in communications, for example e, i and a, have shorter dot-dash sequences (or codes). If all letters had the same number of dots and dashes, communicating would be slower.
Image source: http://upload.wikimedia.org/wikipedia/commons/b/b5/International_Morse_Code.svg
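The effect of giving common letters shorter codes can be checked with a small Python sketch. It uses a ten-letter subset of International Morse Code and a made-up message, counts only dots and dashes (ignoring the pauses that separate letters in real Morse), and compares the result with a hypothetical scheme in which every letter uses four symbols, the length of the longest letter codes.

```python
# A small subset of International Morse Code (dot = ".", dash = "-").
MORSE = {
    "e": ".", "t": "-", "i": "..", "a": ".-", "n": "-.",
    "s": "...", "o": "---", "q": "--.-", "z": "--..", "j": ".---",
}

# Hypothetical message made only of letters in the subset above.
message = "tennisseason"

# Variable-length coding: common letters cost fewer symbols.
variable_total = sum(len(MORSE[letter]) for letter in message)

# Fixed-length alternative (an assumption for comparison): 4 symbols per letter.
FIXED_LENGTH = 4
fixed_total = FIXED_LENGTH * len(message)

print(" ".join(MORSE[letter] for letter in message))
print("variable-length symbols:", variable_total)
print("fixed-length symbols:   ", fixed_total)
```

For this message the variable-length codes need 25 symbols against 48 for the fixed-length scheme, because the message is made up mostly of frequent letters.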



The pages in this section of the website contain information on compression as it relates to images, text, sound (or audio) and video.

Additional resource

The following web page contains some interesting information on data compression: introduction to data compression