Floating Point Numbers

Introduction

Floating point numbers are numbers that contain a fractional part, i.e. they contain a decimal point with digits after it. They are called floating point because the point can 'float', or move, when the number is expressed in scientific notation. For example, 123.456 and 0.4546 can be expressed as 1.23456 x 10^2 and 4.546 x 10^-1. In the first number the point has floated two places to the left, and in the second it has floated one place to the right.

Terminology

Points to note for the numbers 1.23456 x 10^2 and 4.546 x 10^-1:
  • Base ten is used in both numbers
  • 1.23456 and 4.546 are the mantissas of the respective numbers
  • 2 and -1 (from 10^2 and 10^-1) are the powers, or exponents, of the numbers
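
As a rough illustration of these terms, the short Python sketch below splits a base ten number into its mantissa and exponent. The helper name to_scientific is made up for this example, and it uses the standard decimal module so that the digits stay exact.

```python
from decimal import Decimal

def to_scientific(text):
    """Split a decimal string into (mantissa, exponent) in base ten.

    to_scientific is a hypothetical helper written for this illustration.
    """
    number = Decimal(text)
    if number == 0:
        raise ValueError("zero has no normalized mantissa")
    # adjusted() gives the power of ten of the most significant digit.
    exponent = number.adjusted()
    # scaleb() shifts the point without changing the digits.
    mantissa = number.scaleb(-exponent)
    return mantissa, exponent

for text in ("123.456", "0.4546"):
    mantissa, exponent = to_scientific(text)
    print(f"{text} = {mantissa} x 10^{exponent}")
# 123.456 = 1.23456 x 10^2
# 0.4546 = 4.546 x 10^-1
```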

Floating point binary numbers

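Fixed point binary fractions are limited in precision and flexibility: because the position of the point is fixed, the same number of bits is always given to the fractional part, whether the value being stored is very large or very small.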

An international standard, IEEE 754, defines how a floating point binary number is stored. There are two main forms: one uses 32 bits (single precision) to store a number, and the other uses 64 bits (double precision) to store it with greater precision.

The 32 bit form is shown in the graphic below.


23 bits are set aside to store the mantissa, eight for the power (of 2) and one for the sign (0 => positive number, 1 => negative number).
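
The three fields can be inspected with a small Python sketch. It assumes the usual IEEE 754 layout (sign bit first, then the eight power bits, then the 23 mantissa bits) and uses the standard struct module. The function name float32_fields is invented for this example. Note that the power is stored with 127 added to it, a detail of the standard not covered above.

```python
import struct

def float32_fields(value):
    """Return (sign, exponent_bits, mantissa_bits) for value stored in 32 bits.

    float32_fields is a hypothetical helper written for this illustration.
    """
    # Pack as a 32 bit float, then reread the same four bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                   # 1 bit
    exponent_bits = (bits >> 23) & 0xFF  # 8 bits, stored with 127 added
    mantissa_bits = bits & 0x7FFFFF      # 23 bits
    return sign, exponent_bits, mantissa_bits

sign, exponent_bits, mantissa_bits = float32_fields(-6.5)
print(sign, f"{exponent_bits:08b}", f"{mantissa_bits:023b}")
# 1 10000001 10100000000000000000000
```

Here -6.5 is -1.101 x 2^2 in binary: the sign bit is 1, the stored power is 129 (2 + 127), and the mantissa field holds the digits 101 after the point.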

Precision

The mantissa includes the digit to the left of the binary point as well as those to its right. These digits are ones or zeros, and for a normalized number the digit to the left is always 1, because if it were not, the point would be moved until it was. Since this leading 1 is always present it does not need to be stored, so the 32 bit representation is said to have 24 bit precision (the 23 bits set aside plus 1 for the digit to the left of the point: 23 + 1 = 24).
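
One way to see the 24 bit limit is to round whole numbers through the 32 bit form. In the sketch below, to_float32 is a made-up helper that uses the standard struct module to store a value in 32 bits and read it back. Whole numbers up to 2^24 survive the round trip, but 2^24 + 1 needs 25 significant bits and does not.

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest 32 bit float and read it back.

    to_float32 is a hypothetical helper written for this illustration.
    """
    return struct.unpack(">f", struct.pack(">f", value))[0]

print(to_float32(2.0**24) == 2**24)          # True: fits in 24 bits
print(to_float32(2.0**24 + 1) == 2**24 + 1)  # False: needs 25 bits
print(to_float32(2.0**24 + 1))               # 16777216.0
```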

A 64 bit representation has 53 bit precision (52 stored bits plus the implied leading 1). For more information visit this Wikipedia page
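
The 53 bit figure can be checked from Python, assuming (as is the case on virtually all current platforms) that Python's float type is an IEEE 754 double precision number. This is only a quick sketch using the standard sys module.

```python
import sys

# Number of binary digits of precision in a Python float.
print(sys.float_info.mant_dig)        # 53 on IEEE 754 platforms

# 2^53 fits in 53 bits, but 2^53 + 1 would need 54 bits.
print(float(2**53) == 2**53)          # True
print(float(2**53 + 1) == 2**53 + 1)  # False: rounded to 2^53
```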

Mathematical operations can be done on floating point binary numbers.
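
A brief Python sketch of such operations follows, again assuming Python floats are 64 bit IEEE 754 numbers. Values that are exact binary fractions, such as 0.5 and 0.25, add exactly. Values such as 0.1 and 0.2 have no exact binary form, so results involving them are better compared with a tolerance, for example with math.isclose.

```python
import math

# 0.5 and 0.25 are exact binary fractions (0.1 and 0.01 in binary).
print(0.5 + 0.25 == 0.75)            # True

# 0.1 and 0.2 are stored as nearby binary fractions, so the sum is slightly off.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```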

The following page gives more information on floating point numbers and contains some useful exercises: IEEE 754 floating point representation


It is possible to write binary fractions in the mantissa, base, power (exponent) form. For example: 111.111 => 1.11111 x 2^2 and 000.00111 => 1.11 x 2^-3
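
The same conversion can be sketched in Python. The function normalize_binary is a name invented for this example. It works purely on the string of binary digits: it finds the first 1, counts how far the point has to move to sit just after it, and drops the zeros that are no longer needed.

```python
def normalize_binary(text):
    """Return (mantissa, exponent) so that text equals mantissa x 2^exponent.

    normalize_binary is a hypothetical helper written for this illustration.
    text is a binary fraction such as "111.111".
    """
    whole, _, fraction = text.partition(".")
    digits = whole + fraction
    first_one = digits.find("1")
    if first_one == -1:
        raise ValueError("zero has no normalized form")
    # Number of places the point moves: positive means it moved left.
    exponent = len(whole) - 1 - first_one
    significant = digits[first_one:].rstrip("0")
    mantissa = significant[0] + "." + (significant[1:] or "0")
    return mantissa, exponent

for text in ("111.111", "000.00111"):
    mantissa, exponent = normalize_binary(text)
    print(f"{text} => {mantissa} x 2^{exponent}")
# 111.111 => 1.11111 x 2^2
# 000.00111 => 1.11 x 2^-3
```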