Floating Point Numbers

Numbers that contain a point can be categorized as fixed point numbers and floating point numbers.

1)Fixed Point Numbers:

A number in which the position of the decimal point is fixed is referred to as a fixed point number.

Ex: numbers that always have exactly two digits after the decimal point:

7.32

6.49

1.56

These numbers can be stored in the computer without any problem. Since the position of the point is fixed and predetermined, the dot itself does not need to be stored.
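A minimal Python sketch of this idea, assuming non-negative values with exactly two digits after the point. The names `store_fixed`, `load_fixed` and `SCALE_DIGITS` are illustrative, not from any library. The dot is dropped when storing, and the known scale puts it back when reading.

```python
SCALE_DIGITS = 2  # assumption: always exactly two digits after the point


def store_fixed(text: str) -> int:
    """Drop the dot: '7.32' is stored as the plain integer 732."""
    whole, frac = text.split(".")
    if len(frac) != SCALE_DIGITS or text.startswith("-"):
        raise ValueError("expects a non-negative value with two fractional digits")
    return int(whole + frac)


def load_fixed(stored: int) -> str:
    """Put the dot back at its predetermined position."""
    whole, frac = divmod(stored, 10 ** SCALE_DIGITS)
    return f"{whole}.{frac:0{SCALE_DIGITS}d}"


for value in ["7.32", "6.49", "1.56"]:
    print(value, "->", store_fixed(value), "->", load_fixed(store_fixed(value)))
# first line printed: 7.32 -> 732 -> 7.32
```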

2)Floating Point Numbers:

Floating point numbers are numbers in which the decimal point can float, i.e. appear at different positions.

Ex:

73.2

7.32

7.124

Here the point is in a different position in each number. Such numbers are difficult to store because a dot cannot be stored in a register.

Ex: 73.2

The digits 7, 3 and 2 can each be stored in a register, but registers are made of flip-flops, and a flip-flop can only hold a 1 or a 0, so there is no way to store the dot itself.
Hence the dot is not stored; instead the number is converted into a standard form in which the position of the point is fixed.
This conversion of a floating point number into a form with a fixed point is known as “Normalisation”.

Normalization of Floating point Numbers

In normalized form there is exactly one non-zero digit to the left of the decimal point.
For decimal numbers the normalized form is written with a power of 10, and for binary numbers with a power of 2.

Floating point (decimal number)	Normalized number

73.2	7.32 x 10¹

649	6.49 x 10²

.847	8.47 x 10⁻¹

5.63	5.63 x 10⁰

Ex: in 5.63 x 10⁰, the exponent (E) is 0 and the mantissa (M) is 563.

Mantissa (M): the digits of the normalized number, written without the decimal point.

Exponent (E): the power of 10 for decimal numbers, or the power of 2 for binary numbers.
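The decimal normalization in the table above can be sketched with Python's standard `decimal` module. `Decimal.adjusted()` gives the exponent of the most significant digit, and `Decimal.scaleb()` moves the point by that many places. The function name `normalize_decimal` is hypothetical.

```python
from decimal import Decimal


def normalize_decimal(text: str) -> tuple[Decimal, int]:
    """Return (significand, exponent) with one non-zero digit before the point."""
    d = Decimal(text)
    if d == 0:
        raise ValueError("zero has no normalized form")
    exponent = d.adjusted()          # position of the most significant digit
    return d.scaleb(-exponent), exponent


for text in ["73.2", "649", ".847", "5.63"]:
    significand, exponent = normalize_decimal(text)
    print(f"{text} = {significand} x 10^{exponent}")
# 73.2 = 7.32 x 10^1
# 649 = 6.49 x 10^2
# .847 = 8.47 x 10^-1
# 5.63 = 5.63 x 10^0
```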

Floating point (binary number)	Normalized number

0101.001	1.01001 x 2²

11111.01	1.111101 x 2⁴

-10.01	-1.001 x 2¹

Ex: in -1.001 x 2¹ (the normalized form of -10.01), the exponent (E) is 1 and the mantissa (M) is 001.

In general, a normalized binary number has the form

(-1)^S x 1.M x 2^E → Normalised form

where S is the sign bit: 0 for positive and 1 for negative.
Because normalized binary numbers are always in the 1.M format, the leading ‘1’ is not actually stored; it is assumed. This saves 1 bit of storage for each number.
The exponent is stored in biased form: a fixed bias value is added to it so that negative exponents can be represented without a separate sign.
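A minimal sketch of binary normalization into S, E and M, assuming the input is an ordinary Python float. It uses the standard `math.frexp`, which returns |x| = m x 2^e with 0.5 ≤ m < 1, and rescales that to the 1.M form. `normalize_binary` and `max_bits` are hypothetical names; M is cut off after `max_bits` bits (truncated, not rounded).

```python
import math


def normalize_binary(x: float, max_bits: int = 23) -> tuple[int, int, str]:
    """Return (S, E, M) with x == (-1)**S * 1.M * 2**E, M as a bit string."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("zero, infinity and NaN have no 1.M form")
    s = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))     # abs(x) == m * 2**e with 0.5 <= m < 1
    exp = e - 1                   # rewrite as (2m) * 2**(e-1), where 1 <= 2m < 2
    frac = 2 * m - 1              # the part after the hidden leading 1
    bits = ""
    while frac and len(bits) < max_bits:
        frac *= 2                 # next binary digit moves left of the point
        bit = int(frac)
        bits += str(bit)
        frac -= bit
    return s, exp, bits


print(normalize_binary(-2.25))    # (1, 1, '001')      -10.01  = -1.001 x 2^1
print(normalize_binary(5.125))    # (0, 2, '01001')    101.001 = 1.01001 x 2^2
print(normalize_binary(31.25))    # (0, 4, '111101')   11111.01 = 1.111101 x 2^4
```

With a bias of 127, the stored exponent for -2.25 would be 1 + 127 = 128.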

Advantages of Normalisation:

Storing all numbers in one standard form makes calculations easier and faster.
Not storing the leading 1 (of the 1.M format) saves one bit of storage per number.
The exponent is biased, so there is no need to store a sign bit for it.

Representation of floating point numbers:

Floating point numbers can be represented in two formats:

1)Single precision format

2)Double precision format

1)Single precision format/Short Real/ IEEE 754 : 32 Bit Format

S	Biased Exponent	Mantissa
(1)	(8) Bias value=127	(23 bits)

32 bits are used to store the numbers.
23 bits are used for Mantissa.
8 bits are used for the Biased Exponent.
1 bit is used for the sign of the number.
The Bias value is (127)₁₀
The range of magnitudes for normalized numbers is approximately 1.2 x 10⁻³⁸ to 3.4 x 10³⁸.
It is called a single precision format for floating point numbers.
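To see the 1 | 8 | 23 layout on a real machine, the standard `struct` module can pack a value as an IEEE 754 single (format `">f"`) and read the same 4 bytes back as an unsigned integer (`">I"`). The helper name `float32_fields` is hypothetical; the shifts and masks follow the field widths listed above.

```python
import struct

BIAS32 = 127


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into sign, biased exponent, mantissa."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # the 32 raw bits
    sign = word >> 31                    # 1 bit
    biased_exp = (word >> 23) & 0xFF     # next 8 bits
    mantissa = word & 0x7FFFFF           # low 23 bits (hidden 1 not stored)
    return sign, biased_exp, mantissa


s, be, m = float32_fields(14.125)
print(s, be, be - BIAS32, format(m, "023b"))
# 0 130 3 11000100000000000000000
```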

2)Double precision format/Long Real/ IEEE 754 : 64 Bit Format

S	Biased Exponent	Mantissa
(1)	(11) Bias value=1023	(52 bits)

64 bits are used to store the numbers.
52 bits are used for Mantissa.
11 bits are used for the Biased Exponent.
1 bit is used for the sign of the number.
The Bias value is (1023)₁₀
The range of magnitudes for normalized numbers is approximately 2.2 x 10⁻³⁰⁸ to 1.8 x 10³⁰⁸.
It is called the double precision format for floating point numbers.
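Python's own `float` is an IEEE 754 double on common platforms, so the 1 | 11 | 52 fields can be pulled out with `struct` (`">d"` / `">Q"`) and the value rebuilt from (-1)^S x 1.M x 2^(BE - 1023). The names `float64_fields` and `float64_value` are hypothetical, and the rebuild handles normalized numbers only.

```python
import struct

BIAS64 = 1023


def float64_fields(x: float) -> tuple[int, int, int]:
    """Split a Python float (an IEEE 754 double) into its three fields."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    return word >> 63, (word >> 52) & 0x7FF, word & ((1 << 52) - 1)


def float64_value(sign: int, biased_exp: int, mantissa: int) -> float:
    """Rebuild (-1)^S x 1.M x 2^(BE - 1023); normalized numbers only."""
    return (-1) ** sign * (1 + mantissa / 2**52) * 2.0 ** (biased_exp - BIAS64)


fields = float64_fields(-2.25)
print(fields)                   # (1, 1024, 562949953421312)
print(float64_value(*fields))   # -2.25
```

The `1 +` in `float64_value` is the hidden bit that the format does not store.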

Adding floating point numbers:

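1)Before adding, compare the exponents.

2)If the exponents are positive (as biased exponents always are), they can be compared directly by their values.

3)If the exponents are different, shift the mantissa of the number with the smaller exponent to the right until the exponents are equal, and then add the mantissas.

4)Normalize the result if needed.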
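A toy sketch of exponent alignment, assuming positive operands only and a tiny 4-bit fraction so the numbers stay readable. Each value is an integer `m` encoding 1.M (so 16 ≤ m < 32) plus an exponent `e`. `add_aligned`, `to_float` and `FRAC_BITS` are hypothetical names. Bits shifted out are simply dropped, whereas real hardware keeps extra guard bits and rounds. The sketch finishes by renormalizing the sum back to 1.M form.

```python
FRAC_BITS = 4  # assumption: 1.M with a 4-bit M, stored as an integer m


def add_aligned(m1: int, e1: int, m2: int, e2: int) -> tuple[int, int]:
    """Add two positive values m * 2**(e - FRAC_BITS), each m in 1.M form."""
    # Compare exponents; make (m1, e1) the operand with the larger one.
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    # Shift the smaller operand's mantissa right until the exponents match.
    m2 >>= e1 - e2
    m, e = m1 + m2, e1
    # Renormalize: a sum of the form 1x.xxxx needs one right shift.
    if m >= 1 << (FRAC_BITS + 1):
        m >>= 1
        e += 1
    return m, e


def to_float(m: int, e: int) -> float:
    return m / (1 << FRAC_BITS) * 2.0 ** e


# 3.0 = 1.1000 x 2^1 and 0.5 = 1.0000 x 2^-1
print(to_float(*add_aligned(0b11000, 1, 0b10000, -1)))   # 3.5
```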

Representation of floating point numbers: examples

Steps for representation of floating point numbers:

1)Convert into binary number

2)Normalize the numbers.

3)Calculate the biased exponent.

4)Convert biased exponent into binary.

5)Substitute in the required format.
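The five steps above can be sketched as one Python function for the single precision format. This is an illustration under stated assumptions: it handles finite, non-zero, normalized values only, uses the standard `math.frexp` to do steps 1 and 2, and truncates the mantissa to 23 bits instead of rounding it as real hardware does. The name `to_single` is hypothetical.

```python
import math

BIAS = 127  # single precision bias


def to_single(x: float) -> str:
    """Return 'S BE M' for x laid out in the 32-bit single precision format."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("only finite, non-zero values are handled")
    sign = 1 if x < 0 else 0
    # Steps 1-2: binary conversion + normalization, |x| = 1.M x 2^E.
    m, e = math.frexp(abs(x))       # |x| = m * 2**e with 0.5 <= m < 1
    true_exp = e - 1                # rewrite as (2m) * 2**(e-1), 1 <= 2m < 2
    fraction = 2 * m - 1            # the .M part after the hidden 1
    # Step 3: biased exponent = true value + bias.
    biased = true_exp + BIAS
    if not 1 <= biased <= 254:
        raise ValueError("outside the normalized single precision range")
    # Step 4: biased exponent as 8 binary digits.
    be_bits = format(biased, "08b")
    # 23 mantissa bits with the hidden 1 dropped; int() truncates extra bits.
    mantissa = int(fraction * 2**23)
    # Step 5: substitute into the S | BE | M format.
    return f"{sign} {be_bits} {mantissa:023b}"


print(to_single(14.125))   # 0 10000010 11000100000000000000000
print(to_single(0x2A3B))   # 0 10001100 01010001110110000000000
```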

Ex: 14.125

Step 1:

Converting into binary

14 = (1110)₂ and 0.125 = (0.001)₂, so 14.125 = (1110.001)₂

Step 2:

Normalize the numbers.

(-1)⁰ x 1.110001 x 2³

Step 3:

Calculate the biased exponent.

Biased Exponent (BE) = True value + Bias

= 3 + 127

= 130

Step 4:

Convert biased exponent into binary.

BE = (10000010)₂

Step 5:

Substitute in the required format.

S	Biased Exponent	Mantissa
0	10000010	11000100000000000000000
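The single precision bit pattern for 14.125 can be cross-checked with Python's standard `struct` module, which packs a value in IEEE 754 binary32 layout for the `f` format code. The slices split the 32 bits into the sign, biased exponent and mantissa fields.

```python
import struct

# Pack 14.125 as an IEEE 754 single (big-endian) and show the 32 bits.
raw = struct.pack(">f", 14.125)
bits = format(int.from_bytes(raw, "big"), "032b")
print(raw.hex())                        # 41620000
print(bits[0], bits[1:9], bits[9:])     # 0 10000010 11000100000000000000000
```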

Ex: (2A3B)_H

Step 1:

Converting into binary

2=0010

A=1010

3=0011

B=1011

So (2A3B)_H = (0010 1010 0011 1011)₂ = (10101000111011)₂

Step 2:

Normalize the numbers.

(-1)⁰ x 1.0101000111011 x 2¹³

Step 3:

Calculate the biased exponent.

Biased Exponent (BE) = True value + Bias

= 13 + 127

= 140

Step 4:

Convert biased exponent into binary.

BE = (10001100)₂

Step 5:

Substitute in the required format.

S	Biased Exponent	Mantissa
0	10001100	01010001110110000000000
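For a positive integer such as (2A3B)_H, steps 1 to 5 can also be traced with plain string handling: the exponent is the number of bits after the leading 1, and the mantissa is those bits padded to 23. This shortcut assumes the integer has at most 24 significant bits, so that it fits exactly with no rounding; the final line cross-checks the result against `struct`.

```python
import struct

value = int("2A3B", 16)                 # Step 1: (2A3B)_H = 10811
bits = format(value, "b")               # '10101000111011'
exponent = len(bits) - 1                # Step 2: 1.0101000111011 x 2^13
mantissa = bits[1:].ljust(23, "0")      # drop the hidden 1, pad to 23 bits
biased = format(exponent + 127, "08b")  # Steps 3-4: 140 -> '10001100'
pattern = "0" + biased + mantissa       # Step 5: S | BE | M (value is positive)

# Cross-check against the IEEE 754 single precision encoding from struct.
assert int(pattern, 2) == int.from_bytes(struct.pack(">f", value), "big")
print(pattern[0], pattern[1:9], pattern[9:])
# 0 10001100 01010001110110000000000
```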
