Floating Point Numbers

Numbers with a fractional part can be categorized as fixed point numbers and floating point numbers.

1)Fixed Point Numbers:

A number whose decimal point is always in the same position is referred to as a fixed point number.

Ex: Numbers that always have exactly two digits after the decimal point.

7.32

6.49

1.56

  • These numbers can be stored in the computer without any problem. Since the position of the point is fixed and predetermined, the point itself does not need to be stored.
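The idea of dropping the point can be sketched in Python. This is only an illustration, assuming non-negative values with exactly two digits after the point; the names `SCALE`, `to_fixed` and `from_fixed` are made up for this example.

```python
SCALE = 100  # assumption: exactly two digits after the point

def to_fixed(text: str) -> int:
    """Store a value such as '7.32' as the plain integer 732 (no point stored)."""
    whole, _, frac = text.partition(".")
    frac = (frac + "00")[:2]           # pad or trim to two fractional digits
    return int(whole or "0") * SCALE + int(frac)

def from_fixed(stored: int) -> str:
    """Put the point back at its predetermined position."""
    return f"{stored // SCALE}.{stored % SCALE:02d}"

stored = [to_fixed(s) for s in ["7.32", "6.49", "1.56"]]
print(stored)                           # [732, 649, 156]
print([from_fixed(v) for v in stored])  # ['7.32', '6.49', '1.56']
```

Only the integers are kept in memory; the position of the point lives in the program, not in the stored data.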

2)Floating Point Numbers:

Floating point numbers are numbers where the decimal point can float.

Ex:

73.2

7.32

7.124

Here the point is in a different position in each number. Floating point numbers are harder to store because the position of the point varies and a dot cannot be stored in a register.

Ex: 73.2

  • The digits 7, 3 and 2 can each be stored in registers, but registers are built from flip-flops, and flip-flops can only hold a 1 or a 0, so a dot cannot be stored in any form.
  • Hence we avoid storing the decimal point by converting the floating point number to a form in which the position of the point is fixed.
  • This conversion is known as “Normalisation”.

Normalization of Floating point Numbers

  • There should be exactly one non-zero digit to the left of the point.
  • For decimal numbers the normalized form is written with a power of 10, and for binary numbers it is written with a power of 2.

  Floating point (decimal numbers)      Normalized number

       73.2                              7.32 x 10^1

       649                               6.49 x 10^2

       .847                              8.47 x 10^-1

       5.63                              5.63 x 10^0

Ex: 5.63 = 5.63 x 10^0

     Exponent(E)         Mantissa(M)

          0                  563

Mantissa(M): The digits of the normalized number, written without the decimal point.

Exponent(E): The power of 10 for decimal numbers, or the power of 2 for binary numbers.
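The decimal normalization above can be reproduced with a short Python sketch. It assumes finite, non-zero inputs given as text and ignores the sign; `normalize_decimal` is a name invented for this example, built on the standard `decimal` module.

```python
from decimal import Decimal

def normalize_decimal(text: str):
    """Return (M, E) so that the value equals d.dd... x 10^E,
    where M is the digit string without the point."""
    sign, digits, exp = Decimal(text).as_tuple()   # sign is ignored in this sketch
    if not any(digits):
        raise ValueError("zero has no normalized form")
    mantissa = "".join(str(d) for d in digits)
    # Moving the point to just after the first digit changes the exponent
    # by the number of remaining digits.
    exponent = len(digits) - 1 + exp
    return mantissa, exponent

for text in ["73.2", "649", ".847", "5.63"]:
    m, e = normalize_decimal(text)
    print(f"{text:>5} -> {m[0]}.{m[1:]} x 10^{e}")
```

For 5.63 the function returns the mantissa `"563"` and the exponent `0`, matching the E and M fields shown in the example.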

  Floating point (binary numbers)       Normalized number

       0101.001                          1.01001 x 2^2

       11111.01                          1.111101 x 2^4

       -10.01                            -1.001 x 2^1

Ex: -10.01 = -1.001 x 2^1

     Exponent(E)         Mantissa(M)

          1                  001

  1.M x 2^E

          = (-1)^S x 1.M x 2^E    Normalised form

  • Where S is the sign bit: 0 for positive and 1 for negative.
  • As normalized numbers are stored in 1.M format, the leading ‘1’ is not actually stored; it is assumed. This saves 1 bit of storage for each number.
  • The exponent is stored in biased form, by adding an appropriate bias value to it, so that negative exponents can be represented easily.
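As a rough illustration of the points above, the following Python sketch normalizes a binary string into the fields S, M and E. The function name `normalize_binary` is hypothetical, and the bias of 127 used at the end is the single precision value, chosen here only as an example.

```python
def normalize_binary(text: str):
    """Return (S, M, E) so that the value equals (-1)^S x 1.M x 2^E."""
    sign = 1 if text.startswith("-") else 0
    whole, _, frac = text.lstrip("+-").partition(".")
    bits = whole + frac
    first_one = bits.find("1")
    if first_one == -1:
        raise ValueError("zero has no normalized form")
    # Number of places the point moves to sit just after the first 1.
    exponent = len(whole) - 1 - first_one
    # The leading 1 is assumed, so only the bits after it are kept.
    mantissa = bits[first_one + 1:].rstrip("0")
    return sign, mantissa, exponent

for text in ["0101.001", "11111.01", "-10.01"]:
    s, m, e = normalize_binary(text)
    print(f"{text:>9} -> S={s} M={m} E={e} biased E={e + 127}")
```

For `-10.01` this gives S=1, M=001 and E=1, so the biased exponent would be 128.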

Advantages of Normalisation:

  • Storing all numbers in a standard form makes calculation easier and faster.
  • By not storing the leading 1 (of the 1.M format), one bit of storage is saved for every number.
  • The exponent is biased, so there is no need to store a separate sign bit for it.

Representation of floating point numbers:

Floating point numbers can be represented in two formats:

1)Single precision format

2)Double precision format

1)Single precision format/Short Real/ IEEE 754 : 32 Bit Format

S          Biased Exponent                  Mantissa
(1 bit)    (8 bits), bias value = 127       (23 bits)

  • 32 bits are used to store the numbers.
  • 23 bits are used for Mantissa.
  • 8 bits are used for the Biased Exponent.
  • 1 bit is used for the sign of the number.
  • The bias value is 127 (decimal).
  • The range of magnitudes for normalized numbers is approximately 1.2 x 10^-38 to 3.4 x 10^38.
  • It is called the single precision format for floating point numbers.
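The 1 + 8 + 23 bit layout can be inspected with Python's standard `struct` module. This is a small sketch; `float32_fields` is a helper name made up for the example.

```python
import struct

def float32_fields(x: float):
    """Split x (rounded to single precision) into (S, biased exponent, mantissa)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # the 32 bits as an integer
    sign = raw >> 31                      # top 1 bit
    biased_exponent = (raw >> 23) & 0xFF  # next 8 bits
    mantissa = raw & 0x7FFFFF             # low 23 bits (the leading 1 is not stored)
    return sign, biased_exponent, mantissa

s, be, m = float32_fields(-2.5)           # -2.5 = -1.01 x 2^1 in binary
print(s, f"{be:08b}", f"{m:023b}")        # 1 10000000 01000000000000000000000
```

The biased exponent 10000000 is 128, that is, the true exponent 1 plus the bias 127.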

2)Double precision format/Long Real/ IEEE 754 : 64 Bit Format

S          Biased Exponent                  Mantissa
(1 bit)    (11 bits), bias value = 1023     (52 bits)

  • 64 bits are used to store the numbers.
  • 52 bits are used for Mantissa.
  • 11 bits are used for the Biased Exponent.
  • 1 bit is used for the sign of the number.
  • The bias value is 1023 (decimal).
  • The range of magnitudes for normalized numbers is approximately 10^-308 to 10^308.
  • It is called the double precision format for floating point numbers.
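Python's own `float` is a double precision number on common platforms, so the 1 + 11 + 52 bit layout and the range can be checked directly. The sketch below uses the standard `struct` and `sys` modules; `float64_fields` is a name chosen for this example.

```python
import struct
import sys

def float64_fields(x: float):
    """Split a double into (S, biased exponent, mantissa)."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # the 64 bits as an integer
    sign = raw >> 63                       # top 1 bit
    biased_exponent = (raw >> 52) & 0x7FF  # next 11 bits
    mantissa = raw & ((1 << 52) - 1)       # low 52 bits
    return sign, biased_exponent, mantissa

print(float64_fields(1.0))                 # (0, 1023, 0)
# Smallest and largest normalized magnitudes, roughly 2.2e-308 and 1.8e308.
print(sys.float_info.min, sys.float_info.max)
```

The value 1.0 is 1.0 x 2^0, so its stored exponent is exactly the bias, 1023.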

Adding floating point numbers:

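1)Before adding, compare the exponents of the two numbers.

2)Because biased exponents are always positive, they can be compared directly by their values.

3)If the exponents differ, shift the mantissa of the number with the smaller exponent to the right, increasing its exponent by one for each shift, until the exponents are equal, and then add the mantissas.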
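The alignment step can be sketched in Python as follows. This is a simplified illustration, not how any particular hardware does it: it handles positive numbers only, truncates the bits that are shifted out, and represents each number as a 24-bit integer significand (the 1.M bits, including the leading 1) plus a true exponent. All names are invented for the example.

```python
FRACTION_BITS = 23  # assumption: single precision style significand

def add_positive(sig_a: int, exp_a: int, sig_b: int, exp_b: int):
    """Add two positive values of the form (sig / 2^23) x 2^exp."""
    # Steps 1 and 2: compare exponents so that a holds the larger one.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Step 3: shift the smaller number's significand right until exponents match.
    sig_b >>= exp_a - exp_b
    total = sig_a + sig_b
    # Renormalize if the sum no longer fits the 1.M form.
    if total >= 1 << (FRACTION_BITS + 1):
        total >>= 1
        exp_a += 1
    return total, exp_a

def to_value(sig: int, exp: int) -> float:
    return sig / (1 << FRACTION_BITS) * 2.0 ** exp

ONE_POINT_FIVE = 0b11 << 22                # 1.1 in binary as a 24-bit significand
result = add_positive(ONE_POINT_FIVE, 2, ONE_POINT_FIVE, 0)  # 6.0 + 1.5
print(to_value(*result))                   # 7.5
```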

Representation of floating point numbers: examples

Steps for representation of floating point numbers:

1)Convert into binary number

2)Normalize the numbers.

3)Calculate the biased exponent.

4)Convert biased exponent into binary.

5)Substitute in the required format.
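The five steps can be followed by hand in Python for the single precision format. This is a sketch under stated assumptions: the value is non-zero, lies in the normalized range, and extra mantissa bits are simply truncated rather than rounded. `encode_single` is a hypothetical helper name.

```python
def encode_single(value: float) -> str:
    """Return 'S EEEEEEEE MMM...M' for a non-zero value, following the steps above."""
    if value == 0:
        raise ValueError("zero has no normalized form")
    sign = 0 if value > 0 else 1
    magnitude = abs(value)
    # Steps 1 and 2: scale by powers of 2 until the value has the form 1.M.
    exponent = 0
    while magnitude >= 2:
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1
    # Read off 23 mantissa bits after the (assumed) leading 1.
    fraction = magnitude - 1
    mantissa_bits = ""
    for _ in range(23):
        fraction *= 2
        bit = int(fraction)
        mantissa_bits += str(bit)
        fraction -= bit
    # Steps 3 and 4: biased exponent, written as 8 binary digits.
    biased = exponent + 127
    # Step 5: substitute into the format.
    return f"{sign} {biased:08b} {mantissa_bits}"

print(encode_single(14.125))   # 0 10000010 11000100000000000000000
```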

Ex: 14.125

Step 1:

Converting into binary 

14 = 1110 and .125 = .001, so 14.125 = (1110.001)2

Step 2:

Normalize the numbers.

(-1)^0 x 1.110001 x 2^3

Step 3:

Calculate the biased exponent.

Biased Exponent (BE) = True value + Bias

= 3 + 127

= 130

Step 4:

Convert biased exponent into binary.

BE = (10000010)2

Step 5:

Substitute in the required format.

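0 10000010 11000100000000000000000

(S = 0, biased exponent = 10000010, mantissa = 110001 padded with zeros to 23 bits)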
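The full 32-bit pattern for 14.125 can be cross-checked against the machine's own single precision encoding, using Python's standard `struct` module. The variable names are chosen only for this example.

```python
import struct

# Pack 14.125 as a big-endian single precision float and read the 32 bits back.
(raw,) = struct.unpack(">I", struct.pack(">f", 14.125))
bits = f"{raw:032b}"

print(bits[0], bits[1:9], bits[9:])  # 0 10000010 11000100000000000000000
print(hex(raw))                      # 0x41620000
```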

Ex: (2A3B)H

Step 1:

Converting into binary 

2=0010

A=1010

3=0011

B=1011

(2A3B)H = (0010 1010 0011 1011)2

Step 2:

Normalize the numbers.

(-1)^0 x 1.0101000111011 x 2^13

Step 3:

Calculate the biased exponent.

Biased Exponent (BE) = True value + Bias

= 13 + 127

= 140

Step 4:

Convert biased exponent into binary.

BE = (10001100)2

Step 5:

Substitute in the required format.

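0 10001100 01010001110110000000000

(S = 0, biased exponent = 10001100, mantissa = 0101000111011 padded with zeros to 23 bits)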
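Going the other way is a useful check: decoding the three fields with the normalized form (-1)^S x 1.M x 2^(BE - 127) should give back the original hexadecimal number. A minimal Python sketch, with variable names chosen for illustration:

```python
import struct

bits = "0" + "10001100" + "01010001110110000000000"   # S, biased exponent, mantissa

sign = int(bits[0])
biased_exponent = int(bits[1:9], 2)
mantissa = int(bits[9:], 2)

# (-1)^S x 1.M x 2^(BE - bias), with the assumed leading 1 restored.
value = (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (biased_exponent - 127)
print(value, hex(int(value)))        # 10811.0 0x2a3b

# Cross-check against the machine's own interpretation of the same 32 bits.
(machine,) = struct.unpack(">f", struct.pack(">I", int(bits, 2)))
print(machine == value)              # True
```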