Floating Point Numbers
Numbers with a fractional part can be categorized as fixed point numbers and floating point numbers.
1)Fixed Point Numbers:
A number in which the position of the decimal point is fixed is referred to as a fixed point number.
Ex: Here there are always exactly two digits after the decimal point.
7.32
6.49
1.56
- These numbers can be stored in the computer without any problem. The dot does not need to be stored, since the position of the point is fixed and predetermined.
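The idea of leaving out the dot can be sketched in Python. This is only an illustration, assuming exactly two digits after the point and non-negative values; the names SCALE, to_fixed and from_fixed are hypothetical and do not come from the text above.

```python
from decimal import Decimal

SCALE = 100  # assumed: exactly two digits after the decimal point


def to_fixed(text):
    """Store a number such as '7.32' as the plain integer 732 (no dot kept)."""
    return int(Decimal(text) * SCALE)


def from_fixed(stored):
    """Put the dot back at its fixed, predetermined position."""
    return f"{stored // SCALE}.{stored % SCALE:02d}"


print(to_fixed("7.32"))   # 732
print(from_fixed(649))    # 6.49
```

Only the integer is stored; the position of the point lives in the program (SCALE), not in the data.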
2)Floating Point Numbers:
Floating point numbers are numbers where the decimal point can float.
Ex:
73.2
7.32
7.124
Here the point is in a different position in each number. Floating point numbers are harder to store because the position of the point varies, and a dot cannot be stored in the registers.
Ex: 73.2
- The digits 7, 3 and 2 can each be stored in the registers. Registers are made of flip-flops, and flip-flops can only hold 1 or 0, so a dot cannot be stored in any form.
- Hence we eliminate the decimal point by converting the floating point number to a fixed point form.
- This conversion of floating point to fixed point is known as “Normalisation”.
Normalization of Floating point Numbers
- There should be only one non-zero digit to the left of the decimal point.
- For decimal numbers the normalized form is written with a power of 10, and for binary numbers it is written with a power of 2.
Floating point (decimal numbers) | Normalized number |
73.2 | 7.32 x 10^1 |
649 | 6.49 x 10^2 |
.847 | 8.47 x 10^-1 |
5.63 | 5.63 x 10^0 |
Ex: 5.63 = 5.63 x 10^0
Exponent(E) | Mantissa(M) |
0 | 563 |
Mantissa(M): The digits of the normalized number, written without the decimal point.
Exponent(E): The power to which the base is raised: a power of 10 for decimal numbers, or a power of 2 for binary numbers.
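The decimal normalization above can be checked with a short Python sketch. It uses the standard decimal module; the function name normalize_decimal is hypothetical, and the sketch assumes a non-zero input.

```python
from decimal import Decimal


def normalize_decimal(text):
    """Return (mantissa_digits, exponent) for a non-zero decimal number."""
    number = Decimal(text)
    # adjusted() gives the power of 10 of the most significant digit.
    exponent = number.adjusted()
    # The coefficient digits are the number without its decimal point.
    digits = "".join(str(d) for d in number.as_tuple().digits)
    return digits, exponent


for value in ["73.2", "649", ".847", "5.63"]:
    digits, exponent = normalize_decimal(value)
    print(f"{value} = {digits[0]}.{digits[1:]} x 10^{exponent}")
```

The returned digits play the role of the mantissa M and the returned power of 10 is the exponent E.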
Floating point (binary numbers) | Normalized number |
0101.001 | 1.01001 x 2^2 |
11111.01 | 1.111101 x 2^4 |
-10.01 | -1.001 x 2^1 |
Ex: -10.01 = -1.001 x 2^1
Exponent(E) | Mantissa(M) |
1 | 001 |
1.M x 2^E
= (-1)^S x 1.M x 2^E → Normalised form
- Where S is the sign bit: 0 for positive and 1 for negative.
- As normalized numbers are stored in 1.M format, the leading ‘1’ is not actually stored; it is assumed instead. This saves 1 bit of storage for each number.
- The exponent is stored in biased form, by adding an appropriate bias value to it, so that negative exponents can be represented easily.
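As a rough illustration of the normalized form, the sketch below splits a non-zero Python float into S, the 1.M significand and E. The function name normalize_binary is hypothetical, and the bias of 127 is used only as an example value.

```python
import math


def normalize_binary(value):
    """Return (S, significand, E) with value == (-1)**S * significand * 2**E."""
    sign = 1 if value < 0 else 0
    # math.frexp returns a fraction in [0.5, 1); rescale it into [1, 2).
    fraction, exponent = math.frexp(abs(value))
    return sign, fraction * 2, exponent - 1


BIAS = 127  # single precision bias, used here only as an example

sign, significand, exponent = normalize_binary(-2.25)  # -10.01 in binary
print(sign, significand, exponent)  # 1 1.125 1   (1.125 is 1.001 in binary)
print(exponent + BIAS)              # 128, the biased exponent
```

The significand always starts with 1, which is why that digit can be assumed rather than stored.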
Advantages of Normalisation:
- Storing all numbers in a standard form makes calculation easier and faster.
- By not storing the leading 1 (of the 1.M format), one bit of storage is saved for every number.
- The exponent is biased, so there is no need to store its sign bit.
Representation of floating point numbers:
Floating point numbers can be represented in two formats:
1)Single precision format
2)Double precision format
1)Single precision format/Short Real/ IEEE 754 : 32 Bit Format
S | Biased Exponent | Mantissa |
(1) | (8) Bias value=127 | (23 bits) |
- 32 bits are used to store the numbers.
- 23 bits are used for Mantissa.
- 8 bits are used for the Biased Exponent.
- 1 bit is used for the sign of the number.
- The Bias value is (127)10
- The range is approximately ±1 x 10^-38 to ±3 x 10^38.
- It is called a single precision format for floating point numbers.
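The 1 + 8 + 23 bit layout can be inspected from Python with the standard struct module. This is a minimal sketch; split_single is a hypothetical helper name, and -2.25 is chosen only because it is the binary value -10.01.

```python
import struct


def split_single(value):
    """Return (sign, biased_exponent, mantissa) of value as a 32-bit float."""
    # Pack as a big-endian float32, then reread the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF   # 8 bits
    mantissa = bits & 0x7FFFFF              # 23 bits
    return sign, biased_exponent, mantissa


sign, biased_exponent, mantissa = split_single(-2.25)
print(sign, format(biased_exponent, "08b"), format(mantissa, "023b"))
```

For -2.25 = -1.001 x 2^1 the biased exponent is 1 + 127 = 128, and the mantissa field holds 001 followed by zeros.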
2)Double precision format/Long Real/ IEEE 754 : 64 Bit Format
S | Biased Exponent | Mantissa |
(1) | (11) Bias value=1023 | (52 bits) |
- 64 bits are used to store the numbers.
- 52 bits are used for Mantissa.
- 11 bits are used for the Biased Exponent.
- 1 bit is used for the sign of the number.
- The Bias value is (1023)10
- The range is approximately 10^-308 to 10^308.
- It is called the double precision format for floating point numbers.
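The approximate ranges quoted for both formats can be recomputed from the field widths. The sketch below relies on one detail that is not stated above but is part of IEEE 754: biased exponents of all zeros and all ones are reserved, so normalized numbers use the values in between. The function name normal_range is hypothetical.

```python
import math
import sys


def normal_range(exponent_bits, mantissa_bits):
    """Return (smallest, largest) positive normalized values of a format."""
    bias = 2 ** (exponent_bits - 1) - 1
    # Assumption from IEEE 754: biased exponents of all zeros and all ones
    # are reserved, so normalized numbers use 1 .. 2**exponent_bits - 2.
    min_exponent = 1 - bias
    max_exponent = (2 ** exponent_bits - 2) - bias
    smallest = math.ldexp(1.0, min_exponent)
    largest = math.ldexp(2.0 - 2.0 ** -mantissa_bits, max_exponent)
    return smallest, largest


print(normal_range(8, 23))    # single precision
print(normal_range(11, 52))   # double precision
print(sys.float_info.min, sys.float_info.max)  # Python's own 64-bit float
```

The same formula yields bias values of 127 and 1023 for 8 and 11 exponent bits, matching the two formats described here.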
Adding floating point numbers:
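1)Before adding, compare the exponents.
2)Because the exponents are stored in biased form, they are never negative and can be compared directly by their values.
3)If the exponents are different, shift the mantissa of the number with the smaller exponent to the right until the exponents are equal, and then add the mantissas.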
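The alignment step can be sketched in Python for two positive numbers. This is a simplified illustration, not a full adder: signs and rounding are ignored (bits shifted out are simply dropped), and the names FRACTION_BITS, add_positive and to_value are hypothetical.

```python
FRACTION_BITS = 23  # assumed: single precision mantissa width


def add_positive(sig_a, exp_a, sig_b, exp_b):
    """Add two positive numbers given as (significand, exponent).

    Each significand is the 24-bit integer form of 1.M, so the value is
    significand / 2**FRACTION_BITS * 2**exponent.
    """
    # Compare the exponents and make 'a' the number with the larger one.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Shift the smaller number right until the exponents match.
    sig_b >>= exp_a - exp_b
    total = sig_a + sig_b
    # Renormalize if the sum no longer fits the 1.M pattern.
    if total >> (FRACTION_BITS + 1):
        total >>= 1
        exp_a += 1
    return total, exp_a


def to_value(significand, exponent):
    """Convert (significand, exponent) back to an ordinary number."""
    return significand / 2 ** FRACTION_BITS * 2 ** exponent


# 14.125 = 1.110001 x 2^3 and 2.25 = 1.001 x 2^1
a = (0b1110001 << 17, 3)
b = (0b1001 << 20, 1)
print(to_value(*add_positive(*a, *b)))  # 16.375
```

The sketch works on true exponents; comparing biased exponents gives the same ordering because the same bias is added to both.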
Representation of floating point numbers: examples
Steps for representation of floating point numbers:
1)Convert into binary number
2)Normalize the numbers.
3)Calculate the biased exponent.
4)Convert biased exponent into binary.
5)Substitute in the required format.
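The five steps can be followed in a short Python sketch for the single precision format. It assumes a non-zero value within the normalized range and truncates extra fraction bits instead of rounding; to_single_fields is a hypothetical name.

```python
import math

BIAS = 127          # single precision
MANTISSA_BITS = 23


def to_single_fields(value):
    """Return (sign, biased exponent, mantissa) as bit strings."""
    sign = "1" if value < 0 else "0"
    # Steps 1-2: frexp works on the binary form directly; rescale its
    # result from 0.1xxx x 2^e to the normalized 1.xxx x 2^(e-1).
    fraction, exponent = math.frexp(abs(value))
    significand, exponent = fraction * 2, exponent - 1
    # Step 3: biased exponent = true exponent + bias.
    biased = exponent + BIAS
    # Step 4: biased exponent in binary, 8 bits wide.
    exponent_bits = format(biased, "08b")
    # Step 5: drop the hidden 1 and keep 23 fraction bits (truncated).
    mantissa = int((significand - 1) * 2 ** MANTISSA_BITS)
    return sign, exponent_bits, format(mantissa, "023b")


print(" | ".join(to_single_fields(14.125)))
```

Calling the function with 14.125 prints the three fields in the S | Biased Exponent | Mantissa layout used in this section.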
Ex: 14.125
Step 1:
Converting into binary
14 = (1110)2 and 0.125 = (.001)2, so 14.125 = (1110.001)2
Step 2:
Normalize the numbers.
(-1)^0 x 1.110001 x 2^3
Step 3:
Calculate the biased exponent.
Biased Exponent (BE) = True value + Bias
= 3 + 127
= 130
Step 4:
Convert biased exponent into binary.
BE = (10000010)2
Step 5:
Substitute in the required format.
0 | 10000010 | 11000100000000000000000 |
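Step 1 of this example, converting 14.125 into binary, can be done by repeatedly dividing the whole part by 2 and repeatedly multiplying the fractional part by 2. The sketch below assumes a non-negative input; to_binary and max_fraction_bits are hypothetical names.

```python
def to_binary(value, max_fraction_bits=23):
    """Return the binary digits of a non-negative number as a string."""
    whole = int(value)
    fraction = value - whole

    # Whole part: repeated division by 2, remainders read in reverse order.
    whole_bits = ""
    while whole > 0:
        whole_bits = str(whole % 2) + whole_bits
        whole //= 2
    whole_bits = whole_bits or "0"

    # Fractional part: repeated multiplication by 2, carrying out each bit.
    fraction_bits = ""
    while fraction > 0 and len(fraction_bits) < max_fraction_bits:
        fraction *= 2
        bit = int(fraction)
        fraction_bits += str(bit)
        fraction -= bit

    return whole_bits + ("." + fraction_bits if fraction_bits else "")


print(to_binary(14.125))  # 1110.001
```

Moving the point three places to the left then gives the normalized form used in Step 2.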
Ex: (2A3B)H
Step 1:
Converting into binary
2=0010
A=1010
3=0011
B=1011
(2A3B)H = (0010 1010 0011 1011)2 = (10101000111011)2
Step 2:
Normalize the numbers.
(-1)^0 x 1.0101000111011 x 2^13
Step 3:
Calculate the biased exponent.
Biased Exponent (BE) = True value + Bias
= 13 + 127
= 140
Step 4:
Convert biased exponent into binary.
BE = (10001100)2
Step 5:
Substitute in the required format.
0 | 10001100 | 01010001110110000000000 |
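The hexadecimal example can be reproduced with plain string operations, which mirrors the hand method closely. This is a sketch under stated assumptions: the input is a positive, non-zero hex integer with at most 24 significant bits, and hex_to_single_fields is a hypothetical name.

```python
BIAS = 127
MANTISSA_BITS = 23


def hex_to_single_fields(hex_digits):
    """Return (sign, biased exponent, mantissa) bit strings for a hex integer."""
    # Step 1: each hex digit becomes 4 binary digits.
    bits = "".join(format(int(digit, 16), "04b") for digit in hex_digits)
    # Step 2: normalize by dropping leading zeros; the point moves to just
    # after the first 1, so the exponent is the number of digits behind it.
    bits = bits.lstrip("0")
    exponent = len(bits) - 1
    # Steps 3-4: biased exponent in 8-bit binary.
    exponent_bits = format(exponent + BIAS, "08b")
    # Step 5: the hidden 1 is dropped and the mantissa is padded to 23 bits.
    mantissa = bits[1:].ljust(MANTISSA_BITS, "0")
    return "0", exponent_bits, mantissa


print(" | ".join(hex_to_single_fields("2A3B")))
```

The mantissa is padded with zeros on the right because the 23-bit field must always be filled completely.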