Math 441

Floating Point Numbers

Numbers in Computers

binary
\(N\) bits
Integers:
- Arithmetic modulo \(2^N\)
- Unsigned integers: \(0\) to \(2^N - 1\)
- Signed integers: \(-2^{N-1}\) to \(2^{N-1} - 1\).
- Exercise:
  
  Write down all three-bit integers, first in binary, then each as an unsigned integer, then each as a signed integer.
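The two integer ranges above can be checked with a short sketch. It uses \(N = 4\) rather than the exercise's \(N = 3\), and it assumes the usual two's-complement convention for signed integers, which is what the range \(-2^{N-1}\) to \(2^{N-1} - 1\) suggests. The helper name `to_signed` is made up for this illustration.

```python
N = 4  # word size chosen for illustration; the exercise uses N = 3

def to_signed(u, n):
    """Interpret the n-bit unsigned value u as a two's-complement signed integer."""
    return u - (1 << n) if u >= (1 << (n - 1)) else u

# One row per bit pattern: binary string, unsigned value, signed value
table = [(format(u, f"0{N}b"), u, to_signed(u, N)) for u in range(1 << N)]
for bits, unsigned, signed in table:
    print(bits, unsigned, signed)

# Arithmetic wraps around modulo 2^N
wrapped = (13 + 5) % (1 << N)
print(wrapped)  # 2
```

The patterns whose leading bit is 1 are exactly the ones that become negative in the signed reading.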
Floating point numbers

Floating Point Numbers

\(k\): number of bits for the exponent field \(c\)
\(t\): number of bits for the mantissa field \(f\)
\(1 + k + t = N\), where the remaining bit is the sign \(s\)

\[x = {\color{#44ee44}(-1)^s} \cdot {\color{red}2^e} \cdot {\color{blue}(1.f)_2} = {\color{#44ee44}(-1)^s} \cdot {\color{red}2^{(c)_2 - b}} \cdot {\color{blue}(1.f)_2}\]

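Exponent \(e = (c)_2 - b\), where \((c)_2\) is the exponent field read as an unsigned binary integer and \(b\) is called the bias. The bias lets \(e\) take negative values. Typically \(b = 2^{k-1} - 1\).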
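The formula can be turned into a small decoder. This is a minimal sketch, not a full implementation: it assumes the bits are laid out as sign, then exponent, then mantissa, it uses the typical bias \(b = 2^{k-1} - 1\), and it does not treat any exponent pattern specially. The function name `decode_normal` and the sample bit string are chosen only for illustration.

```python
def decode_normal(bits, k, t):
    """Decode a bit string laid out as sign | exponent (k bits) | mantissa (t bits).

    Applies x = (-1)^s * 2^(c - b) * (1.f)_2 directly; reserved exponent
    patterns are not handled here.
    """
    assert len(bits) == 1 + k + t
    s = int(bits[0], 2)            # sign bit
    c = int(bits[1:1 + k], 2)      # exponent field as an unsigned integer
    f = int(bits[1 + k:], 2)       # mantissa field as an unsigned integer
    b = 2 ** (k - 1) - 1           # typical bias
    mantissa = 1 + f / 2 ** t      # (1.f)_2
    return (-1) ** s * 2.0 ** (c - b) * mantissa

print(decode_normal("01001010", k=3, t=4))  # 3.25
```

Here \(c = (100)_2 = 4\), \(b = 3\), and \((1.1010)_2 = 1.625\), so the value is \(2^1 \cdot 1.625 = 3.25\).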

8-bit Example:

\(k = 3\): number of bits for exponent
\(t = 4\): number of bits for mantissa
Bias: \(b = 2^{3 - 1} - 1 = 3\)
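As a worked illustration of this format, take the bit pattern \(1\,010\,0100\). The pattern is chosen here only as an example, and the sign, exponent, mantissa ordering of the bits is an assumption. Then \(s = 1\), \((c)_2 = (010)_2 = 2\), and \(f = 0100\), so

\[x = (-1)^1 \cdot 2^{2 - 3} \cdot (1.0100)_2 = -\tfrac{1}{2} \cdot 1.25 = -0.625\]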

Largest 8-bit Number:

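With the same \(k = 3\), \(t = 4\), and \(b = 3\), take sign bit \(0\), the largest usable exponent field \(c = 110\) (the all-ones field is reserved), and mantissa bits \(f = 1111\):

\[x = 2^{6 - 3} \cdot (1.1111)_2 = 8 \cdot 1.9375 = 15.5\]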

“Smallest” Positive 8-bit Number:

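With the same \(k = 3\), \(t = 4\), and \(b = 3\), take sign bit \(0\), the smallest usable exponent field \(c = 001\) (the all-zeros field is reserved), and mantissa bits \(f = 0000\):

\[x = 2^{1 - 3} \cdot (1.0000)_2 = 0.25\]

The quotation marks are there because the reserved all-zeros exponent field encodes even smaller positive values, the subnormal numbers.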
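The extreme values of the 8-bit format can be checked by brute force. The sketch below assumes that the all-zeros and all-ones exponent fields are excluded, as they are reserved for special cases, and enumerates every remaining positive value. The names `K`, `T`, `BIAS`, and `normals` are chosen for this illustration.

```python
K, T = 3, 4
BIAS = 2 ** (K - 1) - 1

# Exponent fields 0 and 2^K - 1 are skipped (reserved for special cases)
normals = sorted(
    2.0 ** (c - BIAS) * (1 + f / 2 ** T)
    for c in range(1, 2 ** K - 1)
    for f in range(2 ** T)
)
print(len(normals), normals[0], normals[-1])  # 96 0.25 15.5
```

Printing the whole list shows that the gap between neighbours doubles each time the exponent increases by one.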

Exercise:

Consider 6-bit floating point numbers with a 3-bit exponent and a 2-bit mantissa:

How many positive floating point numbers are there?
What is the smallest one?
What is the largest one?
Find all of them, convert them to decimal representation, and plot them on a number line. Note how they are distributed!

Some Questions

How do you express 0?
Why didn’t we use all 0’s and all 1’s as exponents?
- These are reserved for special cases.

\(\pm 0\), Subnormal Numbers

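\(0\): sign bit \(0\), all exponent and mantissa bits \(0\)

\(-0\): sign bit \(1\), all exponent and mantissa bits \(0\)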
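For subnormal numbers, the following is a sketch under the IEEE 754 convention, which is assumed here to carry over to the 8-bit toy format: the exponent field is all zeros, the mantissa field is not all zeros, the leading digit becomes \(0\) instead of \(1\), and the exponent is fixed at \(1 - b\):

\[x = (-1)^s \cdot 2^{1 - b} \cdot (0.f)_2\]

With \(k = 3\), \(t = 4\), and \(b = 3\), the smallest and largest positive subnormal numbers would then be

\[2^{-2} \cdot (0.0001)_2 = 2^{-6} = 0.015625, \qquad 2^{-2} \cdot (0.1111)_2 = 0.25 \cdot 0.9375 = 0.234375\]

so the subnormal numbers fill the gap between \(0\) and the smallest normal number \(0.25\) with even spacing \(2^{-6}\).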

\(\pm\infty\), NaN

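\(\pm\infty\): all exponent bits \(1\), all mantissa bits \(0\); the sign bit selects \(+\infty\) or \(-\infty\)

NaN: all exponent bits \(1\), mantissa bits not all \(0\)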
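How these special values behave in arithmetic can be tried out directly. The sketch below uses Python's built-in `float`, which is assumed to be an IEEE 64-bit float, as it is on common platforms.

```python
import math

inf = float("inf")
nan = float("nan")

print(1e308 * 10)        # inf: the product overflows the largest finite float
print(-inf < 0 < inf)    # True
print(inf - inf)         # nan: the result is undefined
print(nan == nan)        # False: NaN compares unequal to everything, itself included
print(math.isnan(nan))   # True: the reliable way to test for NaN
```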

IEEE 64-bit Floats

\(k = 11\): number of bits for exponent
\(t = 52\): number of bits for mantissa
Bias: \(b = 2^{11 - 1} - 1 = 1023\)
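The three fields of a 64-bit float can be inspected with Python's standard `struct` module. This is a minimal sketch that assumes Python's `float` is an IEEE 64-bit float with 1 sign bit, 11 exponent bits, and 52 mantissa bits. The helper name `fields64` is made up for this illustration.

```python
import struct
import sys

def fields64(x):
    """Split a Python float into (sign, exponent field, mantissa field)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    s = n >> 63                  # 1 sign bit
    c = (n >> 52) & 0x7FF        # 11 exponent bits
    f = n & ((1 << 52) - 1)      # 52 mantissa bits
    return s, c, f

print(fields64(1.0))            # (0, 1023, 0)
print(fields64(-2.0))           # (1, 1024, 0)
print(fields64(0.0))            # (0, 0, 0)
print(fields64(float("inf")))   # (0, 2047, 0)
print(sys.float_info.max)       # 1.7976931348623157e+308
print(sys.float_info.min)       # 2.2250738585072014e-308
```

The exponent field of `1.0` is 1023 because \(e = (c)_2 - 1023 = 0\). `sys.float_info.min` is the smallest positive normal number, \(2^{-1022}\).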