Floating Point Numbers
Arithmetic modulo \(2^N\) (for \(N\)-bit integers)
Unsigned integers: \(0\) to \(2^N - 1\)
Signed integers: \(-2^{N-1}\) to \(2^{N-1} - 1\).
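The two ranges above can be checked with a short Python sketch. It assumes the signed interpretation is two's complement, which the notes do not state but which matches the range \(-2^{N-1}\) to \(2^{N-1} - 1\). The helper names `as_unsigned`, `as_signed` and `add_mod` are made up for this illustration.

```python
def as_unsigned(bits: str) -> int:
    """Interpret a bit string as an unsigned integer in 0 .. 2**N - 1."""
    return int(bits, 2)


def as_signed(bits: str) -> int:
    """Interpret a bit string as a signed integer in -2**(N-1) .. 2**(N-1) - 1.

    Assumption: two's complement, i.e. patterns with the top bit set
    represent the unsigned value minus 2**N.
    """
    n = len(bits)
    u = int(bits, 2)
    return u - (1 << n) if u >= (1 << (n - 1)) else u


def add_mod(a: int, b: int, n: int) -> int:
    """Add two N-bit unsigned integers modulo 2**N (wrap-around)."""
    return (a + b) % (1 << n)


# Four-bit examples (N = 4)
print(as_unsigned("1111"))  # 15
print(as_signed("1111"))    # -1
print(as_signed("1000"))    # -8
print(add_mod(15, 1, 4))    # 0
```

The same bit pattern `1111` reads as 15 or as -1 depending on the interpretation; both values are congruent modulo \(2^4\).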
Exercise:
Write down all three-bit integers, first in binary, then each of them as an unsigned integer, then each as a signed integer.
\[x = {\color{#44ee44}(-1)^s} \cdot {\color{red}2^e} \cdot {\color{blue}(1.f)_2} = {\color{#44ee44}(-1)^s} \cdot {\color{red}2^{(c)_2 - b}} \cdot {\color{blue}(1.f)_2}\]
Exponent \(e = (c)_2 - b\), where \(c\) is the stored exponent field and \(b\) is called the bias. Typically \(b = 2^{k-1} - 1\) for a \(k\)-bit exponent field.
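The formula for \(x\) and the bias can be evaluated directly. The following is a minimal sketch that only covers normalized numbers with an implicit leading 1. It assumes \(k\) is the number of exponent bits, and introduces `m` (number of mantissa bits) and the function names purely for illustration.

```python
def bias(k: int) -> int:
    """Typical bias for a k-bit exponent field: b = 2**(k-1) - 1."""
    return 2 ** (k - 1) - 1


def decode_normal(s: int, c: int, f: int, k: int, m: int) -> float:
    """Evaluate x = (-1)**s * 2**(c - b) * (1.f)_2.

    s: sign bit (0 or 1)
    c: stored exponent field as an integer
    f: mantissa field as an integer, m bits wide
    """
    b = bias(k)
    significand = 1 + f / 2 ** m  # (1.f)_2 with an implicit leading 1
    return (-1) ** s * 2.0 ** (c - b) * significand


print(bias(3))                       # 3
print(bias(8))                       # 127
print(decode_normal(0, 3, 2, 3, 2))  # 1.5   (c = 011, f = 10)
print(decode_normal(1, 5, 1, 3, 2))  # -5.0  (c = 101, f = 01)
```

For example, with \(k = 3\) the bias is 3, so the stored exponent \(c = (011)_2\) encodes \(e = 0\).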
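Consider 6-bit floating point numbers with 1 sign bit, a 3-bit exponent and a 2-bit mantissa. Assuming the special values are encoded as in IEEE 754, the bit patterns (sign, exponent, mantissa) are:
\(0\): 0 000 00 (sign bit 0, all exponent and mantissa bits zero)
\(-0\): 1 000 00 (as for \(0\), but with sign bit 1)
\(\pm\infty\): 0 111 00 and 1 111 00 (all exponent bits one, mantissa zero)
NaN: all exponent bits one and a nonzero mantissa, e.g. 0 111 01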
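The special values listed above can be explored with a small decoder for this 6-bit format. This is a sketch that assumes IEEE 754-style conventions: an all-zeros exponent encodes zero (and subnormals, which the notes do not mention), and an all-ones exponent encodes infinity or NaN. The name `decode6` is hypothetical.

```python
import math

EXP_BITS, MAN_BITS = 3, 2
BIAS = 2 ** (EXP_BITS - 1) - 1  # b = 3


def decode6(bits: str) -> float:
    """Decode a 6-bit pattern 's eee ff' (spaces optional) into a Python float."""
    bits = bits.replace(" ", "")
    assert len(bits) == 1 + EXP_BITS + MAN_BITS
    s = int(bits[0])
    c = int(bits[1:1 + EXP_BITS], 2)
    f = int(bits[1 + EXP_BITS:], 2)
    sign = -1.0 if s else 1.0

    if c == 2 ** EXP_BITS - 1:
        # Assumed IEEE-style: exponent all ones -> infinity (f == 0) or NaN
        return sign * math.inf if f == 0 else math.nan
    if c == 0:
        # Assumed IEEE-style: exponent all zeros -> signed zero or subnormal,
        # with no implicit leading 1 and exponent fixed at 1 - BIAS
        return sign * 2.0 ** (1 - BIAS) * (f / 2 ** MAN_BITS)
    # Normalized number: (-1)^s * 2^(c - b) * (1.f)_2
    return sign * 2.0 ** (c - BIAS) * (1 + f / 2 ** MAN_BITS)


print(decode6("0 000 00"))  # 0.0
print(decode6("1 000 00"))  # -0.0
print(decode6("0 111 00"))  # inf
print(decode6("1 111 00"))  # -inf
print(decode6("0 111 01"))  # nan
print(decode6("0 110 11"))  # 14.0, the largest finite value under these assumptions
```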