Binary numbers can represent not just whole numbers but also fractions. This topic explores fixed-point and floating-point representations, precision limits, and two’s complement formats.
Fixed-point representation
Fixed-point representation is a binary method used to store real numbers (numbers that include fractional parts) by fixing the location of the binary point. This method is conceptually similar to using a decimal point in base 10. In fixed-point, the binary point does not move and is placed at a predetermined position within the bit string.
Structure of fixed-point numbers
A fixed-point number has two parts:
The integer part, which consists of the bits to the left of the binary point.
The fractional part, which consists of the bits to the right of the binary point.
The number of bits allocated to each side determines both the range of values that can be represented and the precision (smallest possible change between values).
For example, using 8 bits with the binary point fixed after the fourth bit (from the left), the binary number 00101101 would be interpreted as:
Integer part: 0010 = 2
Fractional part: 1101 = 0.5 + 0.25 + 0 + 0.0625 = 0.8125
Final value: 2 + 0.8125 = 2.8125
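The worked example above can be checked with a short sketch. The function name `decode_fixed_point` and its parameters are illustrative assumptions, not part of any standard library, and the sketch handles unsigned values only.

```python
def decode_fixed_point(bits: str, fractional_bits: int) -> float:
    """Interpret an unsigned bit string that has a fixed binary point."""
    raw = int(bits, 2)                   # read all the bits as a plain integer
    return raw / (2 ** fractional_bits)  # move the binary point left


# 8 bits, binary point after the fourth bit: 0010.1101
print(decode_fixed_point("00101101", 4))  # 2.8125
```

Dividing by 2^4 is the same as placing the binary point four positions from the right, which is why the integer 45 becomes 2.8125.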
Place value of bits in fixed-point
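As a rough illustration, the place value (weight) of each bit can be listed for the 8-bit layout used in the example above, with 4 integer bits and 4 fractional bits. The helper `place_values` is a hypothetical name introduced here, and the sketch assumes an unsigned format.

```python
from fractions import Fraction


def place_values(integer_bits: int, fractional_bits: int) -> list[Fraction]:
    """Weight of each bit, left to right, for an unsigned fixed-point layout."""
    powers = range(integer_bits - 1, -fractional_bits - 1, -1)
    return [Fraction(2) ** p for p in powers]


weights = place_values(4, 4)
print([str(w) for w in weights])
# ['8', '4', '2', '1', '1/2', '1/4', '1/8', '1/16']

# Add up the weights wherever the bit is 1.
bits = "00101101"
total = sum(w for w, b in zip(weights, bits) if b == "1")
print(float(total))  # 2.8125
```

The rightmost weight, 1/16, is the smallest possible step between two values in this layout, and the sum of all the weights, 15.9375, is the largest value it can hold.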
FAQ
Many decimal fractions, such as 0.1, 0.2, or 0.3, cannot be represented exactly in binary. Binary place values are powers of 2, so only fractions that can be written as a finite sum of inverse powers of 2 (1/2, 1/4, 1/8, and so on) can be stored precisely. For example, 0.5 and 0.25 convert exactly, but 0.1 becomes a repeating binary fraction: 0.0001100110011... with the pattern 0011 repeating forever. Since computers store numbers using a limited number of bits, they must truncate or round the result, causing slight inaccuracies. This limitation affects both fixed-point and floating-point formats, although it is more commonly associated with floating-point because of its widespread use in scientific and financial computations. These tiny errors can accumulate over many calculations and must be accounted for in software design, especially when exact values are crucial, such as in currency, physics engines, or simulations.
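The repeating pattern can be seen with a small sketch that uses the repeated-doubling method to produce binary digits. The function `binary_fraction_digits` is a name made up for this illustration, and the final lines assume Python's usual 64-bit floats.

```python
from fractions import Fraction


def binary_fraction_digits(value: Fraction, count: int) -> str:
    """First `count` binary digits after the point, for 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2                # doubling shifts the next digit in front of the point
        bit = int(value >= 1)
        digits.append(str(bit))
        value -= bit              # keep only the remaining fractional part
    return "".join(digits)


print(binary_fraction_digits(Fraction(1, 10), 16))  # 0001100110011001
print(binary_fraction_digits(Fraction(1, 4), 16))   # 0100000000000000

# The stored approximations do not add up exactly.
print(0.1 + 0.2 == 0.3)  # False
```

The digits for 1/4 terminate, while those for 1/10 keep cycling, so any fixed number of bits has to cut the pattern off somewhere.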
Normalisation ensures that floating-point numbers are stored in a standardised form where the mantissa starts with a non-zero digit (in binary, always 1). This eliminates redundancy and maximises the precision offered by the fixed number of mantissa bits. For example, instead of storing 0.011 × 2^5, normalisation shifts the mantissa left and decreases the exponent, storing it as 1.1 × 2^3. This guarantees that the most significant bit is utilised, preventing wasted space in the mantissa. Without normalisation, the same value could be stored in many different ways, making comparisons and arithmetic operations inconsistent or less efficient. Additionally, normalisation improves consistency in the level of detail (precision) stored, helping avoid unnecessary rounding errors. In hardware, normalised numbers are easier to compare and add, since the exponent is adjusted to align binary points uniformly. Therefore, normalisation helps maintain accuracy, speeds up operations, and simplifies implementation in both hardware and software environments.
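The shifting step can be sketched as follows, assuming a positive mantissa and the convention described above in which the normalised mantissa lies between 1 (inclusive) and 2 (exclusive). The function `normalise` is a hypothetical helper, and exact fractions are used so that no rounding hides the result.

```python
from fractions import Fraction


def normalise(mantissa: Fraction, exponent: int) -> tuple[Fraction, int]:
    """Shift a positive mantissa until 1 <= mantissa < 2, adjusting the exponent."""
    if mantissa <= 0:
        raise ValueError("this sketch handles positive mantissas only")
    while mantissa < 1:       # shift left: double the mantissa, lower the exponent
        mantissa *= 2
        exponent -= 1
    while mantissa >= 2:      # shift right: halve the mantissa, raise the exponent
        mantissa /= 2
        exponent += 1
    return mantissa, exponent


# 0.011 in binary is 3/8, so this is 0.011 x 2^5
m, e = normalise(Fraction(3, 8), 5)
print(m, e)  # 3/2 3, which is 1.1 (binary) x 2^3
```

Both forms equal 12, so the value is unchanged and only the way it is written differs.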
Floating-point calculations are slower than fixed-point on many systems because they involve more complex hardware and require additional steps for handling the mantissa and exponent. In fixed-point arithmetic, operations such as addition and multiplication resemble integer operations and are straightforward: the binary point is fixed, so bit alignment is predictable. In contrast, floating-point operations require the exponents to be compared and aligned, mantissas adjusted (shifted), then the result normalised again, potentially with rounding. Each step adds computational overhead. Furthermore, floating-point units (FPUs) may not be present in all embedded or low-power systems, meaning floating-point operations have to be emulated in software, which drastically reduces performance. Fixed-point is often preferred in such environments for its simplicity and speed. In performance-critical or real-time systems—such as video processing or audio applications—fixed-point is frequently used despite the trade-off in range or flexibility. Therefore, while floating-point offers dynamic range, its operational cost is higher in constrained environments.
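To show why fixed-point arithmetic resembles integer arithmetic, here is a minimal sketch that stores values as plain integers scaled by 16, which assumes 4 fractional bits. The names `to_fixed`, `fixed_add`, and `fixed_mul` are illustrative, and overflow and negative values are deliberately ignored.

```python
FRACTIONAL_BITS = 4            # assumed layout: 4 bits after the binary point
SCALE = 1 << FRACTIONAL_BITS   # 16


def to_fixed(x: float) -> int:
    """Store x as an integer count of sixteenths."""
    return round(x * SCALE)


def to_float(raw: int) -> float:
    return raw / SCALE


def fixed_add(a: int, b: int) -> int:
    return a + b               # binary points are already aligned


def fixed_mul(a: int, b: int) -> int:
    # The raw product has 8 fractional bits, so drop the lowest 4.
    return (a * b) >> FRACTIONAL_BITS


a = to_fixed(2.8125)   # 45
b = to_fixed(1.5)      # 24
print(to_float(fixed_add(a, b)))  # 4.3125
print(to_float(fixed_mul(a, b)))  # 4.1875 (the exact product is 4.21875)
```

Addition is a single integer add and multiplication is an integer multiply followed by a shift, with no exponents to compare or results to normalise. The multiplication result also shows the cost: the last bits of the product are truncated.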
When a non-zero floating-point result is too small in magnitude to be represented with the available bits, the condition is known as underflow. This occurs when the value is closer to zero than the smallest representable normalised number. For instance, if a system can only store magnitudes down to 2^-10 and a result evaluates to 2^-15, it falls below the minimum threshold. What happens next depends on the format. Where subnormal (denormalised) numbers are supported, as in IEEE 754, values just below the normalised minimum can still be stored, but with fewer significant bits, so precision degrades gradually. Where they are not supported, or once the value falls below the smallest subnormal, the result is rounded (flushed) to zero and its information is lost. This can introduce significant errors in algorithms that rely on maintaining small magnitude values, such as those used in physics simulations or probability models. Unlike overflow, which is often obvious because it produces infinity or a wildly incorrect value, underflow can be harder to detect. Mitigation strategies include using a format with more exponent bits, rescaling values (for example, working with logarithms), and comparing against tolerance thresholds rather than exact values.
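Underflow can be observed directly in Python, assuming the usual IEEE 754 64-bit floats. The probability list below is invented for illustration. It shows one common scaling technique, which is to add logarithms instead of multiplying very small numbers.

```python
import math
import sys

smallest_normal = sys.float_info.min          # smallest positive normalised float
smallest_subnormal = math.ldexp(1.0, -1074)   # 2^-1074, smallest positive subnormal

print(smallest_normal / 2 > 0.0)   # True: the result is stored as a subnormal
print(smallest_subnormal / 2)      # 0.0: nothing smaller exists, so it underflows

# Multiplying many small probabilities underflows to zero...
probs = [0.01] * 1000
print(math.prod(probs))            # 0.0

# ...while the sum of their logarithms stays comfortably in range.
log_total = sum(math.log(p) for p in probs)
print(log_total < 0.0)             # True (roughly -4605)
```

Note that no error or warning is raised when the product becomes zero, which is part of what makes underflow hard to spot.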
Rounding in binary works much as it does in decimal, but it is constrained by the number of available bits. When a binary number has more digits than the format allows (e.g. a mantissa limited to 4 bits), the extra bits are either truncated (cut off) or rounded using a method such as round-to-nearest or round-toward-zero. The difference between the true value and the stored value is known as a rounding error. Over a series of calculations, especially in floating-point arithmetic, these errors can accumulate, leading to significant deviations from the expected result. For example, adding a very small value to a very large one may produce no change at all if the small value falls below the precision available at that magnitude. This effect is often called absorption, and it is one way in which significant digits are lost. Rounding also makes floating-point addition non-associative: (a + b) + c is not always equal to a + (b + c), which breaks ordinary mathematical expectations. Programmers must be aware of how rounding affects data, especially in financial or scientific applications, and choose rounding modes and the order of operations with care.
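These effects are easy to reproduce. The following sketch assumes Python's standard 64-bit floats, and the specific values are chosen only for illustration.

```python
import math

# A small value disappears when added to a much larger one.
big = 1e16
print(big + 1.0 == big)               # True

# The grouping of additions changes the result.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))     # False

# Rounding errors accumulate across repeated additions.
tenths = [0.1] * 10
print(sum(tenths) == 1.0)             # False
print(math.fsum(tenths) == 1.0)       # True
```

`math.fsum` tracks the low-order bits that a plain running total discards, which is one way to limit accumulated error when summing many floats.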
