Fixed-point arithmetic
Fixed-point numbers are useful for representing fractional values, usually in base 2 or base 10, when the executing processor has no floating-point unit (FPU) or when fixed-point arithmetic provides better performance or accuracy for the application at hand. Most low-cost embedded microprocessors and microcontrollers do not have an FPU.
Definition
A fixed-point number is essentially an integer that is scaled by a certain factor. It is important to note that the scaling factor is determined by the type; it is the same for all values of a certain fixed-point type. Floating-point types, on the other hand, store the scaling factor as part of the value, which allows them to have a wider range of values.
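As an illustration of the definition above, the following minimal sketch stores values as plain integers and keeps the scaling factor in a single constant shared by every value. The names FRACTIONAL_BITS, SCALE, to_fixed and to_real are hypothetical, and the choice of 8 fractional bits is only an assumption for the example.

```python
# Hypothetical format: 8 fractional bits, so the scaling factor is 2**8 = 256.
# The factor belongs to the "type", not to any individual value.
FRACTIONAL_BITS = 8
SCALE = 1 << FRACTIONAL_BITS


def to_fixed(x: float) -> int:
    """Encode a real value as the nearest scaled integer."""
    return round(x * SCALE)


def to_real(raw: int) -> float:
    """Decode a stored integer back into the value it represents."""
    return raw / SCALE


raw = to_fixed(3.25)
print(raw, to_real(raw))  # 832 3.25
```

Only the integer 832 is stored; the factor 256 is implied by the format, which is what distinguishes this from a floating-point value that carries its own exponent.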
The upper bound of a fixed-point type is simply the upper bound of the underlying integer type, divided by the scaling factor. Similarly, the lower bound is the lower bound of the integer type, divided by the scaling factor. For example, a binary fixed-point type in two's complement format, with f fractional bits and a total of b bits, has a lower bound of $-2^{b-1} / 2^{f}$ and an upper bound of $(2^{b-1} - 1) / 2^{f}$.
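The bounds of a two's complement binary fixed-point type can be computed directly from the total bit count and the number of fractional bits. The function name fixed_point_bounds is hypothetical; this is a sketch of the rule stated above, not a library API.

```python
def fixed_point_bounds(b: int, f: int):
    """Return (lower, upper) for a two's complement type with b total bits
    and f fractional bits."""
    lowest_int = -(1 << (b - 1))        # lower bound of the underlying integer
    highest_int = (1 << (b - 1)) - 1    # upper bound of the underlying integer
    scale = 1 << f                      # scaling factor 2**f
    return lowest_int / scale, highest_int / scale


print(fixed_point_bounds(16, 8))  # (-128.0, 127.99609375)
```

For a 16-bit type with 8 fractional bits, the range is therefore -128 up to just under 128, in steps of 1/256.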
To add or subtract two fixed-point numbers, it is sufficient to add or subtract the underlying integers. When the same is done for multiplication or division, the result needs to be rescaled: for multiplication the result must be divided by the scaling factor, and for division it must be multiplied by it. To see this, suppose we want to multiply two real numbers a and b, stored as fixed-point numbers with scaling factor S. If we multiply the underlying integers, we obtain $aS \cdot bS = abS^{2}$. However, the value we want is $abS$, so we need to divide by $S$.
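The four operations can be sketched on raw integers as follows. The helper names (fx_add, fx_sub, fx_mul, fx_div) and the scaling factor of 256 are assumptions made for the example. Python's floor division is used for rescaling, which rounds toward negative infinity; other languages may round differently.

```python
SCALE = 1 << 8  # assumed scaling factor S = 256


def fx_add(a: int, b: int) -> int:
    return a + b                 # aS + bS = (a + b)S, no rescaling needed


def fx_sub(a: int, b: int) -> int:
    return a - b                 # aS - bS = (a - b)S, no rescaling needed


def fx_mul(a: int, b: int) -> int:
    return (a * b) // SCALE      # aS * bS = abS^2, so divide by S once


def fx_div(a: int, b: int) -> int:
    return (a * SCALE) // b      # aS / bS = a/b, so multiply by S first


x = 384   # represents 1.5  (1.5  * 256)
y = 576   # represents 2.25 (2.25 * 256)
print(fx_add(x, y) / SCALE)  # 3.75
print(fx_mul(x, y) / SCALE)  # 3.375
print(fx_div(y, x) / SCALE)  # 1.5
```

In fx_div the dividend is scaled before dividing rather than after, so that the fractional part of the quotient is not discarded by the integer division.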
Binary vs. decimal
The two most common fixed-point types are decimal and binary. Decimal fixed-point types have a scaling factor that is a power of ten; binary fixed-point types have a scaling factor that is a power of two.
Binary fixed-point types are most commonly used, because the rescaling operations can be implemented as fast bit shifts. Binary fixed-point numbers can represent fractional powers of two exactly, but, like binary floating-point numbers, cannot exactly represent fractional powers of ten. If exact fractional powers of ten are desired, then a decimal format should be used. For example, one-tenth (0.1) and one-hundredth (0.01) can be represented only approximately by binary fixed-point or binary floating-point representations, while they can be represented exactly in decimal fixed-point or decimal floating-point representations. These representations may be encoded in many ways, including BCD.
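The difference can be checked with exact rational arithmetic. The sketch below uses Python's standard fractions module; the helper stored_value and the two formats (16 fractional bits for binary, two decimal places for decimal) are assumptions chosen for illustration.

```python
from fractions import Fraction


def stored_value(x: Fraction, scale: int) -> Fraction:
    """Value actually represented after rounding x to a multiple of 1/scale."""
    raw = round(x * scale)        # underlying integer
    return Fraction(raw, scale)


tenth = Fraction(1, 10)

binary = stored_value(tenth, 2**16)    # binary format, 16 fractional bits
decimal = stored_value(tenth, 10**2)   # decimal format, 2 decimal places

print(binary == tenth)    # False
print(decimal == tenth)   # True
print(binary - tenth)     # 1/163840
```

Adding more fractional bits shrinks the binary error but never removes it, because one-tenth has no finite expansion in base 2.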
Notation
There are various notations used to represent word length and radix point in a binary fixed-point number. In these notations, f represents the number of fractional bits, m the number of magnitude or integer bits, s the number of sign bits, and b the total number of bits.
Precision loss and overflow
Because fixed-point operations can produce results that have more bits than the operands, information can be lost. For instance, the result of a fixed-point multiplication can have as many bits as the sum of the number of bits in the two operands. To fit the result into the same number of bits as the operands, the answer must be rounded or truncated, and the choice of which bits to keep is very important. When multiplying two fixed-point numbers with the same format, for instance with I integer bits and Q fractional bits, the answer can have up to 2I integer bits and 2Q fractional bits.
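The effect of the choice between truncation and rounding can be seen in a small sketch. It assumes a format with 8 fractional bits; the names FRAC_BITS, mul_truncate and mul_round are hypothetical. The raw product has twice as many fractional bits as the operands, and the right shift discards the lower half of them.

```python
FRAC_BITS = 8  # assumed number of fractional bits (Q)


def mul_truncate(a: int, b: int) -> int:
    """Drop the low FRAC_BITS bits of the double-width product."""
    return (a * b) >> FRAC_BITS


def mul_round(a: int, b: int) -> int:
    """Round to nearest by adding half a unit before dropping the bits."""
    half = 1 << (FRAC_BITS - 1)
    return (a * b + half) >> FRAC_BITS


a = 3     # represents 3/256
b = 150   # represents 150/256
# The full product is 450, with 16 fractional bits: 450/65536, about 1.76/256.
print(mul_truncate(a, b))  # 1
print(mul_round(a, b))     # 2
```

Truncation always errs in the same direction, so the error can accumulate over many operations, whereas rounding to nearest keeps each individual error within half a unit in the last place.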
Some operations, such as division, often have built-in result limiting, so that any positive overflow results in the largest number that can be represented in the current format. Likewise, negative overflow results in the most negative number that can be represented in the current format. This built-in limiting is often referred to as saturation.
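Saturation can be emulated in software by clamping the result to the range of the underlying integer type. The sketch below assumes a 16-bit two's complement format; saturate and sat_add are hypothetical names, and addition is used only because it is the simplest operation to show.

```python
BITS = 16                              # assumed width of the underlying integer
INT_MIN = -(1 << (BITS - 1))           # -32768
INT_MAX = (1 << (BITS - 1)) - 1        #  32767


def saturate(raw: int) -> int:
    """Clamp a raw result to the representable range."""
    return max(INT_MIN, min(INT_MAX, raw))


def sat_add(a: int, b: int) -> int:
    """Add two raw values, limiting the result instead of wrapping around."""
    return saturate(a + b)


print(sat_add(30000, 10000))    # 32767
print(sat_add(-30000, -10000))  # -32768
print(sat_add(100, 200))        # 300
```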
Some processors support a hardware overflow flag that can generate an exception on the occurrence of an overflow, but it is usually too late to salvage the proper result at this point.
Implementations
Very few computer languages include built-in support for fixed-point values, because for most applications binary or decimal floating-point representations are simpler to use and accurate enough. Floating-point representations are easier to use than fixed-point representations because they can handle a wider dynamic range and do not require programmers to specify the number of digits after the radix point. However, if they are needed, fixed-point numbers can be implemented even in programming languages such as C and C++, which do not commonly include such support.
A common use of fixed-point BCD numbers is for storing monetary values, where the inexact values of binary floating-point numbers are often a liability. Historically, fixed-point representations were the norm for decimal data types; for example, in PL/I or COBOL. The Ada programming language includes built-in support for both fixed-point (binary and decimal) and floating-point. JOVIAL and Coral 66 also provide both floating- and fixed-point types.