EAFixedPoint

Introduction

The EAFixedPoint module implements fixed point math via classic C macros and functions and via a more advanced implementation using C++ classes. The C++ classes constitute a fairly complete implementation of a fixed point C++ numerical data type that acts much like the built-in float and double data types.

The following code freely mixes the SFixed16_16 (signed 16:16 fixed point) data type with other numerical data types:

SFixed16_16 a(1), b(2), c(3);
float f(4.5f);
double d(3.2);
int i(6);
a = b * f;
a = (c / d) + b + f;
a = c / d * (b % i) + f / c;
a = i * -c / (b++);
a = sin(a) + pow(b, d) * sqrt(a);
a = log(a) / log(f);

Fixed point math has a number of uses:

Information about fixed point can be found on the Internet by simply searching for "fixed point" with your favorite search site.

Fixed point vs. floating point

Fixed point

+ Can be very fast. Fixed point math executes at the same speed as integer math. Fixed to int conversions are much faster than float to int conversions.
+ Executes concurrently with floating point math, due to its use of integer math.
– Limited range. Signed 16:16 fixed point numbers range from -32768 to just under +32768. The fractional part is accurate to 1 / 65536.
– Harder to code in high-level languages.
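
The range and resolution of a 16:16 value follow from its bit layout. The sketch below is not part of EAFixedPoint; it assumes a 16:16 value stored in an int32_t, and the names Fixed16, kOne, IntToFixed, FixedToInt and FixedToDouble are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not the EAFixedPoint API.
typedef int32_t Fixed16;                  // 16 integer bits, 16 fraction bits
const int     kFractionBits = 16;
const int32_t kOne = 1 << kFractionBits;  // the value 1.0 in 16:16 form (65536)

// int -> fixed: scale by 65536. Multiplying avoids left-shifting a negative value.
inline Fixed16 IntToFixed(int value)
{
    assert(value >= -32768 && value <= 32767); // must fit in 16 integer bits
    return (Fixed16)(value * kOne);
}

// fixed -> int: a single shift. Assumes an arithmetic right shift (guaranteed
// since C++20, and the behaviour of mainstream compilers before that), so the
// result rounds toward negative infinity.
inline int FixedToInt(Fixed16 value)
{
    return (int)(value >> kFractionBits);
}

// fixed -> double: undo the 65536 scale factor.
inline double FixedToDouble(Fixed16 value)
{
    return (double)value / 65536.0;
}
```

With these definitions, FixedToDouble(1) is 1/65536 (the smallest step), FixedToDouble(INT32_MAX) is just under 32768, and FixedToDouble(INT32_MIN) is exactly -32768.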

Floating point

+ Large range. The range for floating point numbers is typically in excess of 1e-100 to 1e+100, with an accuracy of about 13 decimal places.
+ Executes concurrently with integer math.
– Can be slower. Generally fast for addition and multiplication but may be slower for division and float to int conversions.

Fixed point precision

The C++ fixed point classes provide varying precision via the use of template parameter constants. EAFixedPoint provides the following predefined C++ fixed point types:

Type          Signed     Integral type (bits)   Precision
SFixed24_8    signed     32                     24 bits of integer, 8 bits of fraction
UFixed24_8    unsigned   32                     24 bits of integer, 8 bits of fraction
SFixed22_10   signed     32                     22 bits of integer, 10 bits of fraction
UFixed22_10   unsigned   32                     22 bits of integer, 10 bits of fraction
SFixed20_12   signed     32                     20 bits of integer, 12 bits of fraction
UFixed20_12   unsigned   32                     20 bits of integer, 12 bits of fraction
SFixed18_14   signed     32                     18 bits of integer, 14 bits of fraction
UFixed18_14   unsigned   32                     18 bits of integer, 14 bits of fraction
SFixed16_16   signed     32                     16 bits of integer, 16 bits of fraction
UFixed16_16   unsigned   32                     16 bits of integer, 16 bits of fraction
SFixed14_18   signed     32                     14 bits of integer, 18 bits of fraction
UFixed14_18   unsigned   32                     14 bits of integer, 18 bits of fraction
SFixed12_20   signed     32                     12 bits of integer, 20 bits of fraction
UFixed12_20   unsigned   32                     12 bits of integer, 20 bits of fraction
SFixed10_22   signed     32                     10 bits of integer, 22 bits of fraction
UFixed10_22   unsigned   32                     10 bits of integer, 22 bits of fraction
SFixed8_24    signed     32                     8 bits of integer, 24 bits of fraction
UFixed8_24    unsigned   32                     8 bits of integer, 24 bits of fraction

SFixed48_16   signed     64                     48 bits of integer, 16 bits of fraction
UFixed48_16   unsigned   64                     48 bits of integer, 16 bits of fraction
SFixed44_20   signed     64                     44 bits of integer, 20 bits of fraction
UFixed44_20   unsigned   64                     44 bits of integer, 20 bits of fraction
SFixed40_24   signed     64                     40 bits of integer, 24 bits of fraction
UFixed40_24   unsigned   64                     40 bits of integer, 24 bits of fraction
SFixed36_28   signed     64                     36 bits of integer, 28 bits of fraction
UFixed36_28   unsigned   64                     36 bits of integer, 28 bits of fraction
SFixed32_32   signed     64                     32 bits of integer, 32 bits of fraction
UFixed32_32   unsigned   64                     32 bits of integer, 32 bits of fraction
SFixed28_36   signed     64                     28 bits of integer, 36 bits of fraction
UFixed28_36   unsigned   64                     28 bits of integer, 36 bits of fraction
SFixed24_40   signed     64                     24 bits of integer, 40 bits of fraction
UFixed24_40   unsigned   64                     24 bits of integer, 40 bits of fraction
SFixed20_44   signed     64                     20 bits of integer, 44 bits of fraction
UFixed20_44   unsigned   64                     20 bits of integer, 44 bits of fraction
SFixed16_48   signed     64                     16 bits of integer, 48 bits of fraction
UFixed16_48   unsigned   64                     16 bits of integer, 48 bits of fraction
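
To illustrate how a template parameter can select the precision, here is a minimal sketch. It is not the EAFixedPoint implementation, whose templates may be organized quite differently; FixedSketch, its members, and the Sketch typedefs are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical sketch; not the EAFixedPoint templates.
// Storage is the underlying integer type, FractionBits the number of bits
// reserved for the fraction. The remaining bits hold the integer part.
template <typename Storage, int FractionBits>
struct FixedSketch
{
    Storage mValue; // raw scaled integer: real value * 2^FractionBits

    // The scale factor 2^FractionBits, computed in 64 bits so that
    // FractionBits of 32 or more does not overflow the shift.
    static double Scale()
    {
        return (double)((int64_t)1 << FractionBits);
    }

    static FixedSketch FromDouble(double d)
    {
        FixedSketch result;
        result.mValue = (Storage)(d * Scale());
        return result;
    }

    double AsDouble() const
    {
        return (double)mValue / Scale();
    }

    // Smallest representable step: 1 / 2^FractionBits.
    static double Resolution()
    {
        return 1.0 / Scale();
    }
};

typedef FixedSketch<int32_t, 8>  Sketch24_8;  // coarse steps, wide range
typedef FixedSketch<int32_t, 16> Sketch16_16;
typedef FixedSketch<int32_t, 24> Sketch8_24;  // fine steps, narrow range
```

Moving bits from the integer part to the fraction trades range for resolution: in this sketch Sketch24_8 steps in units of 1/256, while Sketch8_24 steps in units of 1/16777216 but holds only 8 integer bits.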

Example usage

To a large degree, you can use fixed point types the same way you would use floating point types.

Mixed-type math expressions (same as shown earlier):

SFixed16_16 a(1), b(2), c(3);
float f(4.5f);
double d(3.2);
int i(6);
a = b * f;
a = (c / d) + b + f;
a = c / d * (b % i) + f / c;
a = i * -c / (b++);
a = sin(a) + pow(b, d) * sqrt(a);
a = log(a) / log(f);

printf:

SFixed24_8 f = 23.5f;

printf("%f", f.AsFloat());

Logical expressions:

SFixed16_16 a = 20.4;
SFixed16_16 b = 130.6;
SFixed16_16 c = 223.3;

if((a < b) || (b >= c) || (a < 23.5))
    a *= 25;

Limitations

The primary differences between our fixed point type and a hypothetical built-in version are:

Interface

C interface:

typedef int32_t EAFixed16;
 
#define    EAMAX_FIXED16        0x7fffffff
#define    EAMIN_FIXED16        0x80000000

#define    EAFixed16ToInt(a)    ((int32_t)(a) >> 16)
#define    EAIntToFixed16(a)    ((EAFixed16)((a) << 16))
#define    EAFixed16ToDouble(a) (((double)a) / 65536.0)
#define    EADoubleToFixed16(a) ((EAFixed16)((a) * 65536.0))
#define    EAFixed16ToFloat(a)  (((float)a) / 65536.f)
#define    EAFloatToFixed16(a)  ((EAFixed16)((a) * 65536.f))
#define    EAFixed16Negate(a)   (-a)

EAFixed16  EAFixed16Mul         (EAFixed16 a, EAFixed16 b);
EAFixed16  EAFixed16Div         (EAFixed16 a, EAFixed16 b);
EAFixed16  EAFixed16DivSafe     (EAFixed16 a, EAFixed16 b);
EAFixed16  EAFixed16MulDiv      (EAFixed16 a, EAFixed16 b, EAFixed16 c);
EAFixed16  EAFixed16MulDivSafe  (EAFixed16 a, EAFixed16 b, EAFixed16 c);
EAFixed16  EAFixed16Mod         (EAFixed16 a, EAFixed16 b);
EAFixed16  EAFixed16ModSafe     (EAFixed16 a, EAFixed16 b);
EAFixed16  EAFixed16Abs         (EAFixed16 a);

C++ interface, by example of SFixed16_16. Note that nearly all the functions below are implemented as simple inlines:

struct SFixed16_16
{
    SFixed16_16();
    SFixed16_16(const SFixed16_16& value);
    SFixed16_16(const int& value);
    SFixed16_16(const unsigned int& value);
    SFixed16_16(const long& value);
    SFixed16_16(const unsigned long& value);
    SFixed16_16(const float& value);
    SFixed16_16(const double& value);

    void    FromFixed(const int& value);
    int32_t AsFixed();

    int           AsInt() const;
    unsigned int  AsUnsignedInt() const;
    long          AsLong() const;
    unsigned long AsUnsignedLong() const;
    float         AsFloat() const;
    double        AsDouble() const;

    SFixed16_16& operator=(const SFixed16_16& value);
    SFixed16_16& operator=(const int& value);
    SFixed16_16& operator=(const unsigned int& value);
    SFixed16_16& operator=(const long& value);
    SFixed16_16& operator=(const unsigned long& value);
    SFixed16_16& operator=(const float& value);
    SFixed16_16& operator=(const double& value);

    bool operator< (const SFixed16_16& value) const;
    bool operator> (const SFixed16_16& value) const;
    bool operator>=(const SFixed16_16& value) const;
    bool operator<=(const SFixed16_16& value) const;
    bool operator==(const SFixed16_16& value) const;
    bool operator!=(const SFixed16_16& value) const;
     
    bool operator< (const int& value) const;
    bool operator> (const int& value) const;
    bool operator>=(const int& value) const;
    bool operator<=(const int& value) const;
    bool operator==(const int& value) const;
    bool operator!=(const int& value) const;
     
    bool operator< (const unsigned int& value) const;
    bool operator> (const unsigned int& value) const; 
    bool operator>=(const unsigned int& value) const;
    bool operator<=(const unsigned int& value) const;
    bool operator==(const unsigned int& value) const;
    bool operator!=(const unsigned int& value) const;
     
    bool operator< (const long& value) const;
    bool operator> (const long& value) const;
    bool operator>=(const long& value) const;
    bool operator<=(const long& value) const;
    bool operator==(const long& value) const;
    bool operator!=(const long& value) const;
     
    bool operator< (const unsigned long& value) const;
    bool operator> (const unsigned long& value) const;
    bool operator>=(const unsigned long& value) const;
    bool operator<=(const unsigned long& value) const;
    bool operator==(const unsigned long& value) const;
    bool operator!=(const unsigned long& value) const;

    bool operator< (const float& value) const;
    bool operator> (const float& value) const;
    bool operator>=(const float& value) const;
    bool operator<=(const float& value) const;
    bool operator==(const float& value) const;
    bool operator!=(const float& value) const;
     
    bool operator< (const double& value) const;
    bool operator> (const double& value) const;
    bool operator>=(const double& value) const;
    bool operator<=(const double& value) const;
    bool operator==(const double& value) const;
    bool operator!=(const double& value) const;
    bool operator! () const;
     
    SFixed16_16 operator~() const;
    SFixed16_16 operator-() const;
    SFixed16_16 operator+() const;

    SFixed16_16& operator+=(const SFixed16_16& value);
    SFixed16_16& operator+=(const int& value);
    SFixed16_16& operator+=(const unsigned int& value);
    SFixed16_16& operator+=(const long& value);
    SFixed16_16& operator+=(const unsigned long& value);
    SFixed16_16& operator+=(const float& value);
    SFixed16_16& operator+=(const double& value);

    SFixed16_16& operator-=(const SFixed16_16& value);
    SFixed16_16& operator-=(const int& value);
    SFixed16_16& operator-=(const unsigned int& value);
    SFixed16_16& operator-=(const long& value);
    SFixed16_16& operator-=(const unsigned long& value);
    SFixed16_16& operator-=(const float& value);
    SFixed16_16& operator-=(const double& value);

    SFixed16_16& operator*=(const SFixed16_16& value);
    SFixed16_16& operator*=(const int& value);
    SFixed16_16& operator*=(const unsigned int& value);
    SFixed16_16& operator*=(const long& value);
    SFixed16_16& operator*=(const unsigned long& value);
    SFixed16_16& operator*=(const float& value);
    SFixed16_16& operator*=(const double& value);

    SFixed16_16& operator/=(const SFixed16_16& value);
    SFixed16_16& operator/=(const int& value);
    SFixed16_16& operator/=(const unsigned int& value);
    SFixed16_16& operator/=(const long& value);
    SFixed16_16& operator/=(const unsigned long& value);
    SFixed16_16& operator/=(const float& value);
    SFixed16_16& operator/=(const double& value);

    SFixed16_16& operator%=(const SFixed16_16& value);
    SFixed16_16& operator%=(const int& value);
    SFixed16_16& operator%=(const unsigned int& value);
    SFixed16_16& operator%=(const long& value);
    SFixed16_16& operator%=(const unsigned long& value);
    SFixed16_16& operator%=(const float& value);
    SFixed16_16& operator%=(const double& value);

    SFixed16_16& operator|=(const SFixed16_16& value);
    SFixed16_16& operator|=(const int& value);

    SFixed16_16& operator&=(const SFixed16_16& value);
    SFixed16_16& operator&=(const int& value);

    SFixed16_16& operator^=(const SFixed16_16& value);
    SFixed16_16& operator^=(const int& value);

    SFixed16_16 operator<<(int numBits) const;
    SFixed16_16 operator>>(int numBits) const;

    SFixed16_16& operator<<=(int numBits);
    SFixed16_16& operator>>=(int numBits);

    SFixed16_16& operator++();
    SFixed16_16& operator--();
    SFixed16_16  operator++(int);
    SFixed16_16  operator--(int);

    SFixed16_16  Abs();
    SFixed16_16  DivSafe(const SFixed16_16& denominator);
    SFixed16_16& DivSafeAssign(const SFixed16_16& denominator);
};
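
To give a sense of why most of these members can be simple inlines, here is a miniature 16:16 class. It is a sketch only: MiniFixed16_16 is a hypothetical type, a few member names mirror the listing above, and the bodies are assumptions rather than the EAFixedPoint source.

```cpp
#include <cstdint>

// Hypothetical miniature of a 16:16 class; not the EAFixedPoint source.
struct MiniFixed16_16
{
    int32_t mValue; // raw 16:16 bits: real value * 65536

    MiniFixed16_16() : mValue(0) {}
    MiniFixed16_16(const int& value)    : mValue(value * 65536) {}
    MiniFixed16_16(const double& value) : mValue((int32_t)(value * 65536.0)) {}

    double AsDouble() const { return (double)mValue / 65536.0; }

    // Addition and comparison operate directly on the raw integers,
    // because both sides share the same scale factor.
    MiniFixed16_16& operator+=(const MiniFixed16_16& value)
    {
        mValue += value.mValue;
        return *this;
    }

    bool operator<(const MiniFixed16_16& value) const
    {
        return mValue < value.mValue;
    }

    // Comparing against a double converts this value rather than the double,
    // so the double is not truncated to 16 fraction bits first.
    bool operator<(const double& value) const
    {
        return AsDouble() < value;
    }

    // Increment adds 1.0, which is 65536 in raw form.
    MiniFixed16_16& operator++()
    {
        mValue += 65536;
        return *this;
    }

    MiniFixed16_16 operator++(int)
    {
        MiniFixed16_16 old(*this);
        mValue += 65536;
        return old;
    }
};
```

Each body is one or two integer operations, which is why a compiler can typically reduce expressions on such a type to plain integer arithmetic.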