Paradox Community



Subject: TIP:PdoxWin:Floating Point Arithmetic:2001.05.29

Version 1.0 (2001.05.29)
edited by Paradox FAQ Team
Many thanks to Alberto Squassabia of HP.

====================
0. Introduction
====================

This FAQ addresses the matter of working with real numbers
when one needs to determine equality or inequality.

-------------------------------
 0.1 Legal Info and Disclaimers
-------------------------------

Paradox is a trademark of Corel.
Borland Database Engine (BDE) is a trademark of Inprise.

The information provided in this FAQ is provided "as is"
and is not warranted in any way. The information provided
in this FAQ is not endorsed or authorized by Corel or
Inprise in any shape, form, or manner.

The editors claim NO responsibility for ANY illegal
activity regarding this file, or as a result of someone
reading this file.

You may distribute this file, as long as the copies are
complete, unaltered, and are in electronic form only.

-------------
 0.2 Feedback
-------------

Please send feedback in a Corel Paradox newsgroup or the
news:comp.databases.Paradox newsgroup, addressed to any of
the FAQ Team members listed in the "FAQ: FAQ FAQ" document.
Please preface the subject of your post with the string
"PDXWIN FAQ" to alert Team members to the purpose of the
message.

Please specify the FAQ name and section number the
comment applies to, if any.

==============================
1. General Information
==============================
This FAQ is derived from a paper by Alberto Squassabia of 
Hewlett-Packard. Alberto Squassabia is a software engineer 
with Hewlett-Packard and works for CSL in Fort Collins. 
He can be contacted at alsq@fc.hp.com. Please note that Mr. 
Squassabia is not familiar with ObjectPAL; his normal
programming language is C++.

The original paper can be found at

http://www.creport.com/html/from_pages/view_recent_articles_c.cfm?ArticleID=396

and is couched in the programming language familiar to 
most of his expected readers, C++.

On reading it one is made aware of some of the
shortcomings of that language, and, as it is on a subject
that is of interest to programmers of database systems, we
have transposed it to use ObjectPAL.

--------------------------
1.1 The Number Type in ObjectPAL
--------------------------

The following is the definition of the Number type in
ObjectPAL.

Number variables represent floating-point values consisting 
of a significand (fractional portion, for example, 3.224) 
multiplied by a power of 10. The significand contains up to
18 significant digits, and the power of 10 ranges from 
± 3.4E-4930 to ± 1.1E4930. Assigning values outside of this 
range to a Number variable causes an error.

This definition (18 significant digits, powers of ten up to
about ±4930) describes 80-bit extended-precision numbers.
Paradox has always stored 64-bit numbers; until the
transition from Borland C++ to Microsoft C++, ObjectPAL used
80-bit numbers internally, performing conversions as the
values moved between internal storage and disk.

While the ObjectPAL 9 Help claims 80-bit numbers, 
information from Corel indicates that ObjectPAL now uses
64-bit numbers, and therefore the ranges are as follows:

Negative numbers from a large negative value of
-1.797693134862316 E+308
to a small value approaching zero of
-2.225073858507201 E-308
Positive numbers start at
2.225073858507201 E-308
and continue to
1.797693134862316 E+308
Zero is also included within this range.
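As a cross-check, the following C++ sketch reads these limits from the standard library. It assumes, per the Corel information above, that a Number is an IEEE 754 64-bit double; C++ is used because it is the language of Alberto's original paper, and the names NumberLimits and numberLimits are made up for the illustration.

```cpp
#include <limits>

// Limits of an IEEE 754 64-bit double, assumed here to match ObjectPAL's Number.
struct NumberLimits {
    double largest;         // largest finite value
    double smallestNormal;  // smallest positive normal value
    int decimalDigits;      // decimal digits that can be stored and recovered unchanged
};

// Hypothetical helper: collect the limits from std::numeric_limits<double>.
NumberLimits numberLimits() {
    using L = std::numeric_limits<double>;
    return {L::max(), L::min(), L::digits10};
}
```

Note that digits10 is 15 for a double, not the 18 digits the Help describes, and that min() is the smallest normal value: IEEE doubles also have subnormal values below it, which the range listed above does not include.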

--------------------------
1.2 Notes on Real Values
--------------------------

In the computing world there are two kinds of numerical 
value:

a. Integer
b. Real

...........................
1.2.1 Integers
...........................
Integers can be divided into "native integers" and
Binary Coded Decimals (BCD), but for our purposes here we
shall ignore BCD. An Integer has two properties that define it:

a. Step
b. Range

For an Integer, the Step is always 1. That is to say, the
result of subtracting two adjacent Integers is always 1,
no matter what the Range.
The Range of an Integer depends entirely on the storage
size: it runs from the smallest to the largest number
representable by that Integer.

If you allocate 16 bits to store an integer, you can use
all 16 for the magnitude and have a range of 65536 values,
from 0 to 65535. On the other hand, you can also use one bit
to determine whether the number is negative and still have
a range of 65536 values, from -32768 to +32767 (0 falls on
the non-negative side, which is why there is one more
negative value than positive).

Paradox offers smallInts (signed 16-bit integers) and
longInts (signed 32-bit integers). LongInts have a range of
4294967296 values (from -2147483648 to +2147483647). Again,
the Step is always 1.
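The integer ranges above can be checked the same way. This C++ sketch assumes smallInt and longInt behave like the fixed-width int16_t and int32_t types (an assumption; the FAQ only gives their sizes), and the helper intRange is hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Smallest value, largest value and number of distinct values of an integer type.
struct IntRange {
    long long lowest;
    long long highest;
    long long count;  // highest - lowest + 1
};

// Hypothetical helper: read the limits of integer type I from the standard library.
template <typename I>
IntRange intRange() {
    const long long lo = std::numeric_limits<I>::min();
    const long long hi = std::numeric_limits<I>::max();
    return {lo, hi, hi - lo + 1};
}
```

intRange<std::uint16_t>() gives 0 to 65535, intRange<std::int16_t>() gives -32768 to 32767, and intRange<std::int32_t>() gives -2147483648 to 2147483647; the counts (65536, 65536 and 4294967296) match the Ranges given above.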

There is one very interesting thing about Integers,
something so obvious that most of us miss it completely:
Integers cannot represent most values! In fact, their step
is huge, and nothing can be represented between one value
and the next. There is, for example, nothing between 1 and
2 for an integer!

...........................
1.2.2 Numbers
...........................

Numbers are very different from Integers. A Number can
also be defined by Step and Range. The Paradox Help states
that the Range of a Number is ± 3.4E-4930 to ± 1.1E4930.
What do these strange little "±" signs mean? Well, they're
math-speak for "plus or minus": the same range applies to
negative values as well as positive ones. Anyhow, these are
pretty impressive numbers!

However, another part of the spec says that a Number is
composed of two parts, which we shall call the Significand
and the Operand (the exponent, that is, the power of ten).

The Number, then, is made up of a significand (fractional 
portion, for example, 3.224) multiplied by a power of 10. 
The significand contains up to 18 significant digits.

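Eighteen decimal digits need about 60 bits on their own
(10 to the 18 is roughly 2 to the 60), so the Significand
takes up most of the storage and our Operand (the
power-of-ten part) gets a lot less than 64 bits!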

Unlike Integers, Real Numbers have two interesting
properties, and these are rather important, because they
bite us rather regularly!

a. The Step, while a lot finer than that of an Integer,
does actually have a finite width. This means that there
are some values that just can't be represented by a Real
Number!

b. The Step is not constant! Notice that the value is
defined by a Significand and an Operand. The same set of
Significand values is spread across the interval covered
by each separate value of the Operand, and those intervals
grow ten times wider each time the Operand goes up by one.

If the 18 digits of the Significand are spread between 10 to
the power 1 and 10 to the power 2, then they cover the values
between 10 and 100, an integer range of 90. There are roughly
10 to the 18 Significand values to cover that range, so the
Coverage is about 10 to the 16 values per Integer Step.
Pretty dense! Even then they don't get every value, as we
notice when we look at the representation of 1/3!

However, now let's look at the coverage for larger
numbers. Let's take the range between 10 to the 30 and 10
to the 31. Again, the Operand value has been incremented
by just one to provide the range limits, and again we have
roughly 10 to the 18 numbers to provide the Coverage.
However, the difference between 10 to the 30 and 10 to the
31 is not 90, but 9 times 10 to the 30! Coverage is now
spread so thinly that there is a gap of about 10 to the 13
between adjacent points!
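The growing Step can be observed directly. This C++ sketch assumes a Number behaves like a 64-bit IEEE double and measures the gap to the next representable value with std::nextafter; stepAt is a hypothetical helper. Because doubles use a power of 2 rather than a power of 10, the gaps differ from the 18-digit decimal estimates above, but they grow with magnitude in the same way.

```cpp
#include <cmath>
#include <limits>

// The "Step" at x: distance from x to the next larger representable double.
double stepAt(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

stepAt(10.0) is 2 to the -49 (about 1.8E-15), while stepAt(1.0e30) is 2 to the 47 (about 1.4E14): near 10 to the 30, adjacent doubles are more than a hundred trillion apart.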

OK. Now that I've scared you with numbers, here's Alberto's paper.

==============================
2. Comparing Numbers
==============================

How To Determine if Floating Quantities Are Close Enough 
Once a Tolerance Has Been Reached

UNLIKE INTEGERS, which are either different or equal, 
floating point numbers may be different, equal, or 
"close enough." This is a down-to-earth practitioner's 
discussion on how to determine, in a robust manner, whether 
or not floating quantities are close enough, once a
tolerance (the meaning of "close enough") has been assigned. 
The goal is a comparison procedure that is correct more 
often than not and reasonably efficient. Floating point 
jargon is taken from C/C++, and float means a 32-bit IEEE 
float. All reasoning can be extended to double precision 
and other floating point quantities.

While the inequality of floating point numbers is easily 
established using operators > or <, "close enough," or 
fuzzy equality requires more work. Consider the following 
numbers:


var 
   nuOne        Number
   nuTwo        Number
endVar
   nuOne = 0.123456
   nuTwo = 0.123457

Are nuOne and nuTwo equal? Strictly speaking, (nuOne = nuTwo) 
will evaluate as False, but are nuOne and nuTwo close enough 
to be considered equal? This decision requires some knowledge 
of the problem domain, usually represented by a tolerance.
Let's call it nuFeps for "float epsilon". The tolerance is 
related to how many significant digits must match so that 
two numbers are close enough to be, for all practical 
purposes, equal.

Since IEEE floats have at least six significant digits in
decimal notation, this discussion will arbitrarily allow
losing approximately one digit to error and will require
five matching digits for equality by establishing a
tolerance like

var
   nuFeps       Number
endVar
   nuFeps = 0.00001

Thus, 

( nuFeps > (nuOne - nuTwo) ) 

will return true, indicating that nuOne and nuTwo are equal,
as will

( nuFeps > (nuTwo - nuOne) )
 
The "close enough" equality test has become an inequality 
test against a difference. 

However, if you consider:


var
   nuThree      Number
   nuFour       Number
endVar
   nuThree = 0.123456
   nuFour = -0.123457


( nuFeps > (nuThree - nuFour) )
 
will evaluate as

( nuFeps > 0.246913 )

which is false, while 

( nuFeps > (nuFour - nuThree) )

will evaluate as

( nuFeps > -0.246913 )
 
which is incorrectly true: oops!

Equality (and inequality as well) is symmetric: 

if  ( a <> b ) is true, then ( b <> a ) must also be true. 

Fix:

( nuFeps > abs(nuFour - nuThree) )

will return false and so will 

( nuFeps > abs(nuThree - nuFour) )

at the cost of one abs() per comparison. Turning a comparison 
into an inequality against a difference, however, has 
shortcomings. 
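Before looking at those shortcomings, the asymmetry of the plain difference test, and the abs() fix, can be reproduced in a few lines. This is a C++ sketch, not ObjectPAL; double stands in for Number, and the function names are hypothetical.

```cpp
#include <cmath>

// Plain difference test: order-dependent, because a large negative
// difference is always "smaller" than the tolerance.
bool closeBySignedDiff(double a, double b, double eps) {
    return eps > (a - b);
}

// abs() version: symmetric, same answer for (a, b) and (b, a).
bool closeByAbsDiff(double a, double b, double eps) {
    return eps > std::fabs(a - b);
}
```

With nuFeps = 0.00001, closeBySignedDiff(0.123456, -0.123457, ...) is false but closeBySignedDiff(-0.123457, 0.123456, ...) is true, while closeByAbsDiff gives false both ways.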

var
   nuBig1        Number
   nuBig2        Number
   nuSmall1      Number
   nuSmall2      Number
   nuSmall3      Number
   nuSmall4      Number
endVar
   nuBig1 = 1.23456e28
   nuBig2 = 1.23457e28
   nuSmall1 = 1.23456e-10
   nuSmall2 = 3.45678e-11
   nuSmall3 =  0.000004
   nuSmall4 = -0.000004

Are nuBig1 and nuBig2 equal? Their difference is a quantity 
of the order of 1.0e23, or, in U.S. dollars, much larger 
than the net worth of Bill Gates. If both are the result of 
computations involving thousands of floating point operations,
however, nuBig1 and nuBig2 are most likely the same number. 
Yet 

( nuFeps > abs(nuBig1 - nuBig2) )

will return false. How about setting nuFeps = 2.0e23?
Just kidding! There's more; look at the next pair of numbers.
Here, nuSmall1 and nuSmall2 are truly different, and 
generously bigger than the smallest IEEE float larger than 
zero. However, 

( nuFeps > abs(nuSmall1 - nuSmall2) )

returns true so that nuSmall1 and nuSmall2 incorrectly appear 
to be equal. The same error happens if the final pair of
small values (nuSmall3 and nuSmall4) is used. Of course, you
can always change nuFeps to a suitably smaller value. Or can
you? Let:

nuFeps = 1.0e-10

Now 

( nuFeps > abs(nuSmall3 - nuSmall4) )

is false, so they're different. But 

( nuFeps > abs(nuOne - nuTwo) )

is also false, <sigh>. An nuFeps this small tests for too 
many significant digits (more than a float can afford).
Consequently, it is too often as persnickety as the = 
operator. In addition,
 
( nuFeps > abs(nuSmall1 - nuSmall2) )

is still incorrectly true; the same happens with

( nuFeps > abs(nuBig1 - nuBig2) )

which is still (incorrectly) false.

Here is the point: a tolerance tested for inequality against
a difference is a practical way to test "close enough" 
equality to zero. However, this method cannot account for 
"close enough" equality of two float numbers over their 
entire dynamic range.
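To see that no single tolerance rescues the difference test, this C++ sketch runs every pair above through abs(a - b) < eps and counts the wrong verdicts, taking as "right" the verdicts argued for in the text (nuOne and nuTwo equal, nuBig1 and nuBig2 equal, the small pairs different). double stands in for Number; closeByAbsDiff, kPairs and wrongVerdicts are hypothetical names.

```cpp
#include <cmath>

// Absolute-difference test from the text: "close enough" means |a - b| < eps.
bool closeByAbsDiff(double a, double b, double eps) {
    return eps > std::fabs(a - b);
}

// The pairs discussed in the text, with the verdict the text argues for.
struct Pair { double a, b; bool shouldMatch; };
const Pair kPairs[] = {
    {0.123456,    0.123457,    true},   // nuOne,    nuTwo
    {1.23456e28,  1.23457e28,  true},   // nuBig1,   nuBig2
    {1.23456e-10, 3.45678e-11, false},  // nuSmall1, nuSmall2
    {0.000004,   -0.000004,    false},  // nuSmall3, nuSmall4
};

// How many of those pairs a single tolerance gets wrong.
int wrongVerdicts(double eps) {
    int wrong = 0;
    for (const Pair& p : kPairs)
        if (closeByAbsDiff(p.a, p.b, eps) != p.shouldMatch) ++wrong;
    return wrong;
}
```

Both wrongVerdicts(0.00001) and wrongVerdicts(1.0e-10) come out at 3 out of 4: tightening the tolerance merely trades one set of wrong answers for another.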

To test fuzzy equality over the possible range of two float
numbers, the ratio (vs. the absolute difference) of the 
numbers must be "close enough" to unity (vs. smaller than 
nuFeps). Taking a ratio for a comparison is perhaps expensive 
in some cases, but is usually better than returning an 
incorrect result.

The value of abs(nuOne/nuTwo) is ~0.999992. 
That of abs(nuTwo/nuOne) is ~1.000008; 
1+nuFeps is 1.00001, and 1-nuFeps is 0.99999. 
A seemingly correct ratio-test predicate such as:

(((1-nuFeps) < abs(nuN/nuD)) and (abs(nuN/nuD) < (1+nuFeps)))

where nuN=numerator and nuD=denominator is true for nuN < nuD 
or nuN>nuD indifferently so long as nuN is "close enough" to 
nuD.

The value of abs(nuBig1/nuBig2) is the same as
abs(nuOne/nuTwo). The value of abs(nuSmall1/nuSmall2) is
about 3.57142, that of abs(nuSmall2/nuSmall1) is about
0.280001; the ratio-test predicate is correctly false for
both, indicating that nuSmall1 and nuSmall2 are different.

Here is the catch: the value of abs(nuThree/nuFour) is
~0.999992, that of abs(nuFour/nuThree) is ~1.000008 and the
value of abs(nuSmall3/nuSmall4) is 1.0: in all three of
these cases the seemingly correct ratio-test predicate is
incorrectly true.
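The "seemingly correct" ratio test with abs() can be sketched in C++ as follows (double standing in for Number; closeByAbsRatio is a hypothetical name). It accepts nuOne/nuTwo and nuBig1/nuBig2 and rejects nuSmall1/nuSmall2, but it also accepts the opposite-sign pairs, which is the catch just described.

```cpp
#include <cmath>

// Ratio test with abs(): true when |n/d| lies strictly inside (1 - eps, 1 + eps).
// Discarding the sign is exactly what lets opposite-sign pairs through.
bool closeByAbsRatio(double n, double d, double eps) {
    const double r = std::fabs(n / d);
    return (1.0 - eps) < r && r < (1.0 + eps);
}
```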

But wait: abs is a carryover from the difference-test
predicate. In fact, the ratio-test must not use abs, so the 
ratio must be signed: (-)/(-) or (+)/(+) is positive, hence 
potentially close to positive 1.0 and capable of scoring 
true, but (-)/(+) or (+)/(-) is negative and automatically 
disqualified from scoring true, as it should be (there are 
exceptions as noted later on).

The correct predicate for the ratio-test is then:

(((1-nuFeps) < (nuN/nuD)) and ((nuN/nuD) < (1+nuFeps)))
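Dropping abs() gives the signed ratio test. A C++ sketch under the same assumptions (double for Number, hypothetical function name):

```cpp
// Signed ratio test: opposite signs give a negative ratio, which can never
// fall inside (1 - eps, 1 + eps).
bool closeBySignedRatio(double n, double d, double eps) {
    const double r = n / d;
    return (1.0 - eps) < r && r < (1.0 + eps);
}
```

Opposite-sign pairs such as nuThree and nuFour, or nuSmall3 and nuSmall4, now produce a negative ratio and are rejected, while nuOne and nuTwo still pass.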

Now the value of (nuSmall3/nuSmall4) is -1.0 and the
predicate is false, which is correct. The same is true for
nuThree and nuFour. But don't breathe a sigh of relief just
yet! Now, let:

var
   nuFive        Number
   nuSix         Number
endVar
   nuFive = 1.0e36
   nuSix = 1.0e-4

In this case, nuN/nuD can be either 1.0e40 (oops: IEEE float
overflow!), or 1.0e-40 (oops: IEEE float underflow!).
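The overflow and underflow can be reproduced in C++ with 32-bit float, the type the original paper discusses (a 64-bit Number can hold 1.0e40 without trouble). ratio and describe are hypothetical helpers, and the results assume a typical IEEE 754 platform.

```cpp
#include <cmath>

// Divide in 32-bit float, as in the original paper's examples.
float ratio(float n, float d) {
    return n / d;
}

// Name the kind of value a float division produced.
const char* describe(float x) {
    switch (std::fpclassify(x)) {
        case FP_INFINITE:  return "overflow: infinity";
        case FP_SUBNORMAL: return "underflow: subnormal, precision lost";
        case FP_ZERO:      return "zero";
        case FP_NORMAL:    return "normal";
        default:           return "not a number";
    }
}
```

describe(ratio(1.0e36f, 1.0e-4f)) reports an overflow to infinity, while ratio(1.0e-4f, 1.0e36f) yields a subnormal value below the smallest normal float, or plain zero if the platform flushes subnormals to zero, which is why the result may or may not come out as 0.0F.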

Perhaps 1.0e-40 will be interpreted as 0.0F, and perhaps not.
Almost certainly 1.0e40 will give your code indigestion.
Here is the fix: before taking the ratio, check for overflow 
or underflow. Here is how.

The predicate (taking nuN and nuD as positive, with maxNumber
standing for the largest representable value)

if ( nuD < 1.0 ) then
   if ( nuN > nuD * maxNumber ) then

is true (and safe), even if nuN/nuD would overflow; if true,
obviously the equality is false. The test ( nuD < 1.0 ) is
needed because nuD * maxNumber may otherwise overflow.

The predicate (with minNumber standing for the smallest
positive normal value)

if ( nuD > 1.0 ) then
   if ( nuN < nuD * minNumber ) then

is true (and safe), even if nuN/nuD would underflow.
If ( nuD > 1.0 ), then ( nuD * minNumber ) will not underflow.
So long, numbers!
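Putting the pieces together, here is a C++ sketch of a complete "close enough" test: the signed ratio against 1 ± nuFeps, guarded against overflow and underflow before dividing. It is an illustration, not Alberto's code, and it makes assumptions the text does not: the guards above are written for positive values, so the sketch rejects opposite signs first and then applies the guards to magnitudes; and zero, which the text does not cover, is treated as close only to zero. closeEnough is a hypothetical name, and the template works for float and for double (standing in for Number).

```cpp
#include <cmath>
#include <limits>
#include <type_traits>

// Fuzzy equality via a signed ratio, with the overflow/underflow guards
// from the text applied before the division. T is float or double.
template <typename T>
bool closeEnough(T n, T d, T eps) {
    static_assert(std::is_floating_point<T>::value, "floating-point types only");
    if (n == d) return true;                     // exact match, including 0 == 0
    if (n == T(0) || d == T(0)) return false;    // assumption: zero is close only to zero
    if ((n < T(0)) != (d < T(0))) return false;  // opposite signs: ratio would be negative
    const T an = std::fabs(n);                   // same sign, so the signed ratio
    const T ad = std::fabs(d);                   // equals the ratio of magnitudes
    if (ad < T(1) && an > ad * std::numeric_limits<T>::max())
        return false;                            // an/ad would overflow
    if (ad > T(1) && an < ad * std::numeric_limits<T>::min())
        return false;                            // an/ad would underflow
    const T r = an / ad;
    return (T(1) - eps) < r && r < (T(1) + eps);
}
```

For example, closeEnough(1.0e36f, 1.0e-4f, 1.0e-5f) and closeEnough(1.0e-4f, 1.0e36f, 1.0e-5f) both return false without ever computing the out-of-range ratio.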

The original paper goes on to consider larger number
types and the dangers of mixed-mode arithmetic. However,
ObjectPAL programmers need not be concerned with this.




 The information provided on this Web site is not in any way sponsored or endorsed by Corel Corporation.
 Paradox is a registered trademark of Corel Corporation.


 Modified: 15 May 2003


 Copyright © 2001- 2003 Paradox Community. All rights reserved. 
 Company and product names are trademarks or registered trademarks of their respective companies. 
 Authors hold the copyrights to their own works. Please contact the author of any article for details.