Subject: TIP:PdoxWin:Floating Point Arithmetic:2001.05.29

Version 1.0 (2001.05.29)
edited by Paradox FAQ Team

Many thanks to Alberto Squassabia of HP.

====================
0. Introduction
====================

This FAQ addresses the matter of working with real numbers when one needs to determine equality or inequality.

-------------------------------
0.1 Legal Info and Disclaimers
-------------------------------

Paradox is a trademark of Corel.
Borland Database Engine (BDE) is a trademark of Inprise.

The information provided in this FAQ is provided "as is" and is not warranted in any way. The information provided in this FAQ is not endorsed or authorized by Corel or Inprise in any shape, form, or manner. The editors claim NO responsibility for ANY illegal activity regarding this file, or as a result of someone reading this file.

You may distribute this file, as long as the copies are complete, unaltered, and are in electronic form only.

-------------
0.2 Feedback
-------------

Please send feedback in a Corel Paradox newsgroup or the news:comp.databases.Paradox newsgroup to any of the FAQ Team mentioned in the "FAQ: FAQ FAQ" document. Please preface the subject of your post with the string "PDXWIN FAQ" to alert Team members to the function of the message. Please specify the FAQ name and section number the comment applies to, if any.

==============================
1. General Information
==============================

This FAQ is derived from a paper by Alberto Squassabia of Hewlett-Packard. Alberto Squassabia is a software engineer with Hewlett-Packard and works for CSL in Fort Collins. He can be contacted at alsq@fc.hp.com. Please note that Mr. Squassabia is not familiar with ObjectPAL; his normal programming language is C++.

The original paper can be found at

   http://www.creport.com/html/from_pages/view_recent_articles_c.cfm?ArticleID=396

and is couched in the programming language familiar to most of his expected readers, C++.
On reading it one is made aware of some of the shortcomings of that language, and, as it is on a subject that is of interest to programmers of database systems, we have transposed it to use ObjectPAL.

--------------------------
1.1 The Number Type in ObjectPAL
--------------------------

The following is the definition of the Number type in ObjectPAL.

   Number variables represent floating-point values consisting of a significand (fractional portion, for example, 3.224) multiplied by a power of 10. The significand contains up to 18 significant digits, and the power of 10 ranges from ± 3.4E-4930 to ± 1.1E4930. Assigning values outside of this range to a Number variable causes an error.

Taken at face value, this definition says that ObjectPAL uses 80-bit numbers. Paradox itself has always used 64-bit numbers; up until the transition from Borland C++ to Microsoft C++, ObjectPAL used 80-bit numbers internally, performing conversions as the values moved between internal storage and disk.

While the ObjectPAL 9 Help claims 80-bit numbers, information from Corel indicates that ObjectPAL now uses 64-bit numbers, and therefore the ranges are as follows:

   Negative numbers run from a large negative value of -1.797693134862316 E+308 to a small value approaching zero of -2.225073858507201 E-308.

   Positive numbers start at 2.225073858507201 E-308 and continue to 1.797693134862316 E+308.

   Zero is also included within this range.

--------------------------
1.2 Notes on Real Values
--------------------------

In the computing world there are two kinds of numerical value:

a. Integer
b. Real

...........................
1.2.1 Integers
...........................

Integers can be divided into "native integers" and Binary Coded Decimals, but for our purposes here we shall ignore BCD.

An Integer has two properties that define it:

a. Step
b. Range

For an Integer, the Step is always 1. That is to say, the result of subtracting two adjacent Integers is always 1, no matter what the Range.
The Range of an Integer depends entirely on the storage size: the Range is defined as the number of distinct values between the smallest and the largest numbers representable by that Integer. If you allocate 16 bits to store an integer, you can use all 16 and have a range of 65536, from 0 to 65535. On the other hand, you can also use one bit to determine whether the number is negative and still have a range of 65536, from -32768 to +32767 (0 is counted as a positive number).

Paradox offers smallInts (signed 16-bit integers) and longInts (signed 32-bit integers). LongInts have a range of 4294967296 (well, from -2147483648 to +2147483647). Again, the Step is always 1.

There is one very interesting thing about Integers; something so obvious that most of us miss it completely. That is that Integers cannot represent most values! In fact, they have a huge step value, and between the point of one value and the next nothing can be represented. There is, for example, nothing between 1 and 2 for an integer!

...........................
1.2.2 Numbers
...........................

Numbers are very different from Integers. A Number can also be defined by Step and Range.

The Paradox Help states that the Range of a Number is ± 3.4E-4930 to ± 1.1E4930. What do these strange little "±" signs mean? Well, they're math-speak for "plus or minus": the same magnitudes are available on both the positive and the negative side of zero. Anyhow, these are pretty impressive numbers!

However, another part of the spec says that a Number is composed of two parts, the Significand and the Operand (the power of ten, more commonly called the exponent).

The Number, then, is made up of a significand (fractional portion, for example, 3.224) multiplied by a power of 10. The significand contains up to 18 significant digits. This tells us that our Operand (the power-of-ten bit) is taking up a lot less than 64 bits!

Unlike Integers, Real Numbers have two interesting properties, and these are rather important, because they bite us rather regularly!

a. The Step, while seeming a lot finer than that of an Integer, does actually have a finite width.
This means that there are some values that just can't be represented by a Real Number!

b. The Step is not constant!

Notice that the value is defined by a Significand and an Operand. The number of values provided by the Significand is spread across each separate value of the Operand.

If the 18 digits of the Significand are spread between 10 to the power 1 and 10 to the power 2, then that means that we have them spread between 10 and 100. They cover an integer range of 90, so there are 10 to the 18 numbers to cover all the values between 10 and 100, or a Coverage of about 10 to the 16 values per Integer Step. Pretty dense! Even then they don't get every one, as we notice when we look at the representation of 1/3!

However, now let's look at the coverage for larger numbers. Let's take the range between 10 to the 30 and 10 to the 31. Again, the Operand value has been incremented by just one to provide the range limits. Again, we have 10 to the 18 numbers to provide the Coverage. However, the difference between 10 to the 30 and 10 to the 31 is not 90, but 9 times 10 to the power 30! Coverage is now spread so thinly that there is a gap of about 10 to the 13 between adjacent points!

Ok. Now I've scared you with numbers, here's Alberto's paper.

==============================
2. Comparing Numbers
==============================

How To Determine if Floating Quantities Are Close Enough Once a Tolerance Has Been Reached

Unlike integers, which are either different or equal, floating point numbers may be different, equal, or "close enough." This is a down-to-earth practitioner's discussion on how to determine, in a robust manner, whether or not floating quantities are close enough, once a tolerance (the meaning of "close enough") has been assigned. The goal is a comparison procedure that is correct more often than not and reasonably efficient. Floating point jargon is taken from C/C++, and float means a 32-bit IEEE float.
All reasoning can be extended to double precision and other floating point quantities.

While the inequality of floating point numbers is easily established using operators > or <, "close enough," or fuzzy equality, requires more work. Consider the following numbers:

   var
      nuOne  Number
      nuTwo  Number
   endVar

   nuOne = 0.123456
   nuTwo = 0.123457

Are nuOne and nuTwo equal? Strictly speaking, (nuOne = nuTwo) will evaluate as False, but are nuOne and nuTwo close enough to be considered equal? This decision requires some knowledge of the problem domain, usually represented by a tolerance. Let's call it nuFeps for "float epsilon". The tolerance is related to how many significant digits must match so that two numbers are close enough to be, for all practical purposes, equal.

Since IEEE floats have at least six significant digits in decimal notation, this discussion will arbitrarily allow losing approximately one digit to error and will require five matching digits for equality by establishing a tolerance like

   var
      nuFeps  Number
   endVar

   nuFeps = 0.00001

Thus,

   ( nuFeps > (nuOne - nuTwo) )

will return true, indicating that nuOne and nuTwo are equal, as will

   ( nuFeps > (nuTwo - nuOne) )

The "close enough" equality test has become an inequality test against a difference. However, if you consider:

   var
      nuThree  Number
      nuFour   Number
   endVar

   nuThree = 0.123456
   nuFour  = -0.123457

   ( nuFeps > (nuThree - nuFour) )

will evaluate as

   ( nuFeps > 0.246913 )

which is false, while

   ( nuFeps > (nuFour - nuThree) )

will evaluate as

   ( nuFeps > -0.246913 )

which is incorrectly true: oops! Equality (and inequality as well) is symmetric: if ( a <> b ) is true, then ( b <> a ) must also be true.
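
The asymmetry just described is easy to reproduce. The following is a minimal sketch in Python rather than ObjectPAL (chosen only so that it can be run and checked; Python floats are 64-bit IEEE doubles). The function name naive_close is hypothetical; the variable names mirror those used above.

```python
# Illustrative sketch only: Python stand-in for the ObjectPAL expressions above.
nu_feps = 0.00001          # tolerance: five matching digits

nu_one, nu_two = 0.123456, 0.123457
nu_three, nu_four = 0.123456, -0.123457


def naive_close(a, b, eps=nu_feps):
    """Signed difference test, as used so far: eps > (a - b)."""
    return eps > (a - b)


# Same sign: both orderings agree.
print(naive_close(nu_one, nu_two), naive_close(nu_two, nu_one))

# Opposite signs: the answer depends on the order of the operands.
print(naive_close(nu_three, nu_four), naive_close(nu_four, nu_three))
```

The second line of output shows two different answers for the same pair of values, which is exactly the symmetry violation noted above.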
Fix:

   ( nuFeps > abs(nuFour - nuThree) )

will return false and so will

   ( nuFeps > abs(nuThree - nuFour) )

at the cost of one abs() per comparison.

Turning a comparison into an inequality against a difference, however, has shortcomings.

   var
      nuBig1    Number
      nuBig2    Number
      nuSmall1  Number
      nuSmall2  Number
      nuSmall3  Number
      nuSmall4  Number
   endVar

   nuBig1   = 1.23456e28
   nuBig2   = 1.23457e28
   nuSmall1 = 1.23456e-10
   nuSmall2 = 3.45678e-11
   nuSmall3 = 0.000004
   nuSmall4 = -0.000004

Are nuBig1 and nuBig2 equal? Their difference is a quantity of the order of 1.0e23, or, in U.S. dollars, much larger than the net worth of Bill Gates. If both are the result of computations involving thousands of floating point operations, however, nuBig1 and nuBig2 are most likely the same number. Yet

   ( nuFeps > abs(nuBig1 - nuBig2) )

will return false. How about setting nuFeps = 2.0e23? Just kidding!

There's more; look at the next pair of numbers. Here, nuSmall1 and nuSmall2 are truly different, and generously bigger than the smallest IEEE float larger than zero. However,

   ( nuFeps > abs(nuSmall1 - nuSmall2) )

returns true, so that nuSmall1 and nuSmall2 incorrectly appear to be equal. The same error happens if the final pair of small values (nuSmall3 and nuSmall4) is used.

Of course, you can always change nuFeps to a suitably smaller value. Or can you? Let:

   nuFeps = 1.0e-10

Now

   ( nuFeps > abs(nuSmall3 - nuSmall4) )

is false, so they're different. But

   ( nuFeps > abs(nuOne - nuTwo) )

is also false, <sigh>. A nuFeps this small tests for too many significant digits (more than a float can afford). Consequently, it is too often as persnickety as the = operator. In addition,

   ( nuFeps > abs(nuSmall1 - nuSmall2) )

is still incorrectly true; the same happens with

   ( nuFeps > abs(nuBig1 - nuBig2) )

which is still (incorrectly) false.

Here is the point: a tolerance tested for inequality against a difference is a practical way to test "close enough" equality to zero.
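
The behaviour of the absolute difference test with both tolerances can be tabulated in a few lines. This is a sketch in Python, not ObjectPAL, and abs_close is a hypothetical helper name; the values are those declared above.

```python
# Illustrative sketch only: the absolute difference test with two tolerances.
def abs_close(a, b, eps):
    """Difference test with abs(): eps > abs(a - b)."""
    return eps > abs(a - b)


pairs = {
    "nuOne/nuTwo": (0.123456, 0.123457),        # should be "equal"
    "nuBig1/nuBig2": (1.23456e28, 1.23457e28),  # should be "equal"
    "nuSmall1/nuSmall2": (1.23456e-10, 3.45678e-11),  # should differ
    "nuSmall3/nuSmall4": (0.000004, -0.000004),       # should differ
}

for eps in (1.0e-5, 1.0e-10):
    for name, (a, b) in pairs.items():
        print(f"nuFeps={eps:g}  {name:18s} {abs_close(a, b, eps)}")
```

Neither tolerance gets all four pairs right: the large tolerance accepts both small pairs and rejects the big pair, while the small tolerance rejects nuOne/nuTwo and still accepts nuSmall1/nuSmall2.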
However, this method cannot account for "close enough" equality of two float numbers over their entire dynamic range.

To test fuzzy equality over the possible range of two float numbers, the ratio (vs. the absolute difference) of the numbers must be "close enough" to unity (vs. smaller than nuFeps). Taking a ratio for a comparison is perhaps expensive in some cases, but is usually better than returning an incorrect result.

The value of abs(nuOne/nuTwo) is ~0.999992. That of abs(nuTwo/nuOne) is ~1.000008; 1+nuFeps is 1.00001, and 1-nuFeps is 0.99999. A seemingly correct ratio-test predicate such as:

   (((1-nuFeps) < abs(nuN/nuD)) and (abs(nuN/nuD) < (1+nuFeps)))

where nuN = numerator and nuD = denominator, is true for nuN < nuD or nuN > nuD indifferently, so long as nuN is "close enough" to nuD.

The value of abs(nuBig1/nuBig2) is the same as abs(nuOne/nuTwo). The value of abs(nuSmall1/nuSmall2) is ~3.57142, that of abs(nuSmall2/nuSmall1) is ~0.280001; the ratio-test predicate is correctly false for both, indicating that nuSmall1 and nuSmall2 are different.

Here is the catch: the value of abs(nuThree/nuFour) is ~0.999992, that of abs(nuFour/nuThree) is ~1.000008, and the value of abs(nuSmall3/nuSmall4) is 1.0: in all three of these cases the seemingly correct ratio-test predicate is incorrectly true.

But wait: abs is a carryover from the difference-test predicate. In fact, the ratio-test must not use abs, so the ratio must be signed: (-)/(-) or (+)/(+) is positive, hence potentially close to positive 1.0 and capable of scoring true, but (-)/(+) or (+)/(-) is negative and automatically disqualified from scoring true, as it should be (there are exceptions as noted later on). The correct predicate for the ratio-test is then:

   (((1-nuFeps) < (nuN/nuD)) and ((nuN/nuD) < (1+nuFeps)))

Now the value of (nuSmall3/nuSmall4) is -1.0 and the predicate is false, which is correct. The same is true for nuThree and nuFour.

But don't breathe a sigh of relief just yet!
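
Before moving on, the difference between the abs() ratio test and the signed ratio test can be checked with a short sketch. It is written in Python rather than ObjectPAL, the function names are hypothetical, and it deliberately ignores zero denominators and overflow.

```python
# Illustrative sketch only: abs() ratio test versus signed ratio test.
NU_FEPS = 1.0e-5


def abs_ratio_close(n, d, eps=NU_FEPS):
    """Seemingly correct predicate: (1 - eps) < abs(n/d) < (1 + eps)."""
    return (1.0 - eps) < abs(n / d) < (1.0 + eps)


def ratio_close(n, d, eps=NU_FEPS):
    """Signed predicate: (1 - eps) < n/d < (1 + eps). Assumes d is not zero."""
    return (1.0 - eps) < (n / d) < (1.0 + eps)


# Values "close enough" at very different magnitudes: both tests agree.
print(ratio_close(0.123456, 0.123457), ratio_close(1.23456e28, 1.23457e28))

# Truly different small values: correctly rejected.
print(ratio_close(1.23456e-10, 3.45678e-11))

# Opposite signs: the abs() version is fooled, the signed version is not.
print(abs_ratio_close(0.123456, -0.123457), ratio_close(0.123456, -0.123457))
print(abs_ratio_close(0.000004, -0.000004), ratio_close(0.000004, -0.000004))
```

Unlike the difference test, the same tolerance now works for the 1e28 pair and the 1e-10 pair, because the ratio does not depend on the magnitude of the operands.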
Now, let:

   var
      nuFive  Number
      nuSix   Number
   endVar

   nuFive = 1.0e36
   nuSix  = 1.0e-4

In this case, nuN/nuD can be either 1.0e40 (oops: IEEE float overflow!), or 1.0e-40 (oops: IEEE float underflow!). Perhaps 1.0e-40 will be interpreted as 0.0F, and perhaps not. Almost certainly 1.0e40 will cause indigestion to your code.

Here is the fix: before taking the ratio, check for overflow or underflow. Here is how. In what follows, maxNumber and minNumber stand for the largest and the smallest positive values a Number can hold. The predicate

   if ( nuD < 1.0 ) then
      if ( nuN > (nuD * maxNumber) ) then

is true (and safe), even if nuN/nuD would overflow; if true, obviously the equality is false. The part ( nuD < 1.0 ) is needed because nuD * maxNumber may otherwise cause overflow. The predicate

   if ( nuD > 1.0 ) then
      if ( nuN < (nuD * minNumber) ) then

is true (and safe), even if nuN/nuD would underflow. If ( nuD > 1.0 ), then ( nuD * minNumber ) will not underflow.

So long to numbers!

The original paper continues to consider larger number types and the dangers of mixed-mode arithmetic. However, ObjectPAL programmers need not be concerned with this.
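
For readers who want to see the guards and the signed ratio test working together, here is one possible sketch. It is written in Python rather than ObjectPAL, and several details are assumptions that do not come from the paper: the name safe_ratio_close, the use of sys.float_info.max and sys.float_info.min as stand-ins for maxNumber and minNumber, applying the guards to magnitudes with abs(), and treating a zero denominator as equal only to a zero numerator.

```python
# Illustrative sketch only: signed ratio test with overflow/underflow guards.
import sys

MAX_NUMBER = sys.float_info.max   # assumed stand-in for maxNumber (64-bit double)
MIN_NUMBER = sys.float_info.min   # assumed stand-in for minNumber (smallest normal)


def safe_ratio_close(n, d, eps=1.0e-5):
    """Return True if n and d are "close enough" according to the ratio test."""
    if d == 0.0:
        # Assumption: nothing but zero is close enough to zero here.
        return n == 0.0
    abs_n, abs_d = abs(n), abs(d)
    # Guard 1: n/d would overflow. abs_d < 1.0 keeps abs_d * MAX_NUMBER finite.
    if abs_d < 1.0 and abs_n > abs_d * MAX_NUMBER:
        return False
    # Guard 2: n/d would underflow. abs_d > 1.0 keeps abs_d * MIN_NUMBER nonzero.
    if abs_d > 1.0 and abs_n < abs_d * MIN_NUMBER:
        return False
    ratio = n / d                 # safe to compute now; the sign is kept
    return (1.0 - eps) < ratio < (1.0 + eps)


print(safe_ratio_close(0.123456, 0.123457))    # close enough
print(safe_ratio_close(1.0e300, 1.0e-100))     # ratio would overflow a double
print(safe_ratio_close(1.0e-300, 1.0e100))     # ratio would underflow a double
```

Because doubles have a much wider exponent range than 32-bit floats, the overflow and underflow examples use 1.0e300 and 1.0e-300 in place of the 1.0e36 and 1.0e-4 used in the text.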
Modified: 15 May 2003
Copyright © 2001-2003 Paradox Community. All rights reserved. Company and product names are trademarks or registered trademarks of their respective companies. Authors hold the copyrights to their own works. Please contact the author of any article for details.