Software Development - Business Programming - RoboProg's

RoboProg's / Software Development

Last Month


Feb 5, 2010

It Doesn't Add Up!

In order to pay the rent while I was going to college, I had to do some programming work at a small local company. The company resold an accounting package that came with source. We, meaning I, also did a fair amount of work doing some vertical application add-on work to support individual client needs. Doing such work, I had ample opportunity to learn that when an accountant looks at a decimal point, it does not mean the same thing to him/her as it does to say a chemist, or perhaps many computer scientists. E.g. - measure the amount of liquid in a beaker or silo, approximate the velocity of a rude sprite flying across a virtual reality arena.

It I were hiring somebody to work on a business application that did any accounting, one of the things I would be sure to ask is to show me some code to calculate (and print, perhaps) the figures for an invoice. Given a list (perhaps in a CSV data file?) such as


Units,Unit Cost,Description
1,55500.555,"Widget"
1,55500.555,"Widget Accessory"
1,55500.555,"Xxx"
1,55500.555,"Xxx"
1,55500.555,"Xxx"
1,55500.555,"Xxx"
1,55500.555,"Xxx"
1,55500.555,"Xxx"
1,55500.555,"Xxx"
1,55500.555,"Xxx"
	

Produce the extended cost for each line, and the grand total. The list above is of course contrived, but the values chosen will make sense when you see the results of the sample code below.

Now, I don't really care which language they express the solution in, what I'm after -- the turnip truck test, if you will -- is this: Do they know enough to quantize the result of each step at some fixed precision, or do they fall into simply (and incorrectly) using some kind of floating point arithmetic?

For the sample program below, I have stripped out most of the problem (such as actually reading and/or parsing input; multiplying out the extended cost), and have set up a situation that breaks simple addition.

CodeNotes

import java.util.*;
import java.math.*;
public class Flotsam {
	static final String INPUT_FLD_AMT = "55500.555";

	static final Formatter printf = new Formatter( System.out);
	

Some set up: pull in the names for some utility classes; define the unit cost which was repeated in the data file; create a formatting tool to print the numbers.


	public static void main( String [] argv) {
		System.out.println( "\n--- The WRONG way ---");
		wrong();
		System.out.println( "\n--- The RIGHT way ---");
		right();
	}
	

First we will see a way to screw it up, then a way to get correct answers, the same ones, every time! (but of course TMTOWTDI)


	static void wrong() {
		float x = Float.parseFloat( INPUT_FLD_AMT);
		printf.format( "%11.3f       Line item value (in mils)\n", x);
		printf.format( "%16.8f  (looking deeper into trailing digits)\n", x);
		float tot = 0;
		for ( int i = 10; i > 0; i--)
			{ tot += x; }
		System.out.println( "================");
		printf.format( "%11.3f       Grand total (10 of the same items added)\n", tot);
		printf.format( "%16.8f  (looking deeper, again)\n", tot);
	}
	

OK, let's measure / estimate the amount of dollars we want to scoop into the grand total bucket. What's that, you say? I should use double instead of float? Yes, that would hide the problem longer, but it would not really solve it.
This simulates the problem represented in the sample data by putting the cost in one variable, then adding it to an accumulator variable 10 times.
As the name suggests, this implementation of the accumulation logic has some issues, as we are about to see.


--- The WRONG way ---
  55500.555       Line item value (in mils)
  55500.55468750  (looking deeper into trailing digits)
================
 555005.563       Grand total (10 of the same items added)
 555005.56250000  (looking deeper, again)
	
  1. OK, that looks a lot like $55,500.555 [1]
  2. Hey, why don't I have a nice bunch of trailing 0s here? If you had only used "double" for your variables this would have looked so much nicer (due to the larger mantissa).
  3. So, 10 x $55,500.555 is $555,005.55, right? What?!? How the xxxx did this happen? $555,005.56? Oh well, it's only a penny, the accountants probably won't even notice, right?
  4. Yeah, sure, there's a mess looking at even more trailing digits. Something about those last couple digits seems eerily familiar, though. How about we ignore the ".5", and point out that 0.0625 = 1/16th -- that is, 1 divided by a power of 2. Actually, 0.5 = 1/2 -- also 1 divided by a power of 2 (2^1). Hmmm.

	static void right() {
		BigDecimal x = new BigDecimal( INPUT_FLD_AMT);
		printf.format( "%11.3f       Line item value (in mils)\n", x);
		printf.format( "%16.8f  (looking deeper into trailing digits)\n", x);
		BigDecimal tot = new BigDecimal( "0.000");
		for ( int i = 10; i > 0; i--)
			{ tot = tot.add( x); }
		System.out.println( "================");
		printf.format( "%11.3f       Grand total (10 of the same items added)\n", tot);
		printf.format( "%16.8f  (looking deeper, again)\n", tot);
	}
}
	

This time, we are going to count out the mils, but we will remember where the decimal point goes as we are using dollars for our units. This is fixed precision arithmetic, as opposed to floating point.
The BigDecimal type in java supports fixed precision. Look for things described as "binary coded decimal" or "packed decimal" in other progamming environments. [2]
Yes, I am just letting BigDecimal default its rounding rule, instead of explicitly specifying one. Consider it extra credit to figure out how to monkey with rounding.


--- The RIGHT way ---
  55500.555       Line item value (in mils)
  55500.55500000  (looking deeper into trailing digits)
================
 555005.550       Grand total (10 of the same items added)
 555005.55000000  (looking deeper, again)
	

  1. Looks like $55,500.555, just like last time at this point.
  2. Still looks like $55,500.555
  3. 10 x $55,500.555 is indeed $555,005.55. The accountants will be so pleased with us. Or at least not have to wonder how we manage to dress ourselves in the morning without help.
  4. Nothing up my sleeve! No trailing digits lurking out here, either.

The root of this common problem in business applications has to do with how binary floating point numbers are composed. Binary numbers are of course composed of the sum of a series of positional values, 0 or 1 times a power of 2. However, the decimal numbers that people are used to seeing (in most cultures) are composed of the digits 0..9 times a power of 10. Of course, if the exponents are negative, then the position in question is part of a rational number (a fraction), rather than an integer.

So, how do you represent 1/10th -- (2 * 5)^-1 in terms of prime factors -- as a sum of negative powers of two? Not very well, it turns out. The table below tries to estimate 1/10th using an increasing number of binary digits.

Binary Digits (as fractions) Decimal Equivalent Error
(vs target = 0.1)
0/20.0-0.1
0/2 + 0/40.0-0.1
0/2 + 0/4 + 1/80.125+0.025
0/2 + 0/4 + 1/8 + 0/160.125+0.025
0/2 + 0/4 + 0/8 + 1/16 + 1/320.09375-0.00625
0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 1/64 0.109375+0.009375 (oops! that's worse!)
0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 0.09375-0.00625
0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 1/128 0.1015625+0.0015625

You can come as close as you like by adding more digits, but you cannot get an exact match for a decimal fraction with binary. Ask a real mathematician: 2 and 5 are both prime numbers, and trying to create the sum of various powers of one out of the sum of various powers of another is just not going to work. Not exactly, anyway.


Notes:

  1. Yes, about the only place you are likely to see mils is at the gas station. This probably made sense when gasoline was thirty cents a gallon, not so much when it is $3. You might also use mils on line items for measuring out cheap, bulk, commodities. A mil is of course 1/1000 of a dollar.

  2. In particularly impoverished environments, in a pinch, you can fake it by storing values such as mils in the biggest integer type you can find, being oh so careful to do the scaling division and multiplication at the right time, but I can't say I recommend this (come on, at least slap a class, or other abstract data type mechansim, around it)



Contact me:
r
o
b
o
p
r
o
g
@
yahoo.com

Copyright 2010, Robin R Anderson