A Distribution Approach to Fraud Detection
Say the Dow Jones Industrial Average rose 20% per year on average (I am not hallucinating, I just want to make a point). If the Dow started at 1000, it would have to rise 100% to reach 2000, a 1000-point increase. That is a very large percentage increase, and at 20% of the starting level per year (simple growth, not compounded) it would take about 5 years. But if the Dow were at 9000 and grew at the same 20% rate, reaching 10,000, also a 1000-point increase, would take only about 6.7 months. That means the number 1 would appear as the first digit of the Dow for about 5 years, while the number 9 would have the honor of holding that position for less than 7 months.
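The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the function name `months_to_gain` is made up for illustration, and it assumes simple growth measured as a fixed share of the starting level per year, as in the example.

```python
def months_to_gain(start, points, annual_rate=0.20):
    """Months needed to gain `points` when the index adds a fixed
    share of its starting level each year (simple, not compounded)."""
    points_per_year = start * annual_rate
    return 12 * points / points_per_year

# From 1000 to 2000: 200 points per year
print(round(months_to_gain(1000, 1000), 1))  # 60.0 months, i.e. 5 years

# From 9000 to 10,000: 1800 points per year
print(round(months_to_gain(9000, 1000), 1))  # 6.7 months
```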
If you mapped out all the digits that could appear as the first digit of the Dow (1 through 9) and how long each would hold that position at a fixed percentage rate of growth, you would find that the durations conform to a logarithmic scale. The number 1 would be the first digit about 30% of the time, the number 2 about 17.6% of the time, and so on. The share of time each digit would spend as the first digit is as follows:
1 - 30.1%
2 - 17.6%
3 - 12.5%
4 - 9.7%
5 - 7.9%
6 - 6.7%
7 - 5.8%
8 - 5.1%
9 - 4.6%
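These percentages come from the standard first-digit formula for Benford's Law, in which the share for digit d is log10(1 + 1/d). A minimal sketch that reproduces the table (the function name `benford_first_digit` is my own label, not something from a library):

```python
import math

def benford_first_digit(d):
    """Expected share of values whose leading digit is d (1 to 9)."""
    return math.log10(1 + 1 / d)

# Print the table of expected first-digit shares
for d in range(1, 10):
    print(d, "-", f"{benford_first_digit(d):.1%}")
```

The nine shares add up to 100%, since the logarithms telescope to log10(10) = 1.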
This is a logarithmic scale. The notion that it describes a wide variety of naturally occurring sets of numbers was first put forward by Simon Newcomb in 1881 and then by Frank Benford, a GE physicist, in 1938, and it became known as Benford’s Law. How well the law holds in the real world is fairly astounding: Benford tested it against 20,229 numbers drawn from 20 sets of data on topics as unrelated as areas of rivers, baseball statistics, street addresses and numbers appearing in magazine articles.
Here is one example. If a sample held 100 bacteria and the bacteria doubled in number each day, it would take the whole first day for the count to move from 100 to 200, reaching 2 as the first digit only at the end of that day. On the second day the count would double from 200 to 400, moving past 3 as the leading digit fairly quickly, and on the third day it would go from 400 to 800, moving past 5, 6 and 7 even faster. Each higher digit would spend less and less time as the leading digit. As the order of magnitude increased the pattern would repeat: it would again take a full day to move from 1000 to 2000, from 10,000 to 20,000, or from 100,000 to 200,000, in total once again giving the number 1 the lead position about 30 percent of the time.
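The bacteria example can be simulated directly. The sketch below samples a population that doubles daily at evenly spaced moments and counts how often each digit leads. The starting count of 100 comes from the example; the 200-day horizon, the 1000 samples per day, and the function names are arbitrary choices of mine.

```python
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.10e}"[0])

def leading_digit_shares(start=100, days=200, steps_per_day=1000):
    """Share of evenly spaced moments at which each digit leads,
    for a population that doubles once per day."""
    counts = Counter()
    total = days * steps_per_day
    for i in range(total):
        population = start * 2 ** (i / steps_per_day)
        counts[leading_digit(population)] += 1
    return {d: counts[d] / total for d in range(1, 10)}

for d, share in leading_digit_shares().items():
    print(d, f"{share:.1%}")
```

Over a long enough run the shares should land close to the table above, with 1 leading roughly 30% of the time; the match is approximate because the simulated period does not cover a whole number of orders of magnitude.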
Here is another example. Say you took a group of 100 people and split it into two equal groups. Each person in the first group tosses a coin 200 times and writes down whether it came up heads or tails. Each person in the second group tries to perpetrate a fraud: they skip the tossing and simply write down heads or tails 200 times. A knowing eye looking at the patterns of responses will likely be able to determine to which group each person was assigned. While most people know that 200 coin tosses should end up roughly a 50/50 split between heads and tails, they are unaware of the patterns likely to emerge along the way, such as long runs of the same result. This is the same kind of naturally occurring pattern that Benford’s Law describes for leading digits. Checking the written-down lists of heads and tails against those expected patterns usually categorizes the participants correctly: those who honestly tossed the coin vs. those who attempted to commit the fraud.
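One pattern that real coin tosses tend to show, and that people inventing a sequence tend to avoid, is a long run of the same outcome. I am assuming that is the kind of pattern meant here. The sketch below measures the longest run in a sequence and estimates how often 200 honest tosses contain a run of at least 6; the threshold of 6, the trial count and the function names are illustrative assumptions, not part of the original experiment.

```python
import random

def longest_run(seq):
    """Length of the longest streak of identical consecutive outcomes."""
    if not seq:
        return 0
    best = current = 1
    for prev, nxt in zip(seq, seq[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best

def share_with_long_run(trials=2000, tosses=200, run_length=6, seed=0):
    """Share of simulated honest sequences containing a run of at least `run_length`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = [rng.choice("HT") for _ in range(tosses)]
        if longest_run(seq) >= run_length:
            hits += 1
    return hits / trials

print(share_with_long_run())
```

A hand-written list whose longest run is only 3 or 4 would stand out against that baseline, though a single list can never settle the question on its own.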
Benford’s Law has been used in forensic accounting to detect fraud, because fabricated figures typically deviate from the patterns one would expect to emerge naturally in a set of books at the macro level. It works at the micro level as well: in expense reports, the number 24 tends to show up more often than expected because many companies require receipts for amounts over $25. The result is more expense claims for $24 and change than would occur naturally.
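One way to put a number on "deviates from the expected pattern" is a chi-square comparison of observed first digits against the Benford shares. The text does not say which test forensic accountants actually use, so treat this as one plausible sketch; the function names are made up, and 15.51 is the usual 5% critical value for 8 degrees of freedom.

```python
import math
from collections import Counter

def first_digit(amount):
    """Leading nonzero digit of a nonzero amount."""
    return int(f"{abs(amount):.10e}"[0])

def benford_chi_square(amounts):
    """Chi-square statistic comparing first-digit counts with Benford's Law."""
    digits = [first_digit(a) for a in amounts if a]
    n = len(digits)
    observed = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat

# Hypothetical ledger in which every leading digit is equally common
uniform_ledger = [d * 10.0 for d in range(1, 10)] * 100
print(benford_chi_square(uniform_ledger) > 15.51)  # True: far from Benford
```

A statistic above the critical value only flags the books for a closer look; it does not establish fraud, and small samples make the test unreliable.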
Most recently, Benford’s Law was used by Walter Mebane, a statistician at the University of Michigan, to argue that the 2009 Iranian presidential election was likely fraudulent. Mebane has studied election results from the USA, Russia and Mexico and has shown that they conform to Benford’s Law in the second digit. He noted that in any fair election a certain number of votes are invalid or otherwise have to be discarded. In fraudulent elections, those stuffing the ballot box for their candidate often fail to add the appropriate number of invalid ballots. When he analyzed the results from Iranian polling stations, he found that the vote counts for Ahmadinejad and two of the minor candidates did not conform to Benford’s Law at all. In fact, in 172 of the 320 polling locations for which he had data, the results did not conform to the law’s expectations. He is careful to say that the method does not prove with 100% certainty that fraud took place, but it raises the likelihood that the results were fraudulent and makes them worthy of follow-up analysis. It may not matter in the long run: you could argue that both candidates were hand-selected to run anyway, so by definition it was not a free and fair election even before the voting took place. Still, if you are going to cheat, you should at least do your homework so that you are less likely to get caught.
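The second-digit version of the law has its own expected shares, obtained by summing the two-digit Benford probabilities over every possible first digit. The sketch below computes that standard distribution; it is not a reconstruction of Mebane's actual procedure, and the function name is my own.

```python
import math

def benford_second_digit(d2):
    """Expected share of values whose second digit is d2 (0 to 9),
    summed over all possible first digits 1 to 9."""
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))

for d2 in range(10):
    print(d2, "-", f"{benford_second_digit(d2):.1%}")
```

The second-digit shares are much flatter than the first-digit ones, running from about 12% for 0 down to about 8.5% for 9, so departures from them are harder to spot by eye and need a formal test.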
I was drawn into this evolving story and explored the technique a bit to see whether it could be applied to detecting fraud in survey results. One worry that some clients express is that the fix was in and their survey results are not to be believed because someone, a disgruntled party, perhaps a union, told all their members to respond negatively. Most of the time these concerns are unfounded, a conclusion I draw from other analyses I have run. Even so, I am afraid that this particular technique will not provide conclusive evidence one way or the other regarding survey fraud, since the typical survey scale is not logarithmic in character. However, if this were a pressing issue for a particular client, it may be possible to change the scale to one that is logarithmic in nature.
© 2010 by Jeffrey M. Saltzman. All rights reserved.
Visit OV: http://www.orgvitality.com