Background and Aims
Data can have characteristics that surprise us. One such surprise is that, across numbers
representing many types of data, the first digit has a systematic and predictable distribution. In particular, it
follows a logarithmic distribution such that the first digit is 1 about 30% of the time, 2 about 18% of the
time, and so on down to about 5% for the digit 9. This was demonstrated by the engineer Frank Benford, so this first-digit distribution is known as
Benford’s law. Since he demonstrated it in Benford (1938) it has been shown to hold for a large amount
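
The percentages quoted above match the usual statement of Benford's law, in which the probability of first digit d is log10(1 + 1/d). The short sketch below simply evaluates that formula for the digits 1 to 9. The function name benford_probability is chosen here for illustration and does not come from the original text.

```python
import math


def benford_probability(digit: int) -> float:
    """Probability that the first digit equals `digit` under Benford's law."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    # Standard form of the law: P(d) = log10(1 + 1/d)
    return math.log10(1 + 1 / digit)


# Expected first-digit distribution for digits 1..9
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"first digit {d}: {p:.1%}")
```

The nine probabilities sum to 1, and rounding them gives the approximate figures of 30% for 1, 18% for 2 and 5% for 9.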
Nigrini (1999) reports on the use of Benford’s law as a tool for detecting fraud. If most ordinary
financial data conform to Benford’s law, then deviation from Benford’s law could be evidence of human
manipulation of the data. He presents examples of Benford’s law being used successfully to detect fraud.
However, this application of Benford’s law assumes that people do not produce numbers that fit
Benford’s law when they try to generate data. In the experiment we ran in ATHK1001 tutorials in Week 2,
we tested this assumption by having participants generate numerical answers to questions that few knew
the answers to. As well as general knowledge questions, we asked participants to estimate quantities. If
these numbers follow Benford’s law, then it may be harder to use Benford’s law to detect fraud, because it
would demonstrate that people may naturally conform to the law, at least under some circumstances.
Nigrini (1999) also pointed out that not all numbers fit Benford’s law. In particular, arbitrary
numbers such as receipt numbers would not be expected to. So in this experiment we also tested whether people’s
responses would show a distinction between meaningful and meaningless numbers.
If people do generate data that tend to conform to Benford’s law, then there may be individuals who
do so more than others, and they may be consistent across tasks. Therefore we examined whether there
was a correlation between a measure of how close individuals were to Benford’s law for meaningful
items and the same measure for estimation items.
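
The measure of closeness to Benford's law is not specified in this section, so the sketch below is only one possible way to set up such an analysis. It assumes, hypothetically, that closeness is scored as the mean absolute deviation between a participant's observed first-digit proportions and the Benford proportions, and that the two sets of scores are compared with a Pearson correlation. All function names and the example responses are invented for illustration and are not data from the experiment.

```python
import math
from collections import Counter


def first_digit(value: float) -> int:
    """Return the leading non-zero digit of a non-zero number."""
    if value == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation puts the leading digit first, e.g. 0.042 -> '4.2...e-02'
    return int(f"{abs(value):.15e}"[0])


def benford_deviation(responses) -> float:
    """Mean absolute deviation from Benford's law (assumed measure; lower = closer)."""
    digits = [first_digit(r) for r in responses if r != 0]
    if not digits:
        raise ValueError("need at least one non-zero response")
    counts = Counter(digits)
    n = len(digits)
    return sum(
        abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)
    ) / 9


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented example responses for three hypothetical participants,
# one list per item type. These are NOT data from the experiment.
meaningful = [[1200, 35, 180, 2.5], [900, 85, 740, 9], [15, 130, 2600, 41]]
estimation = [[110, 3000, 17, 24], [800, 95, 60, 7], [19, 1400, 210, 33]]

meaningful_scores = [benford_deviation(r) for r in meaningful]
estimation_scores = [benford_deviation(r) for r in estimation]

# Correlation across participants between the two deviation scores
print(pearson(meaningful_scores, estimation_scores))
```

Under this assumed measure, a positive correlation would mean that individuals who are closer to Benford's law on one item type also tend to be closer on the other.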