By Emily Chen
Here’s a puzzle for you: what does the graph on the left represent?
The level of motivation to fulfill New Year’s resolutions over the month of January? Christmas cheer vs. the number of times the same song is repeated on the radio?
Though the above explanations are plausible, the graph is actually an illustration of Benford’s Law, one of the most mysteriously universal rules known to scientists today. Benford’s Law, in short, describes the probability that the first digit of a number, such as 123456, is a specific digit (in this case, the first digit is 1). As the graph illustrates, in a large sample of numbers, 1 is the first digit about 30% of the time, 2 is the first digit about 18% of the time, and so on. This seems very strange at first. After all, why should certain digits appear as first digits more often than others? The search for an answer has a history spanning more than a century.
In 1881, when scientists still used logarithm tables, a particularly observant astronomer named Simon Newcomb noticed that the pages at the beginning of books of log tables were more worn than the pages at the end. Newcomb worked on describing this phenomenon mathematically and published his model, proposing that the probability of a digit N being the first digit of a longer number is log(N + 1) - log(N), with logarithms taken to base 10. For Newcomb, it both started and ended with his book of log tables.
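Newcomb's formula is easy to try out. The following is a minimal Python sketch, not anything taken from Newcomb's or Benford's own work; the function name `benford_probability` is made up for illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Newcomb's formula: the chance that `digit` (1 to 9) is the first digit."""
    if not 1 <= digit <= 9:
        raise ValueError("a leading digit must be between 1 and 9")
    return math.log10(digit + 1) - math.log10(digit)


# Tabulate the predicted share of each leading digit.
probabilities = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probabilities.items():
    print(f"{d}: {p:.1%}")
```

The shares run from about 30.1% for a leading 1 down to about 4.6% for a leading 9, and they add up to 1 because the logarithms telescope to log(10) - log(1).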
More than fifty years later, the physicist Frank Benford decided to put Newcomb’s model to the test. He used over 20,000 numbers from a wide variety of sources, including Reader’s Digest, population sizes, and the surface areas of rivers. His results confirmed Newcomb’s model. And, as so often happens in history, the law was named not after its originator but after its publicizer.
Benford’s law applies to a huge variety of datasets, ranging from genomic data to street addresses. Indeed, Benford’s law has been verified to such an extent that violations of it may be a sign of fraud. It is used in audits to check for fudged or fabricated numbers, and as the applications grow, more sophisticated models involving predictions of the second or third digits have been developed.
We can’t go applying Benford’s law willy-nilly, though. In datasets where there is a significant psychological component, for example, the data may be skewed toward “simpler” numbers such as multiples of 5 or 10. If a group of volunteers were asked to report how many minutes of sleep they get per night, they might list multiples of 60 (an hour) or 30 (half an hour) more often. And for certain distributions that cluster within a narrow range, such as the distribution of heights or IQs across populations, Benford’s law simply doesn’t apply. However, Benford’s law does work very well for sets of numbers resulting from calculations or combinations of other numbers. So if you were to multiply the IQ of each person by their height to form a new set of numbers, Benford’s law may fit the result better, and the fit tends to improve as more independent factors are multiplied together. And of course, the larger the dataset, the more closely mathematical trends (like Benford’s law) fit the data.
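The effect of multiplying quantities together can be seen in a toy simulation. The sketch below rests on assumptions that are not in the article: uniform random numbers between 0 and 1 stand in for real measurements, and ten factors are multiplied rather than two. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive float."""
    # 17 decimals exceed double precision, so rounding cannot bump 9.99... up to 10.
    return int(f"{x:.17e}"[0])


def digit_shares(samples):
    """Fraction of samples starting with each digit 1 to 9."""
    counts = Counter(leading_digit(x) for x in samples)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def random_product(rng: random.Random, n_factors: int) -> float:
    """Product of n_factors independent uniform draws from (0, 1]."""
    product = 1.0
    for _ in range(n_factors):
        product *= 1.0 - rng.random()  # 1 - [0, 1) is never zero
    return product


rng = random.Random(2012)  # fixed seed so the run is repeatable
n_samples = 50_000
single = digit_shares(random_product(rng, 1) for _ in range(n_samples))
products = digit_shares(random_product(rng, 10) for _ in range(n_samples))

print("digit  single  product  Benford")
for d in range(1, 10):
    benford = math.log10(d + 1) - math.log10(d)
    print(f"{d:>5}  {single[d]:6.1%}  {products[d]:7.1%}  {benford:7.1%}")
```

In this setup a single uniform draw gives every leading digit roughly the same share, about one in nine, while the ten-factor products land close to Benford's percentages. Real measurements such as heights behave differently from uniform draws, so this illustrates the tendency and is no proof of it.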
So now we come to the big question: why does Benford’s law work, and why does it apply to so many different datasets? We still don’t know for sure. But scientists, being scientists, have come up with a few possible explanations.
One possibility is that Benford’s law results from the prevalence of functions involving exponential growth. But though this explains some instances where it applies, it leaves many other questions unanswered. Another explanation is that Benford’s law applies to combinations of certain distributions that occur naturally (the distributions, like height and IQ, that Benford’s Law doesn’t apply to!) and we see these combinations in many datasets. This, too, brings up a lot of questions.
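The link to exponential growth can be checked with a simple sequence. As a hedged illustration, the sketch below takes the first 1,000 powers of 2 (a choice made here, not an example from the article) and compares the share of each leading digit with the share predicted by log(N + 1) - log(N) in base 10. The helper name `leading_digit_shares` is made up.

```python
import math
from collections import Counter


def leading_digit_shares(values):
    """Fraction of the given positive integers that start with each digit 1 to 9."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# A quantity that doubles at every step: 2, 4, 8, ..., 2**1000.
powers_of_two = [2 ** n for n in range(1, 1001)]
observed = leading_digit_shares(powers_of_two)

print("digit  observed  Benford")
for d in range(1, 10):
    predicted = math.log10(d + 1) - math.log10(d)
    print(f"{d:>5}  {observed[d]:8.1%}  {predicted:7.1%}")
```

Python integers have arbitrary precision, so the first digit of each power can be read directly from its decimal string with no rounding involved.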
The answer we are left with is one that is both exciting and frustrating to scientists: we don’t know. But never before has mankind had access to such large datasets and such powerful mathematical modeling tools. Perhaps we will be the ones to finally crack the mystery of Benford’s Law.
Note: For the more mathematically inclined, Terence Tao wrote an excellent blog post on Benford’s Law and other universal laws, which can be found here: http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/.
For the not-so-mathematically-inclined, the application of Benford’s Law to tax fraud is demonstrated here: http://www.intuitor.com/statistics/Benford’s%20Law.html
Copyright 2012 Yiqing Chen.