Benford’s Law

Business Insider produced a short video highlighting a technique called Benford’s law (click here for the video). It is only two minutes long. In case the link is broken or removed in the future, the following is a transcription of the first 47 seconds.

Want to be amazed? Take a look at a big natural dataset. For example, the population of every county in the U.S. If you look at all 3000+ values, you will find that the first digit is 1 about 30% of the time, the first digit is 2 about 18% of the time and 9 only about 5% of the time. One shows up as a first digit about 6 times as much as 9. What? This is called the Benford’s law. …. Benford’s law actually has practical applications for business because it’s also true for many datasets such as company financial statements. ……

The video shows a graph similar to the following graph.

Figure 1 – Benford’s Law

Benford’s law describes a phenomenon that is observed in many real-life datasets. The Business Insider video uses the population counts of the 3000+ counties in the United States. The law describes the first digits of many natural datasets with striking accuracy. Its most notable feature is the lopsided frequencies in favor of the lower digits, with 1 showing up around 30% of the time and the frequencies getting progressively smaller for the higher digits.
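The percentages quoted above match the usual statement of Benford’s law, in which the first digit d occurs with probability log10(1 + 1/d). The post does not spell out this formula, so the following Python sketch is offered only as an illustration; the function name benford_probability is made up for this example.

```python
import math

def benford_probability(digit):
    """Probability that the first digit equals `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)

# Print the predicted share of each first digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

Under this formula, the digit 1 comes out at about 30.1% and the digit 9 at about 4.6%, which is in line with the video’s remark that 1 shows up roughly 6 times as often as 9.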

In addition to population data, the law also applies to geographic data (e.g. areas of rivers), baseball statistics, numbers in magazine articles, numbers in street addresses, house prices and stock market trading data (prices and trading volumes).

Best of all, the Benford’s law is also applicable to financial data – income tax data, corporate expense data, corporate financial statements. To demonstrate, the following graph is the summary of the financial statements of Google according to first digits (found here).

Figure 2 – First Digits in Google’s Financial Statements

The above graph is based on the annual balance sheets, cashflow statements and income statements for Google in the period 2013 to 2016. The graph is a summary of 277 first digits. The digit 1 appears about 35.7% of the time in these financial statements, which is more than what the Benford’s law calls for. However, the overall shape of the graph agrees with the Benford’s law. The following shows the two bar graphs side by side.

Figure 3 – Side by Side Comparison of Google and Benford

There are clear differences between the distribution of first digits in Google’s statements and the theoretical distribution of Benford’s law. Google is higher than Benford’s law on some digits and lower on other digits. For example, there are more 1’s in Google’s statements than predicted by Benford’s law. However, the overall shape of the Google distribution is in agreement with Benford’s law.

Do Google’s first digits differ significantly from Benford’s law? For example, in Google’s statements, the first digit is 1 about 35.7% of the time versus the 30.1% predicted by Benford’s law. Is this difference statistically significant? Using the chi-squared test, we found that the observed differences are not significant, so the data give no reason to doubt that Google’s first digits follow Benford’s law. The details of the comparison are given in the last section below.

Benford’s law does not describe all data. It does not fit numbers that are generated uniformly at random (e.g. lottery numbers). Even some naturally generated numbers do not follow it; examples include heights of human adults and IQ scores. Data for which Benford’s law is applicable tend to spread out across multiple orders of magnitude. In such datasets, numbers migrate through 10, 100, 1,000, 10,000, 100,000 and so on. For example, the figures in Google’s statements are in millions, and they fall in these ranges: millions, tens of millions, hundreds of millions, thousands of millions, and tens of thousands of millions. In contrast, heights of human adults are within one order of magnitude since heights are usually less than 100 inches (about 8 feet).
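The role of orders of magnitude can be seen in a small Python sketch. Neither dataset below comes from the post: the powers of 2 are a stand-in for data that spread across many orders of magnitude, and the integers from 55 to 80 are a made-up stand-in for adult heights in inches. The helper name first_digit_shares is also hypothetical.

```python
from collections import Counter

def first_digit_shares(values):
    """Return {digit: share} for the leading digits of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Spans about 300 orders of magnitude: 1, 2, 4, ..., 2**999.
wide = first_digit_shares(2 ** n for n in range(1000))

# Stays within one order of magnitude: stand-in "heights" of 55 to 80 inches.
narrow = first_digit_shares(range(55, 81))

for d in range(1, 10):
    print(f"{d}: wide {wide[d]:.1%}   narrow {narrow[d]:.1%}")
```

The wide-ranging values put about 30% of their first digits on 1, while the narrow-range values never start with 1, 2, 3, 4 or 9 at all.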

Why do financial figures and numbers in other datasets favor the lower digits as suggested by Benford’s law? One plausible explanation is that legitimate numbers stay at lower first digits for a longer period of time, so they are recorded more often with those digits. For example, suppose a company’s current annual revenue is around \$1 million and is growing at a rate of 10% a year. At that rate, it would take about 7 years to reach \$2 million in revenue, and in those 7 years the revenue amounts would all start with the digit 1. When the annual revenue reaches \$9 million, it would take only about one year to reach \$10 million (assuming the same growth rate of 10% a year). So if the financial data are recorded correctly, there should be a skew toward the lower digits.
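The arithmetic in the growth example can be checked with a short Python sketch. It assumes smooth compound growth at a constant 10% a year, as in the example; the function name years_with_first_digit is made up for this illustration.

```python
import math

GROWTH_RATE = 0.10  # 10% a year, as in the example above

def years_with_first_digit(digit, rate=GROWTH_RATE):
    """Years for revenue to grow from `digit` to `digit + 1` (in any power of ten)."""
    return math.log((digit + 1) / digit) / math.log(1 + rate)

# Total years needed to grow tenfold, e.g. from $1 million to $10 million.
total = sum(years_with_first_digit(d) for d in range(1, 10))

for d in range(1, 10):
    years = years_with_first_digit(d)
    print(f"{d}: {years:.2f} years, {years / total:.1%} of the time")
```

Going from 1 to 2 takes a little over 7 years and going from 9 to 10 takes a little over 1 year. The share of time spent on each first digit works out to log10(1 + 1/d) whatever the growth rate, which is the same set of percentages that Benford’s law predicts.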

__________________________________________________________________________

Applications

The comparison between Google’s financial statements and Benford’s law lies at the heart of why Benford’s law is a powerful tool for financial fraud detection. The idea is to compare the actual frequencies of the first digits in the financial statements in question with the frequencies predicted by Benford’s law. If a fraudster produced numbers whose first digits are distributed fairly uniformly, such a simple comparison will raise a giant red flag. Thus anyone who fakes data at random will not produce data that can withstand even a casual analysis such as the one shown in Figure 3.

Even when the first digits do not distribute evenly, too big of a discrepancy between the actual first digits and the Benford’s law (e.g. too few 1’s or too many 7’s, 8’s and 9’s) will be enough to raise suspicion, at which time the investigator can use more sophisticated tests for further evaluation.
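A first-pass comparison of this kind could look like the following Python sketch. The ledger here is invented for illustration (every amount from 100 to 999 appears once, so the first digits are perfectly uniform), and the helper names first_digit and first_digit_gaps are hypothetical rather than part of any standard tool.

```python
import math
from collections import Counter

def first_digit(amount):
    """Leading nonzero digit of a nonzero number, ignoring sign and scale."""
    text = str(abs(amount)).lstrip("0.")  # drop leading zeros and decimal point
    if not text:
        raise ValueError("zero has no first digit")
    return int(text[0])

def first_digit_gaps(amounts):
    """Observed share minus Benford share for each first digit 1-9."""
    counts = Counter(first_digit(a) for a in amounts)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total - math.log10(1 + 1 / d)
            for d in range(1, 10)}

# Made-up "fabricated" ledger with uniform first digits (each share is 1/9).
fake_amounts = range(100, 1000)
gaps = first_digit_gaps(fake_amounts)
for d, gap in gaps.items():
    print(f"{d}: {gap:+.1%}")
```

For this uniform ledger, the digit 1 falls about 19 percentage points short of the Benford share while 7, 8 and 9 all come out too high, which is the kind of pattern described above.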

The Ponzi scheme perpetrated by Bernie Madoff would have a constant need for faking data in order to keep up the appearance that legitimate investing was taking place. Auditors and regulators could have exposed the fraud sooner. The Benford’s law could be the tool to do that. In fact, this research paper compared the Benford’s law with the monthly returns of Fairfield Sentry Fund, a feeder fund that invested solely with Bernie Madoff. It found the first digits in the monthly returns from a 215-month period did not conform to the Benford’s law.

Bear in mind that whether the tested data are or are not close to the Benford’s law proves nothing. But too big of a discrepancy should raise suspicion. Then the investigator can further test or evaluate using more sophisticated methods.

There are other applications in addition to fraud detection and forensic accounting. According to this article, Benford’s law can be used to detect changes in natural processes (e.g. earthquake detection) and as a tool to assess the appropriateness of mathematical models. For example, since population counts in the counties of the United States follow Benford’s law, the law can be used to evaluate any proposed set of predicted population counts.

__________________________________________________________________________

Further Information

Here is an abbreviated version of this blog post.

An Introduction to Benford’s Law is an excellent book on the Benford’s law. The Benford’s law is now widely known and widely used. Thus any Google search will yield a lot of information and discussion on this topic. This piece is a brief discussion on Benford’s law by one of the authors of the book just mentioned.

The author of this blog has previously written on Benford’s law. These articles can be found here, here and here. The piece in the last link is a statistical comparison of the county population data and Benford’s law, so it is a nice complement to the Business Insider video.

This link is the research paper indicated above on using Benford’s law to evaluate financial statements. This is a discussion on Benford’s law in forbes.com. This is an archived article from forbes.com on Bernie Madoff’s Ponzi scheme. It describes the fake number operation in some detail.

__________________________________________________________________________

The source of the Google data is here. The link provides quarterly as well as annual income statements, balance sheets and cashflow statements from 2013 to 2016. We use only annual statements. The following table is a summary of the first digits in these financial statements.

Summary of First Digits in Google’s Financial Statements

First Digit   Count   Google    Benford’s Law
1             99      35.7%     30.1%
2             44      15.9%     17.6%
3             36      13.0%     12.5%
4             18      6.5%      9.7%
5             21      7.6%      7.9%
6             17      6.1%      6.7%
7             17      6.1%      5.8%
8             14      5.1%      5.1%
9             11      4.0%      4.6%
Total         277     100%      100%

The comparison (Figure 3 and the table) shows that the first digits in Google’s statements are close to the percentages predicted by Benford’s law. However, there are noticeable discrepancies. The most obvious ones are on Digit 1 and Digit 4. Are these differences significant?

To find out, we use the chi-squared test. The chi-squared statistic is 6.833 with 8 degrees of freedom and the p-value is 0.55. When the p-value is small (close to zero), we conclude that the differences between the observed data and the Benford’s law percentages cannot be attributed to random chance alone, in which case we conclude that the data in question do not follow Benford’s law. But in this case, the p-value is large (0.55). So the conclusion is that the differences that we see between the observed percentages in the table and the expected percentages according to the Benford’s law are not sufficient evidence for us to believe that the first digits in Google’s financial statements do not follow the Benford’s law. For a more detailed discussion on using the chi-squared test, see this discussion on Benford’s law and census population counts.
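The test can be reproduced approximately with the Python sketch below, which uses only the counts in the table above. Two things in it are assumptions rather than the author’s stated method: the expected counts use the exact log10(1 + 1/d) probabilities instead of the rounded percentages in the table, and the p-value comes from a closed-form tail formula that holds only for an even number of degrees of freedom (the helper chi2_sf_even is written for this example). A library routine such as scipy.stats.chisquare would serve the same purpose.

```python
import math

# First-digit counts for digits 1-9, copied from the table above.
observed = [99, 44, 36, 18, 21, 17, 17, 14, 11]
n = sum(observed)

# Expected counts under Benford's law, using exact log10 probabilities.
expected = [n * math.log10(1 + 1 / d) for d in range(1, 10)]

# Pearson chi-squared statistic and its degrees of freedom.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1

def chi2_sf_even(x, dof):
    """Upper-tail chi-squared probability; valid for even `dof` only."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(dof // 2))

p_value = chi2_sf_even(chi_sq, dof)
print(f"chi-squared = {chi_sq:.3f}, dof = {dof}, p-value = {p_value:.2f}")
```

With exact probabilities the statistic comes out near 6.8 rather than exactly 6.833; the small gap is consistent with the figure above having been computed from the rounded percentages in the table. The p-value stays around 0.55, so the conclusion is unchanged.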

__________________________________________________________________________
© 2017 – Dan Ma