Benford’s Law

Business Insider produced a short video highlighting a phenomenon called Benford’s law (click here for the video). It is only two minutes long. In case the link is broken or removed in the future, the following is a transcription of the first 47 seconds.

Want to be amazed? Take a look at a big natural dataset. For example, the population of every county in the U.S. If you look at all 3000+ values, you will find that the first digit is 1 about 30% of the time, the first digit is 2 about 18% of the time and 9 only about 5% of the time. One shows up as a first digit about 6 times as much as 9. What? This is called the Benford’s law. …. Benford’s law actually has practical applications for business because it’s also true for many datasets such as company financial statements. ……

The video shows a graph similar to the following graph.

Figure 1 – Benford’s Law

Benford’s law describes a phenomenon that is observed in many real-life datasets. The Business Insider video uses the population counts of the 3,000+ counties in the United States. Benford’s law describes the distribution of first digits in many natural datasets amazingly accurately. The most striking thing about the law is how lopsided the frequencies are in favor of the lower digits: 1 shows up as the first digit about 30% of the time, and the frequencies get progressively smaller for each higher digit, down to under 5% for 9.
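The percentages quoted above follow from the standard closed form of Benford’s law, P(d) = log10(1 + 1/d) for first digits d = 1, …, 9, a formula this post does not spell out. As a minimal sketch, the following Python snippet (the function name `benford_probs` is our own) simply evaluates it:

```python
import math

def benford_probs():
    """Benford first-digit probabilities: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_probs()
for d, p in probs.items():
    print(f"{d}: {p:.1%}")

# Ratio of the most common first digit to the least common one
print(f"P(1)/P(9) = {probs[1] / probs[9]:.2f}")
```

It prints 30.1% for digit 1 down to 4.6% for digit 9, and a ratio P(1)/P(9) of about 6.6, in line with the video’s “about 6 times”. The nine probabilities add up to 1 because the terms telescope to log10(10).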

In addition to population data, the law also applies to geographic data (e.g. areas of rivers), baseball statistics, numbers in magazine articles, numbers in street addresses, house prices and stock market trading data (prices and trading volumes).

Best of all, Benford’s law also applies to financial data: income tax data, corporate expense data and corporate financial statements. To demonstrate, the following graph summarizes the first digits in Google’s financial statements (found here).

Figure 2 – First Digits in Google’s Financial Statements

The above graph is based on Google’s annual balance sheets, cash flow statements and income statements for the period 2013 to 2016, and summarizes 277 first digits. The digit 1 appears about 35.7% of the time in these financial statements, which is more than the 30.1% that Benford’s law calls for. However, the overall shape of the graph agrees with Benford’s law. The following shows the two bar graphs side by side.

Figure 3 – Side by Side Comparison of Google and Benford

There are clear differences between the distribution of first digits in Google’s statements and the theoretical distribution of Benford’s law. Google is higher than Benford on some digits and lower on others. For example, there are more 1’s in Google’s statements than Benford’s law predicts. However, the overall shape of the Google distribution agrees with Benford’s law.

Do Google’s first digits differ significantly from Benford’s law? For example, in Google’s statements the first digit is 1 about 35.7% of the time, versus the 30.1% predicted by Benford’s law. Is this difference statistically significant? Using the chi-squared test, we found that the observed differences are not statistically significant, so there is no evidence that Google’s first digits depart from Benford’s law. The details of the comparison are given in the last section below.

Benford’s law does not describe all data. It certainly does not fit numbers generated uniformly at random within a fixed range (e.g. lottery numbers). Even some naturally generated numbers do not follow Benford’s law; examples include the heights of adult humans and IQ scores. Data to which Benford’s law applies tend to spread out across multiple orders of magnitude: the numbers range through 10, 100, 1,000, 10,000, 100,000 and so on. For example, the figures in Google’s statements are in millions, and they span millions, tens of millions, hundreds of millions, billions and tens of billions. In contrast, adult heights stay within a single order of magnitude, since they are almost always less than 100 inches (about 8 feet).
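A quick simulation makes the orders-of-magnitude point concrete. The Python sketch below assumes two hypothetical data sources: values spread log-uniformly between 1 and 100,000 (five orders of magnitude), and adult heights drawn from a normal distribution with mean 66 inches and standard deviation 4 inches. Both distributions, their parameters and the helper names are illustrative assumptions, not data from this post.

```python
import math
import random
from collections import Counter

def first_digit(x):
    """First significant digit of a nonzero number, read from its scientific notation."""
    return int(f"{abs(x):e}"[0])

def first_digit_shares(values):
    """Fraction of nonzero values whose first digit is 1, 2, ..., 9."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    return {d: counts[d] / n for d in range(1, 10)}

rng = random.Random(2017)  # fixed seed so the run is reproducible
# Hypothetical data spanning five orders of magnitude: log-uniform between 1 and 100,000
wide = [10 ** rng.uniform(0, 5) for _ in range(100_000)]
# Hypothetical adult heights in inches (normal, mean 66, sd 4): one order of magnitude
heights = [rng.gauss(66, 4) for _ in range(100_000)]

wide_shares = first_digit_shares(wide)
height_shares = first_digit_shares(heights)
print("digit   wide  heights  Benford")
for d in range(1, 10):
    print(f"{d:>5}  {wide_shares[d]:5.1%}  {height_shares[d]:6.1%}  {math.log10(1 + 1 / d):6.1%}")
```

The wide-ranging values land close to the Benford percentages, while the simulated heights pile up on the digits 5, 6 and 7 and almost never start with 1.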

Why do financial figures and numbers in other datasets favor the lower digits, as Benford’s law suggests? One plausible explanation is that legitimate numbers that grow over time stay at lower first digits for longer periods, and hence are recorded with those digits more often. For example, suppose a company’s annual revenue is currently around $1 million and is growing at 10% a year. At that rate, it would take about 7 years to reach $2 million, so for those 7 years the revenue figures would all start with the digit 1. Once annual revenue reaches $9 million, it would take only about one year to reach $10 million (at the same 10% growth rate). So if the financial data are recorded correctly, there should be a skew toward the lower digits.
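The growth argument is easy to check. Under the idealized assumption of steady compound growth at rate r (our simplification; real revenues fluctuate), a figure needs log((d + 1)/d) / log(1 + r) years to move from first digit d to d + 1. A minimal Python sketch with the 10% rate from the example (the function name is hypothetical):

```python
import math

def years_per_first_digit(rate):
    """Years spent at each first digit 1..9 under steady compound growth at `rate`."""
    step = math.log(1 + rate)  # log of the yearly growth factor
    return {d: math.log((d + 1) / d) / step for d in range(1, 10)}

years = years_per_first_digit(0.10)
decade = sum(years.values())  # years needed to grow tenfold, e.g. $1M -> $10M
for d, y in years.items():
    print(f"{d}: {y:.2f} years ({y / decade:.1%} of the time)")
```

It gives about 7.27 years at digit 1 and about 1.1 years at digit 9. Dividing each duration by the roughly 24.2 years needed to grow tenfold yields log10(1 + 1/d), i.e. exactly the Benford percentages, whatever the growth rate.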



The comparison between Google’s financial statements and Benford’s law illustrates why Benford’s law is a powerful tool for detecting financial fraud: compare the actual frequencies of the first digits in the financial statements in question with the frequencies predicted by Benford’s law. If a fraudster produced numbers whose first digits are spread fairly uniformly across 1 to 9, such a simple comparison would raise a giant red flag. Thus anyone who fakes data at random will not produce data that can withstand even a casual analysis such as the one shown in Figure 3.
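The comparison described above needs nothing more than a tally of first digits. The Python sketch below is one way to do it; the helper names and the “faked” numbers are hypothetical. The fake amounts are drawn uniformly between 1,000 and 9,999, mimicking someone who invents figures at random.

```python
import math
import random
from collections import Counter

def first_digit(x):
    """First significant digit of a nonzero number (e.g. 0.0123 -> 1, -456 -> 4)."""
    return int(f"{abs(x):e}"[0])

def compare_with_benford(values):
    """Rows of (digit, observed %, Benford %, observed minus Benford in points)."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    rows = []
    for d in range(1, 10):
        observed = 100 * counts[d] / n
        benford = 100 * math.log10(1 + 1 / d)
        rows.append((d, observed, benford, observed - benford))
    return rows

rng = random.Random(7)  # fixed seed so the run is reproducible
# Hypothetical "invented" amounts: every first digit 1..9 is equally likely
faked = [rng.randint(1_000, 9_999) for _ in range(500)]

print("digit  observed  Benford    diff")
for d, obs, ben, diff in compare_with_benford(faked):
    print(f"{d:>5}  {obs:7.1f}%  {ben:6.1f}%  {diff:+6.1f}")
```

With uniformly invented amounts each digit lands near 11%, so digit 1 falls roughly 19 points short of Benford’s 30.1% and digit 9 overshoots by roughly 6 points, which is the kind of red flag described above.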

Even when faked first digits are not evenly distributed, too large a discrepancy between the actual first digits and Benford’s law (e.g. too few 1’s or too many 7’s, 8’s and 9’s) is enough to raise suspicion, at which point the investigator can use more sophisticated tests for further evaluation.

The Ponzi scheme perpetrated by Bernie Madoff had a constant need for faked data to keep up the appearance that legitimate investing was taking place. Auditors and regulators could have exposed the fraud sooner, and Benford’s law could have been one tool for doing so. In fact, this research paper compared Benford’s law with the monthly returns of Fairfield Sentry Fund, a feeder fund that invested solely with Bernie Madoff. It found that the first digits of the monthly returns over a 215-month period did not conform to Benford’s law.

Bear in mind that conformity, or lack of conformity, with Benford’s law proves nothing by itself. But too large a discrepancy should raise suspicion, and the investigator can then test or evaluate further using more sophisticated methods.

There are other applications in addition to fraud detection and forensic accounting. According to this article, Benford’s law can be used to detect changes in natural processes (e.g. earthquake detection) and as a tool to assess the appropriateness of mathematical models. For example, since the population counts of the counties of the United States follow Benford’s law, the law can be used to evaluate any proposed set of predicted population counts.


Further Information

Here is an abbreviated version of this blog post.

An Introduction to Benford’s Law is an excellent book on the subject. Benford’s law is now widely known and widely used, so a Google search will yield a lot of information and discussion on this topic. This piece is a brief discussion of Benford’s law by one of the authors of the book just mentioned.

The author of this blog has previously written on Benford’s law. These articles can be found here, here and here. The piece in the last link is a statistical comparison of the county population data with Benford’s law, so it is a nice complement to the Business Insider video.

This link is the research paper mentioned above on using Benford’s law to evaluate financial statements. This is a discussion on Benford’s law in … . This is an archived article from … on Bernie Madoff’s Ponzi scheme; it describes in some detail the scheme’s fake-number operation.


Data on Google-Benford Comparison

The source of the Google data is here. The link provides quarterly as well as annual income statements, balance sheets and cashflow statements from 2013 to 2016. We use only annual statements. The following table is a summary of the first digits in these financial statements.

Summary of First Digits in Google’s Financial Statements

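| First Digit | Google (count) | Google (%) | Benford (%) |
|---|---|---|---|
| 1 | 99 | 35.7% | 30.1% |
| 2 | 44 | 15.9% | 17.6% |
| 3 | 36 | 13.0% | 12.5% |
| 4 | 18 | 6.5% | 9.7% |
| 5 | 21 | 7.6% | 7.9% |
| 6 | 17 | 6.1% | 6.7% |
| 7 | 17 | 6.1% | 5.8% |
| 8 | 14 | 5.1% | 5.1% |
| 9 | 11 | 4.0% | 4.6% |
| Total | 277 | 100% | 100% |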

The comparison (Figure 3 and the table) shows that the first digits in Google’s statements are close to the percentages predicted by Benford’s law. However, there are noticeable discrepancies, the most obvious ones being for digit 1 and digit 4. Are these differences significant?

To find out, we use the chi-squared test. The chi-squared statistic is 6.833 with 8 degrees of freedom, and the p-value is 0.55. A small p-value (close to zero) would mean that the differences between the observed data and the Benford’s law percentages cannot be attributed to random chance alone, in which case we would conclude that the data in question do not follow Benford’s law. In this case, however, the p-value is large (0.55). So the differences we see between the observed percentages in the table and the percentages expected under Benford’s law are not sufficient evidence for us to believe that the first digits in Google’s financial statements deviate from Benford’s law. For a more detailed discussion of using the chi-squared test, see this discussion on Benford’s law and census population counts.
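The test is straightforward to reproduce. The Python sketch below uses the counts from the table above. To avoid a SciPy dependency (`scipy.stats.chisquare` would also work), it computes the p-value from the closed-form chi-squared tail that holds for an even number of degrees of freedom: P(X > x) = exp(−x/2) · Σ (x/2)^k / k! for k = 0, …, df/2 − 1. Computing the expected counts from the rounded Benford percentages in the table reproduces the statistic of 6.833; the exact log10(1 + 1/d) probabilities give a slightly smaller value, about 6.81, with essentially the same p-value. The function names are our own.

```python
import math

# First-digit counts from Google's 2013-2016 annual statements (table above)
observed = [99, 44, 36, 18, 21, 17, 17, 14, 11]
# Benford percentages as rounded in the table; exact values are log10(1 + 1/d)
benford_rounded = [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]

def chi_squared_stat(observed, probs):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

df = len(observed) - 1  # 9 digit categories -> 8 degrees of freedom
stat_rounded = chi_squared_stat(observed, [p / 100 for p in benford_rounded])
stat_exact = chi_squared_stat(observed, [math.log10(1 + 1 / d) for d in range(1, 10)])
print(f"rounded Benford %: chi2 = {stat_rounded:.3f}, p = {chi2_sf_even_df(stat_rounded, df):.2f}")
print(f"exact Benford %:   chi2 = {stat_exact:.3f}, p = {chi2_sf_even_df(stat_exact, df):.2f}")
```

Either way the p-value is well above conventional cutoffs such as 0.05, which is why the differences are not treated as significant.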

© 2017 – Dan Ma


Could Madoff have been caught sooner?

In financial crimes, the physical evidence often consists of the data in financial statements and other documents. These data are faked to give an appearance of normalcy: that nothing is amiss, or that the financial results are better than they actually are. How can a detective uncover financial fraud based on data? This post underscores the need for vigilance on the part of everyone, from investors to financial intermediaries to regulators. The next post discusses a mathematical model that is a powerful and relatively simple tool for detecting potential financial fraud or errors.

Financial frauds that may require deception using faked data include investment frauds, one type of which is the Ponzi scheme. In such schemes, investors are promised a high rate of return with little or no downside risk. The profits paid to investors are usually very good at the beginning, but they are not proceeds from legitimate investments; they are funds collected from new investors. In other words, such schemes are one big lie that can only be sustained by a constant stream of new investors, essentially new suckers paying existing ones. When new money stops coming in, the game is over. The operators of these Ponzi schemes therefore have a strong incentive to keep up the appearance of real investing, hence the faked data.

The longest-running Ponzi scheme, which is also the largest in scope, is the one perpetrated by Bernard L. Madoff.

Mugshot of Bernie Madoff

Madoff’s fraud scheme went on for decades, with investors profiting handsomely year in and year out. It collapsed in December 2008, when the Great Recession was underway. By then, new money had slowed to a trickle, and exposure of the Ponzi scheme became inevitable because Madoff could not keep up with the avalanche of withdrawals.

On paper, the losses suffered by investors were estimated at $50 billion. The exact amount was hard to pin down because the $50 billion figure included the fictitious profits reported to clients. However, it is clear that the losses were in the billions. Madoff started his investment fund in 1960. Though it is not clear when the fund became a Ponzi scheme, it is clear that the fraud went on for decades. To perpetrate fraud on such a massive scale and for so long, Madoff must have had a team of people who helped him create the appearance of returns, forge books and file reports.

Administratively, the fraudulent enterprise was a massive undertaking that included a constant need for faking numbers. For example, the statements to clients would show trading activities with trade prices and volumes. The faked numbers had to be good enough to look believable, at least to clients who did not have the professional expertise to scrutinize them or who chose not to look closely. What about the professionals? Could professional experts have detected the fraud by poring over the statements and other documents?

Could Madoff have been stopped sooner? The answer is almost certainly yes. Had he been exposed 10 years earlier, many more families would have been saved from financial ruin and emotional devastation.

The investment world is a competitive industry. Madoff’s consistent and outsized returns always raised suspicion among competitors, who naturally would have loved to replicate them. One analyst, Harry Markopolos, tried and failed to find a way to replicate the returns and concluded that Madoff’s operation was either a Ponzi scheme or front running (buying stock for his own account based on knowledge of his clients’ orders). He alerted the Securities and Exchange Commission (SEC) numerous times.

Numerous other red flags were raised over the years; see the Wikipedia entry on Madoff’s investment scheme for details. Numerous entities, from the SEC to the feeder funds that channeled money to Madoff, all failed to detect the fraud. Maybe they chose not to look closely. The feeder funds (intermediaries that steered investors to Madoff) certainly had an incentive not to look closely. One such feeder was Fairfield Greenwich Group. They did not want to rock the boat; the gravy train was too good to pass up.

According to the Wikipedia entry on Madoff’s investment scheme, Madoff Securities LLC was investigated at least eight times over a 16-year period by the U.S. Securities and Exchange Commission (SEC) and other regulatory authorities. In each instance, either Madoff was cleared or the investigation resulted in neither a finding of fraud nor a referral to the SEC Commissioners for legal action. Why did the SEC not uncover the fraud? Was it because Madoff covered his tracks so well that even the experts at the SEC could not see anything wrong? Or did they choose not to look too closely?

Many intermediary entities were involved in Madoff’s fraud scheme, from the feeder funds to banks. The bulk of Madoff’s money was deposited at JP Morgan Chase. The top-notch bankers at Chase failed to detect anything wrong with Madoff either. Why? As with the feeder funds, the fees were too good to pass up. Nonetheless, Chase was named in a lawsuit filed by Irving Picard, the court-appointed trustee in charge of recovering assets from the Madoff investment scandal. According to the suit, Chase “was at the very center of that fraud and thoroughly complicit in it.” As a sophisticated financial institution, JPMorgan was “uniquely situated to see the likely fraud.” The dark role played by the bank and the feeder funds is described in this Fox Business article.

The investors in Madoff’s scheme are in a double bind. They lost their investments, in many cases their life savings. At best they can recover only a small part of their investments, and only if they invested directly with Madoff. If they invested via a feeder fund, they are out of luck, and any legal action on their behalf would likely drag on for years.

Could Madoff have been caught sooner? The tools were available, if only the people involved had been willing to use them.

Mathematical modeling plays a pivotal role in detecting financial fraud, in particular in raising red flags about fraudulent numbers. There are tools that are both “simple to understand” and “simple to use” and yet very effective (tools that the SEC probably chose not to use). The next post is on Benford’s law, which is a great frontline tool for fraud detection and forensic accounting.
