The Benford’s Law

Business Insider produced a short video highlighting the technique called Benford’s law. (click here for the video). It is only two-minute long. Just in the case the link is broken or removed in the future, the following is a transcription of the first 47 seconds.

Want to be amazed? Take a look at a big natural dataset. For example, the population of every county in the U.S. If you look at all 3000+ values, you will find that the first digit is 1 about 30% of the time, the first digit is 2 about 18% of the time and 9 only about 5% of the time. One shows up as a first digit about 6 times as much as 9. What? This is called the Benford’s law. …. Benford’s law actually has practical applications for business because it’s also true for many datasets such as company financial statements. ……

The video shows a graph similar to the following graph.

Figure 1 – Benford’s Law

Benford’s law describes a phenomenon that is observed in many real life datasets. The Business Insider video uses the population counts of the 3000+ counties in the United States. The Benford’s describes the first digits amazingly accurately on many natural data sets. The most striking thing about the law is the lopsided frequencies in favor of the lower digits, with 1 showing up around 30% of the times and with the frequencies progressively getting smaller.

In addition to population data, the law also applies to geographic data (e.g. areas of rivers), baseball statistics, numbers in magazine articles, numbers in street addresses, house prices and stock market trading data (prices and trading volumes).

Best of all, the Benford’s law is also applicable to financial data – income tax data, corporate expense data, corporate financial statements. To demonstrate, the following graph is the summary of the financial statements of Google according to first digits (found here).

Figure 2 – First Digits in Goolge’s Financial Statements

The above graph is based on the annual balance sheets, cashflow statements and income statements for Google in the period 2013 to 2016. The graph is a summary of 277 first digits. The digit 1 appears about 35.7% of the time in these financial statements, which is more than what the Benford’s law calls for. However, the overall shape of the graph agrees with the Benford’s law. The following shows the two bar graphs side by side.

Figure 3 – Side by Side Comparison of Google and Benford

There are clear differences between the distribution of first digits of Google and the theoretical distribution of Benford’s law. Google is higher than Benford’s on some digits and lower is other digits. For example, there are more 1’s in Google’s statements than predicted by Benford’s law. However, the overall shape of the Google distribution is in agreement with the Benford’s law.

Do the Google’s first digits differ significantly from Benford’s law? For example, in Google’s statements, the first digit is 1 about 35% of the time versus 30% that is predicted by Benford’s law. Is this difference statistically significant? Using the chi-squared test, we found that the observed differences are not significant and that there is reason to believe that the Google’s first digits follow the Benford’s law. The details of the comparison are given in the last section below.

The Benford’s law does not describe all data. It certainly does not fit numbers that are randomly generated (e.g. lottery numbers). Even some naturally generated numbers do not follow the Benford’s law. Examples include heights of human adults and IQ scores. Data for which the Benford’s law is applicable tend to spread out across multiple orders of magnitude. For example, in such data sets, numbers would migrate through 10, 100, 1,000, 10,000 and 100,000, etc. For example, the figures in Google’s statements are in millions. The numbers in these statements are in these ranges: millions, tens of millions, hundreds of millions, thousands of millions, and tens of thousands of millions. In contrast, heights of human adults are within one order of magnitude since heights are usually less than 100 inches (about 8 feet).

Why do financial figures and numbers in other data sets favor the lower digits as suggested by the Benford’s law? One plausible explanation is that legitimate numbers stay with lower first digits in a longer period of time. Hence they are recorded more often. For example, let’s say a company’s current annual revenue is around $1 million and is currently growing at the rate of 10% a year. At that rate, it would take about 7 years to reach $2 million in revenue. Hence in these 7 years, the total revenue amounts would all start with the digit 1. When the annual revenue reaches $9 million, it would only take about one year to reach $10 million (assuming the same growth rate of 10% a year). So if the financial data are recorded correctly, there should be a skew toward the lower digits.



The comparison between Google’s financial statements and the Benford’s law lies at the heart of the reason why Benford’s law is a powerful tool for financial fraud detection. Compare the actual frequencies of the first digits in a set of financial statements in question with the predicted frequencies according to the Benford’s law. If the fraudster produced numbers that distribute across the digits fairly uniformly, such a simple comparison will raise a giant red flag. Thus anyone who fake data at random will not produce data that can withstand even a casual analysis such as the one shown in Figure 3.

Even when the first digits do not distribute evenly, too big of a discrepancy between the actual first digits and the Benford’s law (e.g. too few 1’s or too many 7’s, 8’s and 9’s) will be enough to raise suspicion, at which time the investigator can use more sophisticated tests for further evaluation.

The Ponzi scheme perpetrated by Bernie Madoff would have a constant need for faking data in order to keep up the appearance that legitimate investing was taking place. Auditors and regulators could have exposed the fraud sooner. The Benford’s law could be the tool to do that. In fact, this research paper compared the Benford’s law with the monthly returns of Fairfield Sentry Fund, a feeder fund that invested solely with Bernie Madoff. It found the first digits in the monthly returns from a 215-month period did not conform to the Benford’s law.

Bear in mind that whether the tested data are or are not close to the Benford’s law proves nothing. But too big of a discrepancy should raise suspicion. Then the investigator can further test or evaluate using more sophisticated methods.

There are other applications in addition to fraud detection and forensic accounting. According to this article. Benford’s law can be used to detect changes in natural processes (e.g. earthquake detection) and as a tool to assess the appropriateness of mathematical models. For example, since population counts in the counties of the United States follow the Benford’s law, the Benford’s law can be used to evaluate any proposed set of predicted population counts.


Further Information

Here is an abbreviated version of this blog post.

An Introduction to Benford’s Law is an excellent book on the Benford’s law. The Benford’s law is now widely known and widely used. Thus any Google search will yield a lot of information and discussion on this topic. This piece is a brief discussion on Benford’s law by one of the authors of the book just mentioned.

The author of this blog had previously on Benford’s law. These articles can be found here, here and here. The piece in the last link is a statistical comparison of the county population data and the Benford’s law. So it is a nice complement to the Business Insider video.

This link is the research paper indicated above on using Benford’s law to evaluate financial statements. This is a discussion on Benford’s law in This is an archived article from on Bernie Madoff’s Ponzi scheme. It describes in some details on its fake number operation.


Data on Google-Benford Comparison

The source of the Google data is here. The link provides quarterly as well as annual income statements, balance sheets and cashflow statements from 2013 to 2016. We use only annual statements. The following table is a summary of the first digits in these financial statements.

Summary of First Digits in Google’s Financial Statements

First Digit Google Google (%) Benford (%)
1 99 35.7% 30.1%
2 44 15.9% 17.6%
3 36 13.0% 12.5%
4 18 6.5% 9.7%
5 21 7.6% 7.9%
6 17 6.1% 6.7%
7 17 6.1% 5.8%
8 14 5.1% 5.1%
9 11 4.0% 4.6%
Total 277 100% 100%

The comparison (Figure 3 and the table) shows that the first digits in Google statements are close to the predicted percentages in the Benford’s law. However, there are noticeable discrepancies. The most obvious ones are on Digit 1 and Digit 4. Are these differences significant?

To find out, we use the chi-squared test. The chi-squared statistic is 6.833 with 8 degrees of freedom and the p-value is 0.55. When the p-value is small (close to zero), we conclude that the differences between the observed data and the Benford’s law percentages cannot be attributed to random chance alone, in which case we conclude that the data in question do not follow Benford’s law. But in this case, the p-value is large (0.55). So the conclusion is that the differences that we see between the observed percentages in the table and the expected percentages according to the Benford’s law are not sufficient evidence for us to believe that the first digits in Google’s financial statements do not follow the Benford’s law. For a more detailed discussion on using the chi-squared test, see this discussion on Benford’s law and census population counts.

\copyright 2017 – Dan Ma


Could Madoff be caught sooner?

In financial crimes, the physical evidences are often the data in financial statements and other documents. These data are faked with the purpose of giving an appearance of normalcy, that nothing is amiss or that the financial results are better than the actual reality. How can a detective uncover financial frauds based on data? This post underscores the need for vigilance on the part of everyone, from investors to financial intermediaries to regulators. The next post discusses a mathematical modeling that is a powerful and relatively simple tool for detecting potential financial frauds or errors.

Examples of financial frauds that may require deception using faked data include investment frauds. One type of investment frauds is that of Ponzi schemes. In such schemes, investors are promised a high rate of returns with little or no downside risks. The profits to the investors are usually very good at the beginning, which actually are not proceeds from legitimate investments but are funds collected from new investors. In other words, such schemes are one big lie and can only be sustained by having a constant stream of investors, essentially new suckers paying existing ones. When new money stops coming in, the game is over. There is definitely an incentive for the operators in these Ponzi schemes to put up the appearance of real investing, hence faking data.

The longest running Ponzi scheme that is also the largest in scope is the one perpetrated by Bernard L. Madoff.

Mugshot of Bernie Madoff

Madoff’s fraud scheme had gone on for decades with investors profiting handsomely year in and year out. It collapsed in December of 2008 when the great recession was underway. At that time, new moneys had slowed to a trickle. Then it was inevitable that the Ponzi scheme was exposed due to the fact that Madoff could not keep up with the avalanche of withdrawals.

On the book, the amount of losses suffered by the investors was estimated to be $50 billion. The exact amount of losses was hard to estimate since the amount of $50 billion included the fictitious profits reported to the clients. However, it is clear that the scope of the losses would be in the billions. Madoff started his investment fund in 1960. Though it is not clear when the investment fund became a Ponzi scheme, it is clear that the fraud went on for decades. To perpetrate the fraud in such massive scale and for so long, Madoff must have a team of people who helped him create the appearance that there were returns and helped him forge books, and file reports.

Administratively, the fraudulent enterprise was a massive undertaking that included a constant need for faking numbers. For example, the statements to the clients would show the trading activities with the trade prices and volumes. The faked numbers had to be good enough to look believable at least to the clients who do not have to professional expertise to scrutinize or who chose not to look closely. How about the professionals? Can the professional experts detect the frauds by poring over the statements and other documents?

Could Madoff be stopped sooner? The answer is almost certainly yes. If he was exposed 10 years earlier, many more families would be saved from financial ruins and emotional devastation.

The investment world is a competitive industry. The consistent and outsize returns of Madoff always raised suspicion among the competitors. Naturally the competitors would love the replicate the same kind of returns of Madoff. One analyst, Harry Markopolos, failed to find a way to replicate and concluded that Madoff’s scheme was either a Ponzi scheme or front running (buying stock for his own account based on knowledge of his clients’ orders). He alerted the Security and Exchange Commission (SEC) numerous times.

There are numerous other red flags that were raised over the course of the years. See the Wikipedia entry on Madoff’s investment scheme for details. Numerous entities, from SEC to the feeder funds the channel money to Madoff, all failed to detect the frauds. Maybe they chose not to look closely. In the case of the feeder funds (these are intermediaries that steered investors to Madoff), they had the incentive to not look closely. One such feeder was Fairfield Greenwich Group. They did not want to rock the boat. The gravy train was too good to pass up.

According to the Wikipedia entry on Madoff’s investment scheme, Madoff Securities LLC was investigated at least eight times over a 16-year period by the U.S. Securities and Exchange Commission (SEC) and other regulatory authorities. SEC investigated Madoff several times. In each instance, either Madoff was cleared or the investigation resulted in neither a finding of fraud nor a referral to the SEC Commissioners for legal action. Why did SEC not not uncover the fraud? Was it because Madoff covered his tracks so well that even the experts in SEC could not see anything wrong? Or they chose to not look too closely?

There are many intermediary entities involved in the Madoff’s fraud scheme, from the feeder funds to banks. The bulk of Madoff’s money was deposited at JP Morgan Chase. The top-notch bankers at Chase failed to detect anything wrong with Madoff either. Why? The fees were too good to pass off. Like the feeder funds, the gravy train was too good to pass up. Nonetheless, Chase was included in a law suit filed by Irving Picard, the court appointed trustee in charge of recovering assets from the Madoff investment scandal. According to the suit, Chase “was at the very center of that fraud and thoroughly complicit in it.” As a sophisticated financial institution, JPMorgan was “uniquely situated to see the likely fraud.” The dark role played by the bank and the feeder funds are described in this Fox Business article.

The investors in Madoff’s scheme are at a double bind. They lost the investments, in many cases their life savings. At best they can only recover a small part of the investments (only if they are direct investors with Madoff). If they are investors via a feeder fund, they are out of luck. Any legal action for them would likely be dragged on for years.

Could Madoff be caught sooner? The tools are available. Only if the people involved are willing to use them.

Mathematical modeling plays a pivotal role in detecting financial frauds, in particular, in raising red flags about fraudulent numbers. There are actually “simple to understand” and “simple to use” tools that are very effective (tools that SEC probably chose not to use). The next post is on Benford’s law, which is a great frontline tool for fraud detection and forensic accounting.

\copyright 2017 – Dan Ma

A special day for Pi

What do you make of the following restaurant receipt? The diner left \pi as tips. A few things conspire to make this a “cute” receipt. The amount of the dinner ($26.86) is such that when it is rounded up to the nearest ten, the difference is the common approximation of the number Pi (3.14).

Maybe that dining experience happened on the 14th day of March or maybe not. However, the number Pi is in the news today (with today being March 14). This is a piece from This is a piece from This is a mention of Pi from last year.

The number Pi is purpose driven math, from mundane tasks such as building a dining room table to advanced calculation of orbits for space travel. In general, pi is used to study everything that cycles, e.g. movements that repeat at fixed intervals (think heart pulses and many periodic motions in the natural world).

There are other uses of Pi that are less obvious. One is that Pi is used is used in testing computer precision. The other is that the digits of Pi are used as random numbers since they behave like random numbers. Here’s is a discussion of the digits of Pi as random numbers in a companion blog.

\copyright 2017 – Dan Ma

Knowing fractions helps find better deals

Fractions may get a bad wrap. It is undeniable that one must have a good understanding of fractions in order to be good with with numbers. Here’s an example where knowing fractions will help a consumer find a better deal.

According to this article in New York Times, many consumers in the United States funked an “arithmetic test.” The article is a long one on math educational reform efforts in the United States. The following two paragraphs are crucial for the story here.

    One of the most vivid arithmetic failings displayed by Americans occurred in the early 1980s, when the A&W restaurant chain released a new hamburger to rival the McDonald’s Quarter Pounder. With a third-pound of beef, the A&W burger had more meat than the Quarter Pounder; in taste tests, customers preferred A&W’s burger. And it was less expensive. A lavish A&W television and radio marketing campaign cited these benefits. Yet instead of leaping at the great value, customers snubbed it.

    Only when the company held customer focus groups did it become clear why. The Third Pounder presented the American public with a test in fractions. And we failed. Misunderstanding the value of one-third, customers believed they were being overcharged. Why, they asked the researchers, should they pay the same amount for a third of a pound of meat as they did for a quarter-pound of meat at McDonald’s. The “4” in “¼,” larger than the “3” in “⅓,” led them astray.

It is undeniable that many of the consumers who participated in the focus groups conducted by A&W flunked the arithmetic test in question. How about the public at large? The failure of the new burger of Whataburger in the market place suggests that a large swath of the consuming public probably flunked the test too. The take-away for any company that conducts a marketing campaign: don’t give the consumers a math test. The take-away for the consumers who are confused with fractions: 1/3 is indeed bigger than 1/4. The following pictures show why.

\text{ }

\text{ }

Figure 1
This is a whole pizza. Only one slice. The relative size of a slice is 1.

Figure 2
Cut the pizza in two equal slices. The relative size of a slice is one-half or 1/2.

Figure 3
Cut the pizza in three equal slices. The relative size of a slice is one-third or 1/3.

Figure 4
Cut the pizza in four equal slices. The relative size of a slice is one-fourth or one-quarter or 1/4.

Figure 5
Put the pizza slices together from the smallest to the largest.

In the last picture, it is clear that one-third of a pizza is indeed larger than one-quarter of a pizza. Cutting a pizza, real pizza or pictures of pizza as we are doing here, makes this point clear. This relativity holds true in other settings too. One-third of a loaf of bread is bigger than one-fourth of the same loaf. Waiting at the doctor’s office for one-third hour is a longer wait than waiting for one-quarter of an hour. One-third of a gold bar is more valuable than one-fourth of a gold bar. One third of a million dollars is more money than one-quarter of a million dollars. Of course, one-third pound of beef has more meat than one-quarter pound of beef. The list can go on and on.

The insight about fractions can follow from a general reasoning too. The fraction 1/n refers to a division of a thing into n equal pieces. The more units the division produce, the smaller each piece will be (not bigger). Take pizza for example. If more people want to share a pizza equally, the smaller the slice will be for each person (not bigger).

In dividing a million dollars ($1,000,000) equally between two people, each person gets half a million dollars ($500,000), a very large sum of money. If the same million dollars are divided equally among one million people, each share will be just one dollar. The more people sharing a quantity will lead to a smaller share for each person.

So the larger the denominator in the fraction 1/n, the smaller the number 1/n.

Knowing how to work with numbers, fractions in particular, is purpose driven math. Such skills are indispensable to academic success and career success. The example with Whataburger shows that understanding fractions can help a consumer spot a good deal too.

This blog post is adapted from a post in a companion blog.

Elizabeth Green is the author of the New York Times article mentioned above. Here’s another NY Times article that discusses the article by Elizabeth Green.

\copyright 2017 – Dan Ma

Math has a starring role in the movie Hidden Figures

The focus of this post is on three women who practiced purpose driven math, specifically the kind of purpose driven math that takes human into space. The recent movie Hidden Figures, based on a book of the same name, tells their life stories. They are Mary Jackson, Dorothy Vaughan and Katherine Johnson, played by Janelle Monáe, Octavia Spencer and Taraji P. Henson in the movie, respectively. I talked about the movie and what the life stories of these trailblazers meant to me in a post in a companion blog. In this post, I would like to talk about their mathematical achievements, especially the work of Katherine Johnson.

This Scientific American article has a short biographical sketch for each of the three mathematicians. The focus here is on Katherine Johnson.

Back in the 1950s, National Advisory Committee for Aeronautics (NACA), the predecessor to NASA, hired African American females to work as “computers”, essentially performing and verifying calculations for the engineers. Johnson did not get hired at first. However, once she started working for NACA as a computer, she eventually stood out. She was always asking questions – the how, the why and the why not. She tried to take the tasks to a higher level while most other “computers” would just do the job and never ask questions.

Her inquisitiveness was due to her life long passion for learning from a young age. Her love for mathematics began early. She was ready for school before the the traditional starting age and watched her sibling going off to school with envy. By the time she did go to school, she began her studies in second grade. By age ten, Johnson started high school. Throughout her college years (West Virginia State College), Johnson was inquisitive and immersed in the mathematics program. She honed her math skills, especially in geometry, that would enable her to make her mark in the space program.

After President John F. Kennedy charged the country to send a man to the moon, Johnson became part of the team that did the calculations for space flight. According to information from NASA, her notable achievements include the trajectory analysis for Alan Shepard’s flight in 1959 as well as the trajectory analysis for John Glenn’s orbital mission in 1962 (at Glenn’s request). In Glenn’s flight in 1962, the trajectory was for the first time computed by electronic computer. But at Glenn’s insistence, Johnson was asked to verify the calculations just to be sure everything would work as expected. Johnson was also pivotal in the calculation for the trajectory path for the first actual moon landing in 1969.

Subsequently Johnson also worked on the space shuttle and the Earth Resources Satellite. She had authored or coauthored 26 research reports.

The story of Johnson is an inspiring and a trail blazing figure. Here is a short biographical piece from NASA. Here is a piece about an exhibit that highlights the work of the human computers portrayed in Hidden Figures.

With her achievement in the space program and with her extraordinary life story, she is a natural spoke person for STEM education and careers. She would tell her audience, “We will always have STEM with us. Some things will drop out of the public eye and will go away, but there will always be science, engineering and technology. And there will always, always be mathematics. Everything is physics and math.”

She is a natural spoke person for purpose driven math too.

\copyright 2017 – Dan Ma

Add fractions like an Egyptian


Egyptians and Babylonians around 3000 BC had a different way of working with fractions. The ancient Egyptian way also served a unique purpose that we might not appreciate. For example, in paying 8 farm workers with 5 loaves of Barley bread, how can the farm owner split the loaves into 8 equal shares? In our modern notation, each worker would get \frac{5}{8} loaves. But for the ancient Egyptian farm owner, \frac{5}{8} would be more like the problem than the answer because \frac{5}{8} does not tell him how to divide the loaves of bread.

It turns out that the farm owner could give every worker half a loaf (totaling 1/2 x 8 = 4 loaves) and then give each worker one eighth of the remaining loaf. In modern notation, the thought process of the farm owner is expressed in the following way:

    \displaystyle \frac{5}{8}=\frac{1}{2}+\frac{1}{8}

Another problem that an ancient Egyptian boss might encounter – how to divide 13 loaves of bread among 12 workers? A natural way is to let each worker receive one loaf and then receive one-twelfth of the remaining loaf. If a loaf is small, then making 12 cuts may destroy the loaf. A better way is the following.

    \displaystyle \frac{13}{12}=\frac{1}{2}+\frac{1}{3}+\frac{1}{4}

Each worker would receive half a loaf (for a total of 6 loaves). Then each worker receives one-third of a loaf (for a total of 4 loaves). The remaining three loaves would produce one-quarter of a loaf for each worker.

Any fraction with 1 in the numerator and a positive integer in the denominator is called a unit fraction. In our notation, \frac{1}{3} and \frac{1}{10} are unit fractions. The ancient Egyptians expressed the unit fractions in hieroglyphic notation. For example:


The above image is found in the Wikipedia entry on Egyptian fractions.


Egyptian Fractions

An Egyptian fraction is a sum of several unit fractions where all the unit fractions are different. Thus, \frac{1}{3}+\frac{1}{10} is an Egyptian fraction. The sum \frac{5}{8}=\frac{1}{2}+\frac{1}{8} is an Egyptian fraction that encodes the instruction on how to divide 5 loaves of bread among 8 workers. The sum \frac{1}{2}+\frac{1}{3}+\frac{1}{4} is an instruction on dividing 13 units into 12 equal shares.

Fractions, in the way ancient Egyptians used them, clearly were useful, especially in a barter economy, e.g. exchanging labor for foods or other goods. Thus for ancient Egyptians, sum of unit fractions as a way to express fractions had practical advantages over other forms of expression.

On the other hand, in our modern way of life, we think of fractions differently. Fractions still tell us the relative size of a given transaction or action (e.g. the cost for half a pizza is $5). Arithmetic calculation can be performed in decimal numbers. Thus we do not have a need for the Egyptian form of fractions. However, Egyptian fractions are studied for historical purposes. Egyptian fractions are also an object of study in number theory and are also a source of interesting recreational math problems or puzzles. See here for a very comprehensive look at Egyptian fractions from a mathematical point of view. The Wikipedia entry on Egyptian fractions is also a good resource.

Egyptian fractions can also be a good way to teach fractions to children. Including Egyptian fractions into the lesson plan can provide additional insight. In preparing for this post, I worked out a few Egyptian fraction calculations to get a sense of how it works. I found that dividing loaves of bread among workers is a great way to think about fractions. The dividing of bread as a device can actually be an alternative mechanism for representing and adding fractions.


How to Obtain Egyptian Fractions

In this section, we explain how Egyptian fractions work. For example, given a fraction \frac{T}{B}, we show how to derive an Egyptian fraction:

    \displaystyle \frac{T}{B}=\frac{1}{i}+\frac{1}{j}+\frac{1}{k}+ \cdots + \frac{1}{w}

In fact, for any given fraction \frac{T}{B}, there can be infinitely many ways to express it as Egyptian fractions. Consider the following examples.

Example 1
Let’s say the farm owner pays 2 loaves of bread among 5 workers. How can this be done?

The first step is to give each worker 1/3 loaves for a total of 5/3 loaves. Then 1/3 loaves remain and each worker would get 1/5 of that or 1/15 loaves. The following is the answer.

    \displaystyle \frac{2}{5}=\frac{1}{3}+\frac{1}{3} \times \frac{1}{5}=\frac{1}{3}+\frac{1}{15}

In this scenario, each worker gets 1/3 of a loaf and then gets 1/5 of 1/3 of a loaf. How do we know to start with 1/3? The key is to find the largest unit fraction less than 2/5. Since 2/5 = 0.4, the largest unit fraction in this case is 1/3. \square

Example 2
The problem now is to divide 4 loaves of bread among 5 workers.

Note that 4/5 = 0.8. The largest unit fraction less than 0.8 is 1/2. So distribute 1/2 loaves to each worker for a total 5/2 = 2.5 loaves. Then 3/2 = 1.5 loaves remain, which are divided by 5 workers again, giving each worker 3/10 loaves. The following shows the calculation.

    \displaystyle \frac{4}{5}=\frac{1}{2}+\frac{3}{2} \times \frac{1}{5}=\frac{1}{2}+\frac{3}{10}

The fraction 3/10 is not a unit fraction. We can use the same idea to break it up. Now 3/10 is like dividing 3 loaves among 10 workers. What is the largest unit fraction less than 3/10=0.33? It is 1/4. So give each worker 1/4 loaves for a total of 10/4 = 2.5 loaves. The remaining 1/2 loaves are divided by 10 workers.

    \displaystyle \frac{4}{5}=\frac{1}{2}+\frac{3}{10}=\frac{1}{2}+ \biggl[ \frac{1}{4}+\frac{1}{2} \times \frac{1}{10} \biggr]=\frac{1}{2}+\frac{1}{4}+\frac{1}{20}

The answer is that in distributing 4 loaves to 5 workers, each worker gets 1/2 of a loaf, then 1/4 of a loaf. What remains would be 1/4 of a loaf. From the remainder of 1/4, each worker gets 1/5 of that, meaning that each worker gets 1/20 of a loaf. \square

The method described in the above two examples is called the greedy algorithm. This is because at each step, the algorithm is to (greedily) find the largest share of bread that can be taken away. Then repeat the same greedy process on the remaining portion of the bread. In this process, the series of unit fractions always decrease and eventually the process will stop. Interestingly, the greedy algorithm can be relaxed and still produces an equivalent Egyptian fraction. Thus there are more than one way to obtain Egyptian fractions. The following example shows how it is done.

Example 3
Let’s repeat Example 2 but this time do not take the largest unit fractions less than the given fraction at each stage.

In the first step take 1/3 away from 4/5 (instead of taking 1/2). In the bread analogy, each worker gets 1/3 loaves for a total of 5/3 loaves. Then 7/3 loaves remain, which are shared by 5 workers.

    \displaystyle \frac{4}{5}=\frac{1}{3}+\frac{7}{3} \times \frac{1}{5}=\frac{1}{3}+\frac{7}{15}

To break up 7/15, the greedy algorithm would take 1/3 since 1/3 is the largest unit fraction less than 7/15 = 0.467. For illustration, we take 1/5 away from 7/15 for a total of 15/5 = 3 loaves. Then the remaining 4 loaves are shared by 15 workers.

    \displaystyle \frac{4}{5}=\frac{1}{3}+\frac{7}{15}=\frac{1}{3}+ \biggl[ \frac{1}{5}+4 \times \frac{1}{15} \biggr]=\frac{1}{3}+\frac{1}{5}+\frac{4}{15}

Next, divide 4 loaves of bread among 15 workers. The largest unit fraction less than 4/15 = 0.267 is 1/3. But we take 1/6 away. Each of 15 workers get 1/6 loaves for a total of 15/6 = 2.5 loaves. Then 3/2 = 1.5 loaves remain, which are shared by 15 workers.

    \displaystyle \frac{4}{5}=\frac{1}{3}+\frac{1}{5}+\frac{4}{15}=\frac{1}{3}+ \frac{1}{5}+\biggl[ \frac{1}{6}+\frac{3}{2} \times \frac{1}{15} \biggr]=\frac{1}{3}+\frac{1}{5} +\frac{1}{6}+\frac{1}{10}

The answer is that each worker gets 1/3 of a loaf, 1/5 of a loaf, then 1/6 of a loaf and finally 1/10 of a loaf. Of course, this division would be equivalent to the one in Example 2 (i.e. the same of amount of bread for each worker). \square

Example 3 shows that there are more than one way to express a fraction in Egyptian fractions (in fact, infinitely many ways as shown below). Example 3 also shows that if the greedy algorithm is not used (if the largest share of bread is not taken away at each step), it is possible it will take more steps to distribute the bread.

Example 4
We now demonstrate that an Egyptian fraction can be expressed in an infinite number of ways. The example we use is

    \displaystyle \frac{5}{8}=\frac{1}{2}+\frac{1}{8} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \  (1)

We demonstrate that starting with (1), we can derive infinitely many new (but equivalent) Egyptian fractions. The following relationship will be useful.

    \displaystyle 1=\frac{1}{2}+\frac{1}{3}+\frac{1}{6} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (*)

The idea is that we replace the last term in a given Egyptian fraction with three new terms using relation (*). The last term in the first Egyptian fraction (1) is \frac{1}{8}. So multiply (*) by \frac{1}{8} to obtain

    \displaystyle \frac{1}{8}=\frac{1}{8} \biggl[\frac{1}{2}+\frac{1}{3}+\frac{1}{6}  \biggr]=\frac{1}{16}+\frac{1}{24}+\frac{1}{48}

which then replaces the \frac{1}{8} in (1) to produce

    \displaystyle \frac{5}{8}=\frac{1}{2}+\frac{1}{16}+\frac{1}{24}+\frac{1}{48} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

The last term in (2) is \frac{1}{48}. Multiply (*) by \frac{1}{48} and plug the result back into (2) to obtain

    \displaystyle \frac{5}{8}=\frac{1}{2}+\frac{1}{16}+\frac{1}{24}+\frac{1}{96}+\frac{1}{144}+\frac{1}{288} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)

By continuing the same process on replacing the last term, the fraction \frac{5}{8} can be expanded to include more and more unit fractions. By performing the same process, any Egyptian fraction has infinitely many different expressions as Egyptian fractions. \square


Rhind Papyrus

Both of the references cited above (links are repeated here and here) mention the Rhind papyrus. Both have an image of a portion of the papyrus. Rhind papyrus was named after Henry Rhind, the Scottish antiquarian who purchased it in 1858 in Luxor, Egypt. It dates to around 1650 BC and is one of the best known original documents on Egyptian mathematics. A table for Egyptian fractions \frac{2}{n} is found in the early part of the papyrus along with problems and puzzles. The table lists \frac{2}{n} as sum of unit fractions for all odd values of n from 3 to 101.

As observed earlier, Egyptian fraction representation is not unique. It is clear that the author of the papyrus preferred some forms over the others. For example, \frac{2}{13}=\frac{1}{8}+\frac{1}{52}+\frac{1}{104} is found in the papyrus while the simpler form \frac{2}{13}=\frac{1}{7}+\frac{1}{91} is not. The scribe seemed to favor unit fractions with even denominators, possibly to make multiplication and division easier.

Some of the \frac{2}{n} fractions in the Rhind papyrus are derived from smaller calculations. For example,

    \displaystyle \frac{2}{81}=\frac{1}{54}+\frac{1}{162}

is found in the papyrus. Since 27 x 3 = 81, \frac{2}{81} is obtained by multiplying \frac{2}{3}=\frac{1}{2}+\frac{1}{6} by \frac{1}{27}.

    \displaystyle \frac{2}{81}=\frac{1}{27} \times \frac{2}{3}=\frac{1}{27} \biggl[ \frac{1}{2}+\frac{1}{6} \biggr]=\frac{1}{54}+\frac{1}{162}

Another example, \frac{2}{87}=\frac{1}{58}+\frac{1}{174} is obtained by multiplying \frac{2}{3}=\frac{1}{2}+\frac{1}{6} by \frac{1}{29}. Thus, whenever the denominator is a composite number, the given fraction can be obtained from a smaller Egyptian fractions.

According to the greedy algorithm, the fraction \frac{2}{2m+1} can be expressed as:

    \displaystyle \frac{2}{2m+1}=\frac{1}{m+1}+\frac{1}{(m+1) \ (2m+1)}

Once the formula is known, the entire \frac{2}{n} in the Rhind papyrus can be obtained easily. However, the creator of the papyrus chose not to use the above short and seemingly convenient form of Egyptian fractions. We do not know fully why. One reason may be that using the above form would require large denominators in the second unit fraction. All the denominators in the \frac{2}{n} table are no larger than 999. Thus \frac{2}{83} is

    \displaystyle \frac{2}{83}=\frac{1}{60}+\frac{1}{332}+\frac{1}{415}+\frac{1}{498}

rather than

    \displaystyle \frac{2}{83}=\frac{1}{42}+\frac{1}{3486}

Deriving at the first form of \frac{2}{83} would require more work. After making some calculations on Egyptian fractions on my own, I cannot help but have admiration for the person who made the calculations that appeared in the Rhind papyrus. I have help in the form of a hand held calculator. They would have to go through a lot of trials and errors to get at the results.


A Number Theory Problem

Egyptian fractions are a source of interesting problems in the mathematical area of number theory. One example is the Erdos-Straus conjectures, which states that for any positive integer n \ge 2, the fraction \frac{4}{n} can be expressed as a sum of unit fractions as follows:

    \displaystyle \frac{4}{n}=\frac{1}{x}+\frac{1}{y}+\frac{1}{z}

Another way to state the problem is that the above equation always has solutions in positive integers x, y and z.

Based on the above discussion, the fraction \frac{4}{n} can always be expressed as a sum of unit fractions (just use the greedy algorithm). But the result may have just two terms or more than 3 terms. The requirement is that it is a sum of three unit fractions (not necessarily all different). Example 3 shows that \frac{4}{5}=\frac{1}{2}+\frac{1}{4}+\frac{1}{20}. The question is, can something similar be done for \frac{4}{n} for all other values of n? Computer calculations had shown that \frac{4}{n} can be expressed a sum of three unit fractions for all n \le 10^{17}. At this point, there is no answer for values of n above 10^{17}.


Concluding Remarks

Egyptian fraction is a fascinating topic in many respects. The concept has a practical significance for Egyptians and Babylonians of 3000 BC, e.g. to address the division of wages. Though the practical reason is no longer relevant for us, learning about Egyptian fractions does add to our understanding of fractions. Thus Egyptian fraction is a great educational tool for teaching fractions. There is also the historical connection. Learning about Egyptian fractions makes the history real and vivid. Knowing how to express \frac{2}{5} or \frac{2}{7} as sum of unit fractions, for example, can in a sense transport us back to the time the scribe wrote on the Rhind papyrus. We would be doing the same math problems they were doing in ancient Egypt. We are far apart from the Egyptians of 3000 BC in time and space and also in culture and languages. Yet their concept of fractions, though different in usage, are accessible to us.

The title of this post could also be “Add fractions like an ancient Egyptian” or “how ancient Egyptians add fractions with a purpose.” One thing is clear. The concept opf fractions is purpose driven math for ancient Egyptians. It should be for us too.



The \frac{2}{n} table in the Rhind papyrus is a good source for exercises to practice Egyptian fractions. The table contains fractions for \frac{2}{n} for values of odd integers n from 3 to 101. See if the Egyptian fractions you come up with agree with the ones in the papyrus (see the table in the first two links below).

Another good source of practice problems is through the Erdos-Straus conjecture discussed above. The goal is to express \frac{4}{n} as a sum of three unit fractions. Note that \frac{4}{n} can be expressed as a sum of unit fractions using the greedy algorithm. But the results may have more than three terms. So the challenge is to look for three terms.



The links indicated above are repeated here. The discussion in this post is a simple and basic mathematical introduction. The first link below is a particularly good resource, providing more details and insight.

\copyright 2017 – Dan Ma

What is purpose driven math?

The following is an adorable math cartoon that I found recently.


\text{Roz Chast:}

The cartoon depicts a shop for math related products and services. If there ever was such a shop, some of the advertised services are obsolete. Square root extraction was offered as a bonus for purchase of $50 or more. With computing devices so readily available (hand-held calculators, smart phones, etc), there is surely no need for square root extraction. There surely is no need for arithmetic services for anyone who knows how to punch buttons in a pocket calculator (or tab the screen of a device). I really got a chuckle with “fresh prime numbers daily!”

It turns out a daily supply of fresh prime numbers is not just something only mathematicians care about. Internet commerce depends on a daily fresh supply of prime numbers. In fact, a fresh daily supply of prime numbers that are several hundred or even thousands digits long.

Whenever you enter a credit card number on the Internet, it would be very risky for you if the credit card number is sent over the Internet without masking (encryption). Then other people can read it and make use of it without your authorization. It would be like shouting secrets in a crowded room. Before it is sent, the credit card number is encrypted. Once it gets to the destination (the online merchant), the credit card number is decrypted. One of the most commonly used encryption/decryption schemes is the RSA algorithm, which is built on prime numbers.

Every time a piece of sensitive information, e.g. your credit card number, is sent over the internet, the RSA algorithm encrypts the information using a public key, which consists of a large number N that is a product of two large prime numbers p and q (note that the two prime numbers p and q are not known to the public, and that only N is). Once the encrypted information arrives at the merchant, it is decrypted using a private key, which consists of the two individual prime numbers p and q. The security of the RSA algorithm rests on the difficulty of uncovering p and q if only the product N is known. To get a better sense of how RSA works, see this article from Slate. See here for another introduction to RSA algorithm.

For RSA algorithm to work properly and securely, a fresh daily supply of prime numbers is necessary. What is purpose driven math? Due to its wide spread use in online commerce, the study of prime numbers would be called purpose driven math.

It is interesting to point out that prime numbers were not always regarded as purpose driven math. Euclid proved that there are infinite number of prime numbers (circa 300 BC). Prime numbers had a special hold on humanity ever since, even though they were studied more in a “math for math’s sake” kind of way. Many people through the ages found prime numbers interesting and important, but never as a way to enable commerce. It is not always easy to tell ahead of time whether a given mathematical construct is purpose driven math. Thus what is considered esoteric math today may become purpose driven math tomorrow.

There are actually many other mathematical theories and notions that were not meant to be purpose driven at the beginning but were found to be useful later. Some people disdain (or at least do not value) mathematical theories that have no practical purpose. I have a friend who always ask “what is the use of this?” whenever I mention a math concept in a conversation. There are mathematicians who disdain mathematics that have a practical purpose. The notion of prime numbers is a bridge between these two extreme camps. As an abstract mathematical theory, prime numbers are beautiful and interesting. On the other hand, prime numbers make it possible to transmit sensitive information. In this blog, I will discuss bridge concepts such as prime numbers and other purpose driven math.

This blog is a small attempt to dispel the notion that math is useless or meaningless. I hope that learning purpose driven math will be an antidote to math phobia or even math hatred. I have no illusion that the blog will be a cure all for math phobia. I hope that some readers will come away with a sense that there is more to math than the math taught at school. There is purpose driven math and there is school math. For many, school math is fear and anxiety inducing. Many students do not like math but have to endure it because schools require it. This blog is to show that there is another face of math.

I hope that those who hate school math will hate math less (or tolerate more of it) after knowing more about purpose driven math through this blog or other resources. School math is required in order to get through school. But school math also gives the necessary background to understand and use purpose driven math. For example, to learn how money grows in a bank account, we need a tool that measures a growth phenomenon that doubles over a certain period of time (e.g. at 10% annual interest rate, the money doubles approximately every 7 years). This is called exponential growth and the tool for measuring this type of growth phenomenon is called the exponential function, which is taught in high school.

Whenever a teacher mentions anyone of these terms or phrases – solving for x, the slope of a line, quadratic equations, solving an inequality, fractions and sines and cosines and many more – some students are immediate turned off. I realize that sometimes the way math is taught at school is not very effective. Purpose driven math is another side of math that is often omitted when school math is taught. Math is not only useful, it is one of the ways through which we make sense of the world. Thus mathematics is on par with other worthy endeavors such as science, literature, philosophy, art and music.

\copyright 2017 – Dan Ma