Statistical Consultants Ltd


Benford's Law Explained

Statistical Theory, Accounting, Forensics / Fraud Detection

Date posted:
2 July 2011

This post continues from the previous post: Benford’s Law and Accounting Fraud Detection.
There are several ways of explaining why Benford’s law works, including the following:

Multiplying Distributions

Data conform more closely to Benford’s distribution when they arise from a mixture of several distributions whose values are multiplied together, which results in right-skewed data.
In accounting, the types of data which often conform to Benford’s Law involve a multiplication of two variables, e.g. prices multiplied by quantities.

Theorem: Let $X$ be a random variable with a continuous probability density function. There exists $\alpha^*$ such that for all $\alpha \ge \alpha^*$:
$(X/\sigma)^{\alpha}$ conforms to Benford’s Law (for any value of the scaling parameter $\sigma$)

The following animated gif shows a distribution and its corresponding first digit distribution as the exponent used to transform the data increases.  The data happens to be 10,000 observations randomly generated from a normal distribution of mean 100 and standard deviation 10.

Benford's Law animated gif
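
The effect of raising the exponent can be sketched numerically. The following is a minimal illustration, not the code behind the animation: it draws 10,000 values from a normal distribution with mean 100 and standard deviation 10 (as described above), raises them to a power alpha, and reports how far the first digit shares are from Benford's distribution. The seed, the chosen exponents and the function names are assumptions made for this example, and the scaling parameter is left at 1.

```python
import math
import random

# Benford probabilities P(first digit = d) for d = 1..9
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit_of_power(x, alpha):
    """First digit of x**alpha, computed in log space to avoid overflow."""
    mantissa = (alpha * math.log10(x)) % 1.0   # fractional part of log10(x**alpha)
    return min(int(10 ** mantissa), 9)         # guard against rounding up to 10

def max_gap_from_benford(sample, alpha):
    """Largest absolute difference between observed digit shares and Benford's."""
    counts = [0] * 10
    for x in sample:
        counts[first_digit_of_power(x, alpha)] += 1
    shares = [counts[d] / len(sample) for d in range(1, 10)]
    return max(abs(s - b) for s, b in zip(shares, BENFORD))

random.seed(2011)  # arbitrary seed, only for repeatability
draws = (random.gauss(100, 10) for _ in range(10_000))
sample = [x for x in draws if x > 0]  # log10 needs positive values

for alpha in (1, 5, 20, 50):  # illustrative exponents
    gap = max_gap_from_benford(sample, alpha)
    print(f"alpha = {alpha:>2}: largest gap from Benford = {gap:.3f}")
```

With a small exponent almost every value starts with 1, 9 or 8, so the gap is large. As the exponent grows, the values spread over many orders of magnitude and the gap should shrink towards sampling noise.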

Exponential Growth Processes

This explanation is similar to the explanation for the rank-size rule of city populations.

For many data sets that conform to Benford’s Law (especially accounting data, due to inflation, money supply expansion, interest etc.), the figures tend to follow Gibrat’s law of proportional growth, i.e. the figures grow by amounts proportional to the size of the current figure.
Over time, a variable (such as a recurring expense) would spend more of its time with lower first digits than with higher first digits.  For example, let’s say the variable x starts with a value of 10.  The variable would spend more time being:

$10 \le x < 20$ than $20 \le x < 30$

and more time being:
$20 \le x < 30$ than $30 \le x < 40$
… and so on.

Once the variable reaches the hundreds, it would spend more time being:

$100 \le x < 200$ than $200 \le x < 300$

and more time being:
$200 \le x < 300$ than $300 \le x < 400$
… and so on.


Overall, lower first digits would be more common than the higher first digits.
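
The argument can be checked with a small simulation. Under a constant proportional growth rate r, the time needed to grow from a to b is ln(b/a)/r, so the share of each decade spent with first digit d is log10((d+1)/d) = log10(1 + 1/d). The sketch below assumes a variable that starts at 10 and grows by 1% per period until it reaches 10,000. These numbers and the function names are illustrative choices, not taken from any data set.

```python
import math

def first_digit(value):
    """Leading decimal digit of a positive number."""
    while value >= 10:
        value /= 10
    while value < 1:
        value *= 10
    return int(value)

def time_shares(start=10.0, growth_rate=0.01, stop=10_000.0):
    """Share of periods spent on each first digit while growing from start to stop."""
    counts = [0] * 10
    x = start
    while x < stop:
        counts[first_digit(x)] += 1
        x *= 1 + growth_rate   # proportional growth: the increment scales with x
    periods = sum(counts)
    return [counts[d] / periods for d in range(1, 10)]

shares = time_shares()
for d, share in enumerate(shares, start=1):
    expected = math.log10(1 + 1 / d)
    print(f"first digit {d}: simulated {share:.3f}, log10(1 + 1/{d}) = {expected:.3f}")
```

Because growth happens in discrete 1% steps, the simulated shares only approximate the logarithmic values. A smaller growth rate should bring them closer.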


Scale Invariance

While the previous explanations show what kind of data would conform to Benford’s Law (and how it comes about), scale invariance explains how the formula for Benford’s distribution can be derived.
If the first digits of some large data set conform to a particular distribution, then the distribution should be independent of the data’s units of measurement.  For example, changing the units of measurement from dollars to yen shouldn’t change the distribution of digits (it would likely change many individual first digits but not their overall distribution). 
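
A change of units can be illustrated with made-up figures. The sketch below generates hypothetical dollar amounts whose logarithms are spread evenly between 1 and 1,000,000 (an assumption chosen so that the data conform to Benford's Law), converts them to yen at a hypothetical rate of 80 yen per dollar, and compares the first digit shares before and after.

```python
import random

def first_digit(value):
    """Leading decimal digit of a positive number."""
    while value >= 10:
        value /= 10
    while value < 1:
        value *= 10
    return int(value)

def digit_shares(values):
    """Share of values starting with each digit 1..9."""
    counts = [0] * 10
    for v in values:
        counts[first_digit(v)] += 1
    return [counts[d] / len(values) for d in range(1, 10)]

random.seed(54)  # arbitrary seed, only for repeatability
dollars = [10 ** random.uniform(0, 6) for _ in range(20_000)]  # hypothetical amounts
yen_per_dollar = 80.0                                           # hypothetical exchange rate
yen = [amount * yen_per_dollar for amount in dollars]

dollar_shares = digit_shares(dollars)
yen_shares = digit_shares(yen)
# Fraction of individual amounts whose first digit changed after conversion
changed = sum(first_digit(a) != first_digit(b) for a, b in zip(dollars, yen)) / len(dollars)

print(f"amounts whose first digit changed: {changed:.1%}")
for d in range(1, 10):
    print(f"digit {d}: dollars {dollar_shares[d - 1]:.3f}, yen {yen_shares[d - 1]:.3f}")
```

Most individual amounts get a different first digit after conversion, while the two columns of shares should stay close to each other.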

Rough Derivation:
Let’s say that x has a scale invariant distribution.
If x is scale invariant, then rescaling x by a power of 10 leaves the distribution of its first digits unchanged.

This can make the derivation simpler by allowing the assumption that:
$1 \le x < 10$

If x is scale invariant, then multiplying x by a constant won’t change its distribution.  Multiplying x by a constant c is equivalent to adding a constant to log(x), since $\log_{10}(cx) = \log_{10} c + \log_{10} x$.  Since x is scale invariant, adding a constant to log(x) wouldn’t change the distribution of log(x).  The only probability distribution that doesn’t change when a constant is added (wrapping around so that x stays between 1 and 10) is the uniform distribution.  This would mean:

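$$p(\log_{10} x) = 1, \qquad 0 \le \log_{10} x < 1$$

Since $1 \le x < 10$ and $p(\log_{10} x) = 1$, then for the first digit $d$ of $x$:

$$P(d = 1) = \int_{\log_{10} 1}^{\log_{10} 2} 1 \, dy = \log_{10} 2 - \log_{10} 1 = \log_{10} 2 \approx 0.301$$

where $y = \log_{10} x$

Or more generally:

$$P(d = n) = \int_{\log_{10} n}^{\log_{10}(n+1)} 1 \, dy = \log_{10}(n+1) - \log_{10} n = \log_{10}\left(1 + \frac{1}{n}\right)$$

where $y = \log_{10} x$ and $n = 1, 2, \ldots, 9$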
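
As a quick check on the general formula, the nine probabilities should add up to 1. Writing each term as a difference of logarithms makes the sum telescope, and evaluating the terms gives the familiar first digit frequencies (rounded to three decimal places):

```latex
\sum_{n=1}^{9} \log_{10}\left(1 + \frac{1}{n}\right)
  = \sum_{n=1}^{9} \left[ \log_{10}(n+1) - \log_{10} n \right]
  = \log_{10} 10 - \log_{10} 1
  = 1

\begin{array}{c|ccccccccc}
n        & 1     & 2     & 3     & 4     & 5     & 6     & 7     & 8     & 9     \\ \hline
P(d = n) & 0.301 & 0.176 & 0.125 & 0.097 & 0.079 & 0.067 & 0.058 & 0.051 & 0.046
\end{array}
```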

See also:

Benford’s Law and Accounting Fraud Detection
The Rank-Size Rule of City Populations


Copyright © Statistical Consultants Ltd  2011