Statistical Consultants Ltd
The German Tank Problem
During World War 2, the Western Allies used a simple formula to estimate the rate at which German tanks were being produced, based on the serial numbers obtained from captured and destroyed tanks.
The formula is the following:
$$\hat{N} = m + \frac{m}{n} - 1$$
where $\hat{N}$ is the estimated total number of objects (e.g. German tanks)
m is the highest sampled serial number
n is the sample size (e.g. the number of captured/destroyed German tanks)
For example, let’s say 10 tanks were captured/destroyed, and the following serial numbers were obtained:
117, 232, 122, 173, 167, 12, 168, 204, 4, 229
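The highest serial number obtained was 232, therefore m = 232. With n = 10, the estimate is 232 + 232/10 - 1 = 254.2.
It so happens that these 10 serial numbers were drawn randomly from a (rounded) uniform distribution with minimum 1 and maximum 255, so the estimate is close to the true maximum.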
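The calculation above can be sketched in a few lines of Python. This is only an illustration, not code from the original analysis: the function name `estimate_total` is hypothetical, and the simulation at the end assumes that serial numbers are sampled without replacement from 1 to 255, which the example above does not state explicitly.

```python
import random


def estimate_total(serials):
    """Estimate the total number of objects as m + m/n - 1."""
    m = max(serials)  # highest sampled serial number
    n = len(serials)  # sample size
    return m + m / n - 1


# The ten serial numbers from the example above.
sample = [117, 232, 122, 173, 167, 12, 168, 204, 4, 229]
estimate = estimate_total(sample)
print(round(estimate, 1))  # 254.2

# Assumption: repeat the experiment many times, drawing 10 distinct
# serial numbers from 1..255 each time, and average the estimates.
rng = random.Random(0)  # fixed seed so the run is repeatable
true_total = 255
trials = 20000
mean_estimate = sum(
    estimate_total(rng.sample(range(1, true_total + 1), 10))
    for _ in range(trials)
) / trials
print(round(mean_estimate, 1))
```

Under the sampling assumption stated above, the average of the simulated estimates should land near the true total of 255, even though any single estimate can miss by a few dozen.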
How well the formula performed
The formula performed much better than the conventional intelligence estimates, which were based on counting tanks on the battlefield and secretly observing factories.
Conventional intelligence estimated that the Germans were producing around 1,400 tanks per month from June 1940 to September 1942. The statistical estimate was 246 tanks per month. After the war, German production figures showed the actual number to be 245 per month.
Estimates for some specific months:
The statistical estimates were useful because they gave the Allies an idea of whether or not an attack on the Western Front could succeed.
This formula can be applied to other things with serial numbers. For example, the same formula was used with serial numbers gathered through online discussions to estimate the number of iPhones sold. It was estimated that Apple had sold around 9.1 million phones by the end of September 2008.
Copyright © Statistical Consultants Ltd 2010