How often do you think about averages? In my case, there isn't a day that goes by in which I don't think about some sort of an "average." However, I seldom have a specific mathematical concept in mind. Instead, I think about the average representing a "typical value".
For example, I believe that "on average" I spend a total of 45 minutes a day on phone conversations. On some days I spend more than 45 minutes and on other days less. Then as a measure of central tendency, 45 describes the typical value.
Clearly, if I actually measured how long I speak per day over a few days, then I could also calculate this average. How would I do that? The common way is the arithmetic mean, where I would sum up the daily measurements and divide by the total number of days.
OK, so we understand arithmetic means. As you may know, though, there are other commonly accepted ways of describing "typical values". For example, as we explored in an earlier blog post, the median is also a very popular measure. Then there are related terms such as weighted averages, truncated means and mathematical expectations.
However, let's return to the arithmetic mean - sometimes just called "the mean", "the sample mean" or "the sample average". It turns out that it has two related friends - far less popular: the geometric mean and the harmonic mean.
All three means, the arithmetic, the geometric and the harmonic, are generally called Pythagorean means. This is due to geometric reasons arising in the case of two data values. We won't go into the geometry of it (see this brief MathWorld description if you are interested). Instead, let's ask:
When is each of the Pythagorean means best used as a "typical value"?
Before we introduce (or review) the way we calculate each of these means, let's consider three distinct scenarios that require a typical value, denoted by A.
(i) Total calculations: Say you are about to carry N items and wish to estimate the total weight you will carry. In this case, if A is the average item weight, then clearly,
total weight = N x A
(ii) Return on investment calculations: Say you invest K dollars in a savings account for a duration of N years. If the account grows by an average factor of A per year, then after N years you have:
K x A^N dollars
This is because after the first year, you have K x A and after two years you have K x A x A, etc. Note that if there is an interest rate of r involved, then A = 1+r. For example with a 5% interest rate, A = 1.05.
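The compounding argument can be checked with a few lines of Python. This is only a minimal sketch: the function name `grow` and the starting balance of 1000 dollars are assumptions made for illustration, not values from this post.

```python
def grow(k, a, n):
    """Return the balance after n years, multiplying by the factor a once per year."""
    balance = k
    for _ in range(n):
        balance *= a  # one year of growth
    return balance


k = 1000.0  # hypothetical starting balance in dollars
a = 1.05    # growth factor for a 5% interest rate (A = 1 + r)
n = 4       # number of years

year_by_year = grow(k, a, n)
closed_form = k * a ** n  # K x A^N

print(year_by_year, closed_form)
```

The two printed numbers should agree up to floating-point rounding, which is all the formula K x A^N claims.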
(iii) Distance-time-speed calculations: Say you are traveling for a distance of N kilometers at an average speed of A km/hour. In this case, the time travelled is:
time = N / A hours
In each of the cases (i) - (iii), the average value, A, signifies a typical quantity. That is a typical weight in case (i), a typical growth factor in case (ii) and a typical speed in case (iii). Indeed, if all we know is A, then we would probably use the above formulas without trouble.
However, what if we are actually given data matching each of these scenarios? How would we compute the desired average A, matching the data? For example, what if N = 4 and we were given the values 1.05, 0.85, 1.5 and 1.2?
Keep in mind, the meaning of these data values depends on the context:
For case (i) these values represent 4 individual weights, say in kilograms.
For case (ii) these values represent growth factors for each of the 4 periods. For example, the first value of 1.05 represents 5% growth and the second value represents 15% decay.
For case (iii) let's assume that these values represent the speeds taken over consecutive kilometres when traveling for a total of 4 kilometres. That is the first value represents a speed of 1.05 km/h during the first kilometre of motion, the second value represents a speed of 0.85 km/h during the second kilometre and so on.
These are quite slow speeds, so assume we are walking over a tightrope. Four kilometres of balancing. Yikes!
Clearly, depending on the scenario, the numbers have very distinct meanings. Then how could we compute an average value A? Exploring the three Pythagorean means will help us get some context.
Arithmetic mean: This is simply the sum of the values divided by the total number of values:
(1.05 + 0.85 + 1.5 + 1.2) / 4 = 4.6 / 4 = 1.15
Geometric mean: This is the N'th root of the product of values. For example, if there were only two values, you would multiply them and take their square root. In our case, N = 4 and hence:
(1.05 x 0.85 x 1.5 x 1.2)^(1/4) = 1.6065^(1/4), which is approximately 1.1258
Harmonic mean: This is the reciprocal of the arithmetic mean of the reciprocals. That is, take the reciprocal of each value, sum these up and divide N by that sum. For N = 4 we have:
4 / (1/1.05 + 1/0.85 + 1/1.5 + 1/1.2), which is approximately 4 / 3.6289 or 1.1023
You can observe that the means are not exactly the same. In fact, for positive values the following inequality always holds:
harmonic mean ≤ geometric mean ≤ arithmetic mean
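The three definitions can be sketched directly in Python. One assumption to flag: the post states the first two data values (1.05 and 0.85), while the last two used here (1.5 and 1.2) are inferred from the totals quoted later in the post (an arithmetic mean of 1.15, a product of 1.6065 and a total time of 3.6289 hours), and their order is a guess.

```python
import math

# First two values are from the text; 1.5 and 1.2 are inferred (see note above).
values = [1.05, 0.85, 1.5, 1.2]
n = len(values)

# Arithmetic mean: sum of the values divided by the number of values.
arithmetic = sum(values) / n

# Geometric mean: N'th root of the product of the values.
geometric = math.prod(values) ** (1 / n)

# Harmonic mean: N divided by the sum of the reciprocals.
harmonic = n / sum(1 / v for v in values)

print(round(arithmetic, 4), round(geometric, 4), round(harmonic, 4))

# The ordering between the three means, for positive values.
ordering_holds = harmonic <= geometric <= arithmetic
print(ordering_holds)
```

For data like this, the standard library's `statistics.mean`, `statistics.geometric_mean` and `statistics.harmonic_mean` should give the same results; the explicit formulas are written out here only to mirror the definitions.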
OK, so we know how to calculate these means. So what?
Why ever use the geometric or harmonic mean? Why not just stick to the arithmetic mean?
To answer this, let's return to the cases (i), (ii) and (iii) above. Ideally, we would like a "typical value" to correctly summarise the information found in the individual values. Follow the calculations below to see how this plays out:
Case (i): Try and use the arithmetic mean:
N x A = 4 x 1.15 = 4.6 = 1.05 + 0.85 + 1.5 + 1.2
That seems to work well! As desired, the total weight is exactly reproduced.
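As a quick check of case (i), the sketch below compares the total weight computed directly with the total reconstructed from each mean. The values 1.5 and 1.2 are inferred from the totals quoted in this post rather than stated in it, so treat the list as an assumption.

```python
import statistics

# Weights in kilograms; the last two values are inferred, not quoted from the post.
weights = [1.05, 0.85, 1.5, 1.2]
n = len(weights)

total_direct = sum(weights)

# Reconstruct the total as N x A using each candidate for A.
total_via_arithmetic = n * statistics.mean(weights)
total_via_geometric = n * statistics.geometric_mean(weights)

print(total_direct, total_via_arithmetic, total_via_geometric)
```

Only the arithmetic mean reproduces the direct total; the geometric mean gives a smaller number, so it is the wrong "typical value" for adding up weights.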
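Case (ii): What type of mean should we use in place of A? It turns out that the arithmetic mean doesn't do a good job. Instead try the geometric mean:
K x A^4 = K x ((1.05 x 0.85 x 1.5 x 1.2)^(1/4))^4 = K x 1.05 x 0.85 x 1.5 x 1.2
Hence by using the geometric mean, we have found the exact return over the period of 4 years. Numerically, notice that
1.05 x 0.85 x 1.5 x 1.2 = 1.6065, and the fourth root of 1.6065 is approximately 1.1258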
As you can see, starting with K dollars, we end up with 1.6065 x K dollars after 4 years. This is an average growth factor of 1.1258 per year, or about 12.6% per year. Note that it would have been wrong to use the arithmetic mean, which implies 15% per year and would overstate the final amount as 1.15^4 x K, or about 1.749 x K.
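Case (ii) can be checked the same way. In this sketch the starting balance of 1000 dollars is hypothetical, and the growth factors 1.5 and 1.2 are inferred from the product of 1.6065 quoted in the post rather than stated in it.

```python
import statistics

# Yearly growth factors; the last two are inferred, not quoted from the post.
factors = [1.05, 0.85, 1.5, 1.2]
n = len(factors)
k = 1000.0  # hypothetical starting balance in dollars

# Apply each year's factor in turn to get the true final balance.
balance = k
for f in factors:
    balance *= f

# Summarise the four factors by a single "typical" factor A and use K x A^N.
via_geometric = k * statistics.geometric_mean(factors) ** n
via_arithmetic = k * statistics.mean(factors) ** n

print(balance, via_geometric, via_arithmetic)
```

The geometric mean recovers the true final balance up to rounding, while the arithmetic mean overstates it.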
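Case (iii): Here speed varies over each kilometre of travel. If we just averaged the speeds using the arithmetic mean, yielding 1.15 km/h, and used that in the time = distance/speed formula, the result would be inaccurate. Instead, try the harmonic mean:
N / A = 4 / (4 / (1/1.05 + 1/0.85 + 1/1.5 + 1/1.2)) = 1/1.05 + 1/0.85 + 1/1.5 + 1/1.2
This is the desired quantity because covering one kilometre at a speed of v km/h takes 1/v hours, so with one term per kilometre:
total time = 1/1.05 + 1/0.85 + 1/1.5 + 1/1.2 hours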
Try out the numbers and see that the total time is 3.6289 hours. This is significantly different from the value of 4/1.15 = 3.478 hours obtained by (wrongly) using the arithmetic mean. Can you think of other contexts where the harmonic mean is useful?
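Here is one way to try out the numbers for case (iii). The speeds 1.5 and 1.2 km/h are inferred from the total time of 3.6289 hours quoted in the post, so the list is an assumption rather than data taken directly from the text.

```python
import statistics

# Speed in km/h over each of four consecutive kilometres (last two inferred).
speeds = [1.05, 0.85, 1.5, 1.2]
n = len(speeds)  # total distance in km, since each entry covers one kilometre

# True total time: each kilometre takes 1 / speed hours.
total_time = sum(1.0 / v for v in speeds)

# Time predicted by time = distance / A for two choices of A.
time_via_harmonic = n / statistics.harmonic_mean(speeds)
time_via_arithmetic = n / statistics.mean(speeds)

print(round(total_time, 4), round(time_via_harmonic, 4), round(time_via_arithmetic, 4))
```

The harmonic mean matches the true total time, while the arithmetic mean underestimates it. Note that this works because each speed applies to the same distance; if the segments had different lengths, a weighted harmonic mean would be needed.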
So in summary, there are different ways of finding "typical values" depending on the context. In practice, the arithmetic mean is the most common way. However, for contexts involving growth factors that compound, the geometric mean works better, and for contexts involving rates, such as speeds over equal distances, the harmonic mean does the job perfectly.
A related post describing similar ideas aimed at data-scientists is here. Have you ever used the geometric or harmonic mean? Let us know how.