Imagine you went out to dinner with a group of nine friends. You enjoyed the dinner a lot, talking about sports and silly things. At the end of the dinner, your group gets the check. Below you see the cost of each person's meal. The question is: how much should each person pay?
Since you ate dinner with close friends, it would be impersonal to have each friend pay their bill separately. So, all ten of you decide to share the cost; but then, how do you share the cost? Surely, there is more than one way to split the bill. Whenever there is more than one way to do something, you may estimate.
Estimating is not easy; in fact, statisticians dedicate their entire lives to this. It isn't easy to figure out methods for estimating answers to problems whose answers aren't clearly defined.
While some methods of estimation are very advanced, others, like the arithmetic mean and the median, are mathematically simple. These methods often provide very good estimates of the central tendency. This means they help to figure out what is happening in the middle of your data (observations). In the case of the arithmetic mean, you give each observation an equal weight. In your case, this implies that each friend shares the cost equally.
To find the arithmetic mean, you take all the observations and add them up. In your case, that would be the total bill: $319. Then, you divide the sum by the number of observations, here the ten friends:
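As a rough sketch, the division can be written out in a few lines of Python. The function and variable names are made up for illustration; only the total of $319 and the group size of ten come from the text.

```python
def arithmetic_mean(total, count):
    """Return the equal share when `total` is split among `count` people."""
    return total / count

total_bill = 319    # total bill in dollars, as stated in the text
num_friends = 10    # you plus your nine friends

share = arithmetic_mean(total_bill, num_friends)
print(f"Each friend pays ${share:.2f}")  # Each friend pays $31.90
```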
Consider this plot showing the arithmetic mean in orange and the cost of each person in blue:
What do you notice? If everybody paid the mean value of $31.90, would that be fair to everybody? From the data, you see that 7 out of the 10 friends spent less than the mean. Should an estimate be considered fair if 3 friends benefit from sharing while 7 friends pay more than the cost of their own meal?
So you try something else. Instead of dividing the cost equally among the friends (using the mean), you consider using the middle number. This is called the median. Since you are looking for the middle number, it only makes sense to first arrange the costs from lowest to highest. Then, you cross off numbers from both ends, one pair at a time, until you reach the middle. However, in your example, there are two numbers in the middle (see below), so you take the value halfway between them. Halfway between $20 and $22 is $21. That is the median.
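The crossing-off procedure can be sketched in Python as follows. This is a minimal illustration, not the only way to compute a median; the function name is hypothetical, and the two short lists are invented numbers that are not the dinner costs. Only the middle pair of $20 and $22 comes from the text.

```python
def median_by_crossing_off(values):
    """Sort the values, then cross off the lowest and highest until 1 or 2 remain."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    while len(ordered) > 2:
        ordered = ordered[1:-1]  # drop one number from each end
    # One value left: that is the median. Two left: take the halfway point.
    return sum(ordered) / len(ordered)

# Invented examples (NOT the dinner data), one odd-length and one even-length
print(median_by_crossing_off([7, 3, 5]))     # 5.0
print(median_by_crossing_off([8, 2, 6, 4]))  # 5.0

# The two middle costs named in the text
print(median_by_crossing_off([20, 22]))      # 21.0
```

Python's standard library offers `statistics.median`, which gives the same result without the explicit loop.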
Now this plot shows the median in grey and the cost of each person in blue:
Voilà! You might have found a good estimate of the central tendency. In the graph, you notice that $21 does a very good job of estimating the central tendency. This seems fair, since half of your group benefits by paying less, and the other half ends up paying a bit more than what they ordered.
If only the job of a statistician were that simple! If you and your friends chose the median as the best estimate, you would come up $109 short of the total bill, because you would only collect $21 × 10 = $210. Choosing the median as the best estimator might make all the friends happy; unfortunately, it won’t pay the restaurant bill. So in summary:
The median may seem like a robust estimate of the cost,
but don't count on it for splitting bills.
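The shortfall is easy to check with a short Python sketch. The variable names are illustrative; the figures ($319 total, ten friends, $21 median) are the ones given in the text.

```python
total_bill = 319    # total bill in dollars
num_friends = 10    # you plus your nine friends
median_cost = 21    # median meal cost in dollars

collected = median_cost * num_friends   # what the group pays if everyone pays the median
shortfall = total_bill - collected      # what is still owed to the restaurant

print(f"Collected: ${collected}")   # Collected: $210
print(f"Shortfall: ${shortfall}")   # Shortfall: $109
```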
Back to the arithmetic mean: why does it do a poor job of estimating fairly in your example? Put differently, how would a statistician know when an arithmetic mean does a poor job of estimation? In your example, the mean is affected greatly because Hannah’s meal cost the most. At $120, she spent about three times as much as Jennifer, who spent the next highest amount ($38). If Hannah’s meal had cost $40 instead, the mean cost would be $23.90, which is much closer to the median of $21.
Unfortunately for you and your friends, Hannah’s meal doesn’t cost $40 but rather $120. Statisticians refer to Hannah's meal as an outlier, which is a number located far away from the bulk of the observations. It could be either very high or very low in comparison to the rest. At your dinner table, Hannah’s meal costs far more than anyone else's. Outliers often pull the arithmetic mean toward themselves, causing it to overestimate or underestimate the central tendency. This is one of the reasons that statisticians sometimes don't use the mean as an estimator and instead use the median.
Food for Thought: After the dinner, Thomas, Azam, you, Ebby, and Wassim were a little unhappy about splitting the bill equally. You didn't like how far off the mean was from the value of your own meal. The five of you agree that a fair estimate for the bill would be around $25. Since Hannah's meal was the most expensive, you wonder: what is the most Hannah could have spent, so that the mean would not be more than $25?
To find the answer, you may find this formula helpful:
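To see how strongly one meal moves the mean, here is a small Python sketch of the what-if calculation. The variable names are illustrative; the amounts are the ones quoted in the text.

```python
total_bill = 319       # actual total in dollars
num_friends = 10       # you plus your nine friends
hannah_actual = 120    # what Hannah's meal really cost
hannah_what_if = 40    # the hypothetical cost considered in the text

actual_mean = total_bill / num_friends

# Swap Hannah's actual cost for the hypothetical one; everyone else stays the same
what_if_total = total_bill - hannah_actual + hannah_what_if
what_if_mean = what_if_total / num_friends

print(f"Actual mean:  ${actual_mean:.2f}")    # Actual mean:  $31.90
print(f"What-if mean: ${what_if_mean:.2f}")   # What-if mean: $23.90
```

Changing a single observation by $80 shifts the mean by $8, while the median of $21 would not move at all.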
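Presumably the formula in question is the definition of the arithmetic mean. In the notation below, which is an assumption and not taken from the text, $x_1, \dots, x_n$ are the individual meal costs, $n$ is the number of people, and $\bar{x}$ is the mean:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}
\qquad\Longleftrightarrow\qquad
x_1 + x_2 + \cdots + x_n = n \, \bar{x}
```

The second form says that fixing the mean also fixes the total, which is a useful starting point when only one of the costs is allowed to change.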