Measures of central tendency summarize a dataset with a single value that represents the center, or middle, of the distribution of its data points. They are among the most fundamental concepts in statistics. The three major measures are the mean, the median, and the mode. Each expresses the central tendency of a distribution in a different way and, depending on the data, has relative advantages and disadvantages.
Central tendency measures have several important functions. In addition to defining common or typical value(s), they can indicate whether individual values in a dataset are unusual or extreme outliers. For example, a central tendency measure can tell students whether their test scores are typical—similar to those of the majority of other students—or exceptionally good or very poor in comparison to the scores of other students.
The mean, also called the arithmetic average or arithmetic mean, is the sum of all the data points or observations in a dataset divided by the number of data points or observations. For example, a dataset of the ages of students entering college might have the following frequency distribution:
The sum of all the data points in the example is 392, and there are 21 data points, so the mean is 392 divided by 21, or approximately 18.67 years. The mean is the most commonly used central tendency measure. It is useful for both continuous and discrete numeric data. It cannot be used for categorical or other non-numerical data, in which the values cannot be summed. Furthermore, because the mean includes every data point in the distribution, it is influenced by skewed distributions and by outliers, which are unusual values that lie far from the central tendency.
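The arithmetic above can be sketched in a few lines of Python. The frequency table in this sketch is an assumption: the counts are hypothetical values chosen only so that they agree with the totals quoted in the text (a sum of 392 over 21 data points), and other tables would fit those totals equally well.

```python
from statistics import mean

# Hypothetical frequency table (age -> number of students).
# These counts are an assumption, picked only to match the totals
# quoted in the text: sum of ages 392, 21 data points.
age_counts = {17: 5, 18: 6, 19: 4, 20: 3, 21: 3}

# Mean from a frequency table: sum(age * count) / sum(count).
total = sum(age * count for age, count in age_counts.items())
n = sum(age_counts.values())
mean_age = total / n

# Cross-check by expanding the table into individual observations
# and using the standard library.
ages = [age for age, count in age_counts.items() for _ in range(count)]
library_mean = mean(ages)

print(total, n)            # 392 21
print(f"{mean_age:.2f}")   # 18.67
```

Working from the frequency table avoids listing every observation, which matters little for 21 students but is convenient for large datasets.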
The median is the value at the center of a distribution that divides the distribution such that half of the values are equal to or below it and half are equal to or above it. If a dataset has an odd number of data points, as in the above example (21), the median is the value that has the same number of points above and below (18 in the example). If the distribution has an even number of data points, the median is the mean of the two center values. In the above example, if there were only five students aged 18, for a total of 20 data points, the median would be the mean of the two middle values of 18 and 19 or 18.5 years.
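The odd and even cases described above can be illustrated with a short Python sketch. The list of ages is hypothetical, assumed only to be consistent with the figures quoted in the text (21 data points, six of them aged 18), and `middle_value` is an illustrative helper name rather than a standard function.

```python
from statistics import median

# Hypothetical ages, assumed only to be consistent with the text
# (21 data points, six students aged 18).
ages = sorted([17] * 5 + [18] * 6 + [19] * 4 + [20] * 3 + [21] * 3)

def middle_value(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single value with equally many points on each side.
        return sorted_values[mid]
    # Even count: the mean of the two center values.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

median_odd = middle_value(ages)          # 21 data points

ages_even = ages.copy()
ages_even.remove(18)                     # only five students aged 18
median_even = middle_value(ages_even)    # 20 data points

print(median_odd)    # 18
print(median_even)   # 18.5
```

The standard library function `statistics.median` applies the same rule and can be used in place of the helper.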
The median is a commonly used central tendency measure. It is usually the preferred measure when a distribution of data points is not symmetrical. This is because the median is less affected by outliers or skewed data than the mean. The median is not useful for values that cannot be logically arranged in an ascending or descending order.
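A small Python sketch can make the effect of an outlier concrete. Both the list of ages and the outlier are hypothetical: the ages are assumed values consistent with the totals quoted earlier, and the 45-year-old student is invented purely for illustration.

```python
from statistics import mean, median

# Hypothetical ages (assumed; consistent with a sum of 392 over 21 points).
ages = [17] * 5 + [18] * 6 + [19] * 4 + [20] * 3 + [21] * 3

# Replace one 21-year-old with an invented 45-year-old outlier.
with_outlier = ages[:-1] + [45]

mean_before, mean_after = mean(ages), mean(with_outlier)
median_before, median_after = median(ages), median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")   # 18.67 -> 19.81
print(f"median: {median_before} -> {median_after}")       # 18 -> 18
```

In this sketch a single extreme value moves the mean by more than a year, while the median does not move at all, because the outlier changes only how far the largest value lies from the center, not which value sits in the middle.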
The mode is the value or data point that occurs most frequently in a dataset. In the previous example, the mode is age 18, because this is the most commonly occurring value. The mode is useful because it can express both numerical and non-numerical or categorical data. However, for some datasets, the mode may not accurately express the center of the distribution. In the prior example, the center of the age distribution is 19, but the mode—age 18—is lower.
A dataset may also have more than one mode, in which case it is described as bimodal (two modes) or multimodal (several modes). This can occur if two or more values share the same highest frequency; for example, if there were five students aged 17 and five aged 18. In such cases, the mode may not describe the central tendency or typical value of the distribution as accurately. In other cases, especially if the data are continuous, there may be no mode at all; for example, if all of the values are different. In such cases, the mean or median may be a more appropriate central tendency measure. Alternatively, the data might be grouped into intervals, or classes, to determine which interval contains the most values.
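The single-mode, bimodal, no-mode, and categorical cases can be sketched with Python's `statistics.multimode` (available from Python 3.8). All of the data below are hypothetical, and `find_modes` is an illustrative helper, not a standard function; it adopts the convention that a dataset in which every value occurs exactly once has no mode.

```python
from statistics import multimode

def find_modes(values):
    """Return the list of modes, or [] when every value occurs exactly once."""
    modes = multimode(values)
    if len(values) > 1 and len(modes) == len(values):
        return []  # all values are different: treat as "no mode"
    return modes

# Hypothetical ages (assumed; six students aged 18).
ages = [17] * 5 + [18] * 6 + [19] * 4 + [20] * 3 + [21] * 3
single = find_modes(ages)

# With only five students aged 18, ages 17 and 18 tie for most frequent.
ages_tied = ages.copy()
ages_tied.remove(18)
bimodal = find_modes(ages_tied)

# Continuous measurements that are all different have no mode.
heights_m = [1.62, 1.70, 1.75, 1.81]   # invented values
no_mode = find_modes(heights_m)

# The mode also works for categorical data, which cannot be averaged.
majors = ["biology", "history", "biology", "physics"]   # invented values
categorical = find_modes(majors)

print(single)        # [18]
print(bimodal)       # [17, 18]
print(no_mode)       # []
print(categorical)   # ['biology']
```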
With a symmetrical distribution, the mode, median, and mean are all at the center of the distribution. With a skewed distribution, as in the previous example, the mode is still the most commonly occurring value (18 in the example), and the median is still the center value of the distribution (18 in the example); however, the mean is usually pulled toward the longer tail of the distribution (18.67 in the example, because the distribution is skewed toward older students). Therefore, with a skewed distribution, the median is often a better central tendency measure, because the mean is usually not at the center of the distribution. In this example, the distribution is positively skewed, or right-skewed, because the tail on the upper side of the distribution is longer, and the mean is higher than the median. In a positively skewed distribution, most of the values, including the median, are often, but not always, less than the mean. When a distribution is negatively skewed, or left-skewed, the tail on the lower side of the distribution is longer. In general, with negatively skewed distributions, most of the values, including the median, tend to be greater than the mean.
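The relationship between skew and the position of the mean can be checked numerically. The sketch below uses hypothetical ages (assumed values consistent with the totals quoted earlier) and a mirrored copy of them as a left-skewed counterpart. The `skewness` helper is one common moment-based coefficient (the third central moment divided by the cubed population standard deviation); it is an addition for illustration and is not a formula taken from the text.

```python
from statistics import mean, median, pstdev

def skewness(values):
    """Moment-based skewness: positive for a longer upper tail,
    negative for a longer lower tail."""
    m = mean(values)
    s = pstdev(values)
    return sum((x - m) ** 3 for x in values) / (len(values) * s ** 3)

# Hypothetical right-skewed ages (assumed values).
ages = [17] * 5 + [18] * 6 + [19] * 4 + [20] * 3 + [21] * 3

# Mirror the ages around 19 to get a left-skewed counterpart.
mirrored = [38 - age for age in ages]

for label, data in (("right-skewed", ages), ("left-skewed", mirrored)):
    print(
        f"{label}: mean={mean(data):.2f} "
        f"median={median(data)} skew sign={'+' if skewness(data) > 0 else '-'}"
    )
```

For the right-skewed data the mean lies above the median, and for the mirrored data it lies below, which matches the usual pattern described above. That pattern is a tendency rather than a rule, so comparing the mean with the median is only a rough indicator of skew.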