Measures of Central Tendency: Mean, Median & Mode
What is Central Tendency and why do we use it?
Let’s say you made 1400 ice cream sales in one year and, being smart, you sold
the ice cream at different prices depending on the season. At the end of the
year, you want to know how much you earned on average per sale, so that you can
work out further figures such as how much money you make per month or per year
and by how much you would like to increase that number in order to hit your
goal. If you plot all the sales figures against the money earned, this is where
the central values come in.
Technically, central tendency is a summary of a data set that describes its
central or typical value, which we can then use to derive more advanced
information. Central tendency tells us where most values fall in the
distribution of data points, for instance on the plot of sales figures from the
example earlier. To find this central tendency, statistics uses three measures,
namely Mean, Median and Mode - each with a different formula and a different
purpose in identifying where the center lies and how it can be used going
forward.
Different measures of Central Tendency
Mean:
Mean is nothing but the simple average that we calculate in basic mathematics
or statistics. It is the sum of all data points divided by the total number of
data points. This gives us the quantifiable measure of central tendency, or in
other words, an arithmetic average of all the values across the data set.
While this single value is the highlight of the measure, more often than not it
falls short of the definition of central tendency, because it doesn’t always
locate the central value in a data set. For example: (10 + 1 + 1 + 1 + 1) / 5 =
2.8, which is close neither to the center of the axis (5), where the highest
value is 10, nor to the typical value of 1. A single extreme value is enough to
pull the mean away from where most of the data lies.
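The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch using the five values from the example; the variable names are illustrative.

```python
from statistics import mean

# Values taken from the example above: one large value and four small ones.
values = [10, 1, 1, 1, 1]

# Sum of all data points divided by the total number of data points.
manual_mean = sum(values) / len(values)

# The standard library computes the same arithmetic average.
library_mean = mean(values)

print(manual_mean, library_mean)
```

Both results sit well above the value 1 that four of the five data points share, which is the weakness of the mean described above.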
Note: There are other kinds of mean as well, such as the geometric mean. The
geometric mean is used where values are multiplied together instead of added,
for example rates of interest, or data that follows a lognormal distribution.
It is calculated by multiplying the n values together and then taking the nth
root of the product.
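As a rough illustration of the geometric mean, the sketch below multiplies a few growth factors and takes the root of the product. The growth factors are hypothetical numbers chosen for the example, and `nth_root_of_product` is an illustrative helper name, not something from the article.

```python
import math
from statistics import geometric_mean  # available in Python 3.8+

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]


def nth_root_of_product(values):
    """Multiply all values, then take the n-th root (n = number of values)."""
    return math.prod(values) ** (1 / len(values))


manual_gm = nth_root_of_product(growth_factors)
library_gm = geometric_mean(growth_factors)

# Arithmetic mean of the same factors, for comparison.
arithmetic = sum(growth_factors) / len(growth_factors)

print(manual_gm, library_gm, arithmetic)
```

For factors that are not all equal, the geometric mean comes out lower than the arithmetic mean, which is why it tends to be preferred for averaging rates that compound.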
Median:
Median is the measure of central tendency that splits the data points into two
equal halves. Put simply, think of it as a weighing balance with the same
number of data points on each side. To find the median, arrange the data in
ascending or descending order and take the value that lies in the middle. If
the number of data points is even, the median is the average of the two middle
values.
Difference between mean and median
Unlike the mean, the median is not sensitive to extreme values. For example,
let’s take the median of five numbers (10, 20, 30, 40, 50), which comes out to
be the third number, that is 30. Now if we change the two numbers on either
side to extreme values, like (10, 20, 30, 100, 1000) or (-1000, 1, 30, 40, 50),
the median still remains the same. Hence, median is a better measure than mean
when the data contains outliers or follows a skewed distribution.
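The comparison can be reproduced with the three data sets from the example. The sketch below is only illustrative; it relies on Python's standard `statistics` module.

```python
from statistics import mean, median

# The three data sets from the example above.
datasets = [
    [10, 20, 30, 40, 50],
    [10, 20, 30, 100, 1000],
    [-1000, 1, 30, 40, 50],
]

# median() sorts the data internally and picks the middle value.
medians = [median(data) for data in datasets]
means = [mean(data) for data in datasets]

for data, med, avg in zip(datasets, medians, means):
    print(data, "median:", med, "mean:", avg)

# With an even number of values, median() averages the two middle values.
even_median = median([10, 20, 30, 40])
print("even-count median:", even_median)
```

The median stays at 30 in all three cases, while the mean moves from 30 to 232 and to -175.8.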
Mode:
Mode is another commonly used measure of central tendency, which tells us the
most frequently occurring data point. Mode can be calculated by counting how
often each value appears in the data set, in other words by grouping the data
set by its repetitions. Hence, mode is the most popular measure for finding the
central tendency of categorical data, and in fact, the only one of the three
that works for categories with no natural order.
The mode is easy to spot on a bar chart since it is the tallest bar. If a data
set has multiple values tied for the highest frequency, then all of them get
joint credit for mode in what is called a multi-modal distribution. Conversely,
if every value occurs equally often, for example when no value repeats, the
data set does not have a mode.
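One possible way to count frequencies and pick out the mode, or modes, is sketched below. The function `find_modes` and the flavor lists are hypothetical and made up for illustration. Returning an empty list when all values are equally frequent is a convention chosen for this sketch.

```python
from collections import Counter


def find_modes(data):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    # Several distinct values that all occur equally often: treat as no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    # Keep every value tied for the highest frequency (multi-modal case).
    return [value for value, c in counts.items() if c == top]


# Hypothetical ice cream sales by flavor.
single = find_modes(["vanilla", "chocolate", "vanilla", "mango"])
tied = find_modes(["vanilla", "chocolate", "vanilla", "chocolate", "mango"])
none = find_modes(["vanilla", "chocolate", "mango"])

print(single, tied, none)
```

Because the data is only counted, never added or sorted, the same approach works for flavors, colors or any other categories.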
Conclusion:
Different kinds of distributions call for different measures of central
tendency. For a symmetrical distribution with a single peak, you will notice
that the mean, mode and median all sit at the center of the distribution.
However, for a data set skewed to the right or left, the mean is pulled toward
the longer tail as well, so it no longer describes the center of the
distribution effectively. This is where the median comes in handy.
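To make the effect of skew concrete, here is a small sketch comparing a symmetrical data set with a right-skewed one. Both lists are invented for illustration and are not taken from the sales example.

```python
from statistics import mean, median, mode

# Hypothetical symmetrical data: values mirror each other around 3.
symmetric = [1, 2, 3, 3, 3, 4, 5]

# Hypothetical right-skewed data: one large value forms a long right tail.
right_skewed = [2, 2, 3, 3, 3, 4, 4, 5, 20]

sym_stats = (mean(symmetric), median(symmetric), mode(symmetric))
skew_stats = (mean(right_skewed), median(right_skewed), mode(right_skewed))

print("symmetric   mean/median/mode:", sym_stats)
print("right skew  mean/median/mode:", skew_stats)
```

In the symmetrical case all three measures coincide at 3. In the skewed case the median and mode stay at 3 while the mean is pulled above 5 by the single value of 20.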
As mentioned earlier, mode proves its uniqueness when calculating the central
tendency for categorical data such as different flavors of ice cream. Another
special case occurs with a continuous data set, where values rarely repeat
exactly and so no mode can be found by counting. We can still estimate the mode
by locating the peak, that is the maximum, of the probability distribution
plot.
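One crude way to approximate that peak without a plotting library is to group the continuous values into equal-width bins, as a histogram does, and take the fullest bin. This is a stand-in for reading the maximum off a density plot, not the method the article prescribes. The data, the bin width and the name `binned_mode` are all assumptions made for this sketch.

```python
import math
from collections import Counter


def binned_mode(data, bin_width):
    """Estimate the mode as the midpoint of the most populated bin."""
    # Assign each value to a bin index, e.g. 2.3 -> bin 2 for a width of 1.0.
    bins = Counter(math.floor(x / bin_width) for x in data)
    # most_common(1) returns the bin with the highest count.
    fullest_bin, _count = bins.most_common(1)[0]
    return (fullest_bin + 0.5) * bin_width


# Hypothetical continuous measurements with no exact repeats.
measurements = [1.1, 2.3, 2.4, 2.6, 2.7, 3.9, 5.2]

estimated_mode = binned_mode(measurements, bin_width=1.0)
print(estimated_mode)
```

The estimate depends on the chosen bin width, so it should be read as an approximation of where the distribution peaks rather than an exact value.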
There is a different measure of central
tendency for every different data set or distribution. We’ll sum these up
quickly below.
Mean: Symmetrical distribution, Continuous data set
Median: Skewed distribution, Continuous data set, Ordinal data set
Mode: Categorical data, Ordinal data set, Probability distribution
Measures of central tendency are key to
finding out measures of variability, and diving deeper into statistical
analysis which forms a core of Data Analysis.
If you’re interested in picking up these subjects or developing an aptitude for
the same, you can enroll for the Data Science
and AI course at Skillslash
which also offers a unique opportunity of real work experience at top MNCs. Get
in touch with one of our counselors today by visiting https://skillslash.com/data-science-course-in-delhi