Statistics for Data Science: Measures of Dispersion
What is dispersion?
In statistics, dispersion measures how far data points stretch or spread out from a reference point. Usually this reference point is the mean, which describes a data set’s central tendency. Because the mean marks the centre of the data, it is a natural point from which to measure distance, and it gives us a consistent criterion for working with the data. For example, suppose the average score of all students in a class is 70%. From this class average, we can calculate how far each student’s score deviates from it. In other words, this statistic tells us how far behind or ahead of the class average a student is.
In Data Science, where we deal with enormous amounts of data, measures of dispersion are a primary starting point before we analyse the data in more depth. The most commonly used measures of dispersion are the Range, the Interquartile Range, the Standard Deviation, and the Variance. Let’s look into what the