Introduction
- Standard Deviation is the most widely used measure of the Dispersion of a set of data. It measures how far the elements of a population or dataset deviate from the mean.
- Standard Deviation is calculated as the positive square root of the mean of the squared deviations from the mean.
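The definition above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name population_std and the eight-element dataset are assumptions made for this example, and the standard library's statistics.pstdev is used only as a cross-check.

```python
import math
import statistics


def population_std(values):
    """Positive square root of the mean of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


# Hypothetical dataset, chosen only so the arithmetic is easy to follow:
# the mean is 5 and the squared deviations sum to 32, so the result is sqrt(32 / 8).
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(values))      # 2.0
print(statistics.pstdev(values))   # standard-library cross-check
```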
Chebyshev’s Inequality
- According to Chebyshev’s inequality, for any distribution with finite variance, the proportion of the observations within k standard deviations of the arithmetic mean is at least 1 − 1/k² for all k > 1.
- Let us break the above statement down:
- The Mean, or average, of a given dataset is calculated by dividing the sum of all elements of the dataset by the total number of elements in the dataset.
- Mean = (Sum of dataset elements) / (Total number of elements in the dataset)
- Standard Deviation of a dataset is obtained by using the following formula:
- Population Standard Deviation = √(ΣD² / n)
- where D is the difference between each element of the population and its mean, so ΣD² is the sum of the squared differences
- and n is the population size
- Let us take k = 2. Then 1 − 1/k² = 1 − ¼ = ¾.
- As per Chebyshev’s inequality, at least ¾ or 75% of the elements should lie within k standard deviations of the mean, i.e. within 2 standard deviations.
Illustration 1: For a dataset A with Mean = 10 and Standard Deviation = 2, at least 75% of the elements of A should lie within 10 ± (2 × 2), i.e. between 6 and 14.
- Different values of k give different lower bounds on the percentage of elements of a dataset that should lie within k Standard Deviations of the mean. For example, k = 3 gives at least 1 − 1/9 = 8/9, or about 88.9%.
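The bound 1 − 1/k² for several values of k can be tabulated with a short Python sketch. The function name chebyshev_lower_bound and the chosen values of k are illustrative assumptions, not part of the original text.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of elements within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the expression 1 - 1/k^2 is zero or negative, so it says nothing.
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


# Illustrative values of k; any k > 1 works.
for k in (1.5, 2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of elements")
```

The bound grows towards 100% as k increases, but it never guarantees that every element lies in the range.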
Example 1:
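- Let us take a sample: 2, 5, 8, 10, 12, 15, 18, and verify Chebyshev’s inequality for k = 2.
- Solution: The Mean is 70/7 = 10, and the sample Standard Deviation (divisor n − 1) is √(186/6) ≈ 5.568. For k = 2, at least 75% of the dataset elements should be in the range Mean ± 2 × Standard Deviation, i.e. 10 ± 11.136, i.e. −1.136 to 21.136. (With the population Standard Deviation, divisor n, the value is √(186/7) ≈ 5.155 and the range is −0.309 to 20.309; the conclusion is the same.)
- In our sample dataset, all 7 elements (100%) lie in the above range.
- Hence Chebyshev’s inequality, which requires at least 75% of the elements to fall in this range, is verified.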
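The check in Example 1 can be reproduced with the following Python sketch. The helper name fraction_within is an assumption made for this illustration; statistics.stdev (divisor n − 1) and statistics.pstdev (divisor n) are standard-library functions, and both are tried because the example does not say which divisor it intends.

```python
import statistics


def fraction_within(values, k, std):
    """Fraction of values lying in [mean - k*std, mean + k*std]."""
    mean = statistics.mean(values)
    lower, upper = mean - k * std, mean + k * std
    return sum(lower <= x <= upper for x in values) / len(values)


sample = [2, 5, 8, 10, 12, 15, 18]
k = 2
bound = 1 - 1 / k**2  # Chebyshev lower bound, 0.75 for k = 2

for label, std in (("sample (n - 1)", statistics.stdev(sample)),
                   ("population (n)", statistics.pstdev(sample))):
    inside = fraction_within(sample, k, std)
    print(f"{label}: std = {std:.3f}, fraction inside = {inside:.2f}, "
          f"bound satisfied: {inside >= bound}")
```

With either divisor, every element falls inside the range, so the observed fraction is well above the 75% that the inequality guarantees.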