IB DP Maths AA HL Study Notes

4.1.2 Measures of Dispersion

Understanding the spread or variability of a dataset is as crucial as understanding its central tendency. Measures of dispersion provide insights into this spread, indicating how data points deviate from the central value. The primary measures of dispersion include variance, standard deviation, range, and interquartile range.

Variance

Variance quantifies the average squared deviation of each data point from the mean of the dataset. It provides a sense of the overall variability or spread of the data.

Definition:

Variance measures the average of the squared differences from the mean. It gives an idea of how much each data point in the set varies from the mean.

Formula:

For a dataset, the variance is calculated as: Variance = (sum of (each data point - mean) squared) / number of data points
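
The same formula can be written symbolically. The notation here is an assumption for illustration, since the notes give the formula only in words: x_1, ..., x_n are the data points, \bar{x} is their mean, and \sigma^2 is the variance.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
```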

Properties:

  • Variance is always non-negative. A variance of zero means all values in the dataset are the same.
  • The units of variance are the square of the units of the original data.
  • Variance is sensitive to outliers. Extreme values can significantly increase the variance.
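
The first and third properties can be checked numerically. The sketch below uses Python's standard `statistics` module; the datasets are made up for illustration, and `pvariance` is the population variance (dividing by the number of data points, as in the formula above).

```python
from statistics import pvariance

constant = [5, 5, 5, 5]           # every value identical
base = [4, 6, 8, 10, 12]          # illustrative data
with_outlier = [4, 6, 8, 10, 52]  # same data with one extreme value

print(pvariance(constant))        # no spread at all
print(pvariance(base))
print(pvariance(with_outlier))    # one outlier inflates the variance
```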

Applications:

  • Finance: Variance helps in determining the volatility or risk of an asset or portfolio.
  • Quality Control: In manufacturing, variance can highlight inconsistencies in a production process.

Example Question: Given the dataset: 4, 6, 8, 10, 12. Calculate the variance.

Solution: First, calculate the mean: (4 + 6 + 8 + 10 + 12) / 5 = 8. Next, compute the variance using the formula: [(4-8)² + (6-8)² + (8-8)² + (10-8)² + (12-8)²] / 5 = (16 + 4 + 0 + 4 + 16) / 5 = 40 / 5 = 8.
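
The same steps can be followed in code. This is a minimal sketch in Python, assuming the divide-by-n (population) form of the formula given above; the variable names are illustrative.

```python
from statistics import fmean, pvariance

data = [4, 6, 8, 10, 12]

# Step 1: the mean of the dataset.
mean = fmean(data)

# Step 2: squared deviation of each point from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 3: average the squared deviations over the number of data points.
variance = sum(squared_deviations) / len(data)

# Cross-check against the standard library's population variance.
assert variance == pvariance(data)
print(mean, variance)
```

Note that `statistics.variance` (without the leading `p`) divides by n - 1 instead of n, so it returns a different value for the same data.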

Standard Deviation

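Standard deviation is the square root of variance. It measures the typical distance between the data points and the mean (strictly, the square root of the mean squared deviation). Because it is expressed in the same units as the data, it gives a more intuitive sense of the dataset's spread than variance does.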

Definition:

Standard deviation quantifies the amount of variation or dispersion of a set of values from the mean.

Formula:

Standard Deviation = square root of Variance

Applications:

  • Finance: Standard deviation measures the risk or volatility of an investment.
  • Psychology: In testing, it can show the dispersion of scores around the mean.

Example Question: Using the variance calculated above (8), find the standard deviation.

Solution: Standard Deviation = square root of 8 = 2.83 (rounded to two decimal places).
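
As a quick check, the square root can be taken in Python and compared with the standard library's population standard deviation. This is a sketch only; it reuses the dataset from the variance example.

```python
import math
from statistics import pstdev

data = [4, 6, 8, 10, 12]
variance = 8  # population variance of the dataset above

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# pstdev computes the same quantity directly from the data.
assert math.isclose(std_dev, pstdev(data))
print(round(std_dev, 2))
```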

Range

Range provides a straightforward measure of dispersion, indicating the difference between the highest and lowest values in a dataset.

Formula:

Range = Highest Value - Lowest Value

Applications:

  • Weather: The range can show temperature variation for a day.
  • Sales: It can indicate the spread between the highest and lowest sales in a period.

Example Question: Determine the range for the dataset: 15, 22, 25, 28, 30, 35.

Solution: Range = 35 - 15 = 20.
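
In code, the range is a one-line calculation. A minimal Python sketch using the same dataset:

```python
data = [15, 22, 25, 28, 30, 35]

# Range = highest value - lowest value; the data need not be sorted first.
data_range = max(data) - min(data)
print(data_range)
```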

Interquartile Range (IQR)

The IQR represents the range between the first quartile (25th percentile) and the third quartile (75th percentile) of a dataset. It focuses on the middle 50% of the data.

Formula:

IQR = Third Quartile - First Quartile

Applications:

  • Statistics: IQR is used in box plots and to identify outliers.
  • Economics: It can show the spread of the middle 50% of incomes in a region.

Example Question: For the dataset: 3, 5, 7, 8, 12, 13, 14, 18, 21. Find the IQR.

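Solution: The numbers are already in ascending order. There are nine values, so the median is the 5th value, 12, which splits the data into a lower half (3, 5, 7, 8) and an upper half (13, 14, 18, 21). Q1 (the median of the lower half) = (5 + 7) / 2 = 6. Q3 (the median of the upper half) = (14 + 18) / 2 = 16. IQR = 16 - 6 = 10.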
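
The median-of-halves method can be sketched in Python as follows. The helper name `quartiles` is hypothetical, and the sketch assumes one common convention: when the number of values is odd, the overall median is left out of both halves. Other quartile conventions exist and can give slightly different results.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumptions: at least two values, and for an odd count the overall
    median is excluded from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # values below the median position
    upper = ordered[len(ordered) - half:]   # values above the median position
    return median(lower), median(upper)

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)
```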

FAQ

Can the standard deviation be negative?

No, the standard deviation cannot be negative. It is the square root of the variance, and since variance (the average of squared differences from the mean) is always non-negative, so is its square root. A standard deviation of zero indicates that all values in the dataset are the same, and any positive value indicates some degree of spread in the data.

Why is it important to understand dispersion as well as central tendency?

While measures of central tendency, like the mean or median, provide a snapshot of the 'centre' of the data, they give no information about its spread or variability. Two datasets can have the same mean but vastly different spreads. Understanding dispersion is crucial because it provides context to the central value, giving a more comprehensive view of the dataset's distribution. For instance, in finance, an investment's return might have a high average, but if its variability (risk) is also high, it might not be a suitable choice for risk-averse investors.
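
A small Python sketch makes the point concrete. Both datasets are invented for illustration: they share the same mean but have very different standard deviations.

```python
from statistics import fmean, pstdev

tight = [48, 49, 50, 51, 52]  # values clustered around 50
wide = [10, 30, 50, 70, 90]   # values scattered around 50

# Same centre...
print(fmean(tight), fmean(wide))

# ...but very different spread.
print(pstdev(tight), pstdev(wide))
```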

How does sample size affect measures of dispersion?

The sample size can influence the stability and reliability of measures of dispersion. With a small sample size, a single outlier can significantly skew measures like the range or variance. As the sample size increases, the measures of dispersion tend to be more stable and less susceptible to the influence of individual data points. Additionally, when comparing variances or standard deviations between two samples, it's essential to consider sample size. A larger sample might give a more accurate representation of the population's true variance or standard deviation.

How do outliers affect the range and the interquartile range?

Outliers can significantly affect the range because the range is calculated using only the highest and lowest values in the dataset. If there's an extreme value (outlier), it can make the range much larger than the actual spread of the majority of the data. The interquartile range (IQR) is more robust against outliers. Since the IQR focuses on the middle 50% of the data (between the first and third quartiles), it is largely unaffected by a few extreme values. This makes the IQR a more reliable measure of spread in datasets with potential outliers.
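
The contrast can be seen in a short Python sketch. The helper `spread` is hypothetical and takes the quartiles as the medians of the lower and upper halves, excluding the overall median when the count is odd (an assumed convention). The second dataset is the first with its largest value replaced by an invented outlier.

```python
from statistics import median

def spread(values):
    """Return (range, IQR), taking quartiles as medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    return ordered[-1] - ordered[0], q3 - q1

typical = [3, 5, 7, 8, 12, 13, 14, 18, 21]
with_outlier = [3, 5, 7, 8, 12, 13, 14, 18, 210]  # largest value replaced

print(spread(typical))
print(spread(with_outlier))
```

Only the range changes between the two calls; the quartiles, and so the IQR, are computed from values that the outlier does not touch.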

Why is the standard deviation more commonly used than the variance?

The standard deviation is more commonly used than variance because it is expressed in the same units as the data, making it more interpretable. Variance, on the other hand, is in squared units, which can be challenging to relate back to the original dataset. For instance, if we're looking at a dataset of heights measured in centimetres, the variance would be in square centimetres, which doesn't provide an intuitive sense of spread. The standard deviation, being the square root of variance, brings the measure back to the original unit (centimetres in this case), making it easier to understand and relate to the data.

Practice Questions

A maths teacher recorded the marks of 10 students in a recent test: 56, 58, 60, 62, 63, 64, 65, 67, 68, 70. Calculate the range and interquartile range for these marks.

The marks are already in ascending order. For the range, we subtract the smallest mark from the largest mark: Range = 70 - 56 = 14.

To find the interquartile range (IQR), we first determine the first quartile (Q1) and the third quartile (Q3). With 10 marks, the lower half is 56, 58, 60, 62, 63 and the upper half is 64, 65, 67, 68, 70. Q1 (the median of the lower half) is its middle value, 60. Q3 (the median of the upper half) is its middle value, 67. IQR = Q3 - Q1 = 67 - 60 = 7.

Thus, the range is 14 and the IQR is 7.

The variance of a set of seven numbers is 9. Determine the standard deviation of this set.

The standard deviation is the square root of the variance. Given that the variance is 9, the standard deviation is the square root of 9. Standard Deviation = √9 = 3.

Therefore, the standard deviation of the set is 3.

Written by: Dr Rahil Sachak-Patwa
Oxford University - PhD Mathematics

Rahil spent ten years working as a private tutor, teaching students for GCSEs, A-Levels, and university admissions. During his PhD he published papers on modelling infectious disease epidemics and was a tutor to undergraduate and masters students for mathematics courses.
