**Definition**

The **mean** of a set of numbers is found by adding up all the numbers in the dataset and dividing the total by the number of values. It gives a single central value, commonly called the 'average' of the dataset. The mean is also a key parameter of the normal distribution, which describes how data points are spread out around it.

**Calculation**

To determine the mean of a dataset:

1. **Add up all the numbers**: This gives the total of all the values in the dataset.

2. **Divide by the total number of values**: This provides the average value.

**Formula**: Mean = (Sum of all values) / (Number of values)
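In the notation common in statistics textbooks, the same formula can be written symbolically. The symbols here are conventional notation added for illustration rather than taken from this text: $x_1, \ldots, x_n$ are the values, $n$ is how many there are, and $\bar{x}$ (read "x-bar") is the mean.

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$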

**Example:**

Consider a dataset: 4, 8, 6, 5, 9

- Sum of values = 4 + 8 + 6 + 5 + 9 = 32
- Number of values = 5

Mean = 32 / 5 = 6.4

The mean of this dataset is 6.4.
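The two steps above translate directly into a few lines of Python. This is a minimal sketch: the function name `mean` is our own choice for illustration, and Python's standard `statistics` module already provides `statistics.mean` for the same job.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


data = [4, 8, 6, 5, 9]
print(sum(data))   # 32
print(len(data))   # 5
print(mean(data))  # 6.4
```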

**Properties of Mean**

The mean has several useful properties:

1. **Influence of Every Value**: The mean is affected by every single value in the dataset. A change in even one value will result in a change in the mean.

2. **Impact of Outliers**: Extreme values, or outliers, can have a significant impact on the mean. A single very high or very low value can pull the mean up or down, away from where most of the data lies.

3. **Balance Point**: The mean is the balance point of the dataset. If you imagine placing equal weights at each value on a number line, the number line would balance perfectly at the mean.

4. **Sum of Deviations**: The sum of the deviations of each value from the mean is always zero. This is because the values below the mean balance out the values above the mean.

**Example:**

Consider the dataset: 2, 4, 6, 8, 10

Mean = (2 + 4 + 6 + 8 + 10) / 5 = 6

Deviations from the mean:

- 2 is 4 units below the mean
- 4 is 2 units below the mean
- 6 is right at the mean
- 8 is 2 units above the mean
- 10 is 4 units above the mean

Summing these deviations: (-4) + (-2) + 0 + 2 + 4 = 0

This illustrates the property that the sum of the deviations from the mean is zero.
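The same check can be run in Python. This is a sketch: `deviations` is a hypothetical helper written for illustration. For datasets whose mean is not a whole number, floating-point rounding can make the computed sum a tiny non-zero number rather than exactly 0.

```python
def deviations(values):
    """Return each value minus the mean of the values."""
    m = sum(values) / len(values)
    return [x - m for x in values]


data = [2, 4, 6, 8, 10]
devs = deviations(data)
print(devs)       # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(devs))  # 0.0
```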

**Types of Mean**

While the arithmetic mean is the most commonly used, there are other types of means in statistics:

1. **Geometric Mean**: The nth root of the product of n values. It suits data whose values combine by multiplication, such as growth rates.

2. **Harmonic Mean**: The number of values divided by the sum of their reciprocals. It suits data made up of rates, such as speeds.

3. **Root-Mean-Square**: The square root of the mean of the squared values. Often used in physics and engineering, it measures the typical magnitude of a set of numbers, regardless of sign.

Each type of mean suits a different kind of data, so the right choice depends on what the values represent.
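A sketch of the three alternative means in Python, written from their standard definitions. The helper names and the inputs are hypothetical, chosen so the answers are easy to check by hand: the geometric mean of 2 and 8 is 4, and driving two equal-distance legs at 30 and 60 km/h gives an average speed of 40 km/h, which is their harmonic mean.

```python
import math


def geometric_mean(values):
    # nth root of the product of n values; assumes every value is positive
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    # number of values divided by the sum of their reciprocals; assumes no zeros
    return len(values) / sum(1 / x for x in values)


def root_mean_square(values):
    # square root of the arithmetic mean of the squared values
    return math.sqrt(sum(x * x for x in values) / len(values))


print(geometric_mean([2, 8]))    # about 4
print(harmonic_mean([30, 60]))   # about 40
print(root_mean_square([3, 4]))  # about 3.54
```

Python's standard `statistics` module also provides `geometric_mean` and `harmonic_mean`, which are preferable in real code; the versions above are only meant to make the definitions concrete.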

**Real-World Applications**

In education, teachers often calculate the mean score of exams to gauge the overall performance of the class. For instance, if students scored 55, 60, 65, 70, and 75 out of 100 in a maths exam, the mean score would be 65. This average score provides the teacher with an insight into the class's overall performance.

However, it's important to note that if one student had scored exceptionally high, say 95, the mean would be pulled upwards. This highlights the importance of understanding how outliers influence the mean.
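To see how far a single value moves the mean, the sketch below assumes the student who scored 75 had instead scored 95 (the text does not say which score changes, so this is a hypothetical choice). Because the mean is the total divided by the count, raising one value by d raises the mean by d / n; here that is 20 / 5 = 4 marks.

```python
def mean(values):
    return sum(values) / len(values)


scores = [55, 60, 65, 70, 75]
# Hypothetical change: the student who scored 75 scores 95 instead
changed = scores[:-1] + [95]

print(mean(scores))   # 65.0
print(mean(changed))  # 69.0

# Raising one value by d raises the mean by d / n
d = 95 - 75
print(d / len(scores))  # 4.0
```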

**Practice Questions**

- A footballer scored the following goals in 7 matches: 1, 2, 3, 2, 1, 3, 4. Calculate the mean number of goals scored per match.

**Solution**: Total goals = 1 + 2 + 3 + 2 + 1 + 3 + 4 = 16

Number of matches = 7

Mean goals = 16 / 7 = 2.29 (rounded to two decimal places)

Thus, the footballer scored a mean of approximately 2.29 goals per match over the 7 matches.

- A factory produces bulbs, and the lifespans of 5 bulbs are recorded as 1000, 1100, 1050, 1200 and 1150 hours. Calculate the mean lifespan.

**Solution**: Total lifespan = 1000 + 1100 + 1050 + 1200 + 1150 = 5500 hours

Number of bulbs = 5

Mean lifespan = 5500 / 5 = 1100 hours

Thus, the mean lifespan of the bulbs is 1100 hours.

## FAQ

The sum of the deviations from the mean is always zero due to the definition of the mean itself. The mean is essentially the balance point of the dataset. When you calculate the deviations of each value from the mean, the values below the mean will have negative deviations, while those above the mean will have positive deviations. When summed together, these positive and negative deviations cancel each other out, resulting in a total of zero. This property reinforces the idea of the mean as the central value or balance point of the dataset.
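The same argument can be made exact in one line of algebra. Writing the $n$ values as $x_1, \ldots, x_n$ and their mean as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (conventional notation, added here for illustration), the definition gives $\sum_{i=1}^{n} x_i = n\bar{x}$, so:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$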

The mean, median, and mode are all measures of central tendency, but they represent different aspects of a dataset. The **mean** is the average of all values, calculated by summing up all the numbers and dividing by the count of numbers. The **median** is the middle value when the data is arranged in ascending or descending order. If there's an even number of values, the median is the average of the two middle numbers. The **mode**, on the other hand, is the value that appears most frequently in a dataset. While the mean takes into account every value, it can be skewed by outliers. The median is resistant to outliers, and the mode reflects the most common value, which might not always be near the centre of the data.
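Python's standard `statistics` module provides `mean`, `median` and `mode`, which makes it easy to compare the three measures. The dataset below is hypothetical, chosen so that they differ: the mean is 5 (30 ÷ 6), the median is 4 (the average of the two middle values, 3 and 5) and the mode is 3.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]  # hypothetical dataset

print("mean:", mean(data))      # sum 30 divided by 6 values
print("median:", median(data))  # average of the two middle values, 3 and 5
print("mode:", mode(data))      # 3 is the only value that appears twice
```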

The mean, while widely used as a measure of central tendency, is not always the most representative value for a dataset, especially when the data contains outliers. Outliers are extreme values that can significantly skew the mean, pulling it away from where most of the data lies. For instance, in a set of incomes, if one person earns far more than the others, the mean income will be pulled upwards and will overstate what a typical person earns. In such cases, the median, which is the middle value when the data is arranged in ascending or descending order, might be a more appropriate measure because it is largely unaffected by extreme values.
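A sketch of the income example using Python's `statistics` module. The figures are hypothetical: four similar incomes and one very high one. The mean comes out at 70,000-plus (70,800), far above what four of the five people earn, while the median stays at 27,000.

```python
from statistics import mean, median

# Hypothetical annual incomes: four similar earners and one very high earner
incomes = [22_000, 25_000, 27_000, 30_000, 250_000]

print("mean:", mean(incomes))      # pulled up by the single high income
print("median:", median(incomes))  # middle value of the sorted list
```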

A dataset can have more than one mode. When a dataset has two modes, it is referred to as **bimodal**. If it has more than two modes, it's called **multimodal**. For instance, in the dataset 2, 3, 4, 4, 5, 5, 6, the modes are 4 and 5, making it bimodal. It's important to note that a dataset can also have no mode if no number appears more than once.
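A sketch of finding every mode with `collections.Counter`. The helper `modes` is hypothetical and follows the convention above that a dataset in which nothing repeats has no mode. Python's own `statistics.multimode` behaves differently in that case: it returns every value, treating them all as modes.

```python
from collections import Counter
from statistics import multimode


def modes(values):
    """Return every most-frequent value; [] when nothing repeats (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once, so by this convention there is no mode
    return [value for value, count in counts.items() if count == top]


print(modes([2, 3, 4, 4, 5, 5, 6]))  # [4, 5] -> bimodal
print(modes([1, 2, 3]))              # [] -> no mode
print(multimode([1, 2, 3]))          # [1, 2, 3]: the stdlib counts every value as a mode
```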

The **weighted mean** is an extension of the arithmetic mean, where each value in the dataset is multiplied by a weight before summing them up. The sum is then divided by the total of the weights, not the number of values. It's used when some values in the dataset are more important or have a greater significance than others. For instance, in calculating a student's overall grade, if exams are worth 70% and coursework is worth 30%, then the scores in these areas would be weighted accordingly in calculating the mean. The arithmetic mean, on the other hand, treats every value equally, giving them all the same significance.
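A minimal weighted-mean sketch, assuming hypothetical scores of 80 in exams and 60 in coursework with the 70% / 30% weights from the example. Dividing by the sum of the weights (here 100) rather than the number of values is what sets it apart from the arithmetic mean. `weighted_mean` is a helper written for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value x weight, divided by the total of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical student: 80 in exams (weight 70), 60 in coursework (weight 30)
print(weighted_mean([80, 60], [70, 30]))  # 74.0
print(sum([80, 60]) / 2)                  # 70.0, the unweighted arithmetic mean
```

The exam score counts for more, so the weighted result sits closer to 80 than the plain average does.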

## Practice Questions

To calculate the mean score of the class, we sum the scores of all the students and then divide by the number of students.

Total score = 134 + 98 + 132 + 123 + 83 + 102 + 93 + 143 + 66 + 105 = 1079

Number of students = 10

Mean score = 1079 ÷ 10 = 107.9

Thus, the mean score of the class is 107.9.

The mean score over the first 6 matches is 95 points, and 110 points are scored in the 7th match. To find the total points scored in the first 6 matches, we multiply the mean score by 6.

Total points in 6 matches = 95 × 6 = 570

Including the 7th match, the total points = 570 + 110 = 680

Number of matches = 7

New mean score = 680 ÷ 7 = 97.14 (rounded to two decimal places)

Thus, the new mean score over the 7 matches is 97.14 points.
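The same update can be written as a small function. This is a sketch with hypothetical helper names: the first version recovers the old total from the mean, as in the working above; the second is an algebraically equivalent form that nudges the old mean towards the new value by the new value's deviation divided by the new count.

```python
def updated_mean(old_mean, count, new_value):
    # Recover the old total (mean x count), add the new value, divide by the new count
    return (old_mean * count + new_value) / (count + 1)


def updated_mean_incremental(old_mean, count, new_value):
    # Equivalent form: move the old mean by (new value - old mean) / new count
    return old_mean + (new_value - old_mean) / (count + 1)


print(round(updated_mean(95, 6, 110), 2))              # 97.14
print(round(updated_mean_incremental(95, 6, 110), 2))  # 97.14
```

The incremental form is handy when values arrive one at a time, since it never needs the running total.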

Rahil spent ten years working as a private tutor, teaching students for GCSEs, A-Levels, and university admissions. During his PhD he published papers on modelling infectious disease epidemics and tutored undergraduate and master's students on mathematics courses.