**Introduction to Scatter Plots**

A scatter plot is a type of data visualisation that uses dots to represent the values of two different variables, one plotted along the x-axis and the other along the y-axis.

**Purpose**: The primary purpose of a scatter plot is to show the relationship between two numeric variables. It helps in determining the type and strength of that relationship. To further understand how to measure this relationship, you might want to learn about the correlation coefficient.

**Components**: A scatter plot typically consists of a horizontal axis (x-axis), a vertical axis (y-axis), and a series of dots. Each dot represents one observation from the data set, and its position along the x- and y-axes indicates its values for the two variables.

**Plotting Data on Scatter Plots**

**Choosing Variables**

Before creating a scatter plot, it's essential to determine which variables you want to compare:

**Independent Variable (x-axis)**: This is the variable that you think might influence the other variable. It's the one you have control over, the one you can choose.

**Dependent Variable (y-axis)**: This is the variable that you suspect might be influenced by the independent variable. It's the outcome you're interested in. Understanding these variables can also help when studying the normal distribution.

**Drawing the Scatter Plot**

1. **Draw the Axes**: Begin by drawing a horizontal line (x-axis) and a vertical line (y-axis) that intersect at a right angle. Label each axis with its respective variable.

2. **Plotting Data Points**: For each observation in your data, find the value of the independent variable (x-coordinate) and the value of the dependent variable (y-coordinate). Place a dot at the point where these two values intersect.

3. **Title and Labels**: Always provide a title for your scatter plot to indicate what the graph represents. Also, label the axes with the units of measurement if applicable. A good practice is to understand the concept of the mean when dealing with data points.
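The steps above can be sketched in code. The following is a minimal illustration, not a replacement for a real plotting library: it maps each observation to a cell in a text grid and marks it with `*`. The function name `text_scatter` and the grid size are assumptions made for this sketch, and the data are the TV-hours and fitness-score values used in the example questions of these notes.

```python
def text_scatter(xs, ys, width=21, height=9):
    """Render paired observations as a text grid; '*' marks a data point."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when every value on an axis is identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale the x value to a column index between 0 and width - 1.
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so larger y values get smaller row indices.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    # '|' draws the y-axis on the left, '+---' draws the x-axis underneath.
    lines = ["|" + "".join(cells).rstrip() for cells in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)


hours_tv = [1, 2, 3, 4, 5]       # independent variable (x-axis)
fitness = [90, 85, 80, 75, 70]   # dependent variable (y-axis)

plot = text_scatter(hours_tv, fitness)
print("Fitness score (y) against hours of TV watched per week (x)")
print(plot)
```

The printed grid slopes from the top left to the bottom right, which is the downward trend described for this data set. In practice a charting tool would also draw tick marks and axis labels, which this sketch leaves out.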

**Example**

Suppose we want to investigate if there's a relationship between the number of hours students study and their exam scores. We can plot the number of hours studied on the x-axis and the exam scores on the y-axis. By plotting each student's data point on the graph, we can visually inspect if there's a trend or correlation between study hours and exam scores.

**Interpreting Patterns in Scatter Plots**

Once the scatter plot is drawn, the next step is to interpret the patterns:

**Types of Correlation**

**Positive Correlation**: If the data points form an upward trend from left to right, this indicates a positive correlation. As one variable increases, the other variable also tends to increase.

**Negative Correlation**: If the data points form a downward trend from left to right, this indicates a negative correlation. As one variable increases, the other variable tends to decrease.

**No Correlation**: If the data points are scattered randomly without any clear trend, this indicates that there's no correlation between the two variables.
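One way to check the direction of a trend numerically, rather than by eye, is to look at the sign of the sample covariance. The sketch below is an illustration under that assumption; the helper names `covariance` and `trend_direction` are made up for this example, and the data are the age/salary and TV/fitness values from the example questions in these notes.

```python
def covariance(xs, ys):
    """Sample covariance of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Products are positive when x and y sit on the same side of their means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


def trend_direction(xs, ys):
    """Label the direction of the linear trend from the sign of the covariance."""
    cov = covariance(xs, ys)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


age = [20, 25, 30, 35, 40]
salary = [20000, 25000, 30000, 35000, 40000]
hours_tv = [1, 2, 3, 4, 5]
fitness = [90, 85, 80, 75, 70]

print(trend_direction(age, salary))        # positive
print(trend_direction(hours_tv, fitness))  # negative
```

Real data rarely gives a covariance of exactly zero, and the size of the covariance depends on the units of measurement, so the sign only indicates direction. Judging whether a trend is meaningful needs a scale-free measure such as the correlation coefficient.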

**Strength of Correlation**

The strength of the correlation can be visually assessed by how closely the data points cluster around a line:

**Strong Correlation**: Data points are closely packed together, following a clear trend.

**Weak Correlation**: Data points are more spread out and less consistent in following a trend. To see how lines can help in visualising these correlations, refer to the concept of the line of best fit.
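The visual impression of strength can be backed up with the Pearson correlation coefficient, which ranges from -1 to 1 and moves closer to either end as the points cluster more tightly around a straight line. The sketch below computes it by hand for the social-media and grades data from the practice questions in these notes. The `cutoff` of 0.7 used to separate "strong" from "weak" is an arbitrary assumption for illustration, not a standard threshold.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def strength_label(r, cutoff=0.7):
    """Label strength from |r|; the cutoff is a hypothetical, illustrative value."""
    return "strong" if abs(r) >= cutoff else "weak"


hours_social = [1, 2, 3, 4, 5]
average_grades = [85, 82, 78, 75, 70]

r = pearson_r(hours_social, average_grades)
print(round(r, 3), strength_label(r))
```

For this data set r is about -0.996: the negative sign matches the downward trend, and the magnitude close to 1 reflects how tightly the five points follow a line.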

**Outliers**

Outliers are data points that do not fit the general trend of your scatter plot. They can skew interpretations and should be investigated to determine if they represent genuine data or errors. Additionally, analysing outliers is significant in contexts like the binomial distribution.
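A rough way to flag candidate outliers is to fit a straight line and look for points whose vertical distance from it (the residual) is unusually large. The sketch below assumes a linear trend and flags any point whose residual exceeds `k` times the typical residual size. The data set is made up purely for illustration (the values follow y = 2x except for one deliberately misplaced point), and both `flag_outliers` and the default `k = 2.0` are assumptions of this example rather than a standard rule.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares slope and intercept for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(xs, ys, k=2.0):
    """Return the points whose residual is more than k times the RMS residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Root-mean-square residual: a measure of the typical distance from the line.
    spread = sqrt(sum(r * r for r in residuals) / len(residuals))
    if spread == 0:
        return []  # every point lies exactly on the line
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) > k * spread]


# Made-up illustrative data: y = 2x everywhere except at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 30, 12, 14, 16]

print(flag_outliers(xs, ys))  # [(5, 30)]
```

An outlier also pulls the fitted line towards itself, so this check is only a first pass. A flagged point still needs to be investigated to decide whether it is an error or a genuine observation.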

**Practical Applications of Scatter Plots**

Scatter plots are widely used in various fields:

**Economics**: To study the relationship between variables like income and expenditure, price and demand, etc.

**Biology**: To study the relationship between variables like height and weight, age and blood pressure, etc.

**Environmental Science**: To study the relationship between variables like temperature and ice melt, carbon dioxide levels and global temperature, etc.

**Example Questions**

1. **Question**: Given the following data points, plot them on a scatter plot and determine if there's a correlation:

   Age (years): 20, 25, 30, 35, 40

   Salary (£): 20,000, 25,000, 30,000, 35,000, 40,000

**Solution**: After plotting the data points, we can see an upward trend. As age increases, salary also seems to increase. This indicates a positive correlation between age and salary.

2. **Question**: Plot the following data points and interpret the correlation:

   Hours of TV watched per week: 1, 2, 3, 4, 5

   Fitness score: 90, 85, 80, 75, 70

**Solution**: The scatter plot shows a downward trend. As the hours of TV watched increase, the fitness score decreases. This indicates a negative correlation between hours of TV watched and fitness score.

## FAQ

Traditional scatter plots are designed to visualise the relationship between two variables. However, there are variations and techniques to incorporate more than two variables into a scatter plot. One common method is using colour, size, or shape of the data points to represent a third variable. For instance, in a scatter plot comparing height and weight, the colour of each point could represent age. Another approach is to use 3D scatter plots, where three axes represent three variables. However, 3D plots can be challenging to interpret on a two-dimensional screen, so they're less commonly used than their 2D counterparts.

Scatter plots and line graphs both visualise data points on a graph, but they serve different purposes. A scatter plot is used to show the relationship between two variables, with each dot representing an individual data point. There's no inherent order to the points, and they're not connected. A line graph, by contrast, is used to track how a value changes across an ordered sequence, typically time. The data points in a line graph are connected to show a continuous series. In essence, while scatter plots are used to determine relationships between variables, line graphs are used to track changes over time.

In the context of scatter plots, correlation refers to the relationship or association between two variables. If the data points in a scatter plot show a clear trend (either upward or downward), it indicates that the variables are correlated. However, correlation does not imply causation. Just because two variables are correlated doesn't mean that changes in one variable cause changes in the other. Causation implies a direct cause-and-effect relationship between two variables. Determining causation requires more rigorous investigation, often involving controlled experiments or additional statistical analyses.

While scatter plots are a powerful tool for visualising relationships between two variables, they do have limitations. Firstly, they're best suited for continuous numerical data; categorical data can be challenging to represent. Secondly, scatter plots can become cluttered and hard to interpret if there's a large amount of data. Overlapping data points can obscure patterns. Additionally, while scatter plots can show correlations, they don't provide a quantitative measure of the strength or nature of the relationship. For that, additional statistical analyses, like correlation coefficients or regression, might be needed.

Scatter plots are crucial in data analysis because they provide a visual representation of the relationship between two variables. By plotting data points on a graph, analysts can quickly identify patterns, trends, and correlations. This visual insight can be more intuitive than just looking at raw data or statistical measures. Scatter plots also allow for the identification of outliers or anomalies in the data, which might indicate errors or unique cases that need further investigation. Moreover, they serve as a foundational step for many advanced statistical techniques, such as regression analysis, where the relationship between variables is quantified.

## Practice Questions

**The data collected from 5 students is as follows:**

**Hours on Social Media: 1, 2, 3, 4, 5**

**Average Grades: 85, 82, 78, 75, 70**

**a) Plot the data on a scatter plot.**

**b) Describe the type of correlation observed from the scatter plot.**

**c) Predict the average grade of a student who spends 6 hours on social media daily.**

a) The scatter plot shows a downward trend: as the number of hours on social media increases, the average grades decrease.

b) The correlation observed from the scatter plot is negative. As the number of hours a student spends on social media increases, their average grade tends to decrease.

c) Based on the observed trend, in which grades fall by roughly 3 to 5 points for each extra hour, a student who spends 6 hours on social media daily might have an average grade in the mid-60s (roughly 65 to 67).
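One way to make the estimate in part c) concrete is to fit a least-squares line of best fit through the five points and read off its value at 6 hours. The sketch below assumes the trend is linear and continues beyond the observed data, which is an assumption rather than something the data can confirm. The function name `least_squares_line` is chosen for this example.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours_social = [1, 2, 3, 4, 5]
average_grades = [85, 82, 78, 75, 70]

slope, intercept = least_squares_line(hours_social, average_grades)
predicted = slope * 6 + intercept

print(f"grade = {intercept:.1f} + ({slope:.1f}) * hours")
print(f"predicted grade at 6 hours: {predicted:.1f}")
```

Under these assumptions the fitted line is approximately grade = 89.1 - 3.7 × hours, giving a predicted grade of about 66.9 at 6 hours. Because 6 hours lies outside the observed range of 1 to 5, this is an extrapolation and should be treated with caution.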

**The data for 5 cars is given below:**

**Age of Cars (years): 1, 3, 5, 7, 9**

**Resale Value (£): 15,000, 12,000, 9,000, 7,000, 5,000**

**a) Create a scatter plot using the given data.**

**b) Determine the type of correlation between the age of cars and their resale value.**

**c) If a car is 10 years old, estimate its resale value based on the trend observed.**

a) The scatter plot shows a clear downward trend, indicating that as cars age, their resale value decreases.

b) The scatter plot shows a negative correlation between the age of cars and their resale value. As cars get older, their resale value tends to decrease.

c) Based on the trend observed in the scatter plot, a car that is 10 years old might have a resale value of around £4,000 or slightly less.
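As a worked check on part c), the estimate can be compared with a least-squares line of best fit, assuming the trend is linear. Here $x$ is the age in years, $y$ is the resale value in pounds, $b$ is the slope and $a$ is the intercept; these symbols are introduced for this calculation only.

$$
\bar{x} = \frac{1 + 3 + 5 + 7 + 9}{5} = 5, \qquad \bar{y} = \frac{15000 + 12000 + 9000 + 7000 + 5000}{5} = 9600
$$

$$
b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-50000}{40} = -1250
$$

$$
a = \bar{y} - b\,\bar{x} = 9600 + 1250 \times 5 = 15850
$$

$$
\hat{y}(10) = a + 10b = 15850 - 12500 = 3350
$$

The fitted line therefore suggests a value of about £3,350 at 10 years. Continuing only the most recent drop (£1,000 per year between years 7 and 9) gives £4,000 instead, so an estimate of £4,000 or slightly less is reasonable. Since 10 years is outside the observed range, either figure is an extrapolation.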