Statistics : Correlation

Study concepts, example questions & explanations for Statistics


Example Questions

Example Question #1 : Coefficients

Which of the following best describes a data set with a correlation coefficient equal to zero?

Possible Answers:

Negative

Random

Moderately positive

Positive 

Moderately negative

Correct answer:

Random

Explanation:

In order to solve this problem, we need to understand several key concepts associated with correlations. First, let's discuss what is meant by the term "correlation." A correlation exists when two variables possess a statistical relationship with one another. It is important to note that correlation does not imply causation. Causation means that a change in one variable produces a change in the other, while correlation simply denotes an observed trend between two variables.

Second, let's observe the differences between the slope and the correlation coefficient. The correlation coefficient is denoted by the variable $r$.

It is mathematically defined as a goodness-of-fit measure that is calculated by dividing the covariance of the two samples by the product of the samples' standard deviations. This is also known as Pearson's r, and it describes the strength and direction of a linear relationship between two variables. On the other hand, the slope is described as the gradient of a line and is a key component of the slope-intercept formula, in which $m$ is the slope and $b$ is the y-intercept:

$y = mx + b$

This formula provides information about two key parts of a line: the slope and y-intercept.

The slope is commonly defined as rise over run. In other words, it is the change in y-values between two points divided by the change in x-values. It is calculated using the following formula:

$m = \frac{y_2 - y_1}{x_2 - x_1}$

In this formula, the x- and y-values come from two points on the line written in the following format:

$(x_1, y_1)$ and $(x_2, y_2)$
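As a minimal sketch, writing the two points as (x1, y1) and (x2, y2), the rise-over-run calculation can be expressed in Python as follows. The function name `slope` and the sample points are illustrative assumptions, not part of the original problem.

```python
def slope(point_1, point_2):
    """Rise over run between two points given as (x, y) pairs."""
    (x1, y1), (x2, y2) = point_1, point_2
    if x2 == x1:
        # A vertical line has no run, so the slope is undefined.
        raise ValueError("slope is undefined for a vertical line")
    return (y2 - y1) / (x2 - x1)

# Hypothetical points chosen only to show the sign of the result.
rising = slope((1, 2), (3, 6))    # (6 - 2) / (3 - 1) = 2.0, line moves upward
falling = slope((1, 6), (3, 2))   # (2 - 6) / (3 - 1) = -2.0, line moves downward
```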

It is important to note that slopes can be positive or negative. A positive slope moves upward from left to right while a negative slope moves downward. Even though the correlation coefficient will share the same sign as the slope, they mean entirely different things.
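To illustrate that the slope and the correlation coefficient share a sign but measure different things, here is a small sketch that computes both from the same data. It assumes the trendline is an ordinary least-squares line; the helper names and the data values are made up for illustration.

```python
from statistics import mean, stdev

def sample_covariance(xs, ys):
    """Sample covariance of paired values (divides by n - 1)."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))

def trendline_slope(xs, ys):
    """Least-squares slope: covariance divided by the variance of x."""
    return sample_covariance(xs, ys) / stdev(xs) ** 2

# Made-up data with an upward trend.
xs = [1, 2, 3, 4, 5]
ys = [10, 40, 30, 60, 50]
ys_rescaled = [y / 10 for y in ys]   # same data, y expressed in different units

slope_original = trendline_slope(xs, ys)            # about 10
slope_rescaled = trendline_slope(xs, ys_rescaled)   # about 1
r_original = pearson_r(xs, ys)                      # about 0.82
r_rescaled = pearson_r(xs, ys_rescaled)             # about 0.82
```

Changing the units of y changes the slope by a factor of ten, while r stays the same. Both are positive here, but only r describes how tightly the points follow the line.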

We have discussed the following distinctions: the difference between correlation and causation, and the difference between the correlation coefficient and the slope. Now, we can start to solve the problem.

First, let's learn how to calculate the correlation coefficient from the coefficient of determination. The coefficient of determination is denoted by the following:

$r^2$

For a linear relationship between two variables, we can calculate the magnitude of the correlation coefficient by taking the square root of the coefficient of determination:

$r = \pm\sqrt{r^2}$

After we calculate the correlation coefficient, we need to know how to evaluate what the number means. We can pick the sign based on the direction of the trendline, that is, the sign of its slope. If the slope is negative, then the trendline travels downward from the left to the right of the graph. On the other hand, if the slope is positive, then the trendline travels upward from the left to the right side of the graph. Below is a table of values that explains the relationships between points based upon the correlation coefficient. A correlation coefficient close to zero indicates no linear trend, so the points appear randomly distributed.

[Table: interpretation of correlation coefficient values]
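The step of taking the square root and then choosing the sign from the trendline can be sketched as below. This assumes a simple linear relationship with a single x variable; the function name `r_from_r_squared` and the example numbers are hypothetical.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from the coefficient of determination and the trendline's slope.

    The square root gives the magnitude of r; the sign of the slope gives
    its direction.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("the coefficient of determination must lie in [0, 1]")
    magnitude = math.sqrt(r_squared)
    return -magnitude if slope < 0 else magnitude

# Hypothetical values: the same r squared paired with opposite trendline directions.
r_downward = r_from_r_squared(0.64, slope=-1.5)   # negative, magnitude about 0.8
r_upward = r_from_r_squared(0.64, slope=2.0)      # positive, magnitude about 0.8
```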

Let's look at several examples. The following two graphs show a strong positive and a moderately positive trendline, respectively:

[Figure: strong positive trendline]

[Figure: moderately positive trendline]

Graphs can also show a strong negative and a weak negative trendline, respectively:

[Figure: strong negative trendline]

[Figure: weak negative trendline]

Last, a graph with a nearly horizontal trendline is indicative of a random distribution; therefore, the answer to the question is "Random."

[Figure: random distribution with a nearly horizontal trendline]
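As a rough check of this conclusion, the sketch below generates two lists of unrelated random numbers and computes their correlation coefficient, which comes out close to zero. The seed, the sample size, and the helper name `pearson_r` are arbitrary choices made for illustration.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r computed from sums of deviations about the means."""
    mean_x, mean_y = mean(xs), mean(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / (sum_xx * sum_yy) ** 0.5

# Two independent sets of random values: knowing x tells us nothing about y.
rng = random.Random(0)   # fixed seed so the run is repeatable
xs = [rng.random() for _ in range(5000)]
ys = [rng.random() for _ in range(5000)]

r_random = pearson_r(xs, ys)   # small in magnitude; the sign is arbitrary
```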

 

Learning Tools by Varsity Tutors