Welcome to Regression and Correlation!
In your Statistics 1 (S1) studies, you learned how to see if two variables have a linear relationship using the Product Moment Correlation Coefficient (PMCC). Now, in Unit S3, we are going to take this further! We will learn how to deal with data that isn't perfectly linear and, more importantly, how to test whether a relationship is actually "real" or just a result of random chance. Don't worry if Statistics has felt a bit heavy before: we'll break this down into simple, manageable steps.
1. Spearman's Rank Correlation Coefficient \( (r_s) \)
Sometimes, we want to know if two things are related, but they don't form a straight line on a graph. Or, perhaps the data is based on ranks (like a talent show where contestants are placed 1st, 2nd, and 3rd). This is where Spearman’s Rank Correlation Coefficient comes in.
Why use Spearman's instead of PMCC?
- Non-linear relationships: If the data moves in the same direction (both up or both down) but not in a straight line, Spearman's is better.
- Ranked data: If you only have the order of items, not their exact measurements.
- Outliers: Spearman's is less affected by one or two "weird" data points because it only looks at the order, not the specific values.
How to Calculate \( r_s \) Step-by-Step
Even though your calculator might do some of the work, you need to understand the process. Here is the magic formula:
\( r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \)
The Step-by-Step Guide:
- Rank the first set of data (\(x\)) from smallest to largest (1 is smallest).
- Rank the second set of data (\(y\)) from smallest to largest.
- Find the difference (\(d\)) between the ranks for each pair of data.
- Square each difference (\(d^2\)).
- Sum all those squared differences to get \( \sum d^2 \).
- Plug it into the formula where \(n\) is the number of pairs of data.
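The steps above can be sketched in a few lines of Python. This is a minimal illustration with made-up judge scores; the `ranks` and `spearman_rs` names are our own, and the sketch assumes no tied values:

```python
def ranks(values):
    # Rank from smallest (1) to largest (n); assumes no ties
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    # Steps 1-2: rank each data set
    rx, ry = ranks(x), ranks(y)
    # Steps 3-5: difference the ranks, square, and sum
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Step 6: plug into the formula
    n = len(x)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Made-up scores from two judges for five contestants
x = [8, 6, 9, 5, 7]
y = [7, 4, 9, 5, 6]
print(spearman_rs(x, y))  # 0.9
```

Notice that only the order of the scores matters: multiplying every score in one list by 10 would give exactly the same \( r_s \).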
Quick Review: What do the results mean?
Just like PMCC, the answer will always be between +1 and -1.
+1: Perfect positive rank correlation (ranks match exactly).
0: No rank correlation at all.
-1: Perfect negative rank correlation (ranks are exactly opposite).
Wait, what about "Ties"?
If two people both come in "2nd place," they have a tie. In the exam, you won't be asked to calculate Spearman's with ties, but you should know how they are handled: we give them the average of the ranks they would have taken. For example, if two items tie for 2nd and 3rd, they both get rank 2.5.
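Although you won't be asked to calculate with ties, the average-rank rule itself is easy to sketch in Python (the `average_ranks` helper below is purely illustrative):

```python
def average_ranks(values):
    # Tied values share the mean of the rank positions they occupy
    ordered = sorted(values)
    return [
        sum(i + 1 for i, v in enumerate(ordered) if v == val) / ordered.count(val)
        for val in values
    ]

# The two 20s tie for 2nd and 3rd, so each gets (2 + 3) / 2 = 2.5
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```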
Key Takeaway: Spearman’s is all about the order of the data, not the actual values. It’s perfect for judging competitions or seeing if one variable increases as another does, even if it’s not a straight line.
2. Testing for Zero Correlation
Imagine you find a correlation of 0.5. Is that "strong enough" to say the variables are related, or did you just get lucky with your small sample? In S3, we use Hypothesis Testing to find out.
Setting up the Hypotheses
When testing if a correlation exists, our "default" assumption is that there is no relationship in the whole population.
- Null Hypothesis (\( H_0 \)): \( \rho = 0 \) (There is no correlation).
- Alternative Hypothesis (\( H_1 \)), one of:
  - \( \rho > 0 \) (We suspect a positive correlation - 1-tailed).
  - \( \rho < 0 \) (We suspect a negative correlation - 1-tailed).
  - \( \rho \neq 0 \) (We just suspect some correlation - 2-tailed).
Note: We use the Greek letter \( \rho \) (rho) for PMCC and \( \rho_s \) for Spearman's to represent the population.
Using the Statistical Tables
You don't have to calculate the "p-value" from scratch. You will be given a table of Critical Values in the exam. This is like a "pass mark" for your test.
- Look at your Sample Size (\(n\)).
- Look at your Significance Level (usually 5% or 1%).
- Find the Critical Value in the table.
The Decision Rule:
If your calculated value (ignoring any minus sign) is GREATER than the table value, you have enough evidence! You Reject \( H_0 \) and conclude there is evidence of correlation.
Memory Trick: Think of the Critical Value as a hurdle. If your correlation is "strong" enough to jump over the hurdle, you have significant evidence of a relationship!
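The decision rule boils down to a single comparison, sketched here in Python. The critical value 0.6429 below is purely illustrative; always read the value from your own tables for the correct \( n \), significance level, and number of tails:

```python
def significant(r, critical_value):
    # Reject H0 when the calculated value, ignoring its sign,
    # is greater than the critical ("hurdle") value from the tables
    return abs(r) > critical_value

# Example: suppose the tables give 0.6429 for our n and significance level
print(significant(0.9, 0.6429))   # strong enough: reject H0
print(significant(-0.5, 0.6429))  # not strong enough: do not reject H0
```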
Encouragement: Don't worry if this seems tricky at first! The hardest part is usually just picking the right column in the table. Always double-check if your test is 1-tailed or 2-tailed before looking up the value.
Key Takeaway: A hypothesis test tells us if our sample correlation is strong enough to represent the entire population. Use the tables provided, and always write your conclusion in the context of the original question!
3. Summary and Common Mistakes to Avoid
Common Pitfalls
- Using the wrong table: There are separate tables for PMCC and Spearman’s. Make sure you use the one that matches your calculation!
- Forgetting to square \(d\): In Spearman's, we must square the differences (\(d^2\)). If you don't, your sum will always be zero, because both sets of ranks add up to the same total, so the plain differences cancel out!
- 2-Tailed Confusion: For a 2-tailed test at 5% significance, you often look at the 0.025 column (5% split into two ends) in some tables—check the heading of your specific table carefully!
- Misinterpreting \( \rho = 0 \): Remember that \( \rho = 0 \) means no linear correlation for PMCC, but there might still be a non-linear relationship.
Did you know?
Charles Spearman, who invented the rank correlation, was actually a psychologist. He used these statistical methods to develop theories about human intelligence!
Quick Review Box
Formula: \( r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \)
H0: Always assume correlation is zero (\( \rho = 0 \)).
Decision: Calculated > Table Value = Significant result!
You have now covered the core of the Regression and Correlation chapter for S3. Great job! Keep practicing those table look-ups, as they are "easy marks" once you get the hang of them.