I'll probably do a terrible job of this but . . .

For data that is quantitative and linearly related, you can fit a line through the data that best represents all of it.  "Best" here means the sum of the squared distances between the actual values and their predicted values is minimized (remember best-fit lines from high school?).  I'm not going to get into it here (unless someone is desperate to know), but we can convert two different kinds of values (e.g. inches and pounds, seconds and miles) into standardized scores so that we can compare them and decide which values are more unusual (i.e. farther from their mean, measured in standard deviations).  Once we standardize, the mean becomes 0 on each axis and the regression line has the correlation as its slope: predicted y = r times x, with both in standardized units.  Since correlation is always between -1 and 1, any x-value we plug in gives a predicted y-value that is no farther from 0 (the mean) than x was, and strictly closer whenever the correlation isn't perfect.
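If it helps to see that with numbers, here is a minimal sketch in plain Python. The height and weight values are invented purely for illustration, and the helper names (`standardize`, `least_squares_slope`, `correlation`) are my own, not from any particular library. It standardizes both variables, fits the least-squares line, and checks that the slope comes out equal to the correlation.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    return [(v - m) / s for v in values]

def least_squares_slope(xs, ys):
    """Slope of the line that minimizes the sum of squared vertical distances."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def correlation(xs, ys):
    """Correlation as the average product of paired z-scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Made-up example data (assumption): heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72, 75]
weights_lb = [110, 125, 140, 150, 172, 180, 210]

zx = standardize(heights_in)
zy = standardize(weights_lb)

r = correlation(heights_in, weights_lb)
slope = least_squares_slope(zx, zy)

# Predicted standardized weight for each standardized height.
predicted = [r * z for z in zx]

print("correlation:", r)
print("slope on standardized data:", slope)
```

The two printed numbers should agree up to floating-point rounding, and every entry of `predicted` is at least as close to 0 as the matching entry of `zx`. That shrinking toward the mean is the "regression" in the name.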

Why the name regression?

Sir Francis Galton discovered this statistical, group-level phenomenon during his research into eugenics.  He wanted to figure out how to produce the perfect human race, and height was one of his requirements.  He hypothesized that children of tall parents would grow to be even taller.  He was disappointed to find that this does not happen: children tend to be closer to a "mediocre point" than their parents were.  In his disappointment, he described this as "regression," his hopes for a progression of the human race dashed to pieces.  Luckily for the clothing industry.

"Statistics are the triumph of the quantitative method, and the quantitative method is the victory of sterility and death." - Hilaire Belloc

ahahahhhhaaa!!! that makes me laugh.