"Correlation does not imply causation" is a mantra of modern data science. It is worthwhile at this point to define the terms correlation, imply, and (harder) causation.


For the purposes of this piece, it is sufficient to say that if we measure and record values of variables x and y, and they appear to have a straight-line relationship, then the correlation is a measure of how close the data are to lying on a straight line. For example, consider the following data:


The variables y and x have a strong correlation. 
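
As a minimal sketch of how such a measure can be computed, the snippet below evaluates the Pearson correlation coefficient, one common way to quantify closeness to a straight line. The values of x and y are made up purely for illustration (they are not the data referred to above), and the function name pearson_r is likewise hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements, assumed only for this illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r = pearson_r(x, y)
print(f"correlation between x and y: {r:.4f}")
```

A value near 1 or -1 means the points lie close to a straight line (rising or falling, respectively), while a value near 0 means there is little straight-line relationship.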


Causality is a deep philosophical notion, but, for the purposes of this piece, if there is a relationship between variables y and x such that for each value of x there is a single value of y, then we say that y is a function of x: x is the cause and y is the effect.

In this case, we write y=f(x), read "y is a function of x". This is a causal relationship between x and y. (An example that shows why this definition is only useful for the purposes of this piece is the relationship between t, the number of days after January 1, and S, the sales on that day: for each value of t there is a single value of S, so S is indeed a function of t, but t does not cause S.)
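
The working definition above can be checked mechanically: a set of recorded pairs describes y as a function of x when no value of x appears with two different values of y. The sketch below assumes made-up daily sales figures and a hypothetical helper name, is_function_of. It confirms that S is a function of t in this sense, which is exactly why the definition cannot be taken as a general account of causation.

```python
def is_function_of(pairs):
    """Return True if each first value is paired with a single second value."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            # The same x was recorded with two different y values.
            return False
        seen[x] = y
    return True


# Hypothetical sales S recorded t days after January 1.
sales_by_day = [(0, 120), (1, 95), (2, 130), (3, 95)]

# Hypothetical counterexample: x = 1 is paired with both 2 and 3.
not_a_function = [(1, 2), (1, 3)]

print(is_function_of(sales_by_day))
print(is_function_of(not_a_function))
```

Note that two different days may share the same sales figure; the definition only requires that each day has a single sales figure.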
