*Correlation does not imply causation* is a mantra of modern data science. It is probably worthwhile at this point to define the terms correlation, imply, and (harder) causation.

### Correlation

For the purposes of this piece, it is sufficient to say that if we measure and record values of variables $x$ and $y$, and they appear to have a straight-line relationship, then the correlation is a measure of how close the data are to lying on a straight line. For example, consider the following data:

*The variables $x$ and $y$ have a strong correlation.*
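To make "how close the data are to a straight line" concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common measure of correlation (the piece does not say which measure it has in mind, so this choice is an assumption). The function name `pearson_r` and the sample values are invented for illustration and are not the data plotted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the spread of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariance / spread


# Hypothetical measurements: ys roughly follows a straight line in xs, with small noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

r = pearson_r(xs, ys)
print(f"correlation: {r:.4f}")  # a value close to 1 means nearly on a rising line
```

The coefficient lies between -1 and 1: values near 1 or -1 mean the points sit close to a rising or falling line, while values near 0 mean there is no straight-line pattern.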

### Causation

Causality is a deep philosophical notion, but, for the purposes of this piece, if there is a relationship between variables $x$ and $y$ such that for each value of $x$ there is a single value of $y$, then we say that *$y$ is a function of $x$*: $x$ is the cause and $y$ is the effect.

In this case, we write $y = f(x)$, said *$y$ is a function of $x$*. This is a causal relationship between $x$ and $y$. (As an example which shows why this definition is only useful for the purposes of this piece, consider the relationship between $x$, the number of days after January 1, and the sales, $y$, on that day: for each value of $x$ there is a single value of $y$, so $y$ is indeed a *function* of $x$, but $x$ does not *cause* $y$.)
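The "single value for each input" condition can be checked mechanically. The sketch below does so for a list of recorded pairs; the helper name `is_function_of` and the sales figures are hypothetical, chosen only to mirror the January 1 example.

```python
def is_function_of(pairs):
    """Return True if every first value is paired with exactly one second value."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # the same input maps to two different outputs
        seen[x] = y
    return True


# Hypothetical records: (days after January 1, sales on that day).
daily_sales = [(0, 120), (1, 95), (2, 143), (3, 95)]

# The same day recorded with two different totals breaks the condition.
conflicting = daily_sales + [(2, 150)]

print(is_function_of(daily_sales))
print(is_function_of(conflicting))
```

Passing this check only shows that sales are a function of the day count. It says nothing about whether the day count causes the sales, which is exactly the limitation the parenthetical points out.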
