
“Straight-Line-Graph-Through-The-Origin”

I can still hear the words of Mr Michael Twomey, physics teacher in Coláiste an Spioraid Naoimh.

There were two main reasons to produce this straight-line-graph-through-the-origin:

• to measure some quantity (e.g. acceleration due to gravity, speed of sound, etc.)
• to demonstrate some law of nature (e.g. Newton’s Second Law, Ohm’s Law, etc.)

We were correct to draw this straight-line-graph-through-the-origin for measurement but, in my opinion, perhaps not always for the demonstration of laws of nature.

The purpose of this piece is to explore this in detail.

## Direct Proportion

Two variables $P$ and $Q$ are in direct proportion when there is some (real number) constant $k$ such that $P=k\cdot Q$.
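As a rough illustration of how a measurement of $k$ might be extracted from readings of $P$ and $Q$, here is a minimal sketch that fits a line forced through the origin by least squares, using $k=\sum P_iQ_i/\sum Q_i^2$. The function name `fit_through_origin` and the readings are made up for illustration; they are not taken from any real experiment.

```python
def fit_through_origin(q_values, p_values):
    """Least-squares estimate of k in P = k * Q (line forced through the origin)."""
    if len(q_values) != len(p_values):
        raise ValueError("q_values and p_values must have the same length")
    sum_qq = sum(q * q for q in q_values)
    if sum_qq == 0:
        raise ValueError("at least one Q value must be non-zero")
    sum_pq = sum(p * q for p, q in zip(p_values, q_values))
    return sum_pq / sum_qq


# Hypothetical readings, constructed to be exactly proportional.
q = [1.0, 2.0, 3.0, 4.0]
p = [2.5, 5.0, 7.5, 10.0]

k = fit_through_origin(q, p)
print(k)
```

With real, noisy readings the points would not sit exactly on the line, and the value returned would be the slope of the best-fitting line through the origin.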

"Correlation does not imply causation" is a mantra of modern data science. It is probably worthwhile at this point to define the terms correlation, imply, and (harder) causation.

### Correlation

For the purposes of this piece, it is sufficient to say that if we measure and record values of variables $x$ and $y$, and they appear to have a straight-line relationship, then the correlation is a measure of how close the data is to being on a straight line. For example, consider the following data: The variables $y$ and $x$ have a strong correlation.
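To make "how close the data is to being on a straight line" concrete, the following sketch computes the usual (Pearson) correlation coefficient by hand. The helper name `pearson_r` and the sample values are assumptions made up for illustration; they are not the data referred to above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements that lie close to, but not exactly on, a line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
print(round(r, 4))
```

A value of $r$ close to $1$ or $-1$ indicates that the points lie close to a straight line; a value near $0$ indicates little straight-line relationship.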

### Causation

Causality is a deep philosophical notion, but, for the purposes of this piece, if there is a relationship between variables $y$ and $x$ such that for each value of $x$ there is a single value of $y$, then we say that $y$ is a function of $x$: $x$ is the cause and $y$ is the effect.

In this case, we write $y=f(x)$, read "$y$ is a function of $x$". This is a causal relationship between $x$ and $y$. (An example which shows why this definition is only useful for the purposes of this piece is the relationship between the number of days, $t$, after January 1, and the sales, $S$, on that day: for each value of $t$ there is a single value of $S$, so $S$ is indeed a function of $t$, but $t$ does not cause $S$.)
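The working definition used here (each value of $x$ has a single value of $y$) can be checked mechanically on recorded data. The sketch below does only that; the function name `is_function` and the sales figures are hypothetical. As the sales example shows, passing this check says nothing about genuine causation.

```python
def is_function(pairs):
    """Return True if every x in the (x, y) pairs is paired with a single y."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # the same x appears with two different y values
        seen[x] = y
    return True


# Hypothetical daily sales: (days after January 1, sales on that day).
daily_sales = [(0, 120), (1, 95), (2, 120), (3, 140)]

# Hypothetical data where x = 1 is recorded with two different y values.
not_single_valued = [(1, 2), (1, 3), (2, 5)]

print(is_function(daily_sales))
print(is_function(not_single_valued))
```

Note that two different days may share the same sales figure without breaking the condition: the definition requires a single $S$ for each $t$, not a single $t$ for each $S$.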