“Straight-Line-Graph-Through-The-Origin”
The words of Mr Michael Twomey, physics teacher, in Coláiste an Spioraid Naoimh, I can still hear them.
There were two main reasons to produce this straight-line-graph-through-the-origin:
- to measure some quantity (e.g. acceleration due to gravity, speed of sound, etc.)
- to demonstrate some law of nature (e.g. Newton’s Second Law, Ohm’s Law, etc.)
We were correct to draw this straight-line-graph-through-the-origin for measurement but, in my opinion, perhaps not always for the demonstration of laws of nature.
The purpose of this piece is to explore this in detail.
Direct Proportion
Two variables $x$ and $y$ are in direct proportion when there is some (real number) constant $k$ such that $y = kx$.
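To make the link with measurement concrete, here is a minimal sketch of estimating the constant $k$ from recorded values, assuming a least-squares fit of a line forced through the origin. The function name `slope_through_origin` and the data are invented for illustration. When the proportion is exact, every ratio $y/x$ equals $k$ and the fit simply returns that value.

```python
def slope_through_origin(xs, ys):
    """Least-squares estimate of k in y = k*x (line forced through the origin)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one non-zero x value")
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical, made-up measurements in exact direct proportion (k = 2).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

k = slope_through_origin(xs, ys)
print(k)
```

With noisy measurements the ratios $y/x$ would differ slightly from point to point, and the estimate of $k$ would be a compromise between them.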
“Correlation does not imply causation” is a mantra of modern data science. It is probably worthwhile at this point to define the terms correlation, imply, and (harder) causation.
Correlation
For the purposes of this piece, it is sufficient to say that if we measure and record values of variables $x$ and $y$, and they appear to have a straight-line relationship, then the correlation is a measure of how close the data is to being on a straight line. For example, consider the following data:
The variables $x$ and $y$ have a strong correlation.
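One common way to put a number on "how close the data is to being on a straight line" is the Pearson correlation coefficient. The text does not say which measure is intended, so treat that choice as an assumption. The sketch below computes it from scratch; the function name `pearson_r` and the data are invented for illustration and are not the data referred to above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical, made-up data lying close to (but not exactly on) a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
print(round(r, 4))
```

A value near 1 or -1 means the points lie close to a straight line, while a value near 0 means there is little straight-line relationship.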
Causation
Causality is a deep philosophical notion, but, for the purposes of this piece, if there is a relationship between variables $x$ and $y$ such that for each value of $x$ there is a single value of $y$, then we say that $y$ is a function of $x$: $x$ is the cause and $y$ is the effect.
In this case, we write $y = f(x)$, said "$y$ is a function of $x$". This is a causal relationship between $x$ and $y$. (An example which shows why this definition is only useful for the purposes of this piece is the relationship between the number of days, $x$, after January 1, and the sales, $y$, on that day: for each value of $x$ there is a single value of $y$, so $y$ is indeed a function of $x$, but $x$ does not cause $y$.)
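The "single value of $y$ for each value of $x$" condition can be checked mechanically on recorded data. The following is a minimal sketch, with an invented function name `is_function` and made-up sales figures. It tests only the condition used in the definition above and says nothing about causation in the everyday sense.

```python
def is_function(pairs):
    """Return True if every x value in pairs is paired with exactly one y value."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # the same x appears with two different y values
        seen[x] = y
    return True


# Hypothetical daily sales: (days after January 1, sales on that day).
daily_sales = [(0, 120), (1, 95), (2, 130), (3, 95)]

# A relation that fails the condition: x = 1 is paired with both 2 and 3.
not_a_function = [(1, 2), (1, 3)]

print(is_function(daily_sales))     # each day has a single sales figure
print(is_function(not_a_function))
```

The sales data passes the check, even though the day number plainly does not cause the sales, which is the limitation noted in the parenthetical remark.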