
“Straight-Line-Graph-Through-The-Origin”

I can still hear the words of Mr Michael Twomey, physics teacher in Coláiste an Spioraid Naoimh.

There were two main reasons to produce this straight-line-graph-through-the-origin:

  • to measure some quantity (e.g. acceleration due to gravity, speed of sound, etc.)
  • to demonstrate some law of nature (e.g. Newton’s Second Law, Ohm’s Law, etc.)

We were correct to draw this straight-line-graph-through-the-origin for measurement but, in my opinion, perhaps not always for the demonstration of laws of nature.

The purpose of this piece is to explore this in detail.

Direct Proportion

Two variables P and Q are in direct proportion when there is some (real number) constant k such that P=k\cdot Q.
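To make the definition concrete, here is a minimal sketch of how the constant k might be estimated from measurements, assuming a least-squares fit of a line forced through the origin. The function name `fit_through_origin` and the measurement values are made up for illustration; they are not from any real experiment.

```python
def fit_through_origin(q_values, p_values):
    """Least-squares estimate of k in P = k * Q (line forced through the origin)."""
    if len(q_values) != len(p_values):
        raise ValueError("q_values and p_values must have the same length")
    # Minimising the sum of (P - k*Q)^2 over k gives k = sum(Q*P) / sum(Q^2).
    numerator = sum(q * p for q, p in zip(q_values, p_values))
    denominator = sum(q * q for q in q_values)
    if denominator == 0:
        raise ValueError("at least one Q value must be non-zero")
    return numerator / denominator


# Hypothetical measurements, invented purely for illustration.
q_values = [1.0, 2.0, 3.0, 4.0]
p_values = [2.1, 3.9, 6.2, 7.8]

k = fit_through_origin(q_values, p_values)
print(f"estimated k = {k:.3f}")
```

If P and Q really are in direct proportion, the estimate should settle near the same k no matter which subset of the measurements is used.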


“Correlation does not imply causation” is a mantra of modern data science. It is probably worthwhile at this point to define the terms correlation, imply, and (harder) causation.

Correlation

For the purposes of this piece, it is sufficient to say that if we measure and record values of variables x and y, and they appear to have a straight-line relationship, then the correlation is a measure of how close the data is to being on a straight line. For example, consider the following data:

[Figure graph14: plot of the data for y against x]

The variables y and x have a strong correlation. 
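One common way to put a number on "how close the data is to being on a straight line" is the Pearson correlation coefficient, which lies between -1 and 1. The sketch below assumes that this is the measure intended; the function name `pearson_r` and the data points are invented for illustration and are not the data in the plot.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data lying close to a straight line (made up for this sketch).
xs = [1, 2, 3, 4, 5]
ys = [2.2, 3.9, 6.1, 8.0, 9.8]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

A value of r near 1 or -1 indicates data close to a straight line; a value near 0 indicates little straight-line relationship.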

Causation

Causality is a deep philosophical notion, but, for the purposes of this piece, if there is a relationship between variables y and x such that for each value of x there is a single value of y, then we say that y is a function of x: x is the cause and y is the effect.

In this case, we write y=f(x), read "y is a function of x". This is a causal relationship between x and y. (An example which shows why this definition is only useful for the purposes of this piece is the relationship between the number of days, t, after January 1 and the sales, S, on that day: for each value of t there is a single value of S, so S is indeed a function of t, but t does not cause S.)
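The "single value of y for each value of x" condition can be checked mechanically on recorded data. The following is a minimal sketch under that working definition; the helper name `is_function` and the sales figures are hypothetical. Passing the check says nothing about real-world causation, as the sales example shows.

```python
def is_function(pairs):
    """Return True if every x in the (x, y) pairs maps to exactly one y."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # the same x appears with two different y values
        seen[x] = y
    return True


# Hypothetical daily sales: (t days after January 1, sales S on that day).
sales = [(0, 120), (1, 95), (2, 130), (3, 95)]
print(is_function(sales))  # each t has a single S, so S is a function of t

# Swapping the roles: S = 95 occurs on two different days, so t is not a function of S.
swapped = [(s, t) for t, s in sales]
print(is_function(swapped))
```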
