In this short note we explain why we multiply matrices in this “rows-by-columns” fashion. We will only look at $2\times 2$ matrices, but it should be clear from the argument how this generalises to matrices of arbitrary size.

First of all we need some objects. Consider the plane $\Pi$. By fixing an origin, orientation ($x$– and $y$-directions), and scale, each point $P\in\Pi$ can be associated with an ordered pair $(a,b)$, where $a$ is the distance along the $x$ axis and $b$ is the distance along the $y$ axis. For the purposes of linear algebra we denote this point $P=(a,b)$ by

$\displaystyle P=\left(\begin{array}{c}a\\ b\end{array}\right)$.

We have two basic operations with points in the plane: we can add them together, and we can scalar-multiply them. If $Q=(c,d)$ and $\lambda\in\mathbb{R}$, then:

$P+Q=\left(\begin{array}{c}a\\ b\end{array}\right)+\left(\begin{array}{c}c\\ d\end{array}\right)$

$\displaystyle=\left(\begin{array}{c}a+c\\ b+d\end{array}\right)$, and

$\lambda\cdot P=\lambda\cdot \left(\begin{array}{c}a\\ b\end{array}\right)=\left(\begin{array}{c}\lambda\cdot a\\ \lambda\cdot b\end{array}\right)$.
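These two operations can be sketched in a few lines of Python; this is a minimal illustration (not part of the original note), representing a plane vector as a pair `(a, b)`:

```python
# A plane vector as an ordered pair (a, b); function names are illustrative.
def vec_add(p, q):
    """Component-wise addition: (a, b) + (c, d) = (a + c, b + d)."""
    return (p[0] + q[0], p[1] + q[1])

def scalar_mul(lam, p):
    """Scalar multiplication: lam * (a, b) = (lam * a, lam * b)."""
    return (lam * p[0], lam * p[1])

P = (1, 2)
Q = (3, -1)
print(vec_add(P, Q))      # (4, 1)
print(scalar_mul(2, P))   # (2, 4)
```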

Objects in mathematics that can be added together and scalar-multiplied are said to be vectors. Sets of vectors are known as vector spaces, and a feature of vector spaces is that all vectors can be written in a unique way as a sum of basic vectors.

In the case of the plane $\Pi$, the vectors $e_1=(1,0)$ (one along the $x$-axis) and $e_2=(0,1)$ (one along the $y$-axis) are such basic vectors, and the set $\mathcal{B}:=\{e_1,e_2\}$ is said to be a basis for $\Pi$. The dimension of a vector space is the size of a basis (bases are not unique, but their size is). Every vector $P\in\Pi$ may be written, in a unique way, as a sum of elements of $\mathcal{B}$:

$\displaystyle P=\left(\begin{array}{c}a\\ b\end{array}\right)=\left(\begin{array}{c}a\\ 0\end{array}\right)+\left(\begin{array}{c}0\\ b\end{array}\right)=ae_1+be_2$.

One of the first things to do when an algebraic structure is defined, in this case the plane, is to consider functions on it. A function $f:\Pi\rightarrow \Pi$ is a map that sends each vector $P\in \Pi$ to another vector $f(P)\in \Pi$. For example, the map $R_{\pi/2}$ that rotates a point $\pi/2$ radians around the origin, in the anti-clockwise direction, is such a function.

Of particular interest are linear maps. A linear map is a function between two vector spaces that preserves the operations of vector addition and scalar multiplication. In the case of functions $\Pi\rightarrow \Pi$, a linear map is any function $T:\Pi\rightarrow \Pi$ where $T(u+\lambda \cdot v)=T(u)+\lambda\cdot T(v)$ for any vectors $u,v\in \Pi$ and scalar $\lambda\in \mathbb{R}$. The quick calculation:

$T\left(a\cdot e_1+b\cdot e_2\right)=a\cdot T(e_1)+b\cdot T(e_2)$,

shows that a linear map is determined by what it does to the basis vectors. Suppose that a linear map is defined, for scalars $x_{ij}\in\mathbb{R}$, by:

$T(e_1)=x_{11}\cdot e_1+x_{21}\cdot e_2$, and

$T(e_2)=x_{12}\cdot e_1+x_{22}\cdot e_2$,

then we see that

$T(a,b)=a\cdot T(e_1)+b\cdot T(e_2)=a\cdot (x_{11}\cdot e_1+x_{21}\cdot e_2)+b\cdot (x_{12}\cdot e_1+x_{22}\cdot e_2)$

$=(x_{11}a+x_{12}b)\cdot e_1+(x_{21}a+x_{22}b)\cdot e_2$.
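The calculation above can be checked numerically. Below is a small sketch (not part of the original note) with illustrative values for the $x_{ij}$; the map is applied to a vector purely via its action on the basis:

```python
# Hypothetical entries x_ij chosen for illustration only.
x11, x12, x21, x22 = 2.0, 1.0, 0.0, 3.0

def T(v):
    """Apply the linear map: T(a*e1 + b*e2) = a*T(e1) + b*T(e2),
    where T(e1) = (x11, x21) and T(e2) = (x12, x22)."""
    a, b = v
    return (x11 * a + x12 * b, x21 * a + x22 * b)

print(T((1, 0)))  # (2.0, 0.0), i.e. T(e1) = x11*e1 + x21*e2
print(T((2, 5)))  # (9.0, 15.0)
```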

Now it turns out that all this information can be encoded by a matrix $A$ as follows. Let $v=(a,b)\in \Pi$. Then $T(v)=Av$ where $A$ is a matrix given as follows:

$\displaystyle T(v)=T\left(\begin{array}{c}a \\ b\end{array}\right)=\underbrace{\left(\begin{array}{cc}x_{11} & x_{12} \\ x_{21} & x_{22}\end{array}\right)}_{:=A}\left(\begin{array}{c}a \\ b\end{array}\right).$

If we take matrix multiplication to be defined in the usual rows-by-columns way, then multiplying this out we see that these two expressions agree:

$T(v)=(x_{11}a+x_{12}b)\cdot e_1+(x_{21}a+x_{22}b)\cdot e_2$

$\left(\begin{array}{cc}x_{11} & x_{12} \\ x_{21} & x_{22}\end{array}\right)\left(\begin{array}{c}a \\ b\end{array}\right)=\left(\begin{array}{c}x_{11}a +x_{12}b \\ x_{21}a+x_{22}b\end{array}\right)$.
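The matrix-vector product just written out can be sketched directly; this is a minimal illustration (not part of the original note), with a $2\times 2$ matrix stored as a pair of rows:

```python
def matvec(A, v):
    """Rows-by-columns product: A = ((x11, x12), (x21, x22)), v = (a, b).
    Returns (x11*a + x12*b, x21*a + x22*b)."""
    (x11, x12), (x21, x22) = A
    a, b = v
    return (x11 * a + x12 * b, x21 * a + x22 * b)

A = ((1, 2), (3, 4))
print(matvec(A, (5, 6)))  # (17, 39)
```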

Therefore $2\times 2$ matrices are actually functions, in the sense that every linear map $T:\Pi\rightarrow \Pi$ is of the form:

$T(v)=Av$,

for some $2\times2$ matrix $A$.

Another notation for $\Pi$ is $\mathbb{R}^2$ — basically two copies of the real numbers. Every finite-dimensional vector space of dimension $n$, where the scalars are real numbers, is essentially of the form $\mathbb{R}^n$ — basically a list of $n$ numbers. It turns out that a matrix of size $M\times N$ ($M$ rows, $N$ columns) encodes a linear map $\mathbb{R}^N\rightarrow \mathbb{R}^M$ (note the switch in the order of $M$ and $N$).
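This shape rule can be seen in a short sketch (not part of the original note): a $2\times 3$ matrix eats a list of three numbers and produces a list of two.

```python
def matvec(A, v):
    """Multiply an M-by-N matrix (a list of M rows, each of length N)
    by a vector v in R^N, producing a vector in R^M."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in A]

A = [[1, 0, 2],
     [0, 1, 3]]              # 2 x 3: a linear map R^3 -> R^2
print(matvec(A, [1, 1, 1]))  # [3, 4]
```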

We can compose two functions to produce another. For example, consider two linear maps $T_A,T_B:\Pi\rightarrow \Pi$ encoded by two $2\times2$ matrices $A$ and $B$, and suppose we act on a point $P\in\Pi$ first by $T_B$ and then by $T_A$.

Now this composition is a function in itself, sending $P$ to

$(T_A\circ T_B)P=T_A(T_B(P))=T_A(BP)=ABP$.

Now there are two questions about the map $T_{AB}$ sending $P$ to $ABP$: is it linear (yes, a straightforward exercise), and can we associate to $AB$ a single matrix, say $C$, such that $AB=C$ and $T_{AB}=T_C$? The answer is again yes.

Let us write $P=(x,y)$ and define $T_A$ and $T_B$ by matrices $[a_{ij}]$ and $[b_{ij}]$. Then

$T_B(P)=BP=(b_{11}x+b_{12}y,b_{21}x+b_{22}y)$,

and so

$T_A(BP)=T_A((b_{11}x+b_{12}y,\,b_{21}x+b_{22}y))$

$=\left(a_{11}(b_{11}x+b_{12}y)+a_{12}(b_{21}x+b_{22}y),\; a_{21}(b_{11}x+b_{12}y)+a_{22}(b_{21}x+b_{22}y)\right)$.

Careful inspection shows that this is nothing but the following, where $r_i^A$ is the $i$-th row of $A$ and $c_j^B$ is the $j$-th column of $B$:

$\displaystyle \left(\begin{array}{cc} r_1^A\bullet c_1^B & r_1^A\bullet c_2^B \\ r_{2}^A\bullet c_1^B & r_2^A\bullet c_2^B \end{array}\right)\left(\begin{array}{c}x\\ y\end{array}\right)$,

where $\bullet$, called the dot product, takes a pair of vectors and sends them to a scalar. In the case of vectors in the plane:

$(a_1,b_1)\bullet (a_2,b_2)=a_1a_2+b_1b_2$.
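Putting the pieces together, the row-by-column product can be sketched and checked against the composition directly. This is a minimal illustration (not part of the original note), with arbitrary example matrices:

```python
def dot(u, v):
    """Dot product of two plane vectors."""
    return u[0] * v[0] + u[1] * v[1]

def matmul(A, B):
    """Entry (i, j) of AB is row i of A dotted with column j of B."""
    cols_B = tuple(zip(*B))  # transpose B to read off its columns
    return tuple(tuple(dot(r, c) for c in cols_B) for r in A)

def matvec(M, v):
    """Each entry of Mv is a row of M dotted with v."""
    return tuple(dot(r, v) for r in M)

A = ((1, 2), (3, 4))
B = ((0, 1), (1, 0))
P = (5, 7)
# The single matrix AB acts on P exactly as the composition T_A(T_B(P)).
assert matvec(matmul(A, B), P) == matvec(A, matvec(B, P))
print(matmul(A, B))  # ((2, 1), (4, 3))
```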

So the reason that we multiply matrices the way we do is that the matrix product $AB$ represents the function composition $T_A\circ T_B$.