In this short note we will explain why we multiply matrices in this “rows-by-columns” fashion. This note will only look at 2\times 2 matrices but it should be clear, particularly by looking at this note, how this generalises to matrices of arbitrary size.

First of all we need some objects. Consider the plane \Pi. By fixing an origin, orientation (x- and y-directions), and scale, each point P\in\Pi can be associated with an ordered pair (a,b), where a is the distance along the x-axis and b is the distance along the y-axis. For the purposes of linear algebra we denote this point P=(a,b) by

\displaystyle P=\left(\begin{array}{c}a\\ b\end{array}\right).

We have two basic operations on points in the plane: we can add them together and we can multiply them by a scalar. If Q=(c,d) and \lambda\in\mathbb{R}, then:

P+Q=\left(\begin{array}{c}a\\ b\end{array}\right)+\left(\begin{array}{c}c\\ d\end{array}\right)

\displaystyle=\left(\begin{array}{c}a+c\\ b+d\end{array}\right), and

\lambda\cdot P=\lambda\cdot \left(\begin{array}{c}a\\ b\end{array}\right)=\left(\begin{array}{c}\lambda\cdot a\\ \lambda\cdot b\end{array}\right).

Objects in mathematics that can be added together and scalar-multiplied are said to be vectors. Sets of vectors are known as vector spaces, and a feature of vector spaces is that all vectors can be written in a unique way as a sum of scalar multiples of basic vectors.

In the case of the plane \Pi, the vectors e_1=(1,0) (one along the x) and e_2=(0,1) (one along the y) are basic vectors and the set \mathcal{B}:=\{e_1,e_2\} is said to be a basis for \Pi. The dimension of a vector space is the size of a basis (bases are not unique but their size is). Every vector P\in\Pi may, in a unique way, be written as a sum of scalar multiples of the elements of \mathcal{B}:

\displaystyle P=\left(\begin{array}{c}a\\ b\end{array}\right)=\left(\begin{array}{c}a\\ 0\end{array}\right)+\left(\begin{array}{c}0\\ b\end{array}\right)=ae_1+be_2.
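The two operations and the decomposition P=ae_1+be_2 can be checked numerically. The following is a minimal sketch, assuming points are modelled as Python tuples; the helper names add and scale are made up for this illustration.

```python
# Points of the plane modelled as pairs (a, b); an illustrative choice only.

def add(p, q):
    """Component-wise sum P + Q."""
    return (p[0] + q[0], p[1] + q[1])

def scale(lam, p):
    """Scalar multiple lambda . P."""
    return (lam * p[0], lam * p[1])

e1 = (1, 0)  # one along the x
e2 = (0, 1)  # one along the y

P = (3, -2)
Q = (1, 5)

print(add(P, Q))    # (4, 3)
print(scale(2, P))  # (6, -4)

# P = a*e1 + b*e2 with a = 3 and b = -2
decomposed = add(scale(P[0], e1), scale(P[1], e2))
print(decomposed == P)  # True
```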

One of the first things to do when an algebraic structure is defined, in this case the plane, is to consider functions on it. A function f:\Pi\rightarrow \Pi is a map that sends each vector P\in \Pi to another f(P)\in \Pi. For example, the map R_{\pi/2} that rotates each point through \pi/2 radians about the origin, in the anti-clockwise direction, is such a function.
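As a concrete sketch of such a function: since \cos(\pi/2)=0 and \sin(\pi/2)=1, an anti-clockwise quarter turn sends (a,b) to (-b,a). The function name below is chosen only for this illustration.

```python
def rotate_quarter_turn(p):
    """Rotate the point p = (a, b) by pi/2 radians anti-clockwise about the origin."""
    a, b = p
    return (-b, a)

print(rotate_quarter_turn((1, 0)))  # (0, 1)
print(rotate_quarter_turn((0, 1)))  # (-1, 0)
print(rotate_quarter_turn((3, 4)))  # (-4, 3)
```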

Of particular interest are linear maps. A linear map is a function between two vector spaces that preserves the operations of vector addition and scalar multiplication. In the case of functions \Pi\rightarrow \Pi, a linear map is any function T:\Pi\rightarrow \Pi where T(u+\lambda \cdot v)=T(u)+\lambda\cdot T(v) for any vectors u,v\in \Pi and scalar \lambda\in \mathbb{R}. The quick calculation:

T\left(a\cdot e_1+b\cdot e_2\right)=a\cdot T(e_1)+b\cdot T(e_2),

shows that a linear map is determined by what it does to the basis vectors. Suppose that a linear map is defined, for scalars x_{ij}\in\mathbb{R}, by:

T(e_1)=x_{11}\cdot e_1+x_{21}\cdot e_2, and

T(e_2)=x_{12}\cdot e_1+x_{22}\cdot e_2,

then we see that

T(a,b)=a\cdot T(e_1)+b\cdot T(e_2)=a\cdot (x_{11}\cdot e_1+x_{21}\cdot e_2)+b\cdot (x_{12}\cdot e_1+x_{22}\cdot e_2)

=(x_{11}a+x_{12}b)\cdot e_1+(x_{21}a+x_{22}b)\cdot e_2.
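To see the formula in action, here is a small sketch with made-up values for the scalars x_{ij} (they are assumptions chosen for illustration, not values from the text). It checks the linearity condition T(u+\lambda\cdot v)=T(u)+\lambda\cdot T(v) on one example and confirms that T(e_1) and T(e_2) come out as prescribed.

```python
# Hypothetical coefficients x_ij, chosen only for illustration.
x11, x12 = 2, -1
x21, x22 = 1, 3

def T(v):
    """T(a, b) = (x11*a + x12*b, x21*a + x22*b)."""
    a, b = v
    return (x11 * a + x12 * b, x21 * a + x22 * b)

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def scale(lam, p):
    return (lam * p[0], lam * p[1])

u, v, lam = (1, 2), (4, -3), 5

lhs = T(add(u, scale(lam, v)))     # T(u + lam.v)
rhs = add(T(u), scale(lam, T(v)))  # T(u) + lam.T(v)
print(lhs, rhs)                    # (55, -18) (55, -18)

# The images of the basis vectors are x11.e1 + x21.e2 and x12.e1 + x22.e2.
print(T((1, 0)), T((0, 1)))        # (2, 1) (-1, 3)
```

A single check like this is not a proof of linearity, only a sanity test of the algebra above.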

Now it turns out that all this information can be encoded by a matrix A. Let v=(a,b)\in \Pi. Then T(v)=Av, where A is the matrix whose columns hold the coefficients of T(e_1) and T(e_2):

\displaystyle T(v)=T\left(\begin{array}{c}a \\ b\end{array}\right)=\underbrace{\left(\begin{array}{cc}x_{11} & x_{12} \\ x_{21} & x_{22}\end{array}\right)}_{:=A}\left(\begin{array}{c}a \\ b\end{array}\right).

If we take matrix multiplication to be the usual rows-by-columns rule, then multiplying this out shows that the two expressions agree:

T(v)=(x_{11}a+x_{12}b)\cdot e_1+(x_{21}a+x_{22}b)\cdot e_2

\left(\begin{array}{cc}x_{11} & x_{12} \\ x_{21} & x_{22}\end{array}\right)\left(\begin{array}{c}a \\ b\end{array}\right)=\left(\begin{array}{c}x_{11}a +x_{12}b \\ x_{21}a+x_{22}b\end{array}\right).

Therefore two-by-two matrices are actually functions in the sense that every linear map T:\Pi\rightarrow \Pi is of the form:

T(v)=Av,

for some 2\times2 matrix A.
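The passage from a linear map to its matrix can be sketched in code: evaluate the map on e_1 and e_2 and use the results as the columns of A. The shear map and the helper names matrix_of and matvec below are assumptions made for this illustration.

```python
def shear(v):
    """A hypothetical linear map: (a, b) -> (a + 2b, b)."""
    a, b = v
    return (a + 2 * b, b)

def matrix_of(T):
    """2x2 matrix (a tuple of rows) whose columns are T(e_1) and T(e_2)."""
    c1 = T((1, 0))
    c2 = T((0, 1))
    return ((c1[0], c2[0]),
            (c1[1], c2[1]))

def matvec(A, v):
    """Rows-by-columns product of a 2x2 matrix with a column vector."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

A = matrix_of(shear)
v = (3, 4)

print(A)             # ((1, 2), (0, 1))
print(matvec(A, v))  # (11, 4)
print(shear(v))      # (11, 4)
```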

Another notation for \Pi is \mathbb{R}^2, basically two copies of the real numbers. Every finite-dimensional vector space of dimension n whose scalars are real numbers can be identified with \mathbb{R}^n, basically lists of n numbers. It turns out that a matrix of size M\times N (M rows, N columns) encodes a linear map \mathbb{R}^N\rightarrow \mathbb{R}^M (note the switch from M\text{-}N to N\text{-}M).
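The switch in sizes is easy to see in a small sketch. Below, a matrix with 3 rows and 2 columns takes a list of 2 numbers and returns a list of 3; the matrix entries and the helper matvec are made up for illustration.

```python
def matvec(A, v):
    """Rows-by-columns product: one output entry per row of A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# 3 rows, 2 columns: encodes a map from R^2 to R^3.
A = [[1, 0],
     [0, 1],
     [1, 1]]

v = [5, 7]            # a vector in R^2
w = matvec(A, v)      # a vector in R^3

print(w)              # [5, 7, 12]
print(len(v), len(w)) # 2 3
```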

We can compose two functions to produce another. For example, consider two linear maps T_A,T_B:\Pi\rightarrow \Pi encoded by two 2\times2 matrices A and B. Suppose we act on a point P\in\Pi first by T_B and then by T_A.

Now this composition is a function in itself, sending P to

(T_A\circ T_B)P=T_A(T_B(P))=T_A(BP)=ABP.

Now there are two questions. Is the map T_{AB} sending P to ABP linear (yes, a straightforward exercise), and can we associate to AB a single matrix, say C, such that AB=C and T_{AB}=T_C? The answer is also yes.

Let us write P=(x,y) and define T_A and T_B by matrices [a_{ij}] and [b_{ij}]. Then

T_B(P)=BP=(b_{11}x+b_{12}y,b_{21}x+b_{22}y),

and so

T_A(BP)=T_A((b_{11}x+b_{12}y,b_{21}x+b_{22}y))

=\left(a_{11}(b_{11}x+b_{12}y)+a_{12}(b_{21}x+b_{22}y),\right.

\left.a_{21}(b_{11}x+b_{12}y)+a_{22}(b_{21}x+b_{22}y)\right).

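Collecting the coefficients of x and y, the first component is (a_{11}b_{11}+a_{12}b_{21})x+(a_{11}b_{12}+a_{12}b_{22})y, and similarly for the second. Writing r_i^A for the i-th row of A and c_j^B for the j-th column of B, some careful inspection shows that this is nothing but: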

\displaystyle \left(\begin{array}{cc} r_1^A\bullet c_1^B & r_1^A\bullet c_2^B \\ r_{2}^A\bullet c_1^B & r_2^A\bullet c_2^B  \end{array}\right)\left(\begin{array}{c}x\\ y\end{array}\right),

where \bullet, called the dot product, takes a pair of vectors and sends them to a scalar. In the case of vectors in the plane:

(a_1,b_1)\bullet (a_2,b_2)=a_1a_2+b_1b_2.

So the reason that we multiply matrices the way we do is that the matrix product AB represents the function composition T_A\circ T_B.
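This can be checked numerically. The sketch below builds the product by taking dot products of the rows of A with the columns of B, then compares applying B followed by A against applying the single product matrix. The particular matrices and the helper names dot, matvec and matmul are assumptions for illustration.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def matvec(A, v):
    """Apply the matrix A (a tuple of rows) to the vector v."""
    return tuple(dot(row, v) for row in A)

def matmul(A, B):
    """Rows-by-columns product: entry (i, j) is (row i of A) . (column j of B)."""
    cols = list(zip(*B))  # columns of B
    return tuple(tuple(dot(row, col) for col in cols) for row in A)

A = ((1, 2), (3, 4))
B = ((0, 1), (5, -1))
P = (2, 3)

C = matmul(A, B)
print(C)                        # ((10, -1), (20, -1))
print(matvec(A, matvec(B, P)))  # (17, 37): first T_B, then T_A
print(matvec(C, P))             # (17, 37): the single map T_C
```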
