Matrix Multiplication and Composite Transformations

Visual Linear Algebra Online, Section 1.8

The composition of two reflections in the opposite order produces images (blue) that are rotations of each other by 180^{\circ}.

Function composition is a fundamental binary operation that arises in all areas of mathematics, and it provides a useful way to create new functions from simpler pieces.

When the functions are linear transformations from linear algebra, function composition can be computed via matrix multiplication.

But let’s start by looking at a simple example of function composition. Consider a spherical snowball of volume 100\mbox{ cm}^{3}. Suppose (unrealistically) that it stays spherical as it melts at a constant rate of 10\mbox{ cm}^{3}/\mbox{hr}. Then the volume of the snowball would be V=f(t)=100-10t\mbox{ cm}^{3}, where t is the number of hours since it started melting and 0\leq t\leq 10.

How does the radius of the snowball depend on time?

Since the snowball stays spherical, we know that if r is its radius, in cm, then V=\frac{4}{3}\pi r^{3}\mbox{ cm}^{3}. This is easily solved for the radius as a function of the volume to get r=g(V)=\sqrt[3]{\frac{3}{4\pi}V}\mbox{ cm}.

Now, to find how r depends on t, “compose” the functions g and f to get r=h(t)=g(f(t))=g(100-10t)=\sqrt[3]{\frac{3}{4\pi}(100-10t)}. Note that we are “plugging in” the output f(t) of the “inner” function f as input for the “outer” function g.

Schematically, if we take the interval [0,10] to be the domain of f, this composition can be diagrammed like this: [0,10]\xrightarrow{\ \ f\ \ }{\Bbb R}\xrightarrow{\ \ g\ \ }{\Bbb R}.
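
If you like to experiment, this composition is easy to check numerically. Here is a quick Python sketch (the function names f, g, and h match the ones above; the printed values are approximate):

import math

def f(t):
    # volume of the snowball, in cm^3, after t hours of melting (0 <= t <= 10)
    return 100 - 10 * t

def g(V):
    # radius, in cm, of a sphere whose volume is V cm^3
    return (3 * V / (4 * math.pi)) ** (1 / 3)

def h(t):
    # the composition h = g o f: radius as a function of time
    return g(f(t))

print(h(0))   # approximately 2.88, the starting radius in cm
print(h(9))   # approximately 1.34, the radius after nine hours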

Circle Notation and Non-Commutativity

An alternative way to write the relationship between these functions is as h=g\circ f, or r=h(t)=(g\circ f)(t). This “circle notation” emphasizes that function composition is a binary operation. The small circle \circ represents the operation of taking two functions g and f and forming one new function h=g\circ f.

Unlike addition or multiplication of numbers, however, composition of functions is not commutative. In general, g\circ f\not=f\circ g.

This is certainly the case in our example.

Ignoring the fact that the reverse composition makes no physical sense for this application, we also get a completely different function: (f\circ g)(V)=f(g(V))=f\left(\sqrt[3]{\frac{3}{4\pi}V}\right)=100-10\sqrt[3]{\frac{3}{4\pi}V}.

The key distinction between this function and (g\circ f)(t) is not that the variable name is different. Rather, it is that the arithmetic operations that the functions represent are different. For most given inputs, the corresponding outputs of these functions are different. Their graphs are also different.

Composing Linear Functions from Precalculus and Calculus

Function composition usually produces a different type of function from the original functions. An exception to this general rule arises with linear functions.

Suppose we consider two linear functions of the type you have encountered before linear algebra, say f(x)=ax+b and g(x)=cx+d. Then (f\circ g)(x)=f(g(x))=f(cx+d)=a(cx+d)+b=(ac)x+(ad+b) and (g\circ f)(x)=g(f(x))=g(ax+b)=c(ax+b)+d=(ac)x+(bc+d) are both linear.
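
As a quick concrete check, here is a small Python sketch (with made-up coefficients a, b, c, and d) that compares the two compositions against the formulas just derived:

def compose(outer, inner):
    # return the function x -> outer(inner(x))
    return lambda x: outer(inner(x))

a, b, c, d = 2, 3, 5, 7          # made-up coefficients
f = lambda x: a * x + b          # f(x) = ax + b
g = lambda x: c * x + d          # g(x) = cx + d

f_after_g = compose(f, g)        # should be (ac)x + (ad + b)
g_after_f = compose(g, f)        # should be (ac)x + (bc + d)

for x in (0, 1, 2):
    assert f_after_g(x) == (a * c) * x + (a * d + b)
    assert g_after_f(x) == (a * c) * x + (b * c + d)
print("formulas confirmed")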

According to our new definition of linearity in linear algebra, we might therefore wonder: is the composition of two linear transformations also a linear transformation?

Composing Linear Transformations

Let \ell, m, and n be positive integers. Suppose T:{\Bbb R}^{n}\longrightarrow {\Bbb R}^{m} and S:{\Bbb R}^{m}\longrightarrow {\Bbb R}^{\ell} are linear transformations. Is the composition S\circ T also a linear transformation?

To find out, suppose a,b\in {\Bbb R} and {\bf u},{\bf v}\in {\Bbb R}^{n}. We can use the definition of linearity found in Section 1.7, “High-Dimensional Linear Algebra” to do the computation below. Notice the importance of the assumption that S and T are both linear in this computation.

(S\circ T)(a{\bf u}+b{\bf v})=S(T(a{\bf u}+b{\bf v}))=S(aT({\bf u})+bT({\bf v})) =aS(T({\bf u}))+bS(T({\bf v}))=a(S\circ T)({\bf u})+b(S\circ T)({\bf v})

But how is this linearity related to the matrix representatives of these linear transformations? We might start investigating the answer to this question by considering examples in low dimensions.

Before doing so, it is important to recall that we have already defined a matrix times a vector in Section 1.7, “High-Dimensional Linear Algebra”. Specifically, if A is an m\times n matrix (m rows and n columns) and {\bf x} is an n-dimensional vector (an n\times 1 matrix), then the matrix/vector product A{\bf x} will be an m-dimensional vector (an m\times 1 matrix). Even more specifically, A{\bf x} will be the m-dimensional vector formed as a linear combination of the columns of A with the entries of {\bf x} as the corresponding weights.

In a formula, if A=\left[{\bf a}_{1}\ {\bf a}_{2}\cdots {\bf a}_{n}\right], where the {\bf a}_{i}\in {\Bbb R}^{m} are the columns of A, and if {\bf x}=\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n}\end{array}\right], then:

A{\bf x}=\left[{\bf a}_{1}\ {\bf a}_{2}\cdots {\bf a}_{n}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n}\end{array}\right]=x_{1}{\bf a}_{1}+x_{2}{\bf a}_{2}+\cdots +x_{n}{\bf a}_{n}=\displaystyle\sum_{i=1}^{n}x_{i}{\bf a}_{i}.
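
In Python with NumPy (using a small example matrix and vector of our own choosing), this definition can be checked directly against NumPy's built-in matrix/vector product:

import numpy as np

A = np.array([[1, -3,  1],
              [2,  2, -3]])          # a 2 x 3 matrix, so m = 2 and n = 3
x = np.array([2, 1, -1])             # a 3-dimensional vector

# linear combination of the columns of A with the entries of x as the weights
by_columns = sum(x[i] * A[:, i] for i in range(A.shape[1]))

print(by_columns)    # [-2  9]
print(A @ x)         # [-2  9], the same 2-dimensional vector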

Matrix Multiplication for a Composition {\Bbb R}\xrightarrow{\ \ T\ \ } {\Bbb R}\xrightarrow{\ \ S\ \ }{\Bbb R}

Suppose T:{\Bbb R}\longrightarrow {\Bbb R} and S:{\Bbb R}\longrightarrow {\Bbb R} are linear transformations. Then there are constants (1\times 1 matrices) b and a such that T(x)=bx and S(y)=ay for all x,y\in {\Bbb R}.

The composition of these functions is (S\circ T)(x)=S(T(x))=S(bx)=a(bx)=(ab)x. In other words, the 1\times 1 matrix (number) corresponding to the composition is the product of the 1\times 1 matrices (numbers) corresponding to each of the “factors” S and T of S\circ T.

This leads us to define the product of 1\times 1 matrices as another 1\times 1 matrix: [a][b]=[ab]. Matrix multiplication is easy when the matrices are 1\times 1. Just multiply the numbers!

Matrix Multiplication for a Composition {\Bbb R}\xrightarrow{\ \ T\ \ } {\Bbb R}^{2}\xrightarrow{\ \ S\ \ }{\Bbb R}

Suppose T:{\Bbb R}\longrightarrow {\Bbb R}^{2} and S:{\Bbb R}^{2}\longrightarrow {\Bbb R} are linear transformations. Then there is a 2\times 1 matrix B and a 1\times 2 matrix A such that T({\bf x})=B{\bf x} for all {\bf x}=[x]=x\in {\Bbb R} and S({\bf y})=A{\bf y} for all {\bf y}\in {\Bbb R}^{2}.

We seek to define the matrix product AB in such a way that (S\circ T)({\bf x})=S(T({\bf x}))=(AB){\bf x}=A(B{\bf x}) for all {\bf x}=[x]=x\in {\Bbb R}. Doing so in this situation necessarily implies that the product AB must be a 1\times 1 matrix (a number).

If B=\left[\begin{array}{c} b_{1} \\ b_{2} \end{array}\right] and A=\left[a_{1}\ \ a_{2}\right], then we can write the equations below.

T({\bf x})=B{\bf x}=\left[\begin{array}{c} b_{1} \\ b_{2} \end{array}\right][x]=\left[\begin{array}{c} b_{1}x \\ b_{2}x \end{array}\right], and

S({\bf y})=A{\bf y}=\left[a_{1}\ \ a_{2}\right]\left[\begin{array}{c} y_{1} \\ y_{2} \end{array}\right]=[a_{1}y_{1}+a_{2}y_{2}]=a_{1}y_{1}+a_{2}y_{2}.

The composed function is therefore:

(S\circ T)({\bf x})=S(T({\bf x}))=S\left(\left[\begin{array}{c} b_{1}x \\ b_{2}x \end{array}\right]\right)=\left[a_{1}\ \ a_{2}\right]\left[\begin{array}{c} b_{1}x \\ b_{2}x \end{array}\right]=a_{1}b_{1}x+a_{2}b_{2}x

Notice that the final answer is the same as the following matrix product of 1\times 1 matrices (numbers)

\left[a_{1}b_{1}+a_{2}b_{2}\right][x]=[a_{1}b_{1}x+a_{2}b_{2}x]=a_{1}b_{1}x+a_{2}b_{2}x

Therefore, the matrix product AB should be the 1\times 1 matrix below. This illustrates that a 1\times 2 matrix times a 2\times 1 matrix can be defined and the answer is a 1\times 1 matrix.

AB=\left[a_{1}\ \ a_{2}\right]\left[\begin{array}{c} b_{1} \\ b_{2} \end{array}\right]=\left[a_{1}b_{1}+a_{2}b_{2}\right]=a_{1}b_{1}+a_{2}b_{2}

Since B is really a two-dimensional (column) vector, we have already done this computation before! The answer is a linear combination of the columns of A with the entries of B as the weights.

Matrix Multiplication for a Composition {\Bbb R}^{2}\xrightarrow{\ \ T\ \ } {\Bbb R}\xrightarrow{\ \ S\ \ }{\Bbb R}^{2}

Suppose T:{\Bbb R}^{2}\longrightarrow {\Bbb R} and S:{\Bbb R}\longrightarrow {\Bbb R}^{2} are linear transformations. Then there is a 1\times 2 matrix B and a 2\times 1 matrix A such that T({\bf x})=B{\bf x} for all {\bf x}\in {\Bbb R}^{2} and S({\bf y})=A{\bf y} for all {\bf y}=[y]=y\in {\Bbb R}.

We seek to define the matrix product AB in such a way that (S\circ T)({\bf x})=S(T({\bf x}))=(AB){\bf x}=A(B{\bf x})\in {\Bbb R}^{2} for all {\bf x}\in {\Bbb R}^{2}. Doing so in this situation necessarily implies that the product AB must be a 2\times 2 matrix.

If B=\left[b_{1}\ \  b_{2}\right] and A=\left[\begin{array}{c} a_{1} \\  a_{2}\end{array}\right], then we can write the equations below.

T({\bf x})=B{\bf x}=\left[b_{1}\ \  b_{2}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \end{array}\right]=\left[b_{1}x_{1}+b_{2}x_{2}\right], and

S({\bf y})=A{\bf y}=\left[\begin{array}{c} a_{1} \\  a_{2}\end{array}\right][y]=\left[\begin{array}{c} a_{1}y \\ a_{2} y\end{array}\right].

The composed function is therefore:

(S\circ T)({\bf x})=S(T({\bf x}))=S\left(\left[b_{1}x_{1}+b_{2}x_{2}\right]\right)=\left[\begin{array}{c} a_{1} \\  a_{2}\end{array}\right]\left[b_{1}x_{1}+b_{2}x_{2}\right]=\left[\begin{array}{c} a_{1}(b_{1}x_{1}+b_{2}x_{2}) \\ a_{2}(b_{1}x_{1}+b_{2}x_{2})\end{array}\right].

This can be rewritten as:

(S\circ T)({\bf x})=\left[\begin{array}{c} a_{1}b_{1}x_{1}+a_{1}b_{2}x_{2} \\ a_{2}b_{1}x_{1}+a_{2}b_{2}x_{2}\end{array}\right]=x_{1}\left[\begin{array}{c} a_{1}b_{1} \\ a_{2}b_{1} \end{array}\right]+x_{2}\left[\begin{array}{c} a_{1}b_{2} \\ a_{2}b_{2} \end{array}\right].

By our previous definition of matrix/vector multiplication, this is the same as saying that

(S\circ T)({\bf x})=\left[\begin{array}{cc} a_{1}b_{1} & a_{1}b_{2} \\ a_{2}b_{1} & a_{2}b_{2}\end{array}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \end{array}\right].

Therefore, to define matrix multiplication of a 2\times 1 matrix times a 1\times 2 matrix, we should do it as shown below:

AB=\left[\begin{array}{c} a_{1} \\  a_{2}\end{array}\right]\left[b_{1}\ \  b_{2}\right]=\left[\begin{array}{cc} a_{1}b_{1} & a_{1}b_{2} \\ a_{2}b_{1} & a_{2}b_{2}\end{array}\right].

This is a fundamentally new kind of product from what we have done before. However, notice that it can be written in the following way, where each column of the answer is a two-dimensional vector formed as the 2\times 1 matrix A times a 1\times 1 column of B.

AB=\left[\begin{array}{c} a_{1} \\  a_{2}\end{array}\right]\left[b_{1}\ \  b_{2}\right]=\left[\begin{array}{cc} A[b_{1}] & A[b_{2}] \end{array}\right]

This pattern will work in the general case which we will soon specify.
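
A product of a column and a row like this is sometimes called an outer product. A quick NumPy sketch (with made-up entries) confirms the 2\times 2 pattern above:

import numpy as np

a = np.array([[2],
              [3]])                 # a 2 x 1 matrix (made-up entries)
b = np.array([[5, 7]])              # a 1 x 2 matrix (made-up entries)

print(a @ b)                        # [[10 14]
                                    #  [15 21]]
print(np.outer([2, 3], [5, 7]))    # the same 2 x 2 matrix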

Matrix Multiplication for a Composition {\Bbb R}^{2}\xrightarrow{\ \ T\ \ } {\Bbb R}^{2}\xrightarrow{\ \ S\ \ }{\Bbb R}^{2}

One more example in low dimensions will be considered before we discuss the general case.

Suppose T:{\Bbb R}^{2}\longrightarrow {\Bbb R}^{2} and S:{\Bbb R}^{2}\longrightarrow {\Bbb R}^{2} are linear transformations. Then there is a 2\times 2 matrix B and a 2\times 2 matrix A such that T({\bf x})=B{\bf x} for all {\bf x}\in {\Bbb R}^{2} and S({\bf y})=A{\bf y} for all {\bf y}\in {\Bbb R}^{2}.

Once again, we seek to define the matrix product AB in such a way that (S\circ T)({\bf x})=S(T({\bf x}))=(AB){\bf x}=A(B{\bf x})\in {\Bbb R}^{2} for all {\bf x}\in {\Bbb R}^{2}. Doing so in this situation necessarily implies that the product AB must be a 2\times 2 matrix.

If B=\left[\begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \end{array}\right] and A=\left[\begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right], then we can write the equations below.

T({\bf x})=B{\bf x}=\left[\begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \end{array}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \end{array}\right]

=x_{1}\left[\begin{array}{c} b_{11} \\ b_{21} \end{array}\right]+x_{2}\left[\begin{array}{c} b_{12} \\ b_{22} \end{array}\right]=\left[\begin{array}{c} b_{11}x_{1} + b_{12}x_{2} \\ b_{21}x_{1} + b_{22}x_{2} \end{array}\right],

and

S({\bf y})=A{\bf y}=\left[\begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right]\left[\begin{array}{c} y_{1} \\ y_{2} \end{array}\right]

=y_{1}\left[\begin{array}{c} a_{11} \\ a_{21} \end{array}\right]+y_{2}\left[\begin{array}{c} a_{12} \\ a_{22} \end{array}\right]=\left[\begin{array}{c} a_{11}y_{1} + a_{12}y_{2} \\ a_{21}y_{1} + a_{22}y_{2} \end{array}\right].

The composed function is therefore:

(S\circ T)({\bf x})=S\left(\left[\begin{array}{c} b_{11}x_{1} + b_{12}x_{2} \\ b_{21}x_{1} + b_{22}x_{2} \end{array}\right]\right)=\left[\begin{array}{c} a_{11}(b_{11}x_{1} + b_{12}x_{2})+ a_{12}(b_{21}x_{1} + b_{22}x_{2}) \\ a_{21}(b_{11}x_{1} + b_{12}x_{2}) + a_{22}(b_{21}x_{1} + b_{22}x_{2}) \end{array}\right]

=\left[\begin{array}{c} (a_{11}b_{11}+a_{12}b_{21})x_{1}+(a_{11}b_{12}+a_{12}b_{22})x_{2} \\ (a_{21}b_{11}+a_{22}b_{21})x_{1}+(a_{21}b_{12}+a_{22}b_{22})x_{2}\end{array}\right],

which equals

x_{1}\left[\begin{array}{c} a_{11}b_{11}+a_{12}b_{21} \\  a_{21}b_{11}+a_{22}b_{21}\end{array}\right]+x_{2}\left[\begin{array}{c} a_{11}b_{12}+a_{12}b_{22} \\  a_{21}b_{12}+a_{22}b_{22}\end{array}\right].

By our definition of matrix/vector multiplication, this is the same thing as

\left[\begin{array}{cc} a_{11}b_{11}+a_{12}b_{21} &  a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22}\end{array}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \end{array}\right].

Therefore, we define the matrix multiplication AB of two 2\times 2 matrices to be the same 2\times 2 matrix as on the previous line. Specifically,

AB=\left[\begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right]\left[\begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \end{array}\right]=\left[\begin{array}{cc} a_{11}b_{11}+a_{12}b_{21} &  a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22}\end{array}\right].

Notice that if {\bf b}_{1}=\left[\begin{array}{c} b_{11} \\ b_{21} \end{array}\right] and {\bf b}_{2}=\left[\begin{array}{c} b_{12} \\ b_{22} \end{array}\right] are the columns of B, then

A{\bf b}_{1}=\left[\begin{array}{c} a_{11}b_{11}+a_{12}b_{21}  \\ a_{21}b_{11}+a_{22}b_{21} \end{array}\right] and A{\bf b}_{2}=\left[\begin{array}{c} a_{11}b_{12}+a_{12}b_{22}  \\ a_{21}b_{12}+a_{22}b_{22} \end{array}\right] are the columns of AB.

Thus,

AB=\left[A{\bf b}_{1}\ \ A{\bf b}_{2}\right].

This is again the pattern that we will see in the general case.

Matrix Multiplication for The General Case {\Bbb R}^{n}\xrightarrow{\ \ T\ \ } {\Bbb R}^{m}\xrightarrow{\ \ S\ \ }{\Bbb R}^{\ell}

The reasoning in the general case is probably more interesting and enlightening, though it does take intense concentration to fully understand.

Suppose T:{\Bbb R}^{n}\longrightarrow {\Bbb R}^{m} and S:{\Bbb R}^{m}\longrightarrow {\Bbb R}^{\ell} are linear transformations. Then there is an m\times n matrix B and an \ell\times m matrix A such that T({\bf x})=B{\bf x}\in {\Bbb R}^{m} for all {\bf x}\in {\Bbb R}^{n} and S({\bf y})=A{\bf y}\in {\Bbb R}^{\ell} for all {\bf y}\in {\Bbb R}^{m}.

Once again, we seek to define the matrix product AB in such a way that (S\circ T)({\bf x})=S(T({\bf x}))=(AB){\bf x}=A(B{\bf x})\in {\Bbb R}^{\ell} for all {\bf x}\in {\Bbb R}^{n}. Doing so in this situation necessarily implies that the product AB must be an \ell\times n matrix.

Suppose B=\left[{\bf b}_{1}\ \ {\bf b}_{2}\ \cdots\ {\bf b}_{n}\right], where each {\bf b}_{i}\in {\Bbb R}^{m}, and suppose {\bf x}=\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right]\in {\Bbb R}^{n}.

We already know that B{\bf x}=x_{1}{\bf b}_{1}+x_{2}{\bf b}_{2}+\cdots+x_{n}{\bf b}_{n}.

Since S is linear and S({\bf y})=A{\bf y}, we can also say:

A(B{\bf x})=A(x_{1}{\bf b}_{1}+x_{2}{\bf b}_{2}+\cdots+x_{n}{\bf b}_{n})=x_{1}(A{\bf b}_{1})+x_{2}(A{\bf b}_{2})+\cdots+x_{n}(A{\bf b}_{n}).

But this is a linear combination of the \ell-dimensional vectors A{\bf b}_{1}, A{\bf b}_{2},\ldots, A{\bf b}_{n} with the entries of {\bf x} as the weights!

In other words,

A(B{\bf x})=\left[A{\bf b}_{1}\ \  A{\bf b}_{2}\ \ \cdots\ \ A{\bf b}_{n}\right]{\bf x}.

If this is to equal (AB){\bf x}, we should define the matrix product AB to be the following \ell\times n matrix:

AB=\left[A{\bf b}_{1}\ \  A{\bf b}_{2}\ \ \cdots\ \ A{\bf b}_{n}\right].

This is indeed what we do.
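
This column-by-column definition translates directly into code. Here is a short NumPy sketch (matmul_by_columns is our own illustrative name) that builds AB one column at a time and compares the result with NumPy's built-in product:

import numpy as np

def matmul_by_columns(A, B):
    # the j-th column of AB is A times the j-th column of B
    return np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

A = np.array([[1, -4],
              [3,  2],
              [0,  1]])             # 3 x 2, so ell = 3 and m = 2
B = np.array([[1, -3,  1],
              [2,  2, -3]])         # 2 x 3, so n = 3

print(matmul_by_columns(A, B))      # a 3 x 3 matrix
print(np.array_equal(matmul_by_columns(A, B), A @ B))   # True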

Alternative Approach with Dot Products

Of course, this means each column of AB is a linear combination of the columns of A with entries from the appropriate column of B as the corresponding weights.

To be more precise, let A=\left[{\bf a}_{1}\ \ {\bf a}_{2}\ \cdots\ {\bf a}_{m}\right], where each column is a vector in {\Bbb R}^{\ell}. Furthermore, let {\bf b}_{j}=\left[\begin{array}{c} b_{1j} \\ b_{2j} \\ \vdots \\ b_{mj}\end{array}\right] be the j^{th} column of B, for 1\leq j\leq n. Then the j^{th} column of AB is

A{\bf b}_{j}=b_{1j}{\bf a}_{1}+b_{2j}{\bf a}_{2}+\cdots+b_{mj}{\bf a}_{m}\in {\Bbb R}^{\ell}.

Let c_{ij} be the entry in the i^{th} row and j^{th} column of the product AB (with 1\leq i\leq \ell and 1\leq j\leq n). Furthermore, for 1\leq i\leq m, let {\bf a}_{i}=\left[\begin{array}{c} a_{1i} \\ a_{2i} \\ \vdots \\ a_{\ell i}\end{array}\right] be the i^{th} column of A.

What we have done also shows that, for 1\leq i\leq \ell and 1\leq j\leq n,

c_{ij}=a_{i1}b_{1j}+a_{i2}b_{2j}+\cdots+a_{im}b_{mj}=\displaystyle\sum_{k=1}^{m}a_{ik}b_{kj}.

It is very much worth noticing that c_{ij} is a dot product. It is a dot product of the j^{th} column {\bf b}_{j} of B with the i^{th} row of A (which we can certainly think of as a vector).

This description of the entries of AB as dot products is actually a quicker way to find AB. It is less meaningful as a definition, however.

Our approach with linear transformations gets at the heart of the true meaning of the matrix multiplication AB: it should be directly related to composite linear transformations.
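
For completeness, here is a sketch of the dot product description in NumPy (matmul_by_dot_products is our own illustrative name); it agrees with the built-in product on random inputs:

import numpy as np

def matmul_by_dot_products(A, B):
    # the (i, j) entry of AB is the dot product of row i of A with column j of B
    ell, m = A.shape
    m2, n = B.shape
    assert m == m2, "the number of columns of A must match the number of rows of B"
    C = np.zeros((ell, n))
    for i in range(ell):
        for j in range(n):
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

rng = np.random.default_rng(1)
A = rng.random((4, 3))
B = rng.random((3, 5))
print(np.allclose(matmul_by_dot_products(A, B), A @ B))   # True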

The Dimensions of the Matrix Factors of the Product

It is important to note that the product AB of an \ell\times m matrix A (on the left) times an m\times n matrix B (on the right) can be computed, and that the answer is an \ell\times n matrix.

That is, the number of columns of A must match the number of rows of B. Furthermore, the final answer has the same number of rows as A and the same number of columns as B.

Because of this, you should realize that just because the product AB can be computed, it does not follow that the reverse product BA can be computed.

When A is \ell\times m and B is m\times n, the reverse product BA can be computed if and only if \ell=n. However, even in this situation, it is usually the case that AB\not=BA. This reflects the fact that function composition is not commutative in general either.
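
A quick NumPy sketch illustrates these dimension rules (using random matrices of compatible sizes):

import numpy as np

rng = np.random.default_rng(2)
A = rng.random((4, 3))     # ell x m with ell = 4 and m = 3
B = rng.random((3, 4))     # m x n with n = 4; here n = ell, so BA also exists

print((A @ B).shape)       # (4, 4), i.e., ell x n
print((B @ A).shape)       # (3, 3): even the sizes of AB and BA differ here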

The Associative Property of Matrix Multiplication

Before considering examples, it is worth emphasizing that matrix multiplication satisfies the associative property. This reflects the fact that function composition is associative.

Suppose T:{\Bbb R}^{n}\longrightarrow {\Bbb R}^{m}, S:{\Bbb R}^{m}\longrightarrow {\Bbb R}^{\ell}, and R:{\Bbb R}^{\ell}\longrightarrow {\Bbb R}^{k} are all linear transformations. Then the equation (R\circ S)\circ T=R\circ (S\circ T) is easy to verify.

Given an arbitrary {\bf x}\in {\Bbb R}^{n}, we have

((R\circ S)\circ T)({\bf x})=(R\circ S)(T({\bf x}))=R(S(T({\bf x}))) =R((S\circ T)({\bf x}))=(R\circ (S\circ T))({\bf x}).

If T({\bf x})=C{\bf x}, S({\bf y})=B{\bf y}, and R({\bf z})=A{\bf z} for appropriately sized matrices and variable vectors, then the equation (AB)C=A(BC) will clearly hold as well.

The associative property generalizes to any matrix product with a finite number of factors. It also implies that we can write such products without using parentheses.
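
Associativity is also easy to check numerically for randomly chosen matrices of compatible sizes (up to floating-point rounding):

import numpy as np

rng = np.random.default_rng(3)
A = rng.random((2, 3))     # the matrix of R
B = rng.random((3, 4))     # the matrix of S
C = rng.random((4, 5))     # the matrix of T

# associativity holds up to floating-point rounding
print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True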

Example 1

We will first consider a purely computational high-dimensional example. After this point, we will consider low-dimensional examples that can be visualized.

Define T:{\Bbb R}^{5}\longrightarrow {\Bbb R}^{3} and S:{\Bbb R}^{3}\longrightarrow {\Bbb R}^{4} by the following formulas.

T({\bf x})=B{\bf x}=\left[\begin{array}{ccccc} 1 & -3 & 1 & 0 & 5 \\ 2 & 2 & -3 & -4 & 1 \\ -3 & 4 & 5 & 1 & 0 \end{array}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \end{array}\right]=\left[\begin{array}{c} x_{1}-3x_{2}+x_{3}+5x_{5} \\ 2x_{1}+2x_{2}-3x_{3}-4x_{4}+x_{5} \\ -3x_{1}+4x_{2}+5x_{3}+x_{4} \end{array}\right]

and

S({\bf y})=A{\bf y}=\left[\begin{array}{ccc} 1 & -4 & 6 \\ 3 & 2 & -1 \\ 0 & 1 & -5 \\ 4 & 2 & 2 \end{array}\right]\left[\begin{array}{c} y_{1} \\ y_{2} \\ y_{3} \end{array}\right]=\left[\begin{array}{c} y_{1}-4y_{2}+6y_{3} \\ 3y_{1}+2y_{2}-y_{3} \\ y_{2}-5y_{3} \\ 4y_{1}+2y_{2}+2y_{3} \end{array}\right]

Then S\circ T:{\Bbb R}^{5}\longrightarrow {\Bbb R}^{4} is defined by (S\circ T)({\bf x})=S(T({\bf x}))=(AB){\bf x}, where AB=A\left[{\bf b}_{1}\ {\bf b}_{2}\ {\bf b}_{3}\ {\bf b}_{4}\ {\bf b}_{5}\right]=\left[A{\bf b}_{1}\ A{\bf b}_{2}\ A{\bf b}_{3}\ A{\bf b}_{4}\ A{\bf b}_{5}\right].

Now compute each column individually. We will do the first column.

A{\bf b}_{1}=\left[\begin{array}{ccc} 1 & -4 & 6 \\ 3 & 2 & -1 \\ 0 & 1 & -5 \\ 4 & 2 & 2 \end{array}\right]\left[\begin{array}{c} 1 \\ 2 \\ -3\end{array}\right] =1\left[\begin{array}{c} 1 \\ 3 \\ 0 \\ 4 \end{array}\right]+2\left[\begin{array}{c} -4 \\ 2 \\ 1 \\ 2 \end{array}\right]+(-3)\left[\begin{array}{c} 6 \\ -1 \\ -5 \\ 2 \end{array}\right]=\left[\begin{array}{c} -25 \\ 10 \\ 17 \\ 2\end{array}\right]

The remaining four columns are computed in the same way.

You should definitely take the time to check that the final matrix product is

AB=\left[\begin{array}{ccccc} -25 & 13 & 43 & 22 & 1 \\ 10 & -9 & -8 & -9 & 17 \\ 17 & -18 & -28 & -9 & 1 \\ 2 & 0 & 8 & -6 & 22 \end{array}\right].
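
If you would rather check with software than by hand, here is a quick NumPy sketch of the same product:

import numpy as np

B = np.array([[ 1, -3,  1,  0, 5],
              [ 2,  2, -3, -4, 1],
              [-3,  4,  5,  1, 0]])
A = np.array([[1, -4,  6],
              [3,  2, -1],
              [0,  1, -5],
              [4,  2,  2]])

print(A @ B)
# [[-25  13  43  22   1]
#  [ 10  -9  -8  -9  17]
#  [ 17 -18 -28  -9   1]
#  [  2   0   8  -6  22]]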

We can also double-check this by using the dot product as described above.

For example, consider c_{32}=-18, the entry in the third row and second column of the product AB. The third row of A is [0\ \ 1\ -5], while the second column of B is \left[\begin{array}{c} -3 \\ 2 \\ 4 \end{array}\right]. The dot product of these two vectors is 0\cdot (-3)+1\cdot 2+(-5)\cdot 4=0+2-20=-18=c_{32}.

Therefore, the final formula for the composite linear transformation is

(S\circ T)({\bf x})=(AB){\bf x}=\left[\begin{array}{ccccc} -25 & 13 & 43 & 22 & 1 \\ 10 & -9 & -8 & -9 & 17 \\ 17 & -18 & -28 & -9 & 1 \\ 2 & 0 & 8 & -6 & 22 \end{array}\right]\left[\begin{array}{c} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \end{array}\right]

=\left[\begin{array}{c} -25x_{1}+13x_{2}+43x_{3}+22x_{4}+x_{5} \\ 10x_{1}-9x_{2}-8x_{3}-9x_{4}+17x_{5} \\ 17x_{1}-18x_{2}-28x_{3}-9x_{4}+x_{5} \\ 2x_{1}+8x_{3}-6x_{4}+22x_{5}\end{array}\right].

Of course, now that we know the matrix for the composition S\circ T:{\Bbb R}^{5}\longrightarrow {\Bbb R}^{4}, we can use elementary row operations to, for example, determine its kernel (null space).

As an exercise, you should take the time to show that the augmented matrix for the system (AB){\bf x}={\bf 0} can be reduced to the reduced row echelon form (RREF) shown below.

\left[\begin{array}{cccccc}  -25 & 13 & 43 & 22 & 1 & 0 \\ 10 & -9 & -8 & -9 & 17 & 0 \\ 17 & -18 & -28 & -9 & 1 & 0 \\ 2 & 0 & 8 & -6 & 22 & 0\end{array}\right]\xrightarrow{\ \ \ \ RREF\ \ \ \ }\left[\begin{array}{cccccc}  1 & 0 & 0 & -\frac{23}{13} & \frac{43}{13} & 0 \\ 0 & 1 & 0 & -\frac{9}{13} & \frac{1}{13} & 0 \\ 0 & 0 & 1 & -\frac{4}{13} & \frac{25}{13} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0\end{array}\right]

Therefore x_{4} and x_{5} are free variables and the kernel (null space) is a two-dimensional plane through the origin in five-dimensional space.

\mbox{Ker}(S\circ T)=\mbox{Nul}(AB)=\left\{x_{4}\left[\begin{array}{c} \frac{23}{13} \\ \frac{9}{13} \\ \frac{4}{13} \\ 1 \\ 0 \end{array}\right]+x_{5}\left[\begin{array}{c} -\frac{43}{13} \\ -\frac{1}{13} \\ -\frac{25}{13} \\ 0 \\ 1 \end{array}\right]\ \biggr|\ x_{4},x_{5}\in {\Bbb R}\right\}\subseteq {\Bbb R}^{5}.
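
If you want to verify the row reduction with software, SymPy can compute both the RREF and a basis for the null space exactly; its answers should match the results above (up to scaling of the basis vectors):

from sympy import Matrix

AB = Matrix([[-25,  13,  43,  22,  1],
             [ 10,  -9,  -8,  -9, 17],
             [ 17, -18, -28,  -9,  1],
             [  2,   0,   8,  -6, 22]])

print(AB.rref()[0])          # the RREF above, with the fractions over 13
for v in AB.nullspace():
    print(v.T)               # the two kernel basis vectors above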

Example 2

The next two examples are two-dimensional so that we can easily visualize them. For the purpose of reviewing Section 1.4, “Linear Transformations in Two Dimensions”, we emphasize visualizing a linear transformation both as a mapping and as a vector field.

What happens when we compose a rotation and a shear, for example?

We explore that now. We will use letters for the transformation names that suggest their geometric effects: S for the shear and R for the rotation.

Suppose S:{\Bbb R}^{2}\longrightarrow {\Bbb R}^{2} and R:{\Bbb R}^{2}\longrightarrow {\Bbb R}^{2} are defined by

S({\bf x})=B{\bf x}=\left[\begin{array}{cc}  1 & 1 \\ 0 & 1 \end{array}\right]\left[\begin{array}{c}  x_{1} \\ x_{2} \end{array}\right]

and

R({\bf y})=A{\bf y}=\left[\begin{array}{cc}  0 & -1 \\ 1 & 0 \end{array}\right]\left[\begin{array}{c}  y_{1} \\ y_{2} \end{array}\right].

Then S is the shear transformation discussed in Section 1.4, “Linear Transformations in Two Dimensions” and R is a rotation transformation counterclockwise around the origin by 90^{\circ}.

The matrix product AB will define the formula for the composite transformation R\circ S:{\Bbb R}^{2}\longrightarrow {\Bbb R}^{2}. On the other hand, the matrix product BA will define the formula for the composite transformation S\circ R:{\Bbb R}^{2}\longrightarrow {\Bbb R}^{2}.

Let’s compute formulas for both functions. We will use the equivalent dot product formulation of matrix multiplication.

(R\circ S)({\bf x})=R(S({\bf x}))=(AB){\bf x}=\left[\begin{array}{cc}  0 & -1 \\ 1 & 0 \end{array}\right]\left[\begin{array}{cc}  1 & 1 \\ 0 & 1 \end{array}\right]\left[\begin{array}{c}  x_{1} \\ x_{2} \end{array}\right] =\left[\begin{array}{cc} 0\cdot 1+(-1)\cdot 0 & 0\cdot 1 + (-1)\cdot 1 \\ 1\cdot 1+0\cdot 0 & 1\cdot 1+0\cdot 1 \end{array}\right]\left[\begin{array}{c}  x_{1} \\ x_{2} \end{array}\right]=\left[\begin{array}{cc} 0 & -1 \\ 1 & 1 \end{array}\right]\left[\begin{array}{c}  x_{1} \\ x_{2} \end{array}\right]

=\left[\begin{array}{c} -x_{2} \\ x_{1}+x_{2} \end{array}\right].

For the reverse composition, to be consistent with what we have done so far, we use y's for the variable names. Don't let this bother you. We could have used x's.

(S\circ R)({\bf y})=S(R({\bf y}))=(BA){\bf y}=\left[\begin{array}{cc}  1 & 1 \\ 0 & 1 \end{array}\right]\left[\begin{array}{cc}  0 & -1 \\ 1 & 0 \end{array}\right]\left[\begin{array}{c}  y_{1} \\ y_{2} \end{array}\right] =\left[\begin{array}{cc} 1\cdot 0+1\cdot 1 & 1\cdot (-1) + 1\cdot 0 \\ 0\cdot 0+1\cdot 1 & 0\cdot (-1)+1\cdot 0 \end{array}\right]\left[\begin{array}{c}  y_{1} \\ y_{2} \end{array}\right]=\left[\begin{array}{cc} 1 & -1 \\ 1 & 0 \end{array}\right]\left[\begin{array}{c}  y_{1} \\ y_{2} \end{array}\right]

=\left[\begin{array}{c} y_{1}-y_{2} \\ y_{1} \end{array}\right].

These two transformations are indeed different. That is, R\circ S\not=S\circ R. This should be expected. Function composition is not commutative.

Matrix multiplication is also not commutative, even when both AB and BA can be computed.
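
Here is the same non-commutativity check in a few lines of NumPy:

import numpy as np

B = np.array([[1, 1],
              [0, 1]])       # the matrix of the shear S
A = np.array([[0, -1],
              [1,  0]])      # the matrix of the rotation R

print(A @ B)                 # [[0 -1], [1 1]], the matrix of R o S
print(B @ A)                 # [[1 -1], [1 0]], the matrix of S o R
print(np.array_equal(A @ B, B @ A))   # False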

Visualizing the Compositions as Mappings

Can the geometric effects of these compositions be easily visualized? Yes, but they are not “simple” visualizations. Essentially it is best to look at their geometric effects in the order that they occur.

Here is a visual that shows how these two different compositions each affect the butterfly curve from Section 1.4, “Linear Transformations in Two Dimensions”. In each case, the beginning (red) butterfly represents a set of points that will be transformed under the composite mapping to the ending (blue) butterfly. The ending butterfly is the image under the composite map R\circ S (on the left) and S\circ R (on the right), respectively.

Matrix multiplication can be used to find the composition of two linear transformations (a rotation and a shear) as mappings.
The two composite linear transformations from Example 2 as mappings. These two composite mappings are not equal, and the images of the butterfly curve are different.
Visualizing the Compositions as Vector Fields

Recall that a vector field representation of a linear transformation {\Bbb R}^{2}\longrightarrow {\Bbb R}^{2} takes each point (x,y) as input and “attaches” the output vector to this point as an arrow.

As static pictures, these vector fields are shown below.

Matrix multiplication can be used to find the composition of two linear transformations (a rotation and a shear) as vector fields.
The two composite linear transformations from Example 2 as vector fields.

We can animate this as well to see the connection with the mapping view of the linear transformation. We can see the “shearing” and “rotating” in each case if we imagine transforming each vector in the vector field from the “identity” vector field I({\bf x})={\bf x}.

In each case, the starting (identity) vector field is red, the intermediate vector field is purple, and the final vector field is blue.

The composition of two linear transformations (a rotation and a shear) as transformed vector fields.
The two composite linear transformations from Example 2 as vector fields obtained from transforming the identity vector field. For the vector field on the left, the actions are to shear first, then rotate. For the one on the right, rotate first, then shear.

Example 3

For our final example, we consider the composition of two reflections. One transformation will be a reflection across the horizontal axis. The other transformation will be a reflection across the 45^{\circ} diagonal line through the origin.

We will call these reflections R_{h} and R_{d} (“h” for horizontal and “d” for diagonal). We will also write R_{h}({\bf x})=H{\bf x} and R_{d}({\bf y})=D{\bf y} for some 2\times 2 matrices H and D and {\bf x},{\bf y}\in {\Bbb R}^{2}.

The formula for R_{d}\circ R_{h} based on the matrix product DH is this:

(R_{d}\circ R_{h})({\bf x})=(DH){\bf x}=\left[\begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}\right]\left[\begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array}\right]\left[\begin{array}{c}  x_{1} \\ x_{2} \end{array}\right]

=\left[\begin{array}{cc} 0\cdot 1+1\cdot 0 & 0\cdot 0+1\cdot (-1) \\ 1\cdot 1+0\cdot 0 & 1\cdot 0+0\cdot (-1)  \end{array}\right]\left[\begin{array}{c}  x_{1} \\ x_{2} \end{array}\right]=\left[\begin{array}{cc} 0 & -1 \\ 1 & 0  \end{array}\right]\left[\begin{array}{c}  x_{1} \\ x_{2} \end{array}\right]=\left[\begin{array}{c}  -x_{2} \\ x_{1} \end{array}\right].

And the formula for R_{h}\circ R_{d} based on the matrix product HD is this:

(R_{h}\circ R_{d})({\bf y})=(HD){\bf y}=\left[\begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array}\right]\left[\begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}\right]\left[\begin{array}{c}  y_{1} \\ y_{2} \end{array}\right]

=\left[\begin{array}{cc} 1\cdot 0+0\cdot 1 & 1\cdot 1+0\cdot 0 \\ 0\cdot 0+(-1)\cdot 1 & 0\cdot 1+(-1)\cdot 0  \end{array}\right]\left[\begin{array}{c}  y_{1} \\ y_{2} \end{array}\right]=\left[\begin{array}{cc} 0 & 1 \\ -1 & 0  \end{array}\right]\left[\begin{array}{c}  y_{1} \\ y_{2} \end{array}\right]=\left[\begin{array}{c}  y_{2} \\ -y_{1} \end{array}\right].

Visualizations

Let {\bf z}=\left[\begin{array}{c} z_{1} \\ z_{2} \end{array}\right]\in {\Bbb R}^{2} be arbitrary. Note that (R_{h}\circ R_{d})({\bf z})\not=(R_{d}\circ R_{h})({\bf z}). On the other hand, we can say that (R_{h}\circ R_{d})({\bf z})=-(R_{d}\circ R_{h})({\bf z}). This is a very special property.
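
A few lines of NumPy confirm both products and this special property (HD=-DH):

import numpy as np

H = np.array([[1,  0],
              [0, -1]])      # reflection across the horizontal axis
D = np.array([[0, 1],
              [1, 0]])       # reflection across the 45-degree line

print(D @ H)                             # [[0 -1], [1 0]]: rotation by 90 degrees counterclockwise
print(H @ D)                             # [[0 1], [-1 0]]: rotation by 90 degrees clockwise
print(np.array_equal(H @ D, -(D @ H)))   # True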

We should see this reflected in our animations (pun intended).

Matrix multiplication can be used to find the composition of two linear transformations (two reflections) as mappings.
The two composite linear transformations from Example 3 as mappings. These two composite mappings are not equal, and the images of the butterfly curve are different. However, the final (blue) images are 180^{\circ} rotations of each other (reflections through the origin) since (R_{h}\circ R_{d})({\bf z})=-(R_{d}\circ R_{h})({\bf z}) for all {\bf z}\in {\Bbb R}^{2}.

Here are the static vector fields.

The composition of two linear transformations (two reflections) as vector fields.
The two composite linear transformations from Example 3 as vector fields. Notice that these seem to be equivalent to rotations (by 90^{\circ} counterclockwise on the left and 90^{\circ} clockwise on the right). This is indeed the case. Notice that the vector fields are ‘opposites’ of each other.

And here are the animations of the identity vector field being transformed into each of these vector fields. Notice how the starting vectors (red) are getting reflected to the intermediate state (purple) before going on to the final state (blue).

The composition of two linear transformations (two reflections) as transformed vector fields.
The two composite linear transformations from Example 3 as vector fields obtained from transforming the identity vector field. For the vector field on the left, reflect horizontally first, then diagonally. Do those actions in the opposite order for the vector field on the right.

Exercises

  1. Let f(x)=3x^{2}-2x+4 and g(x)=2x^{3}-5. Find and simplify formulas for f\circ g and g\circ f. Note that the results are different functions.
  2. Let f(x)=\frac{3x-5}{2x+7} and g(x)=\frac{5x+4}{x-3}. Find and simplify formulas for f\circ g and g\circ f. Note that the results are different functions. Note, however, that in this situation both (simplified) functions are of the “same type” as the originals. When the inputs are allowed to be complex numbers, such functions are called linear fractional transformations. As mappings of the complex plane, they have a number of beautiful properties that are worth investigating.
  3. Let A=\left[\begin{array}{cc} 1 & 3 \\ -2 & 4 \\ 3 & 0 \\ 1 & -5 \\ -2 & -3 \end{array}\right] and B=\left[\begin{array}{ccc} 2 & 3 & -4 \\ -1 & -6 & 0 \end{array}\right]. (a) Find the product AB. (b) Let T:{\Bbb R}^{3}\longrightarrow {\Bbb R}^{2} and S:{\Bbb R}^{2}\longrightarrow {\Bbb R}^{5} be defined by T({\bf x})=B{\bf x} and S({\bf y})=A{\bf y}, for {\bf x}\in {\Bbb R}^{3} and {\bf y}\in {\Bbb R}^{2}. Find a simplified formula for S\circ T. (c) Find the kernel of S\circ T.

Video for Section 1.8

Here is a video overview of the content of this section.