Visual Linear Algebra Online, Section 1.7
Many practical situations that require the use of linear algebra can involve hundreds or even thousands of unknowns. Modeling a national economy, optimizing the purchase of supplies, and managing flight schedules are prominent examples.
A vector quantity representing all the data in such a situation might therefore have hundreds or thousands of components. This is a difficult-to-imagine, but common, situation in high-dimensional linear algebra. How in the world can we visualize such a vector? Is it worth even trying to visualize it in some way?
A Least Squares Problem
A data-fitting problem might just have two variables and be easy to visualize at the start, but it could have many equations. This can lead to visualization difficulties as we try to conceptualize the best solution of the problem.
For example, suppose we want to find the “line of best fit” for the five data points , and .
These points, along with the line of best fit, are shown in the figure below. The line of best fit has equation . The vertical lines represent the “errors” in using the line to approximate the $y$-coordinates of the data values at the given $x$-values.
The most common way to find this line is called the Method of Least Squares. Assume that the line has equation for some constants and . The goal is to choose the values of and to minimize the sum of the squares of the errors between the -coordinates of the data and the corresponding -coordinates along the line. In the picture above, the sum of the squares of the lengths of the brown lines is what is minimized by the location (defined by the slope and intercept) of the blue line.
This problem can be solved with calculus. It can also be solved with linear algebra.
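For readers who like to experiment, the minimization can also be carried out numerically. Below is a minimal sketch in Python with NumPy; the data points are made up for illustration and are not the five points from the figure above. The routine `np.linalg.lstsq` minimizes exactly the sum of squared errors described in the previous paragraph.

```python
import numpy as np

# Hypothetical data points (x_i, y_i); these are NOT the data from the figure above.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 2.9, 4.2, 4.8, 6.1])

# Build the 5x2 matrix whose columns correspond to the slope and intercept unknowns.
A = np.column_stack([x, np.ones_like(x)])

# Solve the least-squares problem: minimize the length of (A @ coeffs - y).
coeffs, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs

print("best-fit line: y =", slope, "* x +", intercept)
print("sum of squared errors:", residuals)  # the minimized quantity
```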
The Corresponding Overdetermined System
Pretend, at the outset, that the graph of the line goes exactly through all five points. Then the errors would all be zero. In fact, this would be equivalent to the following linear system of 5 equations and 2 unknowns having a unique solution.
But this system has no solution. It is said to be overdetermined. This is typical when there are more equations than unknowns. If we graph these five lines in an -plane, there is no point of common intersection.
The graphs of these five lines are shown below. The black point has coordinates equal to the slope and intercept of the line of best fit above. In the new picture below, this point somehow represents the point that is “closest” to being the common intersection of the five lines. It might be thought of as some kind of “average” of the red points at the intersections of pairs of these lines. Note that the point is the intersection of more than one pair of these lines.
The Corresponding Linear Transformation
Here again is our overdetermined system.
We can think of the left-hand sides of these equations as defining a linear transformation. Let $\mathbb{R}^5$ represent five-dimensional space. This cannot be visualized, but it can be conceptualized as the collection of all five-dimensional column vectors ($5 \times 1$ real matrices). Define $T: \mathbb{R}^2 \rightarrow \mathbb{R}^5$ by the following formula.
The expression on the right side of the preceding line is a linear combination of two five-dimensional vectors. By what we have emphasized in both Section 1.5, “Matrices and Linear Transformations in Low Dimensions” and Section 1.6, “Linear Algebra in Three Dimensions”, we can write the linear transformation as a matrix multiplication $T(\mathbf{x}) = A\mathbf{x}$. The two five-dimensional vectors are the columns of $A$. The components of $\mathbf{x}$ are the corresponding weights (the scalars in the linear combination).
In other words, we can write the following equation.
The Image of T (Column Space of A)
Since the columns of are not scalar multiples of each other, the image of (the column space of ) will be a two-dimensional plane through the origin sitting inside five-dimensional space. Of course, this is impossible to visualize, other than by analogy of a two-dimensional plane sitting inside three-dimensional space. But is clearly not an onto function.
In particular, the five-dimensional vector representing the right-hand sides of the original system is not an element of .
But we should not expect it to “typically” be the case that such a vector will be in the image. If we choose other numbers, at random, for the -coordinates of our original data points, the resulting vector is unlikely to be in the two-dimensional plane sitting inside five-dimensional space. After all, five-dimensional space is intuitively a lot “bigger” than any two-dimensional “subspace”.
On the other hand, the vector is an element of . In fact, it is the unique element of this two-dimensional subspace of that is closest to the vector .
This means that, for all possible values of and that could have been chosen and all possible corresponding inputs of , the magnitude is smallest when . But such a magnitude, as we will see below, is a square root of a sum of squares. This is why this approach is called the Method of Least Squares.
But how is the two-dimensional vector found? That is a topic for another section.
Visualizing the Least Squares Solution
We can produce a visualization by analogy with the picture below. Pretend that the three-dimensional space shown is actually five-dimensional space. The plane shown represents the two-dimensional subspace , while represents the five-dimensional vector. The output is the closest point in the subspace , to .
Notice the right angle in this picture. Evidently, the notion of an angle can be generalized to higher dimensions, though it can only be visualized by analogy.
n-Dimensional Space
We see that high-dimensional linear algebra, which we can only visualize by analogy, can evidently still be useful. Let us therefore commence with a discussion of $n$-dimensional space, where $n$ is any positive integer, no matter how large.
Definitions
An $n$-dimensional vector $\mathbf{x}$ is an $n \times 1$ matrix with real number entries. We write it as shown below.

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

The set $\mathbb{R}^n$ consists of all possible $n$-dimensional vectors. The symbol $-\mathbf{x}$ means $(-1)\mathbf{x}$ and is indeed the additive inverse of $\mathbf{x}$.
Of course, as always, vector addition and scalar multiplication are done component-wise. All scalars are assumed to be real numbers.
$$\mathbf{x} + \mathbf{y} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix} \qquad \text{and} \qquad c\,\mathbf{x} = \begin{pmatrix} c\,x_1 \\ c\,x_2 \\ \vdots \\ c\,x_n \end{pmatrix}$$
Properties
These definitions allow us to prove all the properties in the theorem below, based on the corresponding properties of real numbers. The proof is easy, though it is also tedious. The zero vector $\mathbf{0}$ has all components equal to zero.
Theorem 1.7.1: The following algebra statements are true.
- (Closure under Vector Addition) $\mathbf{u} + \mathbf{v} \in \mathbb{R}^n$ for all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$.
- (Commutativity) $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ for all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$.
- (Associativity of Vector Addition) $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{R}^n$.
- (Zero is an Additive Identity) $\mathbf{u} + \mathbf{0} = \mathbf{0} + \mathbf{u} = \mathbf{u}$ for all $\mathbf{u} \in \mathbb{R}^n$.
- (Additive Inverses) $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$ for all $\mathbf{u} \in \mathbb{R}^n$.
- (Closure under Scalar Multiplication) $c\mathbf{u} \in \mathbb{R}^n$ for all $c \in \mathbb{R}$ and for all $\mathbf{u} \in \mathbb{R}^n$.
- (Distributivity of Scalars over Vector Addition) $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$ for all $c \in \mathbb{R}$ and for all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$.
- (Distributivity of Vectors over Scalar Addition) $(c + d)\mathbf{u} = c\mathbf{u} + d\mathbf{u}$ for all $c, d \in \mathbb{R}$ and for all $\mathbf{u} \in \mathbb{R}^n$.
- (Associativity of Scalar Multiplication) $c(d\mathbf{u}) = (cd)\mathbf{u}$ for all $c, d \in \mathbb{R}$ and for all $\mathbf{u} \in \mathbb{R}^n$.
- (Multiplication by the Scalar 1) $1\mathbf{u} = \mathbf{u}$ for all $\mathbf{u} \in \mathbb{R}^n$.
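Although Theorem 1.7.1 is proved by componentwise calculations, each property can be spot-checked numerically for particular vectors. Here is a small sketch in Python/NumPy; the dimension, vectors, and scalars are arbitrary choices made for illustration.

```python
import numpy as np

n = 7  # any dimension works
rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, n))  # three arbitrary vectors in R^n
c, d = 2.5, -1.25                      # two arbitrary scalars

# Spot-check several of the properties from Theorem 1.7.1.
assert np.allclose(u + v, v + u)                  # commutativity
assert np.allclose((u + v) + w, u + (v + w))      # associativity of vector addition
assert np.allclose(c * (u + v), c * u + c * v)    # distributivity over vector addition
assert np.allclose((c + d) * u, c * u + d * u)    # distributivity over scalar addition
assert np.allclose(c * (d * u), (c * d) * u)      # associativity of scalar multiplication
assert np.allclose(1 * u, u)                      # multiplication by the scalar 1
print("All spot checks passed.")
```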
The Dot Product and Angle Between Two Vectors
In high-dimensional linear algebra, the dot product can be defined in a way that is analogous to the two-dimensional case. The following formula is the definition. It is a sum of products of corresponding components:

$$\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$
Magnitude and Properties
The magnitude (length) of an $n$-dimensional vector $\mathbf{v}$ is then defined to be $\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$. Perhaps it is better to always call this a “magnitude” rather than a “length”, since it is impossible to imagine literally “measuring” its length in $n$ dimensions. However, we will not shy away from using either term.
It is easy (but tedious) to prove the first four algebraic properties of dot products in $n$ dimensions listed below. The fifth one is a bit harder.
Theorem 1.7.2: The following properties are true for arbitrary vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$ in $\mathbb{R}^n$ and scalars $c$ in $\mathbb{R}$.
- (Commutative Property) $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$.
- (Distributive Property) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$.
- (Associative Property) $(c\mathbf{u}) \cdot \mathbf{v} = c(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (c\mathbf{v})$.
- (Non-negativity) $\mathbf{u} \cdot \mathbf{u} \ge 0$. In addition, $\mathbf{u} \cdot \mathbf{u} = 0$ if and only if $\mathbf{u} = \mathbf{0}$.
- (Triangle Inequality) $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.
Angle Between Two Vectors
Recall that in Section 1.2, “Vectors in Two Dimensions” and Section 1.6, “Linear Algebra in Three Dimensions”, we saw that the dot product of two vectors is intimately tied to an angle $\theta$ between the two vectors. In particular, we saw that $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$ and therefore $\theta = \arccos\left(\dfrac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$.
But how do we define an angle between vectors in high-dimensional linear algebra?
Here is a bright idea. Why not use the expression $\arccos\left(\dfrac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$ to do the defining for us?
This is indeed the best thing to do.
There is an issue to clear up, however. Recall that the arccosine (inverse cosine) function is only defined for inputs that are in the interval $[-1, 1]$ on the real number line. Therefore, to use the expressions in the previous paragraph to define the angle, we must make sure that $-1 \le \dfrac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1$, which is the same as requiring that $|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$.
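As a concrete numerical illustration, here is a sketch in Python/NumPy that computes the dot product, the magnitudes, and the angle just described for two arbitrarily chosen five-dimensional vectors (not vectors from the text). The `np.clip` call guards against the quotient drifting slightly outside the interval $[-1, 1]$ because of rounding; the Cauchy-Schwarz inequality below guarantees this cannot happen in exact arithmetic.

```python
import numpy as np

# Two arbitrary five-dimensional vectors (made up for illustration).
u = np.array([1.0, -2.0,  0.5, 3.0, 1.5])
v = np.array([2.0,  1.0, -1.0, 0.5, 2.5])

dot = np.dot(u, v)               # u . v = sum of products of corresponding components
norm_u = np.sqrt(np.dot(u, u))   # magnitude of u, as the square root of u . u
norm_v = np.linalg.norm(v)       # same computation for v, using the built-in norm

# Angle between u and v via theta = arccos( (u . v) / (||u|| ||v||) ).
cos_theta = np.clip(dot / (norm_u * norm_v), -1.0, 1.0)
theta = np.arccos(cos_theta)

print("u . v =", dot)
print("||u|| =", norm_u, " ||v|| =", norm_v)
print("angle in degrees:", np.degrees(theta))
```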
Cauchy-Schwarz Inequality
This last inequality is indeed true for $n$-dimensional vectors in high-dimensional linear algebra.
In fact, it is important enough to have a name. It is called the Cauchy-Schwarz inequality.
Theorem 1.7.3 (Cauchy-Schwarz inequality): Let $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$. Then $|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$. In terms of their components $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_n$, this is equivalent to $$\left| u_1 v_1 + u_2 v_2 + \cdots + u_n v_n \right| \le \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}\,\sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.$$
The proof of this is quite tricky. I made a video where I gave the idea of the proof. Your teacher may have you do work to make this into a true proof.
You might still wonder whether it is a good idea to define the angle between two vectors in high-dimensional linear algebra in this way.
Suffice it to say that it is a good idea. One thing this justifies is our drawing of the right angle when we visualized the least-squares solution as it related to five-dimensional vectors above. This picture emphasizes that, even in five-dimensional space, the shortest distance between a point and a plane is along a line segment perpendicular to the plane.
Linear Transformations and Matrices
Let $m$ and $n$ be positive integers. If $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is a linear transformation, then there will be an $m \times n$ matrix $A$ that will define $T$ via matrix multiplication of $A$ times a vector $\mathbf{x} \in \mathbb{R}^n$. Conversely, any matrix multiplication like this defines a linear transformation.
The definition of a linear transformation in this context still has the same form as before. A function $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is a linear transformation if $T(c\mathbf{u} + d\mathbf{v}) = cT(\mathbf{u}) + dT(\mathbf{v})$ for all $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ and for all $c, d \in \mathbb{R}$.
Suppose there are constants $a_{ij}$, for $1 \le i \le m$ and $1 \le j \le n$, such that the following formula holds true.

$$T\!\left(\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}\right) = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix}$$

Then

$$T(\mathbf{x}) = x_1\begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix} + x_2\begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix} + \cdots + x_n\begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}.$$

This is a linear combination of ($m$-dimensional) vectors with corresponding weights $x_1, x_2, \ldots, x_n$. If we create an $m \times n$ matrix $A$ with the various $a_{ij}$ as its entries, then this is what the product $A\mathbf{x}$ means.

Symbolically, $T(\mathbf{x}) = A\mathbf{x}$.

Another way to think of this is as follows. Let $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ be the columns of $A$. In fact, we will even write $A = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{pmatrix}$. Then $T(\mathbf{x}) = A\mathbf{x} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n$.
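The fact that $A\mathbf{x}$ is the linear combination of the columns of $A$ with weights given by the components of $\mathbf{x}$ can be checked numerically. The sketch below uses Python/NumPy with an arbitrary $5 \times 2$ matrix (made up for illustration, not the matrix from the data-fitting example).

```python
import numpy as np

# An arbitrary 5x2 matrix and an arbitrary vector in R^2 (made up for illustration).
A = np.array([[ 1.0,  2.0],
              [ 0.0,  1.0],
              [ 3.0, -1.0],
              [ 2.0,  2.0],
              [ 1.0,  0.0]])
x = np.array([2.0, -3.0])

# The matrix-vector product versus the linear combination of the columns of A.
product = A @ x
combination = x[0] * A[:, 0] + x[1] * A[:, 1]

print(product)
print(combination)
print(np.allclose(product, combination))  # True: both are the same vector in R^5
```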
When Will T Be Onto? When Will It Be One-to-One?
Under what conditions will a linear transformation $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ with matrix representative $A$ be onto and/or one-to-one?
We will explore this more deeply in Chapter 2 of Visual Linear Algebra Online. At the moment, we just make a few intuitive “probabilistic” statements. Think of these statements as reflecting what happens when we choose the entries of at random.
These statements are related to the dimensions of the domain and codomain. They are also related to the fact that linear transformations are relatively “simple”. Nothing too strange happens when we look at the mapping properties of linear transformations.
If $m < n$, then $T$ maps a higher-dimensional space to a lower-dimensional space. It has a “collapsing” nature to it. It is most definitely not one-to-one. On the other hand, it is likely to be onto, though this is not guaranteed.
If $m > n$, then $T$ maps a lower-dimensional space to a higher-dimensional space. It is most definitely not onto. On the other hand, it is likely to be one-to-one, though this is not guaranteed.
Finally, if $m = n$, then the domain and codomain of $T$ have the same dimension. It turns out that $T$ is likely to be both one-to-one and onto in this situation, though neither property is guaranteed. However, it does turn out that these two properties are actually equivalent in this situation. This should be somewhat surprising! They are certainly not equivalent for general functions. There must be something special about linear transformations!
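These heuristics can be explored with a quick computer experiment. The sketch below (Python/NumPy, with arbitrarily chosen dimensions) uses the fact that $T(\mathbf{x}) = A\mathbf{x}$ is onto exactly when the rank of $A$ equals $m$ and one-to-one exactly when the rank equals $n$; a matrix with randomly chosen entries almost always has the largest rank its dimensions allow.

```python
import numpy as np

rng = np.random.default_rng(1)

def describe(m, n):
    # Random m x n matrix; its rank decides the mapping properties of T(x) = A x.
    A = rng.standard_normal((m, n))
    r = np.linalg.matrix_rank(A)
    print(f"m={m}, n={n}, rank={r}: onto={r == m}, one-to-one={r == n}")

describe(2, 5)  # m < n: expect onto but not one-to-one
describe(5, 2)  # m > n: expect one-to-one but not onto
describe(4, 4)  # m = n: expect both
```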
Row Operations and Reduced Row Echelon Form
For any particular example, the question of whether the mapping is one-to-one and/or onto can be answered using elementary row operations.
Let’s start by considering the line of best fit example from above. Recall that the formula for is
To confirm that is not onto, it suffices to find just one vector in five-dimensional space that is not in the image . Practically any vector in will do, including our vector from the right-hand side of the original overdetermined system .
Elementary Row Operations on the Augmented Matrix
Elementary row operations on the augmented matrix are shown below.
Next:
This is enough to confirm that this vector is not in the image of $T$. The reason is that the third and fourth rows of the last matrix are equivalent to false statements, which contradicts the assumption that there is a solution. The original system is indeed inconsistent and $T$ is not onto.
Continued Reduction to Reduced Row Echelon Form (RREF)
Though it is not required to confirm that is not onto, let us continue “reducing” this matrix with row operations. This will provide practice with row operations towards the ultimate goal of seeing the reduced row echelon form (RREF) of the matrix.
The first four rows of the final matrix above have leading entries (they are also “leading ones”). These are the leftmost nonzero entries of those rows. To continue the process toward reduced row echelon form, we need to make sure the leading entry in each row is to the right of the leading entry of the row above it.
This is not true for the leading entry of the fourth row. Therefore, we “eliminate” it by a replacement elementary row operation.
This new matrix now satisfies the condition that the leading entry in each row is to the right of the leading entry of the row above it. This matrix is now in row echelon form (REF).
To proceed to reduced row echelon form (RREF), we now do replacement operations to eliminate the entries above the leading ones, moving from right to left. Here are the steps.
Because this corresponds to an augmented matrix for the original system involving two variables and , the resulting equivalent system is
As we saw before, the last of these equations is a contradiction to the assumption that there is a solution. Therefore, there is no solution. The system is inconsistent (and no, and are not both zero).
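Row reduction like this can also be delegated to software. Here is a sketch using Python's SymPy library on a small made-up augmented matrix for an overdetermined system (not the augmented matrix from the example above). A pivot in the final (augmented) column of the RREF signals an inconsistent system.

```python
from sympy import Matrix

# A made-up augmented matrix [A | b] for an overdetermined system (3 equations, 2 unknowns).
augmented = Matrix([
    [1, 1, 2],
    [1, 2, 3],
    [1, 3, 5],   # chosen so that the three corresponding lines have no common point
])

rref_matrix, pivot_columns = augmented.rref()
print(rref_matrix)
print("pivot columns:", pivot_columns)

# If the last (augmented) column is a pivot column, the system is inconsistent.
print("inconsistent:", augmented.cols - 1 in pivot_columns)
```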
Description of Reduced Row Echelon Form (RREF)
Here are descriptions of both row echelon form (REF) and reduced row echelon form. These apply to any matrix, augmented or not.
Definition 1.7.1: A matrix is in row echelon form (REF) if
- All nonzero rows (rows with at least one nonzero entry) are above any rows of all zeros.
- Each leading entry of a row is in a column to the right of the leading entry of the row above it. Therefore, all entries in a column below a leading entry are zeros.
Definition 1.7.2: A matrix is in reduced row echelon form (RREF) if conditions (1) and (2) from Definition 1.7.1 hold, and if
- The leading entry in each nonzero row is 1.
- Each leading 1 is the only nonzero entry in its column.
A matrix is row equivalent to exactly one matrix in RREF, which can be obtained from it by row operations. This is a theorem, but we will not prove it.
Theorem 1.7.3: Every matrix is row equivalent to a unique RREF matrix.
Is T one-to-one? Determining the Kernel (Null Space) is Sufficient to Decide
To show that $T$ is one-to-one, we must prove that $T(\mathbf{u}) = T(\mathbf{v})$ implies that $\mathbf{u} = \mathbf{v}$.
Certainly if $T$ is one-to-one, the kernel of $T$ (the null space of its matrix $A$) will be just the zero vector (as an element of the domain). By definition, $\mathrm{Ker}(T) = \{\mathbf{x} : T(\mathbf{x}) = \mathbf{0}\}$, and linearity gives $T(\mathbf{0}) = \mathbf{0}$, so $\mathbf{0} \in \mathrm{Ker}(T)$. But the fact that $T$ is one-to-one implies nothing else can get mapped to the zero vector in the codomain.
What is more surprising is the following fact, which we prove using the definition of linearity found in Section 1.4, “Linear Transformations in Two Dimensions”. The proof is quite short once you see the trick.
Theorem 1.7.4: Let $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a linear transformation. If $\mathrm{Ker}(T) = \{\mathbf{0}\}$, then $T$ is one-to-one.
Proof:
Suppose $T(\mathbf{u}) = T(\mathbf{v})$. Then $T(\mathbf{u}) - T(\mathbf{v}) = \mathbf{0}$. But $T$ is a linear transformation. Therefore, $T(\mathbf{u} - \mathbf{v}) = \mathbf{0}$, so that $\mathbf{u} - \mathbf{v} \in \mathrm{Ker}(T)$. Now the fact that $\mathrm{Ker}(T) = \{\mathbf{0}\}$ allows us to conclude that $\mathbf{u} - \mathbf{v} = \mathbf{0}$. Therefore, $\mathbf{u} = \mathbf{v}$.
Q.E.D.
Combining Theorem 1.7.4 with the discussion two paragraphs above it gives the following biconditional (if and only if) theorem.
Theorem 1.7.5: Let $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a linear transformation. Then $\mathrm{Ker}(T) = \{\mathbf{0}\}$ if and only if $T$ is one-to-one.
Row Operations on the Corresponding Homogeneous System
Now consider the homogeneous system corresponding to the linear transformation from the data-fitting example above.
The corresponding augmented matrix and resulting row operations begin as follows.
This quickly leads to the unique equivalent RREF matrix.
The top two rows of this matrix correspond to equations saying that each of the two unknowns equals zero. Furthermore, no contradiction occurs. The homogeneous system has a unique solution: the kernel (null space) has just one element, the zero vector of $\mathbb{R}^2$.
By Theorem 1.7.4, this implies that $T$ is one-to-one.
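The kernel (null space) computation can also be done with software, which, by Theorem 1.7.5, settles the one-to-one question. Here is a minimal sketch in Python/SymPy with a made-up $5 \times 2$ matrix whose columns are not multiples of each other (again, not the matrix from the data-fitting example).

```python
from sympy import Matrix

# A made-up 5x2 matrix whose columns are not scalar multiples of each other.
A = Matrix([
    [1,  2],
    [0,  1],
    [3, -1],
    [2,  2],
    [1,  0],
])

kernel = A.nullspace()   # a list of basis vectors for the null space
print(kernel)            # an empty list means the kernel is just the zero vector
print("one-to-one:", len(kernel) == 0)
```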
Another Example
Let’s consider a second example in high-dimensional linear algebra where the matrix of the linear transformation is the transpose of the matrix from the preceding example, obtained by swapping its rows and columns. This has nothing to do with the original data-fitting problem.
Define $T: \mathbb{R}^5 \rightarrow \mathbb{R}^2$ by the formula
Of course, this gives
T is not one-to-one
Such a function is definitely not one-to-one. By Theorem 1.7.5, this can be confirmed by computing the kernel (null space) and seeing that it contains nonzero vectors. Here are elementary row operations on the augmented matrix for the corresponding homogeneous system, with a last column of zeros, to RREF.
Next,
Finally,
This is a matrix in RREF form (check the conditions described above). It corresponds to the following system, which is equivalent to the original homogeneous system.
From this, we see that , , and are free variables, while and are basic. We can therefore describe the kernel of in parametric vector form as follows.
As we will see in Chapter 2, the three vectors , , and in $\mathbb{R}^5$ are “linearly independent”. These three vectors will not all be in the same two-dimensional plane in $\mathbb{R}^5$.
Instead, this means that the kernel of $T$ is a three-dimensional “plane” through the origin in five-dimensional space.
And, as already mentioned, this is enough to conclude that $T$ is not one-to-one.
As an interesting aside, note that, by using the dot product, you can check that these three vectors are also perpendicular to the columns of the matrix from the first example.
This is one instance of a general theorem relating the column space of a matrix with the null space of its transpose.
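That general relationship, namely that vectors in the null space of a matrix's transpose are perpendicular to the columns of the matrix, is easy to check numerically in any particular case. Here is a sketch in Python/SymPy with a made-up $5 \times 2$ matrix (not the matrix from the examples above).

```python
from sympy import Matrix

# A made-up 5x2 matrix.
A = Matrix([
    [1,  2],
    [0,  1],
    [3, -1],
    [2,  2],
    [1,  0],
])

null_of_transpose = A.T.nullspace()   # basis vectors for Nul(A^T), each in R^5

# Every basis vector of Nul(A^T) should be perpendicular to every column of A.
for w in null_of_transpose:
    for j in range(A.cols):
        print(w.dot(A.col(j)))   # each of these dot products prints as 0
```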
T is onto
It is likely that $T$ is onto. In other words, the image $\mathrm{Im}(T)$ is all of $\mathbb{R}^2$.
How can this be confirmed? By showing that an arbitrary non-homogeneous (inhomogeneous) system can be solved.
Below are the row operations to RREF. Notice that they are the same row operations as we did above for the homogeneous system. First,
Next,
Finally,
This is a matrix in RREF form. It corresponds to the following system, which is equivalent to the original inhomogeneous system.
This shows that there are indeed solutions to the arbitrary inhomogeneous system, so that $T$ is onto.
In fact, for fixed values of the two right-hand sides, the parametric vector form of the solution set is just a “translation” of the three-dimensional kernel of $T$ inside five-dimensional space. This set no longer goes through the origin unless both right-hand sides are zero, but it is “parallel” to the kernel. This is no accident.
Summarizing the Examples in Relation to the Rank-Nullity Theorem
In our first, data-fitting-related linear transformation, with its $5 \times 2$ matrix, we saw that the kernel (null space) was “trivial” (which is zero-dimensional) and the image was two-dimensional. Note that $0 + 2 = 2$, which is the dimension of the domain of $T$, as well as the number of columns of the matrix.
In the second example, the linear transformation had the transpose matrix, which is $2 \times 5$. Here we got a three-dimensional kernel and a two-dimensional image. The sum $3 + 2 = 5$ is the dimension of the domain of $T$, as well as the number of columns of the matrix.
These facts are not coincidences. The Rank-Nullity Theorem (or just “Rank Theorem”) guarantees this will always be true.
Theorem 1.7.6 (Rank-Nullity Theorem): Suppose $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is a linear transformation with (standard) matrix representative $A$. Then the dimension of the kernel of $T$ (the null space of $A$) plus the dimension of the image of $T$ (the column space of $A$) equals $n$, the dimension of the domain of $T$ (the number of columns of $A$).
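The Rank-Nullity Theorem can be spot-checked with software as well. The sketch below (Python/SymPy, with a made-up $5 \times 2$ matrix and its transpose) computes the rank (the dimension of the column space) and the nullity (the number of basis vectors of the null space) and confirms that they add up to the number of columns.

```python
from sympy import Matrix

def check_rank_nullity(A):
    rank = A.rank()                   # dimension of the image (column space)
    nullity = len(A.nullspace())      # dimension of the kernel (null space)
    print(f"columns={A.cols}, rank={rank}, nullity={nullity}, sum={rank + nullity}")
    assert rank + nullity == A.cols   # the Rank-Nullity Theorem

# Made-up examples: a 5x2 matrix and its transpose (a 2x5 matrix).
A = Matrix([[1, 2], [0, 1], [3, -1], [2, 2], [1, 0]])
check_rank_nullity(A)      # expect rank 2, nullity 0
check_rank_nullity(A.T)    # expect rank 2, nullity 3
```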
Exercises in High-Dimensional Linear Algebra
- Let and . (a) Find . (b) Find the angle between and with . Approximate your answer to the nearest tenth of a degree.
- (a) Use row operations to RREF on the augmented matrix for the following system to solve it. Write your answer in parametric vector form: . (b) Describe the kernel and image of the corresponding linear transformation . (c) Decide whether is one-to-one and/or onto. Defend your answers.
- Consider the data set . The (least squares) line of best fit turns out to have formula . (a) Graph these points and this line as accurately as you can. (b) Find the sum of the squares of the errors in using this line to estimate the $y$-coordinates of the data points based on their $x$-coordinates. (c) If you assume that the line goes through the data points exactly, write down the corresponding system of five equations and two unknowns that would be satisfied. (d) Do row operations with the corresponding augmented matrix for the system in (c) to confirm that the system has no solution. (e) Describe the kernel and image of the linear transformation defined by the left-hand sides of the system of equations from part (c). (f) Compute the distance in that is minimized when and is the five-dimensional vector from the right-hand sides of the equations in part (c).
- Prove Theorem 1.7.1.
- Prove Theorem 1.7.2.
- A homogeneous system of three linear equations and five unknowns is solved using elementary row operations. It is found that there are three free variables that result in a kernel that is three-dimensional. Will the linear transformation corresponding to this system be an onto function? Explain why or why not.
Video for Section 1.7
Here is a video overview of the content of this section.