
Sunday, August 22, 2010

What is a matrix? (Part 2)

Now we have a new kind of mathematical toy to play with, the matrix. As I said in the previous post, the easiest way to get a sense of what matrices do is to use them for a while. In this post, then, I just want to go over a couple of useful examples.
Suppose you wish to make all vectors in ℝ² longer or shorter by some factor s ≠ 0. You can represent this by a function f(v) = sv. With a moment's work, we can verify that this is a linear function because of the distributive law. Thus, we can represent f by a matrix. To do so, remember that we calculate f for each element of a basis. For simplicity, we will use the elementary basis {x, y}. Then, f(x) = sx and f(y) = sy. By using coordinates, we can write this as f([1; 0]) = [s; 0] and f([0; 1]) = [0; s]. The matrix representation of f then becomes [s 0; 0 s].
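If you like to check these things numerically, here is a quick sketch in Python (using NumPy; the particular numbers are mine, chosen only for illustration) that builds the matrix for f(v) = sv column by column from the images of the basis vectors:

import numpy as np

s = 2.5
x_basis = np.array([1.0, 0.0])
y_basis = np.array([0.0, 1.0])

def f(v):
    return s * v                       # the scaling function f(v) = sv

# The columns of the matrix are f applied to each basis vector.
F = np.column_stack([f(x_basis), f(y_basis)])
print(F)                               # [[2.5 0. ], [0.  2.5]]

v = np.array([3.0, -1.0])
print(np.allclose(F @ v, f(v)))        # True: the matrix reproduces f
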
Note that if s = 1, the function f doesn't do anything. Representing f(v) = v as a matrix, we get the very special matrix called the identity matrix, written as I, 𝟙 or 𝕀. In the {x, y} basis, 𝟙 = [1 0; 0 1].
The identity matrix has the property that for any matrix M, M𝟙 = 𝟙M = M, much like the number 1 acts.
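As a quick numerical check of that property (again in Python with NumPy, and with a matrix I chose arbitrarily for illustration):

import numpy as np

identity = np.eye(2)                   # the 2x2 identity matrix
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])             # any matrix will do here
print(np.allclose(M @ identity, M))    # True
print(np.allclose(identity @ M, M))    # True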

Of course, there's no requirement that we stretch x and y by the same amount. The matrix [a 0; 0 b], for instance, stretches x by a and y by b. If one or both of a and b is negative, then we flip the direction of x or y, respectively, since -v is the vector of the same length as v but pointing in the opposite direction.
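Here is the same idea as a short Python sketch (the values of a and b are arbitrary choices of mine): a diagonal matrix stretches each component on its own, and a negative entry flips that direction.

import numpy as np

a, b = 2.0, -3.0
D = np.array([[a, 0.0],
              [0.0, b]])
v = np.array([1.0, 1.0])
print(D @ v)    # [ 2. -3.]: x stretched by 2, y stretched by 3 and flipped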

A more complicated example shows how matrices can "mix up" the different parts of a vector by rotating one into the other. Consider, for instance, a rotation of the 2D plane by some angle θ (counterclockwise, of course). This is more difficult to write down as a function, and so a picture may be useful:

By referencing this picture, we see that f(x) = cos θ x + sin θ y, while f(y) = −sin θ x + cos θ y. Thus, we obtain the famous rotation matrix Rθ = [cos θ  −sin θ; sin θ  cos θ].
As a sanity check, note that if θ = 0, then Rθ = 𝟙, as we would expect for a matrix that "does nothing."
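For those following along in Python, here is a small sketch of the rotation matrix (built exactly as described above, with the images of x and y as its columns), along with that sanity check:

import numpy as np

def rotation(theta):
    # Columns are the images of x and y under a counterclockwise rotation by theta.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

print(np.allclose(rotation(0.0), np.eye(2)))        # True: R_0 is the identity
print(rotation(np.pi / 2) @ np.array([1.0, 0.0]))   # ~[0, 1]: x is rotated onto y
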
One very important note that needs to be made about matrices is that multiplication of matrices is not always (or even often) commutative. To see this, let the matrix S swap the roles of x and y; that is, S = [0 1; 1 0]. Then, consider A = SRθ and B = S. Since applying S twice does nothing (that is, S² = 𝟙), we have that BA = S²Rθ = Rθ. On the other hand, if we calculate AB = SRθS, we find that AB = [cos θ  sin θ; −sin θ  cos θ] = R−θ. We conclude that AB ≠ BA unless sin θ = 0, neatly demonstrating that not all the typical rules of multiplication carry over to matrices.
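The same conclusion can be checked numerically; here is a sketch of mine in Python (the angle is arbitrary, so long as sin θ ≠ 0):

import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

S = np.array([[0.0, 1.0],
              [1.0, 0.0]])                    # swaps the roles of x and y
theta = 0.7
R = rotation(theta)

A = S @ R
B = S
print(np.allclose(A @ B, B @ A))              # False: AB and BA differ
print(np.allclose(B @ A, R))                  # True: S S R = R
print(np.allclose(A @ B, rotation(-theta)))   # True: S R S is a rotation by -theta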

I'll leave it here for now, but hopefully seeing a few useful matrices makes them seem less mysterious. Until next time!

Saturday, August 21, 2010

What is a matrix? (Part 1)

Functions are an important tool in mathematics, and are used to represent many different kinds of processes in nature. Like so many mathematical objects, however, functions can be difficult to use without making some simplifying assumptions. One particularly nice assumption that we will often make is that a function is linear in its arguments; that is, f(au + bw) = af(u) + bf(w) for any inputs u and w and any scalars a and b.
One can think of a linear function as one that leaves addition and scalar multiplication alone. To see where the name comes from, let's look at a few properties of a linear function f. First, since f(v) = f(v + 0) = f(v) + f(0), any linear function must satisfy f(0) = 0. Next, suppose that f(x) = 1 for some number x. Then, for any other number y, f(y) = f((y/x)·x) = (y/x)f(x) = y/x. This means that f represents a line passing through 0 with slope m = 1/x.
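As a small numerical illustration (my own, with an arbitrary choice of x), a one-dimensional linear function really is pinned down by a single value, just as the argument above suggests:

x = 4.0                # the point where f(x) = 1

def f(y):
    # The only linear function with f(x) = 1: a line through 0 with slope 1/x.
    return y / x

a, b = 2.0, -3.0
u, w = 1.5, 7.0
print(abs(f(a * u + b * w) - (a * f(u) + b * f(w))) < 1e-12)   # True: f is linear
print(f(0.0))          # 0.0
print(f(x))            # 1.0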

So what does all this have to do with matrices? Suppose we have a linear function which takes vectors as inputs. (To avoid formatting problems, I'll write vectors as lowercase letters that are italicized and underlined when they appear in text, such as v.) In particular, let's consider a vector v in ℝ². If we use the {x, y} basis discussed last time, then we can write v = ax + by. Now, suppose we have a linear function f : ℝ² → ℝ² (that means it takes ℝ² vectors as inputs and produces ℝ² vectors as outputs). We can use the linear property to specify how f acts on any arbitrary vector by just specifying a few values: f(v) = f(ax + by) = af(x) + bf(y).
This makes it plain that f(x) and f(y) contain all of the necessary information to describe f. Since each of these may itself be written in the {x, y} basis, we may as well just keep the coefficients of f(x) and f(y) in that basis: writing f(x) = F₁₁x + F₂₁y and f(y) = F₁₂x + F₂₂y, we collect the four coefficients into the array F = [F₁₁ F₁₂; F₂₁ F₂₂], with the coefficients of f(x) down the first column and those of f(y) down the second.
We call the object F made up of the coefficients of f(x) and f(y) a matrix, and say that it has four elements. The element in the ith row and jth column is often written Fᵢⱼ. Application of the function f to a vector v can now be written as the matrix F multiplied by the column vector representation of v: f(v) = F[a; b] = [F₁₁a + F₁₂b; F₂₁a + F₂₂b].
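To make that rule concrete, here is a short Python sketch of mine (the particular linear function is just one I made up for illustration) that builds F from f(x) and f(y) and checks that multiplying by F reproduces f:

import numpy as np

def f(v):
    # An arbitrary linear function of a 2-D vector, chosen for illustration.
    a, b = v
    return np.array([2.0 * a - b, a + 3.0 * b])

x_basis = np.array([1.0, 0.0])
y_basis = np.array([0.0, 1.0])

# The columns of F are the coefficients of f(x) and f(y) in the {x, y} basis.
F = np.column_stack([f(x_basis), f(y_basis)])

v = np.array([4.0, -2.0])
print(np.allclose(F @ v, f(v)))   # True: F carries all the information about f
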
We can take this as defining how a matrix gets multiplied by a vector, in fact. This approach gives us a lot of power. For instance, if we have a second linear function g : ℝ² → ℝ², represented by a matrix G, then we can write out the composition (g∘f)(v) = g(f(v)) in the same way: g(f(v)) = G(F[a; b]) = (GF)[a; b].

That means that we can find a matrix for g∘f from the matrices for g and f. The process for doing so is what we call matrix multiplication. Concretely, if we want to find (AB)ᵢⱼ, the element in the ith row and jth column of the product AB, we take the dot product of the ith row of A and the jth column of B, where the dot product of two lists of numbers is the sum of their products: (AB)ᵢⱼ = Aᵢ₁B₁ⱼ + Aᵢ₂B₂ⱼ, with more terms of the same form for bigger matrices.

To find the dot product of any two vectors, we write them each out in the same basis and use this formula. It can be shown that which basis you use doesn't change the answer, so long as the basis vectors are perpendicular to each other and have length 1, as x and y do.
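Spelled out in Python (with two matrices I made up for the occasion), the entry-by-entry rule agrees with NumPy's built-in product, and the product matrix really does represent the composition:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])    # the matrix for g
B = np.array([[0.0, -1.0],
              [5.0, 2.0]])    # the matrix for f

# (AB)_ij is the dot product of the ith row of A with the jth column of B.
AB = np.array([[np.dot(A[i, :], B[:, j]) for j in range(2)]
               for i in range(2)])
print(np.allclose(AB, A @ B))              # True

v = np.array([1.0, -2.0])
print(np.allclose(AB @ v, A @ (B @ v)))    # True: applying f, then g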

If this all seems arcane, then try reading through it a few times, but rest assured, it makes a lot of sense with some more practice. Next time, we'll look at some particular matrices that have some very useful applications.

Tuesday, August 17, 2010

What is a basis?

Consider a vector. Just to make things concrete, consider a vector on the 2-D plane. In fact, let's consider this one (call it v⃑):
It's a vector, to be sure, but it's hardly clear how one is supposed to work with it. It doesn't make sense to pull out a ruler and pencil every time we want to add our vector to something; mathematics is supposed to be a model of the world, and thus we should be able to understand things about that model without recourse to physical measurements. To solve this problem for vectors on the plane, we can introduce two new vectors, x̂ and ŷ, then use vector addition to write v⃑ as a sum:
Now we can write v⃑ = ax̂ + bŷ, which doesn't at first seem to buy us much. Note, however, that we can write any vector on the 2-D plane as some linear combination of these two new vectors. Mathematically, we write this as ℝ² = span {x̂, ŷ}. Whenever a space can be written this way for some set of vectors B (none of which is redundant, in the sense that it could be built out of the others), we say that B is a basis for the space.

Once we have a basis picked out, we can work with the coefficients (a and b in our example) instead of the vector itself, as they completely characterize the vector. For example, adding vectors becomes a matter of adding their respective coefficients.
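As a tiny illustration in Python (with coefficients I picked arbitrarily): once a basis is fixed, adding vectors is nothing more than adding their coefficients.

import numpy as np

v = np.array([2.0, 5.0])    # the coefficients (a, b) of one vector
w = np.array([-1.0, 3.0])   # the coefficients of another
print(v + w)                # [1. 8.]: the coefficients of the sum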

In spaces other than the 2-D plane, we can also apply the same idea to find bases for representing vectors. Consider, for instance, the space of column vectors such as [a; b] (pretend they're stacked in a column, OK?). Then, a perfectly fine basis would be the set {[1; 0], [0; 1]}.
It's easy to see that we can write any other 2-dimensional column vector as a sum of the form a[1; 0] + b[0; 1] = [a; 0] + [0; b] = [a; b].

A point that can get lost in this kind of discussion, however, is that there's absolutely nothing special about the bases I've given here as examples. We could just as well have used [1; 1] and [1; -1] as a basis for column vectors, or just as well have used any other pair of non-parallel vectors in the plane.
Put differently, a basis is a largely arbitrary choice that you make when working with vectors. The relevant operations work regardless of what basis you use, since each of the vectors in one basis can itself be expanded in terms of any other basis. For example, [1; 0] = ½([1; 1] + [1; -1]) and [0; 1] = ½([1; 1] - [1; -1]), so that we have a way of converting from a representation in the {[1; 0], [0; 1]} basis to the {[1; 1], [1; -1]} basis.
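Numerically, that conversion amounts to solving a small linear system; here is a sketch of mine in Python (with an arbitrary example vector):

import numpy as np

# The new basis vectors, placed as the columns of a matrix.
new_basis = np.array([[1.0,  1.0],
                      [1.0, -1.0]])

v = np.array([3.0, -2.0])                  # coefficients in the {[1; 0], [0; 1]} basis
coeffs = np.linalg.solve(new_basis, v)     # coefficients in the {[1; 1], [1; -1]} basis
print(coeffs)                              # [0.5 2.5]
print(np.allclose(new_basis @ coeffs, v))  # True: same vector, different description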

While there is much, much more to be said on the topic of bases for vector spaces, these few words will have to do for now. As we shall see when we get into discussing linear operations, the existence of bases for vector spaces is a large part of what gives us so much power in linear algebra. We shall need this power in the quantum realm, as linear algebra may well be said to be the language of quantum mechanics. Hopefully I'll get a few more words in on the subject before my vacation!