
EMAT10001: Linear Algebra - Vectors and Vector Spaces

Andrew Calway March 2, 2014

Introduction
This part of the unit is about linear algebra. It is called linear algebra because it deals with linear combinations of things. For example, everything that we'll be dealing with can at some level be written in the form of linear equations, such as a1x1 + a2x2 + a3x3 + ... + anxn = y. In fact much of linear algebra is about the language that we use to manipulate and reason about these linear forms, without having to explicitly write out all of the details. Thus it makes a lot of use of abstractions and data structures to help us do that. Why is this important? Well, it turns out that there are many applications in computer science in which linear algebra plays a central role. Examples include search engines, machine learning, robotics, graphics and computer vision. Thus computer scientists need to have a good understanding of the key concepts and, more importantly, the language of linear algebra.

As was the case with probability, there are many, many textbooks and online materials about linear algebra. Some are aimed at specific disciplines and applications in science and engineering, including computer science. I suggest you make use of these as much as possible, to get a variety of different explanations and descriptions so that you can make sure that you fully understand the material. As previously, I have made use of several textbooks from my bookshelves and these are listed below.

Theory and Problems of Linear Algebra by Seymour Lipschutz, McGraw-Hill, 1981.
Linear Algebra and Probability for CS Applications by Ernest Davis, CRC Press, 2012.
Coding the Matrix by Philip N Klein, Newtonian Press, 2013.

Vectors
We start with one of the basic components of linear algebra - the vector - which we have already used when we looked at the gradient and Fourier series. As pointed out then, you'll find that in some textbooks, vectors are treated as abstract entities without concrete definition. In many areas of mathematics this is important. However, we will keep things concrete, as this is how vectors are often used in computer science and it will also make our exploration easier to follow.

Figure 1: 2-D and 3-D vectors

We will define a vector as an ordered list (or tuple) of real numbers. Which order they are in doesn't matter as long as we are consistent and don't change it afterwards. As always, notation varies depending on the discipline and the user - there is no single correct way. We will use round brackets (.) to delimit our ordered list and use bold font to denote a vector. For example, the following are vectors:

u = (3, 2)
v = (2, 1.5, 2.1)
w = (0.5, 0.5, 0.5, 0.5)
y = (y1, y2, ..., yn)
x = (2.1, 3, 4, 5.1, 1.2, 3, 1)

The numbers that make up the vector are known as its components and for vector y, say, we denote the ith component by yi. The above are sometimes known as n-dimensional vectors, where n is the number of components, i.e. n = 1, 2, 3, etc. For example, u and x are 2-D and 7-D vectors, respectively, and u2 = 2 and x6 = 3. The set of all such n-dimensional vectors is often denoted by R^n. As an aside, if the components are complex numbers, then the set of vectors is denoted by C^n.
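Since we are defining vectors concretely as ordered lists of numbers, they map directly onto tuples in a programming language. A quick Python sketch (illustrative only) of the vectors above and access to their components - note that Python indexes from 0, so component yi is y[i-1]:

u = (3, 2)
v = (2, 1.5, 2.1)
w = (0.5, 0.5, 0.5, 0.5)
x = (2.1, 3, 4, 5.1, 1.2, 3, 1)

print(len(u), len(x))  # dimensions: 2 and 7
print(u[1], x[5])      # u2 = 2 and x6 = 3 (0-based indexing)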

Magnitude and Direction


In linear algebra, single real numbers are known as scalars to distinguish them from vectors. Scalars have a magnitude. Vectors also have a magnitude (also called their length) but they also have a direction. This is best seen by considering the 2-D vector u = (3, 2) and representing it on the 2-D plane by an arrow starting at some reference point O and with its endpoint at the coordinates given by the components of the vector, as shown on the left in Fig. 1.

In this case the magnitude of the vector can be obtained with help from our Greek friend Mr P, i.e. |u| = √(u1^2 + u2^2) = √(3^2 + 2^2) = √13, and similarly, the direction can be defined by the angle θ = arctan(u2/u1) = arctan(2/3). We can define similar quantities for 3-D vectors and the magnitude for an n-D vector is in general

|y| = √(y1^2 + y2^2 + ... + yn^2) = √(Σ_{i=1}^n yi^2)

As another aside, this is also called the Euclidean norm. The angle of direction, however, becomes more tricky to define as the number of dimensions increases. We'll return to this later. We now define two special vectors. The first is the zero vector which, unsurprisingly, has zero magnitude and an even less surprising definition, i.e. O = (0, 0, 0). The second are unit vectors, whose magnitudes are equal to one, e.g. for the vector w = (0.5, 0.5, 0.5, 0.5), |w| = √(0.5^2 + 0.5^2 + 0.5^2 + 0.5^2) = 1. Note that for any vector u, say, we can define a corresponding unit vector in the same direction as u/|u|. These vectors are particularly important and we shall use them a lot.
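As a quick illustration, the magnitude and a corresponding unit vector can be computed in a few lines of Python (a sketch; the function names are our own):

import math

def magnitude(v):
    # Euclidean norm: square root of the sum of squared components.
    return math.sqrt(sum(c * c for c in v))

def normalise(v):
    # Unit vector in the same direction, u/|u|.
    m = magnitude(v)
    return tuple(c / m for c in v)

u = (3, 2)
print(magnitude(u))                      # sqrt(13) = 3.6055...
print(math.atan2(u[1], u[0]))            # direction arctan(2/3) = 0.5880... radians
print(magnitude((0.5, 0.5, 0.5, 0.5)))   # 1.0, a unit vector
print(magnitude(normalise(u)))           # 1.0 again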

Examples
Having introduced vectors and their basic properties, let us now look at some examples. Vectors are often used to represent physical things, such as force and velocity, and this is probably how you have come across them before in, e.g., physics. In the case of velocity, they can be 2-D or 3-D vectors, with the magnitude representing the speed and the direction representing the direction of travel. As these are physical things they are straightforward to visualise and interpret. However, as indicated above, we are not restricted to 2-D or 3-D vectors. We can define and work with vectors of any dimension and although these are harder to visualise, they prove to be very useful in many applications in computer science. For example, in document analysis, it is common to make use of what is termed a bag of words model, in which documents are represented by vectors and each component gives the number of occurrences of a given word from a vocabulary in the document. A simple example might be a vector (2, 4, 9, ...) which represents a document with 2 occurrences of the word sea, 4 of the word boat, 9 of the word island, and so on. Thus with a vocabulary of 1000 words, say, documents become vectors in 1000-D space, which is tricky to visualise (don't try, just think of 3-D and relax!), but which proves to be a very powerful representation for applications such as document classification. Take a look at the following web pages: http://en.wikipedia.org/wiki/Bag-of-words_model and http://en.wikipedia.org/wiki/Document_classification. We shall return to this later.


Figure 2: Documents can be represented by vectors, in which the components correspond to the number of occurrences of words from a vocabulary.
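To make the bag-of-words idea concrete, here is a sketch with an invented three-word vocabulary (a real system would use a much larger vocabulary and better tokenisation):

vocabulary = ["sea", "boat", "island"]

def bag_of_words(text):
    # One component per vocabulary word: its number of occurrences.
    words = text.lower().split()
    return tuple(words.count(term) for term in vocabulary)

doc = "the boat left the island and the sea took the boat to another island"
print(bag_of_words(doc))   # (1, 2, 2)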

Addition and Scalar Multiplication


We now move on to some operations for combining and manipulating vectors. The first one is to add them together and the definition is simple: we add the individual components together to create a new vector, e.g. if w1 = (1, 2, 3, 5) and w2 = (2, 1, -1, -3), then

w1 + w2 = (1, 2, 3, 5) + (2, 1, -1, -3) = (1 + 2, 2 + 1, 3 - 1, 5 - 3) = (3, 3, 2, 2)

or more generally y + z = (y1 + z1, y2 + z2, ..., yn + zn). It follows that we can only add vectors together if they have the same number of components. The other simple operation is to scale a vector by multiplying it with a scalar, e.g. for a scalar a and vector y, ay = (ay1, ay2, ..., ayn), so for the vector v = (2, 1.5, 2.1), 3v = (3 × 2, 3 × 1.5, 3 × 2.1) = (6, 4.5, 6.3). It should be easy to see that vector addition and scalar multiplication have the same properties as scalar addition and multiplication. It is useful to visualise both operations as shown in Fig. 3 for the case of 2-D vectors. Note that the addition of two vectors corresponds to the diagonal of the parallelogram formed by the vectors and that scalar multiplication by -1 corresponds to a vector in the opposite direction. The subtraction of two vectors is therefore parallel with and of the same magnitude as the other diagonal of the parallelogram, and the distance between the two vectors is given by its magnitude, i.e.

d(u, v) = √((u1 - v1)^2 + (u2 - v2)^2 + ... + (un - vn)^2)
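Both operations, and the distance between two vectors, translate directly into code. A minimal Python sketch (function names are our own):

import math

def add(y, z):
    # Componentwise addition; the dimensions must match.
    assert len(y) == len(z)
    return tuple(a + b for a, b in zip(y, z))

def scale(a, y):
    # Multiply every component by the scalar a.
    return tuple(a * c for c in y)

def distance(u, v):
    # Magnitude of the difference vector u - v.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(add((1, 2, 3, 5), (2, 1, -1, -3)))  # (3, 3, 2, 2)
print(scale(3, (2, 1.5, 2.1)))            # (6, 4.5, 6.3) up to floating point
print(distance((3, 2), (1, 2)))           # 2.0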

Also note the triangle inequality: the magnitude of the sum of two vectors must be less than or equal to the sum of their magnitudes, i.e. |u + v| ≤ |u| + |v|. Again, it is difficult to visualise, but the same applies to all n-D vectors - the addition of two 10-D vectors, for example, corresponds to the diagonal of the parallelogram formed in their common 2-D plane.

Figure 3: Vector addition and scalar multiplication
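A quick numerical check of the triangle inequality (a self-contained sketch):

import math

mag = lambda x: math.sqrt(sum(c * c for c in x))
u, v = (3, 2), (-1, 4)
s = tuple(a + b for a, b in zip(u, v))   # u + v = (2, 6)
print(mag(s), mag(u) + mag(v))           # 6.3245... and 7.7287...
print(mag(s) <= mag(u) + mag(v))         # True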

The Dot Product


Next we look at another way of combining vectors. It is one way of forming the product of two vectors. Known as the dot product - denoted by a dot '.' - it is defined as follows for two n-D vectors:

u.v = u1v1 + u2v2 + ... + unvn

Hence it gives a scalar and corresponds to the sum of the products of the corresponding components - multiply the components and add 'em up. So what does it represent? Let's take a look at the case of two 2-D vectors u and v. It is straightforward to show (you will prove it in the workshop) that the dot product is also given by u.v = |u||v| cos θ, where θ is the angle between the two vectors as shown below. Thus, the dot product indicates how close the vectors are to being parallel - the normalised value u.v/(|u||v|) = cos θ is 1 or -1 if they are parallel and zero if they are perpendicular or, in the language of linear algebra, orthogonal. Note that this applies to all n-D vectors - forming the dot product between two 12-D vectors, for instance, enables us to determine the angle between them, i.e. θ = arccos(u.v/(|u||v|)). Note also that if one of the vectors is a unit vector, then the dot product corresponds to the projection of the other vector along the direction of the unit vector, i.e. it is the distance along the direction of the unit vector which is closest to the endpoint of the other vector as shown in Fig. 4. In other words, if u is a unit vector then

u.v = arg min_a d(v, au),   |u| = 1

where d(·, ·) is the distance function. In geometric terms, the vector au, where a = u.v, is then orthogonal to the vector (v - au), as also shown in Fig. 4.

Figure 4: The dot product corresponds to the projection of vector v onto the direction of unit vector u: u.v = |u||v| cos θ = |v| cos θ.

We can also note that the squared magnitude of a vector is given by the dot product of the vector with itself, i.e. |v|^2 = v.v, and from the above cosine relationship that |u.v| ≤ |u||v|. This is known as the Cauchy-Schwarz inequality and makes sense if you consider the geometric interpretation of the dot product. The dot product also has some of the standard properties, e.g. it is commutative and distributive over vector addition, and we'll cover these in the workshop.
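The dot product machinery also fits in a few lines of Python (a sketch with our own function names):

import math

def dot(u, v):
    # Sum of the products of corresponding components.
    return sum(a * b for a, b in zip(u, v))

def magnitude(v):
    # |v|^2 = v.v
    return math.sqrt(dot(v, v))

def angle(u, v):
    # theta = arccos(u.v / (|u||v|))
    return math.acos(dot(u, v) / (magnitude(u) * magnitude(v)))

u, v = (1, 0), (1, 1)
print(dot(u, v))                                       # 1
print(math.degrees(angle(u, v)))                       # 45.0 degrees
print(abs(dot(u, v)) <= magnitude(u) * magnitude(v))   # Cauchy-Schwarz: True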

Vector Spaces and Subspaces


We are now going to look at two important concepts - the vector space and the vector subspace. As with vectors, these are sometimes considered in abstract terms, but we are going to look at them in concrete terms based on our vectors with real number components. We start with some definitions:

Linear Combination
Let V be a set of vectors. Then the vector u is a linear combination over V if there exist vectors v1, v2, ..., vm in V and scalars a1, a2, ..., am such that u = a1v1 + a2v2 + ... + amvm = Σ_{i=1}^m aivi. In other words, we can write the vector u as a weighted sum of vectors from V, where the weights are the scalars ai.

Span
Let S be a set of vectors. The span of S, denoted Span(S), is the set of linear combinations over S. In other words, it's the set of all vectors that are generated by computing all the linear combinations of the vectors in S. Thus we say that the vectors in S span the vectors in the set Span(S).

Figure 5: The vectors u and v span a subspace containing all vectors such as w which are in the plane.

Example 1
Let S = {s1, s2} where s1 = (1, 2) and s2 = (-1, 1). Then the vector v = (5, 1) is a linear combination over S since (5, 1) = 2(1, 2) - 3(-1, 1). In fact, all 2-D vectors in R^2 are spanned by S, since Span(S) = {(a - b, 2a + b) : a, b ∈ R} and suitable choices of a and b enable us to generate any vector in R^2.

Vector Spaces and Subspaces
A set of vectors V is a vector space if there exists a set of vectors S for which V = Span(S). In other words, a vector space is the set of all the possible linear combinations of a set of vectors. This is not a complete and strict definition of a vector space, but it suffices for our purposes. You should take some time to look up the formal definition and also convince yourself that R^n is a vector space. We can also define a vector subspace: if V and W are both vector spaces and W ⊆ V, then W is a subspace of V.

Example 2
Let S = {(-1, 0, 1), (1, 1, 1)}. Then all vectors (b - a, b, a + b) for a, b ∈ R are spanned by S and hence form a subspace of R^3. The vector (1, 2, 3) is in the subspace but the vector (2, 1, 1) is not (you should convince yourself of this). It should be clear that in terms of geometry, all the vectors in the subspace are in a plane, similar to that shown in Fig. 5. Note also that both Span({(-1, 0, 1)}) and Span({(1, 1, 1)}) are also subspaces - they contain all the scalar multiples of (-1, 0, 1) and (1, 1, 1), respectively, and correspond to lines in 3-D.
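We can check Example 1 numerically by forming the linear combination directly (a sketch using the vectors above):

s1, s2 = (1, 2), (-1, 1)
a, b = 2, -3
combo = tuple(a * p + b * q for p, q in zip(s1, s2))
print(combo)   # (5, 1), so v = (5, 1) is indeed in Span({s1, s2})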

Basis and Dimension


Having defined the notions of vector spaces and subspaces as sets of vectors which can be generated by another set of vectors, it is natural to ask how many sets of vectors exist that will generate a given subspace, how many vectors are in them and whether some are better than others. Of course, we need to be clear what we mean by better, but a good choice would seem to be those containing the smallest number of vectors. Why? Well, it means that we would then be able to represent any vector in the space using just that common set of vectors plus the same number of scalars. Thus it would be a minimal representation requiring the least amount of storage - which makes us happy as computer scientists. Also, we would likely be interested in how easy it is to generate such representations, and would then prefer those sets for which generation requires the least amount of computation. We will look at that issue in the next section once we have determined the minimum number of generating vectors we need. To start, we need some more definitions.

Linear (In)Dependence
A set of vectors {u1, u2, ..., un} is said to be a linearly dependent set if there exist scalars ai, not all zero, such that a1u1 + a2u2 + ... + anun = 0. Another way of putting this is that the set is dependent if we can write any one of the vectors as a linear combination of the others. The counter case is a linearly independent set - no vector can be written as a linear combination of the others. For example, the vectors (-1, 0, 1), (1, 1, 1) and (0, 1, 2) are dependent since (0, 1, 2) = (-1, 0, 1) + (1, 1, 1), whereas (-1, 0, 1), (1, 1, 1) and (1, 0, 1) are independent - convince yourself of this.

Basis and Dimension
We are now in a position to determine the minimal number of vectors that need to be in a generating set for a given vector space. Consider a generating set S for a vector space V. Assume first that S is a dependent set. This means that one or more vectors in S can be written as a linear combination of the others. Thus if we remove them from the set then we can still generate the vector space - they are redundant; if we need them then we just generate them from the reduced set.

It therefore follows that the minimum number of vectors we need to still generate the space is the number that is left once we have removed all the dependent vectors, i.e. until we are left with an independent set. Thus the smallest set that we can have is one that both generates the vector space, i.e. its vectors span the space, and is a linearly independent set. Such a set of vectors is known as a basis for the space - think of it as the base of the space from which all others are built - and the number of vectors in the set is known as the dimension of the space - think of it as the number of degrees of freedom in the space.

Example 3
The set S = {(-1, 0, 1), (1, 1, 1)} is linearly independent and thus forms a basis for the subspace Span(S), corresponding geometrically to a plane in 3-D. The dimension of the subspace is 2. The set S = {(-1, 0, 1), (1, 1, 1), (0, 1, 2)} is linearly dependent and so although it spans the subspace it is not a basis for it.

Example 4
The 5-D vectors v1 = (1, 0, 2, 1, 1), v2 = (2, 1, 0, 1, 1), v3 = (0, 1, 1, 2, 0) and v4 = (4, -2, 1, -3, 3) are linearly dependent since v4 = 2v1 + v2 - 3v3. The set {v1, v2, v3} is independent and spans a 3-D subspace of R^5.
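One practical way to test for linear (in)dependence is to stack the vectors as the rows of a matrix and compute its rank: if the rank is less than the number of vectors, the set is dependent. A sketch using numpy (we have not covered matrices or rank yet, so treat this as a black box for now):

import numpy as np

v1 = np.array([1, 0, 2, 1, 1])
v2 = np.array([2, 1, 0, 1, 1])
v3 = np.array([0, 1, 1, 2, 0])
v4 = 2 * v1 + v2 - 3 * v3   # (4, -2, 1, -3, 3), dependent by construction

print(np.linalg.matrix_rank(np.vstack([v1, v2, v3])))      # 3: independent
print(np.linalg.matrix_rank(np.vstack([v1, v2, v3, v4])))  # still 3: v4 is dependent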

Coordinates and Orthonormal Bases


So now we know the minimum number of vectors we need to generate a vector space - we need a basis - the next question to ask is how we determine the representation for a given vector in our space in terms of our basis. Or more specifically, if B = {b1, b2, ..., bm} is a basis for a subspace of dimension m, how do we find the scalars ai such that a vector v in the subspace is given by v = Σ_{i=1}^m aibi? The scalars ai are known as the coordinates of the vector v with respect to the basis set B. Actually, we have been using coordinates since we started - the components of our vectors are coordinates w.r.t. what we call the standard basis. For example, in R^3, the standard basis is the set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} and it is easy to see that the weights required to build any vector from a linear combination of these vectors correspond to the components of the vector. But what about the coordinates w.r.t. any other basis? In such cases we have to solve for the ai in the linear combination v = Σ_{i=1}^m aibi. For example, taking the vectors in Example 4, if we assume that

(4, -2, 1, -3, 3) = a1(1, 0, 2, 1, 1) + a2(2, 1, 0, 1, 1) + a3(0, 1, 1, 2, 0)

then we get

4 = a1 + 2a2
-2 = a2 + a3
1 = 2a1 + a3
-3 = a1 + a2 + 2a3
3 = a1 + a2

By substitution, we then get a1 = 2, a2 = 1 and a3 = -3, as in Example 4. Note that these values satisfy all five of the above equations. What would it mean if we couldn't find values to satisfy all five? It would mean that (4, -2, 1, -3, 3) couldn't be represented as a linear combination of the other vectors and thus was outside of the subspace.
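In practice such systems are solved numerically. A sketch using numpy's least-squares routine (the system above is overdetermined but consistent, so the residual is essentially zero):

import numpy as np

# Basis vectors from Example 4 as the columns of a matrix.
B = np.array([[1, 0, 2, 1, 1],
              [2, 1, 0, 1, 1],
              [0, 1, 1, 2, 0]]).T
v = np.array([4, -2, 1, -3, 3])

a, residual, rank, _ = np.linalg.lstsq(B, v, rcond=None)
print(a)          # [ 2.  1. -3.]
print(residual)   # ~0: v lies in the subspace spanned by the columns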

It turns out however that there is a quicker way of determining the coordinates w.r.t. a given basis if we choose the latter carefully. If as well as being linearly independent the basis vectors are also unit vectors and orthogonal to each other, then we can show (you will do this in the workshop) that the coordinates for a vector are given by the dot product between the vector and each basis vector. Thus, if vector v lies in a subspace with basis B = {b1, b2, ..., bm} and

bi.bj = 1 if i = j, and bi.bj = 0 if i ≠ j,

then v = Σ_{i=1}^m aibi with ai = v.bi. This type of basis is known as an orthonormal basis, i.e. the basis vectors are orthogonal and normalised to have length one. We might then ask how we find an orthonormal basis for any given subspace. This can be done using an algorithm called Gram-Schmidt orthogonalisation. We shan't look at this here, but if you are interested, then look it up.

Example 5
The two 4-D unit vectors b1 = 0.5(1, 1, 1, 1) and b2 = 0.5(-1, 1, -1, 1) are linearly independent and orthogonal. The vector v = (2.5, -0.5, 2.5, -0.5) is in the subspace spanned by the two vectors and its coordinates w.r.t. the basis are

a1 = v.b1 = 0.5((1 × 2.5) + (1 × -0.5) + (1 × 2.5) + (1 × -0.5)) = 0.5 × 4 = 2
a2 = v.b2 = 0.5((-1 × 2.5) + (1 × -0.5) + (-1 × 2.5) + (1 × -0.5)) = 0.5 × (-6) = -3

which is correct since

(2.5, -0.5, 2.5, -0.5) = 2 × 0.5(1, 1, 1, 1) - 3 × 0.5(-1, 1, -1, 1)
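In code, the orthonormal shortcut of Example 5 is just a couple of dot products (a plain Python sketch):

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

b1 = (0.5, 0.5, 0.5, 0.5)      # 0.5(1, 1, 1, 1)
b2 = (-0.5, 0.5, -0.5, 0.5)    # 0.5(-1, 1, -1, 1)
v = (2.5, -0.5, 2.5, -0.5)

print(dot(b1, b1), dot(b2, b2), dot(b1, b2))  # 1.0 1.0 0.0: orthonormal
print(dot(v, b1), dot(v, b2))                 # 2.0 -3.0: the coordinates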

Projections onto Subspaces


To finish off, let's return to our discussion about the dot product. Recall that we said that it could be seen as the projection of one vector onto the direction of the other. In other words, it sort of tells how much there is of one vector in the direction of the other. It turns out that it is very useful to ask a similar question about vectors and subspaces - how much of a vector is within a given subspace? We can answer this by considering the projection of a vector onto a subspace. What does this mean? Well, in the same way as with the dot product, it means the vector within the subspace which is closest to the vector being projected, i.e. the distance between them is the smallest possible across all vectors in the subspace. Let v̂ = Σ_{i=1}^m cibi be the projection of a vector v onto the subspace spanned by an orthonormal basis B = {b1, b2, ..., bm}. Then, since v̂ is the closest vector to v, the difference vector v - v̂ and v̂ must be orthogonal. We can see this by considering the geometry as in the case of the dot product. We can also show (and you'll do this as well in the workshop) that the coordinates ci of v̂ are given by ci = v.bi, i.e. the projection of v onto each of the basis vectors.

Example 6
Consider the subspace spanned by the orthonormal basis b1 and b2 in Example 5. We want to find the projection of the vector v = (-0.5, 1, 0, 1) onto the subspace. Computing the projection of v onto each basis vector gives

v.b1 = 0.5(-0.5 + 1 + 0 + 1) = 0.75
v.b2 = 0.5(0.5 + 1 + 0 + 1) = 1.25

Hence the projection is given by

v̂ = 0.75 × 0.5(1, 1, 1, 1) + 1.25 × 0.5(-1, 1, -1, 1) = (-0.25, 1, -0.25, 1)

which obviously lies in the subspace. We can now check whether the difference vector is orthogonal to the subspace, i.e.

d = v - v̂ = (-0.5, 1, 0, 1) - (-0.25, 1, -0.25, 1) = (-0.25, 0, 0.25, 0)

Forming the dot product with each basis vector gives

b1.d = 0.5(-0.25 + 0 + 0.25 + 0) = 0
b2.d = 0.5(0.25 + 0 - 0.25 + 0) = 0

which shows that d is orthogonal to both basis vectors and hence to all vectors in the subspace (you should convince yourself of that as well). In fact this example is interesting since it illustrates the basic mechanism used in the Gram-Schmidt algorithm - it iteratively projects vectors onto subspaces, building up an orthogonal basis as it goes.
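The whole of Example 6 can be reproduced in a few lines - compute the coordinates, rebuild the projection, and check the difference vector is orthogonal to the basis (a sketch):

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

b1 = (0.5, 0.5, 0.5, 0.5)
b2 = (-0.5, 0.5, -0.5, 0.5)
v = (-0.5, 1, 0, 1)

c1, c2 = dot(v, b1), dot(v, b2)                          # 0.75, 1.25
v_hat = tuple(c1 * p + c2 * q for p, q in zip(b1, b2))   # (-0.25, 1.0, -0.25, 1.0)
d = tuple(a - b for a, b in zip(v, v_hat))               # (-0.25, 0.0, 0.25, 0.0)
print(dot(d, b1), dot(d, b2))                            # 0.0 0.0: orthogonal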

