begin lecture 14, Bonus Material. Let's consider a more involved example, this one motivated by a problem in statistics. Let's say that you run an experiment and it gives you some data of the following form: y equals m times x. That is, you measure x values and y values, and you know that there's some linear relationship between them, but you don't know the value of m. Perhaps this is coming from a physical experiment, like trying to measure a spring constant by measuring force and deflection. Whatever the physical motivation, you are given some collection of data points, but these data points are noisy. They don't fit perfectly on the line. How do you determine the appropriate value of m? Well, you could just draw a line and try to make it fit and look right. But wouldn't it be nice to have a more principled approach? This is what statistics is meant for.

So, let's assume that the inputs to your problem are a collection of paired data points: x values, x sub i, and y values, y sub i. Now, in order to find the appropriate value of m, we're going to write this as an optimization problem. The method of least squares is a wonderful technique for determining the optimal m. Consider the function s, depending on m, given by the following. I'm going to look at the vertical distance between the data points and the line of slope m. This vertical distance is given by y sub i minus m times x sub i. What I'm going to want to do is add up all of those distances and then minimize. Now, there's a bit of a problem in that these distances are signed: they're positive or negative, because I'm really just looking at the change in y values. So let's square that term; we have y sub i minus m x sub i, quantity squared. And now let's sum all of those terms up over i. This gives the deviation of the data from the line of slope m. This function s depends on m.
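The deviation described above can be sketched in a few lines of code. This is only an illustration to accompany the lecture; the data values here are invented for the example.

```python
# The deviation S(m): the sum of squared vertical distances between
# the data points (x_i, y_i) and the line y = m*x.

def deviation(m, xs, ys):
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy measurements lying near the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A slope near the truth gives a small deviation; a bad slope gives a large one.
print(deviation(2.0, xs, ys))   # small
print(deviation(0.0, xs, ys))   # much larger
```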
If we chose a value of m like 0, that would give a very large value of s. What we want to do is find the value of m that minimizes this deviation s. So let's proceed. If we compute the derivative of s with respect to m, what do we get? This looks scary, but it's not so bad. Differentiation is linear, so we can pass the derivative inside the summation sign. Now, using the chain rule, what do we get? Well, we get twice the quantity y sub i minus m times x sub i, times the derivative of that quantity with respect to m. That derivative is negative x sub i. Now, if we distribute this multiplication and expand out into two sums, we get minus 2 times the sum over i of x sub i times y sub i, plus 2 times m times the sum over i of x sub i squared. We can factor out that 2 and that m because they appear in every summation term.

Now, our goal is to compute the minimum, so we find the critical point by setting this derivative equal to zero. Moving one sum over to the other side, we see that 2 times the sum over i of x sub i y sub i is equal to 2m times the sum over i of x sub i squared. What is it that we're trying to solve for? We're trying to solve for m, and so cancelling the 2s and then dividing both sides by the sum of x sub i squared gives a value of m equal to the sum over i of x sub i times y sub i divided by the sum over i of x sub i squared.

The question remains: is this critical point a local min or a local max? You might guess that it's a local min, but how would you show it for sure? Well, if we compute the second derivative of s with respect to m, what do we get? It looks complicated, but there's really only one m in that first derivative. And so, treating everything else as a constant, we get that the second derivative is simply 2 times the sum over i of x sub i squared. What do we note about that? Well, we don't care what the x sub i values are: when we square them, we get something non-negative. So as long as at least one of the x sub i values is nonzero, we get a positive second derivative, and hence a minimum.
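The closed-form slope derived above can be checked numerically. Here is a minimal sketch, again with made-up data, that computes m and verifies the critical point really is a minimum by comparing against nearby slopes.

```python
# Least-squares slope for a line through the origin, as derived above:
# m = (sum over i of x_i * y_i) / (sum over i of x_i^2).

def least_squares_slope(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def deviation(m, xs, ys):
    # S(m) from the lecture: sum of squared vertical distances.
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy data near y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

m = least_squares_slope(xs, ys)
print(m)  # close to 2

# The critical point beats nearby slopes, confirming it is a minimum.
assert deviation(m, xs, ys) < deviation(m + 0.1, xs, ys)
assert deviation(m, xs, ys) < deviation(m - 0.1, xs, ys)
```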
This value of m minimizes our deviation and gives us a best-fit line.

Now, what happens if our experiment is a little bit different? The line that we're looking for doesn't necessarily pass through the origin. Well, it doesn't seem as though the problem has really changed much at all. We're again looking for a straight line, but now we have to worry about not only the slope but also the y intercept, which we might call b. We're looking for a line of the form y equals m x plus b. I wonder, could we do the same thing? Well, the vertical distance would involve a b term in this s function, and now this function would depend not only on m but also on b. And this leads us to some very interesting questions, because we do not know how to find a max or min of a function that depends on more than one input. This is really a problem that you're going to come back to in multivariable calculus. When you have a function with several inputs, how do you do optimization? Well, I've got to tell you, some unusual things can happen in that context. But those unusual situations wind up opening a whole new world of interesting questions and applications. For example, game theory deals with optimization of multivariable functions. Linear programming, machine learning: all of these fascinating subjects are deeply concerned with optimization, with finding maxima, minima, and other types of critical points. There are some wonderful fields out there that rely on the intuition that we've learned in single-variable calculus.
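As a small illustration of that two-input version (the lecture does not solve it, and neither does this sketch): the deviation now takes both m and b as inputs. The data values below are invented for the example.

```python
# The two-input deviation S(m, b) for a line y = m*x + b:
# the sum of squared vertical distances from the data to that line.

def deviation(m, b, xs, ys):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy data near y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

# Tabulating S over a coarse grid of (m, b) pairs makes the point:
# the deviation depends on two inputs at once, so minimizing it is
# a multivariable optimization problem.
for m in (1.5, 2.0, 2.5):
    for b in (0.5, 1.0, 1.5):
        print(f"S({m}, {b}) = {deviation(m, b, xs, ys):.3f}")
```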