
An Overview of Statistical Learning

Statistical learning refers to a vast set of tools for understanding data. These tools
can be classified as supervised or unsupervised.
Supervised statistical learning involves building a statistical model for predicting, or
estimating, an output based on one or more inputs.
With unsupervised statistical learning, there are inputs but no supervising output;
nevertheless we can learn relationships and structure from such data.
The inputs go by different names, such as predictors, independent variables,
features, or sometimes just variables. The output variable (in this case, sales) is
often called the response or dependent variable, and is typically denoted using the
symbol Y.
More generally, suppose that we observe a quantitative response Y and p different
predictors, X1, X2, ..., Xp. We assume that there is some relationship between Y and
X = (X1, X2, ..., Xp), which can be written in the very general form
Y = f(X) + ε
Here f is some fixed but unknown function of X1, ..., Xp, and ε is a random error term,
which is independent of X and has mean zero.

Why Estimate f?
There are two main reasons that we may wish to estimate f: prediction and
inference. We discuss each in turn.
In many situations, a set of inputs X is readily available, but the output Y cannot
be easily obtained. In this setting, since the error term averages to zero, we can
predict Y using
Ŷ = f̂(X), (2.2)
where f̂ represents our estimate for f, and Ŷ represents the resulting prediction
for Y (so we are predicting the average value of Y at X). In this setting, f̂ is often
treated as a black box, in the sense that one is not typically concerned with the
exact form of f̂, provided that it yields accurate predictions for Y.
The accuracy of Ŷ as a prediction for Y depends on two quantities, which we will
call the reducible error and the irreducible error. In general, f̂ will not be a
perfect estimate for f, and this inaccuracy will introduce some error.
This error is reducible because we can potentially improve the accuracy of f̂ by
using the most appropriate statistical learning technique to estimate f. However,
even if it were possible to form a perfect estimate for f, so that our estimated
response took the form Ŷ = f(X), our prediction would still have some error in it!
This is because Y is also a function of ε, which, by definition, cannot be predicted
using X. Therefore, variability associated with ε also affects the accuracy of our
predictions. This is known as the irreducible error, because no matter how well we
estimate f, we cannot reduce the error introduced by ε.
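The split between the two kinds of error can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the "true" function f, the deliberately imperfect estimate f_hat, the noise level SIGMA, and the range of X are all made up for the example and do not come from the notes. It compares the mean squared prediction error when predicting with the true f (only the irreducible part remains) against predicting with f_hat (both parts are present).

```python
import random

random.seed(0)

def f(x):
    """Hypothetical true relationship, chosen only for illustration."""
    return 2.0 + 3.0 * x

def f_hat(x):
    """A deliberately imperfect estimate of f (also hypothetical)."""
    return 2.5 + 2.5 * x

SIGMA = 1.0   # assumed standard deviation of the error term
N = 100_000   # number of simulated observations

# Simulate Y = f(X) + error, with a mean-zero error independent of X.
xs = [random.uniform(0.0, 2.0) for _ in range(N)]
ys = [f(x) + random.gauss(0.0, SIGMA) for x in xs]

def mse(predict):
    """Mean squared prediction error of `predict` over the simulated data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / N

mse_perfect = mse(f)          # only the irreducible error remains
mse_imperfect = mse(f_hat)    # reducible + irreducible error
reducible = sum((f(x) - f_hat(x)) ** 2 for x in xs) / N

print(f"MSE using the true f:      {mse_perfect:.3f}")
print(f"MSE using the estimate:    {mse_imperfect:.3f}")
print(f"average (f - f_hat)^2:     {reducible:.3f}")
```

Under these assumptions, the error with the true f should sit close to SIGMA squared (the variance of the error term), and the error with f_hat should be roughly that amount plus the average squared gap between f and f_hat. Improving f_hat shrinks only the second piece.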

Note: If we have only a single variable (the response, with no predictors), its mean is
the best predictor of any given value, in the sense of the smallest sum of squared errors.
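A quick way to see why the mean is the natural single-number guess: among constant guesses, it gives the smallest sum of squared errors. The sketch below uses made-up observations and a hypothetical helper named sse; it is an illustration, not part of the original notes.

```python
import statistics

y = [3.0, 5.0, 4.0, 8.0, 10.0]  # made-up observations

def sse(guess):
    """Sum of squared errors when every value is predicted by `guess`."""
    return sum((value - guess) ** 2 for value in y)

mean_y = statistics.mean(y)

# Compare the mean against a few nearby constant guesses.
candidates = [mean_y - 1.0, mean_y - 0.1, mean_y, mean_y + 0.1, mean_y + 1.0]
for guess in candidates:
    print(f"guess = {guess:5.2f}   SSE = {sse(guess):7.3f}")

# Deviations from the mean cancel out, which is why they are squared first.
deviations_sum = sum(value - mean_y for value in y)
print(f"sum of deviations from the mean = {deviations_sum:.3f}")
```

Any guess other than the mean produces a larger SSE, and the raw deviations from the mean sum to zero.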
Goodness of fit: 1. Calculate the distance of each data point from the mean line.
2. The distances between the best-fit line and the observed values are called
residuals (errors).
3. The residuals always add up to zero, so they have mean zero.
4. We square the residuals for the same reasons we square deviations when computing
the SD: (1) we want every term to be positive, and (2) we want to emphasize the points
that lie further away. We add the squares together and call the total the Sum of
Squared Errors (SSE). "Sum of squares" literally means the sum of the areas of the
squares created by squaring each residual.
5. So we create a linear model that minimizes the sum of squared errors. Initially we
create a best-fit line using only the dependent variable: the line is its mean, and the
squared differences from the mean give the SSE. Then we introduce the independent
variable, which creates a new best-fit line (the regression line) whose SSE should be
less than the SSE of the mean line. The reduction is the regression sum of squares (SSR).
6. If the two SSE values are nearly the same (the regression line and the mean line
nearly overlap), the independent variable is of no use. So we compare the regression
line against the mean-only best-fit line of the dependent variable.
7. Correlation + ANOVA = Simple Linear Regression
8. Y = β0 + β1(X) + ε
β0 = Y-intercept, a population parameter (the mean value of Y when X = 0; with no
predictor in the model it is the mean of all observed Y values)
β1 = slope, a population parameter
ε = error term, the unexplained variation in Y
E(Y) = Y - ε = β0 + β1(X) = mean (expected) value of Y at a given X
9. If we knew the actual values of β0 and β1, we could calculate E(Y), but in the real
world we cannot get those values. So we use point estimates computed from sample
data: Ŷ (Y-hat) to estimate E(Y), and b0 and b1 to estimate β0 and β1, giving
Ŷ = b0 + b1(X).

10. For the population mean, the point estimator is the sample mean; the same applies to the SD and
11. The goal is to minimize the SSE.
1. Check a scatter plot first and look for a visible linear pattern.
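The steps in the list above can be sketched in a few lines of code. This is a minimal illustration with made-up data, using the standard least-squares formulas b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar; the variable names (sst, sse, ssr, and so on) are choices made for this example. Here sst is the sum of squared errors around the mean-only line, sse is the sum of squared errors around the regression line, and ssr is the part explained by the regression.

```python
import statistics

# Made-up sample: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Least-squares point estimates of the slope (b1) and intercept (b0).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values and residuals for the regression line.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

sst = sum((yi - y_bar) ** 2 for yi in y)       # errors around the mean-only line
sse = sum(r ** 2 for r in residuals)           # errors around the regression line
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)   # variation explained by the regression
r_squared = ssr / sst

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

If the independent variable were of no use, sse would be close to sst and r_squared close to zero. With an intercept in the model, the residuals sum to zero and sst splits into ssr plus sse.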

Expected value of the sampling distribution of the mean = population mean
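This property can be illustrated by drawing many samples and averaging their means. The sketch below assumes a hypothetical skewed population (exponential with mean about 50) and arbitrary choices for SAMPLE_SIZE and N_SAMPLES; none of these numbers come from the notes.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 10,000 values from a skewed distribution.
population = [random.expovariate(1.0 / 50.0) for _ in range(10_000)]
population_mean = statistics.mean(population)

SAMPLE_SIZE = 30    # size of each sample (assumed)
N_SAMPLES = 5_000   # number of repeated samples (assumed)

# Draw many samples without replacement and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]
mean_of_sample_means = statistics.mean(sample_means)

print(f"population mean:       {population_mean:.2f}")
print(f"mean of sample means:  {mean_of_sample_means:.2f}")
```

Individual sample means vary quite a bit, but their average should land close to the population mean, which is what makes the sample mean a sensible point estimator.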

Correlation, scatterplots, and simple linear regression