
Chapter 0 Introduction

STAT3907 Linear Models and Forecasting

Guodong Li
Department of Statistics and Actuarial Science,
The University of Hong Kong

Guodong Li STAT3907/2804 1 / 14
What is “Linear Models and Forecasting”?

I Webster’s dictionary defines “forecasting” as an activity

I “to calculate or predict some future event or condition, usually as a result of rational study or analysis of pertinent data.”

I Forecasting is the aim, and the linear model is one of the tools for doing it.

Example 1

Day                  Mon  Tue  Wed  Thu  Fri
Waiting time (min)     2    9    7   10    4

I On Saturday, suppose you have waited for more than 11 minutes, but the
bus still does not come. Then, ...

Something Behind the Story

I You feel unhappy because you suppose that

I the bus should arrive after about 7 minutes, and a wait of more than 11 minutes is very unusual.
I In doing so, you have made a forecast based on the historical data (5 values).

I The bus driver may argue that the waiting time depends on

1. the traffic situation on the way to this station
2. the timetable according to which the bus leaves the starting station
3. ...
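
The informal forecast in the bus example can be reproduced from the five weekday waiting times. The sketch below is only an illustration: the variable names and the choice of summaries (mean, median, longest wait) are assumptions, not something the slides prescribe.

```python
import statistics

# Waiting times in minutes, taken from the table in Example 1.
waiting_minutes = {"Mon": 2, "Tue": 9, "Wed": 7, "Thu": 10, "Fri": 4}
values = list(waiting_minutes.values())

# Simple summaries of the historical data.
mean_wait = statistics.mean(values)
median_wait = statistics.median(values)
longest_wait = max(values)

# Saturday's wait so far (more than 11 minutes in the example;
# 11 is used here as a lower bound).
saturday_wait = 11
longer_than_any_weekday = saturday_wait > longest_wait

print(f"mean = {mean_wait}, median = {median_wait}, longest = {longest_wait}")
print("Saturday's wait exceeds every recorded wait:", longer_than_any_weekday)
```

The median of the five values is 7 minutes, which matches the "about 7 minutes" intuition, and 11 minutes is longer than any wait recorded during the week.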

Example 2

I A hospital recorded the plasma levels of total cholesterol (in mg/ml) of 24 patients with hypercholesterolemia admitted to the hospital:

3.5 1.9 4.0 2.6 4.5 3.0 2.9 3.8 2.1 3.8 4.1 3.0
2.5 4.6 3.2 4.2 2.3 4.0 4.3 3.9 3.3 3.2 2.5 3.3

I Suppose a new patient comes to the hospital. How can we get an overview of his/her plasma level (i.e. how can we forecast it)?

Forecasting the Levels

I Denote X = the plasma level. Then, these 24 values can be considered as independent observations of the random variable X.
I We may conclude that, for the new patient, the plasma level will be around 3.354 and the 95% confidence interval (CI) is 3.354 ± 1.96 × 0.8, where
I the sample mean is 3.354
I the standard deviation is 0.8

[Figure: plasma levels (vertical axis, 2 to 5) plotted against patient index (horizontal axis, 4 to 24)]
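
The two summary numbers and the interval above can be checked directly from the 24 recorded values. This is a minimal sketch using Python's standard `statistics` module; the variable names are assumptions, and the sample standard deviation (divisor n − 1) is assumed to be the one intended. The slide calls the result a confidence interval; an interval of the form mean ± 1.96 × SD is often described as an approximate prediction interval for a single new observation, under a normality assumption.

```python
import statistics

# Plasma levels of total cholesterol (mg/ml) for the 24 patients in Example 2.
plasma = [
    3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8, 4.1, 3.0,
    2.5, 4.6, 3.2, 4.2, 2.3, 4.0, 4.3, 3.9, 3.3, 3.2, 2.5, 3.3,
]

mean_level = statistics.mean(plasma)   # sample mean
sd_level = statistics.stdev(plasma)    # sample standard deviation (n - 1 divisor)

# Interval of the form mean +/- 1.96 * SD, as on the slide.
lower = mean_level - 1.96 * sd_level
upper = mean_level + 1.96 * sd_level

print(f"mean = {mean_level:.3f}, sd = {sd_level:.3f}")
print(f"interval = ({lower:.3f}, {upper:.3f})")
```

The computed mean rounds to 3.354 and the standard deviation rounds to 0.8, in agreement with the values quoted above.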

Example 2 (Continued)

I The hospital also collected the ages of these patients

46 20 52 30 57 25 28 36 22 43 57 33
22 63 40 48 28 49 52 58 29 34 24 50

I How can we now predict the plasma level of the new patient?

Any Conclusion?

[Figure: two panels showing plasma levels (vertical axis, 2 to 5); left panel against patient index (4 to 24), right panel against age (20 to 60)]

Predicting the Plasma Levels with Age

[Figure: plasma levels plotted against age (20 to 60), with a red line and a blue line]

I Red line or Blue line?


I Some assumptions on “Plasma level” and “Age” are needed!
Linear Regression Models

I Suppose there are two kinds of variables

I One is the target variable. It is called the dependent variable or response, and is denoted by Y.
I The others are used to explain the target variable. Each is called an independent variable or predictor, and is denoted by X.
I There may be more than one predictor.

I The linear model

I assumes a linear formula for Y, i.e. Y is a linear function of X, and
I imposes some further assumptions.

I We can do forecasting and/or make inference once the linear regression model is known or has been estimated.
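
As an illustration of estimating a linear model and then forecasting with it, the sketch below fits a straight line of plasma level (Y) on age (X) for the 24 patients of Example 2. The slides have not yet said how the line should be chosen; ordinary least squares is assumed here, and the function name `fit_line` and the age of the new patient are hypothetical.

```python
# Ages and plasma levels (mg/ml) of the 24 patients in Example 2, in the same order.
ages = [
    46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33,
    22, 63, 40, 48, 28, 49, 52, 58, 29, 34, 24, 50,
]
plasma = [
    3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8, 4.1, 3.0,
    2.5, 4.6, 3.2, 4.2, 2.3, 4.0, 4.3, 3.9, 3.3, 3.2, 2.5, 3.3,
]


def fit_line(x, y):
    """Return (a, b) of the least-squares line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = fit_line(ages, plasma)

# Forecast for a new patient; the age 40 is a made-up value for illustration.
new_age = 40
predicted_level = a + b * new_age

print(f"fitted line: Y = {a:.4f} + {b:.4f} X")
print(f"predicted plasma level at age {new_age}: {predicted_level:.3f}")
```

Unlike the single number 3.354 used earlier, the forecast now changes with the patient's age, which is the point of bringing in a predictor.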

Time Series
I In economics, there are a lot of data with the following features.

1. They are collected in time order, e.g. daily data, monthly data, ...
2. The current value may depend on its own past values, e.g. stock prices, ...
3. Part of the variation of the target variable may be due to factors we cannot explain, such as weather, changes in taste, ...
4. Many other variables make a statistically significant contribution to the target variable. However, (1) it is very difficult to judge the causality between them; and (2) their values are also difficult to obtain, e.g. macroeconomic variables.
5. The part that remains after fitting a linear model (the residuals) still has property 2.
6. ...

I For this special type of data, we need the tools of time series analysis.
Example 3

I The annual numbers of Canadian hares were recorded from 1905 to 1909:

50 21 20 22 27

I How can we set up a linear model for the above time series?

Linear Time Series Models

I Suppose the number of Canadian hares in the present year can be affected by that in the last year.
I The formula can be specified as

Y = a + bX

or

Y(t) = a + bY(t − 1)

Last Year (X)    Present Year (Y)
50               21
21               20
20               22
22               27
27
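
The pairing of each year with the previous one, and the formula Y(t) = a + bY(t − 1), can be sketched as follows. Two assumptions are made here: the five counts are taken to be in chronological order (1905 first), and a and b are estimated by ordinary least squares, which the slides have not specified. With only four pairs the estimates are purely illustrative.

```python
# Annual numbers of Canadian hares, assumed to be listed from 1905 to 1909.
hares = [50, 21, 20, 22, 27]

# Build (last year, present year) pairs: X = Y(t - 1), Y = Y(t).
last_year = hares[:-1]     # 1905..1908
present_year = hares[1:]   # 1906..1909
pairs = list(zip(last_year, present_year))

# Least-squares estimates of a and b in Y(t) = a + b * Y(t - 1).
n = len(pairs)
x_bar = sum(last_year) / n
y_bar = sum(present_year) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
s_xx = sum((x - x_bar) ** 2 for x in last_year)
b = s_xy / s_xx
a = y_bar - b * x_bar

# One-step-ahead forecast for the year after the last observation.
forecast_next = a + b * hares[-1]

print("pairs (X, Y):", pairs)
print(f"a = {a:.3f}, b = {b:.3f}, forecast for next year = {forecast_next:.2f}")
```

The last observed value, 27, has no "present year" partner yet; it serves as the X value for the forecast of the following year.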

Content of this course

I All the content of this course can be divided into two parts:

I Linear regression analysis (Part I)


I Time series analysis (Part II)

