Introduction To Forecasting Presentation

Intro to Forecasting
Michael Bailey
Economist, Data Scientist @Facebook
"Trying to predict the future is like trying to drive down a country
road at night with no lights while looking out the back window."
Peter Drucker
Successful Forecasts
Source: 538 Blog NY Times
Source: Baseball-Almanac
Failed Forecasts
"I predict the Internet will soon go spectacularly supernova and in 1996
catastrophically collapse." Robert Metcalfe, founder of 3Com and
inventor of Ethernet, writing in a 1995 InfoWorld column.
"Television won't be able to hold on to any market it captures after the
first six months. People will soon get tired of staring at a plywood box
every night." Darryl Zanuck, 20th Century Fox, 1946
"We will never make a 32 bit operating system." Bill Gates
Outline
What methods should be used to construct a forecast?
What distinguishes a good forecast from a bad one? What are the best
practices and common mistakes of forecasters?
How can I learn more?
Example: P2P Lending

Peer-to-Peer (P2P) lending networks facilitate the matching between
borrowers who want to borrow money, and peer lenders who are willing
to loan them money at a premium.
Because peers are matched directly, borrowers face lower interest rates
than those offered by banks or credit cards and lenders can earn higher
returns than those offered by banks or bonds.
These networks are highly transparent, releasing a plethora of data
about each potential borrower. The two largest networks, Lending Club
and Prosper, continuously release their lending data to the public:
LendingClub Downloads
Prosper Downloads

How many loans will Lending Club and Prosper make next month? In the next 3
months? In the next year?
Qualitative Methods
Ad-hoc / make stuff up: used more often than you would think, often for
new products/markets where there isn't much data available.
Delphi Method: iterative forecasts by a room of experts. The panel sees
results from the previous round and reforecasts. There are several problems
with this method: bias from outliers, group psychology effects like herding, etc.
Quantitative Methods
Time Series: predict future values based upon past values. Some models
include other regressors (ARMAX), but usually the forecast is based solely
upon observed values of the response.
Naive Forecast: this period's predicted value is equal to last period's
value (ARIMA(0,1,0)).
Moving Average: this period's predicted response is equal to the
average of the past n periods' responses. This is a moving average of
order n (a smoother, distinct from the MA(q) error terms in ARIMA models).
Useful for smoothing out noise to see trends in the data.
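The two simplest forecasts above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slides; the function names and the loan counts are made up for the example.

```python
from statistics import mean

def naive_forecast(history):
    """Forecast the next period as the most recent observed value."""
    return history[-1]

def moving_average_forecast(history, n):
    """Forecast the next period as the mean of the last n observations."""
    if n < 1 or n > len(history):
        raise ValueError("n must be between 1 and len(history)")
    return mean(history[-n:])

# Made-up monthly loan counts, for illustration only.
loans = [120, 135, 128, 150, 160, 155]
print(naive_forecast(loans))              # last observed value
print(moving_average_forecast(loans, 3))  # mean of the last three values
```

A larger n smooths more aggressively but reacts more slowly to a change in trend.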
The key to time series analysis is to transform the data into a stable
(stationary) time series:
(1) Mean is nearly constant
Transformation: take differences (diff() function in R)
(2) Volatility is nearly constant
Transformation: take logs or powers. The Box-Cox family of transformations
flexibly covers both:
Y = (y^lambda - 1) / lambda for lambda != 0, and Y = log(y) for lambda = 0
Ahhhh, stability at last! Note that taking several differences of the
data is not at all uncommon.
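As a rough Python counterpart to the two transformations, the sketch below implements repeated differencing and the standard Box-Cox form, (y^lambda - 1) / lambda with the log as the lambda = 0 case. The function names and the sample series are illustrative assumptions.

```python
import math

def difference(series, order=1):
    """Apply first differences `order` times (like repeated diff() calls)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def box_cox(series, lam):
    """Box-Cox transform of strictly positive values; lam == 0 means log."""
    if any(y <= 0 for y in series):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(y) for y in series]
    return [(y ** lam - 1) / lam for y in series]

# Made-up series that grows by a constant 10% each period.
y = [100 * 1.1 ** t for t in range(6)]
log_diffs = difference(box_cox(y, 0))
print(log_diffs)  # roughly constant: logs tame the growth, differences remove the trend
```

Each round of differencing shortens the series by one observation, which matters for short histories.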
Once you have a stable time series, you can forecast forward using
exponential smoothing (ES) models. ES is a type of weighted moving
average.
ES models take in all past data, but weight each observation by how
recent it is when predicting the next period.
One ES model is Holt-Winters (HoltWinters() in R), which selects its
smoothing parameters automatically.
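To make the weighting idea concrete, here is a sketch of simple exponential smoothing, the most basic member of the ES family (it has no trend or seasonal terms, so it is not Holt-Winters). Choosing alpha by a coarse grid search over in-sample squared error is an assumption made for illustration; R's HoltWinters() uses its own optimiser.

```python
def ses_forecast(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = series[0]
    for y in series[1:]:
        # New level = weighted blend of the latest value and the old level.
        level = alpha * y + (1 - alpha) * level
    return level

def ses_sse(series, alpha):
    """Sum of squared one-step-ahead errors for a given alpha."""
    level, sse = series[0], 0.0
    for y in series[1:]:
        sse += (y - level) ** 2
        level = alpha * y + (1 - alpha) * level
    return sse

def pick_alpha(series, grid=None):
    """Choose alpha from a coarse grid by minimising in-sample SSE."""
    grid = grid or [i / 20 for i in range(1, 20)]
    return min(grid, key=lambda a: ses_sse(series, a))

# Made-up data, for illustration only.
series = [10, 12, 11, 13, 12, 14, 13, 15]
alpha = pick_alpha(series)
forecast = ses_forecast(series, alpha)
print(alpha, forecast)
```

An alpha near 1 tracks the latest observation closely, while an alpha near 0 averages over a long history.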
Decomposition: Sometimes you'll want to decompose your time
series into trend and seasonal components. There are several
algorithms to accomplish this (see decompose() and stl() in R).
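One of the simpler approaches is classical additive decomposition: estimate the trend with a centered moving average, then average the detrended values at each position in the cycle. The Python sketch below follows that recipe under simplifying assumptions (additive seasonality, at least two full cycles of data); the function names and the series are made up for illustration.

```python
from statistics import mean

def centered_trend(series, period):
    """Centered moving-average trend; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # 2 x m moving average: half weight on the two end points.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    return trend

def seasonal_effects(series, period):
    """Average detrended value at each cycle position, centered on zero."""
    trend = centered_trend(series, period)
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    return [r - offset for r in raw]

# Made-up series: linear trend plus a repeating four-period pattern.
pattern = [3, -1, -2, 0]
series = [2 * t + pattern[t % 4] for t in range(16)]
effects = seasonal_effects(series, 4)
print(effects)  # should recover something close to the pattern
```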
Let's see if there is any seasonality in the monthly lending data:
ES models assume no autocorrelation, that is, no correlation between the
errors of successive y values.
The correct model might need to take into account substantial
correlation, for example the true model generating the data could be:
y(t) = y(t-1) + e(t)
Today's value is yesterday's value plus some error. This is an
autoregressive model of order 1.
Use lag.plot() or acf() in R to see autocorrelation structure:
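Outside R, the quantity that an autocorrelation plot displays can be computed directly. The sketch below calculates the usual sample autocorrelation at each lag; the function name mirrors R's acf() only for readability, and the example series are made up.

```python
from statistics import mean

def acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(series)
    mu = mean(series)
    dev = [y - mu for y in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

# A steadily trending series: neighbouring values are strongly related.
trending = list(range(20))
print(acf(trending, 3))

# An alternating series: each value is the opposite of the previous one.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
print(acf(alternating, 2))
```

Large values at low lags that die out slowly are a common hint that the series still needs differencing.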
There is a model that incorporates all the concepts of differences, MA,
and AR: the ARIMA(p, d, q) model.
ARIMA(1, 0, 0) = AR(1)
ARIMA(0, 0, 1) = MA(1)
d = number of times we difference the data to obtain stationarity.
I won't go into how to select the values of p and q; it requires learning about
the partial autocorrelation function and the autocorrelation function (see
references at end of slides).
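To show how the differencing (d) and autoregressive (p) pieces fit together, the sketch below builds an ARIMA(1, 1, 0)-style forecast by hand: difference once, fit an AR(1) coefficient to the differences, then forecast the differences and add them back. It is a simplified illustration that assumes no intercept or drift term and uses plain least squares, which is not how a full ARIMA routine estimates its parameters. The function names and data are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in y(t) = phi * y(t-1) + e(t), no intercept."""
    num = sum(a * b for a, b in zip(series, series[1:]))
    den = sum(a * a for a in series[:-1])
    if den == 0:
        raise ValueError("cannot fit AR(1) to an all-zero series")
    return num / den

def forecast_arima_110(series, steps):
    """Forecast using an AR(1) model fitted to the first differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    phi = fit_ar1(diffs)
    level, last_diff, out = series[-1], diffs[-1], []
    for _ in range(steps):
        last_diff = phi * last_diff  # predicted next difference
        level += last_diff           # undo the differencing
        out.append(level)
    return out

# Made-up series whose growth halves every period.
series = [0, 8, 12, 14, 15]
print(forecast_arima_110(series, 2))
```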
Let's be lazy and use auto.arima() in R to pick them for us:
ARIMA(1, 2, 1) predictions for Lending Club disbursements:

Month   Prediction
Feb     114,727,363
Mar     123,818,067
Apr     132,671,221
May     141,424,018
Jun     150,134,416
Jul     158,826,902
Make sure you are fitting your model using training data, and validating
your model using test data.
Two most common model validation metrics are:
Mean Absolute Error = mean(|error|)
Root Mean Squared Error = sqrt(mean(error^2))
There is vociferous debate in the forecasting journals about the best
metric to use; the choice is very domain specific.
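A minimal Python sketch of the validation step: hold out the most recent observations as test data, forecast them from the training data only, and score the forecast with both metrics. The naive forecast is used as the model here only to keep the example short, and the data are made up.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def time_split(series, n_test):
    """Hold out the last n_test points; a time series is never shuffled."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]

series = [10, 12, 11, 13, 12, 14, 13, 15]
train, test = time_split(series, 3)
naive = [train[-1]] * len(test)  # repeat the last training value
print(mae(test, naive), rmse(test, naive))
```

RMSE penalises large misses more heavily than MAE, so the two metrics can rank models differently.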
What if you want to predict y|x?

For non-time series, a plethora of tools are available: multivariable
regression, information likelihood models, machine learning, neural
networks, etc.
For time series, any model that fits y|x used in a forecast will need
forecasted values of x to make the forecast. This is usually avoided in
practice unless very predictable x's are chosen.
Scenario Forecasting: fit a model of y|x and then pick different scenarios
for the evolution of x. Provide a forecast for each scenario.
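Scenario forecasting can be sketched as follows: fit a simple model of y given x, then push several assumed paths for x through it. The one-variable least-squares line, the scenario names, and all numbers below are illustrative assumptions.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

# Made-up history of a driver x and a response y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 6.9, 9.2, 10.8, 13.0]
intercept, slope = fit_line(x, y)

# Hypothetical future paths for x, one per scenario.
scenarios = {
    "low": [5.0, 5.0, 5.0],
    "base": [6.0, 7.0, 8.0],
    "high": [7.0, 9.0, 11.0],
}
forecasts = {
    name: [intercept + slope * xi for xi in path]
    for name, path in scenarios.items()
}
for name, values in forecasts.items():
    print(name, values)
```

Reporting one forecast per scenario makes the dependence on x explicit instead of hiding it inside a single number.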
Forecasting Process:
1) Define problem, Gather contextual knowledge
2) Plot like crazy to learn data structure (correlation matrices,
autocorrelation plots, etc.)
3) Split data into train/test set.
4) (Time Series) Transform the data to obtain stationarity.
5) Fit appropriate models, test that errors look like white noise.
6) Apply models to test data set, evaluate using error deviation metrics.
Make forecasts.
7) Re-evaluate next period.
Best Practices
Gather as much contextual knowledge (experts) as possible.
Best Practices
Embrace Uncertainty.
Best Practices
Avoid overfitting.
Always train your model and perform model selection on a subset of your
data.
Don't necessarily select the model that best fits your current data.
Don't necessarily select the model that best fits the next n periods.
Remember, you can always attain a perfect fit to the data with enough
parameters.
Best Practices
Make Lots of Forecasts and Calibrate.
Continually assess the probability statements of your model to see how
far they deviate from the truth.
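One simple calibration check is interval coverage: if a model issues 80% prediction intervals, roughly 80% of the realised values should fall inside them. The sketch below computes that fraction; the 80% level and the numbers are made up for illustration.

```python
def interval_coverage(actuals, lowers, uppers):
    """Fraction of actual values that landed inside their forecast interval."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)

# Made-up history of past forecasts, each issued with an 80% interval.
actuals = [102, 95, 110, 120, 99]
lowers = [90, 90, 100, 100, 100]
uppers = [110, 105, 115, 115, 120]

coverage = interval_coverage(actuals, lowers, uppers)
print(coverage)  # compare against the nominal 0.80
```

Coverage well below the nominal level suggests the intervals are too narrow. A handful of forecasts says little either way, which is why making lots of forecasts matters.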
Best Practices
Beware the Lucas Critique.
When your forecast might affect the outcome, calibration is incredibly
difficult.
Platform Forecasting
At Facebook, we have the challenging problem of needing a platform
forecast.
Our revenue is dependent on our users using the site, and our advertisers
wanting to serve them ads. We control neither supply nor demand, and
thus we need to make forecasts for both.
We also need to understand how the composition of supply and demand
turns into revenue which very much depends on the ads-serving
mechanisms and optimizations we employ, which are continuously
changing.
We use a combination of simulation techniques, experiments, machine
learning and cross-section models, and time series models to estimate
the demand and supply curves we are facing.
Resources
Don't google "forecasting"; instead, search on methods (time series,
prediction markets, neural networks, etc.)
Resources R
Forecasting in R
R zoo package, useful for dates
R forecast package
A Little R Time Series Book
Python/Pandas
Time Series in Pandas
Time Series in Pandas (Video)
statsmodels
Texts
Forecasting Methods and Applications
Time Series Analysis
Forecasting: Principles and Practice
The Signal and the Noise: Why So Many Predictions Fail but Some Don't
