
Forecasting in STATA: Tools and Tricks

Introduction
This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be available on the course website.

Working with variables in STATA
In the Data Editor, you can see that variables are recorded by STATA in spreadsheet format. Each row is an observation, each column is a different variable. An easy way to get data into STATA is by cutting and pasting into the Data Editor.

When variables are pasted into STATA, they are given the default names var1, var2, etc. You should rename them so you can keep track of what they are. The command to rename var1 as gdp is:
. rename var1 gdp

New variables can be created by using the generate command. For example, to take the log of the variable gdp:
. generate y=ln(gdp)

Dates and Time
For time series analysis, dates and times are critical. You need to have one variable which records the time index. We describe how to create this series.

Annual Data
For annual data it is convenient if the time index is the year number (e.g. 2010). Suppose your first observation is the year 1947. You can generate the time index by the commands:
. generate t=1947+_n-1
. tsset t, annual

The variable _n is the natural index of the observation, starting at 1 and running to the number of observations n. The generate command creates a variable t which adds 1947 to _n, and then subtracts 1, so it is a series with entries 1947, 1948, 1949, etc. The tsset command declares the variable t to be the time index. The option annual is not necessary, but tells STATA that the time index is measured at the annual frequency.

Quarterly Data
STATA stores the time index as an integer series. It uses the convention that the first quarter of 1960 is 0. The second quarter of 1960 is 1, the first quarter of 1961 is 4, etc. Dates before 1960 are negative integers, so that the fourth quarter of 1959 is -1, the third is -2, etc.

When formatted as a date, STATA displays quarterly time periods as 1957q2, meaning the second quarter of 1957. (Even though STATA stores the number -11, the eleventh quarter before 1960q1.) STATA uses the function tq(1957q2) to translate the formatted date 1957q2 to the numerical index -11.

Suppose that your first observation is the third quarter of 1947. You can generate a time index for the data set by the commands
. generate t=tq(1947q3)+_n-1
. format t %tq
. tsset t

The generate command creates a variable t with integer entries, normalized so that 0 occurs in 1960q1. The format command formats the variable t using the time series quarterly format. The tq refers to time series quarterly. The tsset command declares that the variable t is the time index. You could have alternatively typed
. tsset t, quarterly
to tell STATA that it is a quarterly series, but it is not necessary as t has already been formatted as quarterly. Now, when you look at the variable t you will see it displayed in year-quarter format.

Monthly Data
Monthly data is similar, but with m replacing q. STATA stores the time index with the convention that 1960m1 is 0. To generate a monthly index starting in the second month of 1962, use the commands
. generate t=tm(1962m2)+_n-1
. format t %tm
. tsset t

Weekly Data
Weekly data is similar, with w instead of q and m, and the base period is 1960w1. For a series starting in the 7th week of 1973, use the commands
. generate t=tw(1973w7)+_n-1
. format t %tw
. tsset t

Daily Data
Daily data is stored by dates. For example, 01jan1960 is Jan 1, 1960, which is the base period. To generate a daily time index starting on April 18, 1962, use the commands
. generate t=td(18apr1962)+_n-1
. format t %td
. tsset t
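As a quick check on the date conventions described above, the following sketch displays the integer index that STATA assigns to the formatted dates used in this section. The values noted in the comments follow from the 1960 base period; it is worth confirming them in your own session.

```stata
* Integer indexes behind formatted dates (base period is 1960)
display tq(1960q1)        // 0: the quarterly base period
display tq(1957q2)        // -11: eleven quarters before 1960q1
display tm(1962m2)        // 25: twenty-five months after 1960m1
display tw(1973w7)        // weekly index, base period 1960w1
display td(18apr1962)     // daily index, base period 01jan1960

* Going the other way: show an integer using a date format
display %tq -11           // 1957q2
```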

Pasting a Data Table into STATA
Some quarterly and monthly data are available as tables where each row is a year and the columns are different quarters or months. If you paste this table into STATA, it will treat each column (each month) as a separate variable. You can use STATA to rearrange the data into a single column, but you have to do this for one variable at a time.

I will describe this for monthly data, but the steps are the same for quarterly.

After you have pasted the data into STATA, suppose that there are 13 columns, where one is the year number (e.g. 1958) and the other 12 are the values for the variable itself. Rename the year number as year, and leave the other 12 variables listed as var2 etc. Then use the reshape command
. reshape long var, i(year) j(month)

Now, the Data Editor should show three variables: year, month and var. STATA has re-sorted the observations into a single column. You can drop the year and month variables, create a monthly time index, and rename var to be more descriptive.

In the reshape command listed above, STATA takes the variables which start with var, strips off the trailing numbers, and puts them in the new variable month. It uses the existing variable year to group observations.
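The reshape steps can be tried on a small made-up table. This is only a sketch: the numbers are invented, the quarterly layout (four columns instead of twelve) is chosen to keep the example short, and building the time index with yq() is an alternative to the _n approach described earlier rather than part of the original steps. Note that because the pasted columns are named var2 through var5, the new index variable initially runs from 2 to 5 and has to be shifted by one.

```stata
clear
* Hypothetical pasted table: one row per year, one column per quarter
input year var2 var3 var4 var5
1958 10 11 12 13
1959 14 15 16 17
end

* Stack var2..var5 into one column; the trailing number goes into quarter
reshape long var, i(year) j(quarter)

* The stubs were var2..var5, so shift quarter to run 1..4
replace quarter = quarter - 1

* Build the quarterly time index directly from year and quarter
generate t = yq(year, quarter)
format t %tq
tsset t

rename var y
list t y
```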

Data Organized in Rows
Some data sets are posted in rows. Each row is a different variable, and each column is a different time period. If you cut and paste a row of data into STATA, it will interpret the data as a single observation with many variables.

One method to solve this problem is with Excel. Copy the row of data, open a clean Excel Worksheet, and use the Paste Special command. (Right click, then Paste Special.) Check the Transpose option, and OK. This will paste the data into a column. You can then copy and paste the column of data into the STATA Data Editor.

Cleaning Data Pasted into STATA
Many data sets posted on the web are not immediately useful for numerical analysis, as they are not in calendar order, or have extra characters, columns, or rows. Before attempting analysis, be sure to visually inspect the data to be sure that you do not have nonsense.

Examples

Data at the end of the sample might be preliminary estimates, and be footnoted or marked to indicate that they are preliminary. You can use these observations, but you need to delete all characters and non-numerical components. Typically, you will need to do this by hand, entry by entry.

Seasonal data may be reported using an extra entry for annual values. So monthly data might be reported as 13 numbers, one for each month plus 1 for the annual. You need to delete the annual variable. To do this, you can typically use the drop command. For example, if these entries are marked "Annual", and you have pasted this label into var2, then
. drop if var2=="Annual"

This deletes all observations for which the variable var2 equals "Annual". Because var2 holds text, the value must be enclosed in double quotes. Notice that this command uses a double equality ==. This is common in programming. The single equality = is used for assignment (definition), and the double equality == is used for testing.
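The cleaning steps can be sketched on a tiny made-up data set. The variable names period and value and the trailing "p" marker for preliminary figures are assumptions made for this illustration. The destring command with its ignore() option is offered here as a possible shortcut to editing entries by hand; it is not part of the steps described above, so check the result carefully before relying on it.

```stata
clear
* Hypothetical pasted data: labels in period, numbers stored as text in value
input str10 period str10 value
"Jan"    "101.2"
"Feb"    "102.5"
"Annual" "101.9"
"Mar"    "103.1p"
end

* Remove the extra annual row (string comparison needs double quotes)
drop if period == "Annual"

* Strip the preliminary marker "p" and convert the text to numbers
destring value, replace ignore("p")

list
```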

Time Series Plots
The tsline command generates time series plots. To make plots of the variable gdp, or the variables men and women
. tsline gdp
. tsline men women

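Time series operators
For a time series y

L.    lag y(t-1)                                             Example: L.y
L2.   2-period lag y(t-2)                                    Example: L2.y
F.    lead y(t+1)                                            Example: F.y
F2.   2-period lead y(t+2)                                   Example: F2.y
D.    difference y(t)-y(t-1)                                 Example: D.y
D2.   double difference (y(t)-y(t-1))-(y(t-1)-y(t-2))        Example: D2.y
S.    lag-1 difference y(t)-y(t-1) (same as D.)              Example: S.y
S2.   lag-2 difference y(t)-y(t-2)                           Example: S2.y
Ss.   seasonal difference y(t)-y(t-s), where s is the
      seasonal frequency (e.g., s=4 for quarterly)           Example: S4.y

Note that the number after S is the lag length of the difference, not a multiple of the seasonal frequency. For a seasonal difference, the seasonal frequency itself must be written into the operator (S4. for quarterly data, S12. for monthly data).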
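A small sketch can make the operators concrete. The series below is artificial (y equals the square of the observation number), chosen so the results are easy to check by hand: the first difference is 2n-1, the double difference is always 2, and the four-quarter difference is 8n-16. The data must be declared with tsset before the operators can be used.

```stata
clear
set obs 12
generate t = tq(2000q1) + _n - 1
format t %tq
tsset t

* Artificial series: y = 1, 4, 9, 16, ...
generate y = _n^2

generate lag1  = L.y      // y(t-1)
generate lead1 = F.y      // y(t+1)
generate dy    = D.y      // y(t)-y(t-1)
generate d2y   = D2.y     // difference of the difference
generate s4y   = S4.y     // y(t)-y(t-4), the quarterly seasonal difference

list t y lag1 lead1 dy d2y s4y
```

The first rows of the lagged and differenced variables are missing (shown as a period), since the required earlier observations do not exist.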

Regression Estimation
To estimate a linear regression of the variable y on the variables x and z, use the regress command
. regress y x z

The regress command reports many statistics. In particular,

The number of observations is at the top of the small table on the right.
The sum of squared residuals is in the first column of the table on the left (under SS), in the row marked Residual.
The least squares estimate of the error variance is in the same table, under MS and in the row Residual. The estimate of the error standard deviation is its square root, and is in the right table, reported as Root MSE.
The coefficient estimates are reported in the bottom table, under Coef.
Standard errors for the coefficients are to the right of the estimates, under Std. Err.

In some time series cases (most importantly, trend estimation and h-step-ahead forecasts), the least squares standard errors are inappropriate. To get appropriate standard errors, use the newey command instead of regress.
. newey y x z, lag(k)

Here, k is an integer, meaning number of periods, which you select. It is the number of adjacent periods to smooth over to adjust the standard errors. STATA does not select k automatically, and it is beyond the scope of this course to estimate k from the sample, so you will have to specify its value. I suggest the following. In h-step-ahead forecasting, set k=h. In trend estimation, set k=4 for quarterly and k=12 for monthly data.

Intercept-Only Model
The simplest regression model is intercept-only, y=b0+e. This can be estimated by the regress or newey command
. regress y
. newey y, lag(k)

The estimated intercept is the sample mean of y. While this could have been calculated using other methods, such as the summarize command, using the regress/newey command is useful as then afterwards you can use postestimation commands, including predict.

Regression Fit and Residuals
To calculate predicted values, use the predict command after the regress or newey command
. predict p
This creates a variable p of the fitted values x'beta.

To calculate least squares residuals, after the regress or newey command
. predict e, residuals
This creates a variable e of the in-sample residuals y-x'beta.

You can then plot the fit versus actual values, and a residual time series
. tsline y p
. tsline e
The first plot is a graph of the variables y and p, assuming that y is the dependent variable and p are the fitted values. The second plot is a graph of the residuals against time.

Dummy Variables
Indicator variables, known as dummy variables, can be created using generate. One purpose is to create sub-periods and regimes.

For example, to create a dummy variable equaling 0 for observations before 1984, and equaling 1 for monthly observations starting in 1984
. generate d=(t>=tm(1984m1))
In this example, the time index is t. The command tm(1984m1) converts the date format 1984m1 into an integer value. The new variable is d, and equals 0 for observations up to 1983m12, and equals 1 for observations starting in 1984m1.

To create a dummy variable equaling 1 for quarterly observations between 1990q1 and 1998q4, and 0 otherwise (and the time index is t), use
. generate d=(t>=tq(1990q1))*(t<=tq(1998q4))
This command essentially generates two dummy variables and then multiplies them to create the variable d.
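The sub-period dummy can be checked on an artificial quarterly index. The sketch below builds the dummy as a product of two comparisons, and also with the built-in inrange() function, which is an alternative not used in the text above; the variable name d_alt and the sample span are assumptions for illustration.

```stata
clear
set obs 80
* Artificial quarterly index running from 1985q1 to 2004q4
generate t = tq(1985q1) + _n - 1
format t %tq
tsset t

* Dummy for 1990q1 through 1998q4 as a product of two comparisons
generate d = (t>=tq(1990q1))*(t<=tq(1998q4))

* Same dummy using inrange(), which tests lower <= t <= upper
generate d_alt = inrange(t, tq(1990q1), tq(1998q4))

* The two constructions should agree; tabulate shows how many quarters are flagged
assert d == d_alt
tabulate d
```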

Changing Intercept Model
To allow the intercept of a model to change at a known time period, we simply add a dummy variable to the regression. For example, if t is the time index, the data are monthly and we want a change in mean starting in the 7th month of 1987,
. generate d=(t>=tm(1987m7))
. regress y d

The generate command creates a dummy variable for the second time period. The regress command estimates an intercept-only model allowing a switch in the intercept in July 1987. The estimated constant is the intercept before July 1987. The coefficient on d is the change in the intercept.

Time Trend Model
To estimate a regression on a time trend only, use regress or newey with the time index as a regressor. If the time index is t
. regress y t

Trends with Changing Slope
Here is how to create a trend which changes slope at a specific date (for concreteness 1984m1). Use the generate command to create a dummy for the period starting at 1984m1, and then interact it with a trend normalized to be zero at 1984m1:
. generate d=(t>=tm(1984m1))
. generate ts=d*(t-tm(1984m1))
The new variable ts is zero before 1984, and then is a linear trend after that.

Then regress the variable of interest y on t and ts:
. regress y t ts
The coefficient on t is the trend before 1984. The coefficient on ts is the change in the trend.

If you want there to be a jump as well as a change in slope at 1984m1, then include the dummy d
. regress y t d ts
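To see the changing-slope regression at work, the following sketch simulates a monthly series with a known break. Everything about the data is an assumption for illustration: the sample runs from 1980m1, the true slope is 0.10 before 1984m1 and 0.35 afterwards, and the seed is arbitrary. The estimated coefficient on t should be near 0.10 and the coefficient on ts near the change, 0.25.

```stata
clear
set seed 2010
set obs 120
generate t = tm(1980m1) + _n - 1
format t %tm
tsset t

* Break dummy and a trend that starts at zero in 1984m1
generate d  = (t >= tm(1984m1))
generate ts = d*(t - tm(1984m1))

* Simulated series: slope 0.10 throughout, plus an extra 0.25 after the break
generate y = 5 + 0.10*(t - tm(1980m1)) + 0.25*ts + rnormal()

* Least squares estimates, then the same fit with Newey-West standard errors
regress y t ts
newey y t ts, lag(12)

* Compare the fitted broken trend with the data
predict p
tsline y p
```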

Expanding the Dataset Before Forecasting
When you have a set of time series observations, STATA typically records the dates as running from the first until the last observation. You can check this by looking at the data in the Data Editor. But to forecast a date out-of-sample, these dates need to be in the data set. This requires expanding the data set to include these dates. This is done by the tsappend command. There are two formats
. tsappend, add(12)
This command adds 12 dates to the end of the sample. If the current final observation is 2009m12, the command adds 2010m1 through 2010m12. If you look at the data using the Data Editor, you will see that the time index has new entries, through 2010m12, but the other variables are missing. Missing values are indicated by a period (.).

The other format which accomplishes the same task is
. tsappend, last(2010m12) tsfmt(tm)
This command adds observations so that the last observation is 2010m12, and that the formatting is monthly. For quarterly data, to add observations up to 2010q4 the command is
. tsappend, last(2010q4) tsfmt(tq)

Point Forecasting Out-of-Sample
The predict command can be used for point forecasting, so long as the regressors are available. The data set first needs to be expanded as previously described, and the regression coefficients estimated using either the regress or newey commands.

The command
. predict p
will create a series p of predicted values, both in-sample and out-of-sample.

To restrict the predicted values to in-sample observations (for monthly data with time index t and the last in-sample observation 2009m12)
. predict p if t<=tm(2009m12)

To restrict the predicted values to out-of-sample (for monthly data with the last in-sample observation 2009m12)
. predict yp if t>tm(2009m12)

If the observations, in-sample predictions, and out-of-sample predictions are y, p, and yp, they can be plotted together, but as three distinct elements, as
. tsline y p yp
. tsline y p yp if t>tm(2000m12)
The second command restricts the plot to observations after 2000, which is useful if you wish to focus in on the forecast period (the example is for monthly data).

Normal Forecast Intervals
To make an interval forecast based on the normal approximation, you need what is called the standard deviation of the forecast, which is an estimate of the standard deviation of the forecast error. This is computed using the predict command. You first need to estimate the forecast and save the forecast. Suppose you are forecasting the monthly variable y given the regressors x and z, the in-sample period ends in 2009m12, and we issue the following commands
. regress y x z
. predict p if t<=tm(2009m12)
. predict yp if t>tm(2009m12)

Then you add
. predict s if t>tm(2009m12), stdf
This creates a variable s for the forecast period whose entries are the standard deviation of the forecast. Now you multiply this by a standard normal quantile and add to the point forecast
. generate yp1=yp-1.645*s
. generate yp2=yp+1.645*s
These commands create two series for the forecast period, which equal the endpoints of a forecast interval with 90% coverage. (-1.645 and 1.645 are the 5% and 95% quantiles of the normal distribution.)
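The whole sequence, from expanding the data set to plotting the interval, can be sketched with simulated data. The series, the seed, and the use of the time index t as the only regressor are assumptions made so the example is self-contained; a trend is convenient here because its out-of-sample values are known once tsappend has extended the time index. The scalar z is computed with invnormal() instead of typing 1.645 by hand.

```stata
clear
set seed 12345
set obs 120
* Monthly sample from 2000m1 through 2009m12
generate t = tm(2000m1) + _n - 1
format t %tm
tsset t
generate y = 10 + 0.05*_n + rnormal()

* Add 2010m1 through 2010m12; y is missing for the new dates
tsappend, add(12)

regress y t
predict p  if t<=tm(2009m12)          // in-sample fit
predict yp if t>tm(2009m12)           // out-of-sample point forecast
predict s  if t>tm(2009m12), stdf     // standard deviation of the forecast

* 90% interval: point forecast plus and minus the 95% normal quantile times s
scalar z = invnormal(0.95)
generate yp1 = yp - z*s
generate yp2 = yp + z*s

tsline y p yp yp1 yp2 if t>tm(2005m12)
```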

Empirical Forecast Intervals
To make an interval forecast, you need to estimate the quantiles of the residuals of the forecast equation. To do so, you first need to estimate the forecast and save the forecast. Suppose you are forecasting the monthly variable y given the regressors x and z, the in-sample period ends in 2009m12, and we issue the following commands
. regress y x z
. predict p if t<=tm(2009m12)
. predict yp if t>tm(2009m12)
. predict e, residuals

Now we want to calculate the 25% and 75% quantiles of the residuals e. This can be accomplished using what is called quantile regression with just an intercept. The STATA command is qreg. The format is similar to regress, but you have to tell STATA the quantile you want to estimate.
. qreg e, quantile(.25)
This command computes the 25% quantile regression of e on an intercept (as no regressors are specified). The Coef. reported in the table is the .25 quantile of e. Now you can compute the out-of-sample values, and add them to the point forecast yp to create the lower part of the forecast interval
. predict q1 if t>tm(2009m12)
. generate yp1=yp+q1
The predict command uses the last estimation command, in this case qreg, to compute the forecast. In this case it is computing the out-of-sample .25 quantile of e.

You can repeat this for the upper forecast interval endpoint.
. qreg e, quantile(.75)
. predict q2 if t>tm(2009m12)
. generate yp2=yp+q2
The variables yp1 and yp2 are the out-of-sample forecast interval endpoints for y. You can plot the data together with the out-of-sample point and interval forecasts, e.g.
. tsline y yp yp1 yp2 if t>tm(2000m12)

For a fan chart, you repeat this for multiple quantiles.
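Repeating the quantile steps for a fan chart is easier with a loop. The sketch below is one possible way to do it, using simulated data so that it runs on its own; the series, the seed, the choice of quantiles (10%, 25%, 75%, 90%), and the variable names q10, yp10, and so on are all assumptions for illustration. Quantiles are given to qreg as percentages, which STATA accepts for values larger than 1.

```stata
clear
set seed 12345
set obs 120
* Monthly sample from 2000m1 through 2009m12, then 12 forecast months
generate t = tm(2000m1) + _n - 1
format t %tm
tsset t
generate y = 10 + 0.05*_n + rnormal()
tsappend, add(12)

* Point forecast and in-sample residuals
regress y t
predict yp if t>tm(2009m12)
predict e, residuals

* For each quantile: intercept-only quantile regression of the residuals,
* out-of-sample quantile, and the corresponding forecast band
foreach q in 10 25 75 90 {
    quietly qreg e, quantile(`q')
    predict q`q' if t>tm(2009m12)
    generate yp`q' = yp + q`q'
}

tsline y yp yp10 yp25 yp75 yp90 if t>tm(2005m12)
```

The pair yp25 and yp75 gives an interval with roughly 50% coverage, and yp10 and yp90 an interval with roughly 80% coverage.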

Conditional Forecast Intervals
The qreg command makes it easy to compute the forecast interval endpoints conditional on regressors. This is a quite advanced technique, so I do not recommend it without care. But this is how it can be done. As in the previous section, suppose you are forecasting y given x and z, have forecast residuals e, and out-of-sample point forecast yp. Now you want out-of-sample conditional quantiles of e given some regressors. Suppose that you think that x has predictive power for the quantiles of e. You can use the commands for the .25 quantile
. qreg e x, quantile(.25)
. predict q1 if t>tm(2009m12)
. generate yp1=yp+q1
and similarly for the .75 quantile.

This method models the quantiles of e as functions of x. This can be useful when the spread (variance) of the distribution changes over time.
