Introduction
Package zoo provides methods for dealing with totally ordered indexed observations.
In other words, it allows creating and manipulating time series. The main
objective of the package, however, is handling irregular time series, because the ts class
in base R does not handle irregularly spaced observations. Its key feature
is therefore independence from any particular index/date/time class. In addition, the zoo
package is consistent with ts and base R.
From an overall perspective, the functionality of package zoo can be classified into six categories.
1. Creating/defining a time series (either regular or irregular).
2. Writing a time series to, or reading it from, a text document (txt, csv etc.).
3. Manipulating a time series using various functions.
4. General functions that help to handle a zoo series.
5. Interpolating the missing values of a time series.
6. Plotting a time series.
2.1
From this section onwards the main emphasis is on irregularly spaced data
series. As mentioned above, however, package zoo can create both regular and
irregular time series.
Regular series are created using the function zooreg (which is not discussed in
this document).
Irregular series are created using the function zoo.
The function zoo has three main arguments.
1. x: a numeric vector or a matrix of time series data
2. order.by: an index vector with unique entries by which the observations in x
are ordered
3. frequency: a number indicating the frequency of order.by. If specified, zoo
will return a regular time series
> n = 40
> # Generating 40 random values from an exponential distribution
> # and taking their cumulative sum
> timeIndex = cumsum(rexp(n, rate = 0.05))
> rTimeIndex = round(timeIndex, digits = 0)
> # Creating dates for 40 days
> irts.Date = as.Date("2010-01-01") + rTimeIndex
> # Generating 40 random values from a normal distribution with
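The listing above breaks off before the series itself is built. A minimal sketch of how the remaining steps might look is given below. The seed, the mean of 20, the standard deviation of 2 and the object name IRTS are assumptions made for illustration, not values taken from the original code.

```r
library(zoo)

set.seed(123)  # assumed seed, used only to make the sketch reproducible
n <- 40

# Cumulative sums of exponential waiting times give irregular day offsets
timeIndex  <- cumsum(rexp(n, rate = 0.05))
rTimeIndex <- round(timeIndex, digits = 0)

# order.by should have unique entries, so drop offsets that collide after rounding
rTimeIndex <- unique(rTimeIndex)
irts.Date  <- as.Date("2010-01-01") + rTimeIndex

# Assumed distribution parameters (mean 20, sd 2), rounded to two decimals
values <- round(rnorm(length(irts.Date), mean = 20, sd = 2), digits = 2)

# Irregular series: no frequency argument is supplied
IRTS <- zoo(values, order.by = irts.Date)
head(IRTS)
```

Because no frequency is given, zoo simply orders the observations by the supplied dates and makes no assumption about their spacing.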
2.2
The objective of writing and reading a zoo series is to save the series in a text file and retrieve it from that file when needed. This can be done using the functions
write.zoo and read.zoo respectively.
write.zoo: here we need to specify the name of the zoo series, the separator and the
name of the text file
read.zoo: in order to read we need to specify the name of the text file, the format
of the index (as it was saved in the text file) and whether there is a header or
not
> # Writing the zoo series to a text file in order to use them
> # in the future
> #write.zoo(IRTSzoo, sep=" ", "IRTSdata.txt")
> # Reading zoo series from the text file
> # After reading from the text file then it is assigned to the
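As a hedged illustration of the round trip described in this section, the sketch below writes a small zoo series to a temporary file and reads it back. The three observations, the temporary file path and the name IRTSback are assumptions; col.names = TRUE is passed explicitly so that the header expected by read.zoo is certain to be written.

```r
library(zoo)

# Small illustrative series (assumed values)
IRTSzoo <- zoo(c(20.02, 15.75, 18.42),
               order.by = as.Date(c("2010-04-08", "2010-04-18", "2010-06-06")))

# Hypothetical temporary file used instead of a fixed file name
path <- tempfile(fileext = ".txt")

# Write the series with a space separator and an explicit header line
write.zoo(IRTSzoo, file = path, sep = " ", col.names = TRUE)

# Read it back: same separator, header present, index stored as YYYY-MM-DD
IRTSback <- read.zoo(path, header = TRUE, sep = " ", format = "%Y-%m-%d")
IRTSback
```

The format string tells read.zoo how to turn the first column back into Date values; without it the index would need to be converted by hand.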
2.3
There are five main functions which allow us to manipulate a zoo series in different
ways.
1. coredata: extracts or replaces the data in a zoo series
2. index: extracts or replaces the time index/dates in a zoo series
3. as.yearmon: converts the index to year-month form, used to represent a monthly zoo series
4. as.yearqtr: converts the index to year-quarter form, used to represent a quarterly zoo series
5. aggregate: provides summary statistics of a zoo series (monthly means,
monthly sums etc.)
2.3.1
coredata():
20.02 15.75 18.42
18.69 17.69 21.05
21.00 20.53 17.60
20.31 24.03 21.48
2.3.2
index():
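The following sketch shows coredata and index used both to extract and to replace parts of a series. The three-point series and the replacement values are assumptions chosen for illustration.

```r
library(zoo)

# Small illustrative series (assumed values)
z <- zoo(c(20.02, 15.75, 18.42),
         order.by = as.Date(c("2010-04-08", "2010-04-18", "2010-06-06")))

vals  <- coredata(z)   # extract the observations without the index
dates <- index(z)      # extract the Date index

# Replace the second observation and shift every date by one day
coredata(z)[2] <- 16.00
index(z) <- index(z) + 1
z
```

Both functions have replacement forms, so the data and the index can be changed independently of each other.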
2.3.3
yearmon():
 [1] "Apr 2010" "Apr 2010" "Jun 2010" "Jun 2010" "Jun 2010" "Aug 2010"
 [7] "Aug 2010" "Aug 2010" "Sep 2010" "Sep 2010" "Sep 2010" "Oct 2010"
[13] "Oct 2010" "Oct 2010" "Nov 2010" "Feb 2011" "Mar 2011" "Apr 2011"
[19] "May 2011" "Aug 2011" "Aug 2011" "Aug 2011" "Sep 2011" "Oct 2011"
[25] "Nov 2011" "Dec 2011" "Jan 2012" "Jan 2012" "Feb 2012" "Feb 2012"
[31] "Feb 2012" "Feb 2012" "Mar 2012" "Mar 2012" "Mar 2012" "Mar 2012"
[37] "Mar 2012" "Mar 2012" "Apr 2012" "May 2012"
2.3.4
yearqtr():
 [1] "2010 Q2" "2010 Q2" "2010 Q2" "2010 Q2" "2010 Q2" "2010 Q3" "2010 Q3"
 [8] "2010 Q3" "2010 Q3" "2010 Q3" "2010 Q3" "2010 Q4" "2010 Q4" "2010 Q4"
[15] "2010 Q4" "2011 Q1" "2011 Q1" "2011 Q2" "2011 Q2" "2011 Q3" "2011 Q3"
[22] "2011 Q3" "2011 Q3" "2011 Q4" "2011 Q4" "2011 Q4" "2012 Q1" "2012 Q1"
[29] "2012 Q1" "2012 Q1" "2012 Q1" "2012 Q1" "2012 Q1" "2012 Q1" "2012 Q1"
[36] "2012 Q1" "2012 Q1" "2012 Q1" "2012 Q2" "2012 Q2"
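Output of this kind can be produced by converting a Date index with as.yearmon and as.yearqtr. The sketch below uses four assumed dates; the month labels printed by format() depend on the locale, so only the quarter labels are locale independent.

```r
library(zoo)

# Four illustrative dates (assumed)
d <- as.Date(c("2010-04-08", "2010-08-24", "2011-02-01", "2012-05-13"))

ym <- as.yearmon(d)   # year-month representation of each date
yq <- as.yearqtr(d)   # year-quarter representation of each date

format(ym)            # month labels, e.g. "Apr 2010" in an English locale
format(yq)            # quarter labels such as "2010 Q2"

# Internally a yearmon is the year plus a fraction: January is 0/12, April is 3/12
as.numeric(ym)
```

Dates falling in the same month or quarter map to the same yearmon or yearqtr value, which is what makes these classes useful as grouping keys.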
2.3.5
aggregate():
The following code first calculates monthly means by splitting the zoo
series into monthly subsets and then extracting the mean of each subset via the function aggregate. This can be considered an indirect way of converting an irregular time series to a
regular one. Similarly, quarterly means are calculated with a second call to aggregate,
which amounts to converting an irregular time series to a regular quarterly time
series.
> #### Computing summary statistics using function "aggregate"
> monMeans = aggregate(IRTS, as.yearmon(index(IRTS)), mean); monMeans
Apr 2010 Jun 2010 Aug 2010 Sep 2010 Oct 2010 Nov 2010 Feb 2011 Mar 2011
17.88500 20.16333 20.19000 19.96000 18.93000 21.05000 22.79000 20.36000
Apr 2011 May 2011 Aug 2011 Sep 2011 Oct 2011 Nov 2011 Dec 2011 Jan 2012
19.86000 19.41000 22.61333 21.25000 20.55000 21.00000 20.53000 20.95000
Feb 2012 Mar 2012 Apr 2012 May 2012
18.83750 19.78000 21.48000 16.62000
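A self-contained sketch of the same idea on a five-point series is given below, covering both the monthly and the quarterly case. The observations and the name qtrMeans are assumptions for illustration.

```r
library(zoo)

# Five illustrative observations: two in April 2010 and three in June 2010
z <- zoo(c(20.02, 15.75, 18.42, 22.58, 19.49),
         order.by = as.Date(c("2010-04-08", "2010-04-18",
                              "2010-06-06", "2010-06-10", "2010-06-21")))

# Group by year-month and take the mean of each group
monMeans <- aggregate(z, as.yearmon(index(z)), mean)

# Group by year-quarter; all five dates fall in 2010 Q2
qtrMeans <- aggregate(z, as.yearqtr(index(z)), mean)

monMeans
qtrMeans
```

The result is again a zoo series, now indexed by yearmon or yearqtr values, with one entry per month or quarter that contains at least one observation.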
2.4
In this section three functions of the package zoo are considered. The first
checks whether a zoo series is regular or not. The second and third are the
lagging and differencing functions for a time series. These functions are discussed in
the following subsections.
2.4.1
The function is.regular checks whether a series of ordered observations has
an underlying regularity or is even strictly regular. It takes
two arguments. The first is the object representing the series of ordered observations.
The second is the logical argument strict, which indicates whether strict regularity should be checked.
A time series can either be irregular (unequally spaced), strictly regular (equally
spaced) or have an underlying regularity, i.e., be created from a regular series by
omitting some observations. Here, the latter property is called regular. Consequently,
regularity follows from strict regularity but not vice versa. Thus, to detect an
irregular time series with is.regular, the logical argument strict must always be set to
TRUE.
[1] FALSE
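The distinction between regular and strictly regular can be sketched with two small series; the series names and values below are assumptions. The first has daily observations with some days omitted, the second has one observation on every day.

```r
library(zoo)

# Daily grid with days omitted: gaps of 1, 2 and 4 days (assumed example)
gappy <- zoo(1:4, order.by = as.Date("2010-01-01") + c(0, 1, 3, 7))

# One observation on each of four consecutive days
daily <- zoo(1:4, order.by = as.Date("2010-01-01") + 0:3)

is.regular(gappy)                 # underlying regularity only
is.regular(gappy, strict = TRUE)  # equal spacing required
is.regular(daily, strict = TRUE)
```

With strict = FALSE the gappy series still counts as regular, because its gaps are whole multiples of the smallest gap; only strict = TRUE exposes the unequal spacing.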
2.4.2
In time series analysis, lagging and differencing are two of the most frequently used
techniques. Package zoo allows us to apply the functions lag and diff, which are used
in base R, to a zoo series. Strictly speaking,
the function diff comes from the base package and the function lag comes from the stats
package.
Both functions take two main arguments. The first is the zoo series we created. For the
function lag, the second argument is the number of lags; for the function diff, it is the
order of the difference.
However, both functions have an additional argument when used with package
zoo (which is not available in the base or stats versions). That is the logical
argument na.pad. If na.pad is TRUE, any times that would not otherwise
have been in the result are added with a value of NA. If it is FALSE, those times are dropped.
The following R code demonstrates the use of the functions lag and diff with an
irregular time series under the package zoo.
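A minimal sketch of lag and diff with and without na.pad is shown below on an assumed four-point series. It reflects the behaviour of current zoo releases as far as I am aware: a positive k in lag moves later values to earlier time points, and diff attaches each difference to the later of the two time points involved.

```r
library(zoo)

# Four illustrative observations on irregular dates (assumed values)
z <- zoo(c(20.02, 15.75, 18.42, 22.58),
         order.by = as.Date(c("2010-04-08", "2010-04-18",
                              "2010-06-06", "2010-06-10")))

# lag with k = 1: each date receives the value of the following observation,
# so the last date drops out of the result
l1 <- lag(z, k = 1)

# na.pad = TRUE keeps the dropped date and fills it with NA
l1p <- lag(z, k = 1, na.pad = TRUE)

# First-order difference: the first date has no predecessor and drops out
d1 <- diff(z, differences = 1)

# na.pad = TRUE keeps that date and fills it with NA
d1p <- diff(z, differences = 1, na.pad = TRUE)

l1; l1p; d1; d1p
```

Note that both functions work on the order of the observations, not on calendar distance: a lag of 1 means the neighbouring observation, whatever the gap in days.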
2.4.3
lag():
2010-04-18 2010-06-06 2010-06-10 2010-06-21 2010-08-15 2010-08-23
     18.42      22.58      19.49      21.14      21.33      18.10
2010-09-14 2010-09-15 2010-09-28 2010-10-06 2010-10-11 2010-10-29
     20.47      19.38      20.41      18.69      17.69      21.05
2011-02-01 2011-03-07 2011-04-04 2011-05-16 2011-08-01 2011-08-16
     20.36      19.86      19.41      22.83      22.20      22.81
2011-09-30 2011-10-14 2011-11-09 2011-12-04 2012-01-21 2012-01-28
     20.55      21.00      20.53      17.60      24.30      17.71
2012-02-20 2012-02-26 2012-02-29 2012-03-06 2012-03-08 2012-03-12
     17.53      18.21      18.58      17.60      19.86      18.30
2012-03-14 2012-03-27 2012-03-30 2012-04-19
     20.31      24.03      21.48      16.62
2010-04-18 2010-06-06 2010-06-10 2010-06-21 2010-08-15 2010-08-23
     18.42      22.58      19.49      21.14      21.33      18.10
2010-09-14 2010-09-15 2010-09-28 2010-10-06 2010-10-11 2010-10-29
     20.47      19.38      20.41      18.69      17.69      21.05
2011-02-01 2011-03-07 2011-04-04 2011-05-16 2011-08-01 2011-08-16
     20.36      19.86      19.41      22.83      22.20      22.81
2011-09-30 2011-10-14 2011-11-09 2011-12-04 2012-01-21 2012-01-28
     20.55      21.00      20.53      17.60      24.30      17.71
2012-02-20 2012-02-26 2012-02-29 2012-03-06 2012-03-08 2012-03-12
     17.53      18.21      18.58      17.60      19.86      18.30
2012-03-27 2012-03-30 2012-04-19 2012-05-13
     24.03      21.48      16.62         NA
2.4.4
diff():
2010-04-08 2010-04-18 2010-06-06 2010-06-10 2010-06-21 2010-08-15 2010-08-23
     -4.27       2.67       4.16      -3.09       1.65       0.19      -3.23
2010-08-24 2010-09-14 2010-09-15 2010-09-28 2010-10-06 2010-10-11 2010-10-29
      1.93       0.44      -1.09       1.03      -1.72      -1.00       3.36
2010-11-04 2011-02-01 2011-03-07 2011-04-04 2011-05-16 2011-08-01 2011-08-16
      1.74      -2.43      -0.50      -0.45       3.42      -0.63       0.61
2011-08-20 2011-09-30 2011-10-14 2011-11-09 2011-12-04 2012-01-21 2012-01-28
     -1.56      -0.70       0.45      -0.47      -2.93       6.70      -6.59
2012-02-11 2012-02-20 2012-02-26 2012-02-29 2012-03-06 2012-03-08 2012-03-12
      4.19      -4.37       0.68       0.37      -0.98       2.26      -1.56
2012-03-14 2012-03-27 2012-03-30 2012-04-19
      2.01       3.72      -2.55      -4.86
2010-04-08 2010-04-18 2010-06-06 2010-06-10 2010-06-21 2010-08-15 2010-08-23
     -4.27       2.67       4.16      -3.09       1.65       0.19      -3.23
2010-08-24 2010-09-14 2010-09-15 2010-09-28 2010-10-06 2010-10-11 2010-10-29
      1.93       0.44      -1.09       1.03      -1.72      -1.00       3.36
2010-11-04 2011-02-01 2011-03-07 2011-04-04 2011-05-16 2011-08-01 2011-08-16
      1.74      -2.43      -0.50      -0.45       3.42      -0.63       0.61
2011-08-20 2011-09-30 2011-10-14 2011-11-09 2011-12-04 2012-01-21 2012-01-28
     -1.56      -0.70       0.45      -0.47      -2.93       6.70      -6.59
2012-02-11 2012-02-20 2012-02-26 2012-02-29 2012-03-06 2012-03-08 2012-03-12
      4.19      -4.37       0.68       0.37      -0.98       2.26      -1.56
2012-03-14 2012-03-27 2012-03-30 2012-04-19 2012-05-13
      2.01       3.72      -2.55      -4.86         NA
The following time series plot shows the original time series. It was plotted for
comparison with the time series plot of the series differenced at order 1.
> plot(IRTS, xlab="Index", ylab="Values",
+      main="Time Series Plot - without differencing")
> points(IRTS, col="red", pch=20)
[Figure: Time Series Plot - without differencing; x-axis: Index (2011, 2012), y-axis: Values]
[Figure: time series plot of the series differenced at order 1; x-axis: Index (2011, 2012)]
2.5
This section focuses on interpolating or predicting the missing values of a time
series. There are five main methods for doing so:
1. na.locf: this method replaces each NA with the most recent non-NA value prior
to it. Each NA can instead be replaced with the next non-NA value by setting the logical
argument fromLast to TRUE.
2. na.fill: this function fills NA values or specified positions. It
mainly needs two arguments. The first is the object that
was created as a zoo series. The second is the argument fill, which is a three
component list or a vector that is coerced to a list. The three components represent the fill value to the left of the data, within the interior of the data and
to the right of the data, respectively. The value of any component may be the
keyword "extend" to indicate repetition of the leftmost or rightmost non-NA
value or linear interpolation in the interior.
3. na.approx: replaces NA values via linear interpolation.
4. na.spline: replaces NA values via cubic spline interpolation.
5. na.StructTS: fills NA values using a seasonal Kalman filter. It is ideal when the
time series data has seasonal variation and mainly works with regular
time series, as the input object should have a frequency.
The following code demonstrates each of the interpolation methods discussed above. However, the 5th method, estimation via the seasonal Kalman filter, does not work with
this particular irregular time series, as there is no frequency present in the data.
To compare the alternative interpolation methods, the 3rd
value of the time series was changed to NA. The 3rd value of the series was then
estimated using each of the interpolation methods, and the estimated values are
shown in the time series plot given below.
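The comparison described above can be sketched on a small series as follows. The six observations, their dates and the result names are assumptions for illustration; only the name newIRTS is taken from the code in this section. na.StructTS is left out because the series has no frequency.

```r
library(zoo)

# Six illustrative observations at irregular day offsets (assumed values)
dates <- as.Date("2010-01-01") + c(0, 3, 4, 8, 15, 16)
IRTS  <- zoo(c(20, 16, 18, 22, 19, 21), order.by = dates)

# Copy the series and blank out the 3rd value
newIRTS <- IRTS
coredata(newIRTS)[3] <- NA

v_locf   <- na.locf(newIRTS)                   # previous value carried forward
v_nocb   <- na.locf(newIRTS, fromLast = TRUE)  # next value carried backward
v_fill   <- na.fill(newIRTS, fill = "extend")  # interior gaps interpolated
v_approx <- na.approx(newIRTS)                 # linear interpolation along the dates
v_spline <- na.spline(newIRTS)                 # cubic spline interpolation

# Estimates of the 3rd value next to the actual value
c(actual    = as.numeric(coredata(IRTS)[3]),
  na.locf   = as.numeric(coredata(v_locf)[3]),
  na.fill   = as.numeric(coredata(v_fill)[3]),
  na.approx = as.numeric(coredata(v_approx)[3]),
  na.spline = as.numeric(coredata(v_spline)[3]))
```

na.approx takes the spacing of the dates into account: the gap lies one day after an observation of 16 and four days before an observation of 22, so the estimate is 16 + (1/5) * 6 = 17.2.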
> #na.StructTS(newIRTS)
[Figure: time series plot comparing the 3rd value estimated by na.locf, na.fill, na.approx and na.spline with the actual value; x-axis: Index (2011, 2012), y-axis: Value]
2.6
Package zoo allows different types of plots to be drawn, whether the series is regular or irregular.
There are three main plotting functions:
plot(): the usual plot function in R, which has already been used in
this document for the time series plots under the differencing function.
ggplot(): requires the package ggplot2. Here we need to specify three main
arguments. aes represents the axes. data can be specified using the function
fortify, which takes a zoo object and converts it into a data frame, and can
be used as given in the R code below. Then we need to specify geom whether