
Andreas Lorenz, Statistics Department, Deutsche Bundesbank

Revisions analysis and the role of metadata*


Contribution to the OECD/Eurostat Task Force on Performing Revisions Analysis for Sub-Annual Economic Statistics

Abstract

Revisions analysis has become a widely used tool for describing many dimensions of the quality of economic indicators, such as the reliability of first estimates and the size and volatility of later revisions. An often neglected aspect in this exercise is the role that metadata plays in the calculation and use of the results of revisions analyses. The purpose of this paper is to highlight the importance of metadata when carrying out and using revisions analysis. This is done in the form of a case study based on real-time data of the German Index of Industrial Production for the period 1999 to 2007, comprising unadjusted and seasonally adjusted vintages. Changes in the revision pattern can be traced back to changes in the collection and compilation methods and to some extraordinary single events in the period under investigation. Furthermore, it can be seen that, without metadata, past revisions analysis may be misleading when making inferences about future revisions.

Keywords: revisions analyses, real-time data, metadata, industrial production.

JEL classification: C19, C80, C82.

Contact: andreas.lorenz@bundesbank.de

* This paper presents the author's personal opinions and does not necessarily reflect the views of the Deutsche Bundesbank or its staff. The joint OECD/Eurostat Task Force was established to develop a set of guidelines and best practices for performing and using the results of revisions analysis.


1 Introduction

Revisions analysis has become a widely used tool for characterising many dimensions of the quality of economic indicators, such as the reliability of first estimates and the size and volatility of later revisions. The increasing availability of real-time databases1 and of user-friendly tools2 for performing revisions analyses is fostering their widespread use. Often, the results of such analyses are used to assess the quality of official statistics and to make comparisons across countries. Moreover, they are now commonly used for building expectations about future revisions, and this influences current analyses and forecasts of economic developments.3

An often neglected aspect in these exercises is the role that metadata plays in performing revisions analyses and in the use of their results. Perhaps the reason is that metadata are sometimes referred to merely as data that act as identifiers and descriptors of the data and that are needed to identify, use and process data matrices and cubes.4 Such a view is, of course, much too narrow. According to internationally agreed standards, metadata also contain information about the methods used in the collection and generation of data.5 In the field of economic statistics, this includes information about the methodology underlying the collection and compilation of economic indicators, information about the quality of the first and consecutive releases (eg the number of missing values in a provisional release) as well as information regarding the revisions policy.

The purpose of this paper is to highlight the importance of metadata for performing and using revisions analyses. This is done in the form of a case study based on real-time data of the German Index of Industrial Production for the period 1999 to 2007, comprising unadjusted and seasonally adjusted vintages. For this indicator, a revisions analysis will be performed under the assumption that the user does not have any information about the data other than that contained in the figures themselves. The results will then be contrasted with the ideal case in which the user has knowledge of the most important metadata. It will be shown that the conclusions derived from the revisions analysis depend heavily on the metadata provided.

The outline of the paper is as follows. Section 2 describes the origin of the real-time data set that is used for the case study. Section 3 contains a revisions analysis for both of the aforementioned assumptions about the information set available to the user. Section 4 concludes.
1 See McKenzie (2006) and the website of the OECD/Eurostat Task Force (http://www.oecd.org/document/10/0,3343,en_2649_34257_39129226_1_1_1_1,00.html) for a listing of various publicly accessible real-time databases.
2 See the tool developed by the OECD/Eurostat Task Force, available via the following internet site: http://www.oecd.org/document/27/0,3343,en_2649_34257_40010971_1_1_1_1,00.html.
3 An example is given in the article "Odd numbers" from The Economist of January 31st 2008, which uses the results of revisions analysis in this way, as illustrated by the following excerpt: "Between 1994 and 2004 - years for which figures are no longer likely to be much updated - the average (annualised) revision to the growth rate between the advance and the latest figure was 1.3 percentage points. Recently revisions have tended to be downwards. In the past five years 60% of initial estimates were later restated at a lower rate."
4 For example, the International Organization for Standardization (ISO) definition of metadata is as general as "data that defines and describes other data" (ISO/IEC 11179-1, 1999(E): Information technology - Specification and standardisation of data elements - Part 1: Framework for the specification and standardisation of data elements, First edition 1999-12-01).
5 See Data and Metadata Reporting and Presentation Handbook, OECD (2007), p. 75.


2 Data issues

For the purpose of the case study, it would be helpful to base the analysis on unadjusted as well as on seasonally adjusted data. Since some major methodological changes in the compilation of the production statistics took place at the beginning of 1999, it would also be helpful if the vintages started at that date at the latest and ran up to the present. Ideally, the data should be extracted from a comprehensive real-time database.6 While the Bundesbank has developed a real-time database covering a broad selection of economic indicators, including production statistics, it only began storing the vintages from November 2005 onwards.7 Therefore, other sources had to be used for the vintages published before that date. Seasonally adjusted vintages8 from January 1999 to October 2005 were taken from the Statistical Supplement 4 to the Monthly Report, a periodical which is published by the Bundesbank on a monthly basis and includes, among other things, seasonally adjusted time series of production statistics. The unadjusted data used in the analysis were taken from a print publication of the FSO.9

The series analysed is production in the manufacturing sector. The manufacturing sector differs from the industry sector as defined in the Statistical Supplement of the Bundesbank, which comprises the manufacturing sector excluding energy as well as the mining and quarrying sector excluding energy-producing materials. Unfortunately, unadjusted vintages for industrial production according to this definition are not available for the period before November 2005. Therefore, the closest equivalent unadjusted aggregate series from the FSO publication was used.10 Although the two series differ in their composition regarding the inclusion or exclusion of energy, the discrepancies are small due to the relatively low weight of the energy sector in Germany. This difference matters little here because the focus of the exercise is not to decompose revisions into revisions of the unadjusted data and revisions of the seasonal and calendar factors, a case in which the difference could indeed be significant.

Some further caveats are necessary. The fact that the bulk of the data had to be entered manually from print publications (particularly seasonally adjusted data and unadjusted data for vintages up to October 2005) makes the resulting real-time data set prone to typing errors. Although the data were double-checked, the data collection process for the present case study was not subject to the same scrutiny as the figures published in the Statistical Supplement regularly are.

Finally, the resulting real-time data set does not have a purely symmetric triangular shape. Such a symmetric form would result if a) the publication schedule of the producer of the data had the same frequency as the periodicity of the indicator and b) the figures were released regularly for each reporting period. Neither was the case in the period of investigation.
6 For some recommendations on data and metadata requirements for building a real-time database to perform revisions analysis of real-time data, see McKenzie and Gamba (2008a).
7 The real-time database of the Deutsche Bundesbank will be made publicly available via the internet (http://www.bundesbank.de/index.en.php) in the course of the year 2008.
8 The seasonal adjustment includes a calendar adjustment procedure.
9 Statistisches Bundesamt, Fachserie 4, Reihe 2.1, various issues.
10 Actually, in December 2003 the FSO began publishing an equivalent aggregate series. However, in order to avoid a break in the series, the closest equivalent series is used for the whole period of investigation.


Data for January 1999 were not released in March (as was the case in later years), but together with the figures for February in May of the same year. As a result, the shape of the real-time data set deviates from a symmetric triangle. Furthermore, until the end of 2005 the production statistics were released twice a month. As the Statistical Supplement and the print publication of the FSO that was used to collect the unadjusted data have a monthly frequency, this second vintage within a month is missing from the assembled real-time data set.11 However, the second release was usually only a correction of the first provisional release on the basis of the data from late respondents. Its information content is therefore included in the vintage of the following month, which contains the revision to the provisional release of the previous reporting month, the new provisional figure of the current reporting month and, possibly, revisions of figures for other reference periods (more details regarding the typical revisions cycle are given in section 3.2). This procedure was changed in 2006, when the FSO began to publish the revision to the first estimate together with the estimate for the following month, so that there is now only one release per month by the FSO. This means that the preliminary release for the reporting month and the first revision of the preliminary release for the previous month are published at the same time in one vintage.
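Because the analysis below works entirely with such a collection of vintages, the following minimal Python sketch illustrates one common way of organising a real-time data set: reference periods in the rows and release (vintage) dates in the columns, so that the first release of each period and any later estimate can be picked out directly. The values, periods and helper names are invented for illustration and do not reproduce the IIP figures or the structure of the Bundesbank's real-time database.

```python
import pandas as pd

# Hypothetical vintages of a monthly index: rows = reference periods,
# columns = release (vintage) dates. Values are illustrative only.
vintages = pd.DataFrame(
    {
        "2005-11": [101.2, None,  None],
        "2005-12": [101.5, 102.0, None],
        "2006-01": [101.4, 102.3, 103.1],
    },
    index=pd.PeriodIndex(["2005-10", "2005-11", "2005-12"], freq="M",
                         name="reference_period"),
)

# First release of each reference period: the earliest non-missing entry in its row.
first_release = vintages.apply(lambda row: row.dropna().iloc[0], axis=1)

# Estimate available k vintages after the first release (here k = 1), if already published.
def later_estimate(row, k=1):
    known = row.dropna()
    return known.iloc[k] if len(known) > k else None

second_estimate = vintages.apply(later_estimate, axis=1)

print(first_release)
print(second_estimate)
```

In this layout the diagonal of first releases and the incomplete lower-right corner make the (asymmetric) triangular shape discussed above directly visible.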

3 Revisions analysis and the role of metadata

While the applications of revisions analysis are manifold, its usefulness can be seen from two basic perspectives. From the point of view of the producer, it is primarily an instrument for quality monitoring. From the user perspective, it is helpful for building expectations about future revisions of provisional figures in order to come to a better understanding of the current economic momentum and to inform flash estimates or forecasts. The following case study aims to investigate the role that knowledge of metadata can play when interpreting the results of revisions analysis from both perspectives.

Some words are in order regarding what kind of metadata is relevant for the case study. In recent years, a number of different initiatives have been involved in the development of standards as to what is to be considered an element of the metadata dimension of time series. One of these initiatives, the Statistical Data and Metadata Exchange (SDMX) initiative, specifies the term in more detail and distinguishes between structural and reference metadata. Structural metadata are metadata that act as identifiers and descriptors of the data and are needed in order to identify, use and process data matrices and cubes. Reference metadata include a) conceptual metadata, describing the concepts used and their practical implementation, allowing users to understand what the statistics are measuring and, thus, their fitness for use; b) methodological metadata, describing the methods used for the generation of the data (eg sampling, collection methods, editing processes); and c) quality metadata, describing the different quality dimensions of the resulting statistics (eg timeliness, accuracy).12

For the case study, the focus is on methodological metadata. But before summarising the most important metadata, the exercise will focus on the (not so unrealistic!) case in which the user only has the information contained in the vintages themselves.
11 In the real-time database mentioned in footnote 7, beginning with November 2005 each vintage is stored on the day it is released by the FSO.
12 See SDMX (2006).


The purpose is to show the limits of the interpretation of revisions analysis in the absence of metadata. An alternative analysis, together with a presentation of the metadata, will be given in sub-section 3.2.

3.1 Revisions analysis without metadata

The revisions analysis will focus on month-on-previous-month (mom) growth rates calculated on the basis of the seasonally adjusted vintages. The aim is to answer two questions:

1. What does revisions analysis tell us about the extent of revisions over different revision intervals (beginning with the preliminary releases and their consecutive revisions)?
2. Can this information be used to build expectations about future revisions of preliminary estimates?

The answer to the first question gives insight into the effect of revisions on an indicator for a particular reporting period and is of particular value for producers of official statistics in the process of quality control. The usefulness of revisions analysis for most consumers of economic indicators presumably depends on its suitability for answering the second question, ie on whether it provides information for building expectations about future revisions. Of particular interest is how much the information contained in the data themselves contributes to the quality of these expectations.

With both questions in mind, the next step is to select an adequate revision measure from the standard possibilities mentioned in McKenzie and Gamba (2008b). A useful measure of the size of the revisions is the Mean Absolute Revision (MAR), which avoids the offsetting of positive and negative revisions:
\[ \mathrm{MAR} = \frac{1}{n}\sum_{t=1}^{n} \left| L_t - P_t \right| = \frac{1}{n}\sum_{t=1}^{n} \left| R_t \right| \qquad (1) \]

L_t denotes the later estimate, P_t the preliminary (or earlier) estimate, R_t = L_t - P_t the revision, and n the number of observations. As the absolute revision can vary in proportion to the level or the mom growth rate of the indicator, it may be helpful to scale the MAR by the size of the estimates when making comparisons over time. (This is also useful for international comparisons because it adjusts for the differing average size of estimates across countries.) The MAR adjusted in this way, the relative mean absolute revision (RMAR), can also be interpreted as a measure of the robustness of first published estimates, as it gives the expected percentage of the first published estimate that will be revised over the revision interval under consideration:


\[ \mathrm{RMAR} = \frac{\sum_{t=1}^{n} \left| L_t - P_t \right|}{\sum_{t=1}^{n} \left| L_t \right|} = \frac{\sum_{t=1}^{n} \left| R_t \right|}{\sum_{t=1}^{n} \left| L_t \right|} \qquad (2) \]
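As an illustration of how equations (1) and (2) are applied, the following Python sketch computes the MAR and the RMAR for two short, invented vectors of preliminary and later estimates of mom growth rates. The figures are purely illustrative and do not correspond to the IIP data used below; the RMAR follows the scaling by the later estimates used in equation (2) as reconstructed above.

```python
import numpy as np

# Hypothetical preliminary (P) and later (L) estimates of mom growth rates, in percent.
P = np.array([0.3, -0.1, 0.5, 0.2, -0.4, 0.1])
L = np.array([0.4,  0.0, 0.4, 0.5, -0.3, 0.2])

R = L - P                                       # revisions, R_t = L_t - P_t

MAR  = np.mean(np.abs(R))                       # equation (1)
RMAR = np.sum(np.abs(R)) / np.sum(np.abs(L))    # equation (2)

print(f"MAR  = {MAR:.3f} percentage points")
print(f"RMAR = {RMAR:.3f}")
```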

Figure 1 shows a yearly breakdown of the RMAR between estimates of the IIP mom rates at various revision intervals. One result is that the size of the revisions typically increases with the length of the interval being analysed. This is seen most clearly when the averages are calculated for the years 1999 to 2006. With the exception of the year 2005, the RMAR after one month is usually similar to the RMAR values calculated after 6, 12 or 24 months. A key message of figure 1 is that, on average, more than 45% of the initial mom growth rate is revised after 12 months or more.
Figure 1: Relative mean absolute revision (RMAR) between estimates of the IIP at various revision intervals for mom growth rates
Seasonally adjusted, percentage points
[Bar chart by year, 1999-2006 and the 1999-2006 average; series: RMAR between first estimate and 1, 6, 12 and 24 months later]

*) MAR between first estimate and 24 months later not yet available for 2006 at the time of writing.

Source: Own calculations based on seasonally adjusted vintages.

While the RMAR gives a good impression of the overall size of the revisions regardless of their sign, it does not indicate whether revisions to first releases of different reporting periods tend to cancel out over the revision interval under consideration. One measure that gives such an indication is the Mean Revision (MR): positive and negative deviations of the same amount cancel out, so the MR indicates the net effect of subsequent revisions on a time series. It is calculated according to the following formula (McKenzie and Gamba, 2008b):
\[ \bar{R} = \frac{1}{n}\sum_{t=1}^{n} \left( L_t - P_t \right) = \frac{1}{n}\sum_{t=1}^{n} R_t \qquad (3) \]
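A corresponding sketch for equation (3), using the revisions implied by the invented figures above, shows the difference between the signed mean revision and the mean absolute revision:

```python
import numpy as np

# Hypothetical revisions R_t = L_t - P_t, in percentage points (same invented data as above).
R = np.array([0.1, 0.1, -0.1, 0.3, 0.1, 0.1])

MR  = np.mean(R)           # equation (3): signed revisions, offsetting is possible
MAR = np.mean(np.abs(R))   # equation (1), for comparison

print(f"MR  = {MR:.3f} pp (net direction of revisions)")
print(f"MAR = {MAR:.3f} pp (average size regardless of sign)")
```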


This indicator answers the question: is the average level of revision close to zero, or is there an indication that revisions go more in one direction than the other, suggesting a possible bias in the initial estimate? A breakdown of the mean revisions of the IIP by year is given in figure 2. Again, the measure is calculated for the mom growth rate of the first preliminary release of the IIP and its values 1, 6, 12 and 24 months later.
Figure 2: Mean revision between estimates of the IIP at various revision intervals for mom growth rates
Seasonally adjusted, percentage points
[Bar chart by year, 1999-2006 and the 1999-2006 average; series: mean revision between first release and 1, 6, 12 and 24 months later]

*) Mean revision between first estimate and 24 months later not yet available for 2006 at the time of writing.

Source: Own calculations based on seasonally adjusted vintages.

The following main aspects can be observed:

1. All in all, the mean revisions are below 1/4 percentage point (pp), apart from 1999.
2. The first mean revision to the mom rate, which takes place between the first estimate and the figure released one month later, is positive, with the exception of the year 2000.
3. The extent of the revision taking place after 6 months is much higher than that after 1 month. Sometimes most of the revisions are seen only 12 or even 24 months later.


4. When averaging over the whole period under observation, the positive mean revision of the mom rate after one month is 0.1 pp after rounding, while the mean revision after 6, 12 and 24 months is well above 0.1 pp.13

In summary, on the basis of the average revision between the preliminary release and the figure released one month later in the years 1999 to 2006, the user may expect an upward revision of the mom rate of 0.1 pp after rounding. (Recall that the exercise assumes that the user does not have any information other than that contained in the vintages.) The source of the revisions can be revisions to the unadjusted data as well as revisions to the seasonal factors.

3.2 Revisions analysis with metadata

What difference would knowledge of the metadata make to the user in understanding past revisions and forming expectations about future revisions? To answer this question, the following section summarises the most important methodological issues and idiosyncratic factors in the period under investigation. Afterwards, changes in the metadata (such as changes in the compilation process of the production statistics) will be compared with historical changes in the revisions regime. Finally, the implications of this exercise will be considered, especially the inclusion of these metadata in the information set used for forming expectations about future revisions.

3.2.1 Methodology for collecting and compiling the production statistics

The following section gives a summary of some important methodological issues regarding the compilation of the production statistics.

In the year 1999, a new survey method was introduced for the compilation of the production statistics. With the aim of lowering the statistical burden on the enterprises, the full reporting sample of the production survey was split into mutually exclusive quarterly and monthly reporting sub-samples. For the monthly reporting sub-sample, in each of the Länder (the German federal states) the largest production units of the economic sectors, covering at least 75% of the sector output produced by firms with 20 or more employees, were obliged to submit a monthly production report. This ensured a national coverage in excess of 80%.14 While the largest firms had to report monthly, the smaller firms only had to report at quarterly intervals.

In order to give a closer representation of the monthly production of all enterprises, the monthly figures were benchmarked against the results from the full quarterly sample comprising small and large firms. This was done by comparing the results from the quarterly survey with the quarterly aggregate computed on the basis of the monthly figures and calculating an alignment factor. This factor can only be calculated ex post, ie about 2 months after the end of the past quarter. For pre-adjusting the most recent monthly figures, the alignment factor of the past quarter was used as an estimate of the current alignment factor (and in the first quarter of year t, the factor of the first quarter of year t-1 was used).

13 When applying a modified t-test, the mean revision after 1 month is marginally significant at the 15% level; the mean revision after 6, 12 and 24 months is significant at the 5% level.
14 See Bald-Herbel (2000), Herbel and Weisbrod (1999) and Jung (2003).


After the information from the most recent quarterly survey becomes available, the provisionally used alignment factor is replaced with the factor calculated from the most recent quarterly data, and the pre-aligned figures of the respective quarter are revised accordingly. Since this second revision of the monthly figures is based on information from the quarterly survey, it is called the quarterly revision.15 It affects the three months of the respective quarter. After the conclusion of the quarterly report for the final quarter of the year, a third revision, the so-called annual revision, of all monthly figures of the respective year is performed. After this third revision, the IIP is considered final. (After that, further changes can be made in the course of rebasing or back-calculation according to a new classification system.16)

The provisional monthly figures are published according to a schedule fixed in advance for the entire year; publication was scheduled about 37 days after the end of the reporting period (t+37). Since not all monthly reporting enterprises submitted their reports in time for the provisional release (about 10% were usually missing), the first revision to the provisional figure took place after further monthly reports had been received in the same publication month (about t+57 to t+62 days). This revised figure is the result of incorporating the data of late respondents. The yearly revision took place about 2 months after the last quarter of the year. The three later releases for the same month (the corrected figure, the quarterly revision and the yearly revision) do not follow an exact pre-defined schedule.17

Further insight into the revisions process is gained by taking a closer look at the imputation practice for missing values in the provisional release. With respect to the origins of the provisional figure, the local reporting units of the enterprises which report monthly have to submit their output figures after the end of the reporting month to the statistical office of the respective federal state. If the production data are not available on time, the local reporting units should provide a provisional estimate. Only in cases where the local unit does not submit any figure at all on time does the statistical office of the respective federal state impute the missing data by using the figure reported by the same unit in the previous month.18

The development of the monthly figures may diverge from that of the quarterly figures for a number of reasons. As the reporting units for the monthly survey are selected only once per year and not updated for the following eleven months, a dying-out sample is to be expected: any closure of a business, eg due to bankruptcy, immediately diminishes the size of the sample, whereas new enterprises will only enter the sample during the annual update.19

15 From November 1999 to October 2006, the estimated provisional adjustment of monthly industrial production was stated in a footnote by the Bundesbank in its Statistical Supplement No. 4 to the Monthly Report, Seasonally Adjusted Business Cycle Statistics.
16 The treatment of such benchmark revisions in revisions analysis has been investigated by Knetsch and Reimers (2006).
17 In the Bundesbank publication Statistical Supplement No. 4 to the Monthly Report, revisions of original data are flagged with an "r".
18 This imputation procedure implies that if the previous month was a month with many working hours, whereas the reporting month is a month with few working hours (eg due to public holidays), then the use of the figure of the previous month would overestimate the figure of the current month in a way that is systematically dependent on the calendar constellation.


However, the quarterly figures are based on a survey that is updated regularly, which partly explains why, apart from the lower cut-off rate, they may diverge from the aggregated monthly figures. While the aforementioned aspects point to recurrent characteristics of the compilation process, there are also period-specific exceptional revisions in some months, quarters and even years.
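To make the benchmarking step described above more tangible, the following sketch works through the alignment of hypothetical monthly figures to a quarterly survey total. All figures, including the previous quarter's factor, are invented, and the exact way the factor is applied is an illustrative assumption rather than the FSO's actual procedure.

```python
# Hypothetical unadjusted monthly figures from the monthly sub-sample (one quarter).
monthly_reported = [100.0, 98.0, 104.0]

# Quarterly total from the full quarterly survey (small and large firms), known only ex post.
quarterly_survey_total = 317.0

# Alignment factor: ratio of the quarterly survey result to the quarterly
# aggregate of the monthly figures.
alignment_factor = quarterly_survey_total / sum(monthly_reported)

# At the current end, the factor of the previous quarter is used as an estimate;
# once the quarterly survey becomes available the months are re-aligned, which
# produces the "quarterly revision" described in the text.
factor_previous_quarter = 1.032                      # assumed estimate used in real time
pre_aligned = [m * factor_previous_quarter for m in monthly_reported]
re_aligned  = [m * alignment_factor for m in monthly_reported]

quarterly_revision = [round(new - old, 2) for old, new in zip(pre_aligned, re_aligned)]
print("alignment factor (ex post):", round(alignment_factor, 4))
print("quarterly revision per month:", quarterly_revision)
```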

3.2.2 Revisions analysis of unadjusted data

In order to uncover the revisions to the unadjusted data in their pure form, the following revisions analysis is carried out on the unadjusted data. This allows the revisions to be seen more directly, unaffected by the update of seasonal factors. Furthermore, the analysis is performed directly on the levels, not on a transformation such as the mom growth rate used in the previous section. It is useful to see the absolute size of the revisions that take place a) over the whole interval between the provisional release and the revised figure released in the yearly correction, as well as b) over the different revision intervals between these two points in time (ie provisional to first correction, first correction to quarterly correction, quarterly correction to yearly correction). In order to disentangle the absolute size of the revisions over the whole interval and those across the relevant revision sub-periods, a measure of Cumulative Absolute Revisions (CAR) is computed according to the following formula:
\[ \mathrm{CAR}_i = \sum_{j=1}^{3} \left| L_{ij} - P_{ij} \right| \qquad (4) \]

where i denotes the month and j the revision (first, quarterly and yearly).20 These incremental absolute revisions in a particular month can be visualised with a stacked column chart, whose segments are the incremental absolute revisions and whose total height is their accumulation, ie CAR_i. Such a stacked column chart gives an impression of the absolute size of the revisions over the whole revision interval from the first provisional release to the last revision and also reveals how the overall absolute revision is distributed across the respective sub-periods.

19 Actually, even then, new enterprises would only enter the sample with a time lag of at least 2 to 3 years, because the sampling universe is the enterprises register, which is, in turn, updated with a time lag of 1 to 2 years using secondary sources such as administrative data from the tax authorities.
20 For example, in this formula the incremental absolute revision of the quarterly revision (j=2) is defined as the absolute value of the figure from the quarterly correction minus the figure from the first revision (ie the figure published 20 to 25 days after the provisional release). The CAR_i of a particular month i is simply the sum of the incremental absolute first, second and third revisions of the first provisional release of the respective reporting month.
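As a small illustration of equation (4), the following sketch computes the incremental absolute revisions and their sum for one hypothetical reporting month. The release sequence mirrors the revisions cycle described above (provisional release, first correction, quarterly revision, yearly revision); the relative version shown in figures 3 to 7 is assumed here to be the CAR expressed as a percentage of the preliminary figure, as the figure subtitles suggest.

```python
# Hypothetical sequence of releases for one reporting month i (unadjusted index levels):
# provisional release, first correction, quarterly revision, yearly revision.
releases = [100.0, 100.6, 99.8, 100.1]

# Incremental absolute revisions |L_ij - P_ij| for j = 1, 2, 3 (equation 4).
incremental = [abs(later - earlier) for earlier, later in zip(releases, releases[1:])]

CAR_i = sum(incremental)

# Relative version, assumed here to be expressed in percent of the preliminary figure.
CAR_i_relative = 100 * CAR_i / releases[0]

print("incremental absolute revisions:", incremental)   # segments of the stacked column
print("CAR_i:", round(CAR_i, 2))                         # total height of the column
print("CAR_i in percent of preliminary figure:", round(CAR_i_relative, 2))
```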


In order to show how the CAR evolves over time and to trace changes in the metadata back to specific periods, figures 3 to 7 depict the CAR on a monthly basis for the years 1999 to 2007.

Figure 3: Cumulative relative absolute revisions by reporting period, 1999-2000
Unadjusted data, percent of preliminary figure
[Stacked column chart by month; segments: preliminary release to first revision, first to quarterly revision, quarterly to yearly revision]

Source: Own calculations based on unadjusted vintages published by the Federal Statistical Office.

As can be seen in figure 3, the revisions in the year 1999 (the year of the change in the collection and compilation methodology) are very high. When looking at the decomposition of the CAR by revision interval, the key message is that in the year 1999 the absolute revision from the first revised figure to the quarterly correction is much higher than the revisions at the other intervals. The reason for this is the change in the compilation method. Since the information from the quarterly production survey was not available for the pre-alignment of the monthly indices at the beginning of that year, no pre-alignment took place, which in turn led to high quarterly revisions.21 Only in the course of the year did the results of the quarterly survey become available for use in the monthly indices. From 2000 onwards, the information from the quarterly output survey could be used for the monthly index calculation. Accordingly, the revision from the first correction to the quarterly adjustment is much smaller, as illustrated in figures 3 and 4.

At the beginning of 2002, the update of the sample of the monthly production survey was interrupted due to the introduction of the new Product Classification for Production Statistics, Edition 2002 (GP 2002). Accordingly, it was not possible to add the newly eligible enterprises to the sample of the monthly survey. This resulted in greater revisions between the corrected figure and the quarterly survey (which, contrary to the monthly figures, covers the more comprehensive group of enterprises).
21 See Jung (2003), p. 824.


Figure 4: Cumulative relative absolute revisions by reporting period, 2001-2002
Unadjusted data, percent of preliminary figure
[Stacked column chart by month; segments: preliminary release to first revision, first to quarterly revision, quarterly to yearly revision]

Source: Own calculations based on unadjusted vintages published by the Federal Statistical Office.

Figure 5: Cumulative relative absolute revisions by reporting period, 2003-2004
Unadjusted data, percent of preliminary figure
[Stacked column chart by month; segments: preliminary release to first revision, first to quarterly revision, quarterly to yearly revision]

Source: Own calculations based on unadjusted vintages published by the Federal Statistical Office.


Figure 6: Cumulative relative absolute revisions by reporting period, 2005-2006
Unadjusted data, percent of preliminary figure
[Stacked column chart by month; segments: preliminary release to first revision, first to quarterly revision, quarterly to yearly revision]

Source: Own calculations based on unadjusted vintages published by the Federal Statistical Office.

Up to the end of the year 2004, missing values for the provisional figures were imputed by using previously reported figures. This can lead to systematic over- or underestimation depending on the calendar constellation, as in the case of December 2001. With the reporting month January 2005, the method for imputing missing values was changed. The estimate is now based on the assumption that the mom rate of change for the non-reporting units equals the mom rate of change of the data received within the deadlines (see the sketch below). This contrasts with the previous method for imputing missing values, which was based on the figures of the previous month and was thus dependent on the seasonal and calendar constellation. Simulations show that the new estimation method does not induce an over- or underestimation that varies systematically with the calendar constellation. Hence, the average revisions derived from the past do not yield an expected revision measure for the current end; in other words, the assumption that the past revision regime is still valid no longer holds. For the time being, therefore, no expected correction should be applied at the current end. In particular, it should not be expected that the revision pattern within a year depends systematically on the calendar constellation.

In the year 2006 there was no major change in the methodology. However, in 2007 the compilation of the production index changed again. The Länder now conduct a survey of local manufacturing units with 50 or more employees; there is no cut-off line at the level of the Länder. A quarterly production index is still calculated by aggregating the data reported for three months by the local manufacturing units with 50 or more employees and the production data of the other local units of enterprises with generally 20 or more employees in industry, which are obliged to report quarterly. Furthermore, the enterprise sample for the monthly survey is now updated on a monthly basis.
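The following sketch contrasts the two imputation rules described above for a single non-responding unit. The unit names and figures are invented, and the code is a stylised reading of the procedure rather than the FSO's actual implementation.

```python
# Hypothetical output figures for three reporting units, previous and current month.
previous_month = {"unit_a": 120.0, "unit_b": 80.0, "unit_c": 50.0}
current_month  = {"unit_a": 126.0, "unit_b": 82.0}          # unit_c has not reported yet

# Old rule (up to the end of 2004): carry forward the figure of the previous month.
imputed_old = previous_month["unit_c"]

# New rule (from January 2005): apply the mom rate of the units that reported on time.
reported_prev = sum(previous_month[u] for u in current_month)   # same units, previous month
reported_curr = sum(current_month.values())
growth_on_time = reported_curr / reported_prev
imputed_new = previous_month["unit_c"] * growth_on_time

print(f"carry-forward imputation: {imputed_old:.1f}")
print(f"growth-rate imputation:   {imputed_new:.1f} "
      f"(mom rate of on-time units: {growth_on_time - 1:+.1%})")
```

In this stylised example the two rules differ by the growth of the on-time respondents; under the old rule this difference varied systematically with the calendar constellation, as noted in footnote 18.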

Page 13 of 17

As the whole revisions cycle for the figures of the year 2007 was not yet finished at the time of writing, figure 7 only shows the measures that can be calculated on the basis of the available data. The results suggest that, in particular, the revision from the first revised figure to the quarterly correction is indeed smaller in absolute size than the average of the previous years.

Figure 7: Cumulative relative absolute revisions by reporting period, 2007
Unadjusted data, percent
[Stacked column chart by month, January to November 2007; segments: preliminary release to first revision, first to quarterly revision, quarterly to yearly revision]

Note: *) Quarterly correction for Q4 as well as preliminary to first revision for the month of December and the yearly correction of 2007 not available at the time of writing.

Source: Own calculations based on unadjusted vintages published by the Federal Statistical Office.

3.2.3 Revisions analysis of seasonally adjusted mom growth rates

Along with information about the methodology underlying the collection and compilation of the production statistics, the metadata also comprise information about the timing of revisions. This allows the mean revision to be calculated for different revision intervals for the mom rates of the IIP. Recall that in section 3.1 the mean revision was calculated after 1, 6, 12 and 24 months. Since the revision intervals do not exactly match the time lag in months, the procedure in section 3.1 meant that the mean revision between the provisional release and the release 6 months later would in some cases include the yearly correction and in some cases not. Knowledge of the exact dates of these revisions makes it possible to calculate the mean revisions for the exact revision intervals. The resulting mean revisions between actual revision intervals are depicted in figure 8. Negative mean revisions across all intervals can be seen in the year 2000 and, regarding the revision from the first estimate to the quarterly revision, in the year 2003. Usually, the bulk of the revisions takes place after the first month; sometimes, as in the years 1999, 2002 and 2006, they take place only with the quarterly revision. When averaging over the years 1999 to 2006, the mean revision between the first revised figure and the quarterly correction, in particular, is positive and higher than in the year 2007, when the monthly update of the reporting sample was introduced. On average, the yearly revision does not add much information.
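A minimal sketch of the difference this makes: instead of pairing the first release with whatever value is published a fixed number of months later, each reference month is paired with the vintages in which its quarterly and yearly revisions were actually published, as identified from the metadata. The release dates and values below are invented; only the selection logic matters.

```python
import pandas as pd

# Hypothetical real-time mom growth rates (percent) for reference month January 2004,
# indexed by vintage (release month). Values are invented.
january = pd.Series(
    {"2004-03": 0.4,   # first (provisional) estimate
     "2004-04": 0.5,   # first correction
     "2004-06": 0.7,   # vintage containing the quarterly revision (from the metadata)
     "2005-03": 0.6},  # vintage containing the yearly revision (from the metadata)
    name="2004-01",
)

first_estimate = january.iloc[0]

# Fixed-lag measure as in section 3.1: the latest value published no later than
# 6 months after the first estimate (here the 2004-06 vintage happens to qualify).
fixed_lag_6m = january.loc[:"2004-09"].iloc[-1]

# Metadata-based measure: revisions up to the vintages identified as quarterly
# and yearly corrections.
rev_quarterly = january["2004-06"] - first_estimate
rev_yearly    = january["2005-03"] - first_estimate

print("revision up to the quarterly correction:", round(rev_quarterly, 2))
print("revision up to the yearly correction:   ", round(rev_yearly, 2))
print("value standing 6 months after first release:", fixed_lag_6m)
```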


Figure 8: Mean revision between estimates of the IIP at various revision intervals for mom growth rates
Seasonally adjusted, percentage points
[Bar chart by year, 1999-2007 and the 1999-2006 average; series: mean revision between first release and 1 month later, between first estimate and quarterly revision, and between first estimate and yearly revision]

*) Quarterly correction for Q4 as well as preliminary to first revision for the month of December not available. Yearly correction for the year 2007 not available at the time of writing.

Source: Own calculations based on data from the Federal Statistical Office.

3.3 Summary

Awareness of the causes of revisions across different intervals over the revisions cycle is helpful for improving compilation methods. As the case study for the German index of industrial production shows, the official statistical institutes have clearly drawn lessons from the revisions history and improved the methods for imputing missing values. The revision measures calculated on the basis of unadjusted levels for each release period and for different revision intervals reveal that revisions are not constant from year to year or between months within a given year. Changes in the revision pattern can be traced back to changes in the collection and compilation methods and to single events such as the interruption of the update of the enterprise survey in the year 2002.

While the exercise is helpful for interpreting the results of historical revisions analyses, it also provides useful information for forming expectations about future revisions. For example, up to the end of the year 2004 the missing values for the provisional figures were imputed by using previously reported figures. However, this can lead to systematic over- or underestimation depending on the calendar constellation, as in the case of December 2001.


The method for imputing missing values in the provisional release was changed at the beginning of 2005. Now the rates of change of the units reporting on time are used as estimates for the rates of change of the non-respondents. This procedure does not depend on the calendar constellation, so an over- or underestimation varying systematically with the calendar constellation is not to be expected from 2005 onwards.

Summarising, with knowledge of the metadata, the user would have adjusted his expectations of the revisions process in the years 1999 to 2007 as follows:

1. He would have treated the year 1999 as an outlier in the revisions history, knowing that the information from the quarterly survey was not available at the beginning of that year and that the pre-alignment factors could not be calculated. On similar grounds, he would have treated the historical revisions available at the beginning of 2002 with caution when building expectations about the revisions for the upcoming months, since the update of the enterprise sample was interrupted in that year due to the introduction of the new Product Classification for Production Statistics, Edition 2002 (GP 2002).
2. He would have taken into account the change in the estimation procedure for imputing missing values in the provisional release figures from January 2005 onwards. This information would speak against a systematic relationship between the revision of the provisional figures after one month and the calendar properties affecting output in German industry.
3. Finally, he would have considered the changeover from a yearly to a monthly update of the reporting sample in the year 2007. Before that change, revisions between the first revised figure and the quarterly correction were to be expected due to the phenomenon of a dying-out sample within the year in the monthly survey. With the change in the sampling method, there is no longer a reason to expect such revisions in the future.

These examples show that revisions analysis is very helpful for understanding the revisions within a given regime. However, the results of a revisions analysis of a past regime may not be suited to assessing the revision pattern prevailing at present or in the future. Without metadata, past revisions analysis may be misleading when making inferences about future revisions.

4 Conclusions

Knowledge of metadata is crucial from both the producer and the user perspective. First, metadata are important for interpreting the results of revisions analyses calculated using historical data. For example, it is important to know the nature of breaks in the series being analysed which are due to changes in the methodology used for collecting and compiling an economic indicator, as the revision measures calculated in such a case will be influenced by the break. Furthermore, knowledge of the publication schedule for revisions, which usually follows the revisions cycle and thus the production process of the statistics, is important for choosing the proper revision intervals over which the revision measures are calculated. Second, knowledge of metadata is crucial for drawing proper inferences from revisions analyses based on historical data when building expectations about future revisions.


Expectations formed on the implicit assumption that the metadata are stable (ie that the methodology underlying the collection and compilation of the statistic is unchanged) may lead to false conclusions about future developments when compilation methods change. All in all, the results show that metadata, an often neglected dimension of performing and using revisions analyses, are in fact a key element for interpreting their results. Ideally, the methodology underlying the economic indicators should be an easily accessible dimension of any comprehensive real-time data set.

References

Bald-Herbel, C. (2000). Erste Erfahrungen mit dem neuen Konzept des Produktionsindex für das Produzierende Gewerbe. Wirtschaft und Statistik 6/2000.

Herbel, N. and Weisbrod, J. (1999). Auswirkungen des neuen Konzepts der Produktionserhebungen auf die Berechnung der Produktionsindizes ab 1999. Wirtschaft und Statistik 4/1999.

Jung, S. (2003). Revisionsanalyse des deutschen Produktionsindex. Wirtschaft und Statistik 9/2003.

Knetsch, T. A. and Reimers, H.-E. (2006). How to treat benchmark revisions? The case of German production and orders statistics. Deutsche Bundesbank Discussion Paper Series 1: Economic Studies, No 38/2006. http://www.bundesbank.de/download/volkswirtschaft/dkp/2006/200638dkp.pdf

McKenzie, R. (2006). Performing Revisions and Real-time Analysis. Introducing the Main Economic Indicators Original Release Data and Revisions Database. OECD Statistics Brief No. 12, 2006. http://www.oecd.org/dataoecd/46/48/37669085.pdf

McKenzie, R. and Gamba, M. (2008a). Data and metadata requirements for building a real-time database to perform revisions analysis. Contribution to the OECD/Eurostat Task Force on Performing Revisions Analysis for Sub-Annual Economic Statistics. http://www.oecd.org/dataoecd/47/15/40315408.pdf

McKenzie, R. and Gamba, M. (2008b). Interpreting the results of revisions analyses: Recommended summary statistics. Contribution to the OECD/Eurostat Task Force on Performing Revisions Analysis for Sub-Annual Economic Statistics. http://www.oecd.org/dataoecd/47/18/40315546.pdf

OECD (2007). Data and Metadata Reporting and Presentation Handbook. http://www.oecd.org/dataoecd/46/17/37671574.pdf

SDMX (2006). Metadata Common Vocabulary. BIS, ECB, Eurostat, IBRD, IMF, OECD, UNSD; available via the SDMX website, www.sdmx.org

The Economist (2008). Odd numbers. January 31st 2008. http://www.economist.com/finance/PrinterFriendly.cfm?story_id=10609385

