
The Identification of Important Innovations

Using Tail Estimators*

Carolina Castaldi# and Bart Los+
#University of Utrecht, Department of Innovation Studies, Faculty of Geosciences, P.O. Box 80115, NL-3508 TC
Utrecht, The Netherlands;

+University of Groningen, Groningen Growth and Development Centre, P.O. Box 800, NL-9700 AV Groningen, The

Preliminary version
Please cite this working paper as:
Castaldi, C. and Los, B. (2008), The Identification of Important Innovations Using Tail Estimators,
ISU Working paper 08.07, Innovation Studies, Utrecht University


International differences in economic performance are often attributed to differences in innovative
performance. Much empirical work supports this contention, but problems in quantifying innovative
output prevent researchers from drawing a clear picture. Innovations are very heterogeneous
regarding their importance, with only very few innovations yielding substantial returns. Citation
frequencies are one measure of the value of innovations. We use a recently introduced technique
based on results from Extreme Value Theory to estimate the characteristics of the tail of the
distribution of citations. We identify important innovations as those that receive a number of
citations higher than the ‘cutoff point’ of the tail of the distributions of citations. The data come from
the NBER Patent-Citations Database. We provide estimates of the proportions of important patents
for 31 technological categories and discuss emerging patterns. Possible implications for technology
policy and innovation management are also drawn.

*The authors thank Bart Verspagen for useful discussions and Colin Webb for offering constructive comments on
peculiarities of the NBER Patent-Citations Database. Sponsorship of the European Commission is gratefully

1. Introduction

International differences in productivity performance are often attributed to differences in
innovative performance. Much empirical work supports this contention, but problems in quantifying
innovative output prevent researchers from drawing a clear picture. Innovations are very
heterogeneous in terms of their importance and innovation projects are risky and costly endeavors, with
only very few innovation projects characterized by substantial returns. Empirical investigations into the
relationships between innovation and productivity growth have gained popularity among academics,
especially after the advent of the so-called “endogenous growth theory” in the late 1980s and early 1990s.
It attributed a paramount role to innovative activities and their outcomes in growth processes. Since the
output of innovation activities is hard to measure, several indicators have been used in such empirical
studies. Patent indicators feature prominently among these.
In a seminal paper, Griliches (1990) argued that the use of patent counts as innovation output
indicators is riddled with problems. One of the most prominent problems is that the actual impact of
patents is extremely heterogeneous, both within and across industries or technology fields. For
instance, many patents do not relate to a substantial innovation over current practice, but are mainly
applied for by the eventual patentee with strategic considerations in mind. In a recent series of papers
and books, citation counts have been used to take “importance” of patents into account. Jaffe and
Trajtenberg (2002), for instance, contains some classic articles in which several indicators of
“importance” were constructed and used to analyze the innovative performance of firms, universities
and other research institutes. Jaffe and Trajtenberg also initiated the construction of the NBER
Patent-Citations Datafile that includes the data to operationalize their importance indicators for
empirical research. The most basic indicator is the unweighted forward citation count: the more
citations a patent receives in subsequently granted patents, the more important it is considered to be.
In this paper, we use the NBER data and Jaffe and Trajtenberg’s (2002) forward citation count
indicator to identify important patented innovations. To distinguish between important and ‘other’
patents, we use a statistical procedure that was recently introduced by Silverberg and Verspagen
(2007). The main feature of this procedure is that it divides the total set of patents into a subset of
non-important patents for which the frequency distribution of forward citations is governed by a
lognormal distribution and a subset of important patents for which a Pareto (or, power law)
distribution applies. We not only use the procedures proposed by Silverberg and Verspagen to
study differences between technological categories, but also provide additional information on the
distributions of the two most essential estimators in their analysis: (i) the number of citations that is
estimated to constitute the boundary between the subsets of important innovations and of ‘other’
patents respectively (the cut-off point), and (ii) the estimator of the fatness of the Pareto distribution
characterizing the distribution of citations to the set of important patents. Since nothing is known
about the distributions of these two estimators as applied in the Silverberg-Verspagen procedure, we
rely on bootstrap methods to gain insights into the most important stochastic properties of these
estimators.
The organization of the paper is as follows. In Section 2, we discuss data issues. The indicator of
patent importance is introduced and the data used to actually construct it are discussed. Section 3
presents our methodology and Section 4 discusses estimation issues. Section 5 presents the results of
our estimations (both concerning the differences between technology categories and concerning the
distributional properties of our estimators) and Section 6 concludes with some policy implications
and further applications.

2. Data Issues

This paper contributes to the relatively recent literature that attempts to capture the importance of
innovations by means of patent citation data. In one of the pathbreaking articles in this tradition, the
basic source of information is succinctly described as follows:

“If a patent is granted, a public document is created containing extensive information about the inventor, her
employer, and the technological antecedents of the invention, all of which can be accessed in computerized form. Among
this information are “references” or “citations”. It is the patent examiner who decides what citations a patent must
include. The citations serve the legal function of delimiting the scope of the property right conveyed by the patent. The
granting of the patent is a legal statement that the idea embodied in the patent represents a novel and useful contribution
over and above the previous state of knowledge, as represented by the citations. Thus, in principle, a citation of Patent X
by Patent Y means that X represents a piece of previously existing knowledge upon which Y builds.” (Jaffe et al.,
1993, p. 580)

As was first confirmed by Trajtenberg (1990), patents that are often cited by later patents are more
important than patents that are virtually never cited. Of course, this importance depends on the
question whether inventors were really aware of the knowledge claimed in earlier patents. An
affirmative answer to this question is not warranted, since expert employees of the patent office are
mainly responsible for adding citations. In a recent paper, Jaffe et al. (2000) use results of surveys
among inventors to conclude that citations do give indications (although noisy ones) of spillovers
from the cited invention to the citing invention.
In this paper, we will use data contained in the NBER Patent-Citations Data File to distinguish
between important and less important innovations. In previous work by us (Akkermans et al., 2006),
three measures of importance were studied, i.e. the “number of citations received”, a “measure of
generality” and a “measure of originality”. The first measure was introduced by Trajtenberg (1990),
the latter two by Trajtenberg et al. (1997). For reasons of space, we will focus on the first indicator in
this paper. We will denote the indicator “number of citations received” by NCITING, in line with the
notation adopted by Trajtenberg et al. (1997). This indicator simply supposes that a patent that is cited
more often than another one has had more impact on subsequent technological developments and
can therefore be seen as more important.
We will define important innovations using rankings of patents based on the NCITING indicator
discussed above. A few things should be taken into account before the indicator values for two
arbitrary patents can be compared directly. First, as is well known (see e.g. Cohen et al., 2000), the
propensity to patent innovations differs considerably across technology fields. In this work, we
classify patents based on technology classes to avoid such problems. It should be kept in mind,
however, that this problem is not entirely irrelevant, since the level of aggregation is relatively high.
Second, not all citations are received at once. Verspagen and De Loo (1999) report that the (skewed)
distribution of citation lags for patents issued by the European Patent Office and applied for between 1979
and 1997 had a mean of 4.67 years. Based on citations to USPTO patents issued during a much
longer period, Hall et al. (2002) even find mean lags of up to 16 years. The consequence of the often
long lags is that relatively new patents will often have received fewer citations (and/or citations in
fewer technological fields) than older patents. Third, another issue that precludes reasonable
comparisons of citation-based indicators across years relates to observed increasing propensities to
cite. As Hall et al. (2002) argue, increased computerization of the patent system led to less time-
consuming queries by patent examiners, as a consequence of which the citations to patent ratios rose
considerably in the 1980s.
To deal with these differences, we base our rankings on technological category-specific cohorts of
patents applied for in a given year. That is, we first construct datasets of patents associated with
technological category i applied for in year t. The NBER Patent-Citations Datafile contains data on
patent citations to utility patents granted by the U.S. Patent Office in the period 1963-1999. For the
present analysis, we used the large subset of these patents granted in 1970 and later. This dataset
contains over 2.4 million patents, of which nearly 1.0 million were granted to inventors outside
the U.S. These patents include patents granted to individuals and governments, but more than 75%
were awarded to non-governmental organizations (corporations and universities).1
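The cohort construction described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs, not the code used for the paper: the record layout (category, application year, citations received) and the sample values are hypothetical, and `build_cohorts` is a name introduced here.

```python
from collections import defaultdict

# Hypothetical patent records: (technological category, application year,
# number of citations received). The values are made up for illustration.
patents = [
    (6, 1970, 12),
    (6, 1970, 0),
    (6, 1971, 3),
    (10, 1980, 41),
    (10, 1980, 7),
]

def build_cohorts(records):
    """Group citation counts by (category, application year) cohort."""
    cohorts = defaultdict(list)
    for category, app_year, nciting in records:
        cohorts[(category, app_year)].append(nciting)
    return dict(cohorts)

cohorts = build_cohorts(patents)
```

Each value in `cohorts` is then the list of citation counts that the ranking and tail estimation operate on for one category-year pair.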
The NCITING importance indicator was taken in unchanged form from the same source. These
indicator values are based on citations included in patents granted from 1975 to 1999.2 Hall et al.
(2002) report that more than 16.5 million citations were involved in the underlying computations.
Self-citations (i.e., citations to previous patents granted to the same organization) are included.
We assigned patents to the technological sub-categories defined in Hall et al. (2002), constructed
by grouping patent technology classes. The USPTO classifies patents into about 400 main 3-digit
patent classes. Hall et al. (2002) aggregated these classes into 36 sub-categories. We excluded all
sub-categories containing miscellaneous classes of a given category, because these categories are by
definition very heterogeneous in terms of technological characteristics (see the Appendix for a list of
the categories used).

1 See Hall et al. (2002, p. 413) for details.

2 The fact that citation data are not available before 1975 led us to the decision not to include pre-1970 patents in our

3. A Method to Identify Important Innovations

In this section, we give a detailed account of the procedure that we follow to single out important
innovations by means of patent citation data. As mentioned in the previous section, we order the
patents that were assigned to a technology category in a given year on the basis of citation-based
scores. In earlier work (Akkermans et al., 2006), we considered patents that belonged to an upper
quantile to be important. The specific quantiles considered were taken as fixed over time. Although
the results might appear insightful, this ‘fixed quantiles-based’ approach has a number of drawbacks.
Here, we will reflect on these, before moving on to a more promising approach. We start by
indicating why a category-level perspective is required.
First, technological fields vary considerably in terms of the propensities to patent inventions.
Patents are meant to protect inventors from imitation, in order to stimulate innovative activity. Cohen
et al. (2000), however, find that many innovators do not view the patent system as the most effective
way to protect their intellectual property. Keeping their technology secret is often considered a more
attractive option, and many firms just rely on first-mover advantages (lead time). Cohen et al. (2000)
also stress that the main intention of many applicants is not to preclude imitation, but rather to force
other firms into negotiations (often about cross-licensing) or to make potential competitors change
their technological strategies.3 Opportunities to keep innovations secret or to force competitors into
negotiations over cross-licensing are technology-specific. If few patents are granted to a technology
category, the few patents that are granted will, on average, receive fewer citations than the average
patent granted to a category for which patenting is popular.4 By identifying important patents on the
basis of category-level frequency distributions we correct for inter-category differences in the
numbers of citations that correspond to specific quantiles, but we do not correct for differences in
the shape of these frequency distributions.
Second, technologies differ in terms of the opportunities for innovation they experience.
Utterback and Abernathy (1975) identified innovation patterns that are quite strongly associated with
stages in industry and technology life cycles (see Dosi, 1982, for extensive discussions of the concept
of technological trajectories). In early stages, when no dominant design is in place yet and
technological dynamism is huge, many product innovations are produced. Later on, upon entering the
stage of maturity, the numbers of innovations are generally lower and cost-reducing process
innovations tend to dominate over product innovations. Differences with respect to the stage in the
technology life cycle are taken into account by means of the year-specific identification of important
patents. Again, however, this approach remains limited to a correction for differences in the numbers
of citations received by the patent defining a specific quantile, but not for differences regarding
higher moments of the citation frequency distributions.

3 See Granstrand (1999) for discussions of “fencing” and “blocking” strategies.

4 Using a European dataset, Verspagen and De Loo (1999) find average received citations-to-patents ratios ranging from
0.39 in the shipbuilding industry to 1.16 in the computer manufacturing industry. For USPTO patents granted in 1980,
we find qualitatively similar results.

In summary, we feel that considering a patented innovation that received many more citations
than a simultaneously granted patent as more important is justified. It is clear, however, that it is
probably unwarranted to consider the most heavily cited 10% of amusement devices-related patents
(technology category 25) granted in 1976 as equally important as the most heavily cited 10% of
organic compounds-related (technology category 4) patents in 1995. A very recent empirical approach
proposed by Silverberg and Verspagen (2007) provides a way to overcome this problem. The point of
departure of Silverberg and Verspagen (2007) is the by now uncontested finding that the returns to
innovation are very skewed (see, e.g., Scherer et al., 2000). Only about one in every four innovation
projects yields a positive return and only a few projects generate the bulk of total returns to
investment in R&D. The specific skewed statistical distribution that is most able to describe the
empirical distribution has been a topic of debate. Traditional goodness-of-fit tests suggest that
lognormal distributions do a good job, but more thorough examination shows that Pareto
distributions are superior in matching the observed frequency distributions in the right tail, i.e. the
frequencies for the most valuable innovations. This phenomenon can also be observed for
frequencies of numbers of patent citations.

Figure 1: Fat tails in Innovation Importance Distributions

Source: Authors’ computations on NBER Patent-Citations Datafile. For Communications in 1970, citations have a
maximum at 53 and the tail starts at 29 citations (estimated confidence interval is 22-36). For Drugs in 1980, the highest
number of citations is 201 and patents fall in the tail already if they have received more than 40 citations (17-40 is the
corresponding confidence interval).

To illustrate this, we generated Figure 1 by ordering all patents in our dataset assigned to category 6
(“communications”) applied for in 1970 according to the numbers of citations they received in the
period 1975-1999 and adopting an identical procedure for patents applied for in 1980 assigned to
category 10 (“drugs”).5 These numbers of citations are depicted along the horizontal axis. The
frequencies of patents with a higher number of citations than the value depicted on the horizontal
axis is indicated along the vertical axis. Since both scales are chosen to be logarithmic, a Pareto
distribution would look like a straight, downward sloping line. Exponential distributions would show
curvature, that is, the absolute value of the negative slope would become higher for higher citation
numbers. The solid line in Figure 1 shows that the frequencies of communications patents tend to
follow a lognormal distribution over the entire range of numbers of citations. For drugs patents
(dashed line), however, a mixed distribution seems to depict the observed frequencies more
accurately. For less-cited patents, a lognormal distribution can be shown to fit the ever more steeply
declining curve better. The rightmost part of the curve, related to patents with 40 citations or more, is
approximately linear, reflecting a Pareto distribution.

5 We selected these categories and years for ease of exposition only.

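The coordinates of a Pareto plot such as Figure 1 can be computed along the following lines. This is a minimal sketch in Python rather than the procedure actually used for the figure; the function names and the toy citation counts are assumptions made for illustration. For each citation count, the share of patents with a strictly higher count is taken, and both are put on a logarithmic scale.

```python
import math

def pareto_plot_points(citations):
    """Return (log10 c, log10 share of patents with more than c citations)."""
    counts = sorted(c for c in citations if c > 0)
    n = len(counts)
    points = []
    for c in sorted(set(counts)):
        exceed = sum(1 for x in counts if x > c)
        if exceed > 0:  # the log of a zero frequency is undefined
            points.append((math.log10(c), math.log10(exceed / n)))
    return points

def slope(p, q):
    """Slope of the line through two plot points."""
    return (q[1] - p[1]) / (q[0] - p[0])

# Toy cohort in which the exceedance share halves whenever the count doubles.
toy = [1, 1, 1, 1, 2, 2, 4, 8]
points = pareto_plot_points(toy)
```

In this toy cohort all consecutive points lie on a straight line with slope -1, which is the pattern a Pareto tail with exponent 1 would produce. A lognormal body would instead show a slope that steepens as the citation count rises.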
Silverberg and Verspagen (2007) present extensive evidence that a mix of lognormal and Pareto
distributions can be used to describe observed frequency distributions for a variety of indicators of
patent importance, such as patent valuations obtained by surveys among inventors and data on actual
revenues from patents. As Silverberg and Verspagen argue in a related paper (Silverberg and
Verspagen, 2003), important innovations come about in a different way than less important ones.
They derive part of their argument from Dosi’s (1982) argument that radical innovations that
constitute a ‘technological paradigm’ are almost always followed by swarms of more incremental
innovations. Given that a dominant design is slowly emerging in such cases, the degree and nature of
uncertainty surrounding innovation processes change over time. Changes in behavior by potential
innovators caused by the changing environment they face could well yield different statistical
distributions that govern innovation frequencies. Curves like the one for “drugs” in Figure 1 suggest that a
limited number of important innovations took place in 1980 that would not have taken place if all
innovations followed the lognormal distribution for less-important innovations. Since Figure 1 also
indicates that the proportion of important patents may well vary across technologies, the
abovementioned fixed quantiles approach adopted by Akkermans et al. (2006) is indeed not ideal.
Instead, a statistical procedure is required to estimate the category-specific and cohort-specific
numbers of citations delineating the ‘transition’ from a lognormal distribution to a Pareto
distribution. We will call this number the “cutoff-point”.

4. Estimating the Cut-Off Point

Results from Extreme Value Statistics (see for instance Coles, 2001) can be used to define and
estimate the key parameters that characterize the tails of a distribution. In particular, given a series of
i.i.d. observations $(X_1, X_2, \ldots, X_n)$, the suitably normalized maximum will converge to one of three
limit distributions, associated with heavy-tailed (like stable and Student t distributions), short-tailed
(like the uniform) or medium-tailed (like the normal) distributions. Also, the behavior of the tails of a
distribution can be approximated by the so-called Generalized Pareto Distribution. If the tail follows a
Pareto law $F(x) = 1 - x^{-\alpha}$, a maximum likelihood estimator of the parameter $\alpha$ can be obtained
using the Hill estimator (Hill, 1975). This estimator has a very simple expression. Given the rank-order
statistics of the sample $X_{(1)} \geq X_{(2)} \geq \ldots \geq X_{(n)}$, the Hill estimator of the inverse of $\alpha$ is obtained as:

$$\hat{\gamma} = \hat{\alpha}^{-1} = \frac{1}{k} \sum_{i=1}^{k} \left( \ln X_{(i)} - \ln X_{(k+1)} \right)$$


Note that the parameter alpha reflects the magnitude of the negative slope of the straight line
characterizing the Pareto distributions in Pareto-plots like Figure 1.
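The Hill estimator above translates directly into code. The following Python sketch is an illustration only; the function names `hill_gamma` and `hill_alpha` are introduced here, and the treatment of a zero estimate (possible when many observations are tied, as with integer citation counts) is an assumption of this sketch.

```python
import math

def hill_gamma(sample, k):
    """Hill estimate of gamma = 1/alpha based on the k largest observations."""
    x = sorted(sample, reverse=True)  # X_(1) >= X_(2) >= ... >= X_(n)
    if not 1 <= k < len(x):
        raise ValueError("k must satisfy 1 <= k < n")
    if x[k] <= 0:
        raise ValueError("observations entering the estimator must be positive")
    threshold = math.log(x[k])  # ln X_(k+1) in the 1-based notation of the text
    return sum(math.log(x[i]) - threshold for i in range(k)) / k

def hill_alpha(sample, k):
    """Hill estimate of the tail exponent alpha (infinite if gamma is zero)."""
    gamma = hill_gamma(sample, k)
    return 1.0 / gamma if gamma > 0 else math.inf

# Small check with observations whose logarithms are 3, 2, 1 and 0.
sample = [math.e ** 3, math.e ** 2, math.e, 1.0]
gamma_2 = hill_gamma(sample, 2)  # ((3 - 1) + (2 - 1)) / 2
```

With k = 2 the two largest log-observations exceed the third by 2 and 1 respectively, so the estimate of 1/α is 1.5.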

The value of the Hill estimator is a function of k, the number of observations included in the tail.
The corresponding Hill plot can be used to get an idea of the value at which the Hill estimates
stabilize. For very low values of k the estimates will fluctuate strongly. If the underlying
distribution is Paretian, the Hill estimates will stabilize at a certain value. But if the distribution is not
Paretian overall, including observations from the central part of the distribution will decrease the
validity of the estimator. A method is then needed to estimate the ‘optimal’ value of the parameter k.
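The data behind a Hill plot can be generated as in the sketch below, which computes the estimate of α for every tail length up to a chosen maximum. It is an illustration under stated assumptions: the sample is simulated from an exact Pareto law with α = 3 by inverse-transform sampling, and the seed, sample size and function name are arbitrary choices made here, not taken from the paper.

```python
import math
import random

def hill_plot(sample, k_max):
    """Return [(k, alpha_hat_k)] for k = 1..k_max using running sums."""
    logs = sorted((math.log(v) for v in sample), reverse=True)
    k_max = min(k_max, len(logs) - 1)
    out, running = [], 0.0
    for k in range(1, k_max + 1):
        running += logs[k - 1]         # sum of the k largest log-observations
        gamma = running / k - logs[k]  # Hill estimate of 1/alpha for this k
        out.append((k, 1.0 / gamma if gamma > 0 else math.inf))
    return out

# Simulated Pareto sample with F(x) = 1 - x^(-3) for x >= 1.
rng = random.Random(0)
pareto = [(1.0 - rng.random()) ** (-1.0 / 3.0) for _ in range(5000)]
estimates = hill_plot(pareto, 1000)
```

For such a purely Paretian sample the estimates settle near 3 once k is moderately large. Appending observations from a lognormal body would make them drift as k grows, which is the behavior the choice of k has to guard against.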
Lux (2001) provides an overview of various methods proposed to estimate the parameter k, our
cut-off point. In the computationally convenient procedure adopted by Drees and Kaufmann (1998),
the length of the right tail is first set to one observation. Next, the most likely length is found by
examining the fluctuations in the value of the Hill-estimator when adding more observations to the
tail. Such fluctuations emerge if Hill-estimators are applied to distributions that are not Pareto. If a
predetermined threshold value is exceeded by the fluctuation, an estimate for k is found. We use a
slightly modified version of this Drees-Kaufmann estimator, proposed by Lux (2001): in this version
the stopping rule is modified with a higher threshold so that the tail includes fewer observations from
the central part of the distribution.
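The stopping idea described in this paragraph can be illustrated as follows. This sketch is deliberately simplified and should not be read as the Drees-Kaufmann or Drees-Kaufmann-Lux procedure itself: it only implements a first-exceedance rule on the fluctuations of the Hill estimates, the weighting by the square root of i is an assumption of this sketch, and the threshold `r` is a free, hypothetical parameter whose calibration is not covered here.

```python
import math

def hill_gammas(sample):
    """Hill estimates of 1/alpha for k = 1..n-1 (list index 0 holds k = 1)."""
    logs = sorted((math.log(v) for v in sample), reverse=True)
    gammas, running = [], 0.0
    for k in range(1, len(logs)):
        running += logs[k - 1]
        gammas.append(running / k - logs[k])
    return gammas

def first_exceedance(sample, r):
    """Smallest k at which max over i <= k of sqrt(i) * |gamma_i - gamma_k| exceeds r.

    Returns n - 1 if the threshold is never exceeded.
    """
    gammas = hill_gammas(sample)
    for k in range(1, len(gammas) + 1):
        gk = gammas[k - 1]
        fluct = max(math.sqrt(i) * abs(gammas[i - 1] - gk) for i in range(1, k + 1))
        if fluct > r:
            return k
    return len(gammas)

# Observations with logarithms 5, 4, 3, 2, 1, 0: the Hill estimates keep rising.
toy = [math.exp(v) for v in (5, 4, 3, 2, 1, 0)]
k_stop = first_exceedance(toy, 1.2)
```

In the toy sample the Hill estimates are 1, 1.5, 2, 2.5 and 3 for k = 1 to 5, so the fluctuation first exceeds 1.2 at k = 4. A higher threshold lets more observations enter before the rule stops.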
Silverberg and Verspagen (2007) did not study the distributional properties of the stochastic
estimators for k and α. It should be noted, however, that the number of important innovations in a
given year and assigned to a specific technological category depends on the estimated cutoff-point.
Consequently, this number is an estimate as well, the outcome of which is stochastic. Because the
distribution of the cutoff-point estimator is not known, we rely on bootstrapping techniques to
obtain information on confidence intervals. We use the by now fairly standard bootstrapping
procedure of drawing a large number of pseudo-samples of a size equal to the real sample, by drawing
from the observations with replacement (Efron and Tibshirani 1986, 1993). For each of these
pseudo-samples the cutoff-points are estimated by means of the Drees-Kaufmann-Lux (DKL)
procedure described above. Next, these are ordered. 90% confidence intervals are constructed by
determining the values of the estimators for the 5th percentile and the 95th percentile of the
corresponding values found for the pseudo-samples.
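The percentile bootstrap described here can be sketched as below. The code is an illustration, not the authors' implementation: `bootstrap_ci` accepts any estimator as a function of a pseudo-sample, the index-based percentile rule is one simple convention among several, and the default of 1000 replications and the seed are assumptions of this sketch.

```python
import random

def percentile(sorted_values, q):
    """Simple index-based percentile (q in [0, 100]) of an already sorted list."""
    last = len(sorted_values) - 1
    idx = max(0, min(last, round(q / 100.0 * last)))
    return sorted_values[idx]

def bootstrap_ci(sample, estimator, n_boot=1000, seed=0):
    """Return (5th percentile, mean, 95th percentile) of `estimator` over
    pseudo-samples of the original size, drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    stats = sorted(
        estimator([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    return percentile(stats, 5), sum(stats) / n_boot, percentile(stats, 95)
```

In the setting of this paper, `estimator` would be a function returning the estimated cutoff point (or the resulting proportion of important patents) for a pseudo-sample, and the three returned values would correspond to the left boundary, the bootstrap mean and the right boundary of the 90% interval.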

5. Important Patents: Patterns across Technology Fields

In this section we present results from our estimation procedures. We applied the Drees-
Kaufmann-Lux procedure to sets of patents sharing the same application year and assigned to the
same technological category. We selected patents that received at least one citation. This is done
because the Hill estimator involves taking the logarithm of the variable at hand, so only positive values
can be used. The proportion of important innovations in each cohort is the ratio of the number
of important patents, i.e. those patents receiving more citations than the estimated cutoff point,
relative to the total number of patents with one citation or more.
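The proportion defined above amounts to the following computation, given a cohort's citation counts and an estimated cutoff point. The function name and the example values are hypothetical and serve only to make the definition concrete.

```python
def proportion_important(citations, cutoff):
    """Share of cited patents (at least one citation) with more citations than `cutoff`."""
    cited = [c for c in citations if c >= 1]
    if not cited:
        raise ValueError("cohort contains no cited patents")
    return sum(1 for c in cited if c > cutoff) / len(cited)

# Hypothetical cohort: two uncited patents are dropped, three of the
# remaining eight patents exceed a cutoff of 40 citations.
example = [0, 0, 1, 2, 3, 5, 8, 45, 60, 120]
share = proportion_important(example, 40)
```

Here the share equals 3/8, because uncited patents are excluded from both the numerator and the denominator.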

Figure 2: Proportions of Important Innovations by Category (averages)*

Figure 3: Estimates of alpha by category (averages)*

[Figures 2 and 3 not reproduced. In both figures the horizontal axis lists the technological categories, numbered 1 to 31.]

*All indicated proportions are computed as unweighted averages of the cohort-specific
proportions, 1970-1998.
Notes: BS-mean refers to the mean proportions for 1000 bootstrapped pseudo-samples for
each industry and cohort. BS-left and BS-right refer to the boundaries of 90%-confidence
intervals, also obtained from these sets of 1000 pseudo-samples.


We produced estimates for the proportion of important innovations in each cohort and
calculated bootstrapped estimates of 90% confidence intervals. We also calculated the mean
bootstrap estimate in order to compare it with the point estimate from the DKL procedure. Figure 2
shows the average estimates across the list of categories considered. The variability of the estimates
changes, but remains comparable across categories. The mean of the bootstrap estimates is in line
with the point estimates.
In order to further investigate possible differences across technological fields, we also looked at
the estimates for the parameter α. Figure 3 shows that point estimates for the power law exponent
remain in the range between 2 and 4. The confidence interval, produced by the Drees and
Kaufmann routine, suggests that for a few categories the parameter may also fall in the range of
values below 2, indicating lack of existence of the variance, but never below 1, which would flag the
non-existence of the first moment (see Silverberg and Verspagen, 2007). The non-existence of lower
moments has implications for traditional approaches to project choice. Maximization of expected
profits is not possible if the density distribution of profits is so fat-tailed that the mean is non-
existent. Classical risk analyses require the existence of the variance. As can be inferred from Figure 3,
innovation processes in dynamic technologies such as drugs (10) and biotechnology (12) might well
have distributions of importance that do not have a variance. For more mature technology categories
like motors and engines (21) and furniture and house fixtures (28) such a situation appears to be very
unlikely.
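The thresholds of 1 and 2 for the tail exponent follow from a short calculation. The derivation below assumes an exact Pareto law normalized to the support x ≥ 1, which is a simplification made here for illustration; for citation data only the tail is assumed to be Paretian.

```latex
% Pareto law F(x) = 1 - x^{-\alpha} on x \geq 1, with density f(x) = \alpha x^{-\alpha-1}.
\[
E[X^m] = \int_1^\infty x^m \, \alpha x^{-\alpha-1} \, dx
       = \begin{cases}
           \dfrac{\alpha}{\alpha - m} & \text{if } \alpha > m, \\[1ex]
           \infty & \text{if } \alpha \leq m.
         \end{cases}
\]
% The mean E[X] = \alpha / (\alpha - 1) is therefore finite only for \alpha > 1,
% and the variance
\[
\operatorname{Var}(X) = \frac{\alpha}{(\alpha - 1)^2 (\alpha - 2)}
\]
% is finite only for \alpha > 2.
```

An estimate of α between 1 and 2 thus corresponds to a finite mean but an infinite variance, whereas estimates above 2 are compatible with both moments being finite.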
Figures 2 and 3 (which relate to the estimated cut-off point and the fatness parameter,
respectively) also tell us something about the stochastic properties of the estimators. Figure 2 shows
that the estimated proportion of important innovations is rather close to the average over
bootstrapped pseudo-samples. The exceptions are agriculture, food, textiles (1), computer peripherals
(8) and agriculture, husbandry and food (24), for which the two types of estimates differ by more than 1
percentage point. We consider the fact that both estimates are generally close to each
other as a positive property of the DKL procedure adopted by Silverberg and Verspagen (2007). The
variability of the estimates over pseudo samples is quite heterogeneous. For categories like gas (3),
computer peripherals (8) and biotechnology (12), the 90% confidence interval for the proportion of
important innovations ranges from about 1% to more than 10%. For other technology categories,
such as resins (5), electrical devices (13) and power systems (17), the confidence interval spans less
than 5 percentage points. The heterogeneity of the variability carries over to the estimator for α. For many
technologies, the confidence interval for this fatness estimator spans a range of more than 3 units.
This casts some doubts on the conclusions that can be drawn. It should be noted, however, that the
confidence intervals for technology categories with a low fatness estimate are relatively narrow, which
offers relative strength to results obtained for dynamic technologies such as biotech and drugs.

Table 1 reports the category-specific point estimates of the parameter alpha and of the percentage
of important innovations, together with descriptive statistics on the number of patents included in
each category (averaged across years) and on the citation to patent ratios in 1980. The table is sorted
by the average percentage of important patents. Figure 4 shows a scatter plot of the estimated values
of α and the proportion of important patents for the technological categories considered.

Table 1: Differences in estimated parameters across technological categories, unweighted averages across years.

Code  Category                          Average estimated alpha  Average % of important patents  Average number of patents  Citation to patent ratio, 1980
51    Materials processing & handling   3.200                    2.30%                           3659                       5.94
43    Measuring and testing             3.595                    2.72%                           2003                       6.95
53    Motors, engines and parts         3.465                    2.76%                           2585                       5.79
21    Communications                    3.358                    2.82%                           3016                       9.31
52    Metal working                     3.073                    2.85%                           2023                       4.80
14    Organic compounds                 2.472                    2.94%                           2155                       3.45
45    Power systems                     3.478                    2.96%                           2401                       6.89
68    Receptacles                       3.832                    3.15%                           1445                       7.41
65    Furniture, house fixtures         3.856                    3.15%                           1407                       6.45
15    Resins                            2.826                    3.16%                           2339                       8.01
41    Electrical devices                3.425                    3.20%                           2157                       6.28
63    Apparel and textile               3.669                    3.24%                           1139                       5.19
55    Transportation                    3.239                    3.24%                           2022                       5.21
66    Heating                           3.492                    3.52%                           952                        4.84
31    Drugs                             2.441                    3.54%                           1848                       6.36
54    Optics                            3.353                    3.77%                           1587                       6.95
64    Earth working and wells           3.116                    3.87%                           1022                       6.09
22    Computer hardware and software    3.257                    3.90%                           2244                       15.50
32    Surgery and medical instruments   3.058                    4.31%                           1888                       15.72
61    Agriculture, husbandry, food      2.856                    4.69%                           1463                       6.34
67    Pipes and joints                  3.411                    4.79%                           622                        6.01
12    Coating                           2.797                    4.85%                           1026                       7.34
24    Information storage               3.112                    4.86%                           1235                       9.29
46    Semiconductor devices             3.667                    4.89%                           1286                       12.51
44    Nuclear and X-rays                3.315                    4.92%                           1050                       7.26
62    Amusement devices                 2.800                    5.32%                           697                        5.59
42    Electrical lighting               2.838                    5.35%                           1083                       6.49
33    Biotechnology                     2.280                    5.75%                           602                        8.79
11    Agriculture, food, textiles       2.740                    5.88%                           544                        5.55
23    Computer peripherals              2.597                    6.09%                           591                        16.92
13    Gas                               3.393                    6.11%                           352                        7.83

The estimated proportion of important patents is negatively correlated with the number of
patents in the group: the fewer the patents, the higher the proportion tends to be. Of
course, we would need to confirm this, for instance by means of simulations of our estimation
procedure. The pattern could simply reflect a combinatorial effect: when there are fewer
patents available to be cited, citations tend to concentrate on them. Instead,
when more patents exist in a field, citations are spread over a larger number of different patents,
so that it becomes less likely for a few patents to attract a substantial number of citations. If this
were the case, the citation to patent ratios could also be informative. But a crucial assumption is that
citations mostly come from within the same category as the cited patent, and this assumption is not
necessarily true. In fact, besides combinatorial considerations, the interesting part of the explanation
of possible differences across categories lies, in our view, in what we know from the literature on the
role of patenting in different technological fields.
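
As a rough check on the sign of this relationship, the category averages reported in Table 1 can be correlated directly. The sketch below is illustrative only: the use of the Pearson coefficient on the 31 category averages is a choice made here and is not part of the estimation procedure, and the function name `pearson` is hypothetical.

```python
from math import sqrt

# (average percentage of important patents, average number of patents),
# transcribed from Table 1 in the order of the table rows.
TABLE_1 = [
    (2.30, 3659), (2.72, 2003), (2.76, 2585), (2.82, 3016), (2.85, 2023),
    (2.94, 2155), (2.96, 2401), (3.15, 1445), (3.15, 1407), (3.16, 2339),
    (3.20, 2157), (3.24, 1139), (3.24, 2022), (3.52, 952), (3.54, 1848),
    (3.77, 1587), (3.87, 1022), (3.90, 2244), (4.31, 1888), (4.69, 1463),
    (4.79, 622), (4.85, 1026), (4.86, 1235), (4.89, 1286), (4.92, 1050),
    (5.32, 697), (5.35, 1083), (5.75, 602), (5.88, 544), (6.09, 591),
    (6.11, 352),
]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


r = pearson(TABLE_1)
print(f"Pearson correlation across {len(TABLE_1)} categories: {r:.2f}")
```

A negative coefficient agrees with the observation that categories with fewer patents show higher proportions of important patents. It says nothing about whether the pattern is an artefact of the estimator, which is what the simulations mentioned above would have to address.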

Figure 4: Estimates of alpha and estimated percentage of important innovations, by category (averages)*

[Scatter plot. Horizontal axis: estimate of alpha (2.00 to 4.00). Vertical axis: estimated percentage of important innovations. Labelled categories include Computer peripherals, Gas, Drugs, Furniture, Organic compounds, Receptacles and Materials processing.]

* The red lines indicate the averages on the two dimensions.

So far, we have only discussed the cross-sectional evidence and have not taken into account the
evolution of the observed patterns over time. Figure 5 shows how the average estimated percentage of
important innovations has changed over the period considered. At least until 1986, the estimates remain
remarkably stable. After 1986, we observe a clear monotonic decrease in the proportion. This pattern
holds for all categories, but more so for those fields that have developed only recently. Figure 6
depicts the categories in terms of the total number of important patents and the share of those
patents obtained after 1986.

Figure 5: Proportion of Important Innovations over Time*

[Line plot. Series shown: BS mean, BS left, BS right.]

*All indicated proportions are computed as unweighted averages of the industry-specific
proportions. See Figure 2 for other notes.

This decrease could be understood in terms of the truncation problem. Patents granted in the
later years have had less time to be cited and can therefore receive far fewer citations than older
patents. Hall et al. (2002) show that mean citations received decline from around 1986 (Figure 8 in
their paper). This may have consequences for our method if important patents tend to receive more
citations long after their application year. In this respect, Hall and Trajtenberg (2005) already show
that highly cited patents in the NBER Database have significantly higher citation lags. We check
whether their claim also applies when highly cited patents are identified using our method. It turns
out that the half-life of important patents is generally longer than that of less important patents.
The half-life is calculated as the median age of all patents citing the group of patents in the tail
for each category and year. We can then claim that our important patents are patents that receive
citations over a longer time span, rather than patents that already receive a substantially higher
number of citations than less important patents in the first years. This type of investigation is
important when considering the use of patent indicators to map life cycles of technologies (see
Haupt et al., 2007).
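
The half-life comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation behind the results above: "age" is read here as the citation lag (year of the citing patent minus year of the cited patent), the record layout and the function name `half_life` are hypothetical, and the citation records are made up.

```python
from statistics import median


def half_life(citations, cited_ids):
    """Median citation lag (in years) over all citations received by
    the patents in cited_ids.

    citations: iterable of (cited_id, cited_year, citing_year) tuples.
    cited_ids: set of patent identifiers forming the group of interest.
    """
    lags = [citing_year - cited_year
            for cited_id, cited_year, citing_year in citations
            if cited_id in cited_ids]
    if not lags:
        raise ValueError("no citations found for the given patents")
    return median(lags)


# Made-up citation records for one category and one cohort year (1980).
citations = [
    ("A", 1980, 1982), ("A", 1980, 1987), ("A", 1980, 1991), ("A", 1980, 1995),
    ("B", 1980, 1981), ("B", 1980, 1983),
    ("C", 1980, 1982),
]
tail = {"A"}        # patents above the estimated cutoff (assumed given)
body = {"B", "C"}   # all remaining patents

print("half-life, important patents:", half_life(citations, tail))
print("half-life, other patents:", half_life(citations, body))
```

In this toy example the patent in the tail keeps being cited for fifteen years, so its median lag is longer than that of the other two patents, which is the pattern described in the text.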

Figure 6: Share of important patents obtained after 1983.

[Scatter plot. Horizontal axis: number of important innovations (0 to 2500). Vertical axis: proportion of important innovations patented after 1983. Labelled categories include Computer hard & software, Computer peripherals, Drugs, Biotech, Communications, Gas, Apparel and textile, and Motors.]

An alternative explanation of the decrease in Figure 5 could be that the decline is a real, not a
spurious, phenomenon. It may be the case that after 1986 the number of non-important patents
increased dramatically because of the increased tendency to patent for strategic reasons and to patent
smaller parts of every innovation. In fact, one cannot in principle reject the hypothesis that the
number of important innovations decreased because of steadily decreasing technological

6. Conclusions

This paper has presented an application of a statistical procedure recently introduced by
Silverberg and Verspagen (2007) to distinguish between important and non-important patents. The
main feature of this procedure is that it divides the total set of patents into a subset of non-important
patents for which the frequency distribution of forward citations is governed by a lognormal
distribution and a subset of important patents for which a Pareto (or power-law) distribution applies.
We used the NBER data and Jaffe and Trajtenberg’s (2002) forward citation count indicator to
identify important patented innovations. We have discussed the results obtained for cohorts of
patents grouped in 31 technological categories.
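
To make the split between the body and the tail concrete, the sketch below applies the Hill (1975) estimator of the tail index to the k largest observations and reports k/n as the share of observations in the tail. It is an illustration under assumptions, not the procedure used in this paper: the cutoff k is simply supplied by hand rather than selected from the data, the function name `hill_alpha` is hypothetical, and the sample is simulated Pareto data rather than citation counts.

```python
import math
import random


def hill_alpha(values, k):
    """Hill estimate of the tail index alpha from the k largest values.

    values: positive observations.
    k: number of upper order statistics treated as the tail,
       with 1 <= k < len(values).
    """
    if not 1 <= k < len(values):
        raise ValueError("k must satisfy 1 <= k < len(values)")
    ordered = sorted(values, reverse=True)
    threshold = ordered[k]          # (k+1)-th largest value, used as cutoff
    if threshold <= 0:
        raise ValueError("the cutoff must be positive")
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    if mean_log_excess == 0:
        raise ValueError("all tail values equal the cutoff")
    return 1.0 / mean_log_excess


# Simulated data with a known tail index of 3 (not patent data).
rng = random.Random(42)
sample = [rng.paretovariate(3.0) for _ in range(20000)]

k = 2000                            # assumed cutoff, chosen by hand
alpha_hat = hill_alpha(sample, k)
tail_share = k / len(sample)
print(f"estimated alpha: {alpha_hat:.2f}, share in tail: {tail_share:.1%}")
```

With integer citation counts, ties at the cutoff and the choice of k both matter for the estimate. Data-driven rules for choosing k exist; Drees and Kaufmann (1998) in the reference list is one such rule.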

This work has several important policy implications and also contributes to the literature on
technological change and innovation. We have shown how the tail estimation procedure could be
used to identify important innovations in a technological field. There is still much debate on what
defines a ‘radical innovation’, also called ‘discontinuous’ or ‘revolutionary’ (see for instance the review
in Dahlin and Behrens, 2005), and our results offer an alternative operationalization of
the concept. Silverberg and Verspagen (2003) discuss how radical innovation and incremental
innovation reflect different underlying statistical properties. They relate this to theoretical arguments
on the different nature of radical innovation, for instance in terms of level of uncertainty. Radical
innovations bring new ‘technological paradigms’ (Dosi, 1982) and are followed by more incremental
innovations for which uncertainty is much lower. Rosenberg (1969) already suggested that radical
innovations work as ‘focusing devices’ for the subsequent incremental innovations. The percolation
model in Silverberg and Verspagen (2003) and the generalized Polya urn process in Sanditov (2005)
are two recent attempts at proposing generating processes for the observed statistical patterns of the
distribution of the value of innovations. Both models capture properties of the innovation process
highlighted by evolutionary theories of technical change (cf. the seminal contribution of Nelson and
Winter, 1982), and supported by a long-standing tradition of empirical research on innovation.

In a related paper (Castaldi and Los, 2007), we use the tail estimation procedure to study the
international technological specialization in important innovations of European countries and the US.
We assign patents and important patents to countries by means of the information on the country of
the first inventor. We then construct indicators of international technological specialization in important
innovations at the industry level. Some countries have a reputation for being very good at producing
important innovations and we wish to test whether this specialization helps them to achieve
substantially higher productivity growth rates.

The method discussed in this paper is general enough to be applied to other data as well. For
instance, scientific publications are known to be characterized by highly skewed distributions of
importance. One could apply the tail estimation procedure as a complement to existing
scientometric work.


References

Akkermans, D.H.M., C. Castaldi and B. Los (2008), “Do ‘Liberal Market Economies’ Really Innovate More
Radically than ‘Coordinated Market Economies’? Hall & Soskice Reconsidered”, Research Policy,
Castaldi, C. and B. Los (2007), “International Technological Specialization in Important Innovations: Some
Industry-Level Explorations”, Working Paper, University of Groningen.
Cohen, W.M., R.R. Nelson and J.P. Walsh (2000), “Protecting Their Intellectual Assets: Appropriability
Conditions and Why U.S. Manufacturing Firms Patent (or Not)”, NBER Working Paper 7552 (Cambridge
Coles, S. (2001), An Introduction to Statistical Modeling of Extreme Values (London: Springer-Verlag).
Dahlin, K.B. and D.M. Behrens (2005), “When is an invention really radical? Defining and measuring
technological radicalness”, Research Policy, vol. 34, pp. 717-737.
Dosi, G. (1982), “Technological Paradigms and Technological Trajectories”, Research Policy, vol. 11, pp. 147-
Drees, H. and E. Kaufmann (1998), “Selecting the Optimal Sample Fraction in Univariate Extreme Value
Estimation”, Stochastic Processes and their Applications, vol. 75, pp. 149-172.

Efron, B. and R. Tibshirani (1986), “Bootstrap Methods for Standard Errors, Confidence Intervals, and Other
Measures of Statistical Accuracy”, Statistical Science, vol. 1 (1), pp. 54-77.
Efron, B. and R. Tibshirani (1993), An Introduction to the Bootstrap (London: Chapman & Hall).
Granstrand, O. (1999), The Economics and Management of Intellectual Property (Cheltenham UK: Edward Elgar).
Griliches, Z. (1990), “Patent Statistics as Economic Indicators”, Journal of Economic Literature, vol. 28, pp. 1661-
Hall, B.H., A.B. Jaffe and M. Trajtenberg (2002), “The NBER Patent-Citations Data File: Lessons, Insights,
and Methodological Tools”, in: Jaffe, A.B. and M. Trajtenberg, Patents, Citations & Innovations (Cambridge
MA: MIT Press), pp. 403-459.
Hall, B.H. and M. Trajtenberg (2005), “Uncovering GPTs with Patent Data”, in C. Antonelli, D. Foray, B. H.
Hall, and E. Steinmueller, Festschrift in Honor of Paul A. David, Edward Elgar.
Haupt, R., M. Kloyer and M. Lange (2007), “Patent indicators for the technology life cycle development”,
Research Policy, vol. 36, pp. 387-398.
Hill, B.M. (1975), “A simple general approach to inference about the tail of a distribution”, Annals of Statistics, 3,
Jaffe, A.B. and M. Trajtenberg (2002), Patents, Citations & Innovations: A Window on the Knowledge Economy
(Cambridge MA: MIT Press).
Jaffe, A.B., M. Trajtenberg and M.S. Fogarty (2000), “Knowledge Spillovers and Patent Citations: Evidence
from a Survey of Inventors”, American Economic Review, Papers and Proceedings, vol. 90, pp. 215-218.
Jaffe, A.B., M. Trajtenberg and R. Henderson (1993), “Geographic Localization of Knowledge Spillovers as
Evidenced by Patent Citations”, Quarterly Journal of Economics, vol. 108, pp. 577-598.
Lux, T. (2001), “The Limiting Extremal Behaviour of Speculative Returns: An Analysis of Intra-Daily Data
from the Frankfurt Stock Exchange”, Applied Financial Economics, vol. 11, pp. 299-315.
Nelson, R.R. and S.G. Winter (1982), An Evolutionary Theory of Economic Change, The Belknap Press, Harvard
University: London.
Rosenberg, N. (1969), “The Direction of Technological Change: Inducement Mechanisms and Focusing
Devices”, Economic Development and Cultural Change, vol.18, pp. 1-24.
Sanditov, B. (2005), “Patent Citations, the Value of Innovations and Path-Dependency”, CESPRI Working
Paper 177, Bocconi University Milano.
Scherer, F.M., D. Harhoff and J. Kukies (2000), “Uncertainty and the Size Distribution of Rewards from
Innovation”, Journal of Evolutionary Economics, vol. 10, pp. 175-200.
Silverberg, G. and B. Verspagen (2003), “Brewing the Future: Stylized Facts about Innovation and Their
Confrontation with a Percolation Model”, ECIS Working Paper 03.06, Eindhoven University of Technology.
Silverberg, G. and B. Verspagen (2007), “The Size Distribution of Innovations Revisited: An Application of
Extreme Value Statistics to Citation and Value Measures of Patent Significance”, Journal of Econometrics, 139,
pp. 318-339.
Trajtenberg, M. (1990), “A Penny for Your Quotes: Patent Citations and the Value of Innovations”, RAND
Journal of Economics, vol. 21, pp. 172-187.
Trajtenberg, M., R. Henderson and A.B. Jaffe (1997), “University vs. Corporate Patents: A Window on the
Basicness of Innovation”, Economics of Innovation and New Technology, vol. 5, pp. 19-50.
Utterback, J.M. and W.J. Abernathy (1975), “A Dynamic Model of Process and Product Innovation”,
OMEGA, vol. 3, pp. 639-656.
Verspagen, B. and I. de Loo (1999), “Technology Spillovers between Sectors and over Time”, Technological
Forecasting and Social Change, vol. 60, pp. 215-235.


Category Classification
Nr. Description Sub-category code in Hall et al. (2002)
1. Agriculture, food, textiles 11
2. Coating 12
3. Gas 13
4. Organic compounds 14
5. Resins 15
6. Communications 21
7. Computer hardware and software 22
8. Computer peripherals 23
9. Information storage 24
10. Drugs 31
11. Surgery and medical instruments 32
12. Biotechnology 33
13. Electrical devices 41
14. Electrical lighting 42
15. Measuring and testing 43
16. Nuclear and X-rays 44
17. Power systems 45
18. Semiconductor devices 46
19. Materials processing and handling 51
20. Metal working 52
21. Motors, engines and parts 53
22. Optics 54
23. Transportation 55
24. Agriculture, husbandry and food 61
25. Amusement devices 62
26. Apparel and textile 63
27. Earth working and wells 64
28. Furniture, house fixtures 65
29. Heating 66
30. Pipes and joints 67
31. Receptacles 68