Documente Academic
Documente Profesional
Documente Cultură
Corresponding author:
Andrew Bell
School of Geographical Sciences
University of Bristol
University Road
Bristol
BS8 1SS
Andrew.bell@bristol.ac.uk
Abstract
This paper uses random-coefficient models and (a) finds rankings of who are the best formula 1 (F1)
drivers of all time, conditional on team performance; (b) quantifies how much teams and drivers
matter; and (c) quantifies how team and driver effects vary over time and under different racing
conditions. The finishing position of drivers is used as the response variable in a cross-classified
multilevel model that partitions variance into team, team-year and driver levels. These effects are
then allowed to vary by year, track type and weather conditions using complex variance functions.
Juan Manuel Fangio is found to be the greatest driver of all time. Team effects are shown to be more
important than driver effects (and increasingly so over time), although their importance may be
reduced in wet weather and on street tracks. We argue that the approach can be applied more widely
across the social sciences, to examine individual and team performance under changing conditions.
Keywords
Cross-classified models; Formula 1; MCMC; Performance; Sport; Multilevel models.
Acknowledgements
Thanks to Rick Stafford, and the Bristol Spatial Modelling Research Group, for their helpful
comments and advice.
1 Introduction
Formula 1 (F1) is a sport of genuine global appeal. Established in 1950, F1 has also grown into a huge
business enterprise, with sponsorship and commercialism drawn to the sport by the 527 million
television viewers from 187 different countries (in 2010). After the Football World Cup and the
Summer Olympic Games, it is the largest sporting event in terms of television audience (Judde et al.,
2013).
Many of the F1 teams that compete employ statistical analysts to analyse race results; however these
are in general kept undisclosed so that teams are able to keep any tactical advantages these analyses
offer to themselves. As such, there are only a handful of papers in the public domain that have done
systematic statistical analysis of F1 race results, and these are focused on the question of who is the
best driver, and do not consider the question of how much teams and drivers matter in different
contexts. However, the large fan base ensures that there is a plethora of publicly available data on F1
race results online, and the potential for statistical analysis of these data is large.
This paper uses cross-classified multilevel models to produce a more complete picture of what
influences performance in F1 races. As well as producing rankings of F1 drivers which control for the
influence of teams, the models are able to partition variance to see the extent to which teams and
drivers matter. The key methodological innovation of this paper is the use of complex variance
functions, in which the variance depends on predictor variables, to see how team and driver influences
have changed over time, and differ by different driving conditions, as well as to see how driver
rankings vary by these conditions. Such an approach has potential application beyond F1 as the
methodology is applicable in subject areas throughout the social sciences and beyond, such as when
examining changing team and individual performance in firms.
2 Formula 1
The academic literature surrounding Formula 1 is relatively limited. However that literature which
does exist is cross-disciplinary, involving, for example, computational simulations of race results
(Bekker & Lotz, 2009; Loiacono et al., 2010), economic approaches that consider the importance of,
and adaptability of, F1 teams as firms (Jenkins, 2010; Jenkins & Floyd, 2001), analyses of car design
over time from an engineering perspective (Dominy & Dominy, 1984; Dominy, 1992), analyses of
specific tracks (Alnaser et al., 2006) and their impacts on tourism (Henderson et al., 2010), and
historical approaches to the sport (Hassan, 2012). There have been a few statistical analyses of race
results, although these are often limited to a few races or seasons (Bekker & Lotz, 2009; Muehlbauer,
2010).
Phillips argues that this unexpected result is caused by the use of finishing position as the dependent
variable, and prefers the use of (adjusted) points scored as an appropriate measure, since he argues
this is a better measure of achievement in F1. Like Eichenberger and Stedelmann, Phillips controls for
teams. He additionally controls for competition effects (such that drivers are penalised in the ranking
for appearing in less competitive seasons) and, for driver withdrawals, separates driver faults and
technical faults, to ensure drivers are not penalised for team errors. His rankings are based on drivers
3-year (or 5-year) peak performance (rather than their whole career). For Phillips rankings, it is Jim
Clark who comes out top; Stewart, Schumacher, Fangio and Alonso make up the rest of the top 5.
However these results also throw up a few surprises; for example James Hunt is ranked at number six
(Phillips argues that Hunt is indeed underrated by experts).
In sum, these two previous analyses have shown many consistencies, with drivers regarded as greats
by experts coming out at the top. However, different decisions regarding model specification lead to
different results, and both the above analyses produce some results which one might consider
surprising. This is not to doubt the validity of those results simply to state that if you ask slightly
different questions, by defining rankings differently, you are likely to get slightly different results. We
discuss some of these modelling decisions, in the context of our own modelling strategy, in the
methods section below.
assessed as far as we are aware. Yet because there is a relatively large amount of movement of drivers
between teams, the question can be answered with appropriate statistical techniques that can model
this complexity.
Certainly, there are reasons why some teams should in general outperform others. Certain teams have
more funds, are able to employ the best engineers, statisticians and tacticians, and use more advanced
technology than other teams. For example, the Williams cars that were so successful in the mid-1990s
included computerised driver aids; Brawn GP in 2009 used double diffuser technology, which gave
them an advantage and led Jenson Button to win the championship despite having won only one race
in the nine previous years of his career. As well as these specific innovations, team/car performance
will depend on factors such as aerodynamic efficiency, brakes, engines, gearbox, fuel, and more
recently kinetic energy recovery systems which change between teams and year on year (Horlock,
2009:4).
Experts generally agree that the team matters more than the driver, although the extent to which this
is true is hotly debated. Driver Nico Rosberg has stated the respective contributions to be 80% teams
and 20% drivers (Spurgeon, 2009). Others have argued that only the best drivers are really capable of
making a difference in F1 races, with Allen (2000) giving Michael Schumacher as an example of this. A
key contribution of this paper is to statistically evaluate the extent to which teams and drivers matter.
Ayrton Senna, Michael Schumacher and Lewis Hamilton, are noted by experts for their abilities in wet
weather, although in some cases this is based on a few outstanding notable performances rather than
a more general trend. Overall, we might expect rain to introduce additional unpredictability into races,
with even the best teams and drivers more prone to making mistakes.
3 Methodology
In sum, this paper is looking to answer three inter-related questions: who are the greatest F1 drivers
of all time, how much do teams and drivers matter, and how much does the latter change over time,
on different tracks, and in different weather conditions.
These questions can be answered thanks to the multilevel structure that is inherent to F1 race results
data. Specifically, each observed race result can be nested within a driver, a team-year, and a team.
This is not a strict hierarchical structure: since drivers move between teams over their careers, and
teams contain multiple drivers, the levels cannot be hierarchically nested. Instead, it is a crossclassified structure as represented in Figure 1. The variables used in the modelling are summarised in
table 1; these come from two online sources1.
[Table 1 and Figure 1 about here]
The dependent variable is the finishing position of the driver. Where drivers do not finish a
race, they are still ranked on the basis of when they dropped out of the race (with the first to drop out
being in last place). A basic, null model can be expressed algebraically as:
= 0 + + + +
~(0, 2 )
~(0, 2 )
www.race-database.com and www.f1-facts.com/stats. All data was available from these sites as of February
2015.
~(0, 2 )
~(0, 2 )
(1)
Such that 2 , 2 and 2 summarise the between-driver, between team and between year variance
respectively, and 2 summarises the within-race variance net of driver, team and team-year
characteristics.
given race ( ). For the latter, we take each drivers mean finishing positions, and average
these by race, and use this to control for the competitiveness of the race ( ). Thus the fixed part
of equation 1 can be extended to:
= 0 + 1 + 2 + + + +
(2)
Here, we would expect 1 to be estimated at about 0.5 (given that every additional driver means the
finishing position will be on average 0.5 positions worse), and 2 to be negative (since racing against
drivers that usually perform well should lower a drivers finishing position. However, these values are
not of substantive interest; the important thing is that they are controlled for when extracting the
driver-level residuals . The random part of the model remains as in equation 1.
10
This model is further extended to include other variables: year, weather, and track type. In order to
keep the models relatively simple and to avoid convergence issues, these variables were included in
the model individually in separate models. One interesting nuance of the finishing position dependent
variable is that it does not vary a huge amount between races (only varying by the number of drivers
in a race), which means that one cannot find effects on finishing position of race-level variables such
as those used above. Whether it rains or not, the finishing positions will still be filled. However, these
variables can be included in the random part of the model to see how team, team-year and driver
effects vary across the variables. Furthermore, for categorical variables (that is, weather and track
type), these same models can produce separate driver rankings for different values of those variables
(thus showing if drivers are ranked differently in different driving conditions).
To assess the effect of year on driver, team and team-year effects, the model in equation 2 can be
extend to:
= 0 + 1 + 2 + 3 + 0 + 0 + 0 +
3 = 3 + 1 + 1 + 1
0
2
[
] ~ (0, 0
1
01
0
2
[
] ~ (0, 0
1
01
0
2
[
] ~ (0, 0
1
01
2
1
2
1
2
1
~(0, 2 )
(3)
We would expect the fixed effect of Year, 3 to be approximately zero. However, the driver, team and
team-year differentials from this effect (1 , 1 and 1 respectively) could be non-
11
zero, and as such, the variance at each level could vary by Year. The extent to which the variance
changes are quantified with variance functions (Goldstein, 2010): the driver level, team level and
team-year level variances is calculated respectively as:
2
2
= 0
+ (201 ) + (1
2 )
(4)
2
2
= 0
+ (201 ) + (1
2 )
(5)
2
2
= 0
+ (201 ) + (1
2 )
(6)
Note that we additionally tested for a quadratic year term (thus allowing a quartic variance function),
however no improvement in the model was observed.
The model is much the same for weather (with the Year variable replaced by the dummy variable for
rain). For the track type variable, there are three categories, so two dummy variables must be included
in the model. This model is specified as:
= 0 + 1 + 2 + 3 + 4 + 0 + 0
+ 0 +
3 = 3 + 1 + 1 + 1
4 = 3 + 2 + 2 + 2
2
0
0
[1 ] ~ (0, 01
2
02
2
1
12
2
0
0
[1 ] ~ (0, 01
2
02
2
1
12
12
)
2
2
)
2
2
2
0
0
[1 ] ~ (0, 01
2
02
2
1
12
)
2
2
~(0, 2 )
(7)
(8)
2
2
Thus, the driver variance on a permanent circuit is estimated as 0
, for a temporary circuit it is 0
+
2
2
(201 ) + (1
2 ), and for a street circuit it is 0
+ (202
2
) + (1
2 ). There are equivalent variance functions for the team and team-year
levels.
This same model can also be used to calculate separate driver rankings for different conditions (using
the entirety of the data from 1950, in order to include all drivers). Thus, in equation 7 the driver
rankings for permanent circuits is given by 0 , for temporary circuits they are given by
0 + 1 , and for street circuits they are given by 0 + 2 . Rankings for dry and
wet weather conditions can be found in a similar way.
All models were fitted in MLwiN v2.32 (Rasbash et al., 2013), using Monte Carlo Markov Chain (MCMC)
estimation (Browne, 2009)2. Plausible but arbitrary starting values were used for an initial model, and
once that model had run for 10,000 iterations, these estimates were used as starting values for the
The default prior specifications in MLwiN were used: uniform distributions for the fixed effect estimates,
inverse Gamma distributions for the variances in the null models, and inverse Wishart distributions for the
variance-covariance matrices when effects were allowed to vary in other models. The models were also
estimated using Uniform priors for the variances, as suggested by Gelman (2006), but the results were almost
identical and the substantive conclusions did not change. For more on this see Browne (2009).
13
final models, which were run for 500,000 iterations (following a 500 iteration burn in). These were
found to be sufficient to produce healthy-looking visual diagnostics (that is, chain trajectories that
have the appearance of white noise), and effective sample sizes (ESS) of over 1000 for all parameters.
In all cases, variables were allowed to vary at each level one at a time, and model improvements were
calculated using the Deviance Information Criterion (DIC, Spiegelhalter et al., 2002). Where the model
shows substantial model improvement (a decrease in DIC of at least 4) when a variables effect was
allowed to vary at any level, the full model (with the variable effect allowed to vary at all three levels)
was run and these results presented.
2
2
2 + ( )
(9)
where is the sample size of driver-level entity j, 2 is the between-driver variance, and 2 is the
level 1 variance. Thus competing in only one race and winning in a poor car is not enough in our model
to do well drivers must perform consistently well to be sure their good performances are not simply
down to chance.
Second, we use finishing position as our dependent variable, rather than season-average points scored
as used by Phillips (2014). The use of points scored is effectively a non-linear transformation of
finishing position, and whilst one could argue that points scored is a better measure of what makes a
good driver, one could make the argument the other way around equally validly. Furthermore, the
use of points scored effectively penalises drivers of bad cars because it is so difficult to get into the
points positions in such cars; an excellent driver may never get into the points and achieve positions
that are relatively low but actually impressive given the car they are driving. Given we are aiming to
remove team effects, finishing position therefore seems a more appropriate measure for our
purposes. Additionally, because the method used in Phillips (2014) requires reducing the data into
driver-years (rather than driver-races), the full uncertainty around his rankings cannot be assessed.
This does however leave the problem that our level one residuals are unlikely to be Normally
distributed. A square root transformation of the Finishing position dependent variable appears to
solve this problem, however it also produces results that are substantively very similar to those
without the transformation; because of this, and given the easier interpretation of the model with the
untransformed dependent variable, we present the results from the latter.
Other notable differences include:
The use of a different (simpler) correction for competitiveness compared to Phillips 2014 (see
section 3.1);
15
No consideration of differences between driver and team errors as stated above, this is
unnecessary since our model will automatically apportion these to their appropriate level;
We consider a drivers entire career, rather than their 1, 3, or 5 year peak performance (as
considered by Phillips (2014). Thus, to perform well in our rankings, drivers must perform
consistently well, and not just for a portion of their career.
We control for fewer variables than Eichenberger and Stadelmann (2009), since we do not
want to control out any effects which should more appropriately be included in the random
effects.
With each difference, we are not necessarily claiming that our models are better than previous
models; rather that we are defining what a good driver is in subtly different ways that will impact the
results that are produced, but that are substantively reasonable.
4 Results
Model parameter estimates can be found in the online appendix, alongside an MLwiN worksheet that
can be used for replication.
A list of the top 50 drivers can be found in the online appendix (table A7 and A8)
16
Michael Schumacher, who holds the record for the most championships and race victories of any
driver in Formula 1, comes in a relatively modest ninth place. This is in part because those victories
were won in an excellent car, but also because his ranking is dragged down by his more recent postretirement performances (2010-2012) when he performed less well than in the main part of his career
and crucially was generally outperformed by his Mercedes teammate Nico Rosberg. Thus, we re-ran
the analysis with the latter section of Schumachers career treated as a separate driver. The results
are shown in Figure 3: pre-2006 Schumachers ranking rises to 3rd. The only other significant difference
in results between this model and the previous is that Rosbergs ranking falls from 13th to 46th. This is
because Schumachers high standing as a driver in the model effectively deflated 2010-2012
Mercedes team ranking in the first model, meaning Rosbergs performances appeared more
impressive. When treated as separate drivers, post-retirement Schumacher performed less well, the
team effect appears greater, and so Rosbergs performances no longer stand out compared to his
team.
[Figures 2 and 3 about here]
We were additionally able to produce rankings specific to certain weather conditions and track types
(see figures 4 and 5)4. In general, these showed similar results Fangio remained top in all but one of
the categories and the top drivers still populate the top positions. However there are some interesting
points to note. In particular, Whilst the reputations of Ayton Senna and Michaael Schumacher for
being very good wet weather drivers are justified by the data (pre-2006 Schumacher is estimated to
be the second best wet weather driver of all time, whilst Senna is the third best and ahead of his rival
Alain Prost), the similar reputation of Lewis Hamilton is not born out statistically (his ranking does not
differ between wet and dry conditions).
These graphs were produced by models that used separate coding in the random part of the model, to allow
effects and their uncertainty to be most easily computed. These models produce exactly equivalent results to
the contrast coding expressed in equation 7. See Bullen et al. (1997). Schumacher is treated as two drivers in
both figures. The coefficents for these models are given in the online appendix (tables A5 and A6).
17
4.2 How much does the driver and the team matter?
Figure 6 shows the extent of the variances at the team, team-year and driver levels in a model
controlling for number of drivers and race competitiveness. As can be seen, the team effects
significantly outweigh the driver effects, accounting for 86% of driver variation. Furthermore, the
majority (about two thirds) of the team effect is constant for a team and does not change year on year
(although there is substantial variation within teams, year-on-year as well). In other words, the legacy
of a team outweighs any transient effects as teams change year by year.
[Figure 6 about here]
There is also limited evidence that the importance of these levels vary by key variables. Although the
confidence intervals are relatively wide, there is evidence that the importance of the team has
increased over time, whilst the importance of the driver has slightly decreased (Figure 7). There is also
evidence that the team-year is less influential to race results on street circuits, compared to
permanent circuits (Figure 9). The confidence intervals regarding wet and dry conditions are too wide
to be able to make any robust conclusions (Figure 8), but the direction is in line with what one might
expect: that teams are less influential in wet conditions than in dry conditions (in other words there is
more uncertainty to race results in wet conditions).
[Figures 7, 8 and 9 about here]
5 Discussion
5.1 The greatest driver?
As with any ranking system, our claim of who is the best driver should be treated with an appropriate
degree of circumspection. As Figures 2 and 3 show, there is significant uncertainty around each of the
drivers residuals. This is not a limitation of our model, rather we are explicitly quantifying the
18
uncertainty that inevitably exists in a ranking exercise such as this. Having said this, our results present
an interesting ranking when compared to both previous statistical rankings, and subjective expert
rankings5.
The first point to note is that in most respects, our results match those of others: our top ten Fangio,
Prost, Schumacher, Alonso, Clark, Senna, Stewart, Piquet, Fittipaldi and Vettel are all considered by
most models and experts of be among the best drivers of all time. Our model agrees with previous
statistical models (Eichenberger & Stadelmann, 2009; Phillips, 2014) in ranking Prost above Senna (in
contrast to many subjective rankings), and in viewing drivers such as Nigel Mansell, Mario Andretti,
Gilles Villeneuve and Mika Hakkinen as rather overrated by experts.
Our model differs from previous statistical attempts in not throwing up any particular surprises in the
top 10, in comparison to Eichenberger and Stadelman (who placed Mike Hawthorn in 5th) and Phillips
(who placed James Hunt in 6th). Whilst they argue each has been underrated by experts, our model
suggests otherwise (with Hawthorn and Hunt in 34th and 95th place respectively). The high
performance (7th place) of Nico Rosberg in Phillips 2014 was, as he suggests, a result of his partnership
with an out of form world champion, which artificially inflated his results. In our analysis, when
Schumacher is separated into two drivers, pre and post retirement, Rosbergs performance against
the latter appears less impressive and he is placed 46th.
Perhaps the biggest surprise in our results is the high ranking of Christian Fitipaldi at number 12,
despite only competing in three seasons and never making a podium finish. This ranking occurs
because C. Fittipaldi consistently outperformed his team-mates, and because he never raced for a
good team, the standard required to get a high ranking is lower. Other surprises are the low ranking
of champion drivers such as Niki Lauda (142nd) and Alberto Ascari (76th). Lauda only performed notably
well when racing for Ferrari (1974-1977) and his results dropped when racing for other, lower
19
achieving teams. Ascaris performances can also be at least in part attributed to his team (Ferrari); he
also had a high performing team mate, and his result will be shrunk back to the mean because he
raced in relatively few (31) F1 races (see section 3.3).
20
technological advances. Thus, on street tracks, top drivers are able to differentiate themselves from
less good drivers in better cars.
6 Conclusions
We have presented a complex multilevel model which has allowed us to fulfil two related aims: (1) to
find a ranking of F1 drivers, controlling for team effects, and (2) to assess the relative importance of
team and driver effects. Whilst there is significant uncertainty in our results, our models suggest that
(1) Juan Manuel Fangio is the greatest F1 driver of all time; (2) teams matter more than drivers; (3)
about two-thirds of the team effect is consistent over time a legacy effect; (4) team effects have
increased over time but appear to be smaller on street circuits.
As with any ranking system that one could devise, this one has some flaws. First, where drivers have
not changed teams over the course of their career it is very difficult to know whether their
performance is the result of their car, the drivers skill, or a combination of the two (for example a
driver that happens to driver a particular car well) especially if their teammate remains constant as
well. Thus, the model really tells us how drivers perform against their team mates, but those teammates are not randomly selected since good drivers will self-select into good teams. This is most clearly
demonstrated by the high ranking of Christian Fittipaldi, who performed well given his low ranking
team, but who has never been tested in a good car (that counterfactual is not in the dataset see
King and Zeng (2006). Moreover, team orders have (particularly in recent years) been known to be
given to lower ranked team members, encouraging them to allow a favoured team mate to pass them
thus a good team driver may have achieve lower finishing positions simply by following team
orders. However with the observational data that we have, our models are the best we can do.
The model could be extended in a number of ways. For example, additional levels could be added to
further differentiate between different attributes of the team we could include a tyre level, an
engine level, and so on, to assess what attributes of teams matter the most. It could also be interesting
21
to see how these results differ when qualifying position, or fastest lap times, are used as the response
variables.
Finally, we contend, that these the methods used here have a potential broad appeal to researchers
in social science and beyond. The cross-classified structure has potential to assess the importance of
a wide range of social and economic determinants: how much do individuals, teams and companies
affect worker productivity; how much do Primary Care Trusts and neighbourhoods affect health; how
much do classes, schools and neighbourhoods affect educational attainment, and how much has this
changed over time. All of these questions could be answered, where data is available, using models
similar to those used here. The explicit analysis of variances as a function of continuous and categorical
predictors allows for the assessment of performance in complex and changing circumstances
reflecting the reality of the world that is being modelled.
7 References
Allen, J. (2000). Michael Schumacher: Driven to Extremes. Bantam Books.
Allen, J. (2011). James Allen on F1: 2011 - Vettel steals the show. Speed Merchants Ltd.
Alnaser, W.E., Probert, S.D., El-Masri, S., Al-Khalifa, S.E., et al. (2006). Bahrain's Formula-1 racing
circuit: energy and environmental considerations. Appl Energ, 83, 352-370.
Bekker, J., & Lotz, W. (2009). Planning Formula One race strategies using discrete-event simulation. J
Oper Res Soc, 60, 952-961.
Bell, A., & Jones, K. (2015). Explaining Fixed Effects: Random effects modelling of time-series-crosssectional and panel data. Polit Sci Res Methods, 3, 133-153.
Browne, W.J. (2009). MCMC estimation in MLwiN, Version 2.25. University of Bristol: Centre for
Multilevel Modelling.
Bullen, N., Jones, K., & Duncan, C. (1997). Modelling complexity: Analysing between-individual and
between-place variation - A multilevel tutorial. Environ Plann A, 29, 585-609.
Collings, T., & Edworthy, S. (2004). The Formula 1 Years. Carlton Books Ltd.
Dominy, J., & Dominy, R. (1984). Aerodynamic influences on the performance of the Grand Prix
Racing Car. J Automobile Engineering, 198, 87-93.
Dominy, R. (1992). Aerodynamics of Grand Prix cars. J Automobile Engineering, 206, 267-274.
Eichenberger, R., & Stadelmann, D. (2009). Who is the best Formula 1 driver? An economic approach
to evaluating talent. Economic Analysis & Policy, 39, 289-406.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Anal,
1, 515-533.
Goldstein, H. (2010). Multilevel statistical models. 4th edition. Chichester: Wiley.
Hassan, D. (2012). The History of Motor Sport. New York: Routledge.
Henderson, J., Foo, K., Lim, H., & Yip, S. (2010). Sports events and tourism: the Singapore Formula
One Grand Prix. International Journal of Event and Festival Management, 1, 60-73.
22
Horlock, I. (2009). Prediction of Formula One race results using driver characteristics. Project for
Masters Degree in Engineering, University of Bristol.
Jenkins, M. (2010). Technological Discontinuities and Competitive Advantage: A Historical
Perspective on Formula 1 Motor Racing 1950-2006. J Manage Stud, 47, 884-910.
Jenkins, M., & Floyd, S. (2001). Trajectories in the evolution of technology: A multi-level study of
competition in Formula 1 racing. Organ Stud, 22, 945-969.
Judde, C., Booth, R., & Brooks, R. (2013). Second Place Is First of the Losers: An Analysis of
Competitive Balance in Formula One. J Sport Econ, 14, 411-439.
King, G., & Zeng, L.C. (2006). The dangers of extreme counterfactuals. Polit Anal, 14, 131-159.
Loiacono, D., Lanzi, P.L., Togelius, J., Onieva, E., et al. (2010). The 2009 Simulated Car Racing
Championship. Ieee T Comp Intel Ai, 2, 131-147.
Muehlbauer, T. (2010). Relationship between Starting and Finishing Position in Formula One Car
Races. Int J Perf Anal Spor, 10, 98-102.
Phillips, A.J. (2014). Uncovering Formula One driver performances from 1950 to 2013 by adjusting
for team and competition effects. J Quant Anal Sport, 10, 261-278.
Rasbash, J., Charlton, C., Browne, W.J., Healy, M., et al. (2013). MLwiN version 2.28. University of
Bristol: Centre for Multilevel Modelling.
Spiegelhalter, D.J., Best, N.G., Carlin, B.R., & van der Linde, A. (2002). Bayesian measures of model
complexity and fit. J Roy Stat Soc B, 64, 583-616.
Spurgeon, B. (2009). Age Old Question of Whether it is the Car or the Driver that Counts. Formula 1
101. http://formula1.about.com/od/formula1101/a/Age-Old-Question-Of-Whether-It-IsThe-Car-Or-The-Driver-That-Counts.htm [Accessed 17th March 2015].
Spurgeon, B. (2011). The Art of Racing in the Wet in Formula 1. Formula 1 101.
http://formula1.about.com/od/howaraceworks/a/The-Art-Of-Racing-In-The-Wet-InFormula-1.htm [Accessed 17th March 2015].
23
Team-Year
Observation
24
Figure 2: Plot of the top 20 driver-level residuals, representing the top 20 drivers of all time
(1950-2014) according to our model. Number of drivers and race competitiveness are
controlled. Based on model (b) in table A1. The residual value represents the number of
positions one would expect the driver to finish ahead of an average driver driving for an
equally good (or the same) team, with more negative numbers indicating a better position.
25
Figure 3: Plot of the top 20 driver-level residuals, representing the top 20 drivers of all time
(1950-2014) according to our model. Number of drivers and race competitiveness are
controlled. Michael Schumacher is treated as two drivers (pre and post retirement), with
only his pre-retirement performances represented in the graph. Based on model (d) in table
A1. 95% confidence intervals are shown.
26
Figure 4: Plot of the top 20 driver-level residuals, representing the top 20 drivers of all time
(1950-2014) according to our model in (a) wet and (b) dry conditions. Number of drivers and
race competitiveness are controlled. Michael Schumacher is treated as two drivers (pre and
post retirement), with only his pre-retirement performances represented in the graph. 95%
confidence intervals are shown. Based on the model in table A5.
27
Figure 5: Plot of the top 20 driver-level residuals, representing the top 20 drivers of all time
(1950-2014) according to our model, on (a) permanent, (b) temporary and (c) street circuits.
Number of drivers and race competitiveness are controlled. Michael Schumacher is treated
as two drivers (pre and post retirement), with only his pre-retirement performances
represented in the graph. 95% confidence intervals are shown. Based on the model in table
A6.
28
Figure 6: variance partitioning between team, team-year and driver levels (data from 1979
only). Number of drivers and competitiveness of race are controlled. Based on model (a) in
table A1. 95% confidence intervals are shown.
Figure 7: Variance as a function of Year (data from 1979 only). Number of drivers and
competitiveness of race are controlled. Based on model (e) in table A2. 95% CIs are shown.
29
Figure 8: Variance as a function of weather (data from 1979 only). Number of drivers and
competitiveness of race are controlled. Based on model (e) in table A3. 95% CIs are shown.
Figure 9: Variance as a function of track type. Number of drivers and competitiveness of race are
controlled. Based on model (h) in table A4. 95% CIs are shown.
30
Online appendix for Bell, Smith, Sabel and Jones Formula for success: multilevel modelling of Formula One driver
and constructor performance, 1950-2014 tables of model estimates.
Table A1: Models showing variance partitioning, controlling for competitiveness and the number of drivers.
(a)
1979-2014
(b and c) 1950-2014 - Schumacher as 1 driver
Estimate
95% CIs
Estimate
95% CIs
Estimate
Estimate
95% CIs
Estimate
95% CIs
Fixed Part
Constant
14.238
13.582
14.919
14.146
13.814
14.486
14.167
13.836
14.505
14.149
13.815
14.492
14.172
13.839
14.513
Ndrivers -gm
0.453
0.339
0.566
0.472
0.414
0.530
0.425
0.387
0.464
0.471
0.413
0.529
0.425
0.386
0.463
Comp -gm
-0.365
-0.719
-0.01
-0.240
-0.464
-0.016
-0.236
-0.459
-0.012
5.298
3.280
8.179
2.897
2.018
3.986
2.846
1.981
3.920
2.994
2.097
4.105
2.945
2.066
4.045
3.227
2.625
3.917
2.926
2.457
3.443
2.953
2.484
3.472
2.864
2.402
3.378
2.890
2.422
3.406
Random Part
Team Variance
Team-Year
Variance
Driver Variance
1.341
0.909
1.882
1.667
1.245
2.165
1.625
1.213
2.109
1.681
1.265
2.177
1.644
1.235
2.126
Level 1 Variance
35.202
34.373
36.052
35.870
35.167
36.587
35.882
35.174
36.602
35.853
35.148
36.572
35.865
35.158
36.583
DIC:
90311.53
133328.9
133333.4
133318.2
133322.6
95% CIs
Estimate
Estimate
Estimate
Estimate
95% CIs
95% Cis
95% CIs
95% CIs
Fixed Part
Cons
14.242
13.582
14.922
14.258
13.582
14.965
14.211
13.565
14.878
14.223
13.561
14.911
14.23
13.548
14.941
Ndrivers -gm
0.452
0.338
0.566
0.452
0.339
0.565
0.456
0.342
0.570
0.452
0.338
0.566
0.455
0.342
0.568
Comp -gm
-0.377
-0.748
-0.006
-0.375
-0.746
-0.003
-0.342
-0.712
0.026
-0.372
-0.744
0.001
-0.353
-0.724
0.019
Year -gm
-0.004
-0.038
0.031
0.001
-0.055
0.056
-0.004
-0.040
0.032
-0.009
-0.046
0.028
-0.003
-0.060
0.054
5.305
3.285
8.203
4.686
2.735
7.509
4.799
2.883
7.566
5.373
3.311
8.319
4.727
2.77
7.591
Covariance
0.085
-0.007
0.203
0.090
-0.003
0.212
Year
Team-Year
Level
0.011
0.004
0.023
0.011
0.004
0.023
2.873
2.316
3.512
2.145
1.406
2.933
Random Part
Team Level
Cons
Cons
3.236
2.633
3.928
2.448
1.636
3.332
3.211
2.606
3.907
Covariance
0.036
-0.002
0.079
0.016
-0.017
0.052
Year
0.009
0.003
0.018
0.008
0.003
0.015
1.329
0.899
1.865
Driver Level
Cons
0.960
0.518
1.511
0.898
0.496
1.406
Covariance
-0.014
-0.040
0.010
-0.022
-0.047
0.001
Year
0.004
0.001
0.008
0.004
0.001
0.008
35.172
34.34
36.018
35.185
34.352
36.037
Level 1 Var
DIC:
1.345
35.202
90311.87
0.912
34.373
1.887
36.052
1.246
35.213
90306.32
0.837
34.383
1.756
36.068
35.211
90310.3
34.378
36.061
90305.45
90298.35
Table A3: Models with variance as a function of weather (dry/wet contions), 1979-2014
Estimate
Estimate
Estimate
Estimate
Estimate
95% CIs
95% CIs
95% CIs
95% CIs
95% CIs
Fixed Part
Cons
14.241
13.581
14.92
14.289
13.606
14.985
14.212
13.556
14.894
14.251
13.593
14.937
14.29
13.605
14.985
Ndrivers -gm
0.453
0.339
0.567
0.451
0.337
0.566
0.456
0.341
0.570
0.453
0.339
0.568
0.454
0.339
0.568
Comp -gm
-0.365
-0.719
-0.011
-0.357
-0.711
-0.002
-0.370
-0.723
-0.019
-0.363
-0.717
-0.008
-0.360
-0.710
-0.007
Wet
0.006
-0.286
0.298
-0.34
-0.799
0.100
0.001
-0.308
0.309
-0.088
-0.418
0.238
-0.348
-0.816
0.103
5.299
3.285
8.191
5.695
3.574
8.733
5.227
3.226
8.098
5.280
3.258
8.175
5.655
3.542
8.667
Covariance
-1.173
-2.385
-0.237
-1.014
-2.200
-0.076
Wet
0.821
0.296
1.726
0.703
0.223
1.563
3.219
2.619
3.908
3.362
2.724
4.093
Random Part
Team Level
Cons
Team-Year Level
Cons
3.227
2.625
3.916
3.407
2.759
4.150
3.244
2.641
3.937
Covariance
-0.650
-1.377
0.012
-0.481
-1.165
0.131
Wet
0.948
0.232
2.210
0.594
0.132
1.648
1.342
0.908
1.882
Driver Level
Cons
1.441
0.986
2.005
1.409
0.962
1.967
Covariance
-0.327
-0.764
0.031
-0.177
-0.579
0.175
Wet
0.318
0.073
0.868
0.215
0.046
0.644
35.166
34.334
36.013
35.046
34.212
35.900
Level 1 Var
DIC:
1.341
35.205
90313.5
0.910
34.375
1.880
36.055
1.352
35.124
90296.1
0.918
34.293
1.891
35.976
35.094
90305.48
34.257
35.95
90310.03
90293.46
Table A4: Models with variance as a function of track type (permanent/temporary/street), 1979-2014
(a) Track type Fixed Effects only
Estimate
Estimate
Estimate
Estimate
Estimate
95% CIs
95% CIs
95% CIs
95% CIs
95% CIs
14.234
14.235
95% CIs
Fixed Part
Cons
Ndrivers -gm
14.235
13.579
14.917
14.255
13.581
14.951
14.275
13.59
14.983
14.239
13.581
14.919
14.221
13.569
14.897
13.575
14.906
13.574
14.908
14.294
13.586
15.01
0.455
0.34
0.571
0.454
0.338
0.569
0.454
0.338
0.569
0.454
0.338
0.569
0.457
0.341
0.574
0.455
0.339
0.57
0.456
0.340
0.572
0.455
0.338
0.571
Comp -gm
-0.369
-0.726
-0.014
-0.375
-0.731
-0.019
-0.366
-0.721
-0.011
-0.369
-0.726
-0.013
-0.370
-0.725
-0.015
-0.369
-0.723
-0.013
-0.368
-0.723
-0.012
-0.376
-0.734
-0.020
Temp
-0.012
-0.401
0.376
-0.208
-0.728
0.302
-0.015
-0.403
0.373
-0.009
-0.420
0.401
-0.012
-0.400
0.375
-0.103
-0.546
0.336
-0.012
-0.399
0.376
-0.242
-0.787
0.294
Street
0.027
-0.259
0.314
0.026
-0.261
0.313
-0.206
-0.584
0.161
0.027
-0.26
0.312
0.014
-0.287
0.312
0.028
-0.259
0.314
-0.034
-0.399
0.323
-0.161
-0.588
0.258
5.293
3.267
8.197
5.544
3.477
8.507
5.778
3.631
8.840
5.311
3.281
8.215
5.161
3.179
7.996
5.287
3.268
8.179
5.303
3.277
8.205
5.958
3.745
9.128
temp/cons
-0.73
-1.915
0.293
-0.753
-1.981
0.251
Temp
0.588
0.156
1.521
Random Part
Team Level
Cons
street/cons
-0.921
-1.838
-0.242
street/temp
Street
0.313
0.097
0.740
3.225
2.625
3.911
0.520
0.140
1.361
-0.801
-1.765
-0.084
0.193
-0.077
0.614
0.303
0.092
0.741
3.541
2.867
4.315
Team-Year Level
Cons
3.227
2.625
3.918
3.223
2.623
3.909
3.22
2.611
3.921
temp/cons
-0.002
-0.927
0.875
-0.186
-1.102
0.673
Temp
1.870
0.456
4.266
1.368
0.251
3.647
-0.987
-1.693
-0.374
0.497
-0.196
1.540
0.764
0.251
1.668
street/cons
3.572
-1.147
2.902
-1.851
4.337
3.223
2.623
3.913
3.235
2.633
3.927
-0.514
street/temp
Street
0.828
0.306
1.758
1.352
0.918
1.891
Driver Level
Cons
1.341
0.910
1.880
1.336
0.906
1.870
1.336
0.907
1.869
1.343
0.912
1.879
temp/cons
Temp
1.387
0.95
1.933
1.340
0.901
1.895
-0.304
-0.865
0.198
-0.170
-0.697
0.292
0.686
0.175
1.758
0.625
0.168
1.611
0.024
-0.467
0.466
0.428
-0.089
1.170
street/cons
1.366
-0.157
0.925
-0.645
1.917
0.282
street/temp
Street
Level 1 Cons
DIC:
35.207
90315.42
34.379
36.058
35.179
90312.85
34.347
36.030
35.180
90311.3
34.348
36.031
35.086
90309.78
34.246
35.944
35.104
90301.24
34.270
35.957
35.172
90315.46
34.342
36.021
1.058
0.393
2.032
0.977
0.348
1.940
35.078
34.247
35.927
34.840
34.003
35.696
90299.5
90277.19
Table A5: Separately coded model with variance as a function of weather, 1950-2014
Weather with separate coding
Estimate
95% CIs
Fixed Part
cons
Ndrivers -gm
Comp -gm
Wet
14.218
0.469
-0.247
-0.289
13.868
0.413
-0.429
-0.654
14.577
0.526
-0.065
0.073
4.117
3.341
3.188
2.859
2.201
1.952
5.683
4.770
4.830
Random Part
Team Level
Dry
Cov
Wet
Team-Year Level
Dry
Cov
Wet
Driver Level
Dry
Cov
Wet
Level 1 Variance
3.182
2.548
2.756
2.644
1.935
1.845
3.783
3.209
3.871
1.840
1.654
1.757
36.297
1.371
1.145
1.058
35.580
2.397
2.246
2.651
37.030
DIC:
136059
Table A6: Separately coded model with variance as a function of track type, 1950-2014
Track type with separate coding
Estimate
95% CIs
Fixed Part
cons
Ndrivers -gm
Comp -gm
Temporary
Street
Random Part
Team level
Permanent
Perm/Temp Cov
Temporary
Perm/Street Cov
Street/Temp Cov
Street
Team-Year level
Permanent
Perm/Temp Cov
Temporary
Perm/Street Cov
Street/Temp Cov
Street
Driver level
Permanent
Perm/Temp Cov
Temporary
Perm/Street Cov
Street/Temp Cov
Street
Level 1 Variance
DIC:
14.186
0.471
-0.245
-0.227
-0.104
13.829
0.413
-0.429
-0.874
-0.496
14.552
0.528
-0.061
0.428
0.288
4.121
3.419
4.188
3.362
3.086
3.561
2.854
1.993
2.217
2.214
1.710
2.250
5.709
5.239
7.066
4.796
4.877
5.275
3.299
2.844
4.778
2.205
2.466
2.502
2.726
1.911
2.775
1.609
1.455
1.728
3.943
3.852
7.297
2.838
3.587
3.418
1.883
1.373
2.410
1.657
1.618
2.438
36.102
1.394
0.671
1.234
1.144
0.782
1.636
35.383
2.463
2.179
4.092
2.253
2.604
3.410
36.834
136085.7
Table A7: Top 50 drivers based on the driver level residuals from model A1b (Michael Schumacher
treated as a single driver)
Rank
Driver
Residual
Rank
Driver
Residual
-3.663
26
Richie Ginther
-1.465
Alain Prost
-3.227
27
Denny Hulme
-1.447
Fernando Alonso
-3.079
28
Nick Heidfeld
-1.445
Jim Clark
-3.040
29
Kimi Raikkonen
-1.431
Ayrton Senna
-2.701
30
Carlos Reutemann
-1.398
Jackie Stewart
-2.596
31
Tom Pryce
-1.396
Nelson Piquet
-2.535
32
Carlos Pace
-1.380
Emerson Fittipaldi
-2.506
33
John Watson
-1.328
Michael Schumacher
-2.444
34
Mike Hawthorn
-1.319
10
Sebastian Vettel
-2.399
35
Stirling Moss
-1.296
11
Lewis Hamilton
-2.245
36
Jean-Pierre Beltoise
-1.285
12
Christian Fittipaldi
-2.194
37
Patrick Depailler
-1.284
13
Nico Rosberg
-1.995
38
Heinz-Harald Frentzen
-1.270
14
Jenson Button
-1.945
39
Rubens Barrichello
-1.227
15
Graham Hill
-1.944
40
Jack Brabham
-1.209
16
Dan Gurney
-1.911
41
Luigi Fagioli
-1.208
17
Jody Scheckter
-1.740
42
Daniel Ricciardo
-1.194
18
Damon Hill
-1.716
43
Alan Jones
-1.184
19
Louis Rosier
-1.632
44
Martin Brundle
-1.168
20
Ronnie Peterson
-1.601
45
Bruce McLaren
-1.154
21
Elio de Angelis
-1.587
46
Thierry Boutsen
-1.119
22
Pedro Rodrguez
-1.557
47
Keke Rosberg
-1.110
23
Marc Surer
-1.514
48
Clay Regazzoni
-1.054
24
Robert Kubica
-1.478
49
Jacques Villeneuve
-1.030
25
Nino Farina
-1.477
50
Mark Webber
-1.014
Table A8: Top 50 drivers based on the driver level residuals from model A1d (Michael Schumacher
treated as two drivers, pre 2006 and post 2010)
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Driver
Juan Manuel Fangio
Alain Prost
Michael Schumacher (pre-2006)
Fernando Alonso
Jim Clark
Ayrton Senna
Jackie Stewart
Nelson Piquet
Emerson Fittipaldi
Sebastian Vettel
Christian Fittipaldi
Lewis Hamilton
Graham Hill
Jenson Button
Dan Gurney
Jody Scheckter
Damon Hill
Louis Rosier
Ronnie Peterson
Elio de Angelis
Pedro Rodrguez
Marc Surer
Nino Farina
Robert Kubica
Richie Ginther
Residual Rank
-3.682
26
-3.240
27
-3.112
28
-3.103
29
-3.051
30
-2.718
31
-2.607
32
-2.565
33
-2.513
34
-2.386
35
-2.211
36
-2.041
37
-1.949
38
-1.946
39
-1.921
40
-1.771
41
-1.725
42
-1.641
43
-1.605
44
-1.598
45
-1.559
46
-1.530
47
-1.511
48
-1.490
49
-1.489
50
Driver
Denny Hulme
Nick Heidfeld
Kimi Raikkonen
Carlos Reutemann
Tom Pryce
Carlos Pace
Rubens Barrichello
Mike Hawthorn
John Watson
Stirling Moss
Patrick Depailler
Jean-Pierre Beltoise
Heinz-Harald Frentzen
Luigi Fagioli
Martin Brundle
Jack Brabham
Daniel Ricciardo
Alan Jones
Bruce McLaren
Thierry Boutsen
Nico Rosberg
Keke Rosberg
Clay Regazzoni
Jacques Villeneuve
Jean Alesi
Residual
-1.453
-1.451
-1.445
-1.411
-1.407
-1.379
-1.374
-1.342
-1.325
-1.302
-1.300
-1.289
-1.281
-1.223
-1.219
-1.219
-1.196
-1.188
-1.149
-1.140
-1.110
-1.106
-1.076
-1.038
-1.028