The dynamics of competitive sports are governed by a complex underlying framework of player attributes and group dynamics (Macdonald, Weld, Arney). Generally speaking, a collection of great players is assumed to yield great results, and organizations fight the constraints of both the availability of such talent and the inherent costs of acquiring it in order to build rosters with maximum potential (Macdonald). Identifying productive players is therefore critical to team management when drafting and trading players and when targeting free agents (Macdonald). In professional sports leagues such as the National Hockey League (NHL), a team's general manager (GM) is tasked with assembling a winning team. In the NHL, a GM's job is complicated by the presence of a hard salary cap, which limits how much the team can spend on its players (Chan, Cho, Novati). Because the salary cap aims to facilitate parity across the league, GMs must build well-balanced teams by leveraging any competitive advantage they can find. To make
informed decisions, these decision makers must be able to measure the performance of different players and the contribution of those players to the team's overall performance (Chan). While prospective player value can be assessed using individual statistics of past performance, these measures can carry inherent biases and arguably do not represent a player's total on-ice contribution (Macdonald). There is much more to a player's performance than the commonly referenced goals, assists, and plus-minus statistics. Among the flaws of these measures is that they account for neither short-handed and power-play usage nor the strength of a player's team and teammates (Macdonald). In hockey, penalties committed by a player (e.g., tripping an opposing player with the hockey stick) result in the offending player going to the penalty box for at least two minutes or until the opposing team scores a goal, depending on the type and severity of the penalty. During this period, the offending team is on a "penalty kill," or short-handed, because it is one player short; the other team is on a "power play" because it has a one-player advantage (Chan).
An NHL team typically dresses 20 players for each game: 12 forwards, 6 defensemen, and 2 goalies. Forwards are usually partitioned into four lines of 3 forwards each (one center, one left wing, one right wing), and defensemen are paired off in three pairs. Of the two goalies, one starts and typically plays the entire game, whereas the other is a backup who sits in reserve in the event the starter is injured or plays poorly (Chan). The standard number of players that can be on the ice at the same time from one team is six: three forwards, two defensemen, and one goalie. Players on the ice may switch with those on the bench during certain stoppages in play or at any time the puck is live (as long as no more than six players of one team are on the ice at any time). The fact that players may move on and off the ice while the puck is in play adds to the fluidity of the game. Typically, an entire line (three forwards) will go off at once and a new line will replace it (Chan); the same is true for the defensive pairings. A team can also exchange its goalie for an extra forward or defenseman at any time a legal substitution can be made; this is often done during the final minutes of a game by a team that is trailing by one or two goals (Chan). Because of the complexity and fluidity of hockey, developing advanced statistics capable of better quantifying player contributions is an aspiration of many, and advocates of analytics are optimistic that they have uncovered a systematic approach for gaining valuable insights into this chaotic sport.

Analytics is defined as the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance (Wikipedia). Sports analytics has been gaining widespread attention in the general populace. Michael Lewis's entertaining account of the use of data analysis in baseball, Moneyball: The Art of Winning an Unfair Game (Lewis 2004), is arguably the most visible account of sports analytics. Many of the strategies documented in Moneyball, which were employed by the small-market Oakland Athletics to help the team compete with teams with much larger payrolls, have since been adopted in some form by many other Major League Baseball (MLB) teams and by teams in other sports (Fry, Ohlmann). This thesis sets out to investigate what analytics are, what role they currently play in sports (focusing primarily on hockey), what their limitations are, and what analytics might look like in the future.
Sports analytics didn't exist as a real job description until long after it was a job for people like Bill James, Pete Palmer, and Tom Tango. They, among others, took to writing about baseball and using numbers to better understand players and tactics roughly in the 1970s. The internet came about in the mid-1990s and allowed many more people to write, people who may not have had connections to other people but had connections to the world electronically (Dean Oliver). Analytics is a relatively new and rapidly evolving set of tools in the business world, and these tools are being adapted more and more to the world of sports. Analytics includes advanced statistics, data management, data visualization, and several other fields (Alamar). Neil Greenberg defines analytics as using all available resources (data, video, scouting, etc.) in concert to reduce the gap between potential and reality in sports performance, at the team and individual level (Gordon, 2014). Ed Feng says analytics is using numbers as a tool to better understand complex phenomena (Gordon, 2014). Alamar (2013) defines sports analytics as "the management of structured historical data, the application of predictive analytical models that utilize that data, and the use of information systems to inform decision makers and enable them to help their organizations in gaining a competitive advantage on the field of play" (p. 3). Whatever role analytics play today or in the future, one thing is certain: they are here to stay, and it is safe to say that we are in the infancy of the technological age of sports analytics, especially in hockey.
Every sports organization faces its own set of challenges in introducing and developing analytics as part of the decision-making process, but understanding the components of an analytics program will help managers maximize the competitive advantage they can gain from their analytic investment (Alamar). Hockey is a fluid sport with players coming on and off the ice without a stoppage of play. It is also a relatively low-scoring sport compared to sports such as basketball. Both of these features make the evaluation of players difficult (Macdonald, Ferrari, and Awad). Detroit Red Wings coach Mike Babcock is of the belief that the more information he can gather, the better he will be at evaluating and running his team; that is why Babcock is interested in the growing movement of advanced statistical analysis in the NHL. The push in sports, as in business, to use analytic tools comes from advances in computing power and the availability of massive amounts of data to both teams and the public, which creates an opportunity for competitive advantage (Alamar). Having access to information that competitors do not has a long history of providing teams and businesses with advantages. Teams such as the Oakland A's, Tampa Bay Rays, and San Antonio Spurs have embraced the use of analytics, and all three clubs, though they are in small markets and so have limited resources, have seen tremendous success, in part because of the information edge gained by their analytics programs. The three main components of sports analytics are data management, predictive models, and information systems, and the purpose of any analytics program is to aid the organization's decision makers (personnel executives, coaches, trainers, and anyone else involved) in gaining a competitive advantage. Putting the three components together with the motivation for the program suggests the framework for sports analytics depicted in Figure 1.1 (Alamar, Sports Analytics Framework, Figure 1.1, p. 4).
This framework demonstrates the flow of data through an organization as well as the transformation of that data into actionable information. All types of data are first organized and processed by the data-management function, which then provides data to analytic models and information systems (Alamar). The analytic models use the data either in a standardized fashion, to provide results to the information system, or on an ad hoc basis, to answer specific questions for a decision maker. The information system then presents the resulting information to the decision maker in an efficient and actionable manner. The fourth leg of the analytic table is leadership.
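The flow just described, from data management through analytic models to an information system, can be sketched as a minimal pipeline. This is an illustrative sketch only; every function name and data field below is hypothetical, not taken from Alamar's book or any team's actual system.

```python
# Illustrative sketch of the analytics framework: raw data passes through
# data management, into an analytic model, and out through an information
# system to the decision maker. All names and fields are hypothetical.

def manage_data(raw_sources):
    """Standardize and centralize records arriving from several sources."""
    centralized = []
    for source in raw_sources:
        for record in source:
            # Standardization step: normalize field names across sources.
            centralized.append({key.lower(): value for key, value in record.items()})
    return centralized

def analytic_model(records, player):
    """Turn raw data into information: a simple per-game scoring average."""
    games = [r for r in records if r["player"] == player]
    return sum(r["points"] for r in games) / len(games)

def information_system(player, value):
    """Present the result to the decision maker in an actionable form."""
    return f"{player}: {value:.1f} points per game"

# Two sources with inconsistent field capitalization, reconciled above.
raw = [[{"Player": "A", "Points": 2}], [{"player": "A", "points": 4}]]
report = information_system("A", analytic_model(manage_data(raw), "A"))
print(report)
```

The point of the sketch is the separation of concerns: the model never worries about where the data came from, and the decision maker never sees raw records.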
Understanding the tools of sports analytics is important to creating a competitive advantage, but without leadership that creates an effective analytics strategy and pushes for the use of analytics within the organization, no analytic investment will reach its full potential (Alamar). Justin Zormelo states, "My goal is to figure out the most efficient, easiest way for my players to win. I use statistics to figure out the quickest way for us to reach that number before someone else reaches that number" (ESPN). Zormelo became Kevin Durant's personal basketball coach; through analytics he was able to improve certain aspects of Durant's game, such as footwork, reaction time, speed and efficiency, vision, and overall performance on the court. In 2014, Kevin Durant became the NBA's Most Valuable Player. Whenever Zormelo starts working with a new player, he takes shot charts from existing NBA All-Stars and compares them with those of his new client. He suggests that this method shows new players where the "kill zones" are for the best in the world, and it also allows him to set a percentage goal for his players and their teams. He quantifies such measurements as distance run during a game in terms of miles, percentage of success when dribbling to the left versus dribbling to the right, and the difference between wide-open jumpers and contested jumpers; he knows the optimal angles for his players on their cuts to the basket, on their release points, and on their jump shots (ESPN).
Zormelo contends that Dirk Nowitzki, an NBA All-Star for the Dallas Mavericks, only has his hands on the ball for one minute and seventeen seconds during a forty-eight-minute game (ESPN). This insinuates that the best players in the world have to make quick decisions with the basketball in order to be effective, which also allows them to conserve energy and focus on what makes them successful; in Dirk's case, shooting the ball. Decision makers such as Justin Zormelo need to identify the areas of a player's game that the player should focus on in his development, determine routines for the player to improve, and provide targets and goals so the player and decision makers know whether the player is progressing as planned. Analytics can play a key role in this process by assisting decision makers in identifying goals for the player that will support the team, as well as tracking, analyzing, and projecting progress so all interested parties know whether a player is developing.
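The tracking-and-projecting step just described can be illustrated with a small sketch. The routine, the window size, and every number below are hypothetical; they show only the idea of projecting a recent trend against an agreed target.

```python
# Hypothetical sketch of tracking player development: project the recent
# trend forward and check it against an agreed target. The metric, the
# window size, and all numbers are illustrative.

def on_track(measurements, target, games_remaining, window=5):
    """Project the season-end value from the recent trend; compare to target."""
    recent = measurements[-window:]
    per_game_trend = (recent[-1] - recent[0]) / max(len(recent) - 1, 1)
    projection = measurements[-1] + per_game_trend * games_remaining
    return projection >= target

# A shooting percentage climbing from 30 toward a target of 40:
print(on_track([30, 31, 33, 34, 36], target=40, games_remaining=10))  # True
# A flat trend never reaches the target:
print(on_track([30, 30, 30, 30, 30], target=40, games_remaining=10))  # False
```

A real program would use a less naive projection, but even this crude version gives player and coach a shared, checkable answer to "is the plan working?"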
Building on this framework, the two main goals of an analytics program become clear. First, a strong sports analytics program will save decision makers time by making all of the relevant information for evaluating players, teams, or prospects efficiently available (Alamar). Instead of having information scattered among a variety of different departments within an organization, a well-organized analytics program will have all of its information organized and accessible to all of the decision makers. Consider for a moment all the information involved: financial projections, medical reports, performance data, and all the quantitative and qualitative data, including game-performance statistics, scouting reports, and game video. The role of data management within an analytics program is to organize, centralize, and streamline how data comes into the team's various functions. Good analytics systems give decision makers time to analyze relevant information instead of gathering it (Alamar). The second goal of a sports analytics program is to provide decision makers with novel insight. As the breadth and depth of the available data expand, the possibility of gaining useful information from those data grows. Analytic models have many uses, but their core function is to turn raw data into reliable and actionable information. Careful analysis takes all of the data and finds meaningful insight into a player's or team's current or future performance (Alamar).
NBA teams such as the Portland Trail Blazers and Boston Celtics, NFL teams such as the Philadelphia Eagles and New England Patriots, and MLB teams such as the St. Louis Cardinals and San Diego Padres have all had success using analytic tools to inform the draft process. The Celtics, for example, were able to pick future All-Star Rajon Rondo with the twenty-first pick in the 2006 NBA draft in part because they had identified rebounding by guards as an undervalued skill in the NBA. As other teams were picking Randy Foye (seventh, to the Minnesota Timberwolves) and Quincy Douby (nineteenth, to the Sacramento Kings), the Celtics were able to select a player who would develop into one of the top point guards in the league, because other teams did not understand his potential value the way the more analytic Celtics did (Alamar). Analytic models provide additional insight into draft decisions by adjusting a player's statistics to make them more comparable. For example, when calculating a quarterback's yards per pass attempt, a good model will adjust the raw data to account for the strength of the opposition that the player faced as well as the abilities of his teammates. This adjustment is still just the first step, because by itself adjusted yards per attempt is still just a data point. By comparing that adjusted yards per attempt (and other variables) to the data from all the quarterbacks drafted in the past, along with their success or lack thereof in the NFL, an analytic model can turn that data into a probability that the quarterback will be successful at the professional level (Alamar). Another example of investigating different metrics is Dean Oliver's
floor percentage. In this analytic model, Oliver measures the percentage of a team's possessions (in basketball) in which the team scores at least one point. Basically, floor percentage answers the question: on what percentage of a team's trips down the floor does it score? A possession in which a team scores at least one point is known as a scoring possession, so floor percentage is simply scoring possessions divided by total possessions (Oliver). Analytic models allow decision makers to gain insight into teams and players that is not possible without advanced statistical analysis (Alamar). Combining statistical projections with the input of qualified coaches and decision makers allows them to better predict areas of improvement and to increase future performance. According to Dean Oliver, "Predicting the future often means understanding things well enough to be able to change the future." As stated earlier, Justin Zormelo tries to find as much data as he possibly can on his subjects and then implements it to make them better. He takes those raw numbers and works on whatever drill or move is needed, with the intention of changing his players' performance and ultimately the outcome of the game so they win (ESPN). It is important to note that analytic
models provide information; they do not make decisions. There are a host of factors that determine how successful a player will be at the professional level. Many of these can be accounted for in analytic models, but it is up to decision makers to weigh all of the relevant information. One example is Sean Smith, who developed a technique he called Win Shares, naming it after the technique Bill James introduced in his recent Historical Baseball Abstract. Smith's method introduces the concept of a margin: that level of team points below which a team would never win. He sets that level at 60 percent of the league average for points scored. By setting a margin, "every point beyond that margin wins games," he says. Each team then wins a certain number of games, and teams may need different numbers of points to get those wins. So he determines how many points above the margin (in basketball) equate to a win, and then parses them out based upon a Tendex-like statistic. The goal of the analytic model is to support the decision-making process through richer and more accurate input. Analytic models can be powerful tools in allowing a decision maker to understand a player's potential in a new light (Alamar).
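Smith's margin idea, described above, can be put into simple arithmetic. The sketch below uses made-up league numbers and stops short of the Tendex-like parsing he applies afterward.

```python
# Arithmetic sketch of Sean Smith's margin concept: team points below
# 60 percent of the league scoring average win nothing; only points above
# that margin translate into wins. All numbers here are hypothetical.

def margin(league_avg_points):
    """The level of team points below which a team would never win."""
    return 0.6 * league_avg_points

def points_above_margin(team_points, league_avg_points):
    return max(team_points - margin(league_avg_points), 0.0)

def points_per_win(team_points, league_avg_points, wins):
    """How many points above the margin this team needed for each win."""
    return points_above_margin(team_points, league_avg_points) / wins

# A team scoring 8,200 points in a league averaging 8,000, with 50 wins:
print(points_per_win(8200, 8000, 50))  # 68.0
```

Once the points-per-win rate is known, the remaining step in Smith's method is to credit those win-producing points to individual players.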
A prime example of this is ERA (earned run average) in baseball; this metric, which has become commonplace, is actually a great measurement of ability. ERA indicates how many earned runs a pitcher gives up per twenty-seven outs he records, that is, per nine innings pitched. As Dean Oliver states, baseball doesn't claim that ERA is a perfect measure of a pitcher's ability, but it works pretty well: starting pitchers with consistently low ERAs are usually among the winningest and most highly valued pitchers. This suggests an interesting class of metrics based on usage or possessions; another example is Dean Oliver's basketball version, which calculates offensive and defensive ratings measuring points scored and points allowed per hundred possessions. Later in this thesis we will look at how these concepts apply to hockey. Additionally, analytic systems can automatically detect how an upcoming opponent's performance has been evolving and can identify the cause of any changes. For example, it is straightforward for an NBA coach to see that an upcoming opponent lost six of its last seven games. It is not at all straightforward for the coach to go through each of those games to determine the cause of the losses. An analytic system can demonstrate that each of the losses came against teams that had twice as many three-point attempts from the left corner as they did against other opponents, giving the coach the insight that the upcoming opponent does not defend the left corner well (Alamar). Another interesting example is Dean Oliver's use of box scores in basketball. According to Oliver, a box score provides a summary of the entire game; it is a bigger picture than what a game recap typically gives. Because the box score gives a picture of one game, it provides a valuable means of evaluating players and teams over an appropriate time span. The box score, according to Oliver, captures all sorts of plays, all sorts of offenses, and a large variety of player combinations: much of the information necessary to understand a team. From the box score he can break down exactly what happened in a game, what its critical points were, and where momentum shifted in favour of one team or the other. A box score can also indicate whether a game was won on the offensive or the defensive end. Once the goals of an analytics program are clear, analytics can assist with all of the functions of a sports organization. According to Rob Simpson, assistant general manager of the London Knights in the Ontario Hockey League (OHL), "Our analytics program focuses on scoring chances plus or minus per shift because that's what our head coach values the most. That's the metric we focus our attention on, and it has made a positive contribution to our organization."
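The metric Simpson describes can be computed very simply. The sketch below assumes shift-level counts of scoring chances for and against while a given player is on the ice; the input format is hypothetical, not the Knights' actual system.

```python
# Minimal sketch of scoring chances plus-minus per shift: the difference
# between chances for and chances against while a player is on the ice,
# averaged over that player's shifts. The input format is hypothetical.

def chances_plus_minus_per_shift(shifts):
    """shifts: list of (chances_for, chances_against) tuples, one per shift."""
    differential = sum(chances_for - against for chances_for, against in shifts)
    return differential / len(shifts)

# Four shifts: on the ice for three chances for and one against overall.
print(chances_plus_minus_per_shift([(1, 0), (0, 1), (1, 0), (1, 0)]))  # 0.5
```

The hard part in practice is not this arithmetic but the upstream data collection: someone has to decide, consistently, what counts as a scoring chance.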
In order to gain the most accurate picture of how analytics is employed across sports, it is useful to first benchmark organizations against the rest of the industry. While it is generally known which teams employ some level of statistical analysis, there is a wide range of sophistication in the actual use of analytics. The Sports Analytics Use Survey (SAUS) is the first survey to explore the use of analytics in sports organizations in line with the definition and goals used here. SAUS asks how the different tools of analytics (data management, predictive models, and information systems) are used and managed within a sports organization. Twenty-seven people representing teams from the National Football League, Major League Baseball, the National Basketball Association, and the English Premier League responded to the survey. The responses show that some organizations have embraced all facets of sports analytics, but there is still significant room for growth and improvement, and thus an opportunity for competitive advantage (Alamar). Both technical issues and management issues were identified as areas of potential growth for teams' use of sports analytics. Because one of the primary purposes of sports analytics is to save time for decision makers, SAUS focused on how data are stored within an organization. Among respondents, 60 percent use five or more sources of information on a regular basis. The time spent accessing each additional source of information is time that the decision maker can be given back through efficient information systems (Alamar). According to Alamar, in order to create efficient information systems, data must be centralized; however, SAUS found that only 31.3 percent of respondents reported that all data are centralized, and another 31.3 percent reported that only some data are centralized. Again, there is opportunity to gain a competitive advantage here through better data management. Centralization is a building block for efficient information systems, and teams that have not taken that step are wasting the time benefits that information systems can provide. The opposite end of the spectrum from centralized data is having access to data depend upon one person. Nearly all organizations report that access to some data is dependent upon one person, and 43.7 percent report that access to most data is dependent upon one person. This suggests that access to massive amounts of valuable information within an organization is highly constrained. Teams that have invested heavily in analytics and still have data that are not centralized and are highly inaccessible are not maximizing their analytic investment (Alamar). The survey also asked whether data were being checked for errors (since any analysis of bad data cannot reliably produce good information), how many data programmers were dedicated to the sports side of the organization, how many statistical analysts were employed, and finally whether analytic resources were in line with the team's strategic plan (see the graphs in the report). It is important to point out that only 27 percent strongly agreed that analytic resources were in line with the strategic plan. The various components of analytics must support (and even inform) the strategic plan in order to provide teams with a significant long-term advantage over their competitors (Alamar). The results of the SAUS provide an important window into the current position of analytics in sports organizations. Teams are clearly investing in analytics by hiring personnel and creating more advanced data systems. Since the field is new, however, teams are not always clear on how to manage their analytic investment to maximize their return (Alamar).
Sports analytics is a tool very much in its infancy. Only a handful of teams are thinking about analytics in a truly comprehensive manner, and fewer have implemented comprehensive programs (Alamar). Many teams are using some sort of statistical analysis, typically to support player evaluation, and some are using analysis to support coaching and financial decisions as well. But increasing the chances of long-term success through analytics depends on having strong analytics personnel and organizational structure. According to an article written in the Toronto Star, throughout the NHL, teams like the Toronto Maple Leafs, New Jersey Devils, Pittsburgh Penguins, and Edmonton Oilers have been gobbling up executives and statisticians who can walk them through analytics, or advanced statistics. Once a team has decided to introduce analytics into its decision-making processes, the challenge is to determine how analytics will fit into an already established organizational structure (Alamar). The Toronto Maple Leafs, for example, hired Kyle Dubas as assistant general manager to Dave Nonis; Dubas, a 28-year-old born with hockey in his blood, has learned to at least listen to what the numbers are telling him. The same article mentions that the Oilers hired blogger Tyler Dellow, whose website mc79hockey.com was among the leading analytics sites and has subsequently been shut down, and that the New Jersey Devils hired Sunny Mehta, a former pro poker player who turned his attention to analytics (Toronto Star, August 10, 2014). Every organization needs to assess its individual needs and decide how it will go about organizing its analytic process; however, it is clear there is no one correct or conventional way to structure an analytics department. According to Alamar, there are two basic models that can be used: either a centralized analytics department, in which all of the analytic resources employed by the team (both human and technical) are organizationally managed together, or a decentralized model, in which the resources needed by the personnel department are managed by the personnel department, the resources used by the coaching staff are managed by the coaches, and so on. Hybrid models that combine the centralized and decentralized approaches are also possible; typically, in these organizations the statistical analysts are specialized to a particular function while the data managers are a shared resource. As Alamar points out, each of these models has both strengths and weaknesses, and there is no absolute prescription for success. Understanding the strengths and weaknesses of each approach is vital to deciding which is in the best long-term interests of a particular team (Alamar).
The centralized model tends to use resources more efficiently, as much of the technological investment can be shared among team functions. There is, however, a risk in this model, particularly when human resources are low, that one function could dominate to the detriment of others. The decentralized model, according to Alamar, allows each analyst to focus all of his or her time on a particular area and develop more understanding of its non-analytic aspects instead of relying on an outsider for information, but that comes at the cost of reduced interaction among analysts, perhaps reducing advances in the analysis. The ultimate model for the analytics program will depend greatly on the resources a team is willing to invest in analytics as well as the willingness of non-analytic personnel to engage with the tools of analytics (Alamar). Deciding how a team can best implement an analytics program to gain a competitive advantage requires understanding each of the three components of analytics (data management, analytic models, and information systems) as well as the structural and managerial issues involved. The principles of sports analytics, which create efficiency and consistency, are standardization, centralization, and finally integration (Figure 2.1, Analytics in Sports).
The first step in helping the decision maker work more efficiently is to standardize the data within the organization. Standardizing data creation and storage within an organization requires knowing the sources of the data. Some data sources are consistent across all teams: for example, all teams use video, keep box-score data, and have scouting reports. Teams also have their own unique data sets. The Houston Rockets, for example, employ a team of game charters who collect data from each game the Rockets play (Alamar). Many teams are also increasingly employing advanced technology to help collect training and conditioning data, such as individual heart-rate monitors worn during practice and training, and pedometers to monitor the distance and speed a player runs. Still other teams use detailed psychological profiles to evaluate players. All of these data sources need to be identified in an inventory; identifying, locating, and describing all the data sources establishes the organization's data inventory (Alamar). The process of standardization seems straightforward, but there are actually a variety of areas in which it can prove difficult. Consider, for example, that from 1991 to 1996 there were three players named "James Williams" in the NFL, and two of them were "James E. Williams"; all three played different positions for different teams (Alamar). Additionally, data enter the organization from a variety of sources. Each department, whether coaching, management, scouting, or administration, has opportunities to input data, and that raises the opportunity for misrepresentation (see Table 2.2, Data Inventory, in Sports Analytics).
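The "James Williams" problem illustrates why a name alone cannot serve as a record key during standardization. One common remedy is to build a composite identifier from several fields, sketched below; the birth years and positions in the sample records are invented for illustration, not those of the actual players.

```python
# Sketch of disambiguating same-named players during standardization by
# keying records on a composite identifier instead of the name alone.
# The birth years and positions below are invented for illustration.

def player_key(record):
    """Combine several fields so that same-named players stay distinct."""
    return (record["name"], record["birth_year"], record["position"])

records = [
    {"name": "James E. Williams", "birth_year": 1968, "position": "CB"},
    {"name": "James E. Williams", "birth_year": 1970, "position": "T"},
    {"name": "James Williams", "birth_year": 1967, "position": "LB"},
]
inventory = {player_key(r): r for r in records}
print(len(inventory))  # 3 distinct players despite duplicate names
```

Keyed on name alone, the dictionary would silently collapse to two entries; the composite key is what keeps the data inventory trustworthy.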

Good data management reduces the time spent looking for the people that can give
decision makers access to the information they need and provide a team with a
significant competitive advantage. Beyond more efficient access, centralization of
data provides additional benefits in terms of data consistency and accuracy.
Centralization ensures that all decision makers see the same data. Having one set
of consistent data for all decision makers to rely on is commonly referred to as
having one version of the truth. Having one version of the truth provides more
reliability and consistency and has the additional benefit of saving meeting time for
discussing substance instead of background (Alamor).
What Alamor means by this is that all decision makers within an organization are
operating with the same information. When information is isolated to particular
individuals or groups (decentralization), this is what Alamor refers to as a data silo
(p.63). These data silos need to be avoided in order to ensure all decision makers
are operating with the same information or version of the truth (Alamor, 2013).
By having an agreed upon set of information that everyone has access to , an
efficiency is created that would not be present if information was segregated or if
there was a lack of understanding on what the data is saying. When all the data is
centralized, personnel executives can spend more time evaluating and coaches can
spend more time strategizing and coaching providing them an edge over the
competition (Alamor). Once the data has been standardized and centralized, it can
be fully integrated. The integration of data across functions within the organization
allows for seamless access to every departments data. Scouting and medical
reports are linked to play-by-play data, which are linked to video files, and the
connections go on. On its own, each type of data is valuable, but when integrated,
there are synergies created among the different data sources that cannot occur
when data are segregated (Alamor). The three components of data management
discussed here (standardization, centralization, and integration) provide a basis for
an efficient data-management system that will provide a competitive advantage by
saving time for decision makers and creating a more complex picture of the team or
player being analyzed. Its important to note that although a functional analytics
program can be outlined as indicated above (with regards to standardization,
centralization, and integration); the real challenge for an organization is in its
implementation of the analytics systems. Without strong leadership that understands that analytics require both an investment in technology and an openness from all members to see the value in the analytics systems, the effort will stall; as Alamar points out, it is an ongoing process, not a one-time investment, and staff must be available to work on the systems so that they remain up to date. Without total buy-in from the organization and its members as a whole, analytics is only as good as its members' commitment to invest their time and data in the system.
When analyzing the data used in analytics, it is important to be thorough in our examination and to understand the difference between data and information. According to Alamar, "raw data are rarely useful because data are just an input, with no analysis or context." He uses the example that if a scout had attended an NBA game on November 3, 2010, he would have seen Kevin Durant take ten three-point shots against the Los Angeles Clippers and hit none of them. Treating that observation as a conclusion is an example of misunderstanding the difference between raw data and information. Based on this one game, a scout could suggest that Durant is a poor and inefficient shooter because he was wasting so many of his team's possessions taking shots he obviously could not make. If, however, those observations were treated as raw data and the player was evaluated in a larger context that included more games, the player's age, the opponents faced, and so on, a decision maker would see that the player taking those shots actually shot 36 percent from beyond the three-point line that season outside of that game, led the league in scoring, and was one of the most efficient scorers in the league, averaging more than 1.4 points per shot attempt (Alamar). It is also important to understand the different types of data,
one being quantitative data and the other being qualitative data. Quantitative data include such things as box scores, draft-combine results, and strength tests. Qualitative data take a variety of forms, including scouting reports, coaches' notes, and video. It is easy to believe a number because it appears to be a fact, something indisputable. The problem, however, is that quantitative data are just data, the lowest input into the analytic process, and without being transformed into information, they are at best useless and can often be misleading (Alamar). A perfect example of this is the data gathered by Stats LLC: the company calculated Kevin Durant's shooting percentage when he dribbled the ball three or more times and when he dribbled it two or fewer times. Comparing the two averages, it appeared that Durant's shooting percentage roughly doubled when he dribbled the ball two or fewer times. This led one executive in the NBA to suggest that if opponents forced Durant to put the ball on the floor and dribble more, then his scoring ability would drop significantly (Alamar). As Dean Oliver points out so eloquently, people jump on isolated events as proof of cause and effect. What this executive did not factor in was: what were the distances of the shots behind the two averages? Perhaps the shots that came after two or fewer dribbles included more fast-break dunks and put-backs. If Durant dribbled less because he was more often on the wing on a fast break and simply took a pass and dribbled once on the way to dunking, then comparing that shooting percentage to the one from when he was creating a shot for himself on the perimeter is meaningless; the two averages measure entirely different skills (Alamar). The lesson here is that numerical data are not meaningful on their own. Raw data do not provide a decision maker with actionable information because they have no context. Only after raw numerical data are given rich context do they become information that can be used in the decision-making process (Alamar). The problem with qualitative data is that it can
be unstructured, in the sense of not having proper ways to organize it. For example, many teams have medical data or scouting reports that are rarely properly organized and accounted for. These important documents reside on the computers or in the notes of individual personnel. It is inefficiencies such as these that allow a lot of important data never to be turned into usable information. In order to maximize the return on analytic resources, all data should be centralized so that they can be processed, turned into usable information, and accessed efficiently (Alamar).
The United States Olympic Committee faces a very specific task: win as many
medals as possible in each and every Olympics. This task is made particularly
difficult by the limited financial resources that the USOC can use to support the
American Olympic athletes. Therefore, the USOC must make sure that it invests
only in athletes with a realistic opportunity to win medals. The decision makers at
the USOC must regularly ask whether spending the next $1,000 on athlete A is more likely to yield a medal than spending it on athlete B, even if those two athletes compete in different sports or even in different years (Alamar). Because of the complexity of multiyear planning and cross-sport comparisons, analytic models have proven to be very helpful in informing these decisions.
Consider a case in which the committee is assessing the progress of a seventeen-year-old sprinter. As sprinters generally compete at the Olympic level in their early to mid-twenties, the decision makers at the USOC must assess the likelihood that this sprinter will be able to compete at a medal-winning level in five to seven years. The decision makers must examine the athlete's record of achievement to determine whether she or he is on a medal-winning path. As an example, let's say the sprinter ran the hundred meters in 12.1 seconds in competition at age fifteen and now runs it in 10.3: is she on course to have a medal-winning time in either of the next two Olympic Games (Alamar)? With no use of analytics, the committee has to rely on the opinions of coaches and of others involved in the sport. By using historical data as well as the sprinter's own performances at sanctioned competitions, a complete picture of the sprinter's progress can be created and analyzed (Alamar). The first step is to determine what a medal-winning time will be in five to seven years. Olympic times in
the hundred-meter sprint, for example, have continued to drop, which means that
the bar is ever higher for developing sprinters to have a legitimate opportunity to
win an Olympic medal. Using data from international competitions over the last forty years allows the USOC to project how medal-winning times are likely to change over the next five to seven years. This projection provides the context that the decision makers need in order to assess the Olympic prospects of a young sprinter (Alamar).
The next step is to estimate the sprinter's progress. Data from competitions can be used to estimate this progress over the next several years. INSERT figure 4.1
In this analysis, the actual competition times are represented by a solid line, the timing of the Olympic Games is marked by vertical dotted lines, and the projected medal-winning time is represented by the dotted horizontal line. The figure demonstrates that at the time of the next Olympics, the sprinter will be just over eighteen years old and will likely be running the hundred-meter sprint in approximately 10.6 seconds. The projected medal-winning time is well below that, indicating the sprinter will not be ready to compete in those Games. The following Olympics will occur when the sprinter is twenty-two. By this time she is likely to be running a sub-ten-second hundred meters, but still not quite fast enough to be in medal contention. The decision makers at the USOC now have evidence to suggest that the sprinter is not on track to win a medal in the next two Olympic Games and must allocate their resources accordingly. The use of resources is now a strategic decision; the decision makers can either cut funding to the sprinter or, if they do not have better alternatives, closely examine the sprinter's training program and suggest changes so that she or he may get on a medal-winning path (Alamar).
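The two projection steps described above (projecting the medal-winning time forward from historical results, then comparing the sprinter's own projected times against it) can be sketched as follows. All numbers here are illustrative stand-ins of my own, not Alamar's data, and a naive linear extrapolation is only a crude proxy for the kind of model the USOC would actually use.

```python
from math import fsum

def fit_line(xs, ys):
    """Ordinary least-squares line fit: returns (slope, intercept)."""
    n = len(xs)
    mx, my = fsum(xs) / n, fsum(ys) / n
    slope = fsum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            fsum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Illustrative medal-winning 100 m times (seconds) by Games year.
history = {1984: 9.99, 1988: 9.92, 1992: 9.96, 1996: 9.84,
           2000: 9.87, 2004: 9.85, 2008: 9.69, 2012: 9.63}
slope, intercept = fit_line(list(history), list(history.values()))

def projected_medal_time(year):
    """Extrapolate the medal-winning time to a future Games."""
    return slope * year + intercept

# Hypothetical projected personal bests for the sprinter at the ages
# she will be during the next two Games (cf. figure 4.1's trend line).
sprinter_projection = {2016: 10.6, 2020: 9.95}

# The sprinter is "on the medal path" for a Games only if her projected
# time beats the projected medal-winning time for that year.
decisions = {year: time <= projected_medal_time(year)
             for year, time in sprinter_projection.items()}
```

With these inputs both decisions come out negative, matching the chapter's conclusion that the sprinter is not on a medal-winning path for either of the next two Games.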
Another example of predictive analytics is illustrated in Dean Oliver's Basketball on Paper: Rules and Tools for Performance Analysis, where he uses the example of Derrick Coleman, who played for the Charlotte Hornets in the NBA between 1999 and 2001. Oliver writes that over the course of the three years he (Coleman) was with the team, the Hornets were an ordinary 74-80 when he was in the lineup and a rather dominating 54-20 when he sat out. Oliver applies what he calls significance testing, a tool used to numerically assess a particular player's value; it accounts for the number of games in a stretch, how well the team played in those games, and how well the team was expected to play (Oliver). With regard to Coleman, significance testing says that there was a 0.02 percent chance (or two in ten thousand) that the difference between the Hornets' record with and without him was due to just random chance. Something real was
different. Tracking down what it was starts with doing significance tests on the team's offensive and defensive ratings. Offensively, the Hornets scored 103.0 points per hundred possessions with Coleman in the lineup. Without him, the Hornets scored 103.9 per hundred possessions. This kind of difference, 103.0 to 103.9, is small and turns out not to be significant, as it could have occurred by pure luck with a 56 percent chance (Oliver). For reference, if significance testing returns a number greater than 5 percent, it is common to consider the difference in question to be simple luck. In contrast, the Hornets' defensive rating (points allowed per hundred possessions by their opponents) improved from 102.5 to 98.6 without Coleman, a much bigger change and significant at 1.6 percent. The Hornets legitimately did play better on the defensive end when Coleman wasn't on the floor (Oliver). A key thing in doing this kind of analysis is to understand that it doesn't reflect purely upon Derrick Coleman. It also reflects upon the player who replaces Coleman in the lineup. If the player replacing Coleman is a star player like
Tim Duncan and the Hornets played way better, we'd say, "duh, why was Coleman playing anyway?" If his replacement is a player of lesser ability, or players of lesser ability, then there is more of a mystery. Is the sub actually a good player? Were Coleman's teammates playing poorly with him on the floor because Coleman exuded some negative energy? In this case, since the discrepancy grew out of three seasons of the Hornets playing better with Coleman out, there were a lot of players replacing him in the lineup. Rather than looking at the seasons as a group, you get a cleaner story by looking at the three seasons individually. TABLE 7.1 PG 94
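Oliver's exact procedure is not reproduced in the text, but the spirit of this kind of significance testing can be sketched with a standard two-proportion z-test on the with/without win records. This is an assumption on my part: the normal approximation below yields a p-value of the same very small order as, but not identical to, Oliver's quoted 0.02 percent.

```python
from math import erfc, sqrt

def record_significance(wins_with, losses_with, wins_without, losses_without):
    """Two-sided p-value for the difference between two win percentages,
    using a pooled two-proportion z-test (normal approximation)."""
    n1 = wins_with + losses_with
    n2 = wins_without + losses_without
    p1, p2 = wins_with / n1, wins_without / n2
    pooled = (wins_with + wins_without) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return erfc(abs(z) / sqrt(2))  # two-sided tail probability

# Hornets over the three seasons: 74-80 with Coleman, 54-20 without.
p_value = record_significance(74, 80, 54, 20)
# A p-value this small says the gap is very unlikely to be pure luck.
```

The same function applied to two nearly identical records returns a large p-value, which is exactly the "could have occurred by pure luck" verdict Oliver gives the offensive ratings.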
In 1999, Coleman missed thirteen of fifty games in a strike-shortened year and was replaced by a committee of journeymen: Chucky Brown, J.R. Reid, Charles Shackleford, and Brad Miller (though Miller has recently come to be viewed as a pretty good player). The Hornets went 11-2 in the games Coleman missed and 15-22 in the games he played. As with the overall three years, the defense in 1999 was significantly better without him, going from 104.5 to 99.3. Were these four players significantly better defensively than Coleman? By reputation, no one has really said so, but the numbers suggest it (Oliver). In 2000, Coleman missed only eight of eighty-six total games (including playoffs), the Hornets going 6-2 in those eight games. They went 44-34 in the games Coleman played, the difference between these two records not being statistically significant, mainly because eight games is not a lot of games on which to base an argument. The Hornets again did play better defensively when Coleman wasn't there, though, going from 102.2 to 98.3 in their defensive rating (Oliver). Again his replacement was a committee of non-stars, including Chucky Brown, Brad Miller, Eddie Robinson, and Todd Fuller. Finally,
in 2001, Coleman was replaced in the starting lineup at the beginning of the season by P.J. Brown and was relegated to only twenty minutes per game. Even with fewer minutes, Coleman's apparent negative influence on defense continued. When Coleman was in the lineup, the team defense allowed 101.3 points per hundred possessions and the team went 15-24. When he was out of the lineup, the team defense allowed 98.5 points per hundred possessions and the team went 37-16 (including playoffs). Interestingly, the offense also got better this year in Coleman's absence, going from 101.1 points per hundred possessions to 103.5. According to Oliver, neither the offensive nor the defensive splits were significant on their own, but the combination of improvements was significant. Unlike previous seasons, the Hornets filled Coleman's missing minutes primarily by increasing the time the starters played. Off the bench, Jamaal Magloire also picked up some of Coleman's time. In every season, the Hornets' offense and defense both improved when Coleman was out, but only the defensive improvement was statistically significant over the long haul and in any season (Oliver). But this was only Oliver's first step: upon realizing that there was a disproportion in the Hornets' ability to win games with Derrick Coleman in the lineup versus without him, Oliver used analytics to mine through the numbers and ask why. Why did the
Hornets have defensive shortcomings with Derrick Coleman in the lineup? Four interesting aspects of the game were in question:
1. Shooting percentage from the field (by the team and by its opposition)
2. Getting offensive rebounds
3. Committing turnovers
4. Going to the foul line a lot and making the shots

Oliver points to these aspects of the game as key indicators that determine the outcome of a basketball game. When applying these aspects to the Derrick Coleman scenario, one important discrepancy arises. Before we continue, however, it is necessary, in order to clearly understand this discrepancy, to introduce Oliver's shooting chart (figure 2.1, Court Division, p. 10).
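The four aspects listed above correspond to what are commonly called Oliver's "four factors," and they are conventionally turned into rates roughly as sketched below. The 0.5 bonus for threes and the 0.44 free-throw coefficient are the weights in common public use, and the box-score numbers in the example are invented for illustration.

```python
def four_factors(team, opp_drb):
    """Compute the four factors from simple box-score counts.
    `team` needs FGM, FGA, TPM (threes made), TOV, FTA, FTM, ORB;
    `opp_drb` is the opponent's defensive rebound count."""
    efg = (team["FGM"] + 0.5 * team["TPM"]) / team["FGA"]       # shooting
    tov = team["TOV"] / (team["FGA"] + 0.44 * team["FTA"]
                         + team["TOV"])                         # turnovers
    orb = team["ORB"] / (team["ORB"] + opp_drb)                 # offensive boards
    ftr = team["FTM"] / team["FGA"]                             # getting to the line
    return {"eFG%": efg, "TOV%": tov, "ORB%": orb, "FT rate": ftr}

# Hypothetical single-game box score, purely for illustration.
box = {"FGM": 38, "FGA": 85, "TPM": 8, "TOV": 14,
       "FTA": 25, "FTM": 18, "ORB": 12}
factors = four_factors(box, opp_drb=30)
```

Computing the same four rates for a team's opponents gives the defensive side of the ledger, which is exactly the comparison the Coleman analysis turns on.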
It was examples such as these that contributed to the growth of analytics in sports and encouraged some people to apply statistical analysis to better understand what is occurring on the field of play. With regard to hockey and its development of analytics, one has to look no further than Jim Corsi. Corsi, currently the goalie coach with the St. Louis Blues (as of August 10, 2014), played as a goalie in the WHA with the Quebec Nordiques and for 26 games in the NHL (National Hockey League) with the Edmonton Oilers. While doing research on how much work a goalie does in a game, he determined that the simple shots-against total the NHL was using was not truly reflective of just how busy his goalies were in an NHL game (Toronto Star). He believed that the average of 30 or so shots per game recorded at the time was not an accurate reflection of their workload. So he began measuring all shots and shot attempts, including ones that were blocked, because, according to Corsi, a goalie had to be in position or moving around regardless. From this statistic, Corsi reckoned that the true shot totals were more in the region of 50-70. What
happened next no one could have predicted: another fellow took his stats and started applying them to players to find out what their overall contributions were, on both sides of the puck, and that is when the Corsi stat was born. According to the article in the Toronto Star, that was around the lost season of 2004-05 (due to the lockout), when fans had time on their hands and those with spreadsheets and ideas put their time to good use. The bloggers who most often get the credit are Tim Barnes, who blogged anonymously and, more famously, under the pseudonym Vic Ferrari, and California engineer Gabriel Desjardins, who runs Behindthenet.ca. Barnes came up with many of the ideas behind hockey's analytics, while Desjardins' website made them accessible, and their ideas really took root. Really intense fans were into crowd-sourcing their ideas in chat rooms and, later, on Twitter. These fans were unsatisfied with the NHL's plus-minus stat, in large part because they believed goals were akin to random events. There are more shots and shot attempts in a game than goals, meaning the larger sample size of Corsi events is more reflective of a player's performance than is whether a player happens to be on the ice for an unlikely bounce. By measuring the various shot attempts at 5-on-5, Corsi, like plus-minus, puts all players on a level
playing field. The stars don't get the advantage that power plays give them. The grinders aren't hurt by their time killing penalties. Players can be measured with various linemates and against strong or weak opposition. At the same time, the blogosphere was trying to find out which teams were best at puck possession. The NHL dropped time of possession as a stat in 2002, so bloggers turned to Corsi as its proxy, figuring that if one team shot the puck more than the other, it meant that team controlled the puck more. The blogosphere realized that teams with good Corsi numbers typically went very deep in the playoffs. They told anyone who would listen (and, in the beginning, few did) that a good Corsi rating was a predictor of what was to come. Look no further than the 2013-14 Toronto Maple Leafs: as the wins piled up through the first 60 games despite the team routinely being outshot, the analytics crowd warned it was a mirage based on luck, since the team's Corsi rating was nearing record lows. When the team collapsed in the final quarter of the season, the analytics crowd said the numbers were merely catching up
(Toronto Star). It has taken 10 years and countless Twitter fights for hockey analytics to go mainstream, thanks in part to the Maple Leafs and their collapse, and to their decision to bring in someone like Kyle Dubas and be so public about it. When you ask Jim Corsi about the emergence of hockey analytics, he says, "I'm definitely pleased and surprised how it's taken off in the last while. Everybody has started to understand there is more to the game than a haphazard slapping of the puck around." In truth, teams have been dabbling in all kinds of stats, looking for that edge, for years. In the 1970s, legendary coach Roger Neilson had his own plus-minus stat based, subjectively, on who he thought deserved credit or blame for each goal. Dubbed Captain Video at the time, he went through video of every scoring play (Toronto Star).
Former Leafs coach Ron Wilson was big on analyzing the game in a stats-based way. He looked mainly at scoring chances, a subjective look at the action around the net, and at where those chances came from. He figured that if his team out-chanced the opposition on a regular basis, the wins would come. Scoring chances were typically around 10 per team per game. But the stats available have simply exploded. There is Fenwick, a variation on Corsi in that it does not include blocked shots; it is named after its inventor, Calgary Flames blogger Matt Fenwick (Toronto Star). Then there are ways to measure Corsi and Fenwick based on linemates, the strength of the opposition, and what the score is in the game. They are not always user-friendly. Unlike baseball,
which went through its statistical revolution 20 years ago, hockey's stats are not self-evident. Baseball has WHIP (walks and hits per inning pitched), OPS (on-base percentage plus slugging percentage), and WAR (wins above replacement), which have joined the mainstream conversation in part because they do not need to be explained (Toronto Star). Hockey has stats like PDO, which does not really stand for anything but adds shooting percentage and save percentage in an effort to define a team's luck, and CF%rel, which stands for a player's Corsi-for percentage relative to the team's CF% with that player not on the ice; needless to say, it is all very confusing. In the next section of this thesis I plan to introduce the current analytics being measured in hockey, illustrate both their strengths and weaknesses, and provide a personal opinion on their overall effectiveness. I share the same philosophy as Jim Corsi, who
states, "The beauty of analytics is it can open up a debate and instruct. There are routes. There are things to be done. I'd like to think you could evolve to another level and try to combine somehow the ability for a player to keep the puck from the other team. I will tell anyone who is willing to listen that statistics can be like a lamp post: you can use it to lean on or you can use it to illuminate. You've really got to be careful with all the information we're getting" (Toronto Star).
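The shot-based stats surveyed in this section reduce to a handful of simple ratios. The sketch below is my own illustration with invented event counts; the formulas follow the common public definitions described above (Corsi counts all shot attempts, Fenwick the same but excluding blocked attempts, PDO adds on-ice shooting and save percentage, and CF%rel subtracts off-ice Corsi-for percentage from on-ice).

```python
def corsi_pct(attempts_for, attempts_against):
    """Corsi-for percentage: share of all shot attempts (on goal, missed,
    and blocked) taken by the player's team while he is on the ice.
    Fenwick uses the same ratio with blocked attempts excluded."""
    return attempts_for / (attempts_for + attempts_against)

def pdo(goals_for, shots_for, goals_against, shots_against):
    """On-ice shooting percentage plus on-ice save percentage. League-wide
    this hovers around 1.000, so large deviations are often read as luck."""
    shooting = goals_for / shots_for
    save = 1 - goals_against / shots_against
    return shooting + save

# Invented 5-on-5 counts for one player, purely for illustration.
on_ice = {"cf": 55, "ca": 45}    # Corsi events with the player on the ice
off_ice = {"cf": 48, "ca": 52}   # Corsi events with the player on the bench

cf_pct = corsi_pct(on_ice["cf"], on_ice["ca"])             # 0.55
cf_rel = cf_pct - corsi_pct(off_ice["cf"], off_ice["ca"])  # CF%rel
player_pdo = pdo(goals_for=5, shots_for=50, goals_against=2, shots_against=48)
```

A positive CF%rel, as in this toy example, is read as the team controlling a larger share of shot attempts with the player on the ice than off it.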