PLAYER EFFECTIVENESS

Amar Sneh PGP/19/245
Naman Sharma PGP/19/272
Shashi Kumar Singh PGP/19/288
Umesh Khandelwal PGP/19/295
Ankit Yadav PGP/19/305
The Basics of Baseball

Rules of baseball
THE FIELD
BASEBALL PYTHAGOREAN THEOREM

The estimated win percentage always lies between 0 and 1
With R = runs scored / runs allowed, win percentage can be given by the following formula:
Win % = R^exp / (R^exp + 1)
For baseball the best result is for exp = 1.9
For football the best result is for exp = 2.37
For basketball the best result is for exp = 13.91
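A minimal sketch of the win-percentage estimate, assuming the usual ratio form R^exp / (R^exp + 1) with R = points scored divided by points allowed. The function name `pythagorean_win_pct` and the 110/100 scoring totals are illustrative assumptions; only the three exponents come from the slide.

```python
# Exponents quoted on the slide for each sport.
EXPONENTS = {"baseball": 1.9, "football": 2.37, "basketball": 13.91}

def pythagorean_win_pct(scored: float, allowed: float, sport: str = "baseball") -> float:
    """Estimate win percentage, assuming the form R^exp / (R^exp + 1)."""
    ratio = scored / allowed
    powered = ratio ** EXPONENTS[sport]
    return powered / (powered + 1)

# Hypothetical team that scores 10% more than it allows.
for sport in EXPONENTS:
    print(sport, round(pythagorean_win_pct(110, 100, sport), 3))
```

The larger the exponent, the more a given scoring advantage is expected to show up in the win column, which is why the same 10% edge is worth far more in basketball than in baseball.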

Bill James came up with the formula for runs created, which combines:
Number of base runners
Rate at which runners are advanced per plate appearance


RUNS CREATED PER GAME
Hitters consume a scarce resource: outs
In a game each team gets 9 innings and 27 outs (9*3)
Outs = AB - hits - errors*
Extra outs:
Sacrifice flies (SF)
Sacrifice bunts (SAC)
Caught stealing (CS)
Grounding into double plays (GIDP)
Runs created per game
EVALUATING HITTERS BY LINEAR WEIGHTS
Dependent variable Y = runs scored in a season
Independent variables X = BB+HBP, singles, 2B, 3B, HR, Stolen Bases (SB), Caught Stealing (CS)
REGRESSION OUTPUT

Equation obtained by regression analysis

Equation after dropping SB and CS on the basis of their p-values


ACCURACY OF LINEAR WEIGHTS VS RUNS CREATED
Do linear weights do a better job of forecasting runs scored than Bill James's original runs created formula?
Linear weights was off by an average of 18.63 runs (an average of 2% per team), while runs created was off by an average of 28 runs per team
TEAM RUNS FROM OBP AND SLG
On-Base Percentage (OBP) is a measure of a player's effectiveness
It is the fraction of a player's plate appearances in which he reaches base on a hit, walk or HBP
But many players with a high OBP do not hit many home runs
So we use OPS (on-base plus slugging), which is slugging percentage (SLG) + OBP
Runs scored = -1003.65 + 1700.8 * SLG + 3157 * OBP
This indicates that OBP is roughly twice as important as SLG
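The regression equation can be evaluated directly. This is a small sketch: the function name `predicted_runs` and the sample OBP and SLG values are hypothetical, while the coefficients are the ones quoted on the slide.

```python
def predicted_runs(obp: float, slg: float) -> float:
    """Season runs predicted by the slide's regression on OBP and SLG."""
    return -1003.65 + 1700.8 * slg + 3157 * obp

# Hypothetical team-level inputs, not taken from the slides.
base = predicted_runs(obp=0.330, slg=0.420)
print(round(base, 1))

# Effect of adding ten points (0.010) to each statistic separately.
print(round(predicted_runs(0.340, 0.420) - base, 1))  # +0.010 OBP
print(round(predicted_runs(0.330, 0.430) - base, 1))  # +0.010 SLG
```

Comparing the last two lines shows the "roughly twice as important" claim: the same ten-point improvement is worth almost twice as many runs when it comes from OBP.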
MONTE CARLO SIMULATION

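Example: a batter who hits a home run in half of his plate appearances and makes an out in the other half
We assign an outcome of a home run to a random number less than or equal to 0.5
And assign an outcome of an out to a random number between 0.5 and 1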
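The random-number assignment above can be sketched as a short simulation. The helper names, the seed and the number of simulated innings are assumptions made for illustration; the 0.5 threshold is the one on the slide.

```python
import random

def simulate_inning(rng: random.Random, p_home_run: float = 0.5) -> int:
    """Play one inning for a lineup of identical batters; return runs scored."""
    outs = 0
    runs = 0
    while outs < 3:
        if rng.random() <= p_home_run:  # random number <= 0.5: home run
            runs += 1
        else:                           # random number between 0.5 and 1: out
            outs += 1
    return runs

def average_runs_per_inning(n_innings: int = 20000, seed: int = 0) -> float:
    """Average runs over many simulated innings (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(simulate_inning(rng) for _ in range(n_innings)) / n_innings

print(average_runs_per_inning())
```

For this toy batter the long-run average should settle near 3 runs per inning, since each of the 3 outs is preceded on average by 0.5/0.5 = 1 home run.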
MONTE CARLO SIMULATION (CONTD.)
Inputs for Ichiro simulation
Sample innings of Ichiro 2004 simulation
WHO IS RESPONSIBLE FOR A CALLED STRIKE?
THE ART OF PITCH FRAMING

Pitch Framing: Trying to turn balls into strikes
The strike zone is not 100% consistent
Area of Uncertainty: Strike Zone Plus/Minus

Artisans of Framing
Catcher: Receiving skills
Pitcher: Ability to locate the pitch
Batter: Body language
Umpire: Personal standards
SIGNIFICANT VARIABLES

Pitch Location
Ball/Strike Count
Command
Pitcher Handedness
Batter Handedness
Home/Road
Pitch Type
VARIABLES ANALYSIS

BALL/STRIKE COUNT (Plus/Minus Strike %)
           Balls
Strikes    0         1         2         3
0          1.30%     2.50%     4.20%     5.60%
1          -3.20%    -1.50%    0.10%     1.60%
2          -2.90%    -2.70%    -2.10%    -1.80%

COMMAND (Horizontal Distance)
Command              Plus/Minus Strike %
< 3.8 inches         2.80%
3.8 - 7.6 inches     -0.30%
7.6 - 10.4 inches    -2.90%
> 10.4 inches        -8.60%
VARIABLES ANALYSIS (CONTD.)

PLATE SIDE
Plate Side (Pitcher's POV)    Bat Side    Plus/Minus Strike
Left Half                     RHB         1.70%
Left Half                     LHB         -4.00%
Right Half                    RHB         -2.80%
Right Half                    LHB         2.00%
CALCULATE STRIKE ZONE PLUS/MINUS

If a pitch is called a strike: positive credit to the players involved
If a pitch is called a ball: negative credit
Example:
Command: 6 inches from target
Pitch: 10 inches vertical, 1 inch off the outside edge
Batter handedness: LHB
Count: 2-1
Result: +0.57 points allotted to the participants
ITERATION 1

Calculation:
Total points allotted to players involved = +0.57
We treat all four artisans as equally involved
Thus, individual contribution: 0.57/4 = +0.1425
After iteration 1:
Catcher = +0.1425    Umpire = +0.1425
Pitcher = +0.1425    Batter = +0.1425
ITERATION 2

Assumption: Umpire: -0.005 per pitch, Catcher: +0.002 per pitch, Pitcher: +0.018 per pitch, Batter: +0.015 per pitch
Calculation:
Combined Strike Plus/Minus: -0.005 + 0.002 + 0.018 + 0.015 = 0.030 per pitch more strikes than expected
Expected Strike Plus/Minus: 0.57 - 0.030 = 0.54, which is divided equally
Thus, expected strike credit per player involved: 0.54/4 = 0.135, to which each player's own per-pitch value is added
After iteration 2:
Catcher = +0.137    Umpire = +0.130
Pitcher = +0.153    Batter = +0.150
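The two iterations follow the same arithmetic, which can be sketched as one function. The name `allocate_credit` is hypothetical, and how the per-pitch values are updated between iterations is not shown on the slides, so the sketch takes them as given inputs.

```python
def allocate_credit(total: float, prior: dict[str, float]) -> dict[str, float]:
    """Split one pitch's plus/minus among the participants.

    The part already explained by the participants' prior per-pitch values
    is handed back to each of them; the remainder is divided equally.
    """
    remainder = total - sum(prior.values())
    equal_share = remainder / len(prior)
    return {name: equal_share + value for name, value in prior.items()}

# Iteration 1: no prior information, so every participant starts at zero.
first = allocate_credit(0.57, {"catcher": 0.0, "umpire": 0.0, "pitcher": 0.0, "batter": 0.0})

# Iteration 2: per-pitch values assumed on the slide.
second = allocate_credit(0.57, {"catcher": 0.002, "umpire": -0.005, "pitcher": 0.018, "batter": 0.015})

print({name: round(value, 4) for name, value in first.items()})
print({name: round(value, 4) for name, value in second.items()})
```

Whatever the prior values are, the four shares always add back up to the pitch's total of 0.57, so credit is only redistributed, never created.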
ITERATION CONT.

Larger difference between iteration 1 and iteration 2
Continued until the difference between successive iterations is insignificant
Normally takes 10 iterations
Iteration Validation:
Pitcher/Catcher correlation (min. 100 pitches): +0.80 (strong)
Catcher/Pitcher correlation (min. 100 catches): -0.04
RESULTS
STRIKE ZONE RUNS SAVED
CONCLUSION

The strike zone will remain the umpire's stronghold
Sixteen runs are valued at $11-12 million in the free agent market
Teams are prioritizing catchers who frame well
Example:
The Yankees played the young catcher Francisco Cervelli in the 2014 season
Jose Molina, 36 years old, played for the Tampa Bay Rays for 3 years
EVALUATING PITCHERS AND FIELDERS
EVALUATING BASEBALL PITCHERS AND FORECASTING FUTURE PERFORMANCE
What is ERA (Earned Run Average)?
Problems with ERA
Evaluating relief pitchers
Using past ERA to predict future ERA
Voros McCracken stuns the Baseball World
DICE: A Better Model for Predicting a Pitcher's Future Performance
WHAT IS ERA?

Until recently, ERA was the most frequently used measure for evaluating the performance of pitchers
A pitcher is charged with an earned run for any base runner who scores, or who would have scored if the pitcher's team had made no fielding miscues
E.g., if Joe gives up a triple with two outs in an inning and the next batter hits a single and a run is scored, Joe is charged with an earned run
Now suppose that instead of a single the next batter hits a ball to the shortstop, who misplays the ball and is charged with an error. If the runner scores, this is an unearned run
ERA = (earned runs allowed)*9/innings pitched
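A short sketch of the ERA formula. Counting innings in outs (three per inning) is a design choice of this example, made so that partial innings need no special handling; the function name and the sample pitching line are hypothetical.

```python
def era(earned_runs: int, outs_recorded: int) -> float:
    """Earned run average: earned runs allowed per nine innings pitched."""
    innings_pitched = outs_recorded / 3  # three outs per inning
    return earned_runs * 9 / innings_pitched

# Hypothetical season line: 60 earned runs over 180 innings (540 outs).
print(era(60, 540))
```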
PROBLEMS WITH ERA

Errors are subjective: official scorers are biased in favor of the home team
When a starting pitcher is pulled from the game with at least one runner on base, the number of earned runs he gives up depends greatly on the performance of the relief pitcher
A pitcher with good fielders behind him will clearly give up fewer earned runs than a pitcher with a leaky defense
Starting pitchers are often evaluated on the basis of their win-loss record, which depends on the batting support the pitcher receives
EVALUATING RELIEF PITCHERS

Relief pitchers are often evaluated on the basis of how many saves they earn in a season. Most saves are credited to a relief pitcher who faces a batter representing the tying run. The following extract provides the official definition of a save.
The official scorer shall credit a pitcher with a save when such pitcher meets all four of the following conditions:
1. He is the finishing pitcher in a game won by his team
2. He is not the winning pitcher
3. He is credited with at least a third of an inning pitched; and
4. He satisfies one of the following conditions:
a. He enters the game with a lead of no more than three runs and pitches at least one inning;
b. He enters the game, regardless of the count, with the potential tying run either on base, at bat or on deck; or
c. He pitches for at least three innings
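The four conditions translate naturally into a boolean check. This is an illustrative sketch only: the function and parameter names are invented here, and innings are counted in outs so that "a third of an inning" is simply one out.

```python
def earns_save(team_won: bool, finished_game: bool, is_winning_pitcher: bool,
               outs_recorded: int, lead_on_entry: int,
               tying_run_on_base_at_bat_or_on_deck: bool) -> bool:
    """Apply the four save conditions quoted above to one relief appearance."""
    if not (team_won and finished_game):   # condition 1
        return False
    if is_winning_pitcher:                 # condition 2
        return False
    if outs_recorded < 1:                  # condition 3: at least 1/3 inning
        return False
    return (                               # condition 4: any one of a, b, c
        (lead_on_entry <= 3 and outs_recorded >= 3)  # a: small lead, one inning
        or tying_run_on_base_at_bat_or_on_deck       # b
        or outs_recorded >= 9                        # c: three innings
    )

# Hypothetical appearance: closer enters with a two-run lead and pitches the ninth.
print(earns_save(True, True, False, outs_recorded=3, lead_on_entry=2,
                 tying_run_on_base_at_bat_or_on_deck=False))
```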
USING PAST ERA TO PREDICT FUTURE ERA

For all pitchers who pitched at least 100 innings during two consecutive seasons:
(next year's ERA) = 2.8484 + 0.353*(last year's ERA)
y = 0.353x + 2.8484
r = 0.34
R-square = 0.1158
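Plugging a few values into the fitted line shows how weak the relationship is. The function name and the three sample ERAs below are hypothetical; the intercept and slope are the ones reported on the slide.

```python
def predict_next_era(last_era: float) -> float:
    """Next season's ERA predicted from last season's, per the fitted line."""
    return 2.8484 + 0.353 * last_era

for last in (2.50, 4.00, 5.50):  # hypothetical ERAs
    print(last, round(predict_next_era(last), 2))
```

With a slope of only 0.353, both very good and very poor ERAs are pulled strongly toward the middle, which fits the low R-square: last year's ERA explains only about 12% of the variation in next year's.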
VOROS MCCRACKEN STUNS THE BASEBALL WORLD
McCracken observed that a pitcher's effectiveness is primarily based on the following:
1. The fraction of BFP (Batters Faced by Pitchers) that results in balls in play (a ball in play is a plate appearance that results in a ground out, error, single, double, triple, fly out or line out)
2. The fraction of balls in play that result in hits (i.e. BABIP, or Batting Average on Balls in Play)
3. The outcome of BFP that do not result in balls in play: what fraction of BFP that do not yield a ball in play result in strikeouts, walks, HBP or HR?
McCracken's brilliant insight was that a pitcher's future performance with regard to the situations outlined in (1) and (3) can be predicted fairly well from past performance, because these results are independent of the team's fielding ability (DIPS, or Defense Independent Pitching Statistics), but it is very difficult to predict (2)
He concluded, "The pitchers who are the best at preventing hits on balls in play one year are often the worst at it the next."
DICE: A BETTER MODEL FOR PREDICTING A PITCHER'S FUTURE PERFORMANCE
DICE = 3.00 + (13HR + 3(BB + HBP) - 2K)/IP
E.g., in 1997 Roger Clemens had the following statistics:
68 BB
7 HBP
292 K
9 HR
264 IP
DICE = 3.00 + (13*9 + 3*(68 + 7) - 2*292)/264 = 2.08
Clemens's actual ERA in 1997 was 2.05
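The Clemens example can be checked with a few lines of code. The sketch assumes the strikeout term is subtracted (DICE = 3.00 + (13HR + 3(BB + HBP) - 2K)/IP), which is the reading that reproduces the 2.08 quoted on the slide; the function name `dice` is chosen here for illustration.

```python
def dice(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Defense-Independent Component ERA from home runs, walks, HBP and strikeouts."""
    return 3.00 + (13 * hr + 3 * (bb + hbp) - 2 * k) / ip

# Roger Clemens's 1997 statistics from the slide.
print(round(dice(hr=9, bb=68, hbp=7, k=292, ip=264), 2))
```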
EVALUATING FIELDERS

Fielding Percentage: The Traditional, Fatally Flawed Metric
The Range Factor: An Improved Measure of Fielding Effectiveness
Problems with Range Factor
The Fielding Bible: A Great Leap Forward
Converting Fielders' Scores to Runs
Converting Runs Saved by a Fielder into Wins
Why Do the Yankees Underperform?
FIELDING PERCENTAGE: THE TRADITIONAL, FATALLY FLAWED METRIC
Fielding Percentage = (PO + A)/(PO + A + E)
PO = putouts made by the fielder. E.g., a SS gets credit for a putout when he catches a fly ball or line drive, tags a runner out, or receives the ball and steps on second base to complete a force out
A = assists made by the fielder. E.g., a SS gets credit for an assist when he throws to first base and the batter is put out
E = errors made by the fielder, a subjective decision made by the official scorer
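As a quick illustration of the formula, the sketch below computes fielding percentage for an invented shortstop season; the function name and the putout, assist and error counts are hypothetical.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """Share of fielding chances (PO + A + E) handled without an error."""
    chances = putouts + assists + errors
    return (putouts + assists) / chances

# Hypothetical shortstop season.
print(round(fielding_percentage(putouts=250, assists=450, errors=15), 3))
```

Only balls the fielder actually reaches enter the calculation, so a ball he never gets to does not count against him.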
THE RANGE FACTOR: AN IMPROVED MEASURE OF FIELDING EFFECTIVENESS
Bill James defined a fielder's RF as the sum of putouts and assists the fielder gets per game played
James then normalizes this statistic relative to all players at a given position
RF > 1: above-average range
RF < 1: below-average range
PROBLEMS WITH RF

Number of strikeouts: a team whose pitchers record many strikeouts leaves its fielders fewer chances
Left-handed pitchers vs. right-handed batters (platoon effect): right-handed batters are believed to hit more ground balls to shortstop than left-handed batters, so a shortstop who sees more right-handed batters has more balls hit near him and a higher RF
THE FIELDING BIBLE: A GREAT LEAP FORWARD
John Dewan and his colleagues determine the chance (based on all plays during a season) that a ball hit at a particular speed to a particular zone would be successfully fielded
E.g., they might find that 20% of the balls hit softly over second base are successfully fielded by shortstops
A shortstop who successfully fields such a ball has prevented one hit
So this SS has prevented 1 - .2 = .8 hits more than an average SS
If he fails to field it, the SS is credited with 0 - .2 = -.2 hits prevented relative to an average SS
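The scoring rule above amounts to summing "made the play (1 or 0) minus the average chance" over all balls hit toward a fielder. A minimal sketch, with a hypothetical function name and the 20% example from the slide:

```python
def plus_minus(plays: list[tuple[float, bool]]) -> float:
    """Sum of hits prevented relative to an average fielder.

    Each play is (chance an average fielder makes the play, whether this fielder did).
    """
    return sum((1.0 if fielded else 0.0) - chance for chance, fielded in plays)

print(plus_minus([(0.2, True)]))    # fields the softly hit ball over second base
print(plus_minus([(0.2, False)]))   # fails to field the same ball
```

A fielder who performs exactly like the average fielder would accumulate a total near zero over a season.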
CONVERTING FIELDERS' SCORES TO RUNS
State: first digit = number of outs; next three digits = runner on first, second, third (1 = occupied)
State    Average Runs    Plate Appearances    State    Average Runs    Plate Appearances
0000     0.54            46180                0100     0.93            11644
1000     0.29            32821                1100     0.55            13483
2000     0.11            26009                2100     0.25            13588
0001     1.46            512                  0101     1.86            1053
1001     0.98            2069                 1101     1.24            2283
2001     0.38            3129                 2101     0.54            3117
0010     1.17            3590                 0110     1.49            2786
1010     0.71            6168                 1110     0.97            4978
2010     0.34            7709                 2110     0.46            6545
0011     2.14            688                  0111     2.27            805
1011     1.47            1770                 1111     1.60            1926
2011     0.63            1902                 2111     0.82            2380
CONVERTING FIELDERS' SCORES TO RUNS (CONTD.)

Suppose a SS fails to field a ball with 0 outs and the bases empty. Before this ball was hit the state was 0000, and an average team was expected to score 0.54 runs
If the SS gives up a hit, the new state is 0100 and the average batting team is expected to score 0.93 runs
If the SS turns the potential hit into an out, the new state is 1000 and the average batting team is expected to score only 0.29 runs
Thus, in this situation the SS's failure to prevent a hit cost his team 0.93 - 0.29 = 0.64 runs
If we average the cost of allowing an out to become a hit over all possible states, we find that a hit allowed costs a team around 0.8 runs
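The 0.64-run example can be reproduced from the expected-runs table, and the same lookup extends to one and two outs. The sketch covers only the bases-empty case and assumes, as the slide's example does, that the hit leaves a runner on first; the function name is hypothetical.

```python
# Expected runs for the bases-empty and runner-on-first states from the table
# (first digit = outs, remaining digits = runners on first, second, third).
EXPECTED_RUNS = {
    "0000": 0.54, "1000": 0.29, "2000": 0.11,
    "0100": 0.93, "1100": 0.55, "2100": 0.25,
}

def cost_of_hit_allowed(outs: int) -> float:
    """Runs lost when a bases-empty out becomes a single instead."""
    after_hit = EXPECTED_RUNS[f"{outs}100"]
    # A third out ends the inning, so no further runs are expected.
    after_out = EXPECTED_RUNS[f"{outs + 1}000"] if outs < 2 else 0.0
    return after_hit - after_out

for outs in range(3):
    print(outs, round(cost_of_hit_allowed(outs), 2))
```

The roughly 0.8 runs per hit quoted on the slide comes from averaging over all states, including those with runners already on base, which this bases-empty sketch does not attempt.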
CONVERTING RUNS SAVED BY A FIELDER INTO WINS
Pythagorean Theorem: runs scored^2/(runs scored^2 + runs allowed^2) = estimated percentage of games won
R = runs scored/runs allowed
R^2/(R^2 + 1) = estimated percentage of games won
An average team scores 775 runs and gives up 775 runs during a season (2000-2006), so it is expected to win 81 of its 162 games
If a fielder saves 10 runs, then our average team now outscores its opponents 775-765, which yields a scoring ratio R = 775/765 or 1.013
Using the Pythagorean Theorem, 162*(1.013^2/(1.013^2 + 1)) = 82.05
Therefore, 10 runs translate into around 1 game won
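The 10-runs-per-win rule of thumb can be checked numerically. The sketch assumes an exponent of 2 in the Pythagorean estimate, since that is the value that reproduces the 82.05 wins on the slide; the function name is hypothetical.

```python
def expected_wins(scored: float, allowed: float, games: int = 162, exp: float = 2.0) -> float:
    """Pythagorean estimate of wins over a season (exponent 2 assumed)."""
    ratio = scored / allowed
    return games * ratio ** exp / (ratio ** exp + 1)

baseline = expected_wins(775, 775)       # average team
with_fielder = expected_wins(775, 765)   # same team after a fielder saves 10 runs
print(round(baseline, 2), round(with_fielder, 2), round(with_fielder - baseline, 2))
```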
WHY DO THE YANKEES UNDERPERFORM?
Position    Player       Fielding Bible rating
1B          Giambi       -8
2B          Cano         -27
SS          Jeter        -34
3B          Rodriguez    2
LF          Matsui       3
CF          Williams     -37
RF          Sheffield    -38
Team total               -139

A major problem with the Yankees is their poor fielding
The Yankee fielders cost the team 139 hits over the course of a season
This translates into 139*.8 = 111.2 runs
This means that the Yankees' fielding was 111.2/10 = 11.1 wins worse than an average team's fielding
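The chain from ratings to hits to runs to wins can be written out end to end. The ratings are those in the table; the two conversion constants are the rules of thumb from the earlier slides (about 0.8 runs per hit and about 10 runs per win), and the variable names are chosen here for illustration.

```python
RATINGS = {"1B Giambi": -8, "2B Cano": -27, "SS Jeter": -34, "3B Rodriguez": 2,
           "LF Matsui": 3, "CF Williams": -37, "RF Sheffield": -38}

RUNS_PER_HIT = 0.8   # average cost of a hit allowed, from the slides
RUNS_PER_WIN = 10    # roughly 10 runs per win, from the slides

team_hits = sum(RATINGS.values())        # hits prevented relative to average
team_runs = team_hits * RUNS_PER_HIT     # runs saved (negative = runs cost)
team_wins = team_runs / RUNS_PER_WIN     # wins gained (negative = wins lost)
print(team_hits, round(team_runs, 1), round(team_wins, 1))
```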
LOSS AVERSION: PERSISTENT BIAS IN THE FACE OF EXPERIENCE, COMPETITION AND HIGH STAKES
GOLF TERMINOLOGY

Putt: a light golf stroke made on the putting green in an effort to place the ball into the hole
Par: the predetermined number of strokes that a scratch (0 handicap) golfer should require to complete a hole, a round (the sum of the pars of the played holes), or a tournament (the sum of the pars of each round)
Birdie, eagle and double eagle: scores of one, two and three strokes under par on an individual hole
Bogey and double bogey: scores of one and two strokes over par on an individual hole
Greens and fairways: a golf course comprises a series of holes, each consisting of a teeing ground, a fairway, the rough and other hazards, and a green with a flagstick ("pin") and hole ("cup")
PROFESSIONAL GOLF- PGA TOUR

Collection of tournaments (40-50 each year)
Around 150 professional golfers compete
18 holes of golf are played on each of 4 consecutive days (4 rounds)
The bottom third of the field is eliminated after 2 rounds
The total purse of the tournament is shared among the remaining players, with the winner typically getting 18% of the purse
Each player's score is the total number of strokes across all 72 holes
The player with the lowest score wins the tournament
PROSPECT THEORY

Loss Aversion
Losses are weighted more heavily than commensurate gains
The value function is kinked at the reference point, with a steeper gradient for losses than for gains

Risk Shift
Risk seeking in losses and risk averse in gains
The utility function is convex in the loss domain and concave in the gain domain
CONCEPTUAL FRAMEWORK

Objective: to describe the influence that loss aversion may have on putting
Equations:
1. Probability of making a putt
Pr(make putt) = f(e, z) + ε
e represents the amount of effort exerted
z represents a vector of other putt characteristics, e.g. putt distance
ε represents random noise
Assumption: f' >= 0 and f'' <= 0 with respect to e, indicating that additional effort weakly increases the probability of making a putt, at a weakly decreasing rate
CONCEPTUAL FRAMEWORK (CONTD.)

2. Utility function
U = (f(e, z) + ε) V(x) + (1 - f(e, z) - ε) V(x - 1) - Cost(e)
Utility of a player is the value the player places on making and on missing the putt, weighted by their probabilities, minus the cost of effort
V is the value function
x is the score earned on making the putt and x - 1 the score on missing it, measured relative to the reference point
Cost(e) is the cost of effort, which we assume to be strictly increasing and convex
3. Value function
V(x) = x if x >= 0, and V(x) = λx if x < 0
λ >= 1 is the degree of loss aversion
Simple version of prospect theory
CONCEPTUAL FRAMEWORK (CONTD.)

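4. Maximizing the utility function
f'(e) (V(x) - V(x - 1)) = Cost'(e)
5. Combining equations 3 and 4
f'(e) = Cost'(e) if x >= 1
λ f'(e) = Cost'(e) if x <= 0
6. Extended version of equation 3
V(x) = x^α if x >= 0, and V(x) = -λ(-x)^β if x < 0
α and β < 1 are parameters that allow for diminishing sensitivity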
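To see why loss aversion predicts more effort on par putts than on birdie putts, it helps to compute the value at stake in each case. The sketch below assumes the standard prospect-theory value function (gains valued as x^α, losses as -λ(-x)^β) and measures x in strokes relative to par, with positive values meaning under par; the function names and the choice λ = 2 are illustrative assumptions, not figures from the slides.

```python
def value(x: float, lam: float = 2.0, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Prospect-theory value of outcome x relative to the reference point."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta

def gain_from_making_putt(x_if_made: float, **params: float) -> float:
    """Difference in value between making the putt (x) and missing it (x - 1)."""
    return value(x_if_made, **params) - value(x_if_made - 1, **params)

print(gain_from_making_putt(1))  # birdie putt: made = +1, missed = 0
print(gain_from_making_putt(0))  # par putt: made = 0, missed = -1
```

With α = β = 1 the stake on a par putt is λ times the stake on a birdie putt, so any λ above 1 makes the par putt worth more effort.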


CONCEPTUAL FRAMEWORK (CONTD.)

Predictions
1. Controlling for putt characteristics, z, putts attempted for par, bogey and double bogey will be more accurate than putts attempted for birdie and eagle
2. Controlling for putt characteristics, z, the probability of making a birdie putt is greater than the probability of making an eagle putt. In addition, the probability of making a par putt is greater than the probability of making a bogey putt, and the probability of making a bogey putt is greater than the probability of making a double bogey putt
3. Controlling for putt characteristics, z, players will be more risk averse when putting for birdie and eagle than when putting for par, bogey or double bogey
DATA AND EMPIRICAL STRATEGY

The aim was to test the predictions derived from the equations against the collected data
To test why variation in shot values exists after controlling for distance and other factors
250 workers have been employed by the PGA Tour since 2002 to gather data and information each week
Data from 239 tournaments completed between 2004 and 2009 are used for the analyses
The focus is on putts; the dataset is restricted to putts attempted for eagle, birdie, par, bogey or double bogey
Only players with at least 1000 putts are included, which leaves 2,525,161 putts by 421 professional golfers
RESULTS: MAIN EFFECTS

Distance is a key determinant of putt success, as shown in Figure 1
The table shows the logit regression analysis of the data
Distance is controlled for by including a seventh-order polynomial, which is necessary and sufficient to control for this variable
RESULTS: ALTERNATIVE CLASSICAL EXPLANATIONS
Differences in Player ability
Learning
Differences across holes
Position on the Green
Position in the tournament
RESULTS: ALTERNATIVE CLASSICAL EXPLANATIONS (CONTD.)
Matching Model
ALTERNATIVE PSYCHOLOGICAL EXPLANATIONS

Overconfidence or cockiness
Positive autocorrelation between shots
Can't account for the findings

Nervousness
Valuing a birdie putt more than a par putt
Can't be the explanation, as professional golfers attempt as many birdie putts as par putts
Can't explain many of the other findings
DIFFERENCES ACROSS ROUNDS

The difference in accuracy between birdie and par putts diminishes as the rounds progress
To remove bias, the analysis is also run on players who played all 4 rounds
The discrepancy in accuracy between birdie and par putts is not automatic
The discrepancy can exist because of reference point theory
DIFFERENCES IN RISK AVERSION

Objectives of a putt: hit the ball into the hole, or limit the difficulty of the follow-on shot
Risk-averse putts sacrifice the likelihood of hitting the ball into the hole in exchange for an easier follow-on shot
DIFFERENCES IN RISK AVERSION (CONTD.)

Positive numbers indicate that birdie putts are more likely than par putts to stop in the box, and vice versa
Panel B is for putts attempted from a distance longer than 270 inches
The figure shows that birdie putts are hit less hard than par putts, leaving easier follow-on shots
The increased probability of making the follow-up shot doesn't compensate for the loss of accuracy on birdie putts
Birdie putts are missed more often because of both risk aversion and left-right mistakes
KOSZEGI-RABIN REFERENCE POINTS

Initially, par is usually the reference point for golfers
Later on, rational expectations might serve as the reference point
Difficult hole and easy hole analysis
V-shaped pattern in the relationship between the relative accuracy of par and birdie putts
KOSZEGI-RABIN REFERENCE POINTS (CONTD.)

Easy holes and difficult holes
The data are divided into quintiles, with the first quintile for the easiest holes and the fifth for the most difficult
The accuracy difference between par and birdie putts diminishes on difficult holes
Bogey putts are as accurate as par putts on moderate and easy holes, but more accurate on difficult holes
HETEROGENEITY IN LOSS AVERSION

Loss aversion may differ across players of different rank
Analysis shows that this doesn't affect our result
SIZE OF THE EFFECTS

Hitting birdie putts as accurately as par putts would change expected tournament earnings
Overall tournament score would improve by more than one stroke per tournament
Average earnings for the top 20 ranked players would increase by $640,896
Concentration and effort are limited resources
RESULTS AND CONCLUSION

Different competing explanations were analysed to explain the difference observed between birdie and par putts
None of these explanations accounted for the pattern of results observed
The gap diminishes as the tournament progresses but doesn't disappear
Although golfers should try to hit each stroke as accurately as possible, they don't
This shows that it is the players' loss aversion that leads to the differences in accuracy
DOES COMPETITION MATTER

H1: Putting performance improves when playing partner is playing for birdie
H2: Larger field decreases birdie putting performance
DATA ANALYSIS
RESULTS

Presence of a competitive playing partner putting for a birdie improves birdie and par putting performance
N-effect: the number of competitors in the tournament negatively impacts birdie putting performance
Thank You!!!
