Base Runs
Base runs (BsR) is a baseball statistic invented by sabermetrician David Smyth to estimate the number of runs a team "should" have scored given its component offensive statistics, as well as the number of runs a hitter creates or a pitcher allows. It measures essentially the same thing as Bill James' Runs Created, but as sabermetrician Tom M. Tango points out, BaseRuns models the reality of the run-scoring process significantly better than any other "run estimator".
Purpose and formula
The purpose and formula of Base Runs were described in Smyth's Base Runs Primer.
Advantages of base runs
Base Runs was primarily designed to provide an accurate model of the run scoring process at the Major League Baseball level, and it accomplishes that goal very well: in recent seasons, BsR has had the lowest RMSE of any of the major run estimation methods. In addition, Base Runs can claim something no other run estimator can: its accuracy holds up even in the most extreme circumstances and leagues. For instance, when a solo home run is hit, Base Runs will correctly predict that one run was scored by the batting team. By contrast, when Runs Created assesses a solo HR, it predicts 4 runs; likewise, most linear weights-based formulas will predict close to 1.4 runs for a solo HR. This is because each of these models was developed to fit the sample of a 162-game MLB season; they work well when applied to that sample, of course, but are woefully inaccurate when taken out of the environment for which they were designed. Base Runs, on the other hand, can be applied to any sample at any level of baseball (provided you can calculate the B multiplier), because it models the way the game of baseball operates, and not just a 162-game season at the highest professional level. This means Base Runs can be applied to high school or even Little League statistics.
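The solo home run comparison can be checked with a short sketch. It assumes the general Base Runs form A*B/(B+C) + D and one commonly cited simple set of coefficients; this text does not list them, so treat the coefficients, the function names, and the basic Runs Created formula (H+BB)*TB/(AB+BB) as illustrative assumptions rather than Smyth's definitive version.

```python
def base_runs(a, b, c, d):
    """General Base Runs form: A * B / (B + C) + D."""
    if b + c == 0:
        # No advancement and no outs: only the D term (home runs) can score.
        return float(d)
    return a * b / (b + c) + d


def base_runs_from_line(ab, h, tb, hr, bb):
    """Base Runs from a batting line, using ASSUMED coefficients
    (one commonly cited simple version, not taken from this article)."""
    a = h + bb - hr                                       # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement
    c = ab - h                                            # outs
    d = hr                                                # home runs
    return base_runs(a, b, c, d)


def runs_created_basic(ab, h, tb, bb):
    """Basic Runs Created (assumed form): (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# A single plate appearance that ends in a solo home run.
print(base_runs_from_line(ab=1, h=1, tb=4, hr=1, bb=0))  # 1.0
print(runs_created_basic(ab=1, h=1, tb=4, bb=0))         # 4.0
```

Under these assumptions the A term is zero for a lone home run, so the estimate collapses to D = 1, while the multiplicative Runs Created form returns 4.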
Weaknesses of base runs
From the TangoTiger wiki: "Base Runs adheres to more of the fundamental constraints on run scoring than most other run estimators, but it is by no means perfectly compliant. Some examples of shortcomings:
- BsR will sometimes give a negative estimate; this happens when the B factor is negative.
- BsR will sometimes project many more than three runners left on base per inning, despite the fact that three is the upper limit. For example, if walks have a B coefficient of .1, an inning with 10 walks and 3 outs will yield an estimate of 10*1/(1+3) = 2.5 runs, meaning that 7.5 runners must have been stranded.
- Tangotiger's research found that BsR overvalued events within the .500-.800 team OBP range.
One avenue for possible improvement in the model is the scoring rate estimator B/(B + C). There is no deep theory behind this construct--it was chosen because it worked empirically. It is possible that a better score rate estimator could be developed, although it would most likely have to be more complex than the current one."
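The arithmetic behind the first two shortcomings can be reproduced with a minimal sketch. It assumes the general form A*B/(B+C) + D, with the scoring rate B/(B + C) named in the quote; the function name and the negative B value are hypothetical and chosen only for illustration.

```python
def base_runs(a, b, c, d=0.0):
    """General Base Runs form (assumed): A * B / (B + C) + D."""
    return a * b / (b + c) + d


# Quoted example: an inning with 10 walks and 3 outs, where each walk
# contributes 0.1 to the B factor.
walks, outs = 10, 3
b_per_walk = 0.1

runs = base_runs(a=walks, b=b_per_walk * walks, c=outs)
stranded = walks - runs
print(runs, stranded)  # 2.5 7.5

# Hypothetical negative B factor: the estimate itself turns negative.
negative_estimate = base_runs(a=1, b=-0.5, c=3)
print(negative_estimate < 0)  # True
```

The 7.5 stranded runners exceed the three-per-inning limit, which is the inconsistency the quote describes.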