Leafing Through Pages: Analysis of Sports and Other Topics


A Brief Sabermetric Introduction

Filed under: Baseball — David Hunter @ 12:01 PM

Here’s a brief overview of the modern sabermetric statistics in use. We’ll start with the better-known ones and move on to the more complex.

On Base Percentage: How often a player gets on base, which is the first step toward scoring runs. The more often a player gets on base (and doesn’t make an out), the higher the chance of scoring a run. It is usually calculated as (H + BB + HBP)/(AB + BB + HBP + SF). That’s the basic and most commonly used version. Other formulas try to incorporate Intentional Walks (IBB) and GIDP (Grounded Into Double Plays) for a more complete picture. An example of the latter would be (H + BB + HBP + IBB)/(AB + BB + HBP + IBB + GIDP + SF).

According to the first formula, Derek Jeter put up a .373 OBP in 2002. With the second formula, his OBP drops to .367, which is still a solid number.
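To make the two versions concrete, here is a small Python sketch of both formulas. The counting stats are not listed in this post; they are assumed from Jeter’s published 2002 batting line and should be checked against a reference before being reused.

```python
def obp_basic(h, bb, hbp, ab, sf):
    """Standard OBP: times on base over the plate appearances that count."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def obp_extended(h, bb, hbp, ibb, ab, gidp, sf):
    """Extended variant from the text: IBB on both sides, GIDP as extra outs."""
    return (h + bb + hbp + ibb) / (ab + bb + hbp + ibb + gidp + sf)


# Assumed 2002 counting stats for Derek Jeter (not given in the post).
jeter = dict(h=191, bb=73, hbp=7, ab=644, sf=3)

basic = obp_basic(**jeter)
extended = obp_extended(ibb=2, gidp=14, **jeter)
print(f"{basic:.3f} {extended:.3f}")
```

With those assumed inputs, the two results round to the .373 and .367 quoted above.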

Slugging Percentage: The number of bases a player gains on his hits per at bat. A single counts as 1 base, a double counts as 2, a triple counts as 3, and a home run counts as 4. To calculate it, divide total bases (1B + 2B*2 + 3B*3 + HR*4) by at bats. Using Derek Jeter’s 2002 season as an example, he had 147 singles, 26 doubles, 0 triples, and 18 home runs in 644 at bats. The formula is therefore (147 + 52 + 0 + 72)/644, for a SLG of .421.
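The same arithmetic as a short Python sketch, using only the hit totals and at bats quoted above. The function names are just illustrative.

```python
def total_bases(singles, doubles, triples, home_runs):
    """Weight each hit by the number of bases it is worth."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at bats."""
    return total_bases(singles, doubles, triples, home_runs) / at_bats


jeter_tb = total_bases(147, 26, 0, 18)
jeter_slg = slugging(147, 26, 0, 18, at_bats=644)
print(jeter_tb, f"{jeter_slg:.3f}")  # 271 0.421
```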

On Base + Slugging: Adding OBP to SLG gives what is commonly referred to as OPS. Recent studies of the relative value of the two have found that OBP is roughly 1.5 to 2 times more important than SLG. As a result, some sabermetricians like Tango Tiger have calculated a weighted OPS as OBP*1.7 + SLG to reflect that difference.
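As a quick illustration, here are plain OPS and the 1.7-weighted version in Python, applied to Jeter’s 2002 OBP (.373) and SLG (.421) from the examples above. The function names and the `obp_weight` parameter are my own, not an established API.

```python
def ops(obp, slg):
    """Plain OPS: on-base percentage plus slugging percentage."""
    return obp + slg


def weighted_ops(obp, slg, obp_weight=1.7):
    """OPS with OBP scaled up to reflect its greater importance."""
    return obp_weight * obp + slg


jeter_ops = ops(0.373, 0.421)
jeter_weighted = weighted_ops(0.373, 0.421)
print(f"{jeter_ops:.3f} {jeter_weighted:.3f}")  # 0.794 1.055
```

Note that the weighted number is on a different scale from regular OPS, so it is only meaningful when compared against other weighted figures.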

Secondary Average: An attempt to measure how many bases a player gains overall, rather than counting just hits (as Batting Average does) or times on base (as OBP does). It is calculated as (TB – H + BB + SB – CS)/AB. The result falls in a range similar to BA, where anything over .400 is great and anything below .230 is awful.

Derek Jeter in 2002 put up a .297 BA and .373 OBP. His Secondary Average comes out to .283, a solid season but not outstanding. His 2002 teammate Jason Giambi put up a Secondary Average of .479, in part because he walked a lot (109 BB) and piled up more total bases (335 to Jeter’s 271).
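Here is a minimal Python sketch of the Secondary Average formula for both players. Only the total bases and Giambi’s walk total come from this post; the hits, at bats, steals, and Jeter’s walks are assumed from the players’ published 2002 lines.

```python
def secondary_average(tb, h, bb, sb, cs, ab):
    """Extra bases on hits, plus walks and net steals, per at bat."""
    return (tb - h + bb + sb - cs) / ab


# Assumed 2002 counting stats (only TB and Giambi's BB appear in the post).
jeter_seca = secondary_average(tb=271, h=191, bb=73, sb=32, cs=3, ab=644)
giambi_seca = secondary_average(tb=335, h=176, bb=109, sb=2, cs=2, ab=560)
print(f"{jeter_seca:.3f} {giambi_seca:.3f}")
```

Under those assumptions the results round to the .283 and .479 quoted above.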

Total Average: An attempt to weigh a player’s ability to contribute offensively against the outs he makes. It was created by Thomas Boswell in the 1970s and is similar to other offensive sabermetric tools. It is calculated as [(TB + HBP + BB + SB) – CS]/[(AB – H) + CS + GIDP].

Once again we’ll compare Derek Jeter and Jason Giambi. Derek Jeter’s Total Average in 2002 comes out to .809 and Jason Giambi’s Total Average is 1.136. Again, it’s another offensive statistic that gives additional weight to a player who walks and can hit for power.
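A short Python sketch of Total Average, reading the formula as bases gained (minus caught stealing) over outs made. As before, most of the counting stats are assumed from the players’ published 2002 lines rather than taken from this post.

```python
def total_average(tb, hbp, bb, sb, cs, ab, h, gidp):
    """Bases gained, net of caught stealing, divided by outs made."""
    bases = tb + hbp + bb + sb - cs
    outs = (ab - h) + cs + gidp
    return bases / outs


# Assumed 2002 counting stats (not all listed in the post).
jeter_ta = total_average(tb=271, hbp=7, bb=73, sb=32, cs=3, ab=644, h=191, gidp=14)
giambi_ta = total_average(tb=335, hbp=15, bb=109, sb=2, cs=2, ab=560, h=176, gidp=18)
print(f"{jeter_ta:.3f} {giambi_ta:.3f}")
```

With those inputs the sketch reproduces the .809 and 1.136 figures above.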

Gross Production Average: A statistic created by Aaron Gleeman to refine OPS and better represent the importance of OBP relative to SLG. It is calculated as (OBP*1.8 + SLG)/4, which puts it on roughly the same scale as BA. Anything over .325 is great, anything below .250 is bad.

In 2002 Derek Jeter’s GPA was .273, solid but not that great. His teammate Alfonso Soriano put up a GPA of .286 despite a lower OBP because he slugged .547 to Jeter’s .421. Finally Jason Giambi put up a GPA of .345 because he had a .435 OBP and .598 SLG.
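The GPA numbers for Jeter and Giambi can be checked with a few lines of Python, using only the OBP and SLG figures quoted in this post. The function name is illustrative.

```python
def gross_production_average(obp, slg):
    """GPA: OBP weighted 1.8x, added to SLG, scaled down to a BA-like range."""
    return (1.8 * obp + slg) / 4


jeter_gpa = gross_production_average(0.373, 0.421)
giambi_gpa = gross_production_average(0.435, 0.598)
print(f"{jeter_gpa:.3f} {giambi_gpa:.3f}")  # 0.273 0.345
```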

Equivalent Average: A complex offensive statistic that measures offensive value per out and is adjusted for league offense, home park, and team pitching. It includes batting value and baserunning value. The link explains it in more detail but the statistic was created by Baseball Prospectus.

Runs Created: An estimate of how many runs a player creates for his team. The simplest equation is (H + BB) * TB/(AB + BB). The link also offers more advanced RC formulas that try to paint a more complete picture by adding weights and including HBP, stolen bases, GIDP, and SF.

Using the very basic RC, Derek Jeter created 100 runs in 2002, so a lineup of 9 Derek Jeters would score roughly 900 runs in a 162-game season. Jason Giambi created 143 runs, so a 9-man lineup of Giambis would score roughly 1287 runs in a 162-game season.
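Here is the basic RC formula as a Python sketch, along with the rough “lineup of nine” scaling. Jeter’s hit total follows from the singles, doubles, and home runs listed earlier; his walks and Giambi’s hits and at bats are assumed from their published 2002 lines.

```python
def runs_created_basic(h, bb, tb, ab):
    """Basic Runs Created: (times on base) * (total bases) / (opportunities)."""
    return (h + bb) * tb / (ab + bb)


# Partly assumed 2002 counting stats (see the note above).
jeter_rc = runs_created_basic(h=191, bb=73, tb=271, ab=644)
giambi_rc = runs_created_basic(h=176, bb=109, tb=335, ab=560)

# Rough lineup estimate: nine copies of the same hitter.
for name, rc in (("Jeter", jeter_rc), ("Giambi", giambi_rc)):
    print(name, round(rc), 9 * round(rc))
```

The nine-times scaling is only a rough guide, since it assumes each hitter gets about one ninth of a team’s plate appearances.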

On the pitching side of things, sabermetrics has also been advancing statistics to better show a player’s ability.

WHIP: Walks plus hits, divided by innings pitched. The lower the WHIP the better, since it means the pitcher allows fewer hitters to reach base. Think of it as a reverse OBP.

In 2002 Mike Mussina put up a 1.19 WHIP whereas David Wells put up a 1.24 WHIP. Wells allowed 255 H+BB in 206.1 IP while Mussina allowed 256 in 215.2 IP. (Innings are written box-score style, so 215.2 means 215 and two-thirds innings.)
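A small Python sketch of the WHIP calculation, using the totals quoted above. It assumes the usual box-score convention for innings, where the digit after the point counts outs (so “215.2” is 215 and two-thirds). The `innings` helper is my own.

```python
def innings(ip: str) -> float:
    """Convert box-score innings ("215.2" = 215 and 2/3) to a decimal number."""
    whole, _, outs = ip.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def whip(walks_plus_hits: int, ip: str) -> float:
    """Walks plus hits allowed per inning pitched."""
    return walks_plus_hits / innings(ip)


mussina_whip = whip(256, "215.2")
wells_whip = whip(255, "206.1")
print(f"{mussina_whip:.2f} {wells_whip:.2f}")  # 1.19 1.24
```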

HR/9, BB/9, and K/9: These take the home runs allowed, walks allowed, and strikeouts, divide each by IP, and multiply by 9 to show what the pitcher would put up over a complete 9-inning game. They are a good tool when a pitcher was unlucky and struggled in his record or ERA: you can see whether his “rate” statistics stayed the same or whether he suddenly struck out fewer batters or started allowing more home runs than usual. Traditionally, you want a pitcher below 1.0 in HR/9, around 2-2.5 in BB/9, and over 8.5 in K/9. A pitcher under 5.5 K/9 will usually struggle over the long term unless he has an exceptional ability to avoid home runs or walks.

In 2002, Mike Mussina had 215.2 IP (215⅔ innings) with 27 HRA, 48 BB, and 182 K. Thus his HR/9 would be (27/215⅔)*9 for 1.13, his BB/9 would be (48/215⅔)*9 for 2.00, and his K/9 would be (182/215⅔)*9 for 7.60. He finished the season at 18-10 with a 4.05 ERA.

Compare that to David Wells, who went 19-7 with a 3.75 ERA in 206.1 IP. He had a HR/9 of 0.92, a BB/9 of 1.96, and a K/9 of 5.98. As you can see, Wells allowed fewer home runs and walked fewer batters, which probably helped lower his ERA in comparison to Mussina. Mussina’s higher strikeout rate, though, would make him a better bet than Wells to produce similar statistics the following season.
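The rate stats are easy to sketch in Python. This version uses Mussina’s totals from above and treats “215.2” as 215 and two-thirds innings, which is the usual box-score convention. Dividing by a literal 215.2 instead would push BB/9 and K/9 up by about 0.01. The helper names are my own.

```python
def innings(ip: str) -> float:
    """Convert box-score innings ("215.2" = 215 and 2/3) to a decimal number."""
    whole, _, outs = ip.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def per_nine(count: int, ip: str) -> float:
    """Scale a counting stat to a nine-inning rate."""
    return count / innings(ip) * 9


mussina_hr9 = per_nine(27, "215.2")
mussina_bb9 = per_nine(48, "215.2")
mussina_k9 = per_nine(182, "215.2")
print(f"{mussina_hr9:.2f} {mussina_bb9:.2f} {mussina_k9:.2f}")  # 1.13 2.00 7.60
```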

Defense Independent Pitching Statistics: More commonly referred to as DIPS, this was created by Voros McCracken to better evaluate pitchers on their own merits (i.e. independent of what the defense behind them contributed). The approach was later turned into Defense Independent ERA, or dERA.

Due to the complexity of calculating the above, Clay Dreslough created a formula called DICE or Defense Independent Component ERA. It is calculated as 3.00 + (13*HR + 3*(BB+HBP) – 2*K)/IP.

Tom Tango, also known as Tango Tiger, further refined the above into a statistic called Fielding Independent Pitching or FIP. It is simpler and is calculated as 3.20 + (13*HR + 3*BB – 2*K)/IP.

The Hardball Times essentially uses DICE but makes it 3.20 at the beginning, rather than 3.00.

In general, the lower the ERA, the better the pitcher. If his ERA is higher than his DICE/FIP, odds are that his defense may have hurt him. Conversely, when his ERA is much lower than his DICE/FIP, his defense may have helped him a lot.

Let’s compare Mike Mussina and David Wells again from their 2002 seasons. Mussina put up a 4.05 ERA and Wells a 3.75 ERA.

Mussina’s DICE = 3.68 and his FIP (Hardball Times version) = 3.88. David Wells’s DICE = 3.72 and his FIP (Hardball Times version) = 3.92. Here we see that Mussina was arguably a better pitcher than Wells but was hurt more by the Yankees’ defense behind him.
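For anyone who wants to check the Mussina numbers, here is a Python sketch of DICE, the Hardball Times variant (DICE with a 3.20 constant), and the simpler FIP formula without HBP. The HR, BB, K, and IP totals come from earlier in this post. The hit-by-pitch count is not listed here, so the value of 5 below is an assumption, chosen because it is consistent with the quoted 3.68.

```python
def innings(ip: str) -> float:
    """Convert box-score innings ("215.2" = 215 and 2/3) to a decimal number."""
    whole, _, outs = ip.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def dice(hr, bb, hbp, k, ip, constant=3.00):
    """DICE; pass constant=3.20 for the Hardball Times variant."""
    return constant + (13 * hr + 3 * (bb + hbp) - 2 * k) / innings(ip)


def fip(hr, bb, k, ip):
    """The simpler FIP formula from the text, which leaves out HBP."""
    return 3.20 + (13 * hr + 3 * bb - 2 * k) / innings(ip)


MUSSINA_HBP = 5  # assumed; the post does not list hit batsmen

mussina_dice = dice(27, 48, MUSSINA_HBP, 182, "215.2")
mussina_tht = dice(27, 48, MUSSINA_HBP, 182, "215.2", constant=3.20)
mussina_fip = fip(27, 48, 182, "215.2")
print(f"{mussina_dice:.2f} {mussina_tht:.2f} {mussina_fip:.2f}")
```

Under that assumption the first two values round to 3.68 and 3.88, while the HBP-free formula comes out a little lower, at about 3.81.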

