Sabermetrics – Are We Focused on the Wrong Kinds of Metrics?

I like analytics. I’ve made my living as what is nowadays called a data scientist. But it bothers me when lots of folks quote all these different stats and metrics without really understanding how they are calculated, what they mean, and what their limitations are.

In a recent post, 222 questioned why I did not like means (I am paraphrasing 222 here, so my apologies to 222 if this is a bit off) when I gave ERA and WHIP as two metrics that are means and that I did not think were particularly good for evaluating relievers.

Another issue with Sabermetrics is how they are used/misused in baseball. The whole point behind analysis and coming up with metrics is to help you make decisions, for example:

  • What pitchers should I try to sign
  • Who should I use in this situation
  • and so on, and so on

So the reason I don’t like to focus on just means is that they only tell a very small part of the story. Consider, for example, two pitchers both with a WHIP of 1:

  • Pitcher 1 has 1-2-3 innings half the time, and the other half he gives up two hits – and one of them is a HR
  • Pitcher 2 gives up a hit in every inning, but never for extra bases

With a 1-run lead, which pitcher do you use in the 9th? And, yeah, I know that these scenarios are extreme, but they are simple to describe and have analogs that are pretty realistic.

What characterizes the difference between these pitchers is what is sometimes described as the shape of the distribution of the data (e.g., the bell curve many of us are familiar with). Seems to me that in evaluating any player, we need more than just the arithmetic average (AKA the mean).
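To put some numbers on the two pitchers above, here is a minimal sketch. The per-inning lists are made up to match the descriptions (they are not real stats), and the helper names `whip` and `clean_rate` are just mine for illustration.

```python
from statistics import pstdev

# Walks + hits allowed in each inning pitched (made-up numbers, not real stats).
pitcher_1 = [0, 2] * 5  # 1-2-3 half the time, two hits the other half
pitcher_2 = [1] * 10    # exactly one hit every inning


def whip(per_inning):
    """Walks plus hits per inning, assuming every entry is one full inning."""
    return sum(per_inning) / len(per_inning)


def clean_rate(per_inning):
    """Share of innings in which nobody reached base."""
    return sum(1 for n in per_inning if n == 0) / len(per_inning)


for name, data in (("Pitcher 1", pitcher_1), ("Pitcher 2", pitcher_2)):
    print(name, "WHIP:", whip(data),
          "clean innings:", clean_rate(data),
          "std dev:", pstdev(data))
```

Both come out with a WHIP of 1, but the clean-inning rate and the standard deviation are completely different, which is exactly the part of the story the mean leaves out.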


Consider these two charts/distributions of data. Most baseball data looks more like this than a bell-shaped curve. You can have two distributions with exactly the same mean, but one is positively skewed and one negatively skewed. In many subject matter areas this is addressed by using the median instead of the mean for such data. Why not consider that for baseball?

Most/all pitchers in MLB are going to be positively skewed, as otherwise they would never make it to the big leagues.

Wouldn’t it be nice to know how skewed the distribution is before you decide on whether to sign someone and, more importantly, when to use him?
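As a rough sketch of the skew idea, the snippet below builds two made-up samples with the same mean, one with a long right tail and one with a long left tail, and compares mean, median, and skewness. The data and the `skewness` helper are assumptions for illustration; the helper uses the common moment-based definition (third central moment divided by the variance to the power 1.5).

```python
from statistics import mean, median


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Made-up per-outing numbers, chosen only so that both means equal 3.
right_tail = [1, 1, 1, 2, 2, 2, 3, 3, 5, 10]  # mostly low, one blow-up
left_tail = [4, 4, 4, 4, 4, 4, 4, 2, 0, 0]    # mostly high, a few very low

for name, data in (("right tail", right_tail), ("left tail", left_tail)):
    print(name, "mean:", mean(data), "median:", median(data),
          "skewness:", round(skewness(data), 2))
```

Same mean, but the medians land on opposite sides of it and the skewness has opposite signs, so the median plus a skew number already tells you a lot more than the mean alone.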


Let’s look at another chart/graph that illustrates a different characteristic, but is still about the shape of the data. All three of these curves (green, red, and blue – not going to get so geeky as to use the technical terms) have the same mean (and the same median).

If I am a GM or a manager (or, yes, a fan too), I would prefer to have the guy with the green graph vs. the blue one. Wouldn’t you?
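Here is a small sketch of that "same center, different spread" idea. The three samples are invented, and I am not mapping them to the colors in the chart; `tight`, `medium`, and `wide` are just illustrative names.

```python
from statistics import mean, median, pstdev

# Three made-up samples with the same mean and median but different spread.
samples = {
    "tight": [2, 3, 3, 3, 4],
    "medium": [1, 2, 3, 4, 5],
    "wide": [0, 1, 3, 5, 6],
}

for name, data in samples.items():
    print(name, "mean:", mean(data), "median:", median(data),
          "std dev:", round(pstdev(data), 2))
```

All three report a mean and median of 3, so only the standard deviation (or some other measure of spread) separates the consistent guy from the roller coaster.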

So, the reason I am not a big fan of just looking at means is that they ignore too much of the information. I want, and need, to know more. And the information is there. Someone just needs to figure out how to incorporate this additional information into a simple-to-understand set of metrics.

So one of my challenges over the next weeks and months is to come up with a hopefully better metric to evaluate relievers. I have some ideas that expand upon WHIP so it includes the number of bases (the slugging issue that Ghost has mentioned in other posts and comments).
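To be clear, I have not settled on a formula yet. Purely as a placeholder to show the kind of thing I mean, here is one hypothetical way to fold bases into WHIP: count walks plus total bases per inning instead of walks plus hits. The function name and the weighting are assumptions, and I am assuming the non-HR hit given up by Pitcher 1 in the earlier example is a single.

```python
def plain_whip(walks, hits, innings):
    """Standard WHIP: walks plus hits per inning pitched."""
    return (walks + hits) / innings


def bases_weighted_whip(walks, singles, doubles, triples, home_runs, innings):
    """Hypothetical variant: walks plus total bases per inning pitched."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return (walks + total_bases) / innings


# Two-inning slices of the earlier example pitchers (made-up data).
# Pitcher 1: one clean inning, then a single and a HR in the other.
p1_plain = plain_whip(walks=0, hits=2, innings=2)
p1_weighted = bases_weighted_whip(0, 1, 0, 0, 1, innings=2)

# Pitcher 2: one single in each inning.
p2_plain = plain_whip(walks=0, hits=2, innings=2)
p2_weighted = bases_weighted_whip(0, 2, 0, 0, 0, innings=2)

print("Pitcher 1:", p1_plain, p1_weighted)
print("Pitcher 2:", p2_plain, p2_weighted)
```

Plain WHIP still says the two are identical, while the bases-weighted version at least notices the home run.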

And in order to do that I am going to be asking for your help in evaluating how candidate metrics handle a number of scenarios. For example:

  • Pitcher 1 starts the 9th, gets one out and loads the bases (three seeing-eye/infield singles), and is replaced by pitcher 2, who gives up a HR and then gets two outs. Pitcher 1’s WHIP is 9 (three walks/hits in 1/3 of an inning), and his ERA is 81. Pitcher 2’s WHIP is 1.5 (one hit in 2/3 of an inning) and his ERA is 13.5. REALLY? Why the disparity?
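The arithmetic behind those numbers can be checked with a few lines. This sketch assumes all runs are earned and uses the standard scoring rule that the three inherited runners who score on the HR are charged to pitcher 1, so pitcher 2 is charged only with the batter who hit it. The function names are mine.

```python
from fractions import Fraction


def whip(walks_plus_hits, outs):
    """Walks plus hits per inning, with innings given as outs recorded."""
    return walks_plus_hits / Fraction(outs, 3)


def era(earned_runs, outs):
    """Earned runs per nine innings, with innings given as outs recorded."""
    return 9 * earned_runs / Fraction(outs, 3)


# Pitcher 1: three singles, one out, three runs charged (inherited runners).
p1_whip, p1_era = whip(3, 1), era(3, 1)

# Pitcher 2: one hit (the HR), two outs, one run charged (the batter himself).
p2_whip, p2_era = whip(1, 2), era(1, 2)

print("Pitcher 1 WHIP:", float(p1_whip), "ERA:", float(p1_era))
print("Pitcher 2 WHIP:", float(p2_whip), "ERA:", float(p2_era))
```

So the guy who gave up three singles wears three of the four runs, and the guy who gave up the grand slam wears one.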

I am going to be coming up with a list of scenarios for a future post so we can discuss what we collectively think of each.

In the meantime, my apologies for the geekiness of this post. Future posts on the topic of new metrics will hopefully be less geeky.
