Baseball’s BABIP is more than just a number!

Hello again! Spring training is finally upon us and the regular season is on the horizon, just 30 days away. A few weeks ago I talked about my belief that the Nationals wanted a more contact-based approach. At the end, I also discussed why this was beneficial. In particular, I cited the volatility of Batting Average On Balls In Play (BABIP), and how that volatility is more pronounced in players with higher strikeout rates.

I mentioned that BABIP is often seen as a measurement of “luck”. That really isn’t doing it justice though. Some players are able to sustain a high BABIP throughout their entire careers, while other players consistently post low ones. So what are the main factors that affect a player’s BABIP?

*For a quick recap, BABIP stands for Batting Average On Balls In Play. It is the share of a player’s balls in play (contact that forces a fielder to make a play) that go for hits. Home runs, strikeouts, and walks are not factored in because none of them puts a ball in play. The formula for BABIP is (H – HR)/(AB – HR – K + SF). Let’s use Juan Soto’s .338 BABIP as an example. His 121 hits minus 22 home runs = 99. His 414 at-bats minus 22 HR minus 99 strikeouts plus 0 sacrifice flies = 293. Now divide 99 by 293 = .338.
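For anyone who would rather check the arithmetic in code, here is a minimal Python sketch of the formula in the recap. The function name `babip` and its argument names are my own choices for illustration; the inputs are the Juan Soto numbers quoted above.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting Average On Balls In Play: (H - HR) / (AB - HR - K + SF)."""
    balls_in_play = at_bats - home_runs - strikeouts + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Juan Soto's line as quoted in the recap above.
soto = babip(hits=121, home_runs=22, at_bats=414, strikeouts=99, sac_flies=0)
print(f"{soto:.3f}")  # prints 0.338
```

The guard against zero balls in play is only there so the sketch fails loudly on an empty stat line instead of dividing by zero.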

There are numerous ways we could try to figure out what the largest contributors to BABIP are. I chose to simply use correlations (all from 2018 data) to see which stats most linearly followed the players’ BABIPs. Without delving too deep into the math, each stat returns a correlation between -1 and 1. The farther that number is from 0, the stronger the linear relationship between the two statistics. As a rule of thumb here, a correlation of 0.25 is decent, 0.4 is great, and anything above 0.5 shows a very strong relationship. So without further ado, I ran through 54 different statistics to see which correlated the most with BABIP.
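To make “correlation” a little more concrete, here is a rough Python sketch of the standard Pearson correlation coefficient. I am assuming that is the kind of correlation in play here, since the goal is to see which stats follow BABIP linearly. The function name `pearson` is mine, and the five hitters below are made-up numbers for illustration only, not real 2018 data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized because the
    # 1/n factors cancel in the final ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers for five imaginary hitters, NOT real 2018 data.
ld_pct = [18.0, 20.5, 22.0, 24.5, 26.0]
babips = [0.270, 0.300, 0.290, 0.320, 0.335]

r = pearson(ld_pct, babips)
print(round(r, 2))
```

A result near 1 means the two lists rise together, a result near -1 means one rises as the other falls, and a result near 0 means there is no clear straight-line relationship.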

[AVG, OBP, wOBA, wRAA, wRC+, OPS, wRC, LD, FB, IFFB, Oppo, SLG, IFH, GB/FB, Spd, Soft, Pull]

 

Stat       Correlation     Stat       Correlation
Average     0.73           IFFB%      -0.37
OBP%        0.60           Oppo%       0.34
wOBA        0.50           Pull%      -0.34
wRAA        0.49           SLG%        0.32
wRC+        0.48           IFH%        0.28
OPS         0.46           GB/FB       0.28
wRC         0.43           Speed       0.25
LD%         0.42           Soft       -0.25
FB%        -0.39

A lot of info! There are some interesting concepts in there, but also a lot of noise distracting us. It makes sense that a high BABIP would result in a high batting average (BA), and the same goes for OBP. It also makes sense that players with higher averages will have a higher wRC+ and wRAA, because those are measures of offensive ability. Players with a high BABIP will tend to have done better, though there are exceptions to the rule for extremely high-strikeout players. So let’s clean it up and only look at factors that we think might influence BABIP, not things that BABIP is influencing.

Stat       Correlation
LD%         0.42
FB%        -0.39
IFFB%      -0.37
Oppo%       0.34
Pull%      -0.34
IFH%        0.28
GB/FB       0.28
Speed       0.25
Soft       -0.25

 

Here are nine statistics with a correlation of at least 0.25 in either direction, none of which BABIP itself can influence. So let’s go through them as groups:
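The clean-up step can be sketched in a few lines of Python. The correlation values are copied from the first table above. The `outcome_stats` set is my reading of which stats BABIP feeds into rather than the other way around, and the 0.25 cutoff on the absolute value matches the threshold used in this post.

```python
# Correlations with BABIP, copied from the 2018 table above.
correlations = {
    "Average": 0.73, "OBP%": 0.60, "wOBA": 0.50, "wRAA": 0.49,
    "wRC+": 0.48, "OPS": 0.46, "wRC": 0.43, "LD%": 0.42, "FB%": -0.39,
    "IFFB%": -0.37, "Oppo%": 0.34, "Pull%": -0.34, "SLG%": 0.32,
    "IFH%": 0.28, "GB/FB": 0.28, "Speed": 0.25, "Soft": -0.25,
}

# Stats that BABIP itself feeds into, so they are treated as outcomes
# (an assumption made for this sketch).
outcome_stats = {"Average", "OBP%", "wOBA", "wRAA", "wRC+", "OPS", "wRC", "SLG%"}

THRESHOLD = 0.25  # minimum absolute correlation to keep

# Keep the non-outcome stats and sort by strength, ignoring the sign.
candidates = sorted(
    ((stat, r) for stat, r in correlations.items()
     if stat not in outcome_stats and abs(r) >= THRESHOLD),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

for stat, r in candidates:
    direction = "higher" if r > 0 else "lower"
    print(f"{stat:6} {r:+.2f}  (more of it goes with {direction} BABIP)")
```

Sorting on the absolute value is the important design choice: a correlation of -0.39 is just as informative as one of +0.39, it simply points the other way.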

LD%/FB%/GB% The correlations here make a lot of sense. Line drives produce by far the highest batting average of any batted-ball type, while ground balls and fly balls produce much lower ones. So a player with a higher line drive percentage is more likely to have and maintain a high BABIP than a player who hits the ball on the ground. The most interesting one, though, is FB%, with a correlation of -0.39. Negative numbers also indicate correlations, just in the opposite direction. So a correlation of 0.42 means a high LD% correlates with a high BABIP, and a correlation of -0.39 means a high FB% correlates with a low BABIP. One of the main points of the fly ball revolution was the promise of increased power, but perhaps at the expense of average.

Nationals’ hitting coach Kevin Long talked this off-season about how Harper bought into hitting fly balls too much. Once he went back to hitting line drives, the rationale went, his average and the rest of his numbers went back up. Bryce Harper’s first-half BABIP and LD% were .226 and 20.9. In the second half, he raised those to .378 and 24.2. I’m not saying that an increase of about 3 percentage points in LD% by itself raised his BABIP by roughly 150 points, but the two did move together.

“[Harper], his father [and] everybody involved got caught up in the launch-angle stuff,” Kevin Long said. “He literally tried to hit the ball in the air way too much. When we started simplifying, we started calling them ‘boring line drives.’ ‘Let’s go do some BLD work,’ we’d say. It was the most boring cage work you’ve ever seen, but it translated.”

We can break LD% down a little deeper though. Using Baseball Savant we can see that at one point in July Harper was hitting fastballs at a launch angle over 20 degrees, while hitting breaking and off-speed pitches below 10 and 5 degrees. By the end of the year though, all three ranged only between 10 and 18 degrees. IFFB% also does not correlate with good results. This makes sense given that infield fly balls almost always result in an out. So players who hit a lot of infield flies are also more likely to have fewer hits.

 

Spd and IFH% These two also make a lot of sense. BABIP is all about racking up as many hits as you can out of your balls in play. A player who can turn a few extra outs into hits every year with his speed can add a few extra points of BABIP every year too.

Oppo% vs Pull% We have two polar opposite correlations for two polar opposite hit directions. Hitters who hit the ball more to the opposite field carry a higher BABIP in general. We can rationalize this one fairly well too. The advent of the shift has cratered the production of certain hitters, particularly pull hitters. Players who pull the ball more (particularly on the ground) are the most likely to be shifted against. As a result of the shift, they are bound to lose some hits, maybe even a substantial number. The ability to hit to the opposite field not only lessens the damage of the shift, but I think we could also see it as a sign of advanced bat control to an extent.

Speed, or the lack of it, certainly affects BABIP. Trea Turner is one player to whom you could apply this quote from Kevin Long, who was speaking about Brett Gardner when Long was the hitting coach with the Yankees.

“…You don’t want a speed guy to make his outs in the air,” Kevin Long said on Brett Gardner. “I don’t want fly balls. Certainly a line drive or in between a fly ball and line drive, those will work as well. But I think that’s just a matter of him getting the head out a bit more and being a bit more aggressive and saying, ‘I’m not going to be late.’ And any time you do make your contact point out further, you’ll hit some more fly balls.”

The Nationals certainly have their share of speed guys. Line drives have the highest BABIP, and they are what Kevin Long says he teaches: not launch angle, just “boring line drives,” some of which become home runs. Kevin Long will also tell you that a groundball has about a .222 BABIP. Compare that to .685 for a line drive, according to Fangraphs. Not surprisingly, a fly ball comes in even lower than the groundball, at .207. Again, this is more about the student than the teacher. You can teach the Charlie Lau method, Craig Wallenbrock’s, or Kevin Long’s, but if you are popping balls up or pulling them into shifts, you are making outs. Long’s success with Daniel Murphy was built on hitting line drives and going with the pitch.
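As a back-of-the-envelope illustration of why the batted-ball mix matters, here is a small Python sketch that blends the three rates quoted above (.685 for line drives, .222 for groundballs, .207 for fly balls) according to a hitter’s mix. The two mixes are hypothetical, and the sketch assumes every hitter gets the same rate on each batted-ball type, which ignores speed, the shift, and quality of contact.

```python
# Rates by batted-ball type as quoted above (groundball figure attributed
# to Kevin Long, line drive and fly ball figures to Fangraphs).
RATE = {"LD": 0.685, "GB": 0.222, "FB": 0.207}


def blended_rate(mix):
    """Weighted average of the per-type rates for a batted-ball mix.

    `mix` maps "LD", "GB", "FB" to shares that must add up to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("batted-ball shares must add up to 1")
    return sum(share * RATE[kind] for kind, share in mix.items())


# Two hypothetical hitters, not real players: the second trades
# five points of fly balls for five points of line drives.
fly_ball_hitter = {"LD": 0.20, "GB": 0.45, "FB": 0.35}
line_drive_hitter = {"LD": 0.25, "GB": 0.45, "FB": 0.30}

print(f"{blended_rate(fly_ball_hitter):.3f}")
print(f"{blended_rate(line_drive_hitter):.3f}")
```

Under these simplified assumptions, the only thing separating the two hitters is the swap of fly balls for line drives, so the gap between the two printed numbers is entirely the line-drive effect.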

Nationals’ BABIPs

The Nationals last year had the 12th highest BABIP at .297. For reference, the Rays led with .317 and the Angels brought up the rear with .277. Here is where the Nats ranked collectively in the stats mentioned above.

Stat       2018 Rank
LD%        23rd
FB%        16th
IFFB%      15th
Oppo%      6th
Pull%      25th
IFH%       29th
GB/FB      11th
Speed      8th
Soft       9th

There are some underlying stats to like here. The Nationals posted the 6th highest Oppo%, ranked 8th in Speed, and sat in the middle of the pack in IFFB% and FB%.

There are also some tidbits that are a little worrisome. One thing that very quickly jumps out is that the Nats had the 2nd lowest infield hit rate of any team in the majors. This seems strange given that this isn’t a team lacking speedsters. Speed score agrees, and rates the Nationals as the 8th fastest team last year. Why they would have had so few infield hits, I can’t really speculate on. It’s possible some of it really is just luck and they placed balls in poor spots, or their batters were more predictable for shifts, or it was a combination of factors. It is definitely something to watch to see whether it carries into this year.

There is a lot of information and a lot of formulas to digest here. The luck factor is supposed to even out over a long season, but a single at-bat is the smallest possible sample, and small samples distort the statistics until enough of them pile up. Each ball put in play is just one occurrence of BABIP. Depending on a fan’s perspective, a player or pitcher can be “BABIP’d,” a verb used when that player was lucky or unlucky, which again depends solely on the observer’s perspective. You saw it in yesterday’s game: Trea Turner smashed a line drive to start the game that was caught on a fine defensive play in deep center field for an out, while Andrew Stevenson’s line drive to Dexter Fowler was misjudged and became a double. Have a good time discussing this, and use some of this information when analyzing future games.

*Baseball photo art from the NY Times.