Sam Mondry-Cohen of the Nationals presented at the DC chapter of the Society for American Baseball Research (SABR) on Saturday, January 30, 2016. He provided an excellent overview of how the Nationals’ analytics group has grown and some of the things he and his folks are researching, as well as what MLB Advanced Media (MLBAM) is doing. What follows is a laundry list of many of the topics he discussed.
But first, an announcement. The meeting of the Baltimore SABR Chapter is this coming Saturday, February 6.
Sam started working for the Nationals as an unpaid summer intern around 2009 (or perhaps 2010) while he was in college. After graduation he received a paid internship from the Nationals. At that time only about 10 teams had staff dedicated to analytics. There are a lot more teams with analytics groups now and Sam has grown his team – there are now 5 staff members who work on analytics projects for the Nationals.
He described the efforts he originally worked on very briefly. The focus then was on tailoring the analytics and the approach to a specific team (in his case the Nationals). He used the Rockies as an example, commenting (not surprisingly) that the public numbers (which I took to mean both the data and the resulting metrics) were not of much interest. Teams have access to other data, so the focus was on evaluating what would happen in Nats Park. The effort was to adapt the research results available to the general public and apply them to the Nationals, and he did distinguish between the public data and the techniques that were known to the public. Another point he made was that they were not that interested in describing what had happened (e.g., last year) and were much more interested in projecting what would happen in the next year. He used Bryce Harper winning the MVP as an example: they are trying to determine what will be needed for Bryce to win the MVP next year.
Sam identified three data sources, some of which are private to a specific team:
- Data from scouts.
- The new technology that tracks the ball (e.g., Statcast – more on that later).
- Box score type data.
While Sam did not dismiss the box score type data, he made it very clear that the other two sources were of more interest to the Nationals and thus got more attention.
Private Team Data
Sam said that the PitchFx data that is widely available to the public was originally supposed to be private. The company that did the original work on the technology planned to keep it private so that they could sell it to MLB teams. But it was released to the public by mistake.
However, there are parts of that data and related data that are not publicly available and are instead sold privately to each MLB team (it was not clear if a team only gets data for its own games, but that seemed to be his implication). For example, bat speed has been available for purchase for roughly the last 5 years. The Nationals have been aggressive in purchasing and using such data.
He then segued into the Statcast data. MLB is pushing to have this technology in every stadium and plans to make the data public. The net effect is that the advantage of such data to teams who have been purchasing and using it (e.g., the Nationals) will lessen over time.
Sam described some research being done by Daren Willman (@darenw on Twitter), formerly of Baseball Savant and now with MLB Advanced Media. This was a fascinating discussion about the research being done on tracking batter contact and the fielders’ responses to the batted ball. He even joked that they could also track the umpires’ movements and responses. 🙂
The tweet below is an example of an attempt to better visualize a fielder’s range. What Daren is working towards is eliminating the significance of starting position, so that what is displayed reflects the fielder’s fielding skills: an attempt to separate the impact of scouting, coaching and positioning from fielding ability.
My 1st attempt at normalizing out OF charts from the same starting point. Heres Cain & Kiermaier (15MPH+ & 6sHT) pic.twitter.com/abn7RRuXVi
— Daren Willman (@darenw) January 24, 2016
The tweet shows the range graph for Kevin Kiermaier (Gold Glove CFer for the Rays) vs. Lorenzo Cain, who was not even a Gold Glove finalist. Daren limited the data to caught batted balls that had a hang time of at least 6 seconds and where the fielder reached 15 mph while running to make the catch. Sam also showed some charts of the distribution (again, adjusting for the starting position of the fielder) of balls that were caught vs. not caught. Note that if you click on the tweet and scroll a bit you can see the same chart without the lines. Removing the clutter created by the lines allows for a clearer view of the range for both fielders.
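To make the idea of removing starting position concrete, here is a minimal Python sketch of the kind of filtering and normalization described above. It is not Daren's actual method or the Statcast data format: the `Catch` record, its field names, and the sample numbers are all hypothetical, and the thresholds simply mirror the 6-second hang time and 15 mph figures from the tweet.

```python
import math
from dataclasses import dataclass


@dataclass
class Catch:
    """One caught ball. All field names are hypothetical, not a real Statcast schema."""
    fielder: str
    start_x: float        # fielder's starting position, in feet
    start_y: float
    catch_x: float        # where the catch was made, in feet
    catch_y: float
    hang_time_s: float    # seconds the ball was in the air
    top_speed_mph: float  # fielder's top speed on the play


def normalized_catches(catches, min_hang_time_s=6.0, min_speed_mph=15.0):
    """Keep only plays meeting both thresholds, then express each catch
    as an offset from the fielder's starting position."""
    offsets = {}
    for c in catches:
        if c.hang_time_s < min_hang_time_s or c.top_speed_mph < min_speed_mph:
            continue
        offset = (c.catch_x - c.start_x, c.catch_y - c.start_y)
        offsets.setdefault(c.fielder, []).append(offset)
    return offsets


def farthest_catch_ft(offsets):
    """Longest distance covered per fielder, measured from the common origin."""
    return {name: max(math.hypot(dx, dy) for dx, dy in pts)
            for name, pts in offsets.items()}


# Made-up sample plays, for illustration only.
sample = [
    Catch("Fielder A", 0.0, 300.0, 40.0, 330.0, 6.2, 17.0),
    Catch("Fielder A", 10.0, 310.0, 15.0, 312.0, 4.0, 9.0),   # dropped by the filter
    Catch("Fielder B", -5.0, 295.0, -65.0, 375.0, 6.5, 19.5),
]
offsets = normalized_catches(sample)
ranges = farthest_catch_ft(offsets)
```

Because every catch is measured from the same origin, two fielders who are positioned differently by their coaches can be plotted on one chart and compared on ground covered alone.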
While I am sure that Nationals fans would love to see such details about Nationals players I suspect that Sam views that as proprietary IP (intellectual property) both in terms of the techniques they are using as well as the results.
Basically the analytical groups for teams can now track every movement a player makes.
Sam opened up the floor for questions and a summary of that discussion follows. Note that I have paraphrased the questions here as I concentrated on taking notes on his responses.
Do they use the PitchFx data to evaluate umpires?
They look at that data more to evaluate umpire tendencies and what each umpire’s strike zone is than to measure how close the umpire is to the real strike zone. The umpire he mentioned was Joe West.
I had a chance to chat very briefly with him after his talk, and he did not disagree with the results I described to him and posted in Umpires vs. PitchFx Game Day Data: roughly 33% of called strikes are actually balls, and Joe West’s tendencies include not calling high strikes.
He also commented that there are differences based on type of pitch – if a fastball and a curve both cross the front of the plate at exactly the same position, the fastball is much more likely to be called a strike.
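A share like the one quoted above, and a breakdown by pitch type, could be computed along the following lines. This is only a sketch under stated assumptions: the dictionary keys (`px`, `pz`, `sz_bot`, `sz_top`, `pitch_type`, `call`) are modeled loosely on PitchFx-style fields, the zone test is a simplification that ignores the width of the ball, and the sample pitches are invented.

```python
# Home plate is 17 inches wide; the assumed coordinates are in feet.
PLATE_HALF_WIDTH_FT = 17 / 12 / 2


def in_zone(pitch):
    """Simplified zone test: ball center over the plate, between the
    batter-specific bottom and top of the zone (ball radius ignored)."""
    return (abs(pitch["px"]) <= PLATE_HALF_WIDTH_FT
            and pitch["sz_bot"] <= pitch["pz"] <= pitch["sz_top"])


def called_strike_summary(pitches):
    """Return {pitch_type: (called strikes outside the zone, all called strikes)}."""
    summary = {}
    for p in pitches:
        if p["call"] != "called_strike":
            continue
        outside, total = summary.get(p["pitch_type"], (0, 0))
        summary[p["pitch_type"]] = (outside + (0 if in_zone(p) else 1), total + 1)
    return summary


# Invented pitches, for illustration only.
pitches = [
    {"pitch_type": "FF", "call": "called_strike", "px": 0.0, "pz": 2.5, "sz_bot": 1.5, "sz_top": 3.5},
    {"pitch_type": "FF", "call": "called_strike", "px": 1.0, "pz": 2.5, "sz_bot": 1.5, "sz_top": 3.5},
    {"pitch_type": "CU", "call": "called_strike", "px": 0.2, "pz": 1.2, "sz_bot": 1.5, "sz_top": 3.5},
    {"pitch_type": "CU", "call": "ball", "px": 0.0, "pz": 2.0, "sz_bot": 1.5, "sz_top": 3.5},
]
summary = called_strike_summary(pitches)
```

Grouping by umpire instead of pitch type would give the per-umpire tendencies Sam described; the same loop works with a different key.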
Has he looked at the impact of shifting defenses and does he think it will become more prevalent?
He had a very short response to this question, saying that batters were not doing a good enough job of adjusting to the shift; he mentioned bunting as an example.
Is pitch framing real?
Sam responded that the idea of pitch framing has been around for a very long time. The difference is that it can now be measured and quantified.
Is what he is doing a good career path?
He commented that it was a good time to get into this field and the attributes that he mentioned as important are:
- Being able to write computer programs that can manipulate very large files (in the computer industry this is referred to as Big Data). Analyzing video files with 30 frames per second generates a lot of data.
- He added that most of the analysis and graphical display is being done with R.
- Data visualization skills.
- And he added that there are lots of opportunities in this field and mentioned that the Mets are now looking for such resources.
Is the league using this data for safety – e.g., pitchers?
It seems that he misunderstood this question, as it sounded like the question was about line drives hitting the pitcher. Sam answered it by talking about Tommy John surgery. He said the league is doing a lot of research in that area but that his focus is very different. He is more interested in predicting which Nationals pitchers are likely to need the surgery, as opposed to just understanding better what causes the injury and how to prevent it. He then mentioned (as many of us here know) that the Nationals are making a significant investment in analytical techniques for medical research as it relates to player health and injuries.
Does he work directly with the players on his (and his team’s) findings?
He and his folks work exclusively with Rizzo and the coaches and scouts. Insights and discoveries are shared with them, and it is then up to the coaching staff to determine how best to use them with the players. He did comment that Max Scherzer was particularly interested in this data and research.
Update: Check out Section222’s comment for some additional info from the presentation.