PitchingBot is a model I have made to evaluate pitch quality from the characteristics of the pitch alone. This post goes through the details of making and testing PitchingBot before giving some topic ideas for future posts which will use the model.
What is PitchingBot?
Motivation
Flashback to June 2020: coronavirus lockdowns are widespread and there is scarce hope for a 2020 baseball season. I'd been using plenty of data science and machine learning techniques in my work and was looking for a way to fill the baseball-shaped void in my life. I came up with the idea of measuring pitch quality using pitch characteristics alone; thanks to Statcast, there is a wealth of public data on every pitch thrown in the major leagues, and I thought I could put it to good use.
This is not a unique idea; quality-of-pitch metrics have been made before. QOP is one example, using linear regression on multiple variables including speed, location, and various parameters describing pitch break. To me, QOP felt like it had a little too much human influence: for example, the relative value of a pitch's location should be determined by the model, not used as an input. Additionally, the choice of variables seemed rather limited, as there was no contextual information on the pitch, such as batter handedness or the count.
Ethan Moore wrote a great article on his own pitch evaluation model. I only saw this after I had made my initial version of PitchingBot, but it provides useful context on other public attempts at pitch evaluation. He used a k-nearest neighbors model to evaluate pitches, taking the mean of the results from the 100 closest pitches to the pitch being evaluated. This is very similar to what I have done with PitchingBot; however, I use a larger number of input variables and a different model algorithm.
PitchingBot Description
An overview of my first iteration of PitchingBot can be found on the Fangraphs community blog. PitchingBot has since been upgraded to include more variables and to predict more outcomes.
PitchingBot is a machine learning model built in R using XGBoost. Machine learning can get pretty complicated and even I don't understand it most of the time, but for certain applications where large amounts of data are available, it can be a powerful tool. The inputs used by PitchingBot are:
- Pitch Type
- Pitch location as it crosses the plate
- Vertical and horizontal movement
- Velocity
- Spin rate
- Pitcher arm slot (release point x and z)
- Pitcher handedness
- Batter handedness
- Count (balls and strikes)
The outcomes predicted by PitchingBot are:
- Swing
- Swinging strike
- Called strike
- Ball
- Foul ball
- Ball in play
- Contact
- Groundball
- Line drive
- Flyball
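For reference, here is a rough sketch of how those variables might be pulled from the public Statcast data. The column names come from Baseball Savant's CSV export; the exact feature engineering inside PitchingBot may differ, and `pitches` is just a hypothetical data frame of scraped Statcast pitches.

```r
library(dplyr)

# Hypothetical mapping from Statcast columns to PitchingBot's inputs and outcomes.
# The outcome flags (swing, called strike, etc.) are derived from the
# 'description', 'type', and 'bb_type' fields.
model_data <- pitches %>%
  transmute(
    pitch_type,                    # pitch type
    plate_x, plate_z,              # location as it crosses the plate
    pfx_x, pfx_z,                  # horizontal and vertical movement
    release_speed,                 # velocity
    release_spin_rate,             # spin rate
    release_pos_x, release_pos_z,  # arm slot (release point x and z)
    p_throws, stand,               # pitcher and batter handedness
    balls, strikes,                # count
    description, type, bb_type     # raw outcome fields, later encoded as targets
  )
```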
How was PitchingBot made?
To train PitchingBot, I used all the data from the Statcast era (2015-2020). The baseballr package was exceedingly helpful for scraping the millions of pitches thrown during this time. I threw away pitches that had incomplete tracking data, which reduced the size of the dataset by around 10%. Linear weights were used to find the run value for different events, including changes in the count and batted ball events.
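As a rough sketch of that step (this assumes baseballr's scrape_statcast_savant function and my own choice of weekly date chunks; it is not the exact script I ran):

```r
library(baseballr)
library(dplyr)
library(purrr)

# Statcast queries are capped in size, so pull the data one week at a time
weeks <- seq(as.Date("2015-04-01"), as.Date("2020-10-01"), by = "7 days")

pitches <- map_dfr(weeks, function(start) {
  scrape_statcast_savant(start_date  = start,
                         end_date    = start + 6,
                         player_type = "pitcher")
})

# Discard pitches with incomplete tracking data (roughly 10% of the sample)
pitches <- pitches %>%
  filter(!is.na(release_speed), !is.na(release_spin_rate),
         !is.na(plate_x), !is.na(plate_z),
         !is.na(pfx_x), !is.na(pfx_z))
```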
I split the data randomly into two sets: 80% for training and 20% for testing. The model was only allowed to see the training data while it learned how to predict pitches, and it was then evaluated on the test data. This approach is used to avoid overfitting, which would reduce the model's predictive power. Several parameters of the XGBoost model were tuned to produce the smallest error on the test set, and this was done for each of the four models that make up PitchingBot.
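A minimal sketch of what training one of those models looks like, assuming the categorical inputs have already been one-hot encoded into a numeric matrix called `features` and the run values live in `model_data$run_value` (both names, and the hyperparameter values shown, are placeholders rather than the tuned ones):

```r
library(xgboost)

set.seed(42)
n <- nrow(model_data)
train_idx <- sample(n, size = 0.8 * n)   # random 80/20 split

dtrain <- xgb.DMatrix(as.matrix(features[train_idx, ]),
                      label = model_data$run_value[train_idx])
dtest  <- xgb.DMatrix(as.matrix(features[-train_idx, ]),
                      label = model_data$run_value[-train_idx])

params <- list(objective = "reg:squarederror",   # run value is a continuous target
               eta = 0.05, max_depth = 6,
               subsample = 0.8, colsample_bytree = 0.8)

fit <- xgb.train(params, dtrain, nrounds = 5000,
                 watchlist = list(test = dtest),
                 early_stopping_rounds = 50,     # stop once test error flattens
                 verbose = 0)
```

The event-probability models follow the same pattern, with objective = "binary:logistic" (or "multi:softprob" for the batted ball types).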
Testing PitchingBot
Baseball is a sport that is notoriously random. The results of individual MLB games aren't too different from coin flips for all but the most lopsided matchups. This means that any attempt to predict the outcomes of pitches will have large error bars.
To see if PitchingBot is working correctly, we can look at the accuracy of the predictions, and whether they line up with our expectations of what makes a good pitch.
In my initial investigations, I found that PitchingBot's predictions agreed with conventional knowledge on what makes a good pitch. PitchingBot thinks it is best to throw pitches in the corners of the zone with high velocity, movement, and spin rate. Also, the pitchers with the best results in 2020 as predicted by PitchingBot were Gerrit Cole, Zac Gallen, Jacob deGrom, Trevor Bauer, and Blake Snell, which agrees with our preconceptions of which pitchers throw the best pitches.
PitchingBot's predictions of pitch value agree with subjective views on pitch value. Here it is best to throw a 3-2 fastball in the corners of the zone.
We can look at the run values predicted by PitchingBot versus their actual results in the following graph.
The pitches are grouped by their predicted run value and then compared to the mean run value that resulted. The blue line shows a 1:1 ratio, which ideally PitchingBot would follow as closely as possible; the size of each dot represents the number of pitches in that group. We can see that for pitches with above-average run value (bad pitches), PitchingBot is pretty accurate and the dots follow the line closely. Meanwhile, for pitches with negative run value (good pitches), PitchingBot is hedging its bets slightly and doesn't think they are as good as they really are. This is an interesting finding, and a future blog post will look at where this effect comes from, since fixing it should lead to improvements in the model.
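For anyone who wants to reproduce this kind of check, the grouping is straightforward; here is a sketch with hypothetical column names (`pred_rv` for the test-set predictions, `run_value` for the observed linear-weight values):

```r
library(dplyr)
library(ggplot2)

calibration <- test_preds %>%
  mutate(bin = ntile(pred_rv, 25)) %>%   # 25 equal-sized groups by predicted run value
  group_by(bin) %>%
  summarise(predicted = mean(pred_rv),
            actual    = mean(run_value),
            n         = n())

ggplot(calibration, aes(predicted, actual, size = n)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, colour = "blue")   # the ideal 1:1 line
```

The per-event plots below are built the same way, just binning each predicted probability instead of the predicted run value.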
We can also look at PitchingBot's predictions of specific events to assess whether the probabilities are accurate.
The graph above bins PitchingBot's predicted probabilities for specific events and compares them to the rates at which those events actually occurred. For each category, a horizontal line marks the average rate of that event, and a dashed line shows a 1:1 ratio of predictions to actual rates. For the most useful predictions, the red dots would sit along the dashed line while being as close to 0 or 1 as possible. It looks like PitchingBot is making accurate predictions across a large range of events, with particularly good predictions for swing %, called strike %, ball %, and contact %.
PitchingBot can struggle to predict batted ball events, rarely giving high likelihoods for any batted ball type. This is understandable, as there is a large amount of uncertainty about where the ball will go before it is hit, and PitchingBot doesn't even know whether the ball will be put in play. The model is most accurate and gives the highest likelihoods for groundballs, while line drives are almost never predicted with greater than 20% probability.
Uses & Limitations
PitchingBot can tell us predicted run values and event likelihoods for all pitches thrown in Major League Baseball, regardless of the specific players involved. This is incredibly useful and can tell us what makes a good pitch, along with who throws them. Having pitch-level expected stats gives us the ability to examine which pitchers, batters, and even catchers perform above expectations at the most granular level possible.
The effects of pitch sequencing, tunneling, and other deception techniques are not included in the model, which limits its predictive power. In addition, PitchingBot's assumptions are based on the performance of the average batter; in reality, batters are idiosyncratic, and throwing to a particular batter's weak spot is more valuable than PitchingBot would predict.
On this blog I aim to investigate the wealth of data provided by these pitch values and predicted results. Topics include:
- Who throws the best pitch of each type?
- What are some of the most dominant games pitched according to PitchingBot?
- Evaluating catcher framing and game calling
- Oddities - events which PitchingBot thought were extremely unlikely
- The predictive value of PitchingBot's predictions
- Pitchers who overperform expectations based on their raw pitch quality, and how they do it
Comments
Enjoying your analysis. I am curious about the qualitative score you assign to 'Command'. How do you arrive at the metric?
For different pitch types and the ball/strike counts, it's possible to work out the probability of different events happening when the ball is thrown in different areas of the zone. Run values can be given to these predicted events, e.g. a swinging strike has a lower run value than a hard hit flyball.
Pitchers with better "Command" grades throw the ball in areas that are likely to lead to lower run value events. But the command model doesn't know anything about the "Stuff" of the pitch; it purely judges based on location given the pitch type and count.
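Roughly, the calculation looks something like this (a simplified sketch using empirical location bins rather than the actual models, with hypothetical column names):

```r
library(dplyr)

# Expected run value of a location, given only pitch type and count
location_value <- pitches %>%
  mutate(loc_x = round(plate_x, 1), loc_z = round(plate_z, 1)) %>%
  group_by(pitch_type, balls, strikes, loc_x, loc_z) %>%
  summarise(expected_rv = mean(run_value), .groups = "drop")

# A pitcher's command grade reflects the average value of the locations they hit
command_grades <- pitches %>%
  mutate(loc_x = round(plate_x, 1), loc_z = round(plate_z, 1)) %>%
  left_join(location_value,
            by = c("pitch_type", "balls", "strikes", "loc_x", "loc_z")) %>%
  group_by(player_name) %>%
  summarise(command = -mean(expected_rv))   # fewer expected runs allowed = better
```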
I understand the response. I am confused though. To me a command score would be a qualitative assessment between intent and actual. If a pitcher intends to hit a specific part of the zone and his actual pitch hit that spot then he would have a higher command score. The score would depreciate coincident with the scale of missing that spot. I think your model assumes that a pitcher intends to hit a specific spot based on the model's probability of success. This is a topic that really intrigues me. If you are open to it... I would love to trade contact information so we could arrange to discuss it in more detail.
I agree, command is a bit of a misnomer. A more accurate term would be: "throws pitches in locations more likely to result in good outcomes".
Of course sometimes a pitcher has a valid reason for not aiming at the optimal location. Maybe they're setting up a pitch or exploiting a hitter's weakness. Unfortunately I cannot know where a pitcher is aiming so this is the best proxy I can come up with.
You can find my contact information on my website:
https://www.pitchingbot.com/contact/
Very cool, a clever use of xgboost for sure. I'd love to hear about the xgboost hyperparameter values you tuned. From experience, I've had great success using early stopping.