PitchingBot - An Overview

PitchingBot is a model I have made to evaluate pitch quality from the characteristics of the pitch alone. This post goes through the details of making and testing PitchingBot before giving some topic ideas for future posts which will use the model.

What is PitchingBot?

Motivation

Flash back to June 2020: coronavirus lockdowns are widespread and there is scarce hope for a 2020 baseball season. I'd been using plenty of data science and machine learning techniques in my work and was looking for a way to fill the baseball-shaped void in my life. I came up with the idea of measuring pitch quality using pitch characteristics alone. There is a wealth of public data on every pitch thrown in the major leagues thanks to Statcast, and I thought I could put it to good use.

This is not a unique idea; quality-of-pitch metrics have been made before. QOP is one example that uses linear regression on multiple variables including speed, location, and various parameters describing pitch break. To me, QOP felt like it had a little too much human influence in it. For example, the relative value of a pitch's location should be determined by the model, not used as an input. Additionally, the choice of variables seemed rather limited, as there was no contextual information on the pitch such as batter handedness or the count.

Ethan Moore wrote a great article on his own pitch evaluation model. I only saw this after I had made my initial version of PitchingBot, but it provides useful context on other public attempts at pitch evaluation. He used a k-nearest neighbors model to evaluate pitches, taking the mean of the results from the 100 closest pitches to the evaluated pitch. This is very similar to what I have done with PitchingBot; however, I have used a larger number of input variables and a different model algorithm.
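To make the nearest-neighbors idea concrete, here is a minimal Python sketch of "take the mean result of the k closest pitches". It is not Ethan Moore's code and not part of PitchingBot; the function name, the tuple-of-floats feature format and the use of plain Euclidean distance are all assumptions made for illustration.

```python
import math


def knn_mean_value(query, pitches, values, k=100):
    """Mean outcome of the k historical pitches closest to `query`.

    query   -- tuple of numeric features for the pitch being evaluated
    pitches -- list of feature tuples for historical pitches
    values  -- outcome (for example run value) of each historical pitch

    Assumption: features are already scaled to comparable units,
    otherwise the largest-valued feature dominates the distance.
    """
    if not pitches or len(pitches) != len(values):
        raise ValueError("need equal-length, non-empty pitches and values")
    # Euclidean distance from the query to every historical pitch.
    distances = [math.dist(query, p) for p in pitches]
    # Indices of the k nearest pitches (fewer if the sample is small).
    nearest = sorted(range(len(pitches)), key=distances.__getitem__)[:k]
    return sum(values[i] for i in nearest) / len(nearest)
```

A gradient-boosted tree model such as XGBoost replaces the explicit neighbor search with learned splits, which tends to scale better as more input variables are added.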

PitchingBot Description

An overview of my first iteration of PitchingBot can be found on the FanGraphs community blog. PitchingBot has since been upgraded to include more variables and to predict more outcomes.

PitchingBot is a machine learning model built in R using XGBoost. Machine learning can get pretty complicated and even I don't understand it most of the time, but for certain applications where large amounts of data are available, it can be a powerful tool. The inputs used by PitchingBot are:

Pitch Type
Pitch location as it crosses the plate
Vertical and horizontal movement
Velocity
Spin rate
Pitcher arm slot (release point x and z)
Pitcher handedness
Batter handedness
Count (balls and strikes)

PitchingBot is not just one model. I have made several variants to measure different aspects of pitching. The base model takes all the input data and predicts a run value for the pitch compared to the average pitch in that count. I have also created stuff and command models to measure these two aspects separately. The stuff model removes all information on the location of the pitch, while the command model removes inputs such as the pitch speed, spin rate, and movement.
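One way to picture the three run-value variants is as the same model trained on different subsets of the inputs. The Python sketch below is only an illustration: the column names are hypothetical, and where the release point and pitch type belong is a guess, since the text only says which inputs are removed "such as" speed, spin rate and movement.

```python
# Hypothetical column names; they are not taken from the PitchingBot source.
CONTEXT = ["pitch_type", "pitcher_hand", "batter_hand", "balls", "strikes"]
LOCATION = ["plate_x", "plate_z"]
# Assumption: release point is grouped with the physical "stuff" inputs.
PHYSICAL = ["velocity", "spin_rate", "move_x", "move_z", "release_x", "release_z"]

FEATURE_SETS = {
    "base": CONTEXT + LOCATION + PHYSICAL,  # everything
    "stuff": CONTEXT + PHYSICAL,            # no location information
    "command": CONTEXT + LOCATION,          # no speed, spin or movement
}


def select_features(pitch, model):
    """Return only the inputs that the named model variant is allowed to see."""
    return {name: pitch[name] for name in FEATURE_SETS[model]}
```

Training each variant on its own feature subset means the stuff model cannot reward a well-located pitch and the command model cannot reward raw velocity or movement.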

Finally, there is a classification model which predicts the likelihood of different events for a particular pitch. The events predicted are:

Swing
Swinging strike
Called strike
Ball
Foul ball
Ball in play
Contact
Groundball
Line drive
Flyball

Predicting different events gives us more to work with than just the run value of a pitch. It can also allow us to do more in-depth analysis into what the model thinks of certain pitchers.

How was PitchingBot made?

To train PitchingBot, I used all the data from the Statcast era (2015-2020). The baseballr package was exceedingly helpful for scraping the millions of pitches thrown during this time. I threw away pitches that had incomplete tracking data, which reduced the size of the dataset by around 10%. Linear weights were used to find the run value for different events, including changes in the count and batted ball events.
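As a rough illustration of how a run value can be attached to a single pitch, the Python sketch below scores a pitch as the value of where it ended up minus the value of the count it was thrown in. This is one common formulation and an assumption about the details, not a description of the exact PitchingBot calculation. The numbers in the two tables are invented placeholders, not real linear weights.

```python
# PLACEHOLDER values for illustration only. Real tables would come from
# linear weights fitted to play-by-play data.
TOY_COUNT_VALUE = {(0, 0): 0.0, (0, 1): -0.04, (1, 0): 0.03}
TOY_EVENT_VALUE = {"single": 0.45, "out_in_play": -0.25}


def pitch_run_value(count_before, count_value, event_value,
                    count_after=None, event=None):
    """Run value of one pitch relative to the count it was thrown in.

    count_before / count_after -- (balls, strikes) tuples
    event -- name of a plate-appearance-ending event, if there was one
    """
    start = count_value[count_before]
    if event is not None:
        # The plate appearance ended: use the event's run value.
        return event_value[event] - start
    if count_after is None:
        raise ValueError("give either count_after or event")
    # The plate appearance continues: use the change in count value.
    return count_value[count_after] - start
```

With the toy tables, a called strike on 0-0 gives `pitch_run_value((0, 0), TOY_COUNT_VALUE, TOY_EVENT_VALUE, count_after=(0, 1))`, a negative number, which is good for the pitcher.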

I split the data randomly into two sets: 80% for training and 20% for testing. The model was only allowed to see the training data while it learned how to predict pitches, and it was then evaluated on the test data. This approach guards against overfitting, which would reduce the model's predictive power. Several parameters of the XGBoost model were tuned to produce the smallest error in the test set. The same procedure was used for all four models that make up PitchingBot.
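For readers who have not seen a random train/test split before, here is a minimal Python sketch of the idea using only the standard library. The function name and the fixed seed are illustrative choices; the actual PitchingBot split was done in R and may differ in detail.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Randomly split `rows` into (train, test) lists.

    A fixed seed makes the split reproducible between runs.
    """
    rng = random.Random(seed)
    shuffled = list(rows)   # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Every pitch lands in exactly one of the two sets, so the test error is measured on pitches the model never saw during training.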

Testing PitchingBot

Baseball is a sport that is notoriously random. The results of individual MLB games aren't too different from coin flips for all but the most lopsided matchups. This means that any attempt to predict the outcomes of pitches will have large error bars.

To see if PitchingBot is working correctly, we can look at the accuracy of the predictions, and whether they line up with our expectations of what makes a good pitch.

In my initial investigations, I found that PitchingBot's predictions agreed with conventional knowledge on what makes a good pitch. PitchingBot thinks it is best to throw pitches in the corners of the zone with high velocity, movement, and spin rate. The pitchers with the best results in 2020 as predicted by PitchingBot were Gerrit Cole, Zac Gallen, Jacob deGrom, Trevor Bauer, and Blake Snell, which agrees with our preconceptions of which pitchers throw the best pitches.

PitchingBot's predictions of pitch value agree with subjective views on pitch value. Here it is best to throw a 3-2 fastball in the corners of the zone.

We can look at the run values predicted by PitchingBot vs their actual results in the following graph:

The pitches are grouped by their predicted run value and then compared to the mean run value that resulted. The blue line shows a 1:1 ratio, which ideally PitchingBot would follow as closely as possible. The size of each dot represents the number of pitches in that group. We can see that for pitches with above-average run value (bad pitches), PitchingBot is pretty accurate and the dots follow the line closely. Meanwhile, for pitches with negative run value (good pitches), PitchingBot is hedging its bets slightly and doesn't think they are as good as they really are. This is an interesting finding, and a future blog post will certainly look at where this effect comes from, since fixing it should improve the model.
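The grouping behind this kind of plot can be sketched in a few lines of Python. This is an illustration of the general technique rather than the code used to draw the graph; the function name and the bin width are assumptions.

```python
import math
from collections import defaultdict


def calibration_table(predicted, actual, bin_width=0.01):
    """Group pitches by predicted run value and average the actual results.

    Returns one (mean_predicted, mean_actual, n_pitches) row per bin,
    ordered from the lowest predicted value to the highest.
    """
    # bin index -> [sum of predictions, sum of actual values, pitch count]
    bins = defaultdict(lambda: [0.0, 0.0, 0])
    for p, a in zip(predicted, actual):
        cell = bins[math.floor(p / bin_width)]
        cell[0] += p
        cell[1] += a
        cell[2] += 1
    return [
        (pred_sum / n, act_sum / n, n)
        for _, (pred_sum, act_sum, n) in sorted(bins.items())
    ]
```

Plotting `mean_actual` against `mean_predicted`, with the dot size taken from `n_pitches`, gives a chart of the kind described: a well-calibrated model has its dots on the 1:1 line.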

We can also look at PitchingBot's predictions of specific events to assess whether the probabilities are accurate.

The above graph groups PitchingBot's predicted probabilities for specific events and compares them to the rates at which those events actually occurred. For each category, a horizontal line shows the average rate of that event and a dashed line shows a 1:1 ratio of predicted to actual rates. For the most useful predictions, the red dots would be concentrated along the dashed line while being as close to 0 or 1 as possible. It looks like PitchingBot is making accurate predictions for a large range of events, with particularly good predictions for swing %, called strike %, ball %, and contact %.

PitchingBot can struggle to predict batted ball events, rarely giving high likelihoods for any batted ball type. This is understandable as there is a large amount of uncertainty about where the ball will go before it is hit and PitchingBot doesn't even know if the ball will be hit into play. The model is most accurate and gives the highest likelihoods to groundballs, while line drives are almost never predicted with greater than 20% probability.

Uses & Limitations

PitchingBot can tell us predicted run values and event likelihoods for all pitches thrown in Major League Baseball, regardless of the specific players involved. This is incredibly useful and can tell us what makes a good pitch, along with who throws them. Having pitch-level expected stats gives us the ability to examine which pitchers, batters, and even catchers perform above expectations at the most granular level possible.
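As a sketch of what "performing above expectations" could look like in practice, the Python below sums actual minus predicted run value per player. The tuple layout and function name are hypothetical, and this is only one simple way to aggregate pitch-level expected stats.

```python
from collections import defaultdict


def runs_vs_expected(pitches):
    """Total (actual - predicted) run value for each player.

    pitches -- iterable of (player, predicted_run_value, actual_run_value)

    For a pitcher, a negative total means fewer runs were allowed than
    the pitch characteristics alone predicted.
    """
    totals = defaultdict(float)
    for player, predicted, actual in pitches:
        totals[player] += actual - predicted
    return dict(totals)
```

Swapping the grouping key from pitcher to batter or catcher gives the same comparison for the other players involved in each pitch.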

The effects of pitch sequencing, tunneling, and other deception techniques are not included in the model, which limits its predictive power. In addition, PitchingBot's assumptions are based on the performance of the average batter. In reality, batters are idiosyncratic, and throwing to the area where a particular batter is weak is better than PitchingBot would predict.

On this blog I aim to investigate the wealth of data provided by these pitch values and predicted results. Topics include:

Who throws the best pitch of each type?
What are some of the most dominant games pitched according to PitchingBot?
Evaluating catcher framing and game calling
Oddities - events which PitchingBot thought were extremely unlikely
The predictive value of PitchingBot's predictions
Pitchers who overperform expectations based on their raw pitch quality, and how they do it

I hope this clears up any questions people may have about how the model works, but get in touch if you have any more! @Pitching_Bot on Twitter.


Scott, January 2, 2022 at 10:01 AM
Enjoying your analysis. I am curious about the qualitative score you assign to 'Command'. How do you arrive at the metric?
Scott, January 2, 2022 at 2:17 PM
I understand the response. I am confused though. To me a command score would be a qualitative assessment between intent and actual. If a pitcher intends to hit a specific part of the zone and his actual pitch hit that spot then he would have a higher command score. The score would depreciate coincident with the scale of missing that spot. I think your model assumes that a pitcher intends to hit a specific spot based on the model's probability of success. This is a topic that really intrigues me. If you are open to it...I would love to trade contact information so we could arrange to discuss it in more detail.
Jason Brownlee, May 22, 2024 at 1:21 PM
Very cool, a clever use of xgboost for sure. I'd love to hear about the xgboost hyperparameter values you tuned. From experience, I've had great success using early stopping.
