A PitchingBot Overhaul

This is a post to describe various updates to my pitch quality models.

For the post describing the initial models see here.

For a link to the app where you can explore the model ratings click here.

Why Change?

There have been many developments in public pitch evaluation models since I initially created mine. Max Bay and Eno Sarris' Pitching+, among others, has entered the conversation about how we rate good pitches.

These developments have helped me to realise some of the shortcomings of my models and have given me ideas on how to improve their accuracy.

What's changed?

These are the main updates which I have added to my models:

Added fastball velocity and movement to the input variables
More ball in play predictions, including exit velocity in addition to batted ball type. This allows me to estimate HR% for pitchers too
Each prediction model now has more finely tuned hyperparameters than before
Stuff models now only predict events which occur on swings, because otherwise I would have to infer swing% and zone% from stuff alone
On the app display I've added percentile sliders for various rate statistics, reminiscent of those on the BaseballSavant website

If you aren't interested in all the details then you can stop reading here. Otherwise, the following sections give more information on each change.

Fastball Quality Input Variables

My previous pitch evaluation models treated every pitch individually, so there was no way for the model to know who was throwing the pitch. This removed important context which can improve the models significantly.

If a pitcher has a good fastball then it improves their other pitches accordingly. This is because the better the fastball is, the less time that hitters have to react to breaking balls and offspeed pitches. The graphs below show this effect in my old models. They were underrating the non-fastball pitches of pitchers with good fastballs.

A new set of input variables, giving the velocity and movement differences between each pitch and the pitcher's average fastball in that appearance, has corrected this underrating.
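As a rough sketch of how such difference features could be computed (this is not the actual PitchingBot code), the snippet below averages a pitcher's fastballs within one appearance and subtracts that average from every pitch. The field names (`pitcher`, `game`, `pitch_type`, `velo`, `hmov`, `vmov`) and the choice of which pitch types count as fastballs are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Assumption: four-seamers and sinkers are treated as "the fastball".
FASTBALL_TYPES = {"FF", "SI"}


def add_fastball_diffs(pitches):
    """Return copies of the pitch dicts with differences from the fastball average.

    Averages are taken per (pitcher, game), i.e. per appearance.
    """
    fastballs_by_appearance = defaultdict(list)
    for p in pitches:
        if p["pitch_type"] in FASTBALL_TYPES:
            fastballs_by_appearance[(p["pitcher"], p["game"])].append(p)

    out = []
    for p in pitches:
        fastballs = fastballs_by_appearance.get((p["pitcher"], p["game"]))
        row = dict(p)
        for key in ("velo", "hmov", "vmov"):
            if fastballs:
                row[key + "_diff"] = p[key] - mean(f[key] for f in fastballs)
            else:
                # No fastball thrown in this appearance, so no reference point.
                row[key + "_diff"] = None
        out.append(row)
    return out
```

A changeup thrown at 85 mph by a pitcher whose fastballs averaged 95 mph in that game would get a `velo_diff` of -10, which the model can then use alongside the raw pitch characteristics.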

Here is a graph which shows the importance of the relationship between a fastball and an offspeed pitch. Changeup whiff rate is heavily dependent on both the vertical movement difference and the velocity difference to a pitcher's fastball.

Ball in Play Predictions

My old models predicted whether a ball in play would be a groundball, line drive, or flyball, but there's clearly a large difference between a weakly hit ball and a hard hit ball. Weakly hit flyballs are awful for the hitter, while hard hit flyballs are the best batted balls by run value. Pitchers only have limited control over exit velocity, which is more affected by a hitter's power. However, pitchers do have some effect, and it becomes visible over a large sample size. I aimed to include predictions of how hard a ball would be hit in order to improve my evaluation of balls in play.

I modified my ball in play prediction model to assign probabilities to 15 different events. These are the three main batted ball types (groundball, line drive and flyball), each split into 5 exit velocity bins with edges at 90 mph, 95 mph, 100 mph and 105 mph. The frequency and run value of these batted ball types are shown below. The colour of the dots represents the run value and the size represents the relative frequency.

Hard-hit flyballs are most valuable but also the rarest batted ball type. These exit velocity bins are rather arbitrary and if I find a much more useful way to bin them then I'll switch to using that instead.
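To make the 15 events concrete, here is a minimal sketch of how a batted ball could be mapped to one of them. The bin edges are the ones given above; the ordering of the classes and the handling of a ball hit exactly on an edge (it goes into the higher bin here) are my assumptions for this example.

```python
from bisect import bisect_right

EV_EDGES = [90, 95, 100, 105]  # mph, the four edges that create five bins
BB_TYPES = ["groundball", "line drive", "flyball"]


def ev_bin(exit_velocity):
    """Return the exit velocity bin, 1 (softest) to 5 (hardest)."""
    # Assumption: a ball hit exactly on an edge belongs to the higher bin.
    return bisect_right(EV_EDGES, exit_velocity) + 1


def event_class(bb_type, exit_velocity):
    """Map a batted ball to a class index from 0 to 14."""
    return BB_TYPES.index(bb_type) * 5 + (ev_bin(exit_velocity) - 1)
```

With this layout, classes 0 to 4 are groundballs from softest to hardest, 5 to 9 are line drives, and 10 to 14 are flyballs, so a 106 mph flyball is class 14.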

The graphs below show that the probabilities produced by the models are representative of the actual probability of a particular batted ball event happening. HH1 through HH5 in the second set of graphs represent my exit velocity bins from low to high.
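A check of this kind can be sketched as follows: group the predictions for one event into probability bins, then compare the average predicted probability in each bin with how often the event actually happened. This is a generic calibration check written for illustration, and the equal-width binning and function name are assumptions rather than a description of how the graphs were made.

```python
def calibration_table(predicted, observed, n_bins=10):
    """Compare predicted probabilities with observed frequencies.

    predicted: probabilities in [0, 1] for one event class
    observed:  0/1 flags for whether that event happened
    Returns (mean_predicted, observed_rate, count) for each non-empty bin.
    """
    totals = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        totals[i][0] += p
        totals[i][1] += y
        totals[i][2] += 1
    return [(t[0] / t[2], t[1] / t[2], t[2]) for t in totals if t[2]]
```

For a well calibrated model the first two numbers in each row should be close to each other, which corresponds to points lying near the diagonal on a calibration plot.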

Percentile Sliders

A useful feature of the website BaseballSavant is the box at the top-right of every player page with a set of sliders. These quickly show the relative ability of the chosen player in a variety of stats. I decided to add my own version of these sliders to show how players perform in my models relative to other players.

All stats should be reasonably self-explanatory, and higher is always better in these ranks. If you are unsure about a stat, a simple Google search of its name without the preceding "x" should give the definition.

It is important to note that these sliders show the percentile rank of a pitcher in each stat, not the value of the stat itself!
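To illustrate the difference between a stat and its percentile rank, here is one simple way such ranks could be computed. The exact formula is an assumption (the share of other pitchers that a pitcher beats, with ties counted as half), and the `higher_is_better` flag is a hypothetical parameter that flips the rank for stats where a lower value is good, so that a higher rank is always better.

```python
def percentile_ranks(values, higher_is_better=True):
    """Percentile rank (0 to 100) of each value within the group."""
    n = len(values)
    ranks = []
    for v in values:
        below = sum(1 for x in values if x < v)
        ties = sum(1 for x in values if x == v) - 1  # exclude the value itself
        pct = 100 * (below + 0.5 * ties) / (n - 1) if n > 1 else 50.0
        ranks.append(pct if higher_is_better else 100 - pct)
    return ranks
```

For example, three pitchers with hard contact rates of 0.20, 0.30 and 0.25 would get ranks of 100, 0 and 50 when `higher_is_better=False`, since allowing less hard contact is the better outcome.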

xHardContact% is defined as the predicted fraction of batted balls hit harder than 95 mph.
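Given the 15 event probabilities described earlier, this stat could be computed along the following lines. This is a sketch under assumptions: the probabilities are ordered as groundball, line drive, then flyball with five exit velocity bins each, they are conditional on the ball being put in play, and every pitch is weighted equally.

```python
N_EV_BINS = 5
# Zero-based positions of the three exit velocity bins above 95 mph.
HARD_BINS = (2, 3, 4)


def x_hard_contact(prob_rows):
    """Average predicted probability that a batted ball is hit above 95 mph.

    prob_rows: one list of 15 class probabilities per pitch, in the assumed
    order [GB bins 1-5, LD bins 1-5, FB bins 1-5].
    """
    per_pitch = [
        sum(row[t * N_EV_BINS + b] for t in range(3) for b in HARD_BINS)
        for row in prob_rows
    ]
    return sum(per_pitch) / len(per_pitch)
```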

xHR% is the predicted home run rate, found by using the predicted rates of the different types of balls in play defined above.
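One plausible way to turn those predicted rates into a home run rate is to weight each batted ball class by how often balls of that class leave the park. The sketch below assumes a list of 15 per-class home run rates is available (for example, measured from league data); that list and the function name are hypothetical and are not taken from the actual model.

```python
def x_hr_rate(class_probs, hr_rate_by_class):
    """Expected home runs per batted ball.

    class_probs:      predicted probability of each batted ball class
    hr_rate_by_class: fraction of batted balls in each class that are home
                      runs (assumed to be supplied by the caller)
    """
    if len(class_probs) != len(hr_rate_by_class):
        raise ValueError("need one home run rate per class")
    return sum(p * hr for p, hr in zip(class_probs, hr_rate_by_class))
```

Averaging this value over all of a pitcher's pitches would then give a pitcher-level estimate in the same way as the other rate statistics.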

As an example here's Corbin Burnes, who's great at everything!

Summary

A web app containing pitcher ratings using this method can be found here. If you have any questions feel free to send me a message on Twitter, put a comment below, or find my email address on my website.


Unknown, February 27, 2022 at 5:03 PM
I would like to generate your statistics myself using R and XGBoost, how did you code it?
Anonymous, April 13, 2022 at 9:02 PM
could you add expected flyball%? (xFB%)
Anonymous, May 1, 2022 at 8:01 PM
Is this similar to the blob app that driveline baseball uses? Very nifty!

Ahead in the count
