
PitchingBot: Now With Seam-Shifted Wake

Introduction

This is the final update that I'll be making to my pitch quality model. The main change is the inclusion of spin/movement axis differences, the absence of which was a clear weakness of my previous work. In addition, I've reworked the structure of the underlying models.

These changes in approach have limited me to data from 2020 onwards, but they have improved accuracy in almost every model prediction.

This model update will only be used on my main website; other apps will continue to use the old version. Pitching grades from before 2020 will be grandfathered in using the old model.

For more information on older versions of the model, see these blog posts: [1], [2]

Spin Axis and Efficiency

On a pitch-by-pitch basis, the observed spin axis and movement axis from Statcast data can be compared to produce the axis difference metric, which is commonly attributed to Seam-Shifted Wake (SSW). Increased SSW is generally correlated with improved performance, as seen in the graph below, which used the old version of my model.


Spin efficiency is estimated from total movement and spin rate; it is not a direct measurement. High movement with a low spin rate implies high spin efficiency, and vice versa.
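Both derived features above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PitchingBot's actual code: the movement-inferred axis is assumed to have been computed elsewhere in the same degree convention as the observed spin axis, and `INCHES_PER_RPM_AT_FULL_EFFICIENCY` is a hypothetical calibration constant rather than a value from this post.

```python
import math

# Hypothetical calibration constant (NOT from the original post): inches of
# total movement a fully efficient pitch is assumed to gain per rpm of spin.
INCHES_PER_RPM_AT_FULL_EFFICIENCY = 0.009


def axis_difference(observed_axis_deg: float, inferred_axis_deg: float) -> float:
    """Signed gap between the observed spin axis and the movement-inferred axis.

    Both angles are assumed to be in degrees under the same convention.
    The result is wrapped into [-180, 180) so that axes on either side of
    0/360 degrees still compare correctly.
    """
    diff = observed_axis_deg - inferred_axis_deg
    return (diff + 180.0) % 360.0 - 180.0


def estimated_spin_efficiency(
    horizontal_break_in: float, vertical_break_in: float, spin_rate_rpm: float
) -> float:
    """Rough spin-efficiency proxy: observed total movement divided by the
    movement assumed for 100% efficiency at this spin rate, clipped to [0, 1].

    It ignores velocity and other physics; it only encodes the idea that
    more movement per rpm implies higher efficiency.
    """
    if spin_rate_rpm <= 0:
        raise ValueError("spin rate must be positive")
    total_movement = math.hypot(horizontal_break_in, vertical_break_in)
    expected_at_full = INCHES_PER_RPM_AT_FULL_EFFICIENCY * spin_rate_rpm
    return max(0.0, min(1.0, total_movement / expected_at_full))


# Axes straddling 0/360 degrees still give a small difference.
print(axis_difference(5.0, 355.0))  # 10.0
# Same movement with a lower spin rate gives a higher estimated efficiency.
print(estimated_spin_efficiency(-15.0, 8.0, 2100.0))
print(estimated_spin_efficiency(-15.0, 8.0, 2400.0))
```

Keeping the difference signed preserves the direction of the deviation; which sign corresponds to which SSW effect depends on the axis convention chosen.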

To see why these aspects are important, let's look at which combinations of spin efficiency and axis difference the models think make a good sinker (any variation of the following graphs can be made using the "Stuff Plots" tab on my website). I'll use Shohei Ohtani's new sinker as a reference.





With a high whiff rate, low hard-hit rate, and high groundball rate, Ohtani's new sinker looks to be in the perfect place on this diagram!
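Diagrams like these come from holding every other characteristic of a pitch fixed and sweeping two features across a grid. The sketch below assumes a model exposed as a plain `predict(pitch)` function; `toy_whiff_model` and the values in `base_pitch` are made-up placeholders, not PitchingBot's model or Ohtani's actual pitch data.

```python
from itertools import product


def toy_whiff_model(pitch: dict) -> float:
    """Placeholder model for illustration only; NOT PitchingBot."""
    return (
        0.15
        + 0.05 * (1.0 - pitch["spin_efficiency"])
        + 0.001 * abs(pitch["axis_difference"])
    )


def sweep(predict, base_pitch: dict, efficiencies, axis_differences) -> dict:
    """Predict for every (efficiency, axis difference) pair while keeping all
    other characteristics of the base pitch unchanged."""
    results = {}
    for eff, diff in product(efficiencies, axis_differences):
        # Copy the base pitch and override only the two swept features.
        pitch = dict(base_pitch, spin_efficiency=eff, axis_difference=diff)
        results[(eff, diff)] = predict(pitch)
    return results


# Hypothetical baseline pitch; a real model would take many more features.
base_pitch = {"velocity_mph": 96.0, "spin_rate_rpm": 2200.0}
grid = sweep(toy_whiff_model, base_pitch, [0.5, 0.7, 0.9], [-30.0, 0.0, 30.0])
best = max(grid, key=grid.get)
print(len(grid), best)
```

Each panel (whiff, hard-hit, and groundball rate) would correspond to a different event model swept over the same grid.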

Model Design

I've now split the models into three major categories:
  • Fastball (Four-seam, sinkers, primary cutters)
  • Breaking Ball (Slider, curveball, secondary cutters)
  • Offspeed (Changeups and splitters)
Separate models are trained for these general pitch categories for each event prediction.
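A minimal sketch of this routing, assuming Statcast `pitch_type` codes. The post doesn't say how a cutter is judged primary or secondary, so that decision is left to the caller as the hypothetical `cutter_is_primary` flag, and the event list is only an illustrative subset of the predictions in the accuracy tables.

```python
FASTBALL_TYPES = {"FF", "SI"}        # four-seam, sinker
BREAKING_TYPES = {"SL", "CU", "KC"}  # slider, curveball, knuckle curve
OFFSPEED_TYPES = {"CH", "FS"}        # changeup, splitter


def pitch_category(pitch_type: str, cutter_is_primary: bool = False) -> str:
    """Map a Statcast pitch_type code to one of the three model categories."""
    if pitch_type == "FC":
        # Cutters are split by role: primary cutters are treated as fastballs.
        return "fastball" if cutter_is_primary else "breaking"
    if pitch_type in FASTBALL_TYPES:
        return "fastball"
    if pitch_type in BREAKING_TYPES:
        return "breaking"
    if pitch_type in OFFSPEED_TYPES:
        return "offspeed"
    raise ValueError(f"unhandled pitch type: {pitch_type}")


# One model per (category, event) pair; event names are an illustrative subset.
CATEGORIES = ("fastball", "breaking", "offspeed")
EVENTS = ("swing", "whiff", "groundball")
MODEL_KEYS = [(category, event) for category in CATEGORIES for event in EVENTS]

print(pitch_category("FC", cutter_is_primary=True), pitch_category("FC"))
```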

The Stuff models are split in the same way. They only attempt to predict what happens on swing events, because I did not want the Stuff model to become a proxy for zone rate or called-strike rate on non-swing events.
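Restricting the Stuff training data to swings might look like the sketch below. The set of Statcast `description` strings counted as swings is an assumption (bunts are ignored here), and pitches are plain dictionaries to keep the example dependency-free.

```python
# Assumed set of Statcast `description` values that represent a swing.
SWING_DESCRIPTIONS = {
    "swinging_strike",
    "swinging_strike_blocked",
    "foul",
    "foul_tip",
    "hit_into_play",
}


def swing_events(pitches: list[dict]) -> list[dict]:
    """Keep only pitches the batter swung at.

    Takes (balls, called strikes, hit-by-pitches) are dropped, so a model fit
    on this subset never sees zone-rate or called-strike outcomes.
    """
    return [p for p in pitches if p["description"] in SWING_DESCRIPTIONS]


sample = [
    {"pitch_type": "SI", "description": "called_strike"},
    {"pitch_type": "SI", "description": "swinging_strike"},
    {"pitch_type": "SL", "description": "ball"},
    {"pitch_type": "SL", "description": "hit_into_play"},
]
print([p["description"] for p in swing_events(sample)])
# ['swinging_strike', 'hit_into_play']
```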

The command model has been left as-is for now.

In this update, I allowed much more time for hyperparameter tuning of the XGBoost models. A much lower minimum learning rate was used, which led to improved performance at the expense of not being able to use my laptop for a few days while finding the ideal set-up for each model!
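The cost of a lower minimum learning rate comes from boosting needing more rounds when each step is smaller. The sketch below builds such a search grid; the values are hypothetical, and keeping `learning_rate * n_estimators` roughly constant is a rule of thumb assumed here, not the procedure from this post. The keys match parameter names in XGBoost's scikit-learn wrapper, so each dict could be passed as keyword arguments to, for example, `xgboost.XGBClassifier`.

```python
from itertools import product

# Hypothetical search space: the post does not list the actual values.
LEARNING_RATES = [0.3, 0.1, 0.03, 0.01]  # 0.01 plays the role of the lower minimum
MAX_DEPTHS = [4, 6, 8]
LR_TIMES_ROUNDS = 30.0                   # assumed: keep learning_rate * rounds roughly constant


def build_grid(learning_rates, max_depths, lr_times_rounds=LR_TIMES_ROUNDS):
    """Return candidate hyperparameter dicts, giving smaller learning rates
    proportionally more boosting rounds."""
    grid = []
    for lr, depth in product(learning_rates, max_depths):
        grid.append({
            "learning_rate": lr,
            "max_depth": depth,
            "n_estimators": round(lr_times_rounds / lr),
        })
    return grid


grid = build_grid(LEARNING_RATES, MAX_DEPTHS)
total_rounds = sum(cfg["n_estimators"] for cfg in grid)
print(len(grid), total_rounds)  # 12 13200
```

In this toy grid, lowering the minimum learning rate from 0.1 to 0.01 raises the total number of boosting rounds from 1200 to 13200, which gives a sense of why a thorough search can take days.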

Model Accuracy

This new model design has improved accuracy in almost every prediction, with HBP% the only exception. The tables below show the change on 2022 data, which was unseen by both the old model and the new one. Part of the accuracy increase could be because the new model is better adjusted to the modern league baseline, as it hasn't been trained on older data. However, the improved R² values imply that the model has improved beyond just a better baseline.

RMSE (lower is better)

Statistic   Old Model   New Model   Difference (New - Old)
Swing%      0.3878      0.3841      -0.0038
SwStr%      0.3127      0.3115      -0.0011
GB%         0.4713      0.4699      -0.0015
LD%         0.4254      0.4248      -0.0006
FB%         0.4474      0.4457      -0.0017
HardHit%    0.4613      0.4581      -0.0032
Whiff%      0.3902      0.3885      -0.0017
Foul%       0.4876      0.4867      -0.0009
Ball%       0.2236      0.2210      -0.0027
CS%         0.2174      0.2122      -0.0052
HBP%        0.0612      0.0618       0.0006
xRV         0.1382      0.1381      -0.0001

R² (higher is better)

Statistic   Old Model   New Model   Difference (New - Old)
Swing%      0.3971      0.4088       0.0117
SwStr%      0.0856      0.0930       0.0073
GB%         0.0964      0.1016       0.0052
LD%         0.0034      0.0044       0.0009
FB%         0.0954      0.0996       0.0042
HardHit%    0.1047      0.1157       0.0110
Whiff%      0.1994      0.2059       0.0065
Foul%       0.0496      0.0527       0.0031
Ball%       0.7708      0.7754       0.0046
CS%         0.7802      0.7911       0.0109
HBP%        0.3071      0.3047      -0.0025
xRV         0.0801      0.0831       0.0030
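For readers who want to reproduce this style of comparison, here is a sketch of both metrics computed on per-pitch outcomes (1 if the event happened, 0 otherwise) against predicted probabilities. The outcomes and predictions are made up. Differences are taken as new minus old, matching the tables, so a negative RMSE difference and a positive R² difference both mean the new model did better.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error of predicted probabilities
    (the square root of the Brier score for 0/1 outcomes)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up example: did the batter swing (1) or not (0)?
outcomes = [1, 0, 0, 1]
old_preds = [0.6, 0.3, 0.2, 0.5]
new_preds = [0.7, 0.2, 0.2, 0.6]

rmse_diff = rmse(outcomes, new_preds) - rmse(outcomes, old_preds)
r2_diff = r_squared(outcomes, new_preds) - r_squared(outcomes, old_preds)
print(round(rmse_diff, 4), round(r2_diff, 4))
```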

Notable Grade Changes

With a reasonably large change in the underlying models, some pitchers and pitches will now have different evaluations. Here are some of the largest changes:


And in Stuff:



Closing Remarks

I'll be leaving my models and baseball analysis alone for a while after this update. I'm starting to write my thesis, so I anticipate having less free time this autumn.

The work I do after graduating may limit my capacity to maintain PitchingBot, but I'm committed to finding it a new, publicly available home if necessary.
