PitchingBot: Now With Seam-Shifted Wake

Introduction

This is the final update that I'll be making to my pitch quality model. The main change is the inclusion of spin/movement axis differences, the absence of which was a clear weakness of my previous work. In addition, I've reworked the format of the underlying models.

These changes in approach limit me to data from 2020 onwards, but almost every model prediction is now more accurate.

This model update will only be used on my main website; other apps will continue to use the old version. Pitching grades from before 2020 will be grandfathered in using the old model.

For more information on older versions of the model, see these blog posts: [1], [2]

Spin Axis and Efficiency

On a pitch-by-pitch basis, the observed spin and movement axis from Statcast data can be compared to produce the axis difference metric, commonly attributed to Seam-Shifted Wake (SSW). Increased SSW is generally correlated with improved performance as seen in the graph below, which used the old version of my model.

An important aspect to pitch quality which is not in my models yet: Seam-Shifted Wake.

These graphs for sinkers and four-seam fastballs show that pitches with more SSW outperform their expected run value by more than those with less SSW. pic.twitter.com/e4i3ldKriY
— Cameron Grove (@Pitching_Bot) January 18, 2022
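The axis difference described above can be sketched as a wrapped angular subtraction. This is a minimal illustration, not the model's actual code: it assumes both axes are already expressed in degrees on the same 0-360 scale, and the function name and sign convention are hypothetical.

```python
def axis_difference(spin_axis_deg: float, movement_axis_deg: float) -> float:
    """Signed smallest angle, in degrees, from the spin axis to the movement axis.

    The result lies in [-180, 180). Which direction counts as positive is an
    assumption made for this sketch.
    """
    return (movement_axis_deg - spin_axis_deg + 180.0) % 360.0 - 180.0


print(axis_difference(210.0, 225.0))  # 15.0
print(axis_difference(350.0, 10.0))   # 20.0 (wraps across the 0/360 boundary)
```

The modulo step matters because axes near 0 and 360 degrees are close together; a plain subtraction would report a difference of 340 degrees in the second example.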

Spin efficiency is not a direct measurement; it is estimated from total movement and spin rate. High movement with a low spin rate implies a high spin efficiency, and vice versa.
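To make the direction of that relationship concrete, here is a rough proxy, not the estimate used in the model. It assumes movement is given in inches and spin rate in rpm. The scaling constant `max_inches_per_rpm` is a hypothetical placeholder rather than a calibrated value, and the sketch ignores velocity and other factors that a real estimate would need.

```python
import math


def spin_efficiency_proxy(
    pfx_x_in: float,
    pfx_z_in: float,
    spin_rate_rpm: float,
    max_inches_per_rpm: float = 0.01,  # hypothetical scale, not calibrated
) -> float:
    """Crude 0-1 proxy: movement per unit of spin, capped at 1."""
    if spin_rate_rpm <= 0:
        raise ValueError("spin_rate_rpm must be positive")
    total_movement = math.hypot(pfx_x_in, pfx_z_in)  # combined break in inches
    ratio = total_movement / spin_rate_rpm
    return min(1.0, ratio / max_inches_per_rpm)


# Same movement, different spin rates: the lower-spin pitch scores higher.
low_spin = spin_efficiency_proxy(6.0, 8.0, 1800.0)
high_spin = spin_efficiency_proxy(6.0, 8.0, 2600.0)
print(low_spin > high_spin)
```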

To see why these aspects are important, let's look at what the models think makes a good combination of spin efficiency and axis difference on a sinker. (Any variation of the following graphs can be made using the "Stuff Plots" tab on my website.) I'll use Shohei Ohtani's new sinker as a reference.

High whiff rate, low hard-hit rate, and high groundball rate: it looks like Ohtani's new sinker is in the perfect place on this diagram!

Model Design

I've now split the models into three major categories:

Fastball (four-seamers, sinkers, primary cutters)
Breaking Ball (sliders, curveballs, secondary cutters)
Offspeed (changeups, splitters)

Separate models are trained for these general pitch categories for each event prediction.
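The split above can be sketched as a lookup from pitch type to category, with one model slot per category and event. This is an illustration under assumptions: the two-letter codes are Statcast-style labels for the pitch types listed, the `cutter_is_primary` flag stands in for however a cutter is judged primary or secondary (the rule is not given here), and the event names are an illustrative subset.

```python
from itertools import product

FASTBALL_TYPES = {"FF", "SI"}  # four-seamer, sinker
BREAKING_TYPES = {"SL", "CU"}  # slider, curveball
OFFSPEED_TYPES = {"CH", "FS"}  # changeup, splitter
CUTTER = "FC"


def pitch_category(pitch_type: str, cutter_is_primary: bool = False) -> str:
    """Map a pitch type code to one of the three model categories."""
    if pitch_type == CUTTER:
        # Hypothetical flag: the caller decides whether the cutter is primary.
        return "fastball" if cutter_is_primary else "breaking"
    if pitch_type in FASTBALL_TYPES:
        return "fastball"
    if pitch_type in BREAKING_TYPES:
        return "breaking"
    if pitch_type in OFFSPEED_TYPES:
        return "offspeed"
    raise ValueError(f"unrecognised pitch type: {pitch_type}")


CATEGORIES = ["fastball", "breaking", "offspeed"]
EVENTS = ["swing", "whiff", "ground_ball"]  # illustrative subset of events

# One model would be trained per (category, event) pair.
MODEL_KEYS = list(product(CATEGORIES, EVENTS))
print(len(MODEL_KEYS))  # 9
```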

The Stuff models are also split in this way. They only attempt to predict what happens on a swing event, because I did not want them to act as a proxy for zone rate or called-strike rate on non-swing events.
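Restricting the Stuff training data to swings might look like the filter below. It is a sketch only: the `description` field and the set of values treated as swings follow Statcast-style pitch descriptions and are assumptions, not the exact filter used for the models.

```python
# Pitch descriptions treated as swings (assumed, and not exhaustive).
SWING_DESCRIPTIONS = {
    "swinging_strike",
    "swinging_strike_blocked",
    "foul",
    "foul_tip",
    "hit_into_play",
}


def swing_events(pitches: list[dict]) -> list[dict]:
    """Keep only the pitches on which the batter swung."""
    return [p for p in pitches if p["description"] in SWING_DESCRIPTIONS]


sample = [
    {"pitch_type": "SI", "description": "called_strike"},
    {"pitch_type": "SI", "description": "hit_into_play"},
    {"pitch_type": "FS", "description": "swinging_strike"},
    {"pitch_type": "SL", "description": "ball"},
]
print(len(swing_events(sample)))  # 2
```

Dropping takes before training means the model never sees called strikes or balls, so it cannot learn location as a shortcut for those outcomes.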

The command model has been left as-is for now.

In this update, I left much more time for hyperparameter tuning in the XGBoost models. A much lower minimum learning rate was used, which has led to improved performance at the expense of not being able to use my laptop for a few days while finding the ideal set-up for each model!
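As an illustration of what lowering the minimum learning rate means for a search, the sketch below draws candidate settings at random, sampling the learning rate on a log scale so that small values are explored as thoroughly as large ones. The search method, the ranges, and the choice of parameters are all assumptions; only the parameter names (`learning_rate`, `max_depth`, `subsample`) correspond to real XGBoost settings.

```python
import math
import random


def sample_params(rng: random.Random, min_lr: float = 0.001, max_lr: float = 0.3) -> dict:
    """Draw one candidate configuration (hypothetical ranges)."""
    # Uniform in log space: each decade between min_lr and max_lr is equally likely.
    log_lr = rng.uniform(math.log(min_lr), math.log(max_lr))
    return {
        "learning_rate": math.exp(log_lr),
        "max_depth": rng.randint(3, 10),
        "subsample": rng.uniform(0.5, 1.0),
    }


rng = random.Random(0)  # fixed seed so the candidates are reproducible
candidates = [sample_params(rng) for _ in range(5)]
for params in candidates:
    print(params)
```

A lower learning rate usually needs more boosting rounds to converge, which is one reason a search like this gets slow.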

Model Accuracy

This new model design has resulted in increased model accuracy in almost every prediction; HBP% is the one exception. The tables below show the improvement on 2022 data, which was unseen by both the old model and the new one. Part of the accuracy increase could be that the model is better adjusted to the modern league baseline, as it hasn't been trained using older data. However, the improved R² values imply that the model has improved beyond just a better baseline.

RMSE
Statistic	Old Model	New Model	Difference
Swing%	0.3878	0.3841	-0.0038
SwStr%	0.3127	0.3115	-0.0011
GB%	0.4713	0.4699	-0.0015
LD%	0.4254	0.4248	-0.0006
FB%	0.4474	0.4457	-0.0017
HardHit%	0.4613	0.4581	-0.0032
Whiff%	0.3902	0.3885	-0.0017
Foul%	0.4876	0.4867	-0.0009
Ball%	0.2236	0.221	-0.0027
CS%	0.2174	0.2122	-0.0052
HBP%	0.0612	0.0618	0.0006
xRV	0.1382	0.1381	-0.0001

R²
Statistic	Old Model	New Model	Difference
Swing%	0.3971	0.4088	0.0117
SwStr%	0.0856	0.093	0.0073
GB%	0.0964	0.1016	0.0052
LD%	0.0034	0.0044	0.0009
FB%	0.0954	0.0996	0.0042
HardHit%	0.1047	0.1157	0.011
Whiff%	0.1994	0.2059	0.0065
Foul%	0.0496	0.0527	0.0031
Ball%	0.7708	0.7754	0.0046
CS%	0.7802	0.7911	0.0109
HBP%	0.3071	0.3047	-0.0025
xRV	0.0801	0.0831	0.003
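For readers who want to reproduce metrics like those in the tables above, here is a minimal sketch of RMSE and R² for per-pitch predictions. It assumes each outcome is coded as 0 or 1 and each prediction is a probability, which is my reading of the tables rather than something stated in them; the sample values are made up.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error between outcomes and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Share of outcome variance explained, relative to predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


outcomes = [1, 0, 0, 1]               # made-up 0/1 results, e.g. swing or no swing
predictions = [0.8, 0.2, 0.4, 0.6]    # made-up predicted probabilities
print(round(rmse(outcomes, predictions), 4))
print(round(r_squared(outcomes, predictions), 4))
```

With binary outcomes, even a well-calibrated model carries a large RMSE because a single pitch either happens or does not. R² compares the model against always predicting the average rate, which is why it is the better guide to improvement beyond the baseline.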

Notable Grade Changes

With a reasonably large change in the underlying models, some pitchers and pitches will now have different evaluations. Here are some of the largest changes:

And in Stuff:

Closing Remarks

I'll be leaving my models and baseball analysis alone for a while after this update. I'm starting to write my thesis, so I anticipate having less free time this autumn.

The work I do after graduating may limit the extent to which I can maintain PitchingBot, but I'm committed to finding a new publicly available home for it if necessary.
