Bill James defined the "Pythagorean Theorem of Baseball" as a tool to predict a team's final win record from its performance in the season so far, which is useful to coaches and commentators for a variety of reasons. His formula estimates a team's win percentage as (runs scored)^2 / ((runs scored)^2 + (runs allowed)^2). It is so named because of its superficial similarity to the Pythagorean theorem for finding the length of the hypotenuse c of a right triangle with legs a and b: a^2 + b^2 = c^2.
Other sports statisticians have since adapted the method for other sports, with different exponents to account for differing scoring systems. For example, in basketball, ESPN uses an exponent of 16.5.
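The formula is short enough to sketch in code. This is a minimal Python version, assuming the usual form in which the exponent is the only thing that changes between sports; the function name and parameters are illustrative and are not taken from any script referenced in this post.

```python
def pythagorean_expectation(scored: float, allowed: float, exponent: float = 2.0) -> float:
    """Expected win fraction (0 to 1) from totals scored and allowed.

    Equivalent to scored**x / (scored**x + allowed**x), written as a ratio
    so that large totals with large exponents stay well within float range.
    """
    if scored < 0 or allowed < 0:
        raise ValueError("totals must be non-negative")
    if scored == 0 and allowed == 0:
        raise ValueError("expectation is undefined when nothing was scored or allowed")
    if scored == 0:
        return 0.0
    return 1.0 / (1.0 + (allowed / scored) ** exponent)
```

Calling it with `exponent=2.0` gives the original baseball form, and passing `exponent=16.5` gives the basketball variant mentioned above.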
How well do these exponents hold up for recent seasons? How has the best-fitting exponent changed over time, and is this method actually any good for other sports?
I start here with the NHL.
Using the official NHL API, I requested the goals scored for and against each franchise and compared them to the franchise's win% in each NHL season, as shown in this script. The results of this collection, organization, and analysis were saved to a .csv file.
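As a rough illustration of the comparison step, the sketch below reads season rows from CSV text and attaches the actual and expected win fractions to each one. The column names (`season`, `franchise`, `goals_for`, `goals_against`, `wins`, `games_played`) are assumptions made for this example, not the layout of the actual file, and win% is taken here as simply wins divided by games played, which ignores ties and overtime losses.

```python
import csv
import io


def season_rows_with_expectation(csv_text, exponent=2.0):
    """Parse per-franchise season rows and add actual and expected win fractions.

    Column names are hypothetical; adjust them to match the real file.
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        goals_for = float(row["goals_for"])
        goals_against = float(row["goals_against"])
        wins = int(row["wins"])
        games = int(row["games_played"])
        if games == 0 or goals_for + goals_against == 0:
            continue  # skip rows with no games or no goals recorded
        gf = goals_for ** exponent
        ga = goals_against ** exponent
        results.append({
            "season": row["season"],
            "franchise": row["franchise"],
            "win_pct": wins / games,
            "expected_win_pct": gf / (gf + ga),
        })
    return results
```

The gap between `win_pct` and `expected_win_pct` in each row is the quantity to look at when judging how well a given exponent predicts results.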
The chart was plotted from the data described above, using matplotlib, numpy, and pandas, in the script shown here.
Kevin Dayaratna and Steven J. Miller estimated an exponent slightly greater than 2 for hockey, though the above chart suggests an exponent slightly lower than 2 would produce more reliable predictions for the most recent seasons.
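One simple way to put a number on "slightly lower than 2" is to fit the exponent directly. The sketch below uses the fact that the Pythagorean formula implies log(wins / losses) = x * log(goals for / goals against), and estimates x by least squares through the origin. This is a swapped-in illustrative technique, not necessarily the method used by Dayaratna and Miller or by the scripts in this post. The function name and the tuple layout are assumptions, and treating every non-win as a loss is a simplification.

```python
import math


def fit_exponent(records):
    """Least-squares estimate of the Pythagorean exponent.

    records: iterable of (goals_for, goals_against, wins, losses) tuples.
    Fits log(wins / losses) = x * log(goals_for / goals_against) with no intercept.
    """
    numerator = 0.0
    denominator = 0.0
    for goals_for, goals_against, wins, losses in records:
        if min(goals_for, goals_against, wins, losses) <= 0:
            continue  # logarithms are undefined for these rows
        log_goal_ratio = math.log(goals_for / goals_against)
        log_win_ratio = math.log(wins / losses)
        numerator += log_goal_ratio * log_win_ratio
        denominator += log_goal_ratio ** 2
    if denominator == 0.0:
        raise ValueError("need at least one usable row with goals_for != goals_against")
    return numerator / denominator
```

Running this separately on the rows for each season, or for each block of seasons, would give one fitted exponent per period, which is one way to examine how the exponent has drifted over time.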