For better or for worse, ever since former South Florida Sun-Sentinel writer Dave Heeren created TENDEX in the 1960s, linear weights have been a standard trope in the world of basketball stats.
The basic concept is simple--assign some set of positive weights to "good" actions (points, rebounds, assists, etc.) and negative weights to "bad" ones (turnovers, fouls, etc.). Add up the weighted total, then perhaps divide by games or minutes, and the result should give a vague approximation of a player's skill. No matter how much complexity the formula adds (and the currently reigning champ in this department might be John Hollinger's PER), this fundamental principle of linearity remains the same.
Although subsequent nonlinear metrics (like BP's WARP) have emerged as superior to linear-weighted equations, linear weights still have their place because their simplicity and ease of use are hard to match--simply put, it's a lot easier to calculate TENDEX on the back of an envelope than Win Shares. For this reason, it is worth knowing which linear weights formula is the best, and that means we must test each metric's accuracy.
More specifically, I will judge each formula by its ability to predict (or, more accurately, retrodict) game outcomes based on the minutes given to the players used by the two opponents. If a metric is to be of any value, a team composed of players who excel in it should consistently win out-of-sample games against teams composed of players whose expected metric performance is inferior.
The linear weights I tested are as follows:
- NBA Efficiency* = ((pts+reb+ast+stl+blk)-((fga-fg)+(fta-ft)+tov))/mp
- Old Win Score** (David Berri) = (pts+reb+stl+0.5*blk+0.5*ast-fga-0.5*fta-tov-0.5*pf)/mp
- Game Score (John Hollinger) = (pts+0.4*fg-0.7*fga-0.4*(fta-ft)+0.7*orb+0.3*(reb-orb)+stl+0.7*ast+0.7*blk-0.4*pf-tov)/mp
- APMVAL (Neil Paine) = (45*pts-35*(fga+0.44*fta)+18*reb+30*ast+72*stl+41*blk-75*tov-39*pf)/mp
- Alternate Win Score (David Lewin/Dan Rosenbaum) = (pts+0.7*orb+0.3*(reb-orb)+stl+0.5*blk+0.5*ast-0.7*(fga-fg)-fg-0.35*(fta-ft)-0.5*ft-tov-0.5*pf)/mp
- Sports Illustrated (David Sabino) = (pts+ft-(fta-ft)+2*3pm-(3pa-3pm)+2*fg-(fga-fg)+2*blk+2*orb+2*ast-2*tov+1.5*stl+1.5*(reb-orb))/mp
- TENDEX = (pts+reb+ast+blk-tov-(fga-fg)-(fta-ft))/mp
- Production = (20*fg+20*(reb-orb)+20*ast+15*orb+15*stl+15*blk+10*3pm+10*ft-20*tov-10*(fga-fg)-10*(fta-ft))/mp
- Thibodeau (Tom Thibodeau) = (2*fg-(fga-fg)+ft-0.5*(fta-ft)+3*3pm-1.5*(3pa-3pm)+reb+ast-pf+stl-tov+blk)/mp
- Nash (John Nash) = (0.5*pts+0.5*reb+0.5*stl+0.5*blk+2*ast-tov)/mp
- Points Created (Bob Bellotti) = (pts+0.9*reb+1.1*ast+0.9*stl+0.9*blk-0.9*(fga-fg)-0.9*(fta-ft)-0.9*tov-0.45*pf)/mp
- VORP (Kevin Pelton) = (pts+0.75*orb+0.25*drb+0.5*ast+stl+0.33*blk)/(1.5*(fga+(0.44*fta)+tov+((0.75*orb+0.25*drb+0.5*ast+stl+0.33*blk)/2)+(mp/4)))
- Old SPM (Neil Paine) = -12.9+0.3*3pa36+0.7*ast36+0.3*blk36+0.6*drb36+0.1*mpg+0.4*orb36+0.4*pf36+1.2*pts36+2.9*stl36-1.5*tov36-1.3*tsa36
- New SPM (Neil Paine) = -12.5+0.3*3pa36+0.8*ast36+0.7*blk36+0.6*drb36+0.1*mpg+0.3*orb36+0.4*pf36+1.1*pts36+2.2*stl36-1.5*tov36-1.2*tsa36
* The formula the NBA uses was popularized by Kansas City Star writer Martin Manley in his "Basketball Heaven" series from the late 1980s.
** Berri released a new version of Win Score in December 2011, but it was not included in this study.
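To make the formulas above concrete, here is a minimal Python sketch that evaluates two of them, NBA Efficiency and Alternate Win Score, on a single stat line. The stat line is hypothetical (the numbers are invented for illustration), the function names are my own, and the sketch skips the pace adjustment that the study applies.

```python
# Hypothetical box-score line; these numbers are made up for illustration.
line = {
    "pts": 20, "fg": 8, "fga": 15, "ft": 4, "fta": 5,
    "orb": 2, "reb": 10, "ast": 5, "stl": 1, "blk": 1,
    "tov": 3, "pf": 2, "mp": 36,
}


def nba_efficiency(s):
    """NBA Efficiency per minute, following the formula in the list above."""
    good = s["pts"] + s["reb"] + s["ast"] + s["stl"] + s["blk"]
    bad = (s["fga"] - s["fg"]) + (s["fta"] - s["ft"]) + s["tov"]
    return (good - bad) / s["mp"]


def alternate_win_score(s):
    """Alternate Win Score per minute, following the formula in the list above."""
    drb = s["reb"] - s["orb"]  # defensive rebounds
    total = (
        s["pts"] + 0.7 * s["orb"] + 0.3 * drb + s["stl"]
        + 0.5 * s["blk"] + 0.5 * s["ast"]
        - 0.7 * (s["fga"] - s["fg"]) - s["fg"]
        - 0.35 * (s["fta"] - s["ft"]) - 0.5 * s["ft"]
        - s["tov"] - 0.5 * s["pf"]
    )
    return total / s["mp"]


print(round(nba_efficiency(line), 3))       # 26 / 36
print(round(alternate_win_score(line), 3))  # 8.55 / 36
```

Every metric in the list except VORP and the two SPM variants has this same shape: a weighted sum of box-score counts divided by minutes played.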
The first test is to see how well each metric predicts game outcomes out-of-sample but within the same season. To accomplish this, I sorted every team game of the 2010-11 season into "even" and "odd" bins based on the order in which they fell on the schedule (i.e., the Sixers' first game is "odd"; their second is "even"). Pace-adjusted player stats from even games were then used to predict the outcomes of odd games; if a player played fewer than 250 minutes in a given half-season, he was given the league's average in each metric for the purposes of prediction.
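The split and the low-minutes rule described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names and the list-of-games input are hypothetical, and the sketch handles one team's schedule at a time.

```python
MIN_MINUTES = 250  # playing-time threshold stated in the text


def split_odd_even(team_games):
    """Split one team's games (in schedule order) into odd and even bins.

    The first game is "odd", the second is "even", and so on.
    """
    odd = team_games[0::2]
    even = team_games[1::2]
    return odd, even


def rating_for_prediction(player_rating, minutes, league_avg):
    """Use the player's own rating only if he reached the minutes threshold.

    Players with fewer than MIN_MINUTES in the half-season being used
    as the sample fall back to the league-average rating.
    """
    return player_rating if minutes >= MIN_MINUTES else league_avg


odd, even = split_odd_even(["G1", "G2", "G3", "G4", "G5"])
print(odd, even)
print(rating_for_prediction(0.600, 100, 0.460))  # falls back to league average
```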
To show how this works, here's an example using NBA Efficiency and the New Year's Day 2011 game between the Heat and Warriors, an odd-numbered game for both teams:
Player      Team     MP  NBA Eff
Ellis       GSW    46.6  .459
Wright      GSW    39.7  .412
Lee         GSW    36.5  .597
Curry       GSW    35.9  .564
Radmanovic  GSW    21.6  .481
Udoh        GSW    17.1  .349
Law         GSW    14.0  .338
Amundson    GSW    13.3  .342
Carney      GSW     7.8  .460
Williams    GSW     7.5  .419
Team              240.0  .467
Player      Team     MP  NBA Eff
James       MIA    41.5  .743
Wade        MIA    39.4  .688
Bosh        MIA    39.3  .523
Arroyo      MIA    26.9  .312
Anthony     MIA    22.4  .303
Chalmers    MIA    21.6  .322
Jones       MIA    21.3  .292
Ilgauskas   MIA    16.7  .488
Howard      MIA    10.9  .387
Team              240.0  .497
In this game, NBA Efficiency would have predicted a Heat victory (.497 to .467); since Miami actually won, the game counts as an accurate forecast for the metric. (Also note that Carney received a league-average rating--.460--as he did not play 250 minutes in even-numbered games last season.)
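The team figures in this example are consistent with a minute-weighted average of the player ratings. The Python sketch below reproduces them on that assumption; the function name is my own, and Howard's minutes are taken as 10.9, the figure given in the Old SPM table for the same game, which brings Miami's total to 240.

```python
def team_rating(players):
    """Minute-weighted average rating for a list of (minutes, rating) pairs."""
    total_minutes = sum(mp for mp, _ in players)
    return sum(mp * rating for mp, rating in players) / total_minutes


# (minutes played in the game, NBA Efficiency per minute from the other half-season)
gsw = [(46.6, .459), (39.7, .412), (36.5, .597), (35.9, .564), (21.6, .481),
       (17.1, .349), (14.0, .338), (13.3, .342), (7.8, .460), (7.5, .419)]
# Howard's minutes entered as 10.9 (assumption: matches the Old SPM table).
mia = [(41.5, .743), (39.4, .688), (39.3, .523), (26.9, .312), (22.4, .303),
       (21.6, .322), (21.3, .292), (16.7, .488), (10.9, .387)]

gsw_rating = team_rating(gsw)
mia_rating = team_rating(mia)

# The team with the higher rating is the predicted winner.
predicted_winner = "MIA" if mia_rating > gsw_rating else "GSW"

print(round(gsw_rating, 3), round(mia_rating, 3), predicted_winner)
```

A metric's overall score is then simply the share of games in which the predicted winner matches the actual winner.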
By contrast, here's how "Old SPM" treated the game:
Player      Team     MP  Old SPM
Ellis       GSW    46.6   1.86
Wright      GSW    39.7   3.35
Lee         GSW    36.5   3.06
Curry       GSW    35.9   4.64
Radmanovic  GSW    21.6   2.80
Udoh        GSW    17.1  -4.25
Law         GSW    14.0  -3.13
Amundson    GSW    13.3  -5.82
Carney      GSW     7.8   0.28
Williams    GSW     7.5  -0.37
Team              240.0   1.51
Player      Team     MP  Old SPM
James       MIA    41.5   6.91
Wade        MIA    39.4   5.18
Bosh        MIA    39.3   0.23
Arroyo      MIA    26.9  -3.69
Anthony     MIA    22.4  -2.81
Chalmers    MIA    21.6   1.18
Jones       MIA    21.3  -1.10
Ilgauskas   MIA    16.7  -2.67
Howard      MIA    10.9  -3.89
Team              240.0   1.06
Based on the distribution of minutes, Old SPM would have predicted a victory for Golden State (1.51 to 1.06), so this game represents an incorrect pick for that metric. Here's how each metric performed in all 1,230 games from the 2010-11 regular season:
Metric % Correct
Pts Created 68.5%
Old Win Score 68.5%
NBA Efficiency 68.3%
TENDEX 68.2%
Production 68.0%
Alt Win Score 67.9%
Thibodeau 67.4%
APMVAL 67.3%
Sports Illustrated 66.8%
Old SPM 66.4%
Game Score 66.1%
New SPM 65.7%
VORP 61.5%
Nash 61.2%
By this test, Bellotti's Points Created and Berri's Old Win Score performed the best, with each correctly predicting 68.5% of games last season.
For its part, Old Win Score also showed by far the highest correlation (0.81) with defensive rebounds per minute. Without a team defensive adjustment, a metric that correlates highly with DReb rate will naturally do better at predicting opposite-half games within the same season (team defensive performance is relatively stable from half to half, and a large number of defensive rebounds by definition means a large number of defensive stops).
Because of this, a better out-of-sample test might be to use metrics from one season to predict the following season. To that end, I took the same pace-adjusted metrics from every regular season between 1987 and 2011--save for the strange, lockout-shortened 1999 campaign--and ran a study similar to the one above, predicting every game's outcome and seeing which metric picked the winner at the highest rate. (Once again, players with fewer than 250 minutes in the preceding season were assigned the league's average rate in each metric.)
In a sample of 26,486 games, here were the most accurate metrics:
Metric % Correct
New SPM 63.2%
Alt Win Score 63.1%
Old SPM 63.1%
APMVAL 62.9%
Production 62.8%
Thibodeau 62.7%
Pts Created 62.6%
NBA Efficiency 62.5%
Old Win Score 62.4%
TENDEX 62.3%
Game Score 62.0%
Sports Illustrated 62.0%
VORP 61.1%
Nash 60.6%
Of course, even this is not necessarily the ultimate test of a metric. In 2009, I wrote a Basketball-Reference post that dealt with team continuity; in it, I posited that the best metric would be one that accurately predicted the fates of teams with a great deal of roster turnover. As I said then:
"Instead of predicting every team, maybe we should only focus on teams with obvious personnel changes, where the unpredictability is going to be highest because of changing roles and team dynamics. ... It seems to me that predicting these types of teams would be more informative than the typical squad, because your metric is going to have to rise or fall on its ability to anticipate how a player will react to a different role on the team. We're always talking about 'Holy Grails' when it comes to player ratings, and figuring out a way to effectively predict situations like this would go a long way toward developing the best metric for building future teams."
With that in mind, I calculated "continuity scores" for every team since 1987 (continuity score being the percentage of team minutes given to players who were on the team the previous season). I then looked at each metric's predictive performance in games involving teams that scored low in continuity:
Metric          Below Avg.   Bottom 1/2   Bottom 1/3   Bottom 1/4   Bottom 10%   Less than 50%
New SPM         63.98% ( 2)  63.83% ( 1)  64.47% ( 2)  64.38% ( 5)  64.93% ( 4)  64.69% ( 6)
Alt Win Score   64.07% ( 1)  63.82% ( 2)  64.48% ( 1)  64.64% ( 1)  65.01% ( 2)  64.94% ( 2)
Old SPM         63.68% ( 7)  63.61% ( 4)  64.15% ( 7)  64.07% ( 9)  64.55% ( 9)  64.38% ( 9)
APMVAL          63.91% ( 3)  63.69% ( 3)  64.22% ( 5)  64.32% ( 6)  64.73% ( 7)  64.73% ( 5)
Production      63.87% ( 4)  63.60% ( 5)  64.32% ( 3)  64.39% ( 4)  64.67% ( 8)  64.66% ( 7)
----------------------------------------------------------------------------------------------
Thibodeau       63.86% ( 5)  63.53% ( 6)  64.28% ( 4)  64.55% ( 2)  65.50% ( 1)  65.06% ( 1)
Pts Created     63.70% ( 6)  63.41% ( 7)  64.19% ( 6)  64.48% ( 3)  64.83% ( 6)  64.81% ( 3)
NBA Efficiency  63.47% ( 8)  63.23% ( 8)  64.02% ( 8)  64.25% ( 7)  64.95% ( 3)  64.77% ( 4)
Old Win Score   63.24% (10)  63.03% (10)  63.61% (10)  63.74% (10)  63.98% (11)  64.02% (10)
TENDEX          63.34% ( 9)  63.10% ( 9)  63.88% ( 9)  64.23% ( 8)  64.89% ( 5)  64.62% ( 8)
----------------------------------------------------------------------------------------------
Game Score      63.22% (11)  62.93% (11)  63.57% (11)  63.65% (11)  63.94% (12)  63.98% (11)
Sports Ill.     62.84% (12)  62.66% (12)  63.18% (12)  63.23% (12)  64.14% (10)  63.85% (12)
VORP            62.07% (13)  61.90% (13)  62.51% (13)  62.60% (13)  63.07% (13)  63.15% (13)
Nash            61.52% (14)  61.32% (14)  61.84% (14)  61.67% (14)  62.02% (14)  61.95% (14)
----------------------------------------------------------------------------------------------
Rank in parentheses.
Key:
* Below Avg. = prediction % in games where at least 1 team was below the 1987-2011 average in continuity
* Bottom 1/2 = % in games where at least 1 team was in the bottom half of 1987-2011 teams in continuity
* Bottom 1/3 = % in games where at least 1 team was in the bottom third of 1987-2011 teams in continuity
* Bottom 1/4 = % in games where at least 1 team was in the bottom fourth of 1987-2011 teams in continuity
* Bottom 10% = % in games where at least 1 team was in the bottom tenth of 1987-2011 teams in continuity
* Less than 50% = % in games where at least 1 team had a continuity score under 50%
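For readers who want to reproduce the continuity filter, here is a rough Python sketch of the continuity score as defined above (the percentage of team minutes given to players who were on the team the previous season) and of the "at least 1 team under the cutoff" rule from the key. The function names, the data layout, and the sample numbers are all hypothetical.

```python
def continuity_score(minutes_by_player, previous_roster):
    """Percentage of a team's minutes played by holdovers from last season.

    minutes_by_player: dict mapping player name to minutes this season.
    previous_roster: set of player names on the team the previous season.
    """
    total = sum(minutes_by_player.values())
    returning = sum(mp for player, mp in minutes_by_player.items()
                    if player in previous_roster)
    return 100.0 * returning / total


def low_continuity_games(games, scores, cutoff=50.0):
    """Keep games in which at least one team's continuity score is under the cutoff.

    games: list of (team_a, team_b) tuples.
    scores: dict mapping team to continuity score.
    """
    return [g for g in games if scores[g[0]] < cutoff or scores[g[1]] < cutoff]


# Made-up example: players A and C return, B is new.
minutes = {"A": 2000, "B": 1500, "C": 500}
score = continuity_score(minutes, {"A", "C"})
print(score)  # 2500 of 4000 minutes went to returning players

scores = {"X": 62.5, "Y": 70.0, "Z": 45.0}
print(low_continuity_games([("X", "Y"), ("Y", "Z")], scores))
```

The other columns in the table use the same filter with a different cutoff, such as the league-wide average or a percentile of all 1987-2011 team scores.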
As the sample narrows to games featuring teams with less and less continuity, some trends emerge. First, the adjusted plus-minus-based metrics (Old & New SPM, APMVAL), while clustered near the top of the overall rankings in all games, seem to decline in predictive accuracy when asked to handle teams composed of players in new roles and/or situations.
Meanwhile, a metric like Tom Thibodeau's, which isn't especially accurate across all teams, appears to become a more effective predictor when low-continuity teams are involved. "Thibodeau" ranked just sixth out of fourteen metrics overall, but rose to second in games featuring teams in the bottom quarter in continuity, and was first in predicting games involving the bottom ten percent of teams in continuity.
The metric that emerged most strongly from these tests, though, is Alternate Win Score. AWS was devised by David Lewin & Dan Rosenbaum for their 2007 paper "The Pot Calling the Kettle Black," as an answer to Berri's original version of Win Score. Lewin & Rosenbaum showed that AWS outperforms various advanced metrics in terms of predicting future wins, a finding that this research seems to further reinforce. AWS was the second-most effective overall predictor of future wins, and unlike SPM or APMVAL, it lost little of its predictive power when asked to assess teams that saw heavy personnel turnover from the previous season.
For these reasons, I have to conclude that Alternate Win Score is the "best" basic linear-weighted metric currently in the public domain. Obviously a more complex, nonlinear metric such as WARP would be preferable to AWS, but if you need to assess player performance using a simple, back-of-the-envelope calculation, Alternate Win Score should probably be your first choice.
Neil Paine is an author of Basketball Prospectus.