Schmke, when you say "give less weight to matches between mismatched players, unless there is an upset of course" (italics added), is that really what you meant? Because a tank job looks exactly like an upset to the computer. I think the computer already throws out matches when the players/teams are more than 0.5 apart. I think you touched on a better idea for eliminating tanking, which is to eliminate matches that are too far from the "expected range". It would be simple enough for the USTA to do some calculations to identify match results that fall outside, say, a 95% frequency of occurrence, and eliminate them from the NTRP calculation on the suspicion of tanking, injury, or some other unreliable indicator of ability. Yes, that would wipe out the occasional wonderful and well-earned upset, but for every one of those I suspect it would also eliminate 20 tank jobs.
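To make the "95% frequency of occurrence" idea concrete, here is a rough sketch of how such a filter might look. This is only my guess at one way to do it, not anything the USTA is known to use. It assumes each match already has a "residual" (actual result minus expected result, in whatever unit the rating formula works in); the name flag_unexpected and the residual measure itself are made up for illustration.

```python
from statistics import quantiles

def flag_unexpected(residuals):
    """Return indices of matches outside the central 95% of residuals.

    `residuals` is a hypothetical list of (actual - expected) results,
    one per match. Needs at least two matches.
    """
    # n=40 gives cut points every 2.5%, so the first and last cut points
    # are the 2.5th and 97.5th percentiles.
    cuts = quantiles(residuals, n=40)
    low, high = cuts[0], cuts[-1]
    # Anything beyond either cut is treated as suspect (tank, injury, etc.).
    return [i for i, r in enumerate(residuals) if r < low or r > high]
```

Flagged matches would then simply be left out of the rating calculation. Note that this cuts both tails, so it drops the genuine upsets along with the tank jobs, which is exactly the trade-off described above.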
For those wondering what the statistical difference is between a 3.68 and a 3.69, see the table below. Using a methodology very similar to Schmke's (so I don't have everyone's true NTRPs, just my estimates) and a large database of over 20,000, here's what I calculate. I sure hope the table comes out legibly.
[Rating difference]   [% won by higher player]
0.00-0.01             52%
0.01-0.05             55%
0.05-0.10             63%
0.10-0.15             69%
0.15-0.20             72%
0.20-0.25             77%
0.25-0.30             80%
0.30-0.35             83%
0.35-0.40             86%
0.40-0.45             89%
0.45-0.50             91%
I suspect the percentages for the upper categories are somewhat artificially depressed by tanking, but of course have no proof.
So the answer to 3.68 vs. 3.69 is that one could expect the 3.69 to win about 52% of the time, based on this admittedly limited sample and imperfect NTRP calculation method.
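For anyone who wants to plug in their own numbers, the table can be turned into a small lookup. This is just a sketch using my estimated percentages from the table above. The function name expected_win_pct is made up, and I'm assuming a difference that lands exactly on a boundary (say 0.05) belongs to the lower row, since the table doesn't say.

```python
from bisect import bisect_left

# Upper edge of each row in hundredths of a rating point, with the
# estimated % of matches won by the higher-rated player (from the table).
UPPER_EDGES = [1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
WIN_PCT = [52, 55, 63, 69, 72, 77, 80, 83, 86, 89, 91]

def expected_win_pct(rating_a, rating_b):
    """Estimated win % for the higher-rated player, or None past 0.50 apart."""
    # Work in whole hundredths to avoid floating-point noise at row edges.
    diff = round(abs(rating_a - rating_b) * 100)
    # Assumption: a difference exactly on an edge falls in the lower row.
    row = bisect_left(UPPER_EDGES, diff)
    return WIN_PCT[row] if row < len(WIN_PCT) else None
```

So expected_win_pct(3.69, 3.68) gives 52, and anything more than 0.50 apart returns None because the table stops there.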
For my third and fourth cents, I'd agree that in a redesign I'd add some weighting for actually winning the match. But basing ratings solely on won/lost records would be much more flawed than even the current system, for all the reasons Schmke points out.
