My Wife's UTR is Plummeting!

OnTheLine

Hall of Fame
Likewise. My wife hasn't played a match for over a week and she tells me her UTR falls by a tenth of a point daily.

I seriously believe that UTR does constant manual corrections. If you haven't played in a week and none of your opponents/partners have played then how can your rating change even a little?

I see this particularly in the weeks between seasons. Not one single match of any sort is going on in the area. (Note: we are a bit of an island. It is more than 250 miles to get anywhere else, and no tournaments have been hosted in 6+ months, so no, people are not traveling elsewhere to play a rated match.) No one is playing for a 2-week period. And yet, that little number shifts. That is not organic. That is someone fussing with something.

This is very different from degradation over longer periods of time (12 months or more). Those also make no sense.
 

S&V-not_dead_yet

Talk Tennis Guru
I think it speaks to how much people also dislike USTA and the NTRP system. Willing to try something else despite the quirks.

Well, people didn't have a choice: if they played USTA, they also got a UTR gratis.

I think it speaks to how much Ellison wanted to get involved [I had ridiculously good seats one year at Indian Wells and I could see Ellison sitting at the same level about 100' away; he didn't return my wave].
 

S&V-not_dead_yet

Talk Tennis Guru
It’s been pretty conclusively proven that it can handle neither.

To be fair, our DNTRPs might be going through the same gyrations but since the USTA only updates them annually and publishes them only as a level [ie 4.5 rating vs 4.37 numerical], we just don't see it.

Then again, my UTR has gone up 200 basis points in the last year of inactivity so that's even weirder; if anything, I'd expect a decline.
 

travlerajm

Talk Tennis Guru
I haven’t followed UTR since I watched my UTR go from 9, to 8, to 7, to 6, to 5, to UR. All after I stopped playing. Meanwhile, my TR dynamic ntrp froze after my last match, which seems more logical to me.
 

schmke

Legend
UTR makes the assumption that if you aren't playing, your game is decaying. If you are looking at pros, collegiates, and juniors, that may be true, as their lack of tournament play does usually indicate an injury or something that perhaps justifiably expects a decline in performance upon their return to play. E.g. if they aren't injured, they are playing tournaments, conversely if they aren't playing tournaments they must be injured.

Clearly, this is not a good assumption for adults who have league seasons at certain times of the year and may not play in a league or tournament for long periods; that does not mean they are injured or aren't playing. So perhaps the wild adjustments some players see are due to this decay among related players, or it is possible UTR makes adjustments to compensate for the decay for adults, and those somewhat coarse-grained adjustments are what is being observed.
 

travlerajm

Talk Tennis Guru
If that’s the case, then it seems like the algorithm is trying to see things that aren’t necessarily there.

It seems like it would be easy to tweak it so that rating decay is accounted for at the highest levels, with decay less relevant as the level goes down. By the time you get to the rec player range, inactivity in the USTA computer should not change your rating.

This fix alone would give UTR a lot more credibility. Right now, half the players in their database have UR next to their name because UTR ignores ratings for players who aren’t active. This creates an inherently unstable system and is why the ratings implode.
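The tweak described above could be sketched roughly as follows. This is only an illustration of the idea, not anything UTR is known to do: the function name, the `rec_ceiling` cutoff, the monthly decay rate, and the linear scaling are all assumptions chosen for the example.

```python
def decayed_rating(rating, months_inactive,
                   rec_ceiling=10.0,        # assumed top of the "rec player" range
                   scale_top=16.5,          # assumed top of the rating scale
                   max_monthly_decay=0.1):  # assumed decay per month at the very top
    """Hypothetical inactivity decay that fades to zero for rec-level ratings."""
    if rating <= rec_ceiling:
        # Rec players: inactivity leaves the rating frozen.
        return rating
    # Weight grows linearly from 0 at rec_ceiling to 1 at scale_top.
    weight = min(1.0, (rating - rec_ceiling) / (scale_top - rec_ceiling))
    decayed = rating - weight * max_monthly_decay * months_inactive
    # Never let decay push a player below the rec ceiling.
    return max(rec_ceiling, decayed)


print(decayed_rating(6.0, 12))    # rec-level rating, unchanged by a year off
print(decayed_rating(13.25, 10))  # higher-level rating, decays partway
```

With these made-up numbers a UTR 6 stays put no matter how long the layoff, while a 13+ drifts down slowly and a top-of-scale player drifts down fastest.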
 

S&V-not_dead_yet

Talk Tennis Guru
Chess ratings used to follow the same decay concept but they finally switched to freezing ratings, possibly due to too many people sandbagging by not playing formal tournaments and then entering a big money tournament and smashing the competition.
 

ChaelAZ

G.O.A.T.
Maybe it is just my game decaying with my age and body, so UTR is just following suit and the truth sucks?

I do think there was some odd manual adjustment recently, watching not just my UTR drop .5, but all the people I have played this season dropping as well. I am not saying I somehow deserve to be a 4.5 or they are totally misrepresenting everything, but just noting something is whack there. But also, since NTRP is kinda whack, and UTR is using those matches (and maybe something in the USTA rating to roughly match levels for 3.0-4.5 lower-level rec?), that is also creating issues like those seen in NTRP. I dunno. Just odd to see big shifts like that to me.
 

kingcheetah

Hall of Fame
I think that UTR is great for juniors and people playing lots of tournaments, but it can get wonky with leagues. Mine is UR currently, but I'm a 4.5-5.0 NTRP. Most of my friends that have a rating right now have seen it drop. It might defeat the purpose, but I think that counting mixed doubles matches can skew UTR; I saw mine drop from some mixed I was playing for fun.
 

Moon Shooter

Hall of Fame
I am fairly new to tennis but have been a chess player for a while and have looked into ratings a bit. The ratings in chess are really accurate, especially for the bulk of players, though they may not be the best indicator at the very highest levels (for various reasons having to do with draws and best play that really shouldn't apply to tennis). I was surprised that tennis took so long to use an Elo type of system. I would just make a few observations that I think are worthwhile.

The accuracy of an Elo rating system should be tested by how well it predicts outcomes. If I am rated 3.5 and I play 50 games with someone else that is 3.5 we should each win 25 games. If that doesn't happen the system is not that accurate. This seems the only objective way to evaluate the accuracy of any rating system. So saying these people are all 4.0 under this rating system but then they have drastically different ratings under the UTR system simply means that one or both of the systems is not accurate. Also UTR does not only count the results of matches but each game! So if you won your matches but still had a rating drop that can happen if you lost more games in those matches than your rating predicted.

It is unclear to me that the NTRP system even has determining strength as a main goal (and by that I mean trying to accurately predict outcomes of games). It seems they also have a goal of setting out a roadmap for what amateur players can do to improve. So I seem to be a 3.5 player and the things they say I need to work on are in fact fairly accurate. Lots of people in my range have many of the issues they generally identified. Second serve isn't good, overhead needs work, etc. But some pros might be saying the same thing about their own second serve and their overheads. So clearly this is relative and subjective. Wins and losses are objective data. Is that data enough to accurately predict future wins and losses? In chess it is, and it does so very well.

Here are some issues that *may* make tennis and chess different:

1) Young chess players advance very quickly and only play rated games during a specific season. Talk to any adult chess player and the last thing they want to do is play a kid who hasn't played a rated game in 6 months or so. The kid may have a rating of X from last season but there is a good chance that his strength went up considerably since then. But if they didn't play rated games their rating won't keep up with that improvement and he or she likely improved quite a bit more than I did since my last rated game. So playing up and coming kids after they haven't played a rated game recently tends to hurt your rating. I would think this would be an issue for tennis as well. The younger people play rated games during the season when they were 14 and then you all of a sudden play that kid as a 15 year old with the rating they had at 14 and well your results may not be good. But I don't know youth tennis well enough if it is a year round activity.

2) As was pointed out, in some cases lack of play in tennis may be due to an injury and that could affect play, especially at the highest levels. In chess the decline with age is a bit slower, so there is no need to drastically reduce ratings due to inactivity. I think the analysis by travlerajm and schmke above identifies this problem accurately. Even if I don't play for a year I may need to get some rust out, but after a few practices my game shouldn't be that much worse in relative terms. But at the highest levels? It does seem to perhaps have a bigger impact, especially when combined with an injury or getting older past a certain age. It may be that UTR has to make a choice on how they want to address this. But I agree it makes no sense to assume that your average amateur 35-year-old is not playing tennis just because they had no rated games in a few years. They may have been playing and improving all the while. Moreover, even if they didn't play, they can likely shake the rust off after a few months of getting back into it, so they may be competitive with their former level. Federer's lack of play may have a much bigger effect on his competitiveness with Djokovic or Nadal. I can't really say for sure and I am not exactly sure what UTR does to your rating due to lack of play. But clearly there are some trade-offs they need to consider, and hopefully they are not just being lopsided against amateurs. In other words, is the predictive power of these ratings at the pro level much stronger at the expense of the amateur level? If so, then they are not really being universal in the way they are calculating this. If the two groups are so different then perhaps they shouldn't even try to combine them.

3) In chess there is no upper end to the scale. I am not exactly sure all the effects this might have. But in chess they can tell you if you are say 200 points higher than some other person then you should beat them 3 out of 4 games. If you cap the upper end I am not sure this is possible. If the highest rated player 20 years from now is in fact considerably stronger (or weaker) than the top players of today then the difference in ratings will not always translate to winning a certain percentage of games in the same way.

4) Chess and tennis both have sandbagging, but in chess it is just for those who play in money tournaments. Because chess has such a good rating system that accurately predicts outcomes of games, people value their rating quite a bit. So it is only a few people sandbagging. But of course this needs to snowball. The more people attach UTR ratings to their games, the more accurate the ratings will be and the more valuable they will be to tennis players. But sandbagging seems a bit harder to define in tennis. If someone is trying to work on a certain shot or tactic in a match to improve, instead of just trying to win as best they can, is that sandbagging? I think there may be a line to draw here but it is not clear. In chess you don't bring a new opening to a rated game until you think it is ready for prime time, so to speak.
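For anyone who wants to check the "200 points higher means 3 out of 4" figure from point 3, here is a minimal sketch of the standard Elo expected-score formula with the usual 400-point scale. It describes chess-style Elo only; nothing here is claimed about how UTR or NTRP compute expectations.

```python
def elo_expected_score(rating_gap, scale=400.0):
    """Expected score (roughly, win share) for the player rated `rating_gap` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


print(round(elo_expected_score(200), 2))  # 0.76, i.e. about 3 out of 4
print(elo_expected_score(0))              # 0.5 for equally rated players
```

The formula only depends on the gap between the two ratings, which is why an uncapped scale keeps the same meaning for a 200-point difference at any level.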

I have a few other thoughts but I think this is long enough.
 

schmke

Legend
The goal of NTRP as stated by the USTA is (my paraphrasing) to promote competitive and compatible play, effectively they want two players of the same level to be "compatible". Compatible does not mean each player will win 50% of the matches, rather just that they aren't of dramatically different abilities and can have a reasonably good match.

Note of course that the USTA publishes ratings only to the half point while they calculate them to the hundredth, and they go so far as to (somewhat sillily) say that a 6-0,6-0 score is not unexpected if a top of level player plays a bottom of level player. That doesn't sound "compatible" to me, but that is their definition.

Regarding the NTRP algorithm, it is sort of Elo-based in that for each match it has an expected result and a player's rating will go up or down based on doing better or worse than what is expected. NTRP does use the score/games, so it is not just a discrete taking/losing of points based on winning or losing like pure Elo; one can move up or down varying amounts based on how different the actual score is from the expected.
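To make the "expected result versus actual score" idea concrete, here is a toy sketch. It is not the USTA's algorithm (which is unpublished) and not my estimated-ratings method either; the function names, the `games_per_point` conversion, and the step size `k` are all made-up assumptions for illustration.

```python
def games_margin(score):
    """Net games for player A from a score such as [(6, 3), (6, 4)]."""
    return sum(a - b for a, b in score)


def update_rating(rating_a, rating_b, score,
                  games_per_point=20.0,  # assumed: 1.0 of rating gap ~ 20 games of margin
                  k=0.005):              # assumed: rating change per game of surprise
    """Move player A's rating by how far the actual margin beat the expected margin."""
    expected_margin = (rating_a - rating_b) * games_per_point
    actual_margin = games_margin(score)
    return rating_a + k * (actual_margin - expected_margin)


# Evenly rated players, A wins 6-3 6-4: A did better than expected and goes up.
print(update_rating(3.40, 3.40, [(6, 3), (6, 4)]))
# A is a 0.25 favorite but only wins 6-4 6-4: A wins the match yet drifts down.
print(update_rating(3.75, 3.50, [(6, 4), (6, 4)]))
```

The second case is the one that surprises people: with a score-based system, a win that is closer than expected can still lower the winner's rating.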

Now, while the USTA doesn't publish the ratings to a hundredth, I have spent way too much time understanding the algorithm and have come up with a fairly accurate and reasonable recreation of it with my Estimated Dynamic NTRP Ratings (I'm not allowed to post links promoting my site, but Google "schmidt computer ratings ntrp" and you should find my FAQ on my blog and you can read more about my ratings there) and I do periodically look at how well the detailed ratings do in predicting match winners. When I've done it, it is more or less what you'd expect with the higher rated player/pair winning a higher percentage of the time the farther apart the ratings are.

For example, when I looked at it a few years ago, specifically looking at how often the favorite wins a match grouped by different gaps between the players, it revealed this which looks very much like you'd expect.

Gap            Winning %
0.00 - 0.05    53%
0.05 - 0.15    63%
0.15 - 0.25    75%
0.25 - 0.35    84%
0.35 - 0.45    90%
0.45 - 0.55    93%
0.55 - 0.65    95%
0.65 - 0.75    96%

I have not done nor have I seen this sort of analysis done for UTR, but it would be interesting to see if it is similar.

The fact that the percentages above match what one would intuitively expect doesn't necessarily mean NTRP is right or good, but it is certainly a factor in assessing how well a rating system works. If the percentages didn't make sense, it would certainly be a negative for the algorithm.
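For anyone who wants to run the same kind of check against another rating system, the analysis amounts to bucketing matches by rating gap and counting how often the favorite won. A minimal sketch follows; the function name and input format are assumptions, the sample matches are invented, and the handling of gaps that land exactly on a bucket boundary is my own choice.

```python
from collections import defaultdict


def favorite_win_rate_by_gap(matches):
    """matches: iterable of (winner_rating, loser_rating), ratings to the hundredth.

    Returns {bucket: favorite win rate}, where bucket n covers gaps centered on
    n tenths (bucket 0 is under 0.05, bucket 1 is 0.05 up to 0.15, and so on).
    """
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [favorite wins, matches]
    for winner_rating, loser_rating in matches:
        # Work in whole hundredths to avoid floating point edge cases.
        gap = round(abs(winner_rating - loser_rating) * 100)
        if gap == 0:
            continue  # identical ratings: there is no favorite to score
        bucket = (gap + 5) // 10
        tallies[bucket][1] += 1
        if winner_rating > loser_rating:
            tallies[bucket][0] += 1
    return {b: wins / total for b, (wins, total) in sorted(tallies.items())}


# Invented sample data, just to show the call.
sample = [(3.60, 3.40), (3.40, 3.60), (3.62, 3.41), (3.45, 3.44)]
print(favorite_win_rate_by_gap(sample))
```

A well-behaved system should show the win rate starting near 50% in bucket 0 and rising steadily as the gap grows.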
 

Moon Shooter

Hall of Fame
Schmke

Thanks. That is exactly the sort of predictive testing that I think should be used to determine if the system is accurate.

Am I correct in thinking that (even if we assume there is no local variation or sandbagging) players will be matched with other players at, say, 3.5 and will beat them 90-93% of the time? Like you said, the predicted outcome could be a double bagel when two players are at the same level but have computer ratings at either end, 3.01 and 3.49.

I would be interested in the results at different breaks. So if a 3.45 player (a 3.5) plays a 3.55 player (a 4.0), how does that work out? Do people get automatically demoted as often as promoted?

I agree that UTR should be tested in the same way. Of course, at the amateur adult level it is not as widely used, so it shouldn't be as accurate. And the tournaments probably need to allow large groupings of players as the system takes hold. But if it is working correctly, I expect more accuracy for college players.

NTRP does not publish the computer ratings, correct? So even assuming no local variability or sandbagging, players will not see any improvement in their rating until they have improved enough to predictably double bagel the player they were at their last increase. I think this is a huge problem.
 

schmke

Legend
The table I shared is not strictly between players of the same level, but just the gap between players who played. So it already includes 3.5s playing up in a 4.0 league where the 3.5 faces a 4.0.

But yes, in practice, a top of level player beats a bottom of level player 90+% of the time according to this analysis. And that is consistent with the USTA saying that a 6-0,6-0 score is not unexpected in this case.

Does this make the NTRP levels too broad? In populated areas you could make that argument, and in Georgia, they actually have "low" leagues for I believe 3.5 and 4.0 where there is a separate non-advancing league for players rated 3.01-3.25 (3.5 low) and 3.51-3.75 (4.0 low).

But in smaller areas, splitting the levels like this would risk not having critical mass to have a flight of teams. There are already quite a few reasonably sized cities where even at the 3.0 to 4.0 levels there are only a handful of teams in a league, and if the levels were split in half, they may not be able to even field teams for a local league. So as a result, the USTA keeps their levels as-is.

Could there perhaps be a modification or re-scaling of ratings so instead of splitting each level into two levels, they took say two levels and split it into three? Or effectively change the levels so there is (what is today) a 0.4 range for a level rather than 0.5? That may be doable without killing league play in smaller areas, and it probably would result in slightly more competitiveness within a level, but I doubt the USTA does that with NTRP. It is more likely any shakeup to ratings like this would occur if/when they introduce WTN for league play.

And yes, the USTA only publishes the half-point levels, not the detailed rating, so a player doesn't officially know they've improved until they cross the threshold to the next level and get bumped up. And this only happens at the end of the year, so a player is really in the dark about how they are doing. This has led to things like my blog and estimated ratings, and other sites that do similar, to fill the void the USTA has created by shrouding the detailed ratings in secrecy and to give players some insight into how they are doing.
 

FIRETennis

Professional
What are your thoughts about UTR and how it applies at the amateur level, UTR 4 - UTR 10, compared to NTRP 3.0 - 5.0?
 

Moon Shooter

Hall of Fame
Schmke

Thank you for posting. I think you do very interesting work on these topics.

The table I shared is not strictly between players of the same level, but just the gap between players who played. So it already includes 3.5s playing up in a 4.0 league where the 3.5 faces a 4.0


It may include some, but I think the split between classes would be severely underrepresented since the standard tournament lines are artificially set at that break. I would not be surprised if the players that are rated 3.45-3.49 (a 3.5 player) would beat a 3.51-3.55 (4.0 player). I think the same would apply at the 3.0 versus 3.5 level break. I think that might happen for several reasons. For example, someone self-reports as a 3.5 or a 4.0, they lose matches, and that brings them down within their category but not enough to drop them a whole level (who knows what needs to happen before you are dropped down?), but badly enough that they no longer enjoy losing, so they stop playing rated tournaments and still keep their 3.5 or 4.0 rating. And even beyond the insanity of self-rating, I am still not clear if people get bumped down by the same criteria as getting bumped up. Do we even know the criteria for either case? Is it as soon as someone hits 3.51 that they get a "strike", or do they actually have to hit 3.55 before they get a strike? Is it the same for getting bumped down? If we don't know, then how do you know what the actual computer rankings of people are? You would be assuming that the games led to a 3.51 or higher rating for a 4.0 player when in fact the games may have led to a 3.40 rating, but they still didn't get dropped because it was within the tolerance of the ratings committee. So that 4.0 player may actually be losing about 63 percent of his games to a 3.5 player, and that doesn't include the "tolerance" up for 3.5 players!

The USTA refuses to say, but I wonder if you have reverse engineered this.


"25. How high can my dynamic rating go before I earn a “strike”? The Dynamic NTRP system allows a certain tolerance for player improvement— more for lower level players where rapid improvement is more likely; less for higher-level players. The specific improvement factor is not published because of concerns that individuals, captains or others may attempt to manipulate their ratings."


So not only are the strikes a mystery but how the appeals are handled is a mystery. You get bumped up by some mysterious 3 strikes and then appeal it. What are the criteria for winning the appeal? Are they objective or subjective? But of course for the integrity of the entire system this break between levels is the important one. If, because of how the strikes are issued or appeals are decided, players are allowed to have computer rankings that are considerably higher than the stated ranking, then the difference could be considerably higher than 93%. This, combined with the importance of self-reporting for those who don't play competitively, or the way players might team up with a player who self-reports high until they get bumped down (which could artificially lower their rating), makes me suspect that the NTRP is largely nonsense on stilts.

But yes even if we ignored all these problems with the opacity of the system and how it might be gamed even if it were all transparent and fair then sure a large league in Atlanta is going to be able to choose 3.40-3.49 players that will reliably beat teams from smaller areas. I mean in chess if you knew your teams average rating was 1400 and you were competing against a team that had an average rating of 1650 then sure you would expect to lose. Now smaller leagues may win from time to time because the captain knows how to game the system but by and large this sort of event has limited potential. Any sensible person is not going to be especially motivated to try to win a tournament when they know they are by definition competing against people not very good. The fact that the system can be gamed makes it worse. Compare this to how a legitimate (transparent and proven) and objective rating system might motivate people and I think it is obvious the legitimate rating system is the better route to motivate the vast majority of players.

In tennis (as with chess) you might be playing with the same group for years. And you find you are not beating anyone more often in that same group or some of them are beating you. This may lead you to think you are not improving or even getting worse. When in fact you are getting better it is just that the people you play with are getting better as well. That is why I tell people that play chess to get a rating. That is an objective measure of improvement that can motivate them to continue. In fact it is by far a bigger motivating factor for most chess players than the money tournaments that have rating classes. This NTRP system with its opacity and all the incentives to sandbag as opposed to increase ratings is about as demotivating as you could possibly design. I mean people should recognize that there is something wrong with a system that talks about all sorts of safeguards against having an artificially low rating but very few about having an artificially high rating. Shouldn't most people be focused on improving their rating instead of lowering it?

BTW I think the UTR has big issues as well. The commercials are Orwellian and there are very valid concerns raised in this thread that they should address in the open. I have not seen any sort of real nuts and bolts discussions. All of the interviews I have heard have been fluff and claims without evidence. They say their system is better but I am not seeing them announce predictions *in advance* that would contradict the other systems and demonstrate they have a superior rating system. It doesn't help that "oracle" runs the algorithm.
"The oracle says your rating is 5.67"
"How does the oracle know that?"
"I'm from Harvard and I was involved plus we consulted the oracle."

I realize they may not want to give the nuts and bolts of everything. But transparency is very important to legitimacy. The chess ratings have huge legitimacy because they are transparent and their predictive capacities have been demonstrated again and again. Many amateur chess players greatly value their chess rating and it motivates them to play and improve perhaps more than anything else an organization can do. Ratings play a key role in titles etc. It is only when you have a legitimate and objective rating system that people will care about it but once you have it you have a very strong tool for motivating players.

That said I do support UTR because it is proposing a model that, if done right, has a chance of being transparent enough to be legitimate. And I am willing to give them some slack early on as they are trying to work out the kinks. The adult amateur game is different than college, youth and pros. They will have some hard decisions to make. Plus COVID hit at exactly the wrong time for them. I just wish they were more transparent.
 

schmke

Legend
Lots of stuff to comment on. I'll select a few items :)

I don't think play between 3.4x and 3.5x is under-represented for a few reasons.

First, players do play up. Many teams have a player or two, a few teams even have around half the roster filled with players playing up. This certainly results in some matches between <=3.50 and >3.50 players.

Second, and more importantly, a 3.5 is necessarily in the range 3.01-3.50 only at the end of the year for which they obtained that rating. As they play matches in the following year, their dynamic rating will change (the USTA calculates dynamic ratings every night), so a 3.4x who has a few good results to start the year may very well be a 3.5x, and then even a 3.5 match against a 3.4x becomes a 3.5x vs a 3.4x. And some players' dynamic ratings will go even higher than that during the year, so 3.6x vs 3.3x (or lower) in a 3.5 match is not uncommon.

So the head to head ratings in a match are not as compartmentalized as you think.

For this next section I think it is useful to differentiate between the rating to the half point that the USTA publishes, and the dynamic rating to the hundredth that is calculated daily but not published. I refer to the former as a player's level, and the latter as their rating.
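To pin down the level versus rating distinction, here is a small sketch of how a rating to the hundredth maps to a published level, assuming each level covers the half-point band ending at it (3.01-3.50 is a 3.5, 3.51-4.00 is a 4.0). The function name is mine, and this only describes the year-end banding, not how strikes or appeals move anyone.

```python
def ntrp_level(dynamic_rating):
    """Published half-point level for a rating given to the hundredth."""
    hundredths = round(dynamic_rating * 100)  # integer math avoids float edge cases
    half_points = -(-hundredths // 50)        # ceiling division by 0.50
    return half_points / 2


for rating in (3.01, 3.49, 3.50, 3.51):
    print(rating, "->", ntrp_level(rating))
```

So a 3.49 and a 3.51 are only 0.02 apart in rating but a full level apart as published, which is the break the questions above are about.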

Regarding self-rates, they have no rating to start, and only obtain a rating as they play matches against other players with ratings. Their self-rating is just used to determine what level they can play, not an actual starting rating. This is different from some Elo based systems where players are given a default rating and if it is inaccurate can lead to point inflation/deflation or scenarios like you describe. So self-rates with NTRP don't have this issue, at least not to the degree a system that gives players default ratings does.

That doesn't mean players aren't getting better or declining, and naturally their rating and level will be a trailing indicator of that. The published NTRP level can be significantly off both because it is so general and because it is only updated/published yearly (unless there is a pandemic and the USTA makes the wrong decision to not publish, or unless they publish early start lists for early start leagues, but oh yeah, they did away with that too), but even the dynamic rating can be a lagging indicator for someone who has improved a lot and continues to do better than expected. But this is the same with any rating algorithm for the most part, short of an algorithm that would look at recent trends and extrapolate where they are headed and preemptively establish that as their current rating.

Regarding strikes, the USTA is very opaque about the thresholds, but experience and some documents (that may be dated) one can find would seem to indicate 2.5s can play more or less a full level above their current level and not be DQ'd, and the threshold gets smaller as the level goes up but even a 4.5 can likely play a few tenths above their level without being DQ'd. The rationale is that the USTA doesn't want to punish someone who self-rated fairly, but naturally improves. That natural improvement is anticipated to be quicker at lower levels, thus the relatively higher thresholds.

Of course, when they don't publish at 2020 year-end, that creates a glut of self-rated players that now have two years to improve increasing the chances DQ's will happen, and as has been discussed here and on my blog, the DQ rates do seem to be up.

There are two types of appeals, automated and manual. Automated appeals do have objective criteria (although the USTA may change them from year to year and is again pretty opaque about what they are), and if a player's rating after the year-end calculations are done meets the criteria (say, no more than 0.05 above the bottom of the level), they can click a button on TennisLink and have the appeal granted. Separately, someone can file a manual appeal that is reviewed by a district/section rep or committee; these are generally done for medical appeals or self-rate appeals where the guidelines/questionnaire slotted a player at what is believed to be the wrong level.

If you haven't yet, find my FAQ as it covers parts of this and more.
 

Moon Shooter

Hall of Fame
Lots of stuff to comment on. I'll select a few items :)

I don't think play between 3.4x and 3.5x is under-represented for a few reasons.

First, players do play up. Many teams have a player or two, a few teams even have around half the roster filled with players playing up. This certainly results in some matches between <=3.50 and >3.50 players.

When you say players "play up" you mean players that have a computer rating at or below their level's maximum playing against others who are a level above, right? Or are you saying people can "play up" when their actual computer dynamic rating is actually what people in the higher level are allowed to have as well? See, it seems that there is considerable overlap both upwards and downwards between level and computer rating. What this actually means as far as how the ratings may be distorted will depend on what the USTA does at year start with people's computer ratings that are above or below what they "should" be for their level. And by "should" be for that level I mean the notion that a computer rating of 3.51-4.00 means someone should be level 4.0 and someone with a computer rating of 3.01 to 3.50 should be level 3.5. Until we know how this works I am not sure this helps or hurts.

Second, and more importantly, a 3.5 is necessarily in the range 3.01-3.50 only at the end of the year for which they obtained that rating. As they play matches in the following year, their dynamic rating will change (the USTA calculates dynamic ratings every night) and so a 3.4x who has a few good results to start the year very well may be 3.5x and then even a 3.5 match against a 3.4x becomes a 3.5x vs a 3.4x. And some players dynamic rating will go even higher than that during the year so 3.6x vs 3.3x (or lower) in a 3.5 match is not uncommon.

So the head to head ratings in a match are not as compartmentalized as you think.


Thanks for taking the time to explain some of this. I have a few questions.

Let's say someone plays as a 3.5 and during the year their dynamic rating goes to 3.5 plus X. Now if X is under some threshold they do not even get a strike. But if X is above a certain threshold, do they get a strike right away or do they get it at the end of the year? Maybe they get notified only at the end of the year? Will they get a new strike after every tournament that ends with them having a dynamic rating above a certain threshold, or will they only get a strike if 1) the performance rating for that tournament is above the threshold and 2) after the event they have a dynamic rating above the necessary threshold? So if someone has 3 matches where their performance rating was above a certain threshold and after each of those matches their dynamic rating was above a certain threshold, then they would get all three strikes in one year and be informed at the end of the year that they are disqualified. This seems to be what is happening more now because they are combining years, right?

"For this next section I think it is useful to differentiate between the rating to the half point that the USTA publishes, and the dynamic rating to the hundredth that is calculated daily but not published. I refer to the former as a player's level, and the latter as their rating.

Regarding self-rates, they have no rating to start, and only obtain a rating as they play matches against other players with ratings. Their self-rating is just used to determine what level they can play, not an actual starting rating. This is different from some Elo based systems where players are given a default rating and if it is inaccurate can lead to point inflation/deflation or scenarios like you describe. So self-rates with NTRP don't have this issue, at least not to the degree a system that gives players default ratings does."

Ok, I am not sure I understand how this would work unless the first and only match with a new self reported player never counts for anyone except the self reported player. For example, let's take a doubles match. Let's say Team 1 has Player A and Player B. Player A is new and self rated. Player A rates himself as a level 3.5 (played in a bad high school program 15 years ago but was on the top doubles team) but is out of shape and really a 2.5 rated player. Player B on Team 1 has a computer rating of 3.45. Now both players on Team 2 have computer ratings of exactly 3.22. They play a match. And let's say they all try their best and the outcome is whatever it would take for Player A on Team 1 (the new player) to get a 2.41 performance rating. Obviously Player A really hurt his team and is just glad it's over and decides not to play any more matches.

So now what happens to the ratings of team 2 and player B from team 1? If the outcome is exactly what you would expect given the new player's new provisional rating, then their ratings shouldn't be affected at all, right? And since you are assigning that new player's rating at that time, that will always be the case. So are you saying no one's rating is ever affected when a new self rated player is involved in a single match, except for the self rated player? Let's say player A from team 1 plays one more match (and only one more match) and no one from the first match is involved. If player A performs above 2.41 in that second match, will player B's rating go down and team 2's ratings go up? If player A performs worse in his second match, then player B's rating goes up and team 2's ratings go down? Or will no one but player A's rating be affected until player A gets an established computer rating? If he does play enough to get that established rating, will that retroactively affect the prior teams' ratings?

That doesn't mean players aren't getting better or declining, and naturally their rating and level will be a trailing indicator of that. The published NTRP level can be significantly off, both because it is so general and because it is only updated/published yearly (unless there is a pandemic and the USTA makes the wrong decision to not publish, or unless they publish early start lists for early start leagues, but oh yeah, they did away with that too). Even the dynamic rating can be a lagging indicator for someone who has improved a lot and continues to do better than expected. But this is the same with any rating algorithm for the most part, short of an algorithm that would look at recent trends, extrapolate where they are headed, and preemptively establish that as the player's current rating.

Regarding strikes, the USTA is very opaque about the thresholds, but experience and some documents (that may be dated) one can find would seem to indicate 2.5s can play more or less a full level above their current level and not be DQ'd, and the threshold gets smaller as the level goes up but even a 4.5 can likely play a few tenths above their level without being DQ'd. The rationale is that the USTA doesn't want to punish someone who self-rated fairly, but naturally improves. That natural improvement is anticipated to be quicker at lower levels, thus the relatively higher thresholds.

Wow! So in 6.0 doubles the 2.5 player could be an entire level higher and the 3.5 player could be a few tenths higher? Obviously a properly leveled team even if they are at the very top of the bracket 2.49 and 3.49 would have very little chance at all of winning according to your calculations. You are saying between .25-.35 gives a 83% chance of winning and a level up gives a 93% chance. Am I looking at that correctly?
 

Moon Shooter

Hall of Fame
Of course, when they don't publish at 2020 year-end, that creates a glut of self-rated players that now have two years to improve increasing the chances DQ's will happen, and as has been discussed here and on my blog, the DQ rates do seem to be up.

There are two types of appeals, automated and manual. Automated appeals do have objective criteria (although the USTA may change them from year to year and is again pretty opaque about what they are), and if a player's rating after the year-end calculations are done meets the criteria (say, no more than 0.05 above the bottom of the level), they can click a button on TennisLink and have the appeal granted. Separately, someone can file a manual appeal that is reviewed by a district/section rep or committee; these are generally done for medical appeals or self-rate appeals where the guidelines/questionnaire slotted a player at what is believed to be the wrong level.

If you haven't yet, find my FAQ as it covers parts of this and more.

I agree that the delays are a problem (and hard to excuse) but I also wonder if there are problems beyond that so I am not too focused on that short term issue.

I read the FAQ and thank you for publishing it. I also read some of your sample reports. In your reports you indicate a "prior" and a "start" rating. Now the prior rating would be the rating right before the last event that led to the player's current rating, right? I noticed the "start" rating is never above the level, at least in the samples you offer, but sometimes it is exactly at the level. So that was something else I was wondering about.

Lets say someone plays a tournament and it puts them above the computer/dynamic rating for their level but that is the last tournament for the year for them. So next year do they start with, say, 3.5 plus X or do they get reset to 3.50 even (assuming 3.50 is the lowest allowed rating for their level)? Now lets say X is high enough for them to get a strike. If the dynamic rating is not reset then do they get another strike every year even if they are not playing any tournaments? If not then lets say 2 years later they play one more match. Does that match have to yield a result below the threshold or they get another strike? If they don't keep getting strikes every year then they can simply not play rated games until the year they want to play in a big tournament and then they will not get additional strikes as they improve correct?

If the rating does reset to the level they are playing every year that is, of course, worse.

Does the rating reset if they win the appeal and get to stay at that level? (And by "reset" I mean their computer/dynamic rating goes to 3.50 or 4.00 or whatever the maximum for their level is.) If it doesn't and they keep playing at about the same level, will they be DQed again? In that case, if you win an appeal then you need to immediately start performing worse or you will be DQed again. If it is reset for you, is it reset for the competition and teammates that you play with and against thereafter?


And I wonder about what happens to people who perform worse than their level. What is the threshold to tell them they have been demoted? And does that reset every year? (Here the reset would be to put the computer rating/dynamic rating at the minimum needed to remain at their level.) So if someone claims to be a 3.5 but through the year they get a record that demonstrates they are a 2.73 player, will they get a strike? If they don't, do they get reset to 3.01 at the start of the year so anyone that pairs with them in doubles will effectively have their rating reduced?

It seems to me that all these rules with thresholds for strikes combined with presumably even further thresholds for DQs that are appealed, and further combined with unknown thresholds for demotion can distort things pretty well at the important/officially published break lines. I mean it seems you can have a rating a few tenths of a point above 3.50 and be officially considered a 3.5 level player and a few tenths of a point below 3.50 and be officially considered a 4.0 level player. If that is correct then instead of the official 4.0 level player reliably double bageling the official 3.5 level player this is reversed. The official 3.5 level player will reliably double bagel the officially 4.0 level player and neither will be promoted or demoted. Am I seeing this correctly?


I am not sure I see what benefit there could be in such a rating system, but obviously such officially skewed views of player skills could negatively affect amateur enthusiasm to improve their official rating. It seems the only people that 1) really understand this system and 2) still care about it are those trying to game the tournaments where there are supposed to be caps on tennis ability. I won a tennis tournament against people that should have been disqualified if they were good at tennis. I mean, we have that in chess as well, and sometimes there is decent money involved at lower levels. But it is not motivating enough for enough people to dominate the rating system. For the most part the vast majority of chess players are very motivated to improve their rating because they know higher rating equates to higher skill. The vast majority of big concerns in chess ratings are about people cheating or doing things that will artificially inflate their rating. The main concerns in tennis (at least as far as the FAQ goes) seem to involve safeguards against people trying to artificially lower their rating. Am I getting this right? Does this not seem perverse to you?
 

schmke

Legend
When you say players "play up" you mean players that have a computer rating at or below their levels maximum playing against others who are a level above right. Or are you saying people can "play up" when their actual computer dynamic rating is actually what people in the higher level are allowed to have as well? See it seems that there is considerable overlap both upwards and downwards between level and computer rating. What this actually means as far as how the ratings may be distorted will depend on what the USTA does at year start with peoples computer ratings that are above or below what they "should" be for their level. And by "should" be for that level I mean the notion that the computer rating 3.51-4.00 means someone should be level 4.0 and someone with a computer rating of 3.50 to 3.01 should be level 3.5. Until we know how this works I am not sure this helps or hurts.
Playing up is when a player with say a 3.5 level plays on a 4.0 team. The USTA's rules allow this, generally speaking a team's roster must be at least half at-level players, but some areas or clubs will be stricter and require more players to be at-level. But as a result, you will have 3.5 level players playing 4.0 level players more than you might otherwise think. Of course, at the time of the match, the current dynamic rating for the 3.5 could be higher than the 4.0s current dynamic rating, or there could be a huge disparity between their ratings.

Lets say someone plays as a 3.5 and during the year and their dynamic rating goes to 3.5plus X. Now if x is under some threshold they do not even get a strike. But if X is above a certain threshold, do they get a strike right away or do they get it at the end of the year? Maybe they get notified only at the end of the year? Will they get a new strike after every tournament that ends with them having a dynamic rating above a certain threshold or will they only get a strike if 1) the performance rating for that tournament is above the threshold and 2) after the event they have a dynamic rating above the necessary threshold. So if someone has 3 matches where their performance rating was above a certain threshold and after each of those matches their dynamic rating was above a certain threshold then they would get all three strikes in one year and be informed at the end of the year that they are disqualified. This seems to be what is happening more now because they are combining years right?
Only self-rated and appeal players are subject to 3-strike DQs. So just to be clear, if a player with a 2019 year-end level of 3.5 is playing now, they can't get strikes or be DQ'd. The 2019 year-end 3.5 is considered computer rated, or a 3.5C, and will remain a 3.5 until 2021 year-end, when the USTA publishes the new year-end levels and our hypothetical player would be bumped to 4.0C if their year-end rating falls in the range of 3.51-4.00.

For self-rated (3.5S) or appeal (3.5A) players, what you outline is correct. They may have their dynamic rating rise to 3.55, but if the strike threshold is 3.60 they do not get a strike. If their rating is over the strike threshold after 3 separate matches, they get a 3-strike DQ. It is only when the 3rd strike occurs that the player is notified, so they do not know when they may have accrued strikes. The DQ does happen right away though, it does not wait for year-end. If a player gets a 3-strike DQ, their level will change to 4.0D and they will remain that until year-end when year-end levels are published and they would likely then become a 4.0C.
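
To make the strike counting above concrete, here is a minimal Python sketch. The 3.60 threshold is only the example figure used above (the USTA does not publish the real thresholds), and the function name and the sample ratings are made up for illustration; this is not the USTA's actual code.

```python
from typing import Iterable, Optional


def first_dq_match(ratings_after_matches: Iterable[float],
                   strike_threshold: float,
                   strikes_for_dq: int = 3) -> Optional[int]:
    """Return the 1-based match number at which the DQ happens, or None.

    A strike is counted each time the dynamic rating after a match is over
    the threshold; strikes do not need to come from consecutive matches.
    """
    strikes = 0
    for match_number, rating in enumerate(ratings_after_matches, start=1):
        if rating > strike_threshold:
            strikes += 1
            if strikes == strikes_for_dq:
                # The DQ takes effect right away, not at year-end.
                return match_number
    return None


# Hypothetical dynamic ratings for a 3.5S player after each match.
season = [3.48, 3.55, 3.62, 3.58, 3.64, 3.61]
print(first_dq_match(season, strike_threshold=3.60))
```

In this made-up season the 3.55 after the second match earns no strike because it is under the assumed 3.60 threshold; the strikes come after matches 3, 5 and 6, so the DQ lands on the sixth match.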

And yes, the self-rated and appeal players that are improving are more likely to accrue strikes and be DQ'd since they now have 2 years during which their rating could surpass the strike threshold.

Ok I am not sure I understand how this would work unless the first and only match with a new self reported player never counts for anyone except the self reported player. For example lets take a doubles match. Lets say Team 1 has player A and Player B. Player A is new and self rated. Player A rates himself as a level 3.5 (played in a bad high school program 15 years ago but was on the top doubles team) is out of shape and really a 2.5 rated player. Player B on Team 1 has a computer rating of 3.45. Now team 2 both have computer ratings of exactly 3.22. They play a match. And lets say they all try their best and the outcome is whatever it would take for player A on team 1 (the new player) to get a 2.41 performance rating. Obviously player A really hurt his team and is just glad its over and decides not to play any more matches.

So now what happens to the ratings of team 2 and player B from team 1? If the outcome is exactly what you would expect given the new player's new provisional rating, then their ratings shouldn't be affected at all, right? And since you are assigning that new player's rating at that time, that will always be the case. So are you saying no one's rating is ever affected when a new self rated player is involved in a single match, except for the self rated player? Let's say player A from team 1 plays one more match (and only one more match) and no one from the first match is involved. If player A performs above 2.41 in that second match, will player B's rating go down and team 2's ratings go up? If player A performs worse in his second match, then player B's rating goes up and team 2's ratings go down? Or will no one but player A's rating be affected until player A gets an established computer rating? If he does play enough to get that established rating, will that retroactively affect the prior teams' ratings?
Matches with/against a self-rated player don't count towards the dynamic rating for the other players in the match unless the self-rated player has played a handful of matches and generated their own dynamic rating. Those early matches are supposed to get factored in as part of the year-end calculations and presumably look at where the self-rated player ended up to go back and calculate what that means for the other players in the match. This "going back" to recalculate prior matches does not happen during the year though.
 

schmke

Legend
I agree that the delays are a problem (and hard to excuse) but I also wonder if there are problems beyond that so I am not too focused on that short term issue.

I read the FAQ and thank you for publishing it. I also read some of your sample reports. In your reports you indicate a "prior" and a "start" rating. Now the prior rating would be the rating right before the last event that led to the player's current rating, right? I noticed the "start" rating is never above the level, at least in the samples you offer, but sometimes it is exactly at the level. So that was something else I was wondering about.

Lets say someone plays a tournament and it puts them above the computer/dynamic rating for their level but that is the last tournament for the year for them. So next year do they start with, say, 3.5 plus X or do they get reset to 3.50 even (assuming 3.50 is the lowest allowed rating for their level)? Now lets say X is high enough for them to get a strike. If the dynamic rating is not reset then do they get another strike every year even if they are not playing any tournaments? If not then lets say 2 years later they play one more match. Does that match have to yield a result below the threshold or they get another strike? If they don't keep getting strikes every year then they can simply not play rated games until the year they want to play in a big tournament and then they will not get additional strikes as they improve correct?

If the rating does reset to the level they are playing every year that is, of course, worse.

Does the rating reset if they win the appeal and get to stay at that level? (And by "reset" I mean their computer/dynamic rating goes to 3.50 or 4.00 or whatever the maximum for their level is.) If it doesn't and they keep playing at about the same level, will they be DQed again? In that case, if you win an appeal then you need to immediately start performing worse or you will be DQed again. If it is reset for you, is it reset for the competition and teammates that you play with and against thereafter?


And I wonder about what happens to people who perform worse than their level. What is the threshold to tell them they have been demoted? And does that reset every year. (here the reset would be to put the computer rating/dynamic rating at the minimum needed to remain at their level) So if someone claims to be a 3.5 but through the year they get a record that demonstrates they are a 2.73 player will they get a strike? If they don't do they get reset at 3.01 at the start of the year so anyone that pairs with them in doubles will effectively have their rating reduced?

It seems to me that all these rules with thresholds for strikes combined with presumably even further thresholds for DQs that are appealed, and further combined with unknown thresholds for demotion can distort things pretty well at the important/officially published break lines. I mean it seems you can have a rating a few tenths of a point above 3.50 and be officially considered a 3.5 level player and a few tenths of a point below 3.50 and be officially considered a 4.0 level player. If that is correct then instead of the official 4.0 level player reliably double bageling the official 3.5 level player this is reversed. The official 3.5 level player will reliably double bagel the officially 4.0 level player and neither will be promoted or demoted. Am I seeing this correctly?


I am not sure I see what benefit there could be in such a rating system, but obviously such officially skewed views of player skills could negatively affect amateur enthusiasm to improve their official rating. It seems the only people that 1) really understand this system and 2) still care about it are those trying to game the tournaments where there are supposed to be caps on tennis ability. I won a tennis tournament against people that should have been disqualified if they were good at tennis. I mean, we have that in chess as well, and sometimes there is decent money involved at lower levels. But it is not motivating enough for enough people to dominate the rating system. For the most part the vast majority of chess players are very motivated to improve their rating because they know higher rating equates to higher skill. The vast majority of big concerns in chess ratings are about people cheating or doing things that will artificially inflate their rating. The main concerns in tennis (at least as far as the FAQ goes) seem to involve safeguards against people trying to artificially lower their rating. Am I getting this right? Does this not seem perverse to you?
Lots above, but I'll try to be brief.

Dynamic ratings are calculated only from "advancing" adult (not mixed) league play, not from tournaments or "other" leagues. Tournaments and "other" league matches may be included in year-end calculations if the section they are played in has elected to include them. Players that play only mixed will get an "M" level at year-end. If a player plays at least three non-mixed matches, their mixed matches won't be factored in and they'll get a "C" level at year-end, or perhaps a "T" if they only played tournaments.

In my reports, "start" is the rating they started the year (rating period) at and prior is the rating they had before their most recent match. By definition, start will be in the range of their level, as start is effectively their year-end rating to the hundredth that led to the year-end level being published. There is no resetting to the top/bottom/middle of the range for their level.

In a way, you can think of the dynamic rating as the continually calculated rating to the hundredth that continues year to year, and the year-end level that is published is just a snapshot (with some year-end calculations the USTA does) at a point in time (generally early November) that establishes that players level for the following year/rating period.
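
As a rough illustration of the level vs. rating distinction, here is a small Python sketch. It assumes only the ranges mentioned in this thread (3.01-3.50 is a 3.5, 3.51-4.00 is a 4.0); the function name and the sample numbers are hypothetical, and the real year-end process includes extra calculations that are not modeled here.

```python
def level_for_rating(rating: float) -> float:
    """Map a rating to the hundredth onto the half-point level it falls in.

    Work in integer hundredths so that 3.50 stays a 3.5 and 3.51 becomes
    a 4.0 without floating point surprises at the boundary.
    """
    hundredths = round(rating * 100)
    half_points = -(-hundredths // 50)  # ceiling division by 0.50
    return half_points / 2


# Year-end snapshot: this fixes the published level for the next year.
start_rating = 3.48
published_level = level_for_rating(start_rating)

# The dynamic rating keeps moving during the year; the level does not.
current_dynamic_rating = 3.62
print(published_level, level_for_rating(current_dynamic_rating))
```

The player in this example is still a published 3.5 all year even though the current dynamic rating already sits in the 4.0 range, which is how a "3.5" can be the favorite against a "4.0".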

And yes, while the very first match of the year may adhere to the 4.0 should beat the 3.5, after that, the "3.5" may very well have a higher dynamic rating than the "4.0" and be expected to win and have their dynamic rating go down if they don't.
 

Moon Shooter

Hall of Fame
Schmke

First and most importantly, thank you for your patience in explaining this!

Dynamic ratings are calculated only from "advancing" adult (not mixed) league play, not from tournaments or "other" leagues.


Ok I think what you are saying is there can be advancing mixed adult league play, but it is not rated. So players that play *only* mixed adult advancing league play and no other play that would be rated will always keep their self rating. So when people play in the USTA 6.0 mixed doubles national championship there may never be any actual checking of their true rating unless they lied on their original sheet and someone challenges it? Are they at least supposed to self report increases in playing ability before they sign up for a tournament like that?

I am assuming that mixed doubles are never rated because the USTA men's and women's rating systems are completely different and they don't feel like they can figure out how they might compare, or for whatever reason choose not to.


As far as self rating then as soon as you play a certain number of games you are then no longer self rated but computer rated. At that point the person is placed into their correct level and unless they appeal it they will simply go into the level the dynamic rating dictates their level with no tolerances up or down. I would assume it is a small percentage of players that have an appealed rating. The vast majority of players who play enough games to get a computer rating likely just let that dictate their level.

If you are playing in a rated league but one of the players in a doubles match is self rated and does not play a sufficient number of games in that year to get a computer rating, that match will not affect any of the other 3 players' ratings.

In your blog you say:

"Second, we can look at the number of players who have played in the same period. The counts by year are 126K in 2019, 124K in 2020, and 86K in 2021. Here there is a 31% drop from 2020 to 2021, basically mirroring the drop in team matches, so no surprise."

Is that the number of adults playing in USTA games that would be rated? Or does that include mixed doubles play and other non rated tournaments/competitions? I'm just curious how many players participate in rated play every year.
 

schmke

Legend
Players that play only mixed will get a rating, but it is an "M" rating based only on their mixed play. If they sign up for an adult league after that, they are required to self-rate again, and it is supposed to be no lower than their M rating. Whether TennisLink enforces that well or not, who knows.

So, if a 3.5S plays mixed on a 6.0 team and does very well, they could get a 4.0M rating at year-end. If they subsequently sign-up to play on an adult (same gender) team, they would be required to self-rate as a 4.0S and play on a 4.0 team. If they sign-up for the adult team during the same year as the mixed team before ratings are published, they are still a 3.5S and could play on a 3.5 team.

I believe the stats you quoted were all from 18+/40+ advancing leagues.
 
For the numbers people, this is interesting maybe, frankly not even my tennis player friends would care that much so I will share here for lack of a better place. I have no verified UTR events the past year and a half, also no singles results at all UTR or usta that same time period. I have tons of USTA doubles, one doubles tourney win and numerous USTA doubles wins, doubles state results. I usually get put at #1 or #2 doubles for my teams the past 3 seasons with rotating partners, but my partners all have 1 full doubles UTR higher than me, including the opponents, so for doubles the system probably thinks I am being carried. I'm not sure my team would feel that way, not that important though. My singles UTR has gone down 1.3 points in "decay". All my doubles partners and opponents are doubles rated 2 points higher now than my singles UTR, some of them 3 points higher. I'm going to play this weekend against kids with my UTR that I think should be 2 points higher. What I think doesn't matter unless I really focus on winning, I will be nervous knowing I need to win as lopsided as possible to move that needle. Thanks for indulging my short story lol.
 

Moon Shooter

Hall of Fame
For the numbers people, this is interesting maybe, frankly not even my tennis player friends would care that much so I will share here for lack of a better place....


I agree that arguing a legitimate and transparent rating system will help increase tennis's popularity is not easy. But consider this. Catholic Priests do not think the chastity requirement is too onerous. Now this may seem to be a good argument but then again those who are priests by definition hold that view or at least held that view or they wouldn't be a priest. In other words if you only ask people who accept the conditions you will find that yep they find the conditions acceptable. I am not saying priests should be allowed to marry I am just saying that only asking them is a skewed sample.

Likewise people who play tennis may not care very much that there are no good objective measures of improvement or strength among amateurs. But of course that is the case because they are playing tennis even though there is not a good objective measure of their improvement or strength.

This is why I think we should at least consider other areas and the role a rating plays in those other areas. Just about every chess website offers you a rating as you play. Every national program has a very good and transparent rating system. Now if a rating system had never been done in chess, it is true some people would still play amateur chess. And if you asked those people who do play chess in that imagined world where ratings never existed, they would probably say, well, it's not that important. And in the chess world as it is now there are some people that still don't care about their rating. But there are also a *huge* number of players motivated to improve their rating. This is why every chess playing website offers a legitimate rating. So if you somehow just removed the rating system from all of chess there would undoubtedly be a huge drop in chess participation. I mean, if you opened a chess playing website that did not offer ratings people would think you are crazy.

Now I am not saying a rating system will be as important for tennis as it is for chess. I don't think we can know that until we have a good rating system established in amateur tennis for a few generations. But I think offering a legitimate and transparent rating system will almost certainly help at least some. If people don't want to play rated games they don't have to - in chess or tennis. But not even offering a legitimate and transparent rating system in tennis seems hard to justify.

I have no verified UTR events the past year and a half, also no singles results at all UTR or usta that same time period. I have tons of USTA doubles, one doubles tourney win and numerous USTA doubles wins, doubles state results. I usually get put at #1 or #2 doubles for my teams the past 3 seasons with rotating partners, but my partners all have 1 full doubles UTR higher than me, including the opponents, so for doubles the system probably thinks I am being carried. I'm not sure my team would feel that way, not that important though. My singles UTR has gone down 1.3 points in "decay". All my doubles partners and opponents are doubles rated 2 points higher now than my singles UTR, some of them 3 points higher. I'm going to play this weekend against kids with my UTR that I think should be 2 points higher. What I think doesn't matter unless I really focus on winning, I will be nervous knowing I need to win as lopsided as possible to move that needle. Thanks for indulging my short story lol.

I am trying to suggest that amateur tennis would benefit from an accurate and objective rating system. I am not saying that UTR is that system. I don't know if it is. The rest is likely to be of interest only to numbers people.


I suspect that the decay timelines they put on ratings may make more sense for pros and youth than they do for adult amateurs. I think they may want to either use different rules for adult amateurs, or leave the rating the same but use an increased k factor for a player that has been out and a reduced k factor for their opponents/partners. The larger the k factor the more each game will have an effect. So if an adult has not posted a rated game for 2 years it may be they were injured and possibly will never fully recover. Or it is possible they do fully recover. It is also possible they just haven't been playing rated games but have been playing lots of tennis clinics and improving all along! If their rating goes down due to some arbitrary decay factor no one concerned with improving their rating will want to play them! Why? Because if their top rating was say a 6 but they decayed to a 4 but in fact they improved above a 6 you are going to end up with a deflated rating after that match. If you increase the k factor for the person that was out then if they really did lose or gain in the meantime their rating will quickly adjust. Likewise from the opponent's view: if the returning player's strength changed dramatically, reducing the opponent's k value means the opponent will not take a big hit or a big windfall from that match.
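A rough sketch of that k factor idea in Python, assuming a plain Elo-style update. This is not how UTR works as far as anyone outside knows. The function names, the base k of 0.3, the scale of 2.0 rating points, and the 12 and 24 month thresholds are all made up for illustration.

```python
def expected_score(rating, opp_rating, scale=2.0):
    """Elo-style win probability; `scale` is an assumed spread in rating points."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / scale))


def inactivity_multiplier(months_out):
    """Hypothetical rule: the longer a player has been out, the bigger the multiplier."""
    if months_out >= 24:
        return 3.0
    if months_out >= 12:
        return 2.0
    return 1.0


def update(rating_a, rating_b, a_won, months_out_a=0, months_out_b=0, base_k=0.3):
    """Return the new (rating_a, rating_b) after one match.

    A player's k grows with their own time away and shrinks with the
    opponent's time away, so a stale rating corrects quickly without
    handing the active opponent a big hit or windfall.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    mult_a = inactivity_multiplier(months_out_a)
    mult_b = inactivity_multiplier(months_out_b)
    k_a = base_k * mult_a / mult_b
    k_b = base_k * mult_b / mult_a
    new_a = rating_a + k_a * (score_a - exp_a)
    new_b = rating_b - k_b * (score_a - exp_a)
    return new_a, new_b


# A 6.0 who has been out 2 years loses to an active 5.0.
returning, active = update(6.0, 5.0, a_won=False, months_out_a=24)
print(round(returning, 2), round(active, 2))
```

With these made-up numbers the returning player's rating moves about nine times as far as the active opponent's, and no decay is applied while the player is away.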

I also think the accuracy of the system can be tested and refined fairly easily. I think both Schmke and I seem to agree on that. Your rating system is supposed to predict who is stronger, so you compare how your system's predictions match with what happens in reality! I suspect they will find their decay algorithm is hurting their rating system for adult amateurs. People like Schmke or Jeff Sonas would be able to tell them this if they were transparent about the system. But if they want to keep the system under wraps then they will not be able to have outside help or the legitimacy that comes from transparency.
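One common way to score predictions against results is a Brier score, sketched below in Python. This is only an illustration: the logistic prediction formula, the scale of 2.0, and the sample matches are assumptions, not anything published by UTR or the USTA.

```python
def win_probability(rating_a, rating_b, scale=2.0):
    """Assumed Elo-style chance that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def brier_score(matches, predict=win_probability):
    """Mean squared error between predicted win chance and what happened.

    `matches` is a list of (rating_a, rating_b, a_won) tuples.
    Lower is better; always predicting 50/50 scores exactly 0.25.
    """
    total = 0.0
    for rating_a, rating_b, a_won in matches:
        outcome = 1.0 if a_won else 0.0
        total += (predict(rating_a, rating_b) - outcome) ** 2
    return total / len(matches)


# Made-up results, each rated once with decayed and once with undecayed ratings.
with_decay = [(4.0, 5.0, True), (5.5, 5.0, True), (4.5, 6.0, False)]
without_decay = [(6.0, 5.0, True), (5.5, 5.0, True), (4.5, 6.0, False)]
print(brier_score(with_decay), brier_score(without_decay))
```

Running the same match results through two versions of the ratings (for example with and without decay) and comparing the scores would show which version predicts better.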

Schmke ran some numbers that tend to show the ntrp is doing a decent job. But clearly this NTRP is not transparent and seems to be separated between men and women as well as many other potential loopholes that to my mind condemn its legitimacy. I am not surprised people don't put much stock in it. (of course that is a relative) UTR seems to at least be on the right track. But they also seem to be making some decisions that are bewildering and not offering answers. But to be fair to any rating system it will only work if it has enough data. Many people are not submitting their matches to UTR to rate. So even if it is a good system if it has insufficient data it will not give accurate predictions.

If you are playing with people who have a higher UTR and doing well - and again doing well means not only winning matches but winning games in each match - then your UTR will go up. Your partner's UTR will also go up and your opponents' should go down. I think one problem is people seem not to understand that UTR is based not just on who won the match but on how many games were won. See even the original post here "My wife plays almost entirely singles in 4.0 18+ and 40+ USTA leagues, and she wins about 4 of 5 matches." Winning 4 of 5 matches doesn't really tell us *anything* about whether her UTR should go up or down. If she is playing weaker competition and having close matches her UTR will go down. I think sometimes tennis players may let up a bit when a set is 5-1. But I believe UTR judges those games with just as much weight.

So some of the problems may be the UTR system and some of the problems may be that people don't know how the system works and therefore find some short term changes that are confusing.
 
[/QUOTE]
Good points. Tennis needs ratings, just like it helps chess, golf, etc. Plus future generations will be very used to ratings and levels from video games, mobile games etc. Yeah, UTR unfortunately does require winning games or beating someone 0-0 to help the number move, but I've had success before and the max movement I saw was about .27 upwards from 3 matches. That decay of 1.4 ish hurts! ;)
 

S&V-not_dead_yet

Talk Tennis Guru
This is why I think we should at least consider other areas and the role a rating plays in those other areas. Just about every chess website offers you a rating as you play. Every national program has a very good and transparent rating system. Now if a rating system was never done in chess it is true some people would still play amateur chess. And if you asked those people who do play chess in that imagined world where ratings never existed they would probably say well its not that important. And in the chess world as it is now there are some people that still don't care about their rating. But there are also a *huge* number of players motivated to improve their rating. This is why every chess playing website offers a legitimate rating. So if you somehow just removed the rating system from all of chess there would undoubtedly be a huge drop in chess participation. I mean if you opened a chess playing website that did not offer ratings people would think you are crazy.

Chess ratings exist for the same reason as NTRP/UTR in tennis: so you can find competitive opponents. If I'm looking for a game of either, I don't want to play someone way above or below me; it wouldn't be a competitive match. Once the game begins, the rating is irrelevant. Sure, the results could cause big changes in both ratings but it's not a factor in the actual play [well, it shouldn't be. But we all know the feeling if we're beating or losing to someone we're not "supposed" to.]

If a rating system didn't exist, participants would come up with their own.

As far as the decay factor goes, in chess that was to address sandbagging. That's not particularly relevant to tennis juniors since they don't have the luxury of time like adults do [ie stop playing official tournaments for 2 years in preparation for a big money tournament]. And many chess tournaments, especially the big ones, offer money prizes for all divisions, not just the top, which provides more motivation to suppress one's rating.

The downside is that my chess rating is the same as when I was playing competitively which means if I entered a tournament, I'd not only get crushed but my opponents would wonder if there was a mixup in assignment.
 

Moon Shooter

Hall of Fame
Chess ratings exist for the same reason as NTRP/UTR in tennis: so you can find competitive opponents.

That is one important aspect but not the only one. Chess ratings are also an excellent objective measure of your skill at the game, which in turn can help you learn what actually improves your play. Study a bunch of openings: does that help your rating? If yes then you know that was time well spent; if not then you may want to focus on something else. The same would be true of tennis if they had a decent rating system. Practice and improve your serve versus your forehand or net game etc and then see how your rating does. Unfortunately the NTRP only changes once a year and will only change if there was a huge change in your play, so it is impossible to use it to gauge what exactly led to the most improvement.


If I'm looking for a game of either, I don't want to play someone way above or below me; it wouldn't be a competitive match. Once the game begins, the rating is irrelevant. Sure, the results could cause big changes in both ratings but it's not a factor in the actual play [well, it shouldn't be. But we all know the feeling if we're beating or losing to someone we're not "supposed" to.]

If a rating system didn't exist, participants would come up with their own.

As far as the decay factor goes, in chess that was to address sandbagging. That's not particularly relevant to tennis juniors since they don't have the luxury of time like adults do [ie stop playing official tournaments for 2 years in preparation for a big money tournament]. And many chess tournaments, especially the big ones, offer money prizes for all divisions, not just the top, which provides more motivation to suppress one's rating.

The downside is that my chess rating is the same as when I was playing competitively which means if I entered a tournament, I'd not only get crushed but my opponents would wonder if there was a mixup in assignment.

Chess ratings do not decay because if they decayed it would make the rating system less accurate. Sandbagging may be a consideration but I don't think it is even close to the primary one. If you had a rating of 1556 in 2010 and started playing in tournaments now you likely played at least a few unrated games since. Just like if someone left tennis for a few years they likely would not have the first ball they hit coming back be a serve in a rated tournament. They would likely play around a bit more casually to get the rust out. And in both tennis and chess some people have been playing many unrated games and improving quite a bit in the meantime. So assuming you never hit a single tennis ball since your last rated match is just a bad assumption, at least for adult amateur tennis. For youths maybe that assumption is better justified.

I also think because chess ratings are such a good indicator of strength people are proud of their rating and having it decay undermines that aspect. But even if ratings decay any rating system should keep track of a players highest rating as a sort of badge.


Yes chess offers some big money tournaments for people that are not high rated. And there are some people who try to win at those as they are rising in the ranks as well as a few that may be sandbagging. (In chess sandbagging tends to mean throwing games to get a lower rating, not just improving.) I played in a few and it was fun to play with the larger stakes. But because the chess ratings are so good and objective the vast majority of chess players are focused on increasing their rating, not sandbagging so they can win a thousand bucks at some tournament. Sandbagging is a concern for a tiny percentage of chess players, well below 1%. The biggest concern by far is that people are unfairly getting higher ratings. And this is reflected in the amount of time the USCF and other groups spend in trying to prevent people cheating to get higher ratings, as through computers etc. In chess the efforts to avoid sandbagging are minor and rare compared to the concern that people try to cheat to get higher ratings. In tennis it seems most of the discussion is about sandbagging concerns. I think that is because the tennis rating systems are such poor indicators of skill (for adult amateurs) anyway that few care much about them. In chess the rating system is legitimate so improving your rating is a big motivator for a large number of amateur players.
 
All right, for the benefit of this thread I small talked my opponents on changeovers, one had a "decay" of 1.3 UTR and another 1.1 and both said they hadn't played for about a year in a UTR event, both were teenagers. The good news is the matches were competitive, better than I feared, so it was fun, won a 2 setter and lost a 3 setter.

The bad news was the winds were 20-38 mph, literally, like I normally wouldn't complain, but you had to see it to believe it, my serve toss was acting like it was one of those ping pong balls in the NBA lottery machine or a bingo machine blowing around lol, it was challenging. Also, if you think maybe you are in good or great shape doing drills, practicing, and doing doubles matches, which I have been doing often every week for over a year, watch out, it was 90 degrees and I almost melted after 4 1/2 hours of singles.
 

ChaelAZ

G.O.A.T.
Maybe it is just my game decaying with my age and body, so UTR is just following suit and the truth sucks?

I do think there was some odd manual adjustment recently, watching not just my UTR drop .5, but all the people I have played this season dropping as well. I am not saying I somehow deserve to be a 4.5 or they are totally misrepresenting everything, but just noting something is whack there. But also, since NTRP is kinda whack, and UTR is using those matches (and maybe something in the USTA rating to somewhat match leveling for 3.0-4.5 lower level rec?) that is also creating issues like being seen in NTRP. I dunno. Just odd to see big shifts like that to me.


Annnnnd, now everyone is back up again? Something is weird there, with most all showing 5's and 4's, where they dropped to 4's and 3's when I posted.
 

ChaelAZ

G.O.A.T.
Now are you willing to listen to me that their ratings have nothing to do with "enough data points" and that they are mostly contrived and forced to fit in little boxes for the adult rec player?


Well, no and yes. The more data points you have the better the system works, especially when you calculate out to two decimal places. But that is really irrelevant, and you are right about fitting rec in tighter boxes, since the weird changes have got to be forcing rec players down for more room up top. (y)

I think they still need to work the system to create greater striation for top level players while scrunching all us rec folks into very low numbers. I see rec 4.5 might run up to UTR 9 (just looking around YT titles and what folks have posted here in the past) or some say higher at the moment, but that only leaves 6 rating points to differentiate a 4.5 player from a top level GOAT pro. That is ridiculous, and that was one of the first things I thought would be an issue with UTR, as it similarly is with NTRP, but NTRP just gave up at 6.0 and calls everything above a pro. UTR tried to put EVERYTHING in one scale and that needs work. That heat map on the first page says 4.5 would max out at 8 I think, and maybe that is the great scrunch of 2021? :unsure:

Not sure why they chose a 16 point scale, except that their algorithm might use binary scales, and at that point they should have made it 32 to give more breadth and probably could have only used one decimal point for separation and clarity. But that math is well beyond me.
 

schmke

Legend
Not sure why they chose a 16 point scale, except that their algorithm might use binary scales, and at that point they should have made it 32 to give more breadth and probably could have only used one decimal point for separation and clarity. But that math is well beyond me.
WTN will use a scale of 40 (beginner) to 1 (top pro)
 

Moon Shooter

Hall of Fame
IMO the system should not have an artificial cap at all. They should just do something like if you are 2 points higher than some other player you should be able to beat them in 75% of matches.
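One way to get an uncapped scale with that property is a logistic curve where the odds triple for every 2 points of difference. The sketch below is only an illustration of the idea. The curve shape and the function name are assumptions; only the "2 points = 75%" anchor comes from the suggestion above.

```python
def win_probability(diff, points_for_75=2.0):
    """Chance the higher-rated player wins, given the rating difference.

    Odds are 3:1 (75%) at `points_for_75` and multiply by 3 for each
    further step of that size. No top or bottom rating is needed; only
    the difference between the two players matters.
    """
    return 1.0 / (1.0 + 3.0 ** (-diff / points_for_75))


for diff in (0, 2, 4, 6):
    print(diff, round(win_probability(diff), 3))
```

Because only the difference matters, a new top player simply ends up with a higher number, and nobody else has to be squeezed to make room.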


UTR's system was doomed from the start because they capped the minimum as someone who can get half their serves in (or something like that) and they capped the top at 16.5 or something. So of course no one can be sure what being 2 points better than someone else should mean as far as results. It will depend on how much better the top pro is than all the different players below them.


Is that the problem you guys are talking about as far as trying to crunch all the rec players in?

If so I don't think having a higher cap in the total system will solve this problem. There are an infinite number of real numbers between 0 and 1 so even if the ratings went from zero to 1 we would not be worse off. The problem will happen if you cap off the top end at any number. The solution is not to cap off the top.


I still wonder why Federer's rating is what it is. It seems UTR is not following its own rules of only considering games for the last 12 months.


 

rrortiz5

Rookie
Sounds about right. I’ve only lost 2 matches this year and one was to a 7 and the other a strong 6. I haven't played any USTA league since March and my UTR has gone from a 6.39 to a 5.8... lack of play I guess.
 

J_R_B

Hall of Fame
The goal of NTRP as stated by the USTA is (my paraphrasing) to promote competitive and compatible play, effectively they want two players of the same level to be "compatible". Compatible does not mean each player will win 50% of the matches, rather just that they aren't of dramatically different abilities and can have a reasonably good match.

Note of course that the USTA publishes ratings only to the half point while they calculate them to the hundredth, and they go so far to (somewhat sillily) say that a 6-0,6-0 score is not unexpected if a top of level player plays a bottom of level player. That doesn't sound "compatible" to me, but that is their definition.

Regarding the NTRP algorithm, it is sort of Elo-based in that for each match it has an expected result and a player's rating will go up or down based on doing better or worse than what is expected. NTRP does use the score/games and so it is not just a discrete taking/losing of points based on winning or losing like pure Elo does, but one can move up or down varying amounts based on how different the actual score is from the expected.
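To make the "expected versus actual score" idea concrete, here is a toy Python sketch. It is not the USTA's algorithm, which is not published. The logistic expectation, the scale of 1.0, the k of 0.2, and the function names are all assumptions for illustration.

```python
def expected_game_share(rating, opp_rating, scale=1.0):
    """Assumed share of games a player is expected to win against an opponent."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / scale))


def rating_change(rating, opp_rating, games_won, games_lost, k=0.2):
    """Move the rating by how far the actual game share beat or missed the expectation."""
    actual_share = games_won / (games_won + games_lost)
    return k * (actual_share - expected_game_share(rating, opp_rating))


# A 4.0 beats a 3.5 by 7-5, 7-5: a win, but closer than expected.
print(round(rating_change(4.0, 3.5, games_won=14, games_lost=10), 3))
# The same 4.0 beats the same 3.5 by 6-0, 6-0.
print(round(rating_change(4.0, 3.5, games_won=12, games_lost=0), 3))
```

In this toy version the 7-5, 7-5 winner's rating goes down slightly and the 6-0, 6-0 winner's goes up, which is how a player can win most of their matches and still see the number drop.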

Now, while the USTA doesn't publish the ratings to a hundredth, I have spent way too much time understanding the algorithm and have come up with a fairly accurate and reasonable recreation of it with my Estimated Dynamic NTRP Ratings (I'm not allowed to post links promoting my site, but Google "schmidt computer ratings ntrp" and you should find my FAQ on my blog and you can read more about my ratings there) and I do periodically look at how well the detailed ratings do in predicting match winners. When I've done it, it is more or less what you'd expect with the higher rated player/pair winning a higher percentage of the time the farther apart the ratings are.

For example, when I looked at it a few years ago, specifically looking at how often the favorite wins a match grouped by different gaps between the players, it revealed this which looks very much like you'd expect.

Gap            Winning %
0.00 - 0.05    53%
0.05 - 0.15    63%
0.15 - 0.25    75%
0.25 - 0.35    84%
0.35 - 0.45    90%
0.45 - 0.55    93%
0.55 - 0.65    95%
0.65 - 0.75    96%

I have not done nor have I seen this sort of analysis done for UTR, but it would be interesting to see if it is similar.
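For anyone who wants to try this on their own match data, with UTR or any other rating, a gap table of this kind could be built roughly as below. This is a sketch and not the code behind the numbers above. The input format, the function name, and the sample matches are assumptions.

```python
from collections import defaultdict


def favorite_win_rate_by_gap(matches, bucket_width=0.10):
    """Group matches by rating gap and report how often the favorite won.

    `matches` is an iterable of (rating_a, rating_b, a_won) tuples.
    Gaps are rounded to the nearest multiple of `bucket_width`, so the
    0.2 bucket covers gaps of roughly 0.15 to 0.25.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for rating_a, rating_b, a_won in matches:
        bucket = round(abs(rating_a - rating_b) / bucket_width)
        favorite_won = a_won if rating_a >= rating_b else not a_won
        totals[bucket] += 1
        wins[bucket] += int(favorite_won)
    return {round(b * bucket_width, 2): wins[b] / totals[b] for b in sorted(totals)}


# Made-up sample: two matches with a 0.2 gap, one with a 0.5 gap.
sample = [(4.0, 3.8, True), (4.0, 3.8, False), (3.5, 4.0, False)]
print(favorite_win_rate_by_gap(sample))
```

With real data you would want many matches per bucket before reading anything into the percentages.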

The fact that the percentages above match what one would intuitively expect doesn't necessarily mean NTRP is right or good, but it is certainly a factor in assessing how well a rating system works. If the percentages didn't make sense, it would certainly be a negative for the algorithm.
Interesting. The tail percentages are actually much lower than I expected. A difference of 0.65-0.75 with a winning percentage of 96% implies that 1 in 25 matches between people a full level and a half apart goes to the underdog. That seems like a lot. I'm around the top of 4.0, so let's say 3.90. So if I play 25 guys rated 4.60 (or low level 5.0), I should win once? That seems unlikely. Similarly, if I play 25 guys rated 3.20, or lower range 3.5, I should lose one? also unlikely. Intuitively, I would have guessed the win% for a difference in that range would be >99%.
 

Max G.

Legend
An alternate explanation - if you play 25 guys rated 4.60 (or 3.20), one of those is likely to be misrated and should not have that rating. That table is NOT saying "people whose rating is perfectly stable and accurate, and differs by 0.75, will have an upset 4% of the time." It's saying that, including all the people coming back from injury, or playing through injury, and sandbaggers, and players who go 1-2 years without playing any rated matches and change level in that time, and people whose rating has been messed up by playing a sandbagger or an injured player or another misrated player, 4% of the time a rating difference of 0.65-0.75 is too high.

And that's on top of all the other factors that make rec players inconsistent (singles/doubles specialists playing their non-preferred one, inconsistent conditions and preparation).
 

J_R_B

Hall of Fame
An alternate explanation - if you play 25 guys rated 4.60 (or 3.20), one of those is likely to be misrated and should not have that rating. That table is NOT saying "people whose rating is perfectly stable and accurate, and differs by 0.75, will have an upset 4% of the time." It's saying that, including all the people coming back from injury, or playing through injury, and sandbaggers, and players who go 1-2 years without playing any rated matches and change level in that time, and people whose rating has been messed up by playing a sandbagger or an injured player or another misrated player, 4% of the time a rating difference of 0.65-0.75 is too high.

And that's on top of all the other factors that make rec players inconsistent (singles/doubles specialists playing their non-preferred one, inconsistent conditions and preparation).
I would still think the instance of this would be far less than 1 in 25.
 

Moon Shooter

Hall of Fame
An alternate explanation - if you play 25 guys rated 4.60 (or 3.20), one of those is likely to be misrated and should not have that rating. That table is NOT saying "people whose rating is perfectly stable and accurate, and differs by 0.75, will have an upset 4% of the time." It's saying that, including all the people coming back from injury, or playing through injury, and sandbaggers, and players who go 1-2 years without playing any rated matches and change level in that time, and people whose rating has been messed up by playing a sandbagger or an injured player or another misrated player, 4% of the time a rating difference of 0.65-0.75 is too high.

And that's on top of all the other factors that make rec players inconsistent (singles/doubles specialists playing their non-preferred one, inconsistent conditions and preparation).


I was wondering the same thing as JRB and this makes quite a bit of sense. Especially the players that are coming back after a few years and their ratings have not caught up (or down.) If we did this analysis and only used players that played at least 10 rated matches in the last 12 months I am fairly confident the higher end would converge on 99% considerably quicker.

Edit: although I think it is also possible that someone could get a very high rating in doubles and then play a singles match and perform quite a bit worse. So combining the ratings may also account for some of these outliers that we would normally expect to disappear.
 

schmke

Legend
Interesting. The tail percentages are actually much lower than I expected. A difference of 0.65-0.75 with a winning percentage of 96% implies that 1 in 25 matches between people a full level and a half apart goes to the underdog. That seems like a lot. I'm around the top of 4.0, so let's say 3.90. So if I play 25 guys rated 4.60 (or low level 5.0), I should win once? That seems unlikely. Similarly, if I play 25 guys rated 3.20, or lower range 3.5, I should lose one? also unlikely. Intuitively, I would have guessed the win% for a difference in that range would be >99%.
The 1 in 25 are likely exceptions with odd/strange ratings for some reason, or a result of sandbagging/tanking and associated players being wildly out of level or losing deliberately.
 

Moon Shooter

Hall of Fame
The 1 in 25 are likely exceptions with odd/strange ratings for some reason, or a result of sandbagging/tanking and associated players being wildly out of level or losing deliberately.


I agree that a sandbagger may deliberately throw matches and this would lower the percentages at the top. But wouldn't sandbaggers that are "wildly out of level" mostly be at the top of their rating level anyway? Wouldn't that mean they could not be beating players that are .55-.75 rating points higher? Are you thinking it would be from sandbaggers playing up?
 

J_R_B

Hall of Fame
The 1 in 25 are likely exceptions with odd/strange ratings for some reason, or a result of sandbagging/tanking and associated players being wildly out of level or losing deliberately.
That stuff surely happens to some degree, but I would think it's far less than 1 in 25.
 

ChaelAZ

G.O.A.T.
I thought about paying for the year, just to see the variability in my rating and see what thresholds there might be for the decay of matches, or more so how much other players' ratings still affect my rating when I am not playing. I am not sure it shows that though, so does anyone pay for UTR, and how much data is shown in the rating history?
 