UTR Rising !!! I have finally conquered the algo

My observation is that most players have match ratings that vary +/- 0.2 during a year. I wouldn't call that a margin of error though, as the factors influencing that are more than just their performance. How they do match to match matters of course, but so does how much their opponent(s) and partner vary.

Yes but isn't that the same for UTR? I mean, if match ratings are varying +/- 0.2 during a year and a half USTA point equals about 2 UTR points, then we shouldn't be surprised that UTR's ratings (both match and overall) vary a bit more during a year. UTR is just more open about it because they publish this whereas USTA keeps it hidden.
 
No. What UTR is doing would be equivalent to USTA saying I’m a 4.0 ntrp, and then saying I’m promoted to 4.5 ntrp 6 months later because I didn’t play any matches in the last 6 months but the last guy I beat was a high school player and he started playing college matches.
 
The match rating is how well you do in a particular match. Presumably, a decent rating system should smooth out those fluctuations when calculating your actual rating.

It's totally reasonable for a player to have really good days and really bad days (fluctuation in match rating). It's not good if the rating system takes a few good days and says "yay you're a 4.5 now!!!" and then a few bad days and goes "wait no nevermind you're a 3.5..." and then a few weeks later "well some opponents of yours turned out really good so you're a 5.0!"
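To make the smoothing idea concrete, here is a minimal sketch. It assumes, purely for illustration, that the published rating is an exponentially weighted average of per-match ratings. The function name `smooth_rating`, the `alpha` weight, and the sample numbers are all made up; this is not how USTA or UTR say they do it.

```python
def smooth_rating(match_ratings, start, alpha=0.1):
    """Return the smoothed rating after each match.

    alpha is a hypothetical weight on the newest match rating;
    a smaller alpha gives a more stable published rating.
    """
    rating = start
    history = []
    for match_rating in match_ratings:
        # Move the rating a small step toward the latest match rating.
        rating = (1 - alpha) * rating + alpha * match_rating
        history.append(round(rating, 3))
    return history


# Invented match ratings that swing by about +/- 0.25 around 3.65.
match_ratings = [3.9, 3.5, 3.8, 3.4, 3.7]
print(smooth_rating(match_ratings, start=3.6))
```

With a small `alpha`, good and bad days each nudge the rating by only a few hundredths, so the match ratings can bounce around without the rating itself jumping a level.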
 
Yes but isn't that the same for UTR? I mean, if match ratings are varying +/- 0.2 during a year and a half USTA point equals about 2 UTR points, then we shouldn't be surprised that UTR's ratings (both match and overall) vary a bit more during a year. UTR is just more open about it because they publish this whereas USTA keeps it hidden.
But it isn't my (or any player's) match rating varying; yes, that would be expected. Instead it is the "dynamic" rating itself varying on a daily basis, and sometimes by significant amounts. In my ratings the dynamic rating does not vary +/- 0.2 on a regular basis. Typically it moves at most a few hundredths match to match, and most players will go up or down +/- 0.1 over the entire year. It is usually only players that are improving or have declining skills that vary more than that.
 
The dynamic rating is trying to predict what your match performance rating will be. USTA's algo makes it so your dynamic rating doesn't move and seems much more stable even though actual match performance results will be bouncing around. UTR simply has a dynamic rating that shows that variation. For UTR, if you look at the rating over the past month of someone who plays at least 10 matches a year (in doubles or singles) and take the average, the actual match results will be just as close to that as they would be to USTA's dynamic rating at any given time. USTA will likely be more accurate for people who play fewer than 10 matches in the relative group.

In other words, the fact that USTA's rating system is not as sensitive to the data does not mean it is automatically more accurate.
 
Where have you read that UTR is trying to have their dynamic rating show the match to match variation? Given they state they are looking back at a full year's worth of data, I'd expect it to be the complete opposite, the rating would represent some combination, more weight given to more recent matches, of that entire year.

But again, the key complaint some have about UTR is not that it varies based on match results, rather that it varies significantly when there are no new results for the player or anyone a few degrees separated from them.

Note that I have not said that the USTA's rating system is more or less accurate than UTR. I've simply said that the degree to which UTR ratings seem to change when there is little to no change in the inputs is an indication that the algorithm is volatile and/or isn't stable. If a player goes up and down 0.2 every day, I think that leads to some question about its accuracy, but what is accuracy?

Accuracy of a rating system is also hard to nail down. What is the measurement for how accurate a rating is? Is the system that predicts winners the best the most accurate? Is the one that is closest to predicting actual scores the most accurate? Or is it some other criteria like how well it "fits" the inputs? Different ratings systems have different goals or may be optimized for different accuracy measurements.
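As a rough sketch of how two of those measurements can come apart, the snippet below scores the same set of ratings two ways: the fraction of matches where the higher-rated player won, and the average miss on the game margin. Everything in it is invented for illustration, including the match data and the `games_per_point` factor that turns a rating gap into a predicted game margin. Neither USTA nor UTR publishes such a factor as far as I know.

```python
def winner_accuracy(matches):
    """Fraction of matches in which the higher-rated player won.

    Equal ratings are treated as a prediction that player B wins.
    """
    correct = sum(
        1 for m in matches
        if (m["rating_a"] > m["rating_b"]) == m["a_won"]
    )
    return correct / len(matches)


def margin_error(matches, games_per_point=10.0):
    """Mean absolute error between predicted and actual game margin.

    games_per_point is a hypothetical conversion from rating gap to games.
    """
    total = 0.0
    for m in matches:
        predicted = (m["rating_a"] - m["rating_b"]) * games_per_point
        total += abs(predicted - m["margin"])
    return total / len(matches)


# Invented results: margin = games won by A minus games won by B.
matches = [
    {"rating_a": 3.8, "rating_b": 3.6, "a_won": True, "margin": 2},
    {"rating_a": 3.5, "rating_b": 3.9, "a_won": False, "margin": -6},
    {"rating_a": 4.0, "rating_b": 3.0, "a_won": True, "margin": 1},
]
print(winner_accuracy(matches), margin_error(matches))
```

In this made-up sample the ratings pick every winner, yet the third match is off by nine games on the margin, so a system can look perfect on one measure and poor on the other.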
 
Update:

My singles UTR still rising.

But my doubles UTR has sadly been downgraded to a 100% reliability UTR 4. It seems my win in my second most recent match over a 100% reliability UTR 9 is no longer being counted because our 5-unit UTR gap in level is too big.
 
The binary decision to include/exclude certain matches from the calculation if the gap crosses a threshold could very well be a key contributor to the volatility and oscillations we see. If a result is used on one iteration of the algorithm but not on the next, you can understand how there would be wild swings.
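A toy sketch of that effect, under loud assumptions: the rating here is just the average of per-match "performance" values, and a match counts only if the opponent is within `max_gap` of the player. The function name, the averaging, the `max_gap` value, and the numbers are all hypothetical and are not UTR's actual algorithm.

```python
def rating_from_matches(own_rating, matches, max_gap=2.5):
    """Average the performance values of the matches that count.

    A match counts only if the opponent is within max_gap of own_rating.
    Both the averaging and max_gap are hypothetical stand-ins.
    """
    counted = [
        m["performance"] for m in matches
        if abs(m["opponent"] - own_rating) <= max_gap
    ]
    if not counted:
        return own_rating  # nothing counts, so keep the old rating
    return sum(counted) / len(counted)


def sample_matches(strong_opponent):
    """Two invented results; only the strong opponent's rating varies."""
    return [
        {"opponent": 4.5, "performance": 4.2},
        {"opponent": strong_opponent, "performance": 7.0},
    ]


before = rating_from_matches(4.0, sample_matches(6.50))  # gap 2.50: counted
after = rating_from_matches(4.0, sample_matches(6.51))   # gap 2.51: dropped
print(before, after)
```

In this sketch the opponent moving by 0.01 takes the big win out of the average, and the computed rating falls from about 5.6 to about 4.2 with no new match played.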
 
The exclusions contribute, but there are obviously more serious actual “bugs” in the code.

I have a double digit win streak in rated doubles matches, spanning 18 months (7-0 in last 12 months), with average opponent rating about UTR 7. And all opponent ratings at least UTR 6.

If I beat all the UTR 7s, how can my rating be a 100% reliability UTR 4?

When I emailed this exact question to UTR, the rep replied “I forwarded your inquiry to our UTR data team, and they have confirmed that your rating is correct and accurate.”
 
I have a double digit win streak in rated doubles matches, spanning 18 months (7-0 in last 12 months), with average opponent rating about UTR 7. And all opponent ratings at least UTR 6.
Maybe the AI that powers UTR algorithms has a sense of humor about players who play only mixed doubles, avoiding same-gender competition, and throws in a -3 UTR adjustment for that!! Those players don’t get much respect at tennis clubs and the guy who programmed the UTR algorithm decided not to give them respect either.

 
Would we say that the 'average' UTR of the Bryan brothers' opponents was 28? Because what I see is a 4 over a 5 (good job!), 4 over a 6.5 (even better!), 4 over a 3.5, ...
 
Add the two opponents' UTRs and subtract your partner's UTR; the result is the opponent UTR you are measured against, per the UTR web site.
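Written out as a quick sketch, taking the formula exactly as described in the post above (I have not checked it against UTR's site, and the function name and sample numbers are made up):

```python
def effective_opponent_utr(opponent_1, opponent_2, partner):
    """Doubles opponent rating as described above:
    sum of the two opponents' UTRs minus your partner's UTR."""
    return opponent_1 + opponent_2 - partner


# Invented example: two UTR 7 opponents, played alongside a UTR 9 partner.
print(effective_opponent_utr(7.0, 7.0, 9.0))
```

By this formula a win over two UTR 7s with a UTR 9 partner counts as a win over a 5, which is why the average of the opponents' listed ratings can look much higher than what the calculation uses.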
 
Where have you read that UTR is trying to have their dynamic rating show the match to match variation?
I said “The dynamic rating is trying to predict what your match performance rating will be.” That is the measure of the accuracy of any rating system.

Given they state they are looking back at a full year's worth of data, I'd expect it to be the complete opposite, the rating would represent some combination, more weight given to more recent matches, of that entire year.
Yes, because they think the more recent matches provide more important data than older matches as to how you will play in the future.
But again, the key complaint some have about UTR is not that it varies based on match results, rather that it varies significantly when there are no new results for the player or anyone a few degrees separated from them.
I understand and I agree with this complaint to some extent. But since there is a floor and a ceiling we are all connected. If the top player is playing better but can’t have a rating higher than 16.5, how can the rating system account for his better play? It must move *everyone* else down. In effect, having a ceiling and a floor turns a rating system into a ranking system.

Note that I have not said that the USTA's rating system is more or less accurate than UTR. I've simply said that the degree to which UTR ratings seem to change when there is little to no change in the inputs is an indication that the algorithm is volatile and/or isn't stable.

There are always inputs being made every time a rated match is played and that will cause a reshuffling throughout the rating system.

If a player goes up and down 0.2 every day, I think that leads to some question about its accuracy, but what is accuracy?

I would say accuracy is how well the system predicts results.
Accuracy of a rating system is also hard to nail down. What is the measurement for how accurate a rating is? Is the system that predicts winners the best the most accurate?
Yes
Is the one that is closest to predicting actual scores the most accurate?
Yes. Both of these would be important. It is one thing to say a rating system predicts the winner. But if you have an NTRP dynamic rated player who is a 3.83 beat a 2.87 by 7-6, 0-6, 1-0, we should question either what happened in the match or the accuracy of the rating system, at least in their case.
Or is it some other criteria like how well it "fits" the inputs?
I’m not sure what that means.
Different ratings systems have different goals or may be optimized for different accuracy measurements.
 
I would say accuracy is how well the system predicts results.

Yes

Yes. Both of these would be important. It is one thing to say a rating system predicts the winner. But if you have an NTRP dynamic rated player who is a 3.83 beat a 2.87 by 7-6, 0-6, 1-0, we should question either what happened in the match or the accuracy of the rating system, at least in their case.
Regarding predicting winners versus scores, a system tuned for one may not necessarily be the same as one tuned for the other. So which is more important or the goal?
I’m not sure what that means.
Regarding "fitting" past results, this means how well the current ratings would "fit" or predict prior results, e.g. you beat x, y, and z so is your rating better than them, while you lost to a, b, and c so is your rating worse than those players? And this fitting can be measured by win/loss or predicted scores, and again an algorithm tuned for one is not necessarily the same as an algorithm tuned for the other.

I have not seen (haven't really looked to be fair) what UTR's stated goal for their rating is so I'm not sure what to measure it by.
 
Yeah, I think the goal of a rating system is usually more complicated than just "predict the next match result". Rating systems also serve as a way of rewarding past match results (not the same thing). And serve as an incentive structure.

Like, to give an example - say you retire from a match with injury. Does that mean your rating should immediately drop precipitously? After all, you're injured, so if you played a match you would lose to nearly everyone. Therefore, if the goal of a rating system is to represent your current level, an injury should immediately drop your rating to nearly zero, and it should then climb back to something just below your previous level as you recover.

Or, surfaces! If a rating system's only goal was to predict match results, then obviously for the pros, their rating should suddenly change after Miami/IW (as they all switch to clay) and switch around after RG to grass-court predictions for a few weeks, then switch back to hardcourts. ...but that too would be dumb, because we're not looking for a rating system to be predictive - we're looking for it to judge how players already did.

(Imagine that rationale. "Yep, #2 player just beat the #1 player, he's had better results and is playing great... but his upcoming schedule says he's playing on clay, so he's predicted to play worse! Maybe, fellow commentator, he should announce his entry to some hardcourt tournaments, so the rating system keeps predicting his hardcourt ability and gets him to #1?")

...a lot of what rating systems do seems to me to make sense from looking at what their real goal is. And it's usually not JUST "predicting match results".

USTA NTRP has, as best I can tell, the goal of facilitating league play by grouping players into roughly-similar skill bands. I think that drives a lot of the decisions.

Why is there a single rating for singles and doubles, even though players might have substantially more skill in one than the other? Because once a player is on a team the captain should be able to schedule them for any line, it would be such a headache if captains had some players that were eligible to play singles only and others that were able to play doubles only.

Why is the rating only updated (publicly) yearly? Because captains need to be able to make teams and then expect them to stay together for the whole season (where "the season" can vary in times across the country).

Why does the USTA take into account game differential and not just total result? Because the rating system needs to mostly work for rec players that play as few as 3 rated matches a year. If they only used final W/L, the rating system would adjust far too slowly for those players. (Why doesn't USTA require more matches? Because they don't think these ratings really serve as a usable incentive for most players, if they required more matches they expect most of the impacted players would just go "shucks, guess I'm unrated" instead of playing more matches to get a rating.)

Why does USTA treat new players the way they do, with self-rates and strikes and bumps? Because they want it to be possible for a new player to join a league and get a rating by playing in the league, rather than making a rating a prerequisite for joining a league.

Why does the USTA have this dynamic rating... and then yearly they do some weird adjustments to it based on a very small (relatively) number of matches at sectionals/nationals? Because they want nationals to work, with teams from different areas all playing together, so it's quite important for them to adjust whole areas to be rated similarly, and they can't do that during the year when players from different areas aren't playing each other.

And yes, all those constraints happen against the background of "the rating should represent the player's skill level, the higher rated players should win more often". But predicting match results is not the ONLY goal of the rating system, by far.

I think I understand USTA NTRP pretty well, including the incentives and why it's designed the way it is. I don't know the math - I think Schemke does - but I'm pretty sure I got the basic idea of how it works and why it is the way it is.

...I can't have the same confidence for UTR. Both because it seems more complicated under the hood, but also because I don't know their goals as well. My impression - despite the fact that they call themselves universal - is that the main goal of UTR is to support recruiting of pre-college juniors. They feel they have enough market power in that area that they can dictate to their players that they need some minimum number of match results, so they don't have the USTA's issue where they have to have good ratings for players who only play 3-4 matches. They do not care about extremely imbalanced pairings like USTA Mixed Doubles because that just doesn't come up routinely in junior play, it's a quirk of the USTA's gender-segregated mixed doubles leagues. They seem to care a HUGE amount about their rating system being EXTREMELY responsive - juniors can change fast in skill, and if someone goes through a huge skill leap in September they want that to be represented in their rating ASAP before college recruiters make their decisions, even if that takes silly hacks like players' ratings being retroactive and causing recalculations.
 
Regarding predicting winners versus scores, a system tuned for one may not necessarily be the same as one tuned for the other. So which is more important or the goal?

I don't know how you could separate the two with the data we have from tennis matches.

Regarding "fitting" past results, this means how well the current ratings would "fit" or predict prior results, e.g. you beat x, y, and z so is your rating better than them, while you lost to a, b, and c so is your rating worse than those players? And this fitting can be measured by win/loss or predicted scores, and again an algorithm tuned for one is not necessarily the same as an algorithm tuned for the other.

I have not seen (haven't really looked to be fair) what UTR's stated goal for their rating is so I'm not sure what to measure it by.

Again, I don't know how you would develop an algo that fits someone's ratings based on past results but would not accurately predict future results. Rating systems typically don't have the data that tells them, oh well, this guy has been practicing quite a bit since his last rated match and getting better, or he was just injured, etc. I suppose they could incorporate age to some degree and assume that younger people would get better faster than older people. If that led to better predictions then I would say that would be a more accurate rating system. But it may not serve other purposes of a rating system that GrassCourtFan talked about - such as demonstrating a certain level of accomplishment in the sport etc.
 
GrassCourtFan
I haven't carefully read everything you said but I agree with most of it. When I said "I would say accuracy is how well the system predicts results," I did not mean to imply that accuracy of predicting results is the only purpose of a rating system - not at all. The main reason I play USTA tennis is because that plugs me in to various rating systems (USTA's rating is likely the least informative). And no, it is not valuable to me because I think it can predict my match outcomes. But predicting match outcomes is, I think, the best way to measure the accuracy of a system.

I was trying to explain what I mean by "accuracy." I would say the guy with the broken leg does not have a rating that accurately measures his tennis ability, because if he tried to play with a cast he would lose to people of that same rating. Yes, later he may be back in that form. This is what I mean by accuracy. Other people might say his rating is "accurate" because the math was done correctly or all the matches were entered or there was no bug in the system. That is also a reasonable way to understand "accuracy"; it is just different from the view of "accuracy" I take, and I am trying to be clear about what I am talking about.

I have played with injuries and with people with injuries and against people with injuries. Rather than take a default, someone will play with cramping calves, etc. I think all of this can throw off the accuracy of any rating system. But it is not something I would blame on the rating system - it is just something you have to deal with when you are rating adult rec tennis - especially older guys like me. Chess usually doesn't have issues like this unless someone has a headache that day or maybe they are drunk etc.
 
I've heard UTR is very important for college-bound players. How much variation is there in high school players/recruits?

For rec players, it probably has more volatility, with some guys who play only a few matches upsetting the algorithm.
 
In 3 days, my rating reliability went from 100% to 20%. I feel a void.

I think most people with common sense knew your rating was not that reliable. Computer ratings will sometimes lack common sense. That is why ratings are at best a tool to gauge strength and improvement as opposed to an Oracle making inerrant pronouncements.
It is ironic that UTR is criticized for being cryptic about why it spits out the numbers it does while it claims it is “powered by Oracle”.
 