This is a guy from our club who decided to figure out TennisRecord's calculations. Might be interesting to some of you...

Interesting, a nice presentation of what TR does. And what it does is a reasonable way to calculate a rating. The question is, is it what the USTA does when calculating Dynamic Ratings?

That was pretty cool to watch. I would like to see more of these types of videos on how these calculations work. For example, how does the algorithm work if one player has a dynamic rating and the other is a self-rate?

I would say no, since I know TR has to move some folks around each year when the new annual NTRP ratings get published.

The way that they seem to have decoded TR's averaging (which averages the latest match with the prior three dynamic ratings) is one of those small things that has a huge impact.

In the long version of the video, it's shown that this results in an interesting phenomenon where your latest match counts for 25% (obviously), then the second latest match counts for 6.25%, then it goes up to 7.81% and 9.77% for the next two before falling off again.

Perhaps even more interestingly, for players with a lot of historical matches it looks like around 40% of your rating is from matches older than your last 6, and maybe 20-25% is from matches older than your last 10.
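Those percentages can be reproduced with a short sketch. It assumes the rule described above (new dynamic rating = average of the latest match rating and the prior three dynamic ratings) has applied to every match in a long history, and it ignores whatever start-of-year rating TR seeds a player with. The function name and the `window` parameter are made up for illustration.

```python
def match_weights(n_matches, window=3):
    """Weight of each past match rating in the current dynamic rating.

    Assumes new_dynamic = (match + sum of prior `window` dynamics) / (window + 1)
    for every match. Index 0 is the latest match.
    """
    # a[k] = weight of the dynamic rating from k matches ago in today's rating
    a = [1.0]
    for k in range(1, n_matches):
        a.append(sum(a[max(0, k - window):k]) / (window + 1))
    # each dynamic rating contains its own match rating at weight 1/(window + 1)
    return [x / (window + 1) for x in a]


weights = match_weights(40)
for k, w in enumerate(weights[:6]):
    print(f"match {k} back: {w:.2%}")
print(f"older than last 6:  {1 - sum(weights[:6]):.1%}")
print(f"older than last 10: {1 - sum(weights[:10]):.1%}")
```

Under these assumptions the first four weights come out to 25%, 6.25%, 7.81% and 9.77%, with roughly 39% of the weight on matches older than the last 6 and roughly 22% on matches older than the last 10.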

I wonder, if we knew the USTA algorithm, if we would find it giving that much weight to older historical matches.

I like it, thanks for sharing. Looking forward to watching the long versions when I make time.

It could be possible for TR to have the match/dynamic update formulas exactly right (that is, exactly what USTA uses) but still fail to match USTA's year-end results. That's because this formula is only one piece of the puzzle. There's also:

• Every returning player's starting rating at the beginning of the rating year. If these are not accurate then many dynamic ratings will end up being wrong even if the right updating formula is used

• Which matches count / don't count for year-end rating. This should be knowable but depends on tracking the rules in every area, and TR is not perfect on this. TR also calculates dynamic ratings for matches that USTA does not.

• The year-end rating calculation scheme. The year-end rating is not just the final dynamic rating, there is a separate calculation that could end up very different. TR does seem to attempt to do this separate calculation but that must be a lot harder to figure out.

You seem to be making the USTA's case for treating TR as a novelty rather than an accurate representation of the USTA's dynamic or year-end ratings?

No one really knows what the USTA is doing. These sites lead to annoying declarations throughout the year about the certainty of bumps. Our LC makes the case that TR is an unwanted joke, but USTA does little to prevent it. IMO, they are endorsing it by not stopping it, as it wouldn't be hard to do. I will give TR credit that they have done a decent job aggregating the information and making it less annoying to navigate than the various USTA apps and sites.

+1.

I'm the dude who made the video explaining TennisRecord's calculations. Good suggestion on a video about a Computer-Rated player vs. a Self-Rated player. I had that idea as well, just haven't executed on it yet. In my experience, what TennisRecord does in this situation is as follows. It actually doesn't care about whether the players are Computer-Rated or Self-Rated ... the algorithm works the same either way. HOWEVER, if any of the players on a particular court do not have "enough match history", then all of the other players on that court will receive an "S" for their match performance rating, and their new dynamic rating will be simply unchanged from before. "Enough match history" is defined as three matches.

Something I was thinking about as I was watching this... Ajay played Chris in a makeup match in the '24 early start league. The original match date, 9/30, was entered as the match date even though it was played later, and that's currently an NC in the calculation. I would assume it will stay that way at this point, but it made me wonder what happens when it takes a week to enter scores. So if I play 3 matches in a weekend and the Friday match doesn't get entered but the Sat/Sun matches do, will TennisRecord calculate them in the order the matches were played, or in the order they get pulled off the site the following week when the Friday match gets picked up?

My understanding is that TR will eventually calculate them in the order that the matches were played (or rather, supposed to have been played), not in the order that the scores were entered. So, there may be an interim period where TR has incomplete information, and for that period, the player's match performance ratings and dynamic ratings will be inaccurate. But once all the scores have been entered, TR will go back and recalculate the performance ratings and dynamic ratings in the correct sequence, as if all the scores had been entered immediately after the match.

Regarding the match of Ajay Goel vs Chris Schlater, which shows 9/30/2023, but was actually played much later (Jan 2024 I think). I know it shows "NC" now. But I would be willing to bet that if we check back a couple months from now, TR will have gone back and calculated the ratings for that match, assuming (incorrectly) that it took place on 9/30/2023. (TR doesn't know any better, because even the USTA TennisLink site shows that this match happened on 9/30.) And then it will recalculate all of the other matches which took place more recently than 9/30/2023. I've captured a screenshot of Ajay's TR page as it stands today, so we can test my theory in a couple months.

I think TR does not go back and calculate matches that were entered out of chronological order. I played a makeup court for a match that was originally scheduled on 10/28 last year. The scores were entered at the time of the match with a double default at D3 that was later edited to include my makeup score. That makeup score appears in TR on the date the match was originally scheduled but it's still "NC" in terms of my rating.

Everyone knows that TR is not accurate for USTA's dynamic or year-end ratings, because they regularly miss on DQ and bump up/down predictions. That case doesn't need to be made by me.

My point was that it's possible that TR is using the exact same match rating formula and dynamic rating update formula as USTA, while failing to match them in other ways.

I've found that they're generally accurate for EOY ratings. For my team, it correctly predicted 90%+ of the guys.

I bet you could have predicted 100% for EOY, same for your opponents. Outside of going to state, sectionals or nationals, it's not really needed.

Not sure how you counted, but if you make a prediction that no one at all gets bumped, you would be correct for about 85% of players. So 90% is not much better than random, if you count non-bumped guys as a "correct" prediction.
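To make the baseline point concrete, here is a minimal sketch with a made-up 20-player team (not real data). It compares overall accuracy with the share of actual bumps that were caught; the function names and the "same"/"up" labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Share of players whose predicted outcome equals the actual one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def bump_recall(predicted, actual):
    """Share of players who actually moved that the prediction moved the same way."""
    moved = [(p, a) for p, a in zip(predicted, actual) if a != "same"]
    return sum(p == a for p, a in moved) / len(moved)


# Made-up team: 17 players stay put, 3 get bumped up (85% unchanged).
actual = ["same"] * 17 + ["up"] * 3
# Baseline guess: nobody moves.
no_bump_guess = ["same"] * 20
# Hypothetical predictor that catches only one of the three bumps.
model = ["same"] * 17 + ["up", "same", "same"]

print(accuracy(no_bump_guess, actual), bump_recall(no_bump_guess, actual))
print(accuracy(model, actual), bump_recall(model, actual))
```

In this toy case the predictor scores 90% overall against the baseline's 85%, yet it catches only one of the three bumps, which is why the counting method matters.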

That said, I've argued that TR seems to do pretty well at getting the rank order of players right. The players who get bumped up/down tend to be the highest/lowest TR-rated players at that level.

But it's pretty clear that the year-end rating that TR gives you is maybe plus-or-minus 0.2 from your mystery USTA rating, and sometimes much worse than that.

If I ran the USTA, I'd offer a super membership for $x/month where you get an updated Dynamic rating each month.

The way I like to look at the data is, what % of the players at any given year-end TR rating get bumped up, what % get bumped down, and what % stay the same. In a perfect model, 100% of 4.00s would get a 4.0 YE rating, and 100% of 4.01s would get a 4.5.

This year, I found that 75% of C or A players nationwide with a 3.95 rating got a 4.0 YE23 rating. At 3.91 it was 90%. At 4.00-4.01, it was right at 50%, meaning an imperfect but unbiased model (in past years TR showed about 0.04 downward bias). If you plot “% ending up at 4.0” vs. TR rating, you’ll get an S-shaped curve – the sharper, the better. I believe this is how the efficacy of a model should be judged, rather than “what % of bump-ups were correctly predicted.” For S players the predictive ability is not as good; for example at 3.91 only 75% got a YE23 4.0. Handling of S’s is clearly the area where understanding is foggiest. I found all of the percentages translated very well to other NTRP levels.
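The "% ending up at 4.0 vs. TR rating" curve described above could be tabulated along these lines. This is only a sketch: the record format, the function name, and the six sample records are invented for illustration and are not real player data.

```python
from collections import defaultdict


def share_at_level(records, level):
    """Map each TR rating (rounded to 0.01) to the share of players who got `level`."""
    counts = defaultdict(lambda: [0, 0])  # rating -> [players at level, all players]
    for tr_rating, ye_level in records:
        key = round(tr_rating, 2)
        counts[key][0] += ye_level == level
        counts[key][1] += 1
    return {r: hit / total for r, (hit, total) in sorted(counts.items())}


# Toy records only: (TR year-end rating, published YE level).
sample = [
    (3.91, "4.0"), (3.91, "4.0"),
    (4.00, "4.0"), (4.00, "4.5"),
    (4.10, "4.5"), (4.10, "4.5"),
]
curve = share_at_level(sample, "4.0")
for rating, share in curve.items():
    print(f"{rating:.2f}: {share:.0%} ended up at 4.0")
```

Plotting the shares against the rating gives the S-shaped curve; with real data, a steeper drop near the level boundary would indicate a sharper model.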

If other purveyors of calculated ratings wanted to show off their predictive ability, they would publish some subset (an Area? A District? At least 1000 players I think, but not cherry-picked) of their numbers just before YE ratings were published, and then we could make an informed comparison. Anecdotal evidence about “nailing” one guy or another is not as compelling.

So if two guys win the same number of games, under the TR method as Rajeev outlines it they swap ratings. So if a 3.80 plays a 3.60 and wins 7-6, 4-6, 1-0, TR says the 3.80 guy played like a 3.60 and the 3.60 guy played like a 3.80. There is some logic in that. However, isn’t it just as logical to say that since they "tied", their match ratings should be identical, probably meaning each gets a 3.70?

I think the TR way is more logical for the match rating. When you average their match rating into their prior dynamic ratings, their updated dynamic ratings will move closer to (but not past) each other. If they then repeatedly played each other and kept having the same score, their dynamic ratings would converge together to be equal. By your suggested method, that convergence would happen much more slowly.

Thanks for sharing!
I've reverse engineered the formula myself and I'm interested to see how our methods differ. (Haven't watched it yet)

I have also studied the TR calculation for years and understand their match-to-match calculation. The main difference I have observed between TR and USTA is in the EOY calculations USTA performs. USTA values post-season play, as there is a Nationals adjustment that trickles down that is unknown. It seems very hard to get a bump up without playing any post-season matches. Additionally, there seems to exist a large singles adjustment. In the 2 sections I have closely watched over the years, it is the singles players who receive the bump with much lower year-end TR ratings. For example, a singles and doubles player who ended at x.28 got a bump. Every player who bumped to 5.0 in the last couple of years had singles matches, whereas doubles-only players with higher YE ratings did not receive the bumps. This is not simply a factor of singles players playing higher-rated players in the absence of other doubles. Some players are well aware of this, and if they really want a bump, they focus on singles for the year and receive the bump even though they are nowhere near the predicted bump-up range in TR. Finally, the bump downs are few and far between in USTA. I do believe ratings are much stickier at USTA than TR, as every year TR predicts several bump downs that do not happen in USTA.

I think that the nationals and other year-end adjustments can also be subjective to give a little juice to whatever they want the year end distribution to look like (recognizing that the straight calculation is flawed), so if that is the case, it's going to be impossible to program that mathematically.

Aside from the year end adjustments, the other big difference between NTRP and TR (and where TR does it wrong) is that they anchor your match rating to your opponent's starting rating, so that if you "tie" (i.e. win the same number of games like 6-3 2-6 1-0 = 9-9 in games), each guy ends up with the other guy's starting rating for the match.

So, if I start with a rating of 3.82 and my opponent starts with 3.44, then I end up with a 3.44 match rating for this "tie" match and he ends up with 3.82. This is nonsensical. We tied. We should have the same match rating. The NTRP calculation is anchored to the average starting rating, so that in this case, the average of 3.82 and 3.44 is 3.63, so each player would get a 3.63 rating for a "tie".

You can also end up winning but getting a lower match rating than your opponent. For example, if you start with a 3.94 and your opponent starts with a 3.56 (so a high 4.0 vs a low 4.0) and you win 6-3 6-4 comfortably with a break each set, you'll get a rating of 3.71 for the match based on the differential anchored at his rating of 3.56, but despite losing, he'll get a rating of 3.79 anchored to your 3.94, so according to TR, he played better than you in the match despite losing in straight sets by a break in each set. It doesn't make sense the way they anchor the match ratings and is a big flaw in the TR calculation (and one that should seemingly be fairly easy to fix).
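The two anchoring schemes being argued about can be compared side by side. A rough sketch: `gap` stands for the rating differential implied by the score (0.15 is what the 6-3 6-4 example above implies, since 3.71 - 3.56 = 0.15), and the score-to-gap mapping itself is not modeled. Splitting the gap evenly around the average for a non-tied match is my own assumption; the post only spells out the tied case.

```python
def opponent_anchored(winner_start, loser_start, gap):
    """TR-style as described above: each match rating is offset from the opponent's start."""
    return loser_start + gap, winner_start - gap


def average_anchored(winner_start, loser_start, gap):
    """Hypothetical alternative: split the gap around the two players' average."""
    mid = (winner_start + loser_start) / 2
    return mid + gap / 2, mid - gap / 2


# 6-3 6-4 example: a 3.94 beats a 3.56 with an implied gap of 0.15.
print([round(x, 3) for x in opponent_anchored(3.94, 3.56, 0.15)])
print([round(x, 3) for x in average_anchored(3.94, 3.56, 0.15)])

# "Tie" example: a 3.82 and a 3.44 win the same number of games (gap of 0).
print([round(x, 3) for x in opponent_anchored(3.82, 3.44, 0.0)])
print([round(x, 3) for x in average_anchored(3.82, 3.44, 0.0)])
```

With opponent anchoring the straight-sets winner gets the lower match rating whenever the gap earned is smaller than half the pre-match rating difference; with average anchoring the winner's match rating is always at least the loser's.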

I think the match rating formula makes sense, and I would bet that NTRP does it the same way. Here are a few reasons that it makes sense to me:

1) A brand new player needs a match rating to get their dynamic calculations going. If the formula required your prior rating to calculate something, this would be impossible. You can think of your match rating as what your first rating would be if this was your first match.

2) If anybody "tied" a 3.82 player on a given day, they would get a 3.82 match rating regardless of their history, which makes perfect sense to me. If different players got different match ratings for the same performance against the same player, that would be more nonsensical, I think.

3) The weirdness you point out in your examples is taken care of in the averaging step. The 3.44 and 3.82 players who tied each other would get 3.82 and 3.44 match ratings, respectively, but then when those get averaged in they move to something closer together, but not past each other. The higher-rated player is still higher after the averaging.

4) If those 3.44 / 3.82 players repeatedly played each other (and no one else) and kept tying, they would eventually converge to the 3.63 you noted. It could actually take pretty long to converge even under the TR method, but by your suggested method it would take MUCH longer to converge.
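Points 3 and 4 can be checked with a small simulation. It is a sketch under stated assumptions: both players start with three stable prior dynamic ratings, they only ever play each other, every match is a "tie", and the dynamic update is the average of the match rating with the prior three dynamic ratings. The function and parameter names are made up.

```python
def simulate_ties(a_start, b_start, n_matches, anchor):
    """Two players who only play each other and always "tie".

    anchor="opponent": match rating = opponent's current dynamic rating.
    anchor="average":  match rating = average of both current dynamic ratings.
    Returns the gap (player A minus player B) after each match.
    """
    a_hist, b_hist = [a_start] * 3, [b_start] * 3  # stable prior ratings
    gaps = []
    for _ in range(n_matches):
        a_now, b_now = a_hist[-1], b_hist[-1]
        if anchor == "opponent":
            a_match, b_match = b_now, a_now
        else:
            a_match = b_match = (a_now + b_now) / 2
        # assumed update: average the match rating with the prior three dynamics
        a_new = (a_match + sum(a_hist[-3:])) / 4
        b_new = (b_match + sum(b_hist[-3:])) / 4
        a_hist.append(a_new)
        b_hist.append(b_new)
        gaps.append(a_new - b_new)
    return gaps


gaps_opp = simulate_ties(3.82, 3.44, 20, "opponent")
gaps_avg = simulate_ties(3.82, 3.44, 20, "average")
for n in (1, 5, 10, 20):
    print(f"after {n:2d} ties: opponent-anchored gap {gaps_opp[n - 1]:.4f}, "
          f"average-anchored gap {gaps_avg[n - 1]:.4f}")
```

In this setup the gap shrinks under both schemes without the players ever crossing, and it shrinks faster when the match rating is anchored to the opponent.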

The NTRP calculation definitely does not do it the same way.

1. The NTRP calculation is different (and more like TR) for rated vs nonrated players until the nonrated player has a valid DNTRP (after 3 matches or whatever). It's not required to make all of your calculations bad just so that you don't have to have a different approach as new players get ramped up in DNTRP.
2. If the 3.44 player ties a 3.82 player, they definitely played at the same level that day. The 3.44 may have played up to 3.82 or the 3.82 may have played down to 3.44 or (most likely) they met in the middle somewhere. They definitely did NOT play nearly a level apart on that day and end up with a tied score. Because of the variability in any player's match-to-match play, it's not illogical for two players to tie the same guy on different days but end up with different ratings. It IS illogical for players playing in the same match on the same court against each other to tie but somehow play at different levels.
3. The averaging helps, but it still doesn't change the fact that the individual ratings don't make any sense.
4. Ok, so this rating system makes more sense for people who play a long string of tied matches in a row. That doesn't supersede the fact that it doesn't make sense in the other 99.999% of the situations.

In this example, the lower player did perform "better" than the higher player in the sense he exceeded the expected match result. A match rating simply attempts to continually tweak the dynamic rating and in this case, the lower player's dynamic rating should move up and the higher player's should move down. A match rating is not a dynamic rating.

When you say that you know that the NTRP calculation works like that, is there an internet-available source that lays that out, or are you relying on specific behind the scenes knowledge you have of the algorithm? I'm curious, as it seems like the USTA ostensibly tries to keep that kind of stuff under wraps.

Do you have evidence for that statement? "It doesn't make sense to me" is not evidence, but maybe you know something more?

It might be that we're sorta both wrong and NTRP does not actually have anything that they call a "match rating." As far as I know it's something that TR made up, and really it's just a step toward the calculation of how to update the dynamic rating after a new match.

In your example of the 3.44 tying the 3.82, let's say the 3.44 had been stable at that number over the last three matches. Then TR says the new dynamic rating for the 3.44 gets updated like this:

(3.44 + 3.44 + 3.44 + 3.82) / 4 = 3.535

You say the calculation should be:

(3.44 + 3.44 + 3.44 + 3.63) / 4 = 3.4875

Can you agree that both of those end results for the new dynamic rating are sensible?

The TR way essentially gives more weight to the new match result. I'm willing to believe that TR's overall update method does give too much weight to the new result in some cases compared to NTRP, but that doesn't mean it's illogical.
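The two calculations above differ only in the match rating that gets averaged in. A minimal sketch of that update, with a made-up function name, assuming the "latest match plus prior three dynamic ratings" average discussed in this thread:

```python
def update_dynamic(prior_three, match_rating):
    """Assumed TR-style update: average the match rating with the prior three dynamics."""
    return (sum(prior_three) + match_rating) / 4


prior = [3.44, 3.44, 3.44]
tr_style = update_dynamic(prior, 3.82)   # match rating = opponent's rating
avg_style = update_dynamic(prior, 3.63)  # match rating = average of both players

print(round(tr_style, 4), round(avg_style, 4))
print(round(tr_style - 3.44, 4), round(avg_style - 3.44, 4))
```

For the tied match, the opponent-anchored match rating moves the 3.44 player exactly twice as far as the average-anchored one, which is the sense in which the TR way weights the new result more heavily.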

I won't comment on whether I think any of the musings on this thread about TR are accurate or not as far as what TR does nor if what TR seems to do is accurate compared to the USTA. The USTA does try to keep a veil of secrecy around the algorithm and anything TR does almost certainly isn't correct given the accuracy or lack thereof of predicting year-end levels, and I can't even say what I do is correct either, although I have reason to believe what I do is closer and my accuracy generally backs that up.

That said, I wanted to comment on a few other things in the above post:
What would you consider evidence of this? Do you believe the majority of bump ups played post-season matches? What percentage? 70%? 80%?

I show that there were just under 16K players that were a 2022 year-end C that were bumped up at the end of 2023. Of those, less than 5K, or just 31% played in the post-season. So it doesn't seem like it is so much harder to get bumped up without playing post-season matches with over twice as many of the bumps up NOT playing in the post-season.

Again, what in your mind constitutes few and far between? Do you believe there are 3 times more bumps up than bumps down? 5 times? More than that?

Again, using 2022 to 2023, there were just over 9K bump downs, so there aren't even twice as many bump ups as bump downs; it is only 1.7 times. That doesn't seem like few and far between to me.

I'm not sure there is much truth in this either, although you may be simply making an observation about TR's predicted and missed bumps?

You seem to be saying more singles players get bumped up than doubles players? And players that want to be bumped up gravitate to play singles?

I took a look at the bump ups from last year. To really determine if someone predominantly played singles or doubles, I looked at players who played at least eight matches, and then considered them predominantly singles or doubles if they played at least three times as many of one as the other.

Of the nearly 16K bump ups last year, just over 10K played at least eight matches. Of these, only 690 were predominantly singles players while 2,313 were predominantly doubles players. That does not sound like one has to play singles to get bumped up, or most of the bump ups are singles players, quite the opposite with 3.4 times more doubles players being bumped up than singles.

Yes, there are more playing opportunities in doubles than singles, six of eight spots in most 18+ leagues, and six of seven spots in 40+ that uses four courts, so all things being equal, you'd expect more doubles players to be bumped up and that is indeed what we see, by about the ratio you'd expect.

The thing we know about TR is that the algorithm is “open” (or at least easy to reverse engineer) and reasonably logical.

The thing we know about USTA is that the algorithm has to be kept secret. In order to maintain secrecy, it likely has illogical elements so that no logical person can reverse engineer it.

Other than manual year end adjustments, I doubt that is the case. I suspect the dynamic algorithm is very logical and the core of the year end calculations too.

"Illogical elements" is clearly wrong, but randomization might be a possibility. An algorithm that is not strictly determinative but instead assigns randomized weights to results (within a given range) might very well outperform a purely determinative one and have some reverse-engineering protection as a side-effect.

Hey @rgoelmsft, I watched most of your extended videos - nice work!!

About the scatterplots where you showed the opponents' rating gap versus TR's assigned performance gap for a particular match score. I can see how you used those plots to figure out what TR must be doing. However, there are several points on the scatterplot that deviate from the formula quite a bit.

You explained the deviations due to a >0.38 pre-match rating gap, but there are a bunch of other points that aren't explained by that. Have you figured out other explanations for those, or are there still some TR performance ratings that don't seem to make sense?

I just watched this part, and that is pretty strange. I can't think of any logical reason why your result from 4 matches ago should be weighted >50% more heavily than your result from 2 matches ago, to arrive at your current dynamic rating.

I'd like to think that USTA uses a formula that does not produce a strange phenomenon like this, but who knows?

Let me share some things that make me question TR.

Just for some background, I think the player is a bottom-half 3.5 player or an upper-end 3.0 (he has a 4.XX UTR). I have played many matches with him and we are about equal, which is why I had him play at the 3.0 level. I am a 3.0 C player.

Here are some questions:
1) Consider the mixed doubles match 12/16/23 where he played on the D1 court. He went into that match with 3.06 mixed rating. His three prior mixed ratings were 3.08, 3.08 and 2.99. His performance for the match was indicated to be a 3.22. So what was his new mixed dynamic rating? 3.49!? How does that happen?

2) And just generally, what was going on to start his self-rate for his regular/adult rating? In particular, I am not sure why he received a 2.90 rating after his first match with a partner (Nate) who had never played before. But then on 2/18/2023 he plays in a 3.0 match with a C-rated 3.0 partner (his partner was bumped to 3.5 this year) that TR had at 3.04 and wins 6-2 6-2 against two players who are supposedly 3.20 according to TR. This puts him at a 3.93 match performance. And it seems to discount his 2.90 prior dynamic rating, so he gets a new dynamic rating of 3.78. So OK, big win for him, right? That's why he suddenly has a dynamic rating higher than the 4.0 players on the tri-level team. Well, shouldn't that be a big loss for his opponents, who were supposedly both 3.20?

Well if you look at that 2/18/23 match it shows his opponent had a 3.20 rating and played with a 3.20 partner and lost 6-2 6-2 to a 3.04 and a 2.90. What performance rating did it give him? 3.20!?

This makes no sense to me.

Now I sort of understand why he may have ended up with a 3.93 match performance, since there were some relatively new players. But even there TR seems to have set that score in stone when it was based on a match with two opponents who had barely played any rated matches. So partners who played with him could end up getting their TR rating hammered, and his opponents would get their TR ratings boosted. When you only play a few adult rated matches, that can really throw things off.

For #1, keep in mind that the 12/16 match occurred after the year-end ratings came out. If you click on "current rating history" it is the first match listed. TR resets everyone's dynamic rating (including for mixed) to a new number to start the new year, which is usually different (sometimes much different, as in this case) than the final dynamic rating from the prior year. For this guy, they seem to have him at 3.5000 to start the new year, so his 12/16 match rating got averaged into that.

Great catch! But he was bumped to USTA 3.5 not double bumped to 4.0. So wouldn't a 3.06 rating still be ok? It seems for his partner and his opponents he was suddenly a 3.50 instead of a 3.06 player as well.

It seems they reset his mixed rating to match the dynamic adult rating they calculated. Since they thought he should be double bumped to 4.0 they set his rating at 3.50 at year end and made his mixed rating match that. I didn't know they set the mixed rating to match their adult dynamic rating at year end. Does anyone know if USTA does that with their mixed ratings as well?

Edit: also, even if he was set at 3.50, I still think his 3.22 performance should have dropped him lower than 3.49.

Looks like it was 3.47, right? But still, I don't know how they got that. We might guess they would do something like (3*3.50 + 3.22) / 4, based on their "normal" formula that OP discovered, but that comes out to 3.43.

Perhaps the year-start rating gets weighted more heavily than that for the initial matches. If they did (7*3.50 + 3.22) / 8, that would come out about right, the 7 being based on this player having played 7 rated (non-mixed) matches in 2023?
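The two guesses above are the same weighted average with a different weight on the start-of-year rating. A small sketch (the function and its `seed_weight` parameter are hypothetical, not something TR documents):

```python
def seeded_update(seed_rating, seed_weight, match_rating):
    """Average a start-of-year rating, counted seed_weight times, with one match rating."""
    return (seed_weight * seed_rating + match_rating) / (seed_weight + 1)


# Guess 1: the year-start rating stands in for the three prior dynamic ratings.
print(f"{seeded_update(3.50, 3, 3.22):.3f}")
# Guess 2: the year-start rating counts seven times.
print(f"{seeded_update(3.50, 7, 3.22):.3f}")
```

A weight of 3 gives 3.430 and a weight of 7 gives 3.465, which is the value that lands close to the displayed 3.47. That is consistent with the second guess but does not confirm it.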

The OP of this thread figured out a lot, but there are still some TR mysteries out there...

Yes, you are right on the 3.47, and it still doesn't add up. And how did he end up with a rating pushing that of a 4.5-level player? Seems they railroaded his rating into the stratosphere based on dodgy evidence and kept doubling down on the early mistake. I can understand how that can happen, but this also shows why excluding mixed games leads to results that are way off. They completely ignored the mixed games where people had established ratings and overvalued early adult league matches, even though many of those players were self-rates without a well-established rating.

Why was the first match from 3/12 counted but not the second match? It’s all really odd and seems random.

Other factors that are worth mentioning are the time intervals used for updates and the time-marching scheme used.

Not all the matches happen at fixed time intervals. They are also not aligned with the calculation time intervals. The time intervals used for updating the ratings greatly affect the result.

Also, the calculation uses an explicit formula for time marching, i.e., at each update interval the algorithm only uses ratings from the previous intervals. This is understandable, since an implicit scheme requires solving a large coupled system, which is much more expensive than an explicit update.

On TR, I don't think the time intervals affect the dynamic rating calculations at all. If the player had three prior dynamic ratings in the same rating year, those are averaged in to the new match rating using the formula that OP discovered. It doesn't matter how long ago those three prior matches occurred - they could have happened the prior three days or spaced three months apart, it's the same formula used. Correct me if there is evidence otherwise!

One thing I like about TR… once my rating for my most recent match gets calculated in, my rating is stable. It doesn’t change until I play the next match.

None of this continuous rating decay until I’m a beginner, or the 2-NTRP-level oscillation nonsense that I’ve seen elsewhere.

It must have been discussed here before. Why does USTA keep it secret? I just stumbled onto someone rated 4.0C who has never won a match according to TR. Zero wins. Does the local USTA coordinator have the right to just assign a rating?

I don’t believe UTR necessarily decreases your rating as time goes on. The decay is in the weight of the rating when you haven’t played recently.

People may like that your rating doesn’t change in TR. However, when TR first pulls a rating virtually out of thin air because the players have played very few matches, and then doubles down on that rating, it throws off the accuracy. There is a reason UTR scored the best out of all the rating systems when it comes to predictability.

Sure if you self rate as a 5.0 you can lose all your games and likely still end up at least a 4.0.

USTA thinks their adult members are children that can’t handle the truth of how ratings are determined.

If he played a bunch of matches against 4.0s and kept them close, he could get rated 4.0 even if he didn’t win any of them.

I don't know where I got the impression that NTRP leans heavily on match winning %, whereas UTR reflects more games won.

As I understand it, someone could even get DQ'd after losing a match.

