Reverse engineering the USTA ratings algorithm. (NTRP, Tennis Link, League, Dynamic)

dizzlmcwizzl

I am going to attempt to generate my own personal rating and see what I can learn from the process. I believe many of us have a good grasp on the way the USTA calculates these ratings, although few folks really know the ins and outs of the algorithm.

Recently I have been impressed with Schmke's contributions … Although he has no inside knowledge, he has spoken at length about how he believes the system works. His observations have rung true to me mathematically. I think that calculating ratings for large groups has given him insight into the process. I am interested in the same thing.
I have no intention of calculating ratings for others … rather, I am going to publicly try to calculate a rating for myself and for people I suspect are very close to the cutoff.

I am going to start with public documents provided by the USTA. I have been reading them carefully and they provide quite a bit of detail about the system. Unfortunately, some of these links conflict.

I will be using the TLS website for my starting point. While we know they are not spot on, they have been relatively on target this season. The strongest players I have played this season have had the highest TLS ratings and vice versa for the weakest players. While they are not exact, I believe they are close enough to learn something from.

Why would I even care?

First, I admit there is no good reason to do this. I should just play my matches against whom I am scheduled and do my best. However, it intrigues me and I suspect many of you as well.

Second, I am a Physicist so the math/statistics interests me and I have several weeks of free time before the next semester.

Third, I just injured my hamstring so I will be out for a few days. This gives me something tennis related to concentrate on while I heal.

Fourth, by sharing my observations maybe some of you will point out things I missed or ideas you have. I have found 3 conflicting links on the USTA website, so maybe you have found other links that might shed some light or further confuse me.
 
This website contains what I believe is the most complete description of the algorithm.

In matches where all players have previous ratings the procedure is as follows:

1. The system looks up the current dynamic rating of all the players in the match.

2. The system looks up from a table, the likely score of the match based on the current dynamics of the players.

3. The system compares the likely match score with the actual match score. For example, if one player or team has a tenth of a point higher rating than the opponent, the likely score is 6-4, 6-4.
• If the winning team wins by a larger than expected margin, each winning player’s rating is increased based on the margin of victory and each losing player’s rating is decreased by the same amount.
• If the winning team wins by less than the expected margin, their ratings will actually decrease and the losing team’s ratings will increase.
• Likewise, the “wrong” team may win which causes their rating to increase markedly and the rating of the team which was favored would decrease by the same amount.

4. The rating obtained for each player in Step #3 is averaged with a maximum of their previous three dynamic ratings and that number becomes their new current dynamic rating. (Indirectly this connects the current dynamic to all previous matches but weights the four most recent matches more heavily.) The reason for this averaging is to even out the ratings in cases where some unusual situation causes an atypical result.

Each player rating is maintained in the system to the nearest hundredth of a point.

The difference in ratings of the members of a doubles team is held constant in the calculation for an individual match. If the two players are three hundredths (.03) of a point apart going into the match then they are three hundredths (.03) apart after the calculation in Step #3. However, once that number is averaged with the three previous dynamic ratings (Step #4) that difference may change. This is how we measure the performance of players as they change partners.
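
A minimal sketch of how the doubles rule and the averaging step could fit together. This is not the USTA's code: the function names, the sample ratings and the team adjustment of 0.06 are all made up for illustration, and the averaging uses the "up to three previous dynamic ratings" reading of the document.

```python
from statistics import mean

def apply_team_adjustment(partner_ratings, adjustment):
    # Both partners move by the same amount, so their gap is unchanged
    # within the single-match calculation (Step #3).
    return [round(r + adjustment, 2) for r in partner_ratings]

def average_with_history(match_rating, previous_dynamics, window=3):
    # Step #4 under the "up to three previous dynamic ratings" reading.
    recent = previous_dynamics[-window:]
    return round(mean([match_rating, *recent]), 2)

# Hypothetical doubles team, 0.03 apart going into the match.
partners = [4.12, 4.15]
match_ratings = apply_team_adjustment(partners, 0.06)

# Hypothetical earlier dynamic ratings for each partner (oldest first).
histories = [[4.05, 4.10, 4.12], [4.20, 4.18, 4.17]]
new_dynamics = [
    average_with_history(m, h) for m, h in zip(match_ratings, histories)
]

print(match_ratings)  # still 0.03 apart
print(new_dynamics)   # gap can change once each history is averaged in
```

The partners stay 0.03 apart after the match calculation, but because each one is averaged against a different history, the gap between their new dynamic ratings is no longer 0.03.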
 
In investigating my own personal rating there are three major underlying questions that concern me.

First, what is the table of game differences the USTA uses? We have read that a double bagel occurs when one player is at the top and another player is at the bottom of the rating level.

I am going to use the table below … In a three set match the largest game difference is 12 games.



Second, how do I adjust player ratings after comparing expected game difference to actual game difference?
I am going to start with this example: Let's assume two singles players (4.09 vs. 4.45) where the 4.45 player wins 6-3, 6-2.

1) Ratings difference: 4.45 - 4.09 = 0.36
2) Expected game difference = 9
3) Actual game difference = 7
4) The 4.09 player performed 2 games better than expected, so the actual result indicates that the players should only have been 0.28 ratings points apart (7 games at about 0.04 points per game). The gap shrinks by 0.08, so I will adjust the 4.09 player up 0.04 points and the 4.45 player down 0.04 points.
5) This new adjusted rating will then be averaged against some of the player's previous results.
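
The five steps above can be sketched in a few lines. The 0.04 points per game figure is an assumption read off this one example (a 0.36 gap mapping to 9 games), not a known USTA constant, and `adjust_singles` is a hypothetical helper name.

```python
POINTS_PER_GAME = 0.04  # assumption: 0.36 rating gap <-> 9 games in the example

def adjust_singles(favorite, underdog, actual_game_diff):
    """Return (favorite match rating, underdog match rating, expected games).

    actual_game_diff is positive when the favorite won by that many games
    and negative when the underdog won.
    """
    gap = favorite - underdog
    expected_game_diff = round(gap / POINTS_PER_GAME)
    implied_gap = actual_game_diff * POINTS_PER_GAME
    # Split the correction evenly: one player moves up, the other down.
    shift = (gap - implied_gap) / 2
    return round(favorite - shift, 2), round(underdog + shift, 2), expected_game_diff

# 6-3, 6-2 is a 7 game difference.
print(adjust_singles(4.45, 4.09, 7))
```

For the example match this gives 4.41 and 4.13 with an expected difference of 9 games, which are the values that would then go into the averaging step.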

Third, and I think most interesting: Do I average this adjusted rating against the last three ratings or with the maximum of the last three ratings? I have seen both in print, I want to look at how ratings are affected by this.
 
1st Case study …. Dizzlmcwizzl.

A new 4.5 player: I was an early start 4.5 two seasons ago and was bumped down at year end. Last season I went 19-4 (at 4.0 and 4.5) including districts and sectionals and was bumped up to 4.5. I assumed I was well into the 4.5 band. The TLS website indicates I was a 4.20 in my district. I think this is probably a little high but agrees with my assessment I was well into the 4.5 band.

This season I am currently 8-7, playing 1/3 of my matches at line 1 doubles and ½ of my matches at line 3 doubles. There have been two unexpected results …. one in which we lost a match we should have won easily and another where we upset a much better pair.

Prior to any analysis I expected my rating to remain steady or slightly improve.
I performed the analysis using two methods …
First method: I averaged the latest match result with the maximum of the three previous ratings. This is the procedure outlined in the referenced document above.

By weighting the best results up to 4 times in the rating, this method has a general upwards pull on the rating. Also, since I was averaging over only two values (maximum and current) instead of the last four ratings, very unexpected results had dramatic short-term effects on ratings.



 
In the second analysis I averaged all of the last four ratings together. This had the expected result of smoothing out my results. Both the highs and lows were muted and there was no net upwards push on my rating. This does not match the most complete document on ratings I could find, but it does match anecdotal evidence that we have all seen … that movement in ratings is hard to achieve.
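
To make the difference between the two methods concrete, here is a rough sketch that runs both on the same made-up sequence of match ratings. The numbers are invented, and holding the match ratings fixed is a simplification (in the real exercise each match rating depends on the current rating).

```python
from statistics import mean

def update_max_method(match_rating, history):
    # Method 1: average the new match rating with the maximum of
    # the (up to) three previous ratings.
    return (match_rating + max(history[-3:])) / 2

def update_mean_method(match_rating, history):
    # Method 2: straight average of the new match rating and the
    # (up to) three previous ratings.
    return mean([match_rating, *history[-3:]])

def run(match_ratings, start, update):
    history = [start]
    for m in match_ratings:
        history.append(update(m, history))
    return history

# Hypothetical match ratings: a steady result, a bad loss, an upset win, a so-so day.
match_ratings = [4.20, 3.90, 4.30, 4.10]

max_history = run(match_ratings, 4.20, update_max_method)
mean_history = run(match_ratings, 4.20, update_mean_method)

print([round(r, 3) for r in max_history])
print([round(r, 3) for r in mean_history])
```

With these inputs the maximum method drops further after the bad loss (to about 4.05 versus 4.10) but finishes higher (about 4.175 versus 4.15), which is the same pattern of bigger short-term swings and an upward pull described above.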



 
Observations:

Unexpected wins or losses have huge effects on ratings in the short term but not so in the long term. For example my partner and I lost badly to a team at the very bottom of the band. This dropped his rating well below the 4.5 threshold and my rating nearly 0.1 points.

However, on the next day I partnered with another player and dominated a “better” team. Between this result and the averaging with my most recent maximum rating, I returned to nearly my previous rating. My partner, however, was driven down and has not yet played another match to recover those lost points.


My guess is that the year-end leveling must average the last few matches that have been played. Otherwise, the last match you play could result in dramatic year-end changes.

The other interesting note is that the more matches you play, the less important your starting rating becomes. For example, I charted my rating as if I had started the year as an exceptionally low 4.5, a middle 3.5, a top of the band 4.5 and my TLS rating of 4.2. All of these ratings eventually would be in the 4.5 band based on my results this year.
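
A small sketch of why the starting rating washes out. Everything here is an assumption for illustration: the season is invented, the 0.04 points per game scale comes from the singles example earlier, and the update is the straight average with up to three previous ratings.

```python
from statistics import mean

POINTS_PER_GAME = 0.04  # assumption taken from the 4.09 vs 4.45 example

def match_rating(me, opponent, margin):
    """Split the gap correction evenly between the two players.

    margin > 0 means I won by that many games, margin < 0 means I lost.
    """
    implied_gap = margin * POINTS_PER_GAME
    return me + (implied_gap - (me - opponent)) / 2

def simulate(start, matches):
    history = [start]
    for opponent, margin in matches:
        m = match_rating(history[-1], opponent, margin)
        history.append(mean([m, *history[-3:]]))
    return history

# Hypothetical 15 match season: (opponent rating, my game margin).
matches = [(4.30, 2), (4.10, -5), (4.40, 3), (4.25, 1), (4.35, -2)] * 3

starts = [4.01, 4.20, 4.50]
finals = [simulate(s, matches)[-1] for s in starts]

print("starting spread:", round(max(starts) - min(starts), 3))
print("final spread:   ", round(max(finals) - min(finals), 3))
```

Each update only carries part of the previous ratings forward, so the gap between any two starting points shrinks with every match played, whatever the results are.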

 
Third, and I think most interesting: Do I average this adjusted rating against the last three results or with the maximum of the last three results? I have seen both in print, I want to look at how ratings are affected by this.

It is averaged with the (up to) 3 most recent dynamic ratings, not the most recent match results (match ratings). I can't tell which you mean in your use of "results".
 
I meant the rating after each match has been factored in ... including the averaging.
 
do the usta calculations take into account dynamic rating of your partners when playing doubles? like if i'm paired with a strong partner vs a weak partner?
 
The most interesting question to me is whether they use the max of the last three ratings ... or if they perform a straight average of 4 ratings (3 previous plus the adjusted rating based on the current match).

The link I posted says they use the maximum ... but other links have said straight average of "up to 3" ....

The maximum is what I read, but straight averaging to me seems more in line with anecdotal observation.
 
i love this idea as i have started getting into the rankings to set up match ups. TLS definitely has some issues, but then again so does the algorithm. and sometimes it seems spot on.

better question: why does the USTA even keep it secret? People game the system now and people would game the system if we knew the real methods and our dynamic rating.

I think it would be a motivational factor for some of us who are competitive to try to "beat" what the system says we should do, also let us know objectively when we are beat by someone who the system says should win, so whatever your personal "he wasn't as good as me" feelings are...... you are wrong.

what is the look up table for predicted scores with point differential?
 
very nicely done. If I may ask - where did you get this table from?
 
I made it up .... I figured two end points are known. And the max game difference is 12 games.

Same rating = no difference, i.e. no game difference

Top of level versus bottom of level = double bagels, i.e. a 12 game difference

So for singles there could be a 0.5 difference for on-level players and I broke that into 12 divisions ... for doubles the max rating spread is 1.0, again broken into 12 divisions.

I suspect their real table does not factor in 1 game differences at all since there is essentially no difference in breaks of serve. I may adjust my table to represent differences in breaks of serve rather than actual game differences because anything less than a 4 game difference is essentially a tie.
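
The made-up table described in this post can be written as a one-line lookup. This is only the linear guess from the post (0.5 spread for singles, 1.0 for doubles, 12 equal divisions), not the USTA's real table, and `expected_game_diff` is a hypothetical name.

```python
def expected_game_diff(rating_gap, max_gap, max_games=12):
    """Linear lookup: split max_gap into max_games equal divisions."""
    step = max_gap / max_games
    # Clamp so that anything beyond the full spread is still a double bagel.
    return min(max_games, round(abs(rating_gap) / step))

SINGLES_MAX_GAP = 0.5  # top of level vs bottom of level
DOUBLES_MAX_GAP = 1.0  # max spread for doubles, as stated in the post

print(expected_game_diff(0.00, SINGLES_MAX_GAP))  # same rating
print(expected_game_diff(0.36, SINGLES_MAX_GAP))  # the 4.09 vs 4.45 example
print(expected_game_diff(0.50, SINGLES_MAX_GAP))  # double bagel
print(expected_game_diff(0.36, DOUBLES_MAX_GAP))  # same gap in doubles
```

The singles lookup reproduces the 9 game expectation used in the 4.09 vs. 4.45 example, while the same 0.36 gap in doubles only predicts about 4 games.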
 
I don't think that less than a 4 game difference is essentially a tie... Any match taken to a match tiebreak could have a one or two game difference. Which should properly rate as a closer match, right?

Do match tiebreaks count as just one game?
 
The biggest blowout possible for a 3 game difference is 7-6, 6-4 .... while not a tie for sure, you would be hard pressed to convince me that two players with the same rating prior to the match should have their ratings adjusted in any way.

So in that sense I could call that score line essentially a tie for ratings adjustment purposes. Or maybe not ... I will have to noodle on it
 
This is a good estimation of your NTRP changes.

Are you using the TLS ratings for your opponents? If so, wouldn't you want to somehow adjust their ratings to be up to their current level when you determine the resulting changes when you play them?

edit: Also you played in three different areas so the TLS website has you at three different NTRPs.
 
I was going to point that out, that the starting ratings for partners/opponents are important, but more important is adjusting them for each match they play as they could move up or down prior to your match with them which would obviously affect the rating you get for the match.

But dizzy is having so much fun with this, I didn't want to rain on his parade. But I guess I did now ...
 
Who knows how good it is .... but it is a learning exercise.

Yes as far as my opponents are concerned I am also using their year end TLS rating. I am not doing the adjustments for them.

I am justifying this in three ways ...

1st, all of my opponents except one have well established playing careers. I know them all and none of them are dramatically better or worse than they have been.

2nd, in looking at my rating there is not much fluctuation so I suspect the same is true of my opponents ....

3rd, I performed the analysis several more times, adding in a random number generator that adds or subtracts up to 0.2 points to each of my opponents' ratings. When doing this there has been very little effect on my final rating .... I suspect this is so since the difference I add or subtract is truly random and I have a fairly large sample size of data to work with. Below is a graph I produced with random additions or subtractions to my opponents' ratings and you will see they still appear to converge at roughly the same point.
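
A rough sketch of that random perturbation test. The season, the 0.04 points per game scale, the uniform noise and the straight-average update are all assumptions for illustration; none of this is my actual spreadsheet.

```python
import random
from statistics import mean

POINTS_PER_GAME = 0.04  # assumption taken from the singles example
NOISE = 0.2             # add or subtract up to 0.2 from each opponent

def final_rating(start, matches, noise=0.0, rng=None):
    rng = rng or random.Random()
    history = [start]
    for opponent, margin in matches:
        # Jitter the opponent's rating uniformly within +/- noise.
        jittered = opponent + rng.uniform(-noise, noise)
        gap_correction = margin * POINTS_PER_GAME - (history[-1] - jittered)
        match_rating = history[-1] + gap_correction / 2
        history.append(mean([match_rating, *history[-3:]]))
    return history[-1]

# Hypothetical 15 match season: (opponent rating, my game margin).
matches = [(4.30, 2), (4.10, -5), (4.40, 3), (4.25, 1), (4.35, -2)] * 3

baseline = final_rating(4.20, matches)
trials = [
    final_rating(4.20, matches, noise=NOISE, rng=random.Random(seed))
    for seed in range(200)
]

print("baseline:", round(baseline, 3))
print("min/max with noise:", round(min(trials), 3), round(max(trials), 3))
```

Because each opponent's error only enters at half weight in one match and is then averaged with earlier ratings, the individual errors largely cancel and the final rating moves much less than the 0.2 noise applied to any single opponent.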



Finally, and perhaps most importantly, I am not doing this to actually calculate my rating. I am starting with flawed input and I do not know the algorithm. My goal is learning, and data mining roughly 4000 matches and sequentially building new ratings is more work that does not move me closer to that goal.
 
There is no rain here ... I know I am leaving out that part ... but I have no intention of doing that work especially when it may not matter for an isolated case anyway.
 
I started with just my home district, but have since redone the calculations with a weighted average of the three districts.

By the way ... do I know you, or did you deduce who I am?
 
I deduced who you were, I play in the next District over from you (Mid-Atlantic) and it was surprising to me how many 4.5's play in my District and also play in yours.
 
Really .... I do not know many folks from my home district that travel to yours.

I can only think of two that make their way up from down your way. .... 1) an ex Naval Academy guy who is very, very good and 2) a young guy that plays the minimum number of matches for a recent national champion.
 
They are on the #1 team (Snipers) and I looked again and it's only 4 that are based here that travel there. Two are recent 5.0 drop downs; the other two are pretty strong 4.5s.

That team also has guys that list their addresses in NY and TX so they really look all over to recruit for that team ;)
 
I deduced who you were, I play in the next District over from you (Mid-Atlantic) and it was surprising to me how many 4.5's play in my District and also play in yours.

Really? To me it's kind of surprising how few do. Up here in Jersey, people play in Middle States NJD, Middle States Philly, Middle States DE, Eastern NJ - Middlesex, Eastern NJ - Sussex, and even some in Eastern Manhattan or Middle States Eastern PA, but there is very little crossover to Mid-Atlantic even though the travel is probably not that much further if there are teams up around Elkton or whatever.
 
Elkton is in Cecil County and there are no USTA leagues there; the next county down from that is Harford County and there are no USTA leagues there either. I do see people play in the Central PA leagues but it's more common for people to go south if they want to play in a different district and play in DC and Northern VA.
 
Yea ... I live as close as you can to Maryland without being in Maryland. For me the closest Mid Atlantic league is probably a 70-80 minute drive. I am closer than that to Princeton.
 
They are on the #1 team (Snipers) and I looked again and it's only 4 that are based here that travel there. Two are recent 5.0 drop downs; the other two are pretty strong 4.5s.

That team also has guys that list their addresses in NY and TX so they really look all over to recruit for that team ;)

That captain is the guy that won Nationals two years ago. He is the best recruiter I have ever seen ... every one of his teams is a Franken team comprised of players from all around the section.

This year he is bringing an 8.0 and a 9.0 team to mixed nationals.
 
I like the TLS web site because it gives you a ranking list, like the ATP and the WTA. I wish it would take into consideration the different districts though. If you just joined a different district, your ratings are completely different from your old district. If you are a 4.0, I think they must start you off as a 3.75 or something like that, because my rating in the new district is a lot lower than in my old district (over 4.0), even though I won most of the time.
 
That captain is the guy that won Nationals two years ago. He is the best recruiter I have ever seen ... every one of his teams is a Franken team comprised of players from all around the section.

This year he is bringing an 8.0 and a 9.0 team to mixed nationals.

I really wish the USTA would step in and put a stop to that stuff. Maybe limit players on a team to ONLY players who actually LIVE within the district they want to play in. The average team has no chance when the USTA allows guys like this to basically recruit all over the country and pull together a national line-up. Goes against the whole idea behind league play.
 
The Texas Section limits the number of players who can live 50+ miles away to one or two (depending on whether the League is in a big city or small city), I believe.
 
The TLS site has a bunch of updates for the 2013 season up.

http://www.tennisleaguestats.com/

I just spent some time comparing his ratings to my feel of who can beat who around here. I think it is pretty good for those people who have played more than a few league matches.
 
^^

Ha. I'm still a 3.49 according to them. :) That's, I think, where they had me at the start of the season too although USTA, in their infinite wisdom, threw me in with the next group up.

But it doesn't surprise me as my season was a mixed bag, at best, and could better be described as mediocre. 'Twill be interesting to see YER as I'm done with league play for the year.
 
My computer science brain loves this idea and wants to try doing this too but my busy tennis captain brain says I don't have time.
 
^^

Ha. I'm still a 3.49 according to them. :) That's, I think, where they had me at the start of the season too although USTA, in their infinite wisdom, threw me in with the next group up.

But it doesn't surprise me as my season was a mixed bag, at best, and could better be described as mediocre. 'Twill be interesting to see YER as I'm done with league play for the year.

I hear you .... this season as a 4.5 I am 8-8 overall in adult, which is satisfying, yet I know 3 or 4 of those matches could have gone the other way, making the swing anywhere from 5-11 to 11-5.

....


However, what I find interesting is that I am 3-7 with my usual partner (also a new 4.5) and 5-1 with anyone else. In every match but one, I felt like I was the better of the two of us, but something about playing with an established 4.5 just relaxed me and let me play better tennis. I guess that is the lament of newly bumped folks. You have to start at the bottom.
 
I am going to tweak the formula a little and will post this later today.

Also, the new TLS ratings came out and I am going to carve them up a little to see how close my stuff agrees with them.
 
Do you have any other leagues later in the year that would go into your calculation?

In my part of the Mid-Atlantic our Tri-Level season which is also used in the calculation for the Adult ratings starts in the end of summer, early fall.
 
I am also in the Mid-Atlantic...but I'm done for the year. Our 40+ counts but I'm boycotting it. Not sure if Tri-level stuff counts so I went looking...and found this instead:

*Self-rated players are not allowed to participate at Tri-Level Nationals.

Hello, here's your sign. Maybe they'll eventually catch on with/for the other leagues too.
 
My 2012 TLS "rating" is off by a whopping .22 (which is a pretty dang big margin)... will be interesting to see what the 2013 comes in at.
 