100% winning record got bumped down?!?!

gmatheis

Hall of Fame
He is listed as a 4.0 on every team he has been on recently, so it looks like he was never a 4.5 and therefore did not get bumped down, but rather just didn't get bumped up.
 

gameboy

Hall of Fame
Wow. Whatever they did to update the USTA league site just BLOWS!!! You can't even navigate back and forth when clicking on links. WTF???
 

schmke

Legend
He is listed as a 4.0 on every team he has been on recently, so it looks like he was never a 4.5 and therefore did not get bumped down, but rather just didn't get bumped up.

Correct. He played on 4.0 teams in 2008 and 2009, then played on a 4.5 team in 2012 but was still a 4.0 and simply didn't get bumped up.

As to why he didn't get bumped up, I estimated him to be a 3.68 starting the year (based on his 2009 season, I think it still carried over to 2012) and moving from 3.68 to over 4.0 in just 4 matches, while possible, is pretty difficult. Couple that with playing 3 of the 4 matches with a self-rated player and a couple of those likely don't generate match ratings for him (only the self-rated player gets match ratings until he has played a few matches) so it may have really just been a couple matches that counted which would be really difficult to improve that much.

In the end, I estimate his dynamic rating at the end of 2012 was 3.90, so he improved but not enough to get bumped up.

Now, his partner in the 3 matches was rated a 4.5C at the end of the year and my estimate agrees with that, having him at 4.43.

Now, the actual skill gap between him and his partner may not be that large, but because he had a starting rating in the lower half for a 4.0, he "helped" his partner get a pretty high initial rating which in turn contributed to his rating only improving to 3.9 despite winning at 4.5.

This is one of the flaws in the system, that playing a lot of matches with the same partner keeps the rating difference between the partners more or less the same. This results in the higher rated partner getting "pushed" up when the team wins.
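A minimal sketch of why the gap between regular partners tends to persist. The USTA does not publish its algorithm, so everything here is an assumption: the update rule, the `WEIGHT` and `GAMES_PER_POINT` constants, the partner's 4.20 starting rating, and the match list are all invented for illustration. The only idea being shown is that if one doubles result produces a single adjustment applied to both partners, the difference between their ratings cannot change.

```python
# Hypothetical model, NOT the USTA's unpublished algorithm.
WEIGHT = 0.25           # assumed damping: how much one match moves a rating
GAMES_PER_POINT = 10.0  # assumed scale: games of margin per 1.0 rating point


def team_adjustment(team_avg, opp_avg, actual_margin):
    """Rating change implied by one doubles result (same for both partners)."""
    expected_margin = (team_avg - opp_avg) * GAMES_PER_POINT
    return WEIGHT * (actual_margin - expected_margin) / GAMES_PER_POINT


def play_matches(low, high, matches):
    """Apply each (opponent_avg, game_margin) result to both partners."""
    for opp_avg, margin in matches:
        adj = team_adjustment((low + high) / 2, opp_avg, margin)
        low, high = low + adj, high + adj
    return low, high


# 3.68 is the estimate from the post; 4.20 and the match list are invented.
start_low, start_high = 3.68, 4.20
end_low, end_high = play_matches(start_low, start_high,
                                 [(4.10, 2), (4.15, 1), (4.25, 2)])
print(f"start gap: {start_high - start_low:.2f}")
print(f"end gap:   {end_high - end_low:.2f}")
```

Both partners rise with each win, but the two printed gaps are the same. Under this kind of rule, only matches with other partners or singles results could pull the two ratings closer together.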
 

LuckyR

Legend
Correct. He played on 4.0 teams in 2008 and 2009, then played on a 4.5 team in 2012 but was still a 4.0 and simply didn't get bumped up.

As to why he didn't get bumped up, I estimated him to be a 3.68 starting the year (based on his 2009 season, I think it still carried over to 2012) and moving from 3.68 to over 4.0 in just 4 matches, while possible, is pretty difficult. Couple that with playing 3 of the 4 matches with a self-rated player and a couple of those likely don't generate match ratings for him (only the self-rated player gets match ratings until he has played a few matches) so it may have really just been a couple matches that counted which would be really difficult to improve that much.

In the end, I estimate his dynamic rating at the end of 2012 was 3.90, so he improved but not enough to get bumped up.

Now, his partner in the 3 matches was rated a 4.5C at the end of the year and my estimate agrees with that, having him at 4.43.

Now, the actual skill gap between him and his partner may not be that large, but because he had a starting rating in the lower half for a 4.0, he "helped" his partner get a pretty high initial rating which in turn contributed to his rating only improving to 3.9 despite winning at 4.5.

This is one of the flaws in the system, that playing a lot of matches with the same partner keeps the rating difference between the partners more or less the same. This results in the higher rated partner getting "pushed" up when the team wins.


I agree that this is the likely explanation for what happened, and I agree with the use of the word: "flaw". Sometimes more complication, calculations and rules lead to illogical conclusions like this case.
 

gameboy

Hall of Fame
What is so illogical? I think what schmke described is perfectly logical.

So, if you happen to win several matches against self-rated players, you should get bumped no matter what? THAT would be illogical.
 

schmke

Legend
I agree that this is the likely explanation for what happened, and I agree with the use of the word: "flaw". Sometimes more complication, calculations and rules lead to illogical conclusions like this case.

Perhaps my use of the word flaw was a little strong. It is just how the system behaves when two players partner at doubles all the time. The gap between their ratings will be maintained.

Without some other matches played with other partners and/or singles matches to otherwise adjust one of the individual's ratings, what alternative do you suggest for this situation where matches are always played with the same partner?

You can't just arbitrarily shrink the gap between their ratings as that too may not be correct and may not give the credit due to the higher rated partner. We've just looked at one situation where the big gap seems like an error but I'm sure there are others where a small gap would similarly be an error.

It is just the nature of a system that calculates ratings based on limited data. It isn't perfect, but it is better than many alternatives.
 

Nellie

Hall of Fame
Also, many of the matches against 4.0 competition had tight scores, so those wins might actually bring down the rating (i.e., if you play 4.0s to a virtual tie, you get a 4.0 rating, even if you are playing in a supposed 4.5 match).
 

ronray43

New User
Still, of the eight opponents he played, all in doubles matches, six of them are rated 4.5 and two rated 4.0. He has proven he's not only competitive as a 4.5, but can win at 4.5. Yet another example of why USTA needs to integrate win-loss record into at least a portion of the NTRP algorithm.
 

gmatheis

Hall of Fame
Still, of the eight opponents he played, all in doubles matches, six of them are rated 4.5 and two rated 4.0. He has proven he's not only competitive as a 4.5, but can win at 4.5. Yet another example of why USTA needs to integrate win-loss record into at least a portion of the NTRP algorithm.

4 matches is hardly enough of a sample size to determine that he belongs at 4.5

He very well may have had some very good matches, and his opponents could very well have had some bad matches ... it happens.

My friend who just got bumped to 4.0 for 2012 went 19-2 in the spring season and was bumped to 4.5 for 2013. He had a reasonable sample size of matches to determine he was ready to move up .... 4 matches is really not enough, especially when they were all close scores.
 

schmke

Legend
Still, of the eight opponents he played, all in doubles matches, six of them are rated 4.5 and two rated 4.0. He has proven he's not only competitive as a 4.5, but can win at 4.5. Yet another example of why USTA needs to integrate win-loss record into at least a portion of the NTRP algorithm.

And how do you know that he didn't win because of a strong partner? His rating did improve throughout the year, just not enough to get bumped up.

But, since you want to focus on records, here are his matches.

The first match was a 5 and 4 win over a 4.5 (bumped up from 2011 so not that strong of one) that went 2-4 for the year and finished with a rating just over 4.0 and a 4.0 that only went 5-5 for the year. And he played with a 4.5 partner. So this one wasn't that strong a win.

The second and third matches were with a self-rated partner so they didn't generate a match rating for him, but were a match tie-break win (but lost more games than the opponent) over a 4.5 that went 1-3 for the year and a newly bumped to 4.5 that went 1-4 for the year, and a close 6 and 5 win over a 4.5 that went 0-2 and a 4.0 that went 1-4. Even if these counted, they would not have been that impressive.

The fourth match was a 5 and 4 win over two 4.5s one that went 4-2 but one that went just 1-1. This was the most impressive result for the year and by itself indicated he played at a 4.5 level, and did improve his dynamic rating, but wasn't enough to get his overall rating above 4.0.

So, if you look at the details, you see that he had a winning record, but the matches were all close, two opponents were 4.0s, and two other opponents were just bumped to 4.5 so likely on the lower end of the 4.5 range. And the combined record of his opponents was just 15-25 on top of them not all being 4.5s. And he played with what appears to be a strong partner.

So his staying at 4.0 seems entirely plausible.
 

LuckyR

Legend
Perhaps my use of the word flaw was a little strong. It is just how the system behaves when two players partner at doubles all the time. The gap between their ratings will be maintained.

Without some other matches played with other partners and/or singles matches to otherwise adjust one of the individual's ratings, what alternative do you suggest for this situation where matches are always played with the same partner?

You can't just arbitrarily shrink the gap between their ratings as that too may not be correct and may not give the credit due to the higher rated partner. We've just looked at one situation where the big gap seems like an error but I'm sure there are others where a small gap would similarly be an error.

It is just the nature of a system that calculates ratings based on limited data. It isn't perfect, but it is better than many alternatives.


Again, I agree with you that that is how the system works.

Basically the current system has about 600 levels (1.00 to 7.00, in 0.01 increments) but from a practical standpoint it lumps them into 10 (1.5 through 6.0-7.0). Thus if your rating change happens to cross particular thresholds you get "bumped up", though your rating has been changing every match you play.

It is my personal opinion, (and I know that many disagree with this) that given the tremendous leeway in the numerous variables that go into why we all win and lose matches: emotions, fatigue, preparation, equipment, surfaces, illness, conditions etc that to assume that it is all matchplay quality and assign 3 significant digits worth of accuracy to each player's quality is naive and simpleminded.

I like the 10 level NTRP system, but I would not have a "secret" behind the scenes rating. I would use either the 10 levels themselves, or at most 2 significant digits worth (1.0 to 7.0, in 0.1 increments).

The world is divided into two types: lumpers and splitters. The USTA are splitters, I'm a lumper. Neither is right or wrong, it's how you look at the world.
 

J_R_B

Hall of Fame
There was a guy a couple years ago here at 4.0 who played both 4.0 (8-2 including districts/sectionals) and 4.5 (3-1) and got bumped from 4.0 to 4.5. Then he played a year at 4.5, then played another year at 4.5 where he was 5-0, and got bumped back down to 4.0. It was the strangest thing I'd ever seen in the ratings. Of course, he knew he was a 4.5, so kept playing for his 4.5 team and signed up for the 4.0 team that was going to win the league late in the year and played two matches to get eligible and played in the playoffs (then got bumped again). He's currently playing both 4.5 and 5.0, although it's a stretch to say he's a legit 5.0 (more like a league filler since there is a scarcity of real 5.0s here).
 

schmke

Legend
I like the 10 level NTRP system, but I would not have a "secret" behind the scenes rating. I would use either the 10 levels themselves, or at most 2 significant digits worth (1.0 to 7.0, in 0.1 increments).

I'm a bit confused. The NTRP system does effectively have the 10 levels as there is not a 3.67 level. There is 3.5 and 4.0 and you are one or the other.

But you have to have a way to determine when someone has improved or declined such that they should move into an adjacent level. The NTRP does this through having calculations to the hundredth and established periods at the end of which the rating check and level (re)assignment is done.

In your scenario, how would you calculate when someone should move up or down a level? I think that is what the debate is about, not so much having some reasonable number of levels.

IMHO, you have to go to at a minimum tenths and realistically hundredths to have a reasonable way to calculate ratings based on opponents ratings. Treating all players at a given half-point level the same would not result in an accurate system at all.

You can easily have scenarios where player A, a weak to middle 4.0, plays court 3 and has a good record against players just bumped up from 3.5 and gets bumped to 4.5 because he won a lot at 4.0. Meanwhile player B, a middle to strong 4.0, plays court 1 and loses more than he wins against strong 4.0s just bumped down from 4.5 and gets bumped down to 3.5 because he lost a lot at 4.0. So you have player A, probably not as strong as player B, but A gets bumped up to 4.5 and B down to 3.5, a full 2 levels apart.
 

wrxinsc

Professional
I'm a bit confused. The NTRP system does effectively have the 10 levels as there is not a 3.67 level. There is 3.5 and 4.0 and you are one or the other.

But you have to have a way to determine when someone has improved or declined such that they should move into an adjacent level. The NTRP does this through having calculations to the hundredth and established periods at the end of which the rating check and level (re)assignment is done.

In your scenario, how would you calculate when someone should move up or down a level? I think that is what the debate is about, not so much having some reasonable number of levels.

IMHO, you have to go to at a minimum tenths and realistically hundredths to have a reasonable way to calculate ratings based on opponents ratings. Treating all players at a given half-point level the same would not result in an accurate system at all.

You can easily have scenarios where player A, a weak to middle 4.0, plays court 3 and has a good record against players just bumped up from 3.5 and gets bumped to 4.5 because he won a lot at 4.0. Meanwhile player B, a middle to strong 4.0, plays court 1 and loses more than he wins against strong 4.0s just bumped down from 4.5 and gets bumped down to 3.5 because he lost a lot at 4.0. So you have player A, probably not as strong as player B, but A gets bumped up to 4.5 and B down to 3.5, a full 2 levels apart.

i really appreciate your patience and understanding with all of the OMG such and such is such a bunch of bullbunch posts. clearly many have no desire to understand how math works (not to mention how the USTA processes dynamic ratings which is simple math). or even how statistics work. it is all cool baby. they can go along with their way.

a wise guy once pointed out to me that all modern human interaction is mathematics and politics.

hopefully those folks are skilled in the latter. i doubt it. otherwise why not try to take a minute or two and understand the real way of things before posting bullbunch. oh. right. it is the internet.

press on my aligned one.
 

OrangePower

Legend
^^^^

Well, it would be simple math and as such easy to understand if USTA published their algorithm... but of course they don't, and for good reason, so we are all left to speculate on how certain results came to be. Some people's guesses (like schmke) are perhaps more educated and better than others', but ultimately still a guess.

Also, just because the current algorithm kinda works does not mean that it is the most optimal way to calculate ratings. So it is fair game to challenge it. For example, I happen to think that an ELO-based algorithm that adjusts a player's rating after every set would work even better.
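For readers unfamiliar with the idea, here is a rough sketch of what a per-set Elo update could look like. This is the standard Elo formula applied once per set rather than once per match. The rating scale (1500-style numbers rather than NTRP values), the `k` factor, and the function names are assumptions for illustration, and mapping the result back to NTRP levels is left out entirely.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expectation: chance that A beats B in one set."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_after_set(rating_a, rating_b, a_won_set, k=16.0):
    """Move both ratings by k * (actual - expected) after a single set."""
    delta = k * ((1.0 if a_won_set else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def update_after_match(rating_a, rating_b, set_results):
    """Apply the per-set update for each set; True means A won that set."""
    for a_won_set in set_results:
        rating_a, rating_b = update_after_set(rating_a, rating_b, a_won_set)
    return rating_a, rating_b


# A is rated lower, drops the first set, then wins the next two.
a, b = update_after_match(1450.0, 1550.0, [False, True, True])
print(round(a, 1), round(b, 1))
```

Because the ratings are refreshed between sets, the expectation for the second set already reflects what happened in the first. Note that this version only looks at who won each set, not the game score within it.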
 

schmke

Legend
^^^^

Well, it would be simple math and as such easy to understand if USTA published their algorithm... but of course they don't, and for good reason, so we are all left to speculate on how certain results came to be. Some people's guesses (like schmke) are perhaps more educated and better than others', but ultimately still a guess.

You are right, I'm guessing, but given how accurate I've been in agreeing with bump ups/downs, I think my estimates are pretty accurate.

Also, just because the current algorithm kinda works does not mean that it is the most optimal way to calculate ratings. So it is fair game to challenge it. For example, I happen to think that an ELO-based algorithm that adjusts a player's rating after every set would work even better.

I agree that the algorithm is not necessarily the best. I'd certainly do some things different if I was doing it from scratch. Tell me more about updating a rating after every set though. If you were supposed to win a set 6-4 but win 6-2, the participating players ratings are updated and you are supposed to win the 2nd set 6-3 or something like that?
 

goober

Legend
4 matches is hardly enough of a sample size to determine that he belongs at 4.5

He very well may have had some very good matches, and his opponents could very well have had some bad matches ... it happens.

My friend who just got bumped to 4.0 for 2012 went 19-2 in the spring season and was bumped to 4.5 for 2013. He had a reasonable sample size of matches to determine he was ready to move up .... 4 matches is really not enough, especially when they were all close scores.

I know people who were bumped with very few matches. A former team mate went 3-3 and was bumped to 4.5. His losses were 1-6, 2-6, 3-6, 4-6, 4-6, 1-6. Unfortunately one of his wins was a lopsided win against a ringer who threw the match against him. Completely unfair that he was bumped. He basically quit USTA after that.

Another guy was bumped to 4.5T after winning a 4.0 tourney playing only 3 matches.
 

schmke

Legend
I know people who were bumped with very few matches. A former team mate went 3-3 and was bumped to 4.5. His losses were 1-6, 2-6, 3-6, 4-6, 4-6, 1-6. Unfortunately one of his wins was a lopsided win against a ringer who threw the match against him. Completely unfair that he was bumped. He basically quit USTA after that.

Another guy was bumped to 4.5T after winning a 4.0 tourney playing only 3 matches.

It all depends on your current rating when the sequence of matches starts, and then the rating of who you play and the scores. Depending on those factors, it could take just a few matches, or take many matches for someone to get bumped up or down.

You didn't say if the 3-3 teammate had that record at 4.0 or if he was playing up at 4.5. Losing against strong opponents can still improve your rating. And that lopsided win can skew things, particularly if the opponent is or ends up being benchmark.

The 4.5T could have been very close to the threshold to be bumped, and then just 3 matches with good results against strong opponents can easily push him over the threshold.
 

goober

Legend
It all depends on your current rating when the sequence of matches starts, and then the rating of who you play and the scores. Depending on those factors, it could take just a few matches, or take many matches for someone to get bumped up or down.

You didn't say if the 3-3 teammate had that record at 4.0 or if he was playing up at 4.5. Losing against strong opponents can still improve your rating. And that lopsided win can skew things, particularly if the opponent is or ends up being benchmark.

The 4.5T could have been very close to the threshold to be bumped, and then just 3 matches with good results against strong opponents can easily push him over the threshold.

First guy was self rated and all his matches were at 4.0. Obviously I don't know the ratings of the guys he played, but I know roughly how good they were. 2 guys who beat him were average to slightly above average 4.0s based on winning percentage. One guy got bumped to 4.5 but beat him 2 and 1. The lopsided match which was thrown ended up being against a guy that played in sectionals and nationals so he was benchmarked.

The tournament guy had no rating, no prior USTA history other than juniors which were 5 years + ago. He played no other matches for the entire year. I played this guy and I am not saying it was an unfair bump. The guy was ranked in the top 75 in Florida as a junior so he was definitely out of level. I was just pointing out people can get bumped with not a lot of matches.
 

jmnk

Hall of Fame
^^^^

[...]

Also, just because the current algorithm kinda works does not mean that it is the most optimal way to calculate ratings. So it is fair game to challenge it. For example, I happen to think that an ELO-based algorithm that adjusts a player's rating after every set would work even better.

Fair enough. Do you have an idea for a different ranking algorithm? Let's discuss it.

And USTA algorithm --is-- based on ELO principle. and it --does-- adjust the rating after every match (granted, not after a set but after a match). It adjusts player's dynamic rating. Your period-end ranking is essentially your dynamic ranking at the end of the ranking period, rounded to 0.5.
 

jmnk

Hall of Fame
[...]
I agree that the algorithm is not necessarily the best. I'd certainly do some things different if I was doing it from scratch. Tell me more about updating a rating after every set though. If you were supposed to win a set 6-4 but win 6-2, the participating players ratings are updated and you are supposed to win the 2nd set 6-3 or something like that?
I also really appreciate your opinions here. So I was thinking - what would you do differently if you were designing a USTA ranking system from scratch. I can see separating singles and doubles, perhaps add some 'points' for win (so 06 76 76 win is still a winning result even though one lost more games) - but other than that I have a hard time figuring out how you can improve on what they have now.
 

goober

Legend
I also really appreciate your opinions here. So I was thinking - what would you do differently if you were designing a USTA ranking system from scratch. I can see separating singles and doubles, perhaps add some 'points' for win (so 06 76 76 win is still a winning result even though one lost more games) - but other than that I have a hard time figuring out how you can improve on what they have now.

Singles and doubles should not be separate rankings. In theory it would make sense, but from a practical standpoint it would be a logistical nightmare. You know how hard it would be to have a team that was filled with players that have different ratings for singles and dubs?

These are the changes I would like to see:

I would like to see a larger emphasis on matches played at sectionals and Nationals. There should be a much lower threshold to bump these players than what there is currently.

I think there should be slightly more emphasis on win-loss. The current thinking is that someone can be 10-0 or 0-10 against the same set of opponents and still have similar ratings if the guy who lost all the time had close matches and the guy who won all the time only barely won. The truth is the guy who won all the time is better because he knows how to pull out close matches and that should be worth something.

No ESR bump downs only bump ups. This prevents current loophole of bumped players playing some spring matches losing them and then rejoining their old team in the fall.

To prevent tanking. End of year calculations: High rated players who lose to low rated players (especially if they are a full level below) should have that result thrown out. If the low rated players continue to win and show they are actually at a higher level, the scores could be kept.
 

schmke

Legend
I also really appreciate your opinions here. So I was thinking - what would you do differently if you were designing a USTA ranking system from scratch. I can see separating singles and doubles, perhaps add some 'points' for win (so 06 76 76 win is still a winning result even though one lost more games) - but other than that I have a hard time figuring out how you can improve on what they have now.

The primary thing is to give more value to winning the match, i.e. don't base it simply on game differential. I do football ratings as well and there I do find that using just the score (with diminishing returns) is generally more accurate and extra weight doesn't need to be given to winning, but that is a sport where there is one score, not multiple sets where the winner can have a negative game differential.

I might also look at a non-linear scale for the component that uses game differential (perhaps the USTA does today with the table they have, I don't know) and also give less weight to matches between mis-matched players, unless there is an upset of course.

Also, I'd probably look at another approach than the averaging the latest match rating with last three dynamic ratings approach to make more recent matches count more. I do some other things with my other ratings systems that seems to work reasonably well.

Last, I'd look at ways to try to keep thrown matches from affecting ratings too much. Potential ideas are throwing out results that are way above or below normal, or at least giving them less weight.
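A small sketch of the averaging step mentioned above, taken at face value: the new dynamic rating is the average of the latest match rating and the last three dynamic ratings. The function names are hypothetical, and the second function is only one invented way to make the newest match count more. Neither is confirmed as what the USTA actually runs.

```python
def next_dynamic_rating(match_rating, previous_dynamics):
    """Average the latest match rating with up to the last three dynamic ratings."""
    recent = previous_dynamics[-3:]
    return (match_rating + sum(recent)) / (len(recent) + 1)


def recency_weighted_rating(match_rating, previous_dynamics, match_weight=2.0):
    """Invented variant: the newest match counts match_weight times as much."""
    recent = previous_dynamics[-3:]
    return (match_weight * match_rating + sum(recent)) / (match_weight + len(recent))


# A player sitting at 3.68 who posts a single 4.10 match rating.
history = [3.68, 3.68, 3.68]
print(round(next_dynamic_rating(4.10, history), 3))      # 3.785
print(round(recency_weighted_rating(4.10, history), 3))  # 3.848
```

With plain averaging, one match at a 4.10 level moves a 3.68 player only about a tenth of a point, which is consistent with the earlier point that a handful of matches rarely carries someone across a half-point threshold.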
 

schmke

Legend
Singles and doubles should not be separate rankings. In theory it would make sense, but from a practical standpoint it would be a logistical nightmare. You know how hard it would be to have a team that was filled with players that have different ratings for singles and dubs?

Agree here, although it might be interesting to have a player with just the one rating but know what their component ratings are at singles vs doubles. From a captaining and match-up standpoint, this could be great information.

I would like to see a larger emphasis on matches played at sectionals and Nationals. There should be a much lower threshold to bump these players than what there is currently.

This is already done, although I don't know specifics, through the benchmark calculations. Matches played against benchmark players do count for more, and all players at sectionals and nationals are benchmark by definition. If anyone has more information on this part of the year-end calculation, would love to hear it.
 

LuckyR

Legend
I'm a bit confused. The NTRP system does effectively have the 10 levels as there is not a 3.67 level. There is 3.5 and 4.0 and you are one or the other.

But you have to have a way to determine when someone has improved or declined such that they should move into an adjacent level. The NTRP does this through having calculations to the hundredth and established periods at the end of which the rating check and level (re)assignment is done.

In your scenario, how would you calculate when someone should move up or down a level? I think that is what the debate is about, not so much having some reasonable number of levels.

IMHO, you have to go to at a minimum tenths and realistically hundredths to have a reasonable way to calculate ratings based on opponents ratings. Treating all players at a given half-point level the same would not result in an accurate system at all.

You can easily have scenarios where player A, a weak to middle 4.0, plays court 3 and has a good record against players just bumped up from 3.5 and gets bumped to 4.5 because he won a lot at 4.0. Meanwhile player B, a middle to strong 4.0, plays court 1 and loses more than he wins against strong 4.0s just bumped down from 4.5 and gets bumped down to 3.5 because he lost a lot at 4.0. So you have player A, probably not as strong as player B, but A gets bumped up to 4.5 and B down to 3.5, a full 2 levels apart.



As I mentioned in my previous post, what strikes someone as "logical" is going to be based in this case on whether they really believe there is an actual measurable and, most of all, reproducible difference between a 3.68 and a 3.69. Obviously anyone can perform the mathematical calculation to come up with such numbers. But is there any data to say that the numbers 3.68 and 3.69 will predict a different outcome in future matchplay (does the 3.69 win more matches)? My guess is there is no such data. Well if there is no difference in matchplay results between 3.68 and 3.69 then from a purely statistical standpoint (this is not my opinion, it is a statistical reality) they should both be 3.7.

In the extreme example of treating all 4.0s exactly the same (1 significant digit), you would use simple won/loss records. It would be an OK system (not great). It wouldn't be very nuanced and would miss outliers who played a skewed set of opponents. But OTOH you wouldn't have cases like a prior thread where some guy goes 9-0 or somesuch and doesn't get bumped. Perhaps that guy played a very odd set of matches, but on the face of it, it looks weird, especially for the next guy who plays him. It gives the impression (perhaps true, perhaps false) that the USTA's system is missing the forest for the trees, or worse.

I get your last paragraph's example, but you know what? The most visible tennis ranking systems in the world (the ATP and WTA) don't use any opponent ranking information in their calculation of ranking. A trip to the quarters in a 500 level is the same whether you beat the World #1 6-0, 6-0 or if you were LL with three walkovers. You'd think from these boards that the Pro ratings would be all over the place, but except for players coming off of injury, it seems to work just fine.
 

goober

Legend
I get your last paragraph's example, but you know what? The most visible tennis ranking systems in the world (the ATP and WTA) don't use any opponent ranking information in their calculation of ranking. A trip to the quarters in a 500 level is the same whether you beat the World #1 6-0, 6-0 or if you were LL with three walkovers. You'd think from these boards that the Pro ratings would be all over the place, but except for players coming off of injury, it seems to work just fine.

The problem is that pros play only tournaments and USTA is mostly about leagues and teams so the pro ranking system would not work.
 

LuckyR

Legend
The problem is that pros play only tournaments and USTA is mostly about leagues and teams so the pro ranking system would not work.

I agree with your statement. I wasn't trying to say the USTA should adopt the ATP/WTA systems, just pointing out that as a general concept, you can very successfully treat all matches the same regardless of the ranking of the opponent.

And to be honest, I'm not even advocating totally ignoring opponent ranking, just pointing out that there is likely no statistical justification for taking that ranking to 3 significant digits.
 

schmke

Legend
I agree with your statement. I wasn't trying to say the USTA should adopt the ATP/WTA systems, just pointing out that as a general concept, you can very successfully, treat all matches the same regardless of the ranking of the opponent.

If everyone is playing at the same level, sure. That isn't the case for USTA League play so the argument, while valid, doesn't apply. With USTA League play you have to have some measure to identify when a player should be bumped up or down. I and others have pointed out that simple win/loss is not going to be accurate and you agreed in an earlier post. And I've also said if I was to do a ratings system from scratch I would look to give some weight to win/loss so I'm not arguing it should be ignored.

And to be honest, I'm not even advocating totally ignoring opponent ranking, just point out that there is likely no statistical justification taking that ranking to 3 significant digits.

So, you'd advocate doing the calculations as they are now, but then rounding the dynamic rating to tenths? If so, you have several problems.

First, I take your earlier point that a 3.68 and 3.69 are not really any different, and I assume you'd round both to 3.7 (?) so they are equal. But you then have the problem that a 3.64 and 3.65 are near equals, yet when you round they are now a tenth apart (3.6 and 3.7), which has suddenly introduced significance you didn't intend and probably isn't appropriate.

Second, a player can more easily get stuck at a rating. If they are a 3.7 and have a good match and their rating goes up to 3.74, you round them back down to 3.7. This happens again and they are again at 3.7. If you let the rating stay 3.74, that next match may improve them a bit and their steady improvement can be rewarded by gradually getting closer to 4.0 rather than having the rounding issue hold them back.
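The two rounding problems above can be shown with a few lines of arithmetic. This is only an illustration: ratings are stored as integer hundredths to avoid floating-point surprises, the +0.04 gain per match is an invented number, and `round_to_tenths` and `play` are hypothetical helpers.

```python
def round_to_tenths(hundredths):
    """Round a rating stored as integer hundredths to the nearest tenth (half up)."""
    return (hundredths + 5) // 10 * 10


def play(start, gains, round_each_match):
    """Add each match's gain, optionally rounding to tenths after every match."""
    rating = start
    for gain in gains:
        rating += gain
        if round_each_match:
            rating = round_to_tenths(rating)
    return rating


gains = [4] * 8  # eight good matches, each worth +0.04 (invented numbers)
print(play(370, gains, round_each_match=True) / 100)   # 3.7  (stuck)
print(play(370, gains, round_each_match=False) / 100)  # 4.02 (crosses 4.0)
print(round_to_tenths(364) / 100, round_to_tenths(365) / 100)  # 3.6 3.7
```

With rounding after every match, each 3.74 snaps back to 3.7 and the player never moves. Keeping hundredths lets the same eight results add up to 4.02. The last line shows the other problem: 3.64 and 3.65 end up a full tenth apart.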

Basically, I don't see a downside to using hundredths. We all know that any given match has variables and a player may play above or below their current rating. So using hundredths and having a 3.69 and 3.68 doesn't mean the 3.69 is better or is going to win more head to head matches, it just means that is what their rating is. Dropping significant digits only causes problems like I describe above.

Now, I will grant you that if the USTA were to publish ratings more granular than half points, I would recommend only going to tenths as going farther doesn't serve much purpose. But from a calculation standpoint, at least hundredths is really required.
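The "stuck at a rating" point can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the +0.04 gain per match and the eight-match run are hypothetical, and `final_rating` is not anything the USTA publishes.

```python
def final_rating(start, gains, round_each_match):
    """Apply per-match rating gains, optionally rounding to tenths after every match."""
    rating = start
    for gain in gains:
        rating += gain
        if round_each_match:
            # Rounding to tenths throws away any gain smaller than 0.05.
            rating = round(rating, 1)
    return round(rating, 2)

# Hypothetical run: eight good matches, each worth +0.04.
gains = [0.04] * 8

stuck = final_rating(3.70, gains, round_each_match=True)
steady = final_rating(3.70, gains, round_each_match=False)
print(stuck, steady)
```

With rounding after every match the player never leaves 3.7, while carrying hundredths lets the same results accumulate past 4.0.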
 

gmatheis

Hall of Fame
I find it amusing that this post has generated such discussion when it was basically a lie.

The original claim was that a person with 100% win ratio at 4.5 was bumped down.

The facts are
1- This person was never bumped down
2- His 100% win ratio was only over 4 doubles matches.

Heck I could flip a coin and get 100% heads over 4 flips. Does that mean coin flips should no longer be used ?
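To put a number on the coin-flip comparison, here is a small Python sketch of the binomial probability of winning at least a given number of matches. The 0.5 and 0.7 per-match win probabilities are assumptions for illustration only.

```python
from math import comb

def prob_at_least(wins, matches, p_win):
    """Probability of winning at least `wins` of `matches` independent matches."""
    return sum(
        comb(matches, k) * p_win**k * (1 - p_win) ** (matches - k)
        for k in range(wins, matches + 1)
    )

# A pure coin flip goes 4-for-4 one time in sixteen.
coin_flip = prob_at_least(4, 4, 0.5)

# A player who is a hypothetical 70% favorite in every match.
favorite = prob_at_least(4, 4, 0.7)
print(coin_flip, favorite)
```

So a 4-0 record on its own says little about whether someone is playing well above level.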
 

LuckyR

Legend
Obviously we are each making some guesses as to What Would Happen If...

As it turns out there is an objective way of seeing if 3 significant digits is appropriate. The USTA could verify whether players one hundredth of a point apart perform any differently, with a few mouse clicks, and publish the results. Heck, they could likely get the results published in a journal somewhere.

That information would not be my (or anyone else's) opinion, it would be a statistical fact. All I'm saying is that IMO such a review would reveal that there is no justification for the current system, but you are correct, until the USTA provides the info, we are all, essentially guessing.
 

OrangePower

Legend
I agree that the algorithm is not necessarily the best. I'd certainly do some things different if I was doing it from scratch. Tell me more about updating a rating after every set though. If you were supposed to win a set 6-4 but win 6-2, the participating players ratings are updated and you are supposed to win the 2nd set 6-3 or something like that?
Not exactly; ELO (in its pure form) is binary; what would be important is who won the set, and not the set score. So the rating does not set an expectation for the set score, rather the difference in player ratings determines the starting probability of each player winning the set. Ratings adjust after each set based on who won/lost the set.

This method would be superior in considering scores such as 7-6, 1-6, 1-0.

As I understand it, in the current system adjustments are based on total number of games (perhaps I am wrong?). So in this example, the game score is 9-12, and player A is determined to have 'lost', although of course he won the match!

The binary ELO per set method would recognize this as 2 sets won for player A, and one set for player B.

The primary drawback is of course not being able to differentiate between for example 6-1, 6-1, and 7-6, 7-6.

ELO can be adjusted to consider margin of set score in addition to won/loss, but I'm not even sure that would be better. I think scores within a set are often not representative of relative strength anyway. Also, comparing with the current algorithm, the current algorithm already has a similar (and actually more significant) flaw in that the third set is just recorded and considered as 1-0.

Fair enough. Do you have an idea for a different ranking algorithms? Let's discuss it.

And USTA algorithm --is-- based on ELO principle. and it --does-- adjust the rating after every match (granted, not after a set but after a match). It adjusts player's dynamic rating. Your period-end ranking is essentially your dynamic ranking at the end of the ranking period, rounded to 0.5.

USTA algorithm is not ELO, although of course they share the principle of adjusting ratings after each set/match. I meant actually applying the specific ELO methodology and algorithm. Most significantly as I've noted above, pure ELO is binary and considers win/loss rather than score, and that's what I would implement for starters.

I've implemented this for other things and think it would be a good fit for tennis.
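A rough Python sketch of the per-set binary approach described above. Everything numeric here is an assumption: the `scale` that maps NTRP-style rating gaps onto the logistic curve and the `k` step size are made-up values, not anything from the USTA or from a tuned system.

```python
def expected_set_win(rating_a, rating_b, scale=0.5):
    """Probability that A wins a set (logistic curve; scale is an assumed constant)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def update_after_set(rating_a, rating_b, a_won, k=0.05, scale=0.5):
    """Binary update: only who won the set matters, not the set score."""
    expected_a = expected_set_win(rating_a, rating_b, scale)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta

def rate_match(rating_a, rating_b, set_winners):
    """Apply one update per set, in order. set_winners is a list of 'A' or 'B'."""
    for winner in set_winners:
        rating_a, rating_b = update_after_set(rating_a, rating_b, winner == "A")
    return rating_a, rating_b

# 7-6, 1-6, 1-0: A takes the first and third sets.
new_a, new_b = rate_match(4.00, 4.00, ["A", "B", "A"])
print(new_a, new_b)
```

Here A ends up above the starting rating even though A won fewer games, which is the behavior being argued for.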
 
Schmke, when you say "give less weight to matches between mis-matched players, unless there is an upset of course" (italics added), is that really what you meant? Because a tank job looks exactly like an upset to the computer. I think the computer already throws out matches when the players/teams are more than 0.5 apart. I think you touched on a better idea for eliminating tanking, which is to eliminate matches that are too far from the "expected range". It would be simple enough for the USTA to do some calculations to identify match results that are outside, say, a 95% frequency of occurrence, and eliminate them from the NTRP calculation on the suspicion of tanking or injury or other unreliable indicator of ability. Yes that would wipe out the occasional wonderful and well-earned upset, but for every one of those I suspect it would also eliminate 20 tank jobs.

For those wondering what the statistical difference is between a 3.68 and a 3.69, see the table below. Using a methodology very similar to Schmke's (so I don't have everyone's true NTRPs, just my estimates), and a large database of over 20,000, here's what I calculate. I sure hope the table comes out legibly.

[Rating difference] [% won by higher player]
0.00-0.01 52%
0.01-0.05 55%
0.05-0.10 63%
0.10-0.15 69%
0.15-0.20 72%
0.20-0.25 77%
0.25-0.30 80%
0.30-0.35 83%
0.35-0.40 86%
0.40-0.45 89%
0.45-0.50 91%

I suspect the percentages for the upper categories are somewhat artificially depressed by tanking, but of course have no proof.

So the answer to 3.68 vs. 3.69 is one could expect the 3.69 to win about 52% of the time, based on this admittedly limited sample and imperfect NTRP calculation method.

For my third and fourth cents I'd agree that in a redesign I'd add some weighting for actually winning the match. But to base ratings solely on won/lost records would be much more flawed than even the current system, for all the reasons schmke points out.
 

jmnk

Hall of Fame
Not exactly; ELO (in its pure form) is binary; what would be important is who won the set, and not the set score. So the rating does not set an expectation for the set score, rather the difference in player ratings determines the starting probability of each player winning the set. Ratings adjust after each set based on who won/lost the set.

This method would be superior in considering scores such as 7-6, 1-6, 1-0.

As I understand it, in the current system adjustments are based on total number of games (perhaps I am wrong?). So in this example, the game score is 9-12, and player A is determined to have 'lost', although of course he won the match!

The binary ELO per set method would recognize this as 2 sets won for player A, and one set for player B.

The primary drawback is of course not being able to differentiate between for example 6-1, 6-1, and 7-6, 7-6.

ELO can be adjusted to consider margin of set score in addition to won/loss, but I'm not even sure that would be better. I think scores within a set are often not representative of relative strength anyway. Also, comparing with the current algorithm, the current algorithm already has a similar (and actually more significant) flaw in that the third set is just recorded and considered as 1-0.



USTA algorithm is not ELO, although of course they share the principle of adjusting ratings after each set/match. I meant actually applying the specific ELO methodology and algorithm. Most significantly as I've noted above, pure ELO is binary and considers win/loss rather than score, and that's what I would implement for starters.

I've implemented this for other things and think it would be a good fit for tennis.
I'm not exactly sure what you mean by 'ELO (in its pure form) is binary' - but the concept of ELO method makes no assumption about win and loss. in plain language ELO method is based on a concept of an 'expected result between two sides'. What the expected result is? Well, that varies depending on the ranking difference between two sides. For example, in tennis, the expected result between 4.35 and 4.12 players may be 2:0 (in sets), or 2:1(in sets) or 5 games difference - that depends on who is designing the formula. There's no 'right' way to do it - what works the best is based on experience, empirical data, etc.
Now let's assume that whoever designed the system decided that the expected result between 4.35 and 4.12 (or any other two sides where ranking difference is 0.23) is 5 games difference. Now the 4.35 player won 7:6, 6:4. So while he won, he 'lost' per ELO formula since he was expected to win by 5 games while he won by only 3. The formula now will give you number of ranking points that each player gained/lost. that may be further adjusted by the 'importance' of the match - the designer may want to give more weight to sectional matches vs. everyday league match.
The system is very non-linear. Meaning if you perform better than expected while playing a player with similar ranking you will gain a lot. But performing better than expected vs. a player with vastly lower ranking will barely earn you any points. Eventually, at some point (in tennis maybe when players are like more than 1.5 ranking points apart), you do not really gain anything no matter how much you win.
In that sense USTA algorithm --is-- ELO based. BTW - so is FIFA ranking, and FIFA also uses goal differential (and not just win/loss) when calculating ranking points. But Canada Rogers tennis ranking uses win/loss only.

Again, there's no 'right' method. using games differential gives you more granularity, which helps if your data pool is somewhat limited. how many matches does one play during one year? 10? - that is not that many for statistical purposes.
 

wrxinsc

Professional
I might add that the dynamic ratings calculated each "evening" by our [collectively] USTA computer, aka 'Algy', also take into account each member's last two matches, when available, effectively averaging up to three available matches when generating the member's new dynamic rating. This deals with anomalies to some reasonable degree.
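As a rough illustration of that averaging, here is a Python sketch. The actual algorithm is not public, so this is an assumption about the general shape only: it simply averages the newest per-match rating with up to two earlier ones, and `dynamic_rating` is a hypothetical name.

```python
def dynamic_rating(match_ratings):
    """Average the newest match rating with up to two earlier ones (assumed scheme)."""
    if not match_ratings:
        raise ValueError("need at least one match rating")
    recent = match_ratings[-3:]
    return sum(recent) / len(recent)

# One unusually good match (4.30) after two ordinary ones gets damped.
damped = dynamic_rating([3.60, 3.70, 4.30])

# A player with a single match just keeps that match's rating.
single = dynamic_rating([3.90])
print(damped, single)
```

The outlier pulls the rating up, but only by about a third of what it would on its own.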

Our area league hosted an evening with one of the gentlemen at USTA who was instrumental in the development of the algorithm and the processes involved in the computer rating system, and who is actively managing the system currently.

He spoke very freely about the system and how it works while stopping short of revealing > the algorithm <.

Schmke is pretty much spot on about how this all works, and while many of us are curious about the algorithm and its inner workings, that information will not be shared in any detailed way.
 

OrangePower

Legend
I'm not exactly sure what you mean by 'ELO (in its pure form) is binary' - but the concept of ELO method makes no assumption about win and loss. in plain language ELO method is based on a concept of an 'expected result between two sides'. What the expected result is? Well, that varies depending on the ranking difference between two sides. For example, in tennis, the expected result between 4.35 and 4.12 players may be 2:0 (in sets), or 2:1(in sets) or 5 games difference - that depends on who is designing the formula. There's no 'right' way to do it - what works the best is based on experience, empirical data, etc.
Now let's assume that whoever designed the system decided that the expected result between 4.35 and 4.12 (or any other two sides where ranking difference is 0.23) is 5 games difference. Now the 4.35 player won 7:6, 6:4. So while he won, he 'lost' per ELO formula since he was expected to win by 5 games while he won by only 3. The formula now will give you number of ranking points that each player gained/lost. that may be further adjusted by the 'importance' of the match - the designer may want to give more weight to sectional matches vs. everyday league match.
The system is very non-linear. Meaning if you perform better than expected while playing a player with similar ranking you will gain a lot. But performing better than expected vs. a player with vastly lower ranking will barely earn you any points. Eventually, at some point (in tennis maybe when players are like more than 1.5 ranking points apart), you do not really gain anything no matter how much you win.
In that sense USTA algorithm --is-- ELO based. BTW - so is FIFA ranking, and FIFA also uses goal differential (and not just win/loss) when calculating ranking points. But Canada Rogers tennis ranking uses win/loss only.

Again, there's no 'right' method. using games differential gives you more granularity, which helps if your data pool is somewhat limited. how many matches does one play during one year? 10? - that is not that many for statistical purposes.

I think you are mistaken about what ELO is. ELO determines the probability of win/lose between two opponents of different ratings, and provides the formula for adjusting the players' ratings based on their starting ratings and the actual win/lose result. ELO does not attempt to predict partial results (i.e. a score in tennis).

No doubt there are other algorithms that use some of ELO as a basis, and then do consider scores rather than absolute results, but those are no longer the pure ELO algorithm. If you are interested, there is a lot of material about ELO on the web, specifically as originally developed and currently implemented for chess.

The rest of your post is correct, in terms of describing how the current algorithm, and an expected-score based system in general, works, but is orthogonal to the discussion on ELO.

The example in your post does highlight one of the major flaws of the current algorithm: Let's take a similar example of a 4.35 vs a 4.20, and let's say the expected difference is 4 games. And let's say the outcome is 6-0, 6-7, 0-1 . Now the 4.35 has won 12 games and the 4.20 has won 8, such that this result is exactly consistent with the expectation, and neither player's rating is adjusted. This is clearly not representative of the reality that the 4.20 beat a higher-rated 4.35.
 

jmnk

Hall of Fame
I think you are mistaken about what ELO is.
Well, we just have to disagree here.

ELO determines the probability of win/lose between two opponents of different ratings, and provides the formula for adjusting the players' ratings based on their starting ratings and the actual win/lose result. ELO does not attempt to predict partial results (i.e. a score in tennis).
The ELO formula most certainly calculates what the expected score between two sides is based on their respective strength (current ranking) - if you normalize the scoring such that all the outcomes are in the 0-1 range, which can be done for any type of competition. Incidentally, it is (almost) exactly the same as saying that it 'determines the probability of win/lose between two opponents' :)

from Wikipedia http://en.wikipedia.org/wiki/Elo_rating_system:

"Supposing Player A was expected to score $E_A$ points but actually scored $S_A$ points. The formula for updating his rating is

$$R'_A = R_A + K\,(S_A - E_A)$$

A player's expected score is his probability of winning plus half his probability of drawing. Thus an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing. On the other extreme it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing. The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system. Instead a draw is considered half a win and half a loss.
If Player A has true strength $R_A$ and Player B has true strength $R_B$, the exact formula (using the logistic curve) for the expected score of Player A is

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

Similarly the expected score for Player B is

$$E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}$$

"[end quote]

In practice, for USTA tennis you can easily normalize via the following:
'the best possible result' is 12 games difference (6:0, 6:0 score) - so that is '1' in ELO calculations
'the worst possible result' is -12 games difference (0:6, 0:6 score) - so that is '0' for ELO calculations.
'the tie' is 0 game difference.

so now if per ELO the expected score for any two players is for example 0.75 then that means that the game difference in that match should be 0.75*24-12=6 (meaning a routine 6:3, 6:3 type of score)

if the expected score is 0.35 then the game difference (expected) is 0.35*24-12=-3.6. that means that if a lower ranked player lost like 4:6, 6:7 then he actually 'won' since he performed better than was expected.
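The normalization above can be sketched in Python as follows. The linear mapping between a 0-1 score and a -12..+12 game difference comes straight from the post; the `scale` and `k` constants and the function names are assumptions added for illustration, not the USTA's values.

```python
def expected_score(rating_a, rating_b, scale=0.5):
    """Elo-style expected score for A on a 0-1 scale (scale is an assumed constant)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def score_to_game_diff(score):
    """Map a 0-1 score to a game difference: 1 -> +12, 0.5 -> 0, 0 -> -12."""
    return score * 24 - 12

def game_diff_to_score(game_diff):
    """Inverse mapping: +12 -> 1, 0 -> 0.5, -12 -> 0."""
    return (game_diff + 12) / 24

def updated_rating(rating, expected, game_diff, k=0.1):
    """Move the rating by k times (actual score minus expected score)."""
    return rating + k * (game_diff_to_score(game_diff) - expected)

# Underdog expected to score 0.35 (about -3.6 games) loses 4:6, 6:7, i.e. -3 games.
actual_diff = (4 + 6) - (6 + 7)
new_rating = updated_rating(3.80, 0.35, actual_diff)

print(score_to_game_diff(0.75), score_to_game_diff(0.35), new_rating)
print(score_to_game_diff(expected_score(4.35, 4.12)))
```

The underdog's rating goes up slightly despite the loss, because -3 games beats the expected -3.6.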

No doubt there are other algorithms that use some of ELO as a basis, and then do consider scores rather than absolute results, but those are no longer the pure ELO algorithm. If you are interested, there is a lot of material about ELO on the web, specifically as originally developed and currently implemented for chess.
yes, thanks for the suggestion. I'm actually painfully familiar with the algorithm, as I have indeed played tournament chess for quite a while.

The beauty of ELO algorithm is that it can be applied to many, many various scenarios. Individual chess game is in fact not that great for that purpose as it provides only three possible outcomes (win/loss/tie), which is why the formula is applied to a set of chess games - either a match, or a tournament, or multiple games played over a given time period.

That way the opponents ranking can be averaged over multiple opponents, and the overall result from many games can be actually anything in 0 to 1 range (like 5 wins in 15 games = 0.33) - which corresponds much better to the concept of 'expected result' for ELO purposes.

The rest of your post is correct, in terms of describing how the current algorithm, and an expected-score based system in general, works, but is orthogonal to the discussion on ELO.

The example in your post does highlight one of the major flaws of the current algorithm: Let's take a similar example of a 4.35 vs a 4.20, and let's say the expected difference is 4 games. And let's say the outcome is 6-0, 6-7, 0-1 . Now the 4.35 has won 12 games and the 4.20 has won 8, such that this result is exactly consistent with the expectation, and neither player's rating is adjusted. This is clearly not representative of the reality that the 4.20 beat a higher-rated 4.35.
I agree with you here. You make a good point. This is really due to the way a tennis match is scored - you win by winning the most sets and not necessarily the most games (or points). But I would not call it a 'major flaw'. On average you will find a very high correlation between a tennis match win and the game difference. Sure, there are going to be exceptions, as in your example, but they are really rare. The best method would be to make some sort of adjustment to the game-difference method such that the winning player (the one that actually won the tennis match) is always assured a positive game difference, regardless of what the actual game difference was. I would vote for that.
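One possible shape for that adjustment, sketched in Python. This is just one way to do it, not a known USTA rule: the minimum difference of one game awarded to the match winner (`min_winner_diff`) is an arbitrary assumption.

```python
def adjusted_game_diff(sets, min_winner_diff=1):
    """Game difference from A's side, forced to favor whoever won more sets.

    sets is a list of (games_a, games_b) tuples, one per set.
    """
    games_a = sum(a for a, _ in sets)
    games_b = sum(b for _, b in sets)
    sets_a = sum(1 for a, b in sets if a > b)
    sets_b = sum(1 for a, b in sets if b > a)
    diff = games_a - games_b
    if sets_a > sets_b:
        return max(diff, min_winner_diff)
    if sets_b > sets_a:
        return min(diff, -min_winner_diff)
    return diff

# The 4.35 (player A) loses 6-0, 6-7, 0-1 despite winning 12 games to 8.
upset = adjusted_game_diff([(6, 0), (6, 7), (0, 1)])

# A routine straight-sets win is left untouched.
routine = adjusted_game_diff([(6, 3), (6, 3)])
print(upset, routine)
```

The upset is recorded as -1 for the 4.35 instead of +4, so the match winner is never scored as having lost on games.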
 

OrangePower

Legend
Well, we just have to disagree here.
Actually, no, you have convinced me, so I recant :)
In practice, for USTA tennis you can easily normalize via the following:
'the best possible result' is 12 games difference (6:0, 6:0 score) - so that is '1' in ELO calculations
'the worst possible result' is -12 games difference (0:6, 0:6 score) - so that is '0' for ELO calculations.
'the tie' is 0 game difference.

so now if per ELO the expected score for any two players is for example 0.75 then that means that the game difference in that match should be 0.75*24-12=6 (meaning a routine 6:3, 6:3 type of score)

if the expected score is 0.35 then the game difference (expected) is 0.35*24-12=-3.6. that means that if a lower ranked player lost like 4:6, 6:7 then he actually 'won' since he performed better than was expected.
I think our disagreement centered around the transformation between tennis score and ELO score... your example illustrates your position well. I'm not convinced that such a transformation (or any similar transformation even if non-linear) is valid in terms of the results it produces, but you've convinced me that it could well be the way USTA does it.

yes, thanks for the suggestion. I'm actually painfully familiar with the algorithm, as I have indeed played tournament chess for quite a while.
I've played some in the past myself, so perhaps if we have any further disagreements we can settle them over a combined chess/tennis challenge:) Then all we need to agree on is how to weigh the results!
 

asimple

Semi-Pro
I think the key issue here is not the rating system, but the fact that there is an incentive to play at a lower rating for leagues. I took a long period of time off from tennis, and just started to play leagues last year. It is actually a lot of fun, but IMHO it has drastically changed the game. I don't mind the ratings games as much as the new "line is out" philosophy which is common on many teams. The rating system should and probably does do its job assuming it's not gamed. In the greater scheme of things there isn't too much difference if a low 4.5 is on a 4.0 team or vice-versa. The real issue is people intentionally gaming the system. Even in this case I don't know that I get it too much. Winning a national 3.5 championship when you're a 4.5 player seems pretty lame to me.

My particular case is probably somewhat common. I was a mid 5.0 when I was young, but took many years off and put on 70 pounds. I self rated as a 4.5 last year and had a winning record but got bumped down to 4.0 which was probably a mistake based on the way I was playing. Over the year I decided to get my game back into shape and lost 40 pounds. At this point I am nowhere near a 4.0 and improving. I've played a few matches so far this year, and based on the results I'm pretty sure my rating is nowhere near the 4.0 level anymore. I'm playing both 4.5 and 4.0, but mainly focusing on 4.5, but I guess as lame as it is I too am continuing the 4.0 play.
 

schmke

Legend
My particular case is probably somewhat common. I was a mid 5.0 when I was young, but took many years off and put on 70 pounds. I self rated as a 4.5 last year and had a winning record but got bumped down to 4.0 which was probably a mistake based on the way I was playing. Over the year I decided to get my game back into shape and lost 40 pounds. At this point I am nowhere near a 4.0 and improving. I've played a few matches so far this year, and based on the results I'm pretty sure my rating is nowhere near the 4.0 level anymore. I'm playing both 4.5 and 4.0, but mainly focusing on 4.5, but I guess as lame as it is I too am continuing the 4.0 play.

This is the challenge for any rating system and especially those like NTRP where there are levels a player plays at. Ignoring any gaming that may go on, players' games may be getting better or worse based on age, practice, physical condition, and so on. So a rating will always be moving one way or another to some degree, and for some that movement may be very significant. The result is that someone may be at the "wrong" level at some point in time during a year until their level is reset at the end of the year.

Your situation sounds like this is occurring "naturally", not through any gaming and this is to be expected, and may result in your 4.0 team having a better chance of doing well. IMHO, this is the natural ebb and flow that results in teams at a given level being better or worse.

But you raised a good point about the incentive for someone to be rated lower than they should be which leads to gaming the system to accomplish that. As long as individuals get a rating and that dictates what level they can play at and teams can be formed more or less from any collection rated at the same level, you are going to have this problem.

One idea (that I have not fully thought through, so just throwing it out for discussion) would be to have a system more like English Soccer (and other similar leagues) where it is a team that is "rated" and plays at a given level and may be promoted up or relegated down based on win/loss performance.

Individuals would still have to have ratings and there would have to be rules about what level team new players can join and to the degree possible these would need to err on the side of having a player on a team at higher level rather than lower, and you'd want to limit roster turnover so a team at a lower level couldn't recruit too many ringers, or force a team to play at a higher level if a roster is a certain percentage new.

Yes, the USTA has the move up or break up rule, but perhaps this needs to be revisited or tightened up.

This may not work with USTA League tennis where teams are less formal than a soccer league, and I'm sure there are other issues, but it is something to think about.
 

schmke

Legend
Schmke, when you say "give less weight to matches between mis-matched players, unless there is an upset of course" (italics added), is that really what you meant? Because a tank job looks exactly like an upset to the computer. I think the computer already throws out matches when the players/teams are more than 0.5 apart. I think you touched on a better idea for eliminating tanking, which is to eliminate matches that are too far from the "expected range". It would be simple enough for the USTA to do some calculations to identify match results that are outside, say, a 95% frequency of occurrence, and eliminate them from the NTRP calculation on the suspicion of tanking or injury or other unreliable indicator of ability. Yes that would wipe out the occasional wonderful and well-earned upset, but for every one of those I suspect it would also eliminate 20 tank jobs.

Yes, both of the items I mentioned need to be used.

For example, if in a 4.0 match, an about to be bumped to 4.5 (rated very near 4.5) plays a low-end 3.5 that is playing up (rating very near 3.5), he should in theory win 0 & 0. But if he happens to give up a game and wins 1 & 0 an unweighted system would ding the winning player pretty severely. This is a match that should be given less weight because the opponents are far apart.

But you don't want to discredit the match entirely, so if the match does get closer or perhaps even the lower rated player wins, you want to give it more weight. BUT, you also need the "throw a match" check which would look at the "rating profile" of both players and if this match looks like an anomaly, either throw it out or give it less weight.
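The two checks described above could look something like this in Python. This is a loose sketch under assumptions of my own: the weight floor of 0.2, the four-game "closer than expected" threshold, and the two-standard-deviation cutoff are all invented numbers, and the function names are hypothetical.

```python
from statistics import mean, stdev

def match_weight(rating_gap, shortfall, close_margin=4):
    """Weight for a match between players `rating_gap` apart.

    shortfall is how many games the favorite fell short of the expected
    game difference. Lopsided pairings count for less unless the result
    was much closer than expected (or an outright upset).
    """
    base = max(0.2, 1.0 - rating_gap)
    return 1.0 if shortfall >= close_margin else base

def looks_anomalous(history, match_rating, z_cutoff=2.0):
    """Flag a match rating far outside a player's recent rating profile."""
    if len(history) < 3:
        return False  # not enough of a profile to judge
    spread = stdev(history)
    if spread == 0:
        return match_rating != history[0]
    return abs(match_rating - mean(history)) / spread > z_cutoff

# Near-4.5 beats a near-3.5 by 1 & 0 instead of 0 & 0: one game short, low weight.
routine_weight = match_weight(0.98, shortfall=1)

# Same pairing but the favorite loses: full weight, unless the profile check fires.
upset_weight = match_weight(0.98, shortfall=13)
suspicious = looks_anomalous([4.40, 4.45, 4.42, 4.47], 3.60)
print(routine_weight, upset_weight, suspicious)
```

A flagged match could then be thrown out or have its weight cut, which is the choice described above.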
 

jmnk

Hall of Fame
Actually, no, you have convinced me, so I recant :)

I think our disagreement centered around the transformation between tennis score and ELO score... your example illustrates your position well. I'm not convinced that such a transformation (or any similar transformation even if non-linear) is valid in terms of the results it produces, but you've convinced me that it could well be the way USTA does it.


I've played some in the past myself, so perhaps if we have any further disagreements we can settle them over a combined chess/tennis challenge:) Then all we need to agree on is how to weigh the results!

hey, i like you already!

Back in the day we used to have a triathlon of sorts: tennis + speed chess + well, 'drinking a certain substance that made playing either sport rather challenging' :) And yes, there were heated discussions on how to combine the scores and how to assign weight to each activity. It still brings tears to my eyes.

I completely agree that applying ELO to any competition is sort of an art. While it gives you a very well-defined formula to calculate the results, you still need to figure out what makes the most sense for input values. For example, I would not be terribly surprised if one found a close correlation between the length of the tennis match and the relative strength of the opponents. So it very well may be that the length should be used as a criterion, like 'if 4.5 plays 4.3 the expected length is 1h45min'. The possibilities are really endless.

Incidentally, since we have touched on doubles vs. singles too: there are other algorithms used in on-line multiplayer gaming that supposedly are better than ELO at assessing rankings of the players involved. These concepts are more applicable to doubles in tennis as players may play with different partners, against varying sets of opponents.

glad we could have a nice discussion.
 

LuckyR

Legend
For those wondering what the statistical difference is between a 3.68 and a 3.69, see the table below Using a methodology very similar to Schmke's (so I don't have everyone's true NTRPs, just my estimates), and a large database of over 20,000, here's what I calculate. I sure hope the table comes out legibly.

[Rating difference] [% won by higher player]
0.00-0.01 52%
0.01-0.05 55%
0.05-0.10 63%
0.10-0.15 69%
0.15-0.20 72%
0.20-0.25 77%
0.25-0.30 80%
0.30-0.35 83%
0.35-0.40 86%
0.40-0.45 89%
0.45-0.50 91%

I suspect the percentages for the upper categories are somewhat artificially depressed by tanking, but of course have no proof.

So the answer to 3.68 vs. 3.69 is one could expect the 3.69 to win about 52% of the time, based on this admittedly limited sample and imperfect NTRP calculation method.


Wow, that looks like a lot of work (the adding of your estimation of the NTRPs, not the calculation itself).

Of course to get the actual data (perhaps similar or identical to your estimate, or not...) it would not take nearly the effort for the USTA.

A 2% difference by your estimate (using 50% as the standard) seems reasonable at first glance. Of course the number is meaningless (or more meaningless, since it is your estimation of the true NTRP) without a mention of the error of the calculation.
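For a sense of what that error might look like, here is a Python sketch of the usual normal-approximation interval for a win rate. The number of matches in the 0.00-0.01 bucket was not given, so the sample sizes below (400 and 10,000) are purely hypothetical.

```python
from math import sqrt

def win_rate_interval(p_hat, n, z=1.96):
    """Approximate 95% interval for an observed win rate p_hat over n matches."""
    std_err = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * std_err, p_hat + z * std_err

# Hypothetical bucket sizes for the 52% figure.
small_low, small_high = win_rate_interval(0.52, 400)
large_low, large_high = win_rate_interval(0.52, 10_000)
print(small_low, small_high)
print(large_low, large_high)
```

With a few hundred matches the interval straddles 50%, so 52% would be indistinguishable from a coin flip; with ten thousand it would not be.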
 