
01-23-2013, 06:56 AM  #21
Legend
Join Date: Jun 2004
Posts: 8,490

Quote:
Another guy was bumped to 4.5T after winning a 4.0 tourney playing only 3 matches.

01-23-2013, 07:27 AM  #22
Professional
Join Date: Jan 2010
Posts: 1,167

Quote:
You didn't say if the 3-3 teammate had that record at 4.0 or if he was playing up at 4.5. Losing against strong opponents can still improve your rating. And that lopsided win can skew things, particularly if the opponent is or ends up being benchmark. The 4.5T could have been very close to the threshold to be bumped, and then just 3 matches with good results against strong opponents can easily push him over the threshold.

01-23-2013, 07:55 AM  #23
Legend
Join Date: Jun 2004
Posts: 8,490

Quote:
The tournament guy had no rating, no prior USTA history other than juniors which were 5 years + ago. He played no other matches for the entire year. I played this guy and I am not saying it was an unfair bump. The guy was ranked in the top 75 in Florida as a junior so he was definitely out of level. I was just pointing out people can get bumped with not a lot of matches. 

01-23-2013, 08:51 AM  #24
Professional
Join Date: Jul 2009
Posts: 969

Quote:
And the USTA algorithm is based on the ELO principle, and it does adjust the rating after every match (granted, not after a set but after a match). It adjusts a player's dynamic rating. Your period-end ranking is essentially your dynamic ranking at the end of the ranking period, rounded to 0.5.

01-23-2013, 08:57 AM  #25
Professional
Join Date: Jul 2009
Posts: 969

Quote:


01-23-2013, 09:34 AM  #26
Legend
Join Date: Jun 2004
Posts: 8,490

Quote:
These are the changes I would like to see:
I would like to see a larger emphasis on matches played at sectionals and Nationals. There should be a much lower threshold to bump these players than there is currently.
I think there should be slightly more emphasis on win-loss. The current thinking is that someone can be 10-0 or 0-10 against the same set of opponents and still have similar ratings if the guy who lost all the time had close matches and the guy who won all the time barely won. The truth is the guy who won all the time is better because he knows how to pull out close matches, and that should be worth something.
No ESR bump downs, only bump ups. This prevents the current loophole of bumped players playing some spring matches, losing them, and then rejoining their old team in the fall.
To prevent tanking, in the end-of-year calculations high rated players who lose to low rated players (especially if they are a full level below) should have that result thrown out. If the low rated players continue to win and show they are actually at a higher level, the scores could be kept.

01-23-2013, 09:45 AM  #27
Professional
Join Date: Jan 2010
Posts: 1,167

Quote:
I might also look at a non-linear scale for the component that uses game differential (perhaps the USTA does today with the table they have, I don't know) and also give less weight to matches between mismatched players, unless there is an upset of course. Also, I'd probably look at an approach other than averaging the latest match rating with the last three dynamic ratings, to make more recent matches count more. I do some other things with my other ratings systems that seem to work reasonably well. Last, I'd look at ways to try to keep thrown matches from affecting ratings too much. Potential ideas are throwing out results that are way above or below normal, or at least giving them less weight. Last edited by schmke : 01-23-2013 at 09:49 AM.
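To make the averaging idea concrete, here is a rough Python sketch. It assumes the scheme is simply "average the new match rating with up to the last three dynamic ratings", as described in this post; the real USTA formula is not public. The function names and the smoothing weight `alpha` are made up for illustration, and the second function is just one possible way to make recent matches count more.

```python
def update_by_averaging(history, match_rating, window=3):
    """Average the new match rating with up to `window` prior dynamic ratings.

    `history` is a list of earlier dynamic ratings, oldest first.
    """
    recent = history[-window:]
    new_rating = (match_rating + sum(recent)) / (1 + len(recent))
    history.append(new_rating)
    return new_rating


def update_by_smoothing(history, match_rating, alpha=0.4):
    """Alternative sketch: exponential smoothing (alpha is a hypothetical weight).

    A larger alpha lets the latest match move the rating more.
    """
    previous = history[-1] if history else match_rating
    new_rating = alpha * match_rating + (1 - alpha) * previous
    history.append(new_rating)
    return new_rating


ratings = [3.60, 3.62, 3.64]
print(round(update_by_averaging(ratings, 3.80), 3))   # averaging approach
print(round(update_by_smoothing([3.64], 3.80), 3))    # smoothing approach
```

With the same strong match result (3.80), the smoothed version moves further than the four-way average, which is the "recent matches count more" effect.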

01-23-2013, 09:56 AM  #28
Professional
Join Date: Jan 2010
Posts: 1,167

Quote:
This is already done, although I don't know specifics, through the benchmark calculations. Matches played against benchmark players do count for more, and all players at sectionals and nationals are benchmark by definition. If anyone has more information on this part of the year-end calculation, would love to hear it.

01-23-2013, 10:01 AM  #29
Legend
Join Date: Jun 2006
Location: The Great NW
Posts: 6,348

Quote:
As I mentioned in my previous post, what strikes someone as "logical" is going to be based in this case on whether they really believe there is an actual, measurable and, most of all, reproducible difference between a 3.68 and a 3.69. Obviously anyone can perform the mathematical calculation to come up with such numbers. But is there any data to say that the numbers 3.68 and 3.69 will predict a different outcome in future matchplay (does the 3.69 win more matches)? My guess is there is no such data. Well, if there is no difference in matchplay results between 3.68 and 3.69, then from a purely statistical standpoint (this is not my opinion, it is a statistical reality) they should both be 3.7.
In the extreme example of treating all 4.0s exactly the same (1 significant digit), you would use simple won/loss records. It would be an OK system (not great). It wouldn't be very nuanced and would miss outliers who played a skewed set of opponents. But OTOH you wouldn't have cases like a prior thread where some guy goes 9-0 or somesuch and doesn't get bumped. Perhaps that guy played a very odd set of matches, but on the face of it, it looks weird, especially for the next guy who plays him. It gives the impression (perhaps true, perhaps false) that the USTA's system is missing the forest for the trees, or worse.
I get your last paragraph's example, but you know what? The most visible tennis ranking systems in the world (the ATP and WTA) don't use any opponent ranking information in their calculation of ranking. A trip to the quarters in a 500 level is the same whether you beat the World #1 6-0, 6-0 or if you were LL with three walkovers. You'd think from these boards that the Pro ratings would be all over the place, but except for players coming off of injury, it seems to work just fine.

01-23-2013, 10:10 AM  #30
Legend
Join Date: Jun 2004
Posts: 8,490

Quote:


01-23-2013, 10:12 AM  #31
Legend
Join Date: Jun 2006
Location: The Great NW
Posts: 6,348

Quote:
And to be honest, I'm not even advocating totally ignoring opponent ranking, just pointing out that there is likely no statistical justification for taking that ranking to 3 significant digits. Last edited by LuckyR : 01-23-2013 at 10:15 AM.

01-23-2013, 10:27 AM  #32
Professional
Join Date: Jan 2010
Posts: 1,167

Quote:
Quote:
First, on your earlier statement that a 3.68 and a 3.69 are not really any different: I assume you'd round both to 3.7 (?) so they are equal. You then have the problem that a 3.64 and a 3.65 are near equals, but when you round, they are now a tenth apart (3.6 and 3.7), which has suddenly introduced significance you didn't intend and probably isn't appropriate.
Second, a player can more easily get stuck at a rating. If they are a 3.7 and have a good match and their rating goes up to 3.74, you round them back down to 3.7. This happens again and they are again at 3.7. If you let the rating stay at 3.74, that next match may improve them a bit, and their steady improvement can be rewarded by gradually getting closer to 4.0 rather than having the rounding issue hold them back.
Basically, I don't see a downside to using hundredths. We all know that any given match has variables and a player may play above or below their current rating. So using hundredths and having a 3.69 and a 3.68 doesn't mean the 3.69 is better or is going to win more head to head matches, it just means that is what their rating is. Dropping significant digits only causes problems like I describe above.
Now, I will grant you that if the USTA were to publish ratings more granular than half points, I would recommend only going to tenths, as going farther doesn't serve much purpose. But from a calculation standpoint, at least hundredths is really required.
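Both rounding problems can be reproduced in a few lines of Python. This is only an illustration: the per-match gain of 0.04 and the function names are hypothetical, and `Decimal` is used so that 3.65 rounds half-up the way a person would expect rather than following binary floating-point quirks.

```python
from decimal import Decimal, ROUND_HALF_UP

TENTH = Decimal("0.1")


def round_to_tenth(rating):
    """Round a rating (given as a string) to one decimal place, half-up."""
    return Decimal(rating).quantize(TENTH, rounding=ROUND_HALF_UP)


def simulate(start, gain_per_match, matches, round_each_time):
    """Apply the same small gain after every match, optionally rounding each time."""
    rating = Decimal(start)
    for _ in range(matches):
        rating += Decimal(gain_per_match)
        if round_each_time:
            rating = rating.quantize(TENTH, rounding=ROUND_HALF_UP)
    return rating


# Problem 1: near-equal ratings end up a full tenth apart.
print(round_to_tenth("3.64"), round_to_tenth("3.65"))

# Problem 2: a steadily improving player never leaves 3.7 if rounded every match.
print(simulate("3.70", "0.04", 5, round_each_time=True))
print(simulate("3.70", "0.04", 5, round_each_time=False))
```

Carrying hundredths internally and rounding only for display avoids both effects.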

01-23-2013, 10:40 AM  #33
Professional
Join Date: Aug 2010
Posts: 1,476

I find it amusing that this post has generated such discussion when it was basically a lie.
The original claim was that a person with a 100% win ratio at 4.5 was bumped down. The facts are: 1) this person was never bumped down, and 2) his 100% win ratio was only over 4 doubles matches. Heck, I could flip a coin and get 100% heads over 4 flips. Does that mean coin flips should no longer be used?
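The coin-flip comparison is easy to put a number on. A minimal sketch, assuming each match is an independent 50/50 event (an illustrative assumption, not a claim about the actual matches; the function name is made up):

```python
def prob_perfect_record(matches, win_prob=0.5):
    """Chance of winning every one of `matches` independent contests."""
    return win_prob ** matches


# Four straight wins (or four heads in a row) by luck alone.
print(prob_perfect_record(4))  # 0.0625, i.e. about 1 in 16
```

So a 4-0 record from evenly matched play is not rare, which is why it says little on its own.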
__________________
Völkl PB10 Mid with some strings at some tension 
01-23-2013, 10:43 AM  #34
Legend
Join Date: Jun 2006
Location: The Great NW
Posts: 6,348

Obviously we are each making some guesses as to What Would Happen If...
As it turns out there is an objective way of seeing if 3 significant digits is appropriate. The USTA could verify whether players one hundredth of a point apart perform any differently, with a few mouse clicks, and publish the results. Heck, they could likely get the results published in a journal somewhere. That information would not be my (or anyone else's) opinion, it would be a statistical fact. All I'm saying is that IMO such a review would reveal that there is no justification for the current system, but you are correct, until the USTA provides the info, we are all essentially guessing.
01-23-2013, 10:55 AM  #35
Hall Of Fame
Join Date: Sep 2007
Location: NorCal Bay Area
Posts: 3,875

Quote:
This method would be superior in considering scores such as 7-6, 1-6, 1-0. As I understand it, in the current system adjustments are based on total number of games (perhaps I am wrong?). So in this example, the game score is 9-12, and player A is determined to have 'lost', although of course he won the match! The binary ELO per set method would recognize this as 2 sets won for player A, and one set for player B.
The primary drawback is of course not being able to differentiate between, for example, 6-1, 6-1 and 7-6, 7-6. ELO can be adjusted to consider margin of set score in addition to won/loss, but I'm not even sure that would be better. I think scores within a set are often not representative of relative strength anyway. Also, comparing with the current algorithm, the current algorithm already has a similar (and actually more significant) flaw in that the third set is just recorded and considered as 1-0.
Quote:
I've implemented this for other things and think it would be a good fit for tennis. 
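The difference between the two ways of reading a scoreline can be shown with a short Python sketch. The helper names are hypothetical, and the score is the 7-6, 1-6, 1-0 example from this post, with the match tiebreak recorded as a 1-0 set.

```python
def games_totals(sets):
    """Total games won by each player across all sets."""
    return sum(a for a, _ in sets), sum(b for _, b in sets)


def set_results(sets):
    """Binary per-set outcomes for player A: 1 for a set won, 0 for a set lost."""
    return [1 if a > b else 0 for a, b in sets]


score = [(7, 6), (1, 6), (1, 0)]  # (player A games, player B games) per set

print(games_totals(score))  # (9, 12): A trails on games
print(set_results(score))   # [1, 0, 1]: A wins two of three sets
```

In a per-set scheme, each entry of `set_results` would be fed into the rating update as a separate win or loss.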

OrangePower 
01-23-2013, 11:04 AM  #36
Rookie
Join Date: Jun 2012
Posts: 135

Schmke, when you say "give less weight to matches between mismatched players, unless there is an upset of course" (italics added), is that really what you meant? Because a tank job looks exactly like an upset to the computer. I think the computer already throws out matches when the players/teams are more than 0.5 apart. I think you touched on a better idea for eliminating tanking, which is to eliminate matches that are too far from the "expected range". It would be simple enough for the USTA to do some calculations to identify match results that are outside, say, a 95% frequency of occurrence, and eliminate them from the NTRP calculation on the suspicion of tanking or injury or other unreliable indicator of ability. Yes, that would wipe out the occasional wonderful and well-earned upset, but for every one of those I suspect it would also eliminate 20 tank jobs.
For those wondering what the statistical difference is between a 3.68 and a 3.69, see the table below. Using a methodology very similar to Schmke's (so I don't have everyone's true NTRPs, just my estimates), and a large database of over 20,000, here's what I calculate. I sure hope the table comes out legibly.
[Rating difference]   [% won by higher player]
0.00-0.01             52%
0.01-0.05             55%
0.05-0.10             63%
0.10-0.15             69%
0.15-0.20             72%
0.20-0.25             77%
0.25-0.30             80%
0.30-0.35             83%
0.35-0.40             86%
0.40-0.45             89%
0.45-0.50             91%
I suspect the percentages for the upper categories are somewhat artificially depressed by tanking, but of course have no proof. So the answer to 3.68 vs. 3.69 is one could expect the 3.69 to win about 52% of the time, based on this admittedly limited sample and imperfect NTRP calculation method.
For my third and fourth cents, I'd agree that in a redesign I'd add some weighting for actually winning the match. But to base ratings solely on won/lost records would be much more flawed than even the current system, for all the reasons schmke points out.
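The table can be turned into a simple lookup. The sketch below is illustrative only: the bucket boundaries and percentages are copied from the table in this post, while the function name and the choice to put a boundary value such as 0.01 or 0.05 in the lower bucket are assumptions (the second one is consistent with the 3.68 vs. 3.69 answer of 52%).

```python
import bisect

# Upper edge of each rating-difference bucket and the matching win percentage.
UPPER_EDGES = [0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
WIN_PCT = [52, 55, 63, 69, 72, 77, 80, 83, 86, 89, 91]


def expected_win_pct(rating_a, rating_b):
    """Estimated % of matches won by the higher-rated player, or None if off the table."""
    diff = round(abs(rating_a - rating_b), 2)  # work in hundredths
    if diff > UPPER_EDGES[-1]:
        return None  # the table stops at a 0.50 difference
    return WIN_PCT[bisect.bisect_left(UPPER_EDGES, diff)]


print(expected_win_pct(3.69, 3.68))  # 52
print(expected_win_pct(4.35, 4.12))  # 77
```

A result filter along the lines suggested above could compare actual outcomes against these percentages, but the cutoff to use would be a separate design decision.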
NumbersGuy 
01-23-2013, 12:46 PM  #37
Professional
Join Date: Jul 2009
Posts: 969

Quote:
Now let's assume that whoever designed the system decided that the expected result between a 4.35 and a 4.12 (or any other two sides where the ranking difference is 0.23) is a 5-game difference. Now the 4.35 player won 7:6, 6:4. So while he won, he 'lost' per the ELO formula, since he was expected to win by 5 games while he won by only 3. The formula will now give you the number of ranking points that each player gained/lost. That may be further adjusted by the 'importance' of the match: the designer may want to give more weight to sectional matches vs. an everyday league match.
The system is very non-linear, meaning that if you perform better than expected while playing a player with a similar ranking you will gain a lot, but performing better than expected vs. a player with a vastly lower ranking will barely earn you any points. Eventually, at some point (in tennis maybe when players are more than 1.5 ranking points apart), you do not really gain anything no matter how much you win.
In that sense the USTA algorithm is ELO based. BTW, so is the FIFA ranking, and FIFA also uses goal differential (and not just win/loss) when calculating ranking points. But the Canada Rogers tennis ranking uses win/loss only. Again, there's no 'right' method. Using games differential gives you more granularity, which helps if your data pool is somewhat limited. How many matches does one play during one year? 10? That is not that many for statistical purposes.
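The 4.35 vs. 4.12 example works out as follows in a small Python sketch. The expected 5-game margin comes from this post; the step size `points_per_game` and the `importance` multiplier are hypothetical stand-ins, since the actual conversion from games to rating points is not public.

```python
def game_differential(sets):
    """Games won minus games lost, from the first player's point of view."""
    return sum(a - b for a, b in sets)


def rating_change(sets, expected_diff, points_per_game=0.01, importance=1.0):
    """Hypothetical adjustment: proportional to actual margin minus expected margin."""
    surprise = game_differential(sets) - expected_diff
    return importance * points_per_game * surprise


match = [(7, 6), (6, 4)]  # the 4.35 player's score against the 4.12

print(game_differential(match))                     # 3 games
print(round(rating_change(match, expected_diff=5), 4))  # negative: he won but underperformed
```

The winner's rating drops because a 3-game margin falls 2 games short of the expected 5, and the loser would gain the mirror-image amount in a symmetric scheme.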

01-23-2013, 01:02 PM  #38
Professional
Join Date: Jul 2010
Location: Cackalacky South
Posts: 1,400

I might add that the dynamic ratings calculated each "evening" by our [collectively] USTA computer, aka 'Algy', also take into account each member's last two matches, when available, effectively averaging an available three matches whilst generating the member's new dynamic rating. This deals to some reasonable degree with anomalies.
Our area league hosted an evening with one of the gentlemen at USTA who was instrumental in the development of the algorithm and the processes involved in the computer rating system and is actively managing the system currently. He spoke very freely about the system and how it works while stopping short of revealing > the algorithm <. Schmke is pretty much spot on about how this all works, and while many of us are curious about the algorithm and its inner workings, that information will not be shared in any detailed way.
01-23-2013, 03:51 PM  #39
Hall Of Fame
Join Date: Sep 2007
Location: NorCal Bay Area
Posts: 3,875

Quote:
No doubt there are other algorithms that use some of ELO as a basis, and then do consider scores rather than absolute results, but they are no longer the pure ELO algorithm. If you are interested, there is a lot of material about ELO on the web, specifically as originally developed and currently implemented for chess.
The rest of your post is correct, in terms of describing how the current algorithm, and an expected-score-based system in general, works, but is orthogonal to the discussion on ELO. The example in your post does highlight one of the major flaws of the current algorithm: Let's take a similar example of a 4.35 vs. a 4.20, and let's say the expected difference is 4 games. And let's say the outcome is 6-0, 6-7, 0-1. Now the 4.35 has won 12 games and the 4.20 has won 8, such that this result is exactly consistent with the expectation, and neither player's rating is adjusted. This is clearly not representative of the reality that the 4.20 beat a higher-rated 4.35.
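A quick Python check of that scoreline, with made-up helper names, shows the mismatch between the game margin and the match result. The expected 4-game margin is the figure assumed in this post.

```python
def summarize(sets):
    """Return (games for A, games for B, match winner) for a list of set scores."""
    games_a = sum(a for a, _ in sets)
    games_b = sum(b for _, b in sets)
    sets_a = sum(1 for a, b in sets if a > b)
    sets_b = sum(1 for a, b in sets if b > a)
    return games_a, games_b, "A" if sets_a > sets_b else "B"


def margin_vs_expected(sets, expected_diff):
    """Actual game margin for A minus the margin the ratings predicted."""
    games_a, games_b, _ = summarize(sets)
    return (games_a - games_b) - expected_diff


match = [(6, 0), (6, 7), (0, 1)]  # A is the 4.35, B is the 4.20

print(summarize(match))              # (12, 8, 'B')
print(margin_vs_expected(match, 4))  # 0
```

A purely margin-based update sees a surprise of zero here even though B, the lower-rated player, won the match.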

OrangePower 
01-23-2013, 07:26 PM  #40
Professional
Join Date: Jul 2009
Posts: 969

Well, we just have to disagree here.
Quote:
from Wikipedia http://en.wikipedia.org/wiki/Elo_rating_system:
"Supposing Player A was expected to score E points but actually scored S points. The formula for updating his rating is
$R_a' = R_a + K(S - E)$
A player's expected score is his probability of winning plus half his probability of drawing. Thus an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing. On the other extreme it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing. The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system. Instead a draw is considered half a win and half a loss. If Player A has true strength Ra and Player B has true strength Rb, the exact formula (using the logistic curve) for the expected score of Player A is
$E_a = \frac{1}{1 + 10^{(R_b - R_a)/400}}$
Similarly the expected score for Player B is
$E_b = \frac{1}{1 + 10^{(R_a - R_b)/400}}$
" [end quote]
In practice, for USTA tennis you can easily normalize via the following:
'the best possible result' is a +12 games difference (6:0, 6:0 score), so that is '1' in ELO calculations.
'the worst possible result' is a -12 games difference (0:6, 0:6 score), so that is '0' for ELO calculations.
'the tie' is a 0 game difference.
So now if per ELO the expected score for any two players is, for example, 0.75, then that means that the game difference in that match should be 0.75*24-12=6 (meaning a routine 6:3, 6:3 type of score). If the expected score is 0.35, then the expected game difference is 0.35*24-12=-3.6. That means that if a lower ranked player lost like 4:6, 6:7 then he actually 'won', since he performed better than was expected.
Quote:
The beauty of ELO algorithm is that it can be applied to many, many various scenarios. Individual chess game is in fact not that great for that purpose as it provides only three possible outcomes (win/loss/tie), which is why the formula is applied to a set of chess games  either a match, or a tournament, or multiple games played over a given time period. That way the opponents ranking can be averaged over multiple opponents, and the overall result from many games can be actually anything in 0 to 1 range (like 5 wins in 15 games = 0.33)  which corresponds much better to the concept of 'expected result' for ELO purposes. Quote:
Last edited by jmnk : 01-23-2013 at 09:33 PM.
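The normalization described in this post can be written out as a short Python sketch. The expected-score and update formulas are the standard Elo ones, and the mapping between a score in [0, 1] and a game difference in [-12, +12] is the one proposed above. The 400-point scale is the chess convention and the K value is arbitrary; an NTRP version would need its own constants, so treat both, and all function names, as assumptions.

```python
def elo_expected(rating_a, rating_b, scale=400):
    """Standard Elo expected score for player A (logistic curve)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / scale))


def expected_game_diff(expected_score, max_diff=12):
    """Map an expected score in [0, 1] to a game difference in [-12, +12]."""
    return expected_score * 2 * max_diff - max_diff


def score_from_game_diff(game_diff, max_diff=12):
    """Inverse mapping: turn an actual game difference into a score in [0, 1]."""
    return (game_diff + max_diff) / (2 * max_diff)


def elo_update(rating, expected_score, actual_score, k=32):
    """Standard Elo update; k=32 is a placeholder K-factor."""
    return rating + k * (actual_score - expected_score)


print(expected_game_diff(0.75))            # 6.0, e.g. a 6:3, 6:3 result
print(round(expected_game_diff(0.35), 2))  # -3.6
print(score_from_game_diff(-3))            # 0.375 for a 4:6, 6:7 loss
```

Since 0.375 is above the expected 0.35, `elo_update` would raise the losing player's rating, which matches the 4:6, 6:7 example.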



