More fun with ELO ratings

falstaff78

Hall of Fame
There's been a spate of recent articles applying ELO ratings to tennis. These ratings attempt to correct for strength of competition, thereby taking strength of era out of the discussion.

I thought it would be useful to collect them in one place, up to and including Jeff Sackmann's latest, today.

Enjoy!

What are ELO ratings:
https://en.m.wikipedia.org/wiki/Elo_rating_system

Carl Bialik:
http://fivethirtyeight.com/features/djokovic-and-federer-are-vying-to-be-the-greatest-of-all-time/

Key result: Borg, Djokovic and Federer have the three highest peaks in the open era. Murray is hugely, hugely underrated by virtue of being born at the wrong time!
morris-bialik-mens-tennis-elo-2-1.png



By age: Fed's post 30 level is unprecedented
morris-bialik-mens-tennis-elo-1-11.png



Jeff Sackmann:
http://www.tennisabstract.com/blog/...-djokovic-and-roger-federer-and-rafael-nadal/

Fed and Djokovic have the highest hard court level in Open Era. Nadal and Borg have highest clay level in Open Era.

Sleepomeno:
http://sleepomeno.github.io/blog/2015/09/08/Historical-ELO-Tennis-Rating/

Enjoy!
 
Nice to see Hewitt making the top 15 on HC ;)

Glad to see my points about Murray in 2009 being backed up. It was one of his very best years and clearly greater than say 2010.
 
Nice to see Hewitt making the top 15 on HC ;)

Glad to see my points about Murray in 2009 being backed up. It was one of his very best years and clearly greater than say 2010.

His win % v non-top 5 players in 2009 was the highest it's been for him.

I'm always doubtful of any system that places Sampras low down the rankings. This is the problem with statistical analysis like this. It doesn't matter how good your system is, people have rigid preconceptions that will not be swayed.

It goes against the "rules", but you almost have to modify your system to fit certain preconceptions, but avoid bias at the same time. Virtually no one ranks Lendl above Sampras, so there is the first place to start. Make sure your system ranks Sampras higher.
 
Why is Andy Murray in this discussion? The guy has two grand slams! You might as well add Sergi Bruguera to the mix. It's a huge disservice to the likes of Federer, Nadal, Djokovic, Borg, Lendl, Laver, Sampras to lump Andy Murray in with them. :mad:
 
His win % v non-top 5 players in 2009 was the highest it's been for him.

I'm always doubtful of any system that places Sampras low down the rankings. This is the problem with statistical analysis like this. It doesn't matter how good your system is, people have rigid preconceptions that will not be swayed.

It goes against the "rules", but you almost have to modify your system to fit certain preconceptions, but avoid bias at the same time. Virtually no one ranks Lendl above Sampras, so there is the first place to start. Make sure your system ranks Sampras higher.

His win/loss record in general was right up there that year. Underrated year for Murray for sure.

I agree about Sampras, clearly he is a greater player than Murray. A system which doesn't recognize this is surely flawed to some degree. His overall consistency seems to let him down in these sorts of analyses. I have no idea how to account for these things using this particular rating, though.
 
Why is Andy Murray in this discussion? The guy has two grand slams! You might as well add Sergi Bruguera to the mix. It's a huge disservice to the likes of Federer, Nadal, Djokovic, Borg, Lendl, Laver, Sampras to lump Andy Murray in with them. :mad:

Murray at his best > Sampras at his best based on these rankings.

wat
 
Murray at his best > Sampras at his best based on these rankings.

wat
That pretty much sums up Elo right there. It's all bullcrap! It all means diddly squat. It's just another opportunity, a weak one btw, for sports writers to portray themselves as intellectuals and/or academics. :rolleyes:
 
Murray at his best > Sampras at his best based on these rankings.

wat
As best as I can see the problem Sampras has with Elo ratings is his relatively weaker opposition. Murray had to play against much stronger opponents, at least as measured by the Elo ratings.
 
I love the idea of ELO ratings but it seems very difficult to make it really work. I actually did my own Elo-type thing not that long ago, specifically for Wimbledon. It was very half-*****, done in one evening, and had so many flaws that I didn't even bother to expand on it or even keep the data. However, the results were (based on peak):

1. McEnroe (1984)
2. Federer (2006)
3. Borg (??)
4. Sampras (??)
5. Becker (1989)
6. Edberg (1990)
7/8. Connors/Nadal (??/??)
9. Stich (1991)
10. Murray (2013)

(This was done before this year's Wimbledon)

Now, this may not be a perfect set of results, but I defy anyone to say it is worse than any other and I can assure you the time and effort spent was considerably less.
 
Very nice.

Will Novak continue to climb?
My guess is yes.
 
I lol @ anyone spending any considerable time evaluating other people's achievements. Just lol.
You do have a bit of a point here, as cool as this is.

I used to keep massive GOAT Excel spreadsheets, but some time last year I stopped and thought about what I was actually doing.

Another confession: I haven't watched a tennis match in months except the Wimbledon SF and F.
I just watched the score for the USO F while on a walk.

Recently I've been focusing on fitness, studying, and playing tennis myself.
You could say I've become a bit disillusioned with tennis, the same thing happened in 2010-2011 for me.
 
I see many problems with ELO ratings, due to the fact that tennis is not a strictly logical sport the way chess is.
Many aspects of the sport are not reducible to a linear ranking.

1) It doesn't account for the score beyond the final result. Five setters DO mean something (cf. Wawrinka-Djokovic at AO) even if they end in defeat.

2) It has no memory, which supposes the players ignore who is on the other side: totally impossible in tennis, see the Classic Rivalry, or poor Roddick.

3) It doesn't account for styles of play. As of September 2015, 21% odds of Ferrer beating Roger? 4% for Karlovic against Djokovic? 13% for Fognini over Nadal? I don't think so.

4) Surfaces are conflated, which leads to predictive disaster. Wawrinka-Federer: 3-4 on clay, 0-12 on hard.

So for ELO to have significant predictive power it should at least be surface-specific, account for previous defeats, and count the sets/tie-breakers.
Until then, it's just statistical fun.
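For anyone wondering where percentages like the ones in point 3 come from, here is a minimal sketch of the standard Elo expected-score formula and update step. The K-factor of 32 and the helper names are assumptions for illustration, not the settings used in any of the linked articles. Inverting the formula suggests that a 21% underdog chance corresponds to a rating gap of roughly 230 points.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (win probability) of player A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both new ratings after one match. k=32 is an assumed K-factor."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever the winner gains, the loser gives up
    return rating_a + delta, rating_b - delta


def gap_for_probability(p: float) -> float:
    """Rating gap (favourite minus underdog) implied by underdog win chance p."""
    return 400.0 * math.log10((1.0 - p) / p)


# The three underdog percentages quoted in point 3
for p in (0.21, 0.13, 0.04):
    gap = gap_for_probability(p)
    print(f"{p:.0%} underdog chance -> gap of about {gap:.0f} points")
```

Note that only the two ratings and the win/loss result go in. Scoreline, surface, head-to-head history and playing style never enter the formula, which is what points 1 to 4 are getting at.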
 
Some great points in the discussion above. My thoughts in response:

1) Elo is a mathematical system which takes into account strength of opposition, based in turn on the strength of the opposition's opposition and so on. By contrast when we tennis fans perceive historical greatness we place a huge premium on major finals. Elo doesn't care about major victories. Therefore ELO is flawed BY DEFINITION.

2) when Murray comes out ahead of Sampras, all this is telling us is, that according to one particular set of rules Sampras underperformed. It doesn't mean we should chuck that set of rules out of the window on one extreme. Or, on the other extreme, to use it as a literal definition of greatness. Rather, as a middle ground, we should add something to our estimation of Murray's underrated greatness. And to remind ourselves that major counts are one important measure of greatness but not the ONLY measure of greatness.

3) would highly recommend people read the actual articles. The authors have addressed some of the points raised here. For example, Jeff has calculated surface-specific ELOs which one poster asked for.
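To make the surface-specific idea concrete, here is a rough sketch of keeping a separate Elo table per surface. This is not Jeff's actual method. The starting rating of 1500, the K-factor of 32, the player labels and the match results are all made up for illustration.

```python
from collections import defaultdict

START = 1500.0  # assumed starting rating for an unseen player
K = 32.0        # assumed K-factor

# ratings[surface][player] -> rating, created on first use
ratings = defaultdict(lambda: defaultdict(lambda: START))


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def record_match(surface: str, winner: str, loser: str) -> None:
    """Update only the table for the surface the match was played on."""
    table = ratings[surface]
    exp_w = expected_score(table[winner], table[loser])
    delta = K * (1.0 - exp_w)
    table[winner] += delta
    table[loser] -= delta


# Made-up results: A beats B twice on hard, B beats A once on clay
for surface, winner, loser in [("hard", "A", "B"), ("hard", "A", "B"), ("clay", "B", "A")]:
    record_match(surface, winner, loser)

for surface, table in ratings.items():
    print(surface, {player: round(r, 1) for player, r in table.items()})
```

With separate tables, the same pair of players can be ranked in opposite orders on hard court and clay, which a single blended number cannot express.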
 
Some great points in the discussion above. My thoughts in response:

1) Elo is a mathematical system which takes into account strength of opposition, based in turn on the strength of the opposition's opposition and so on. By contrast when we tennis fans perceive historical greatness we place a huge premium on major finals. Elo doesn't care about major victories. Therefore ELO is flawed BY DEFINITION.

Why would that make ELO flawed by definition? You could argue the opposite, that the ELO ratings show that this hyper focus on Slam finals is flawed. In the end they measure different things.
 
ELO has well-known flaws, although it is interesting.

Really, what we want is to put several dozen to a hundred tiny movement tracking devices on the players, put one inside of the tennis ball at the center, and then mathematically analyze the actual quality of every match by court positioning, speed, acceleration, footwork, pace and spin on shots, depth of shots, closeness to the line, etc.

Any ranking system or rating system that accounts for strength by wins/losses, or even points, will ultimately come up short and not really prove what it is attempting to, because that still doesn't measure the absolute strength of players.

ELO ratings might show Renshaw from the late 1800s to be approximately as strong as the great players of the modern era, because all it can do is compare how well players in one era perform vis-a-vis one another. You can also use it to get some kind of idea as to how far a player was above his peers...but again, so what? That doesn't really help in comparing the absolute strength of players across eras.

What we want to figure out is, accounting for unfair advantages like modern rackets/strings, what would happen if all of the great players were in the same era -- who would come out the strongest in that field of greats playing against one-another? No statistical system yet devised will tell us that, because these systems don't actually measure the athleticism and talent of players, nor is it possible for them to.
 
Why would that make ELO flawed by definition? You could argue the opposite, that the ELO ratings show that this hyper focus on Slam finals is flawed. In the end they measure different things.

Agreed. We are saying the same thing but I was not clear enough. What I meant was ELO is inconsistent with respect to fans' intuition by definition.
 
Some great points in the discussion above. My thoughts in response:

1) Elo is a mathematical system which takes into account strength of opposition, based in turn on the strength of the opposition's opposition and so on. By contrast when we tennis fans perceive historical greatness we place a huge premium on major finals. Elo doesn't care about major victories. Therefore ELO is flawed BY DEFINITION.

2) when Murray comes out ahead of Sampras, all this is telling us is, that according to one particular set of rules Sampras underperformed. It doesn't mean we should chuck that set of rules out of the window on one extreme. Or, on the other extreme, to use it as a literal definition of greatness. Rather, as a middle ground, we should add something to our estimation of Murray's underrated greatness. And to remind ourselves that major counts are one important measure of greatness but not the ONLY measure of greatness.

Why do you keep bandying the word "greatness" around as if it had a hard and fast meaning? You're entitled to your opinion on players, but that doesn't in any way undermine ELO computing.


3) would highly recommend people read the actual articles. The authors have addressed some of the points raised here. For example, Jeff has calculated surface-specific ELOs which one poster asked for.
Jeff calculated hard-court and clay, not grass. Then proceeded to ignore surfaces and give generic figures for everyone. You need a different ELO number per surface. The only case where it's unnecessary is predictable conclusions about Djokovic's dominance ("outstanding figures across the board", "this is unreal", "you must be kidding me", etc.).

No, I stand firm by my 4 criticisms 5 posts up. As it stands, ELO is a very distant cousin of the chess measure of skill: it is a figure of ability to win individual matches rather than championships. Not the same thing. You can win 80% of your matches all your life, and never hold an ATP title.
 
No, I stand firm by my 4 criticisms 5 posts up.

Then you obviously haven't read the articles I posted because Carl Bialik explicitly calculates set-level ELOs, and discusses the pros and cons of that approach, and Jeff calculates ELO by surface. So you're already down to 2 criticisms.

More importantly what do you think is the perfect metric? Major victories?!?!

My point all along has been that ELO, for all its flaws, adds something valuable to the discussion. So either it is your position that ELO is literally worthless, or we don't really disagree on anything.
 
Then you obviously haven't read the articles I posted because Carl Bialik explicitly calculates set-level ELOs, and discusses the pros and cons of that approach, and Jeff calculates ELO by surface. So you're already down to 2 criticisms.

More importantly what do you think is the perfect metric? Major victories?!?!

My point all along has been that ELO, for all its flaws, adds something valuable to the discussion. So either it is your position that ELO is literally worthless, or we don't really disagree on anything.
I saw he mentioned the sets ELO figures, but I couldn't find them on the page. If they could be used then yes, I'd be down to 2.

ELO is not a magic number, it's a statistic. Stats can and should be tweaked to fit reality better. No reason to stop short the way it is now. Even the Elo system for chess was modified several times after Arpad Elo's death.

Gosh no, slam count is overhyped enough on here as it is. This is refreshing (albeit still a little frustrating).
 
As best as I can see the problem Sampras has with Elo ratings is his relatively weaker opposition. Murray had to play against much stronger opponents, at least as measured by the Elo ratings.

Ding!

Here's a chart with the actual figures, as calculated by a reddit user
BFRDqHN.png
 
What about an Elo system where your opponent is the average rating of all players in the round? So you're rated against the field?

I think one of the problems with Sampras is that players of his era were not as consistent/the era was deeper, causing more upsets (delete as applicable), so he is punished for things out of his control, i.e. not playing the strong competition as often.

Some eras will have strong early rounds, some strong later rounds, and it evens out.
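One possible reading of the "rated against the field" idea, as a rough sketch: treat the average rating of the other players left in the round as the opponent and run the usual Elo update against that. The function name `update_vs_field`, the K-factor of 32 and the sample ratings are hypothetical. Nothing here comes from the linked articles.

```python
from statistics import mean


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update_vs_field(rating: float, field_ratings: list, won: bool, k: float = 32.0) -> float:
    """Update one player's rating against the average of the other players in the round.

    field_ratings: ratings of everyone else still in that round (assumed input).
    k: assumed K-factor.
    """
    field_avg = mean(field_ratings)
    exp = expected_score(rating, field_avg)
    score = 1.0 if won else 0.0
    return rating + k * (score - exp)


# A 1500-rated player winning a round with an average field vs. a strong field
print(update_vs_field(1500.0, [1400.0, 1500.0, 1600.0], won=True))
print(update_vs_field(1500.0, [1700.0, 1700.0], won=True))
```

In this version a win is worth the same whether the draw handed you the strongest or the weakest player in the round, which is the point of the proposal. The trade-off is that the points a winner gains are no longer taken from the specific player he beat.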
 