GOAT - excel model

Amygdal

New User
Hi all, new here. I love tennis, but in all honesty I like analyzing stuff in Excel even more.

So I have been working on two excel models recently. One is an NBA model (I like basketball better than tennis), but that's not for here - the other one is for here, and it answers, analytically speaking, a lot of the questions I see repeatedly on this forum.

Note that the model is not complete; I still want to add a few things, but let me describe it as it stands now.
Also note that this model covers only the Open Era; anything before 1968 is not included.

Fair warning: the first part of this post is mathematical. If you want to skip it, please do, but it is very important for understanding how I got to the rankings.

When it comes to choosing the GOAT, there are usually two main problems:
1. How do you compare eras and competition?
2. What value do you give to each accomplishment? Is one Grand Slam win worth more than winning 3 Masters Series tournaments?

My model answers these questions and more, but first the prerequisites for entry into the model. A player must meet at least one of the following:

1. won a Masters Series / Grand Prix tournament, at the very least
2. reached a Grand Slam final
3. even if he did neither, ended a year ranked in the top 10
Players excluded? Not many; some names I can mention though are Monfils and Magnus Gustafsson.
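
A minimal sketch of the entry rule, written in Python rather than Excel. The `Career` record and its field names are hypothetical and only for illustration; they are not columns from the actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Career:
    # Hypothetical fields, for illustration only.
    masters_titles: int       # Masters Series / Grand Prix wins
    slam_finals: int          # Grand Slam finals reached (won or lost)
    best_year_end_rank: int   # best year-end ranking ever achieved


def qualifies(c: Career) -> bool:
    """A player enters the model if at least one of the three conditions holds."""
    return (
        c.masters_titles >= 1
        or c.slam_finals >= 1
        or c.best_year_end_rank <= 10
    )
```

For instance, `qualifies(Career(0, 0, 9))` is true (a year-end top 10 finish alone is enough), while `qualifies(Career(0, 0, 15))` is false.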

I would also like to emphasize that my model rewards accomplishments but does not punish failures. For example, Nadal might have lost in the first round of Wimbledon this year, but it doesn't matter. He of course would not gain any points for not advancing, but he would also not lose anything.

What are the accomplishments analyzed? Well, these are the brackets:
1. Any tournament win.
2. Reaching the final of a Masters Series tournament. For purposes of the model, I decided to treat the WCT and the Grand Slam Cup (circa 70-89 and 90-99 respectively) as Masters Series tournaments.
3. Winning a Masters Series tournament.
4. Reaching a World Cup final, World Masters final, or whatever that end-of-year tournament was called over the years.
5. Winning it. Notice that for this model's purposes, I treated an Olympic win / final as a World Masters. I believe it is as prestigious historically, although the ATP rewards it with fewer points.
6. Reaching a GS final.
7. Winning a GS.

In terms of value for each of these accomplishments, my aim was to reach an analytic number for:
GS > WC/OLYMPICS > GS final > WC/OLYMPICS final > Masters series/WCT/grand slam cup > Masters series etc. final > any other tournament win.

On to the values:

1. Any 250-500 tournament win = 1 point.
2. Grand Slams: calculated from the total number of GS won against the total number of 250-500 tournaments won by players who won at least one Grand Slam.
The number is 6.22, which kind of makes sense, seeing that a GS awards between 4 and 8 times more ATP points than a 250-500 tournament.

3. To solve for what a Masters Series or WCT win is worth, I added up the total number of those tournament wins in the Open Era and divided by the total number of Grand Slams won. I then benchmarked against the number I got earlier for the GS value, 6.22, so:
6.22 / ((MS+WCT wins) / (GS wins)) = 2.66

4. For the value of a Grand Slam final, I simply took the ratio between the number of players (count function, not sum) who won a Masters and the number who got to a GS final (105/85), and multiplied it by the 2.66 derived for a Masters Series win, so 2.66 * (no. of players to win a Masters / no. of players to reach a GS final) = 3.29

Another reason this makes sense is the resulting ratio between a GS final and a GS win, which in this model is 3.29/6.22 = 0.53. That is a better number for me than the ATP's 0.6 (1200/2000), because I believe that a win is just worth more. In fact, if I could have solved for a number around 0.45 I would have been even happier. But I am trying to be objective, and this is the number.

5. Since I have a final vs. win ratio for a GS, I just used it for a Masters Series final vs. a Masters Series win as well, at 0.53, so an MS/WCT/GS Cup final is worth 1.41 points.

6. Perhaps the hardest and most subjective were the WC, or the World Tour Finals. Here I actually used the ATP. If according to the ATP a WC is worth exactly the average between a GS and a Masters Series tournament, well, that is what I will use too. The reason for this is that there have been too few of these (only once a year) to really play with the numbers.
So the average between GS and MS is (6.22+2.66)/2 = 4.44

7. Using the final/win ratio here (0.53) gives a value of 2.34 for a WC or Olympic final.
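
The chain of values in steps 4 to 7 can be re-derived from the numbers quoted in the post. This is only a sketch in Python, not the spreadsheet itself: the variable names are made up, and it starts from the already-rounded 6.22 and 2.66 because the underlying tournament counts are not given.

```python
# Inputs quoted in the post.
GS_WIN = 6.22              # Grand Slam win, in units of one 250-500 title
MS_WIN = 2.66              # Masters Series / WCT / Grand Slam Cup win
PLAYERS_WON_MS = 105       # players with at least one Masters-level title
PLAYERS_IN_GS_FINAL = 85   # players who reached at least one GS final

# Step 4: a GS final is a Masters win scaled by the player-count ratio.
gs_final = round(MS_WIN * PLAYERS_WON_MS / PLAYERS_IN_GS_FINAL, 2)

# Final-to-win ratio, reused for the other tournament tiers.
final_to_win = round(gs_final / GS_WIN, 2)

# Step 5: Masters-level final.
ms_final = round(final_to_win * MS_WIN, 2)

# Step 6: year-end championship / Olympics win, midway between GS and MS.
wc_win = round((GS_WIN + MS_WIN) / 2, 2)

# Step 7: year-end championship / Olympics final.
wc_final = round(final_to_win * wc_win, 2)

values = {
    "GS win": GS_WIN, "WC/Olympics win": wc_win, "GS final": gs_final,
    "MS win": MS_WIN, "WC/Olympics final": wc_final, "MS final": ms_final,
    "other title": 1.0,
}
for name, value in sorted(values.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {value:.2f}")
```

Recomputed from the rounded figures, the WC/Olympics final comes out at 2.35 rather than the 2.34 quoted, presumably because the spreadsheet carries unrounded intermediate values.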

All right then, we have all the values, and now it's just a question of plugging in. In fact, these values line up with the underlying assumption below, with one exception: a Masters Series win (2.66) comes out slightly above a WC/Olympic final (2.34):
GS > WC/OLYMPICS > GS final > WC/OLYMPICS final > Masters series/WCT/grand slam cup > Masters series etc. final > any other tournament win.

NOT QUITE!!!!!!!!!!!

You see, this is not enough. There is one more thing I needed to solve for: the strength of competition, or what a GS in the year 1985 is really worth vs. one in, say, 2011. How do you solve that?

Well, the next stage was to actually assign a number, or factor if you will, to each season, and for this, for the first time in this model, I used the world rankings.
Note: I only have the world rankings beginning in 1973. I arbitrarily assigned a number to 1968-1972 just because I didn't have the energy to start digging and thinking about tackling this one. After all, I think I did enough so far.

Season factors were assigned this way:
1. First, I took the first phase of my analysis and plugged in the values to get a total for every single player who fit the criteria.

2. Then I took the ATP year-end rankings and kept only the top 10 for each year.

3. Next, I assigned a percentage to each top 10 position, in the following fashion, which I will illustrate with an example.
Say Federer, for the sake of argument, has 100 points in my model (he actually has more, we'll get there; I am just trying to make my point).
If he ended the given year at No. 1, his season contribution would be the whole 100 points.
However, if he ended the season at No. 2, he would only be assigned 90% of his total points; at No. 3 he would have 80%, etc.

Note that if a player never got to the No. 1 position (say Murray), then 100% of his points sit at No. 2 (his highest achieved position), so if he ends the season at No. 3 he gets 90% of the points, etc.

4. Next we add up all the points for the top 10.

5. However, there is one more correction: a player can't get more points than a player above him in any given year. For example, say Federer ends at No. 3 and gets 80% of his total points. These 80% might be higher than the number of points that Djokovic or Murray have, even though they are ranked higher and got 100%, just because Federer has accomplished more.
Since that should not happen, a player can never get more points than any player above him in the rankings. He is limited by a ceiling set by those above him, so what he really gets is the MINIMUM of (his % of his total points) and (the points of those above him).

6. And again, we sum all the totals, after the corrections for the top 10.
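
Steps 3 to 6 can be sketched in Python as follows. This is one reading of the description, not the actual spreadsheet logic. It assumes the percentage drops by a flat 10 points per ranking place below the player's career-best position, and that the ceiling is the credited (post-correction) total of the player ranked directly above. The function name and the input layout are hypothetical.

```python
def season_value(top10):
    """Raw strength of one season.

    top10: list of (career_points, career_best_rank) tuples, ordered by
    year-end rank (index 0 is the year-end No. 1).
    """
    total = 0.0
    ceiling = float("inf")  # no limit for the year-end No. 1
    for rank, (points, best_rank) in enumerate(top10, start=1):
        # Assumed: 100% at the career-best rank, minus 10% per place below it.
        share = max(0.0, 1.0 - 0.10 * (rank - best_rank))
        # Step 5: never credit more than the player ranked above was credited.
        credited = min(points * share, ceiling)
        ceiling = credited
        total += credited
    return total


# Toy numbers: the No. 3 has 100 career points and a career best of No. 1,
# so he would get 80, but the No. 2 above him was credited only 60.
example = [(70, 1), (60, 2), (100, 1)]
print(season_value(example))
```

In the toy example the three credited totals are 70, 60 and 60, so the season adds up to 190.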

7. Now we actually have a number or a value for every season. Here a new problem arises.

You see, a really good season (e.g. 2009) can be worth up to three times more than a really bad season (e.g. 2000). Since this is unfair, I did the one most controversial and debatable thing in this whole model: I capped the difference, so that even the very worst season can never be worth less than 75% of the very best season.
Why 75%? No reason whatsoever, except that it sounded like a good number. If anyone wants me to change it, it is very easy to do so.
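
A sketch of the 75% floor, in Python. The post does not say whether weak seasons are clamped at the floor or rescaled into the 0.75 to 1.00 range; this version assumes a simple clamp on values normalized to the best season. The function name and the raw totals in the example are made up.

```python
def season_factors(raw_totals, floor=0.75):
    """Normalize raw season totals to the best season, then apply the floor.

    raw_totals: dict mapping year -> raw season total.
    Returns a dict mapping year -> factor in [floor, 1.0].
    """
    best = max(raw_totals.values())
    return {year: max(total / best, floor) for year, total in raw_totals.items()}


# Made-up raw totals, only to show the clamp in action.
factors = season_factors({1985: 300.0, 2009: 270.0, 2000: 100.0})
print(factors)
```

With these made-up inputs the best season gets 1.0, the middle one 0.9, and the weakest one is lifted from one third up to 0.75. Changing `floor` to 0.87 is the kind of adjustment discussed for swapping the top two places.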


__________________
So, we now have two things:
1. The total value for each achievement of each player
2. The assigned value for a season's strength.

So, first things first: a season ranking:
1 1985
2 1982
3 1981
4 1978
5 1984
6 1979
7 1980
8 1990
9 1987
10 1983
11 2009
12 1975
13 2007
14 1994
15 2008
16 1977
17 2010
18 1974
19 2005
20 1976
21 1986
22 1989
23 1995
24 1988
25 2006
26 2011
27 1996
28 2004
29 1973
30 2012
31 1991
32 1992
33 1993
34 1998
35 1999
36 1997
37 2002
38 2001
39 2003
40 2000
 

Amygdal

New User
part 2

All we have to do now is multiply, and here are the results: mostly expected, with two huge surprises, one of which comes at the very top.
Also note that in parentheses I added the ranking each player would have if (2) above, the season value, were not applied:



1. Lendl (!!!!!) (2)
2. Federer (1)
To reverse these numbers, I need to assign a minimum season factor of 0.87 instead of 0.75, and in all truth, I am reluctant to do so.

3. Connors (3)
4. McEnroe (6)
5. Nadal (5)
6. Borg (7)
7. Sampras (!!!) (4)
8. Agassi (8)
9. Becker (9)
10. Vilas (14)
11. Edberg (13)
12. Djokovic (11)
13. Laver (again only open era)
14. Wilander
15. Nastase
16. Ashe
17. Murray
18. Rosewall
19. Newcombe
20. Chang
21. Orantes
22. Muster
23. Courier
24. Roddick
25. Stan Smith

The list goes on and on to No. 166 (Portas)
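
The multiplication step behind a list like this could look as follows. It is a sketch under assumptions: each achievement is taken to be multiplied by the factor of the season in which it happened and then summed over the career, which the post implies but does not spell out. The point values are the ones quoted in part 1; the function name, the dictionary keys, and the sample career are hypothetical.

```python
# Point values quoted in part 1 of the post.
VALUES = {
    "title": 1.0,
    "ms_final": 1.41,
    "ms_win": 2.66,
    "wc_final": 2.34,
    "gs_final": 3.29,
    "wc_win": 4.44,
    "gs_win": 6.22,
}


def career_score(results, season_factor):
    """Sum of achievement values, each scaled by its season's factor.

    results: iterable of (year, achievement) pairs.
    season_factor: dict mapping year -> factor between the floor and 1.0.
    """
    return sum(VALUES[kind] * season_factor[year] for year, kind in results)


# Made-up career: one Slam in a top-rated season, one in a floor-rated season.
factors = {1985: 1.0, 2000: 0.75}
print(career_score([(1985, "gs_win"), (2000, "gs_win")], factors))
```

Passing a factor of 1.0 for every year gives the unadjusted total, which corresponds to the rankings shown in parentheses.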

Surprise surprise. But the real question is: can you point at where the model is wrong?

Well, I do want to point out a few weaknesses myself. However, solving them will be a problem and will require subjective judgment once again.

1. The varying importance of tournaments throughout the years. e.g. Australia in the 70's, the WCT. Heck, even the GS cup was more important in 1991 than it was in 1998.
I don't know how to solve that.

2. The model does not penalize failures. Lendl can be No. 1 here, but the man lost 11 of his 19 Major finals, which is a lot (but he also won 5 World Tour championships, and a bunch of WCTs and Masters Series).

3. The 0.75 number for a season minimum is questionable and not analytic in nature. Although, if I used the real ratio numbers, things would have looked even better for Lendl and worse for Sampras.

4. Longevity and averages. I actually modeled this, but I need to work on it more. Connors is so high not only because he was a great, but also because he played for 19 seasons, accumulating points along the way. Wouldn't it be nice if I could model for an "average" number of seasons, say 8, which is by the way the number I found for the mode longevity of a "great" career? I will do so in the next step, if there is interest, although from what I have seen, Lendl is really on top here too, along with Federer.

5. The model does not tell us a thing about actual tennis level. I don't know and can't tell. What the model does a very good job of telling us is the level of competition in any given year. That's why the 80's are ranked so high: they just had a lot of overachievers back then, with Connors, McEnroe, Lendl and Borg at the beginning, and Becker and Edberg at the end.
That's why recent years show us the same, with 3-4 greats playing the game currently.

However, in Sampras's years, well, Becker was nearing the end of his career. Agassi was away for two years in Sampras's prime. Courier came for 2 years and fell right back. There was no consistency. And this is not me saying this, it's the model.

Anyway, that's all for now. I hope you find it interesting, and I would be happy to send the excel sheet if there's any interest.
 

NatF

Bionic Poster
Very interesting, Lendl and Connors being so high supports another list...
 

r2473

G.O.A.T.
I think you forgot to carry the 1 in step #247......

If you had done so, I believe you would see that Bill Tilden is GOAT.
 

timnz

Legend
For the record

For the record I think these kinds of efforts are a good thing. It takes dedication to look over decades of tennis and try to come up with fair criteria. It would be good to see your calculations alongside each rated player.

In my opinion the only thing that is really wrong here is the value placed on the Olympics. It has only been a valued event since 2008, i.e. there have only been 2 events where it has been valued highly. I think that it should settle around 1250 points (compared to 2000 for a Slam), raised from its current 750. It should never be rated higher than a year-end championship though, because that would ruin the year-end championship's decades-long tradition of being the top event outside the Slams.

I have done my own rankings. The most controversial aspect was the weighting of each of the events. In the end I decided that the best way was just to use the current weighting given by the ATP. Again in my list Lendl rises up very high. (From http://tt.tennis-warehouse.com/showthread.php?t=463381 ):

Like to see us talk about Slams + Season end finals + masters 1000 rather than just Slams, when it comes to evaluating players Open era careers. The season end finals is now a tournament with a rich and strong tradition with great depth of players (over 40 years and top 8 respectively) and the masters 1000's or equivalents pre-1990 have very deep fields. Also there is the WCT finals to consider.

I have only included tournaments of Masters 1000 equivalency and greater to take away the discussion about the depth of field that the older players had to deal with vs today. The thinking is that if we only consider these tournaments of top value then that goes someway to levelling the playing field.

So how do we go some way toward creating a level playing field between current players, who tend to play 4 Slams a year, and older players of the 70's and early 80's (roughly pre-mid-1985), who tended to play only 3? There is also the issue of the WCT Finals, which was a very important event and needs to be included. Winning it was a great achievement and that fact shouldn't be lost in Open Era history. Having said that, older players shouldn't get 6 events + Masters 1000 equivalents where they can gain points in this methodology, because that would be unfair to modern players, who only get 5 events + Masters 1000's. The solution proposed is to ONLY include Dallas if the player who won the WCT Finals didn't play all the Slams in that year. That way the modern players are not disadvantaged. So for example, Lendl's 1982 WCT Finals win gets included because he didn't play all the Slams that year, but his 1985 win doesn't because he did. In McEnroe's case only 4 out of 5 of his WCT Finals wins get included, as he played all the Slams in 1983 when he won the Dallas event. Becker in 1988 didn't play all the Slams but did win the WCT Finals (over Edberg), so it counts, as do Connors' wins in 1977 and 1980 and Borg's in 1976.

Weightings
Slams + Season End Finals and WCT Finals (only if the player didn't play all the Slams that year) + Losing Finals in Slams + Masters 1000 equivalents, with a weighting factor depending on the importance of the event, i.e. 2 x for Slams, 1.4 x for Season End Finals * (including WCT Finals), 1.2 x for losing Slam finals, 1 x for Masters 1000 equivalents

* I weight the Season End Finals at 1.4. The reason for this is that not all of the Masters Cup winners won the tournament unbeaten. For instance, in 1 of Federer's 6 wins he lost a match in the round robin. In 2001 Hewitt was an unbeaten winner, but as the 2002 winner he lost one round robin match. No one has lost more than 1 match and gone on to win the tournament, so I thought that on average we could weight it halfway between an unbeaten winner (1500 points) and an overall winner with one loss (1300 points), to arrive at 1.4. (Currently in the ATP each round robin win is worth 200 points.)

NOTE: You may disagree with the weightings. But remember these are not my weightings. They are the present ATP weightings for tournaments. Every time I post these rankings using these weightings people disagree with them, which of course they have a right to do. The problem is, how can we come to an agreement about them with so many opinions? We can't of course. The best I can do is just use the current ATP weightings.

Calculations

Federer = (17 x 2) + (6 x 1.4) + (7 x 1.2) + (21 x 1) = 71.8

Lendl = (8 x 2) + ((5 + 1) x 1.4) + (11 x 1.2) + (22 x 1) = 59.6

Nadal = (12 x 2) + (0 x 1.4) + (5 x 1.2) + (26 x 1) = 56

Sampras = (14 x 2) + (5 x 1.4) + (4 x 1.2) + (11 x 1) = 50.8

McEnroe = (7 x 2) + ((3 + 4) x 1.4) + (4 x 1.2) + (19 x 1) = 47.6

Borg = (11 x 2) + ((2 + 1) x 1.4) + (5 x 1.2) + (15 x 1) = 47.2

Connors = (8 x 2) + ((1 + 2) x 1.4) + (7 x 1.2) + (17 x 1) = 45.6

Agassi = (8 x 2) + (1 x 1.4) + (7 x 1.2) + (17 x 1) = 42.8

Becker = (6 x 2) + ((3 + 1) x 1.4) + (4 x 1.2) + (13 x 1) = 35.4

Djokovic = (6 x 2) + (2 x 1.4) + (5 x 1.2) + (14 x 1) = 34.8

Edberg = (6 x 2) + (1 x 1.4) + (5 x 1.2) + (8 x 1) = 27.4

Wilander = (7 x 2) + (0 x 1.4) + (4 x 1.2) + (8 x 1) = 26.8
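
The weighted sums above are easy to reproduce. The Python sketch below uses the counts exactly as listed in the calculations (with counted WCT Finals already folded into the season-end column); the function and dictionary names are made up for illustration.

```python
# Weights relative to a Masters 1000 (= 1.0), as described in the post.
SLAM = 2.0
SEASON_END = (1500 + 1300) / 2 / 1000   # halfway between unbeaten and one-loss winner
SLAM_FINAL_LOST = 1.2
MASTERS = 1.0

# (slams, season-end finals incl. counted WCT Finals, lost slam finals, Masters 1000s)
COUNTS = {
    "Federer": (17, 6, 7, 21),
    "Lendl": (8, 5 + 1, 11, 22),
    "Nadal": (12, 0, 5, 26),
    "Sampras": (14, 5, 4, 11),
}


def weighted_score(slams, season_end, lost_finals, masters):
    """Weighted total for one player, rounded to one decimal as in the post."""
    total = (slams * SLAM + season_end * SEASON_END
             + lost_finals * SLAM_FINAL_LOST + masters * MASTERS)
    return round(total, 1)


for player, counts in COUNTS.items():
    print(player, weighted_score(*counts))
```

Run on these four players, it gives 71.8, 59.6, 56.0 and 50.8, matching the figures in the list.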
 

NatF

Bionic Poster
I'd be interested in seeing how you rank the various top 10s. Having the career slam should carry some points too imo. Never mind, I've seen that you've done it...
 

timnz

Legend
Laver

A lot of times people forget to count Laver's 1968 and 1969 Masters 1000 equivalents. I wonder if you have? He has 14 Masters 1000's equivalents. (Interestingly enough he won all of them after the age of 30!):

1968 Los Angeles (PSW Open) - this one was after Rod's 30th Birthday. Tournament was in September, Rod's 30th birthday was the month before in August
1969 Philadelphia
1969 South African Open
1969 Boston (US Pro)
1969 Wembley
1970 Johannesburg
1970 Sydney
1970 Los Angeles (PSW Open)
1970 Wembley
1970 Philadelphia
1971 Rome
1972 Philadelphia
1974 Philadelphia
1974 Las Vegas
 

Amygdal

New User
Interesting and no I didn't since wiki gave me only 1970 on. Do you have a source for the data?

Thanks!
 

Amygdal

New User
I actually think the main difference between us is that I took into account other tournaments won, and also that I factored in the season strength.

But it is reassuring to see Lendl high on a different list.
 

Amygdal

New User
I'd be interested in seeing how you rank the various top 10s? Having the career slam should carry some points too imo. Nevermind seen that you've done it...

True. I see the value. I'll try to come up with a sensible and objective number
 
Thanks for the info. What's telling is that assigning values to different criteria (weighting) is very challenging. Also, I do appreciate timnz's use of weightings consistent with current ATP weightings. But no matter how objective we are in applying the criteria, subjectivity cannot be avoided. Choosing the exact criteria to be used in this type of model and then weighting them necessarily involves some subjectivity. It is very challenging to consistently value the Olympics, and even the WTF, just to begin with. There are so many variables, sometimes even from one year to the next. For example, the Masters was a huge tournament in the 79-80 seasons, with little doubt: the 4th biggest in those years. It had huge crowds at New York's Madison Square Garden when the AO was not even a top ten tournament. It doesn't quite fit when you try to assign current WTF weightings to a tournament that big in that time period. Yet, there's no easy solution. Excellent info.
 

firepanda

Professional
I can certainly see the effort you've put into this, and your results are interesting. I'm still trying to see exactly where Sampras came out so low and Lendl so high. Do you do this sort of thing as your day job?

My main criticism is that I disagree with the whole notion of trying to balance so many factors. For complex systems, you usually try to measure one or a few single variables that correlate well with other variables and are easy to measure and do maths to. :) In the case of climate change research, for instance, they use mean temperature or CO2 levels to measure and predict climate change. Economists use the volatility index or measure the independence of different aspects of the market to measure the global markets. For this reason, I prefer the idea of sticking with something like Grand Slams or total points gained or something to figure who's bestest in tennis. :D

I like what you're doing though and I shall be following this thread with much interest from here on. :p
 

10is

Professional
you usually try to measure one or a few single variables that correlate well with other variables and are easy to measure and do maths to. :)

What? No you don't! If your predictors are highly correlated with one another you run into the problem of multicollinearity - in which case you are better served performing a factor analysis and creating a scale or an index. I assume you meant predictors that are correlated with the "outcome" - not with each other.
 

firepanda

Professional
Sorry, yes. I should've said 'representative of the outcome' of something. That whole post was struggling grammatically. :/ But you get what I mean. Too complicated.
 

Amygdal

New User
Yes, excel models (finance and strategy) are pretty much what I do for a living.

Why is Sampras so low?
1. Not many regular tournaments wins, in comparison to Lendl or Connors for example. and in my model if you win 25 tournaments more, that compares to 4 GS.

2. Less Master series titles than would have been expected from someone with 14 GS.

3. Most importantly - people talk about Federer competing in a weak era, and it's true - the early years of the millennium were extremely weak.
BUT compare that to the 80's: with McEnroe, Lendl and Connors throughout, Borg and Vilas at the beginning, and Edberg and Becker at the end, the 80's were a great era.
Another great era is anything after 2005-6, with Djoko, Murray and of course Nadal and Federer. We are talking about 4 of the top 20 players of all time, 2 of whom are top 5. Great era.

The 90's ... well, not so much. Only Agassi was really a top player, and he was out of the picture for a couple of years. Even Courier, another top 30 player, was only good for a couple of years, 92-94 or so. Everybody else from 95 to 2000 was very mediocre (Kafelnikov, Kuerten, Bruguera, Rafter): a bunch of players who are just not good enough to make the era worthwhile.
And that is why Sampras really suffers in this model - he achieved a lot with no competition.
 

aldeayeah

G.O.A.T.
By the way, the "season weighting" parameter is the most arbitrary of all.

It follows the principle that domination by a few players is stronger than a more level, deep field.

That principle is very arguable.
 

Amygdal

New User
By the way, the "season weighting" parameter is the most arbitrary of all.

It follows the principle that domination by a few players is stronger than a more level, deep field.

That principle is very arguable.

Not quite true.

To illustrate, I just used a hypothetical season with no real GOAT on top but overall a lot of very good players in the top 10:

1 Agassi
2 Edberg
3 Rosewall
4 Muster
5 Chang
6 Smith
7 Gerulaitis
8 Kuerten
9 Okker
10 Stich

Just what you wanted - a level field pretty much.

This hypothetical season would have been ranked 16 (out of 40) in the model. Not so bad, and definitely the top half of the draw. And this is with only 1 top 10 player (Agassi) who's not even close to top 5.

I am not very objective in this matter, but to me the one biggest take-away in the model is actually the season strength, if for no other reason than that it's new and fresh (other models were done for GOAT) and makes sense in many ways.

Let me ask something else. What makes anybody think that Kafelnikov, Stich, Muster and the likes (other players from 1995-2000) were that much competition?
 

aldeayeah

G.O.A.T.
Maybe I didn't quite explain myself - this is the real problem:

A player that has a brief high-level peak, and therefore a low point total, can bring down the score of the whole year by ranking above other players with more accomplished careers!

Let me ask something else. What makes anybody think that Kafelnikov, Stich, Muster and the likes (other players from 1995-2000) were that much competition?

The fact that they were the best competition available in those years.

Again, the fact that they had short peaks doesn't necessarily mean they were worse players. Muster won lots of stuff in his best year.
 

Amygdal

New User
got it.
If I do use averages (or better yet - a weighted distribution of average form and longevity, to exclude such guys as Rios), then it should work better.

I'll tweak
 

BHud

Hall of Fame
Given that GOAT is highly subjective (even in a spreadsheet where it comes down to the weightings one puts on the various accomplishments), isn't all this just mathematical masturbation? Enjoy, but I prefer to go out on a date!
 

Russeljones

Talk Tennis Guru
I also agree about the Olympics' weight being inflated. And the ATP points method really does mean someone who won 5 Masters 1000 titles is ranked higher than a single Slam winner, which we all know is unrealistic.
 

Cormorant

Professional
Does anyone know Albert Portas? I think he'd be overjoyed to hear that he's the 166th-best player of all time.

This is fascinating material all round. I think consistency is sometimes underappreciated in these debates, and it was certainly Ivan and Jimmy's strong suit.
 

Amygdal

New User
Portas is not exactly 166th. He is in this model, but I excluded players who did not meet the minimum terms.
Had Gustafsson, for example, met the minimum terms, he would have been ranked around 125 or so, and there are of course others.

However, of course and this goes without saying, anybody who is in the GOAT debate and conversation, did meet the minimum terms.
 

Amygdal

New User
I would also note that even if the Olympics are inflated, it matters very little. So Nadal and Murray were bumped by a few points and someone like Massu is in the model. It is almost insignificant in the grand scheme of things.

However, I'll tweak. Perhaps value it as a masters series win instead of a WC caliber win.
 

Jackuar

Hall of Fame
Very good analysis Amygdal. Appreciate the effort you've put into this. A very fair analysis and I completely agree with the results.

Just to add some confusion here (or some fuel for argument, let's say).

How do we value something that's not just a win but more than that... I mean, Fed's 2*5 consecutive GS at Wimbly and USO.. Fed 237 weeks, SF streak ( QF streak is a bit over-rated in my opinion but SF is really worth something)... Rafa's 8 FO, Clay-Grass double and twice! Djoko 42-0 streak, Mac's 43-0 streak...

When we say we need a model to define and clarify GOAT status, these are things that add to the value of a player beyond just points and rankings...

And in that sense, I only expect Federer, Nadal and Laver to be higher than where they're in your list but otherwise I completely agree with the rankings your model has produced.
 

firepanda

Professional
I also agree about the Olympics' weight being inflated. And the ATP points method really does mean someone who won 5 Masters 1000 titles is ranked higher than a single Slam winner, which we all know is unrealistic.

Is it? Before Murray had won the US Open, he had won God only knows how many masters tournaments, but not a slam. And yet many did consider him 'greater' than, say, Gaudio. And also, we're talking mainly about the higher levels, not for single slam winners.
 

NatF

Bionic Poster
Is it? Before Murray had won the US Open, he had won God only knows how many masters tournaments, but not a slam. And yet many did consider him 'greater' than, say, Gaudio. And also, we're talking mainly about the higher levels, not for single slam winners.

Murray had been in many slam finals and semis though. Basically he was a consistent threat in the later rounds of slams for years, plus all those masters and a consistent world #4. It's not just masters titles versus slam titles. Careers encompass more than this.
 

MichaelNadal

Bionic Poster
This is really interesting stuff but too much to wrap my head around right now. The list looks about right though for the most part.
 

topher

Hall of Fame
This is an awesome list. What it shows to me is...
1. Numbers can't tell an *exact* story, i.e. Lendl is not the undisputed GOAT in anyone's mind even though he's #1 here.
2. BUT they can tell us something and confirm a lot of our notions. For example, the top 8 seem pretty spot on, even though you can argue the order for thread after thread.

Amazing work OP, hope you keep it up to date. Thanks for sharing.

I would also note that even if the Olympics are inflated, it matters very little. so Nadal and Murray were bumped by a few points and someone like Massu is in the model. It is almost insignificant, in the grand scheme of things.

However, I'll tweak. Perhaps value it as a masters series win instead of a WC caliber win.

I prefer it as a WC equivalent myself, a once every 4 year tournament seems like a pretty big deal to me and I feel like most players (at least recently) have gotten way more motivated for it than for a Master's 1000. But I'm biased :).
 

topher

Hall of Fame
5. However, there is one more correction: a player can't get more points than the player above him in any given year. For example, say Federer ends at No. 3 and gets 80% of his total points - these 80% might be higher than the number of points that Djokovic or Murray have, even though they are ranked higher and got 100%, just because Federer accomplished more.
Since that is impossible, a player can never get more points than any player above him in the rankings. He is limited by a ceiling set by those above him, so he can really get: the MINIMUM of (his % of his total points) and (the points of those above him).
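
For anyone who wants to play with the ceiling rule quoted above outside of Excel, here is a minimal Python sketch. It is only one reading of the rule, not the spreadsheet's actual formula: the function name, the player names and all the numbers are made up for illustration, and it assumes the ceiling is the credited (already corrected) total of the player ranked directly above.

```python
def apply_ceiling(season):
    """Apply the 'no more than the player above' correction.

    season: list of (name, career_points, share) tuples, ordered by
    year-end ranking, best first. 'share' is the fraction of his career
    points a player is credited with for that ranking slot.
    Assumption: the ceiling is the credited total of the player directly
    above, after that player's own correction.
    """
    credited = []
    ceiling = float("inf")  # the year-end No. 1 has nobody above him
    for name, career_points, share in season:
        points = min(career_points * share, ceiling)
        credited.append((name, points))
        ceiling = points  # the next player down cannot exceed this
    return credited


# Hypothetical season: the No. 3 has the biggest career by far.
season = [
    ("Player A", 600, 1.0),
    ("Player B", 400, 1.0),
    ("Player C", 1000, 0.8),
]
print(apply_ceiling(season))
```

In this made-up season Player C would earn 800 on his own numbers, but he is held to Player B's 400, which is exactly the situation described in the Federer example.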

I'd also be interested to know how drastically the season rankings might change if you undid this restriction. I see the logic behind it and I don't disagree, but I'm curious.

Also, I'd note that current players are still adding to their "points" and this will have an extra benefit that will buoy their season rankings which will help both themselves as well as their rivals!

The implementation of that 0.75 season factor has to be the most subjective choice, but there's really no way to pick a right one there. Again, great stuff.
 

topher

Hall of Fame
Amy, not to spam this thread, but before I forget and this thread falls off the front page, I'd like to also pose a question as to how the rankings change when you don't allow that 0.75 season cap at all, and just let great seasons get inflated beyond belief. Also, what's the greatest ratio, i.e. 1985/2000.

I'd also be interested in whether you'd be willing to post this Excel file online for us to look at, perhaps double check if you'd like.
 

Russeljones

Talk Tennis Guru
Is it? Before Murray had won the US Open, he had won God only knows how many masters tournaments, but not a slam. And yet many did consider him 'greater' than, say, Gaudio. And also, we're talking mainly about the higher levels, not for single slam winners.

I thought it's only logical to extrapolate from there. Points from Masters 1000's would be used to bridge gaps/increase gaps between players with different slam tallies. I think that's laughable.
 

dh003i

Legend
I don't think there is any objective way to value season strength, as any models we could use just compare distribution of domination or number of great overall career players.

The only objective way to compare season strength would be to see how well all of the players were playing at a very atomic level. Is a player A who wins a match 6-0 6-0 6-0 playing better than player B who wins a different match 7-5 7-5 7-5? You can't know without considering the quality of the opponents' play. This is typically subjectively evaluated. Someday we may be able to mathematically model how well players played based on speed, court positioning, ball-striking, etc., on a point-by-point basis. We don't have the information to do that now.

I think the most objective thing to do is to consider all seasons of equal strength.
 

jg153040

G.O.A.T.
Not quite true.

To illustrate, I just used a hypothetical season with no real GOAT on top but overall a lot of very good players in the top 10:

1 Agassi
2 Edberg
3 Rosewall
4 Muster
5 Chang
6 Smith
7 Gerulaitis
8 Kuerten
9 Okker
10 Stich

Just what you wanted - a level field pretty much.

This hypothetical season would have been ranked 16 (out of 40) in the model. Not so bad, and definitely the top half of the draw. And this is with only 1 top 10 player (Agassi) who's not even close to top 5.

I am not very objective in this matter, but to me the one biggest take-away in the model is actually the season strength, if for no other reason than that it's new and fresh (other models were done for GOAT) and makes sense in many ways.

Let me ask something else. What makes anybody think that Kafelnikov, Stich, Muster and the likes (other players from 1995-2000) were that much competition?

Good job for your model. You did a lot of work.

The only problem we have is the strength of competition, since we can only compare them to each other. For example, Roddick could be better competition in another era, but relative to Fed he looks bad. We can't know.

We also should somehow factor in the current form. I mean a lot of times grand slam champions don't play that well on a day and are not great competition. A journeyman can play a higher level of tennis on some days. For example Fed was greater competition in 2006 than he is now.

Your formula should somehow take current form into consideration. Also very important suggestion. You shouldn't use rankings as the strength of competition. You should use how well they did on individual surfaces or even tournaments. Because if we use rankings, Sampras is very tough competition on clay. And this year at RG Nadal was nr.4 or was it nr.5? But we all know he is tougher competition on clay. So, we should use surfaces. We also should use only how this guy did on the surface for the last 12 months, to factor in the current form.

Still flawed, because we don't take into account daily form and bad matchups. But still better than just using only rankings. Maybe you should also use the h2h for competition, for bad matchups. For example Davydenko should be very tough competition for Nadal, but easy competition for Federer.

My main point is this, when measuring the strength of competition:
-let's use surfaces or individual tournaments instead of rankings
-let's use h2h for bad matchups for measuring the strength
-let's use only last 12 months for current form

-Since Fedal were getting all atp points and winning all tournaments, this means there is less for competition to get. They can have higher level of play in theory, but on paper look worse. I guess it's impossible to solve this problem, unless you extrapolate for this somehow. Since the level of competition is relative to dominant players, maybe you should use just how competition did vs non top 5 players.

These are my suggestions.
 

sdont

Legend
Interesting stuff. Great effort by the OP.

I'll have some remarks about the model though, especially on the following points:

5. However, there is one more correction: a player can't get more points than the player above him in any given year. For example, say Federer ends at No. 3 and gets 80% of his total points - these 80% might be higher than the number of points that Djokovic or Murray have, even though they are ranked higher and got 100%, just because Federer accomplished more.
Since that is impossible, a player can never get more points than any player above him in the rankings. He is limited by a ceiling set by those above him, so he can really get: the MINIMUM of (his % of his total points) and (the points of those above him).

6. And again, we sum all the totals, after the corrections for the top 10.

7. Now we actually have a number or a value for every season. Here a new problem arises.

You see, a really good season (e.g. 2009) can be worth up to three times more than a really bad season (e.g. 2000). Since this is unfair, I did the one most controversial and up-for-debate thing in this whole model - I capped the difference, so that even the very worst season can never be worth less than 75% of the very best season.
Why 75%? No reason whatsoever, except that it sounded like a good number. If anyone wants me to change it, it is very easy to do so.

IMO, your model should be modified so that it doesn't give you unsatisfactory results in the first place that you feel the need to 'manually' correct.

Ideally, the right way to approach this problem would be to first write a set of criteria that the results should meet, and then optimize the parameters of your model so as to maximize the margins between individual players (kind of like SVM). Not easy to do in practice, though.
 

Amygdal

New User
I'd also be interested to know how drastically the season rankings might change if you undid this restriction. I see the logic behind it and I don't disagree, but I'm curious.

Also, I'd note that current players are still adding to their "points" and this will have an extra benefit that will buoy their season rankings which will help both themselves as well as their rivals!

The implementation of that 0.75 season factor has to be the most subjective choice, but there's really no way to pick a right one there. Again, great stuff.

Sure, very easy tweak. so:

Left side: with restrictions just like in the OP. Right side is without the restrictions

1 1985 1981
2 1982 1985
3 1981 1979
4 1978 1980
5 1984 1982
6 1979 1990
7 1980 1987
8 1990 1988
9 1987 1978
10 1983 1983
11 2009 1989
12 1975 1984
13 2007 1974
14 1994 1975
15 2008 1986
16 1977 2009
17 2010 2008
18 1974 1976
19 2005 2010
20 1976 2005
21 1986 2007
22 1989 1994
23 1995 1977
24 1988 2011
25 2006 2012
26 2011 1995
27 1996 1992
28 2004 1991
29 1973 1973
30 2012 2006
31 1991 1996
32 1992 2003
33 1993 2004
34 1998 1993
35 1999 1999
36 1997 2002
37 2002 2000
38 2001 1998
39 2003 1997
40 2000 2001
 

Amygdal

New User
Good job for your model. You did a lot of work.

The only problem we have is the strength of competition, since we can only compare them to each other. For example, Roddick could be better competition in another era, but relative to Fed he looks bad. We can't know.

We also should somehow factor in the current form. I mean a lot of times grand slam champions don't play that well on a day and are not great competition. A journeyman can play a higher level of tennis on some days. For example Fed was greater competition in 2006 than he is now.

Your formula should somehow take current form into consideration. Also very important suggestion. You shouldn't use rankings as the strength of competition. You should use how well they did on individual surfaces or even tournaments. Because if we use rankings, Sampras is very tough competition on clay. And this year at RG Nadal was nr.4 or was it nr.5? But we all know he is tougher competition on clay. So, we should use surfaces. We also should use only how this guy did on the surface for the last 12 months, to factor in the current form.

Still flawed, because we don't take into account daily form and bad matchups. But still better than just using only rankings. Maybe you should also use the h2h for competition, for bad matchups. For example Davydenko should be very tough competition for Nadal, but easy competition for Federer.

My main point is this, when measuring the strength of competition:
-let's use surfaces or individual tournaments instead of rankings
-let's use h2h for bad matchups for measuring the strength
-let's use only last 12 months for current form

-Since Fedal were getting all atp points and winning all tournaments, this means there is less for competition to get. They can have higher level of play in theory, but on paper look worse. I guess it's impossible to solve this problem, unless you extrapolate for this somehow. Since the level of competition is relative to dominant players, maybe you should use just how competition did vs non top 5 players.

These are my suggestions.

Thanks for the input

As in any model, I never said mine was perfect. While I did put a lot of work into it, I believe that using your first suggestion, for example, would just mean a lot more work, so I don't know. Looking up surfaces and incorporating them into the model seems a bit tedious. I see the value, but I don't think I'll do it.

I thought about ways to incorporate H2H, and couldn't think of a straightforward one that would fit into the model. Also, I think that the other model on this forum in recent days (the one where Connors is GOAT, and Bertolucci is a great, while Sampras isn't) overused H2H so much that it yielded ridiculous results. So again - really hard.

Regarding your third suggestion, last 12 months. I actually think that's what I did, with ranking the seasons by the end of season ATP ranking, i.e. last 12 months form.
 

topher

Hall of Fame
And for anyone interested, here is the model

https://docs.google.com/file/d/0Bxq5VsGhnlfPTktnUGo1TG1VbGc/edit?usp=sharing

Note - It is not a model I intended to present anywhere, so it could have been clearer and more user friendly.

But hey - I understand it. Please download and change as you please, as my original model is in regular .xlsx format on my computer.

Thanks! This is awesome. Now having looked at it I only have one suggestion:

Concerning the implementation of the season rankings, I see you're currently implementing them as somewhere between 0.75 (year 2000) and 1.00 (1985). You then seem to be stepping incrementally for each year's ranking. This, however, doesn't take into account the difference between years as accurately as it could.

For example, using incremental steps, the difference between 2002 and 2001 is the same as the difference between 2002 and 1997. However, the season points denote that 2002 and 2001 were comparably weak, while 1997 was quite a bit tougher. To revise that, I implemented a linear interpolation so that these differences are taken into account. So say 1997 got 300 pts, 2000 (the weakest year) got 200 pts and 1985 (the strongest year) got 500 pts. Using your assumed factor of 0.75, the 1997 factor would come out to be: (300-200)/(500-200)*(1-0.75)+0.75, which is about 0.83.
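
To make the difference between the two schemes concrete, here is a rough Python sketch that computes both from the same season totals. The function names are hypothetical, the three point totals are just the illustrative ones from the example above (not real model output), and the even rank steps are my reading of how the incremental version works.

```python
def step_factors(points_by_year, floor=0.75):
    """Rank-based factors: equal steps from `floor` (weakest season)
    up to 1.0 (strongest), ignoring how far apart the totals are.
    Needs at least two seasons."""
    ranked = sorted(points_by_year, key=points_by_year.get)  # weakest first
    n = len(ranked)
    return {year: floor + (1 - floor) * i / (n - 1)
            for i, year in enumerate(ranked)}


def interpolated_factors(points_by_year, floor=0.75):
    """Linear interpolation: each factor reflects where the season's
    total sits between the weakest and strongest totals.
    Needs the weakest and strongest totals to differ."""
    lo = min(points_by_year.values())
    hi = max(points_by_year.values())
    return {year: floor + (1 - floor) * (pts - lo) / (hi - lo)
            for year, pts in points_by_year.items()}


# Illustrative totals only, taken from the worked example.
points = {2000: 200, 1997: 300, 1985: 500}
print(step_factors(points))
print(interpolated_factors(points))
```

With only these three seasons, the step version puts 1997 exactly halfway (0.875), while the interpolated version gives about 0.833 because 1997's total is closer to the weakest year than to the strongest.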

This change helped most years (as no era was as abysmal as the early 2000s), but helped even more the most recent years.

In addition to this, I also took the liberty of making the following changes:

1) Implemented the current Race to London Rankings to get the 2013 season rank (it's now in 28th place) and added 2013's season factor to the 2013 results.

2) Added Stan Wawrinka to the player lists (since he's included in the change #1). Stan comes in at 140 on the list.

3) Added the results of the US Open (1 GS to Nadal, 1 GS Final to Djokovic).

4) Increased Djokovic's yearly #1 ranking time, he now has 100 weeks total.

5) Fixed a bug that I think had to do with you checking those season rankings I'd asked about. I'm sure you're aware of it (it was very obvious as it was skewing the rankings all over the place), essentially the VLOOKUP column for the season factors in the season pages needed to be 9 instead of 5.
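
For those who haven't run into this kind of bug before, here is a small Python imitation of an exact-match VLOOKUP showing how a wrong column index quietly returns the wrong field. The table layout and values below are entirely made up (the real sheet's columns are not reproduced here); the only thing borrowed from the fix is the idea of column 9 versus column 5.

```python
def vlookup(key, table, col_index):
    """Imitates Excel's VLOOKUP(key, table, col_index, FALSE):
    find the row whose first column equals `key` and return the value
    in the 1-based column `col_index`."""
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    raise KeyError(key)  # Excel would show #N/A here


# Hypothetical 9-column season table: year, seven placeholder columns
# (the 5th one holding raw season points), and the season factor last.
season_table = [
    (1985, "-", "-", "-", 500, "-", "-", "-", 1.00),
    (1997, "-", "-", "-", 300, "-", "-", "-", 0.83),
    (2000, "-", "-", "-", 200, "-", "-", "-", 0.75),
]

wrong = vlookup(1997, season_table, 5)  # picks up raw points instead
right = vlookup(1997, season_table, 9)  # picks up the season factor
print(wrong, right)
```

Multiplying results by a raw points total instead of a factor between 0.75 and 1.00 would skew everything, which fits the "rankings all over the place" symptom.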
 

topher

Hall of Fame
As a result of all of this, the player rankings have now changed. I'll summarize the changes then give the top 25:

1) Nadal is now 4th on the all-time list, barely passing McEnroe (he's still a ways off Connors).

2) Djokovic is now 10th on the all-time list, barely passing Vilas and Edberg (quite a ways off Becker).

3) Edberg has passed Vilas slightly (this is obviously due to my changes).

4) Murray has passed Ashe by a slim margin to 16th.

5) The gap between Federer and Lendl has lessened by a not insignificant amount, while Connors is a distant third. I think Fed could close this gap realistically.

6) Muster is ahead of Orantes at 21.

Here are the rankings:

1 Lendl
2 Federer
3 Connors
4 Nadal
5 McEnroe
6 Borg
7 Sampras
8 Agassi
9 Becker
10 Djokovic
11 Edberg
12 Vilas
13 Laver
14 Wilander
15 Nastase
16 Murray
17 Ashe
18 Rosewall
19 Newcombe
20 Chang
21 Muster
22 Orantes
23 Courier
24 Roddick
25 Smith
 

TMF

Talk Tennis Guru
As a result of all of this, the player rankings have now changed. I'll summarize the changes then give the top 25:

1) Nadal is now 4th on the all-time list, barely passing McEnroe (he's still a ways off Connors).

2) Djokovic is now 10th on the all-time list, barely passing Vilas and Edberg (quite a ways off Becker).

3) Edberg has passed Vilas slightly (this is obviously due to my changes).

4) Murray has passed Ashe by a slim margin to 16th.

5) The gap between Federer and Lendl has lessened by a not insignificant amount, while Connors is a distant third. I think Fed could close this gap realistically.

6) Muster is ahead of Orantes at 21.

Here are the rankings:

1 Lendl
2 Federer
3 Connors
4 Nadal
5 McEnroe
6 Borg
7 Sampras
8 Agassi
9 Becker
10 Djokovic
11 Edberg
12 Vilas
13 Laver
14 Wilander
15 Nastase
16 Murray
17 Ashe
18 Rosewall
19 Newcombe
20 Chang
21 Muster
22 Orantes
23 Courier
24 Roddick
25 Smith

So Lendl is the goat, Laver is at #13, Rosewall is behind Murray.

I would love it if you post them in the "Former Pro Player Talk" forum.
 

topher

Hall of Fame
So Lendl is the goat, Laver is at #13, Rosewall is behind Murray.

I would love it if you post them in the "Former Pro Player Talk" forum.

If you'd read the OP, you'd note that this is for the Open Era only, so Laver and Rosewall's positions are only natural.

As for Lendl being the goat, multiple attempts to quantify the goat in terms of career achievements have yielded Lendl. This is due to his longevity and consistency, as well as an inability to justify stronger weighting to the GS's (in my opinion). You'll note the top 8 are pretty obvious for the Open era though.
 