Possible Statistical Analysis for Greatness of a Player - Assistance Required

noeledmonds

Professional
Now, I am no statistician, but I do have some statistical knowledge and training. I devised this system to try to incorporate both the dominance and the versatility of a player.

(number of tournaments won) x (winning percentage) = Z

The importance of this first step is that it cancels out longevity in a career, as winning percentage will ultimately drop. It also does not penalise players such as Federer who play a limited schedule and therefore don't win as many tournaments. However, those who do play more tournaments do not have their additional tournament victories ignored, merely moderated by their winning percentage. I have called the value obtained Z.

For Muster Z = 3065 (to 4 significant figures)
For Edberg Z = 3146 (to 4 significant figures)

Muster and Edberg receive relatively similar Z values here. Although Muster has won more tournaments, this is more than cancelled out by Edberg's higher winning percentage.
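
A minimal sketch of the Z step in Python. The post gives only the resulting Z values, not the inputs, so the match and title counts below are hypothetical, and the function name `dominance_z` is just for illustration. Winning percentage is assumed to be on a 0-100 scale, which is what quoted Z values in the thousands suggest.

```python
def dominance_z(titles: int, matches_won: int, matches_played: int) -> float:
    """Z = (number of tournaments won) x (winning percentage).

    Assumption: winning percentage is expressed on a 0-100 scale.
    """
    if matches_played <= 0:
        raise ValueError("matches_played must be positive")
    win_pct = 100.0 * matches_won / matches_played
    return titles * win_pct


# Hypothetical players (not real career figures).
busy_schedule = dominance_z(titles=44, matches_won=700, matches_played=1000)
light_schedule = dominance_z(titles=40, matches_won=600, matches_played=750)

print(busy_schedule, light_schedule)
```

With these made-up inputs the player with fewer titles but a higher winning percentage ends up with the higher Z, which is the same effect described for Muster and Edberg.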

((1 + number of tournaments won on grass) x (1 + number of tournaments won on clay) x (1 + number of tournaments won on hard courts and carpet))/(number of tournaments won) = Y

This second independent step attempts to account for versatility across the surfaces. A player who wins predominantly on one surface will be penalised. The value is one plus the number of tournaments won to prevent a player from receiving zero for winning no tournaments on a surface. The number of tournaments won is not relevant as it is cancelled out by dividing by the number of tournaments won at the end. I have called this value Y.

For Muster Y = 9.318 (to 4 significant figures)
For Edberg Y = 20.00 (to 4 significant figures)

Here Muster’s Y value is considerably lower than Edberg’s. This reflects Muster’s lack of versatility across the surfaces.
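
The Y step can be sketched the same way. The surface counts below are hypothetical (the post does not list the counts behind the quoted Y values), the name `versatility_y` is illustrative, and the total is assumed to be the sum of the three surface counts.

```python
def versatility_y(grass: int, clay: int, hard_carpet: int) -> float:
    """Y = (1 + grass)(1 + clay)(1 + hard/carpet) / (tournaments won).

    Assumption: every title falls into exactly one of the three groups,
    so the total is the sum of the three counts.
    """
    total = grass + clay + hard_carpet
    if total == 0:
        # The formula divides by the number of tournaments won.
        raise ValueError("Y is undefined for a player with no tournament wins")
    return (1 + grass) * (1 + clay) * (1 + hard_carpet) / total


# Hypothetical title counts, 40 titles each.
specialist = versatility_y(grass=0, clay=36, hard_carpet=4)
all_rounder = versatility_y(grass=10, clay=10, hard_carpet=20)

# Same surface mix as the all-rounder, but twice as many titles.
all_rounder_doubled = versatility_y(grass=20, clay=20, hard_carpet=40)

print(specialist, all_rounder, all_rounder_doubled)
```

With these made-up counts the specialist scores far lower than the all-rounder, as intended. The last case suggests that the division may not fully cancel the number of tournaments won: the same surface mix with twice the titles still scores higher, so Y appears to carry some career-size signal as well. A player with no titles has to be excluded, since the formula would divide by zero.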

How to combine these two values is where my problem arises. Ranking dominance against versatility is very difficult if not impossible to achieve objectively.

Note that I am well aware of the relatively simplistic nature of this analysis and that this ranking system would have flaws, as does any analysis, when it is completed. One of the most obvious flaws is actually also one of the analysis's strengths. Obviously the importance of individual tournaments is not incorporated. However, this does stop one having to rank the importance of tournaments year by year, which is a laborious process before 1988 and very difficult before 1968.

Does anyone have any idea of a valid progression of this analysis, or should it be disposed of altogether?

Nickognito, Moose Malloy, krosero, SgtJohn, Urban, chaognosis, Wuornos and anyone else interested, it would be great to hear your views.
 
I follow my fellow countryman Rino Tommasi in claiming that 'matches' are more important than 'tournaments' with the exception of Grand Slams (and maybe the Masters and the Davis Cup).

So my ideal ranking assigns points only for winning a match, plus a bonus for results in Grand Slam tournaments only. The value of winning a tournament is a function of the value of the opponents: I can win Bercy by beating Hrbaty in the final, or by beating Federer.

But I think adding a coefficient of versatility is a very interesting idea.

All IMHO, obviously,

Regards,

c.
 

Steve132

Professional
I think that it's an excellent idea to add versatility to dominance in player assessments. The main issue with doing so is that some surfaces are far more popular than others. Today, for example, about two-thirds of all tournaments are played on hard courts, while very few are played on grass. Carpet is also used far less frequently than it was in the past.
 
Versatility is important, but dominance on different surfaces is important too.

Agassi is a great player on 4 surfaces, Sampras on three.

But Agassi is a champion on 1 surface only, Sampras on three.

If I imagine Agassi competing with the best of all time, there's no way he could win the French, Wimbledon or the Masters, and it would be very difficult for him to win the U.S. Open. Maybe he could reach 4 finals.

Sampras, on the contrary, would probably lose in the very first rounds at the French, but could win Wimbledon, the U.S. Open and the Masters, for sure.

So, when we talk about 'all-time', the definition of 'greatness' changes.

Agassi is a versatile player, a 'great' player on 4 surfaces. But, from the 'all-time' point of view, Agassi is 'great' only on hard courts.

So, I think that versatility is important, but the rating of dominance on every surface is important too.

Regards,

c.
 

Wuornos

Professional
Hi Noel.

I certainly wouldn't abandon this work. Your ideas are sound and I think the outputs would have some validity.

If you're anything like me, you'll then not be satisfied with the results and will go on to add other factors, like the relative value of tournaments won, and incorporate the rating of the players faced from the first iteration of the calculation to get an adjustment based on the ability of who else was active in the world at that time.

Without seeing the series this is a difficult question to answer but there are two methods that jump immediately to mind.

1. You could calculate the mean and standard deviation for both the Z and Y series. By then calculating the position of each rating within the distribution, e.g. Player A is 2.5 standard deviations above the mean of the population within group Z, you make each series comparable. You can then simply add the results from each series together to derive an appropriately weighted rating.

2. A more straightforward methodology would be to simply order each series such that you know the rank of each player within each distribution. E.g. Lendl might be 5th in the Z series and 14th in the Y series. This again has the effect of putting both series on the same scale, albeit crudely. Again you could just add them together, and players with lower totals would be considered to have better evidence of greatness under your methodology.

Personally I would prefer option 1.
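
Both options can be sketched in a few lines of Python. The four players and their Z and Y values are hypothetical, the helper names `standardise` and `ranks` are illustrative, and the population standard deviation (`statistics.pstdev`) is an assumption; the sample form (`statistics.stdev`) would work the same way.

```python
from statistics import mean, pstdev

# Hypothetical Z and Y values for four made-up players.
Z = {"A": 3000.0, "B": 3200.0, "C": 2500.0, "D": 1800.0}
Y = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 5.0}


def standardise(series: dict[str, float]) -> dict[str, float]:
    """Option 1: express each value in standard deviations from the mean."""
    values = list(series.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return {name: (value - mu) / sigma for name, value in series.items()}


def ranks(series: dict[str, float]) -> dict[str, int]:
    """Option 2: rank 1 = highest value (ties are not handled here)."""
    ordered = sorted(series, key=series.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


z_std, y_std = standardise(Z), standardise(Y)
combined_scores = {name: z_std[name] + y_std[name] for name in Z}

z_rank, y_rank = ranks(Z), ranks(Y)
rank_sums = {name: z_rank[name] + y_rank[name] for name in Z}

# Higher combined score is better; lower rank sum is better.
by_score = sorted(combined_scores, key=combined_scores.get, reverse=True)
by_rank_sum = sorted(rank_sums, key=rank_sums.get)
print(by_score, by_rank_sum)
```

On this toy data the two options happen to produce the same ordering. Option 1 keeps information about how far apart players are, while option 2 only keeps their order, which may be why it is described as the cruder of the two.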

Do not, however, be tempted to simply multiply the two values, as this will give a disproportionate weighting to the series with the higher proportional standard deviation, and I assume you would want to avoid this.
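
One way to see the concern, sketched with hypothetical numbers. "Proportional standard deviation" is read here as the coefficient of variation (standard deviation divided by mean); that reading is an interpretation, not something the post defines.

```python
from statistics import mean, pstdev

# Hypothetical Z and Y values for four made-up players.
Z = {"A": 3000.0, "B": 3200.0, "C": 2500.0, "D": 1800.0}
Y = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 5.0}


def coefficient_of_variation(series: dict[str, float]) -> float:
    """Standard deviation relative to the mean (population form)."""
    values = list(series.values())
    return pstdev(values) / mean(values)


cv_z = coefficient_of_variation(Z)
cv_y = coefficient_of_variation(Y)

# Naive combination: multiply the raw values.
product = {name: Z[name] * Y[name] for name in Z}
by_product = sorted(product, key=product.get, reverse=True)
by_y_alone = sorted(Y, key=Y.get, reverse=True)

print(round(cv_z, 3), round(cv_y, 3))
print(by_product, by_y_alone)
```

In this toy data Y has the larger relative spread, and the product ordering simply reproduces the Y ordering, even though player B leads on Z.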

You will probably find some unusual results, i.e. players being ranked higher than you would expect. This is normal, and you then need to decide whether these anomalies are possibly a true reflection of playing standard or, if not, what has distorted the output and how this might be corrected.

Let me know how you get on.

Regards

Tim

PS. I'm sure you have already thought of this, but the surface thing is likely to devalue earlier players, as the number of surfaces gradually reduces to a point where there is only grass! Inconsistent data, part of the pain of being a statistician!
 

urban

Legend
It's a good idea to somehow integrate versatility into such an analysis. It's of course one of the 3 ingredients that determine greatness, along with dominance and longevity. It's difficult for the pre-open pro era because, for a great part, clay wasn't a dominant surface on the pro tour, and it's difficult to identify clay tournaments at all. The French Pro was played at RG only in the 50s, early 60s and 68. There were clay pro events in Europe, at Kitzbühel, Scheveningen, Geneva or Barcelona, but not a real clay 'cracker' for most of the 60s. Raymond Lee in his analysis doesn't have a versatility criterion, which shows how difficult it is to put this criterion into numbers.
 

noeledmonds

Professional
I follow my fellow countryman Rino Tommasi in claiming that 'matches' are more important than 'tournaments' with the exception of Grand Slams (and maybe the Masters and the Davis Cup).

Interesting viewpoint. I do think that match wins are of course important. Do you not think that winning percentage covers this, though, as it weighs match wins against match losses? Incorporating Davis Cup matches is something I had not considered, though. This would be important, particularly in the earlier years of tennis, when the Davis Cup was much bigger than it is today.

It's a good idea to somehow integrate versatility into such an analysis. It's of course one of the 3 ingredients that determine greatness, along with dominance and longevity. It's difficult for the pre-open pro era because, for a great part, clay wasn't a dominant surface on the pro tour, and it's difficult to identify clay tournaments at all. The French Pro was played at RG only in the 50s, early 60s and 68. There were clay pro events in Europe, at Kitzbühel, Scheveningen, Geneva or Barcelona, but not a real clay 'cracker' for most of the 60s. Raymond Lee in his analysis doesn't have a versatility criterion, which shows how difficult it is to put this criterion into numbers.

In practice you are correct that versatility as a criterion may be difficult to implement. I thought that perhaps, for some pre-open era tennis, different types of grass could be classified as different surfaces. I have heard that different grasses played very differently, probably as differently as today's Wimbledon grass and some hard courts.

If you're anything like me, you'll then not be satisfied with the results and will go on to add other factors, like the relative value of tournaments won, and incorporate the rating of the players faced from the first iteration of the calculation to get an adjustment based on the ability of who else was active in the world at that time.

I agree, and I am already considering how to incorporate more variables.

1. You could calculate the mean and standard deviation for both the Z and Y series. By then calculating the position of each rating within the distribution, e.g. Player A is 2.5 standard deviations above the mean of the population within group Z, you make each series comparable. You can then simply add the results from each series together to derive an appropriately weighted rating.

2. A more straightforward methodology would be to simply order each series such that you know the rank of each player within each distribution. E.g. Lendl might be 5th in the Z series and 14th in the Y series. This again has the effect of putting both series on the same scale, albeit crudely. Again you could just add them together, and players with lower totals would be considered to have better evidence of greatness under your methodology.

Personally I would prefer option 1.

Do not, however, be tempted to simply multiply the two values, as this will give a disproportionate weighting to the series with the higher proportional standard deviation, and I assume you would want to avoid this.

My first inclination was to simply multiply the values, but as you say this would give a very disproportionately weighted result. Calculating the S.D. is a good and statistically valid idea. Without a database of information, though, it would be tedious for me to implement across all the players.

Versatility is important, but dominance on different surfaces is important too.
Agassi is a great player on 4 surfaces, Sampras on three.

Agreed, and that is why I am trying to construct a ranking system that includes both. I obviously rate Sampras above Agassi because of his dominance, but you have to give a player like Agassi credit in a ranking system for his versatility, which most ranking systems do not. This ranking system does not yet either, as Agassi only won one grass court tournament, so he would perform worse than someone like Lendl, who won several grass tournaments but failed to win Wimbledon.
 
I don't agree with your last considerations.

When we debate GOAT questions, our standard has to change and rise.

In a given year, Sampras is not a great clay-court player, because he's maybe the n.4-5 of the year, and the n.15 of the decade. Agassi is a great clay-court player because he is one of the top 10 of the decade.

We can say, for example, that in the '90s the top 8 are:

on hard courts: Sampras, Agassi, Edberg, Courier, Rafter, Becker, Chang, Lendl
on carpet: Sampras, Becker, Ivanisevic, Agassi, Stich, Edberg, and whoever you like
on clay: Kuerten, Courier, Bruguera, Agassi, Moya, Gomez, Kafelnikov, Muster
on grass: Sampras, Ivanisevic, Edberg, Becker, Agassi, Stich, Krajicek

So we can say that for 'decade versatility' Agassi is the best, because he's in the top 8 on every surface.
But I think it makes no sense to say that, for example, Rosset is more versatile than Muster. Rosset is maybe the n.65 on hard courts, 70 on grass, 50 on carpet and 68 on clay, and Muster the 3rd on clay, 24 on hard courts, 25 on carpet and 180 on grass. But I don't think that Rosset should gain points because of his versatility. Versatility makes sense if it allows a player to win everywhere, not to reach the 4th round everywhere.

So, for a decade, Agassi is the most versatile, while Rosset's versatility is not important.

But what if we speak about 1920-2007?

We could have, for example:

on hard courts: Sampras and Agassi in the top 10
on carpet: Sampras in the top 10, Agassi n.20
on clay: Sampras n.80, Agassi n.25
on grass: Sampras in the top 10, Agassi n.30

Now, is Agassi's versatility important from this point of view? No. Because Agassi cannot win an all-time tournament on grass, clay or carpet. Sampras can win on three different surfaces.

So, from a decade point of view, Agassi is more versatile. From an all-time point of view, Sampras is.

The same thing applies to Lendl and Agassi on grass: for a decade, it's even. Agassi won Wimbledon, but Lendl has a lot of good results at Wimbledon and won Queen's. But from an all-time point of view, Agassi is clearly the better of the two.

Regards,

c.
 

vive le beau jeu !

Talk Tennis Guru
((1 + number of tournaments won on grass) x (1 + number of tournaments won on clay) x (1 + number of tournaments won on hard courts and carpet))/(number of tournaments won) = Y

This second independent step attempts to account for versatility across the surfaces. A player who wins predominantly on one surface will be penalised. The value is one plus the number of tournaments won to prevent a player from receiving zero for winning no tournaments on a surface. The number of tournaments won is not relevant as it is cancelled out by dividing by the number of tournaments won at the end. I have called this value Y.

For Muster Y = 9.318 (to 4 significant figures)
For Edberg Y = 20.00 (to 4 significant figures)

Here Muster’s Y value is considerably lower than Edberg’s. This reflects Muster’s lack of versatility across the surfaces.
I like your approach: I have infinite versatility, that's cool! :D
 