Our Man Fellipe again! said:
There is no glory in it at all, nor is it deep science.
It is just an application of a well-known algorithm for ranking nodes in a network.
The interpretation of the algorithm is fundamentally different from what
was reported by reporters. They oversimplified the technique, and this unfortunately
has caused many misunderstandings.
The approach is very simple. Each tennis player initially owns the same amount
of credit, or prestige; let's say one unit. This assumption is quite reasonable because,
in the absence of information (who won against whom, etc.), there is no way
to say who is better.
Now, if player A loses against player B, A gives his total credit to B.
When you add more information (many matches), the credit starts to flow
among players. This diffusive process of credit always reaches a stationary state, which
means that the credit flowing in and flowing out of each player is the same.
PageRank just measures the amount of credit that you observe in the
stationary state. Notice that a player's score is not only related to his personal performance, but
also to that of his opponents, that of the opponents of his opponents, and so on...
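To make the diffusion concrete, here is a minimal sketch of that credit flow as I read the description above; this is my own toy code, not Radicchi's, and the match list and the 0.85 damping factor (the standard PageRank choice) are assumptions:

```python
# Loser -> winner credit diffusion, iterated to a stationary state.
# matches: (loser, winner) pairs, possibly repeated across tournaments.
matches = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B"), ("D", "C")]
players = sorted({p for m in matches for p in m})
n = len(players)

# out_weights[loser][winner] = number of losses of `loser` to `winner`
out_weights = {p: {} for p in players}
for loser, winner in matches:
    out_weights[loser][winner] = out_weights[loser].get(winner, 0) + 1

damping = 0.85                       # standard PageRank damping (assumption)
credit = {p: 1.0 / n for p in players}   # everyone starts with equal credit

for _ in range(200):                 # power iteration until (near) stationary
    new = {p: (1 - damping) / n for p in players}
    for loser, beaten_by in out_weights.items():
        total = sum(beaten_by.values())
        if total == 0:               # never lost: spread credit evenly
            for p in players:        # (standard dangling-node fix)
                new[p] += damping * credit[loser] / n
            continue
        for winner, w in beaten_by.items():
            # loser passes credit to those who beat him, proportionally
            new[winner] += damping * credit[loser] * w / total
    credit = new
```

In the stationary state, player C (who beat A, B, and D) ends up with more credit than B, while A and D, with no wins, keep only the baseline teleport share.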
When tournaments are considered as independent events, the PageRank equations reduce
to a very simple assignment of scores (similar to the one used by the ATP, if I am not wrong).
Basically, players losing in the first round receive 1 point, players losing in the second round
receive 2 points, players losing in the third round receive 4 points, and so on: points are doubled at each round.
Grand Slam winners take 128 points, finalists 64. In tournaments with fewer rounds, like the ATP Masters Series,
the winner takes 64 points.
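That round-doubling scheme is just powers of two; a toy version (my paraphrase of the description above, not the paper's code):

```python
def points(rounds_in_event, round_lost=None):
    """Points for a knockout event with `rounds_in_event` rounds.
    `round_lost=None` means the player won the tournament."""
    if round_lost is None:            # champion: double the beaten finalist
        return 2 ** rounds_in_event
    return 2 ** (round_lost - 1)      # 1, 2, 4, ... doubled each round

# A 128-player Grand Slam has 7 rounds:
assert points(7, 1) == 1              # first-round loser
assert points(7, 7) == 64             # beaten finalist
assert points(7) == 128               # champion
# A 64-player event (6 rounds):
assert points(6) == 64                # champion
```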
In my algorithm, tournaments are not treated as independent; matches between the same opponents
are aggregated over many tournaments. This means that a first-round win over the eventual Wimbledon champion, even
in a minor tournament, counts for more than a win at the same round of the same tournament over
a player who lost in the first round of Wimbledon.
OK, I do not want to annoy you with further details.
I would just like to state that there is no personal input in the algorithm.
There is only the assumption that matches represent basic contacts between players.
Notice also that you can construct different networks by imposing filters:
for example, consider only tournaments played on grass, only tournaments played in the year 2000, only
Grand Slams, etc.
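The filtering idea amounts to restricting the match list before building the network at all; a sketch with made-up records (the field names and data are mine, not the paper's):

```python
# Each match record carries the context needed for filtering.
matches = [
    {"loser": "A", "winner": "B", "surface": "grass", "year": 2000, "event": "Wimbledon"},
    {"loser": "C", "winner": "A", "surface": "clay",  "year": 2000, "event": "French Open"},
    {"loser": "B", "winner": "C", "surface": "hard",  "year": 2001, "event": "Miami Masters"},
]

# Three different sub-networks from the same data:
grass_only = [m for m in matches if m["surface"] == "grass"]
year_2000  = [m for m in matches if m["year"] == 2000]
slams_only = [m for m in matches if m["event"] in
              {"Australian Open", "French Open", "Wimbledon", "US Open"}]
```

Running the same ranking on each filtered list would give surface-specific, year-specific, or Slam-only prestige scores.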
Oh thanks for bumping it. I loved those charts. So much fun. I am totally nostalgic of that time when *******s hadn't become pompous and overbearing yet and people could have some good innocent fun around here

We did a much better study some years ago. Here it is:
1. Safin
2. Nadal
3. Sampras
4. Kafelnikov
Sorry if it hasn't been updated. Djokovic should be in the top 10.
Oh thanks for bumping it. I loved those charts. So much fun. I am totally nostalgic of that time when *******s hadn't become pompous and overbearing yet and people could have some good innocent fun around here
erm, not exactly, our man Fellipe has returned with this (brace yourselves):
and he uses LOOSE instead of LOSE!!
ARRRGGHH!
I meant about this quote:
You actually believe those algorithms weren't based on personal opinion?
Personally, insults actually involving ADHD, Diabetes, Schizophrenia, cancer, mental retardation, etc. truly **** me off. Jokes I can deal with, but serious comments are infuriating. Shoot about bias all you want, I just want a para of rant on that subject
and I am certainly not emo
erm, that'll teach me, then..
lol, yes. he's sent me both those long-winded emails and still doesn't really understand where i'm coming from...
In general, players who are still active are penalized relative to those who have already ended their careers only because of incomplete information (i.e., they have not yet played all the matches of their careers), not because of an intrinsic bias in the system.
Players do not need to be classified, since everybody has the opportunity to participate in every tournament.
It's astounding that he actually published this in a scientific journal when the algorithm/model is so obviously flawed and cannot be applied to ranking great tennis players. He should have known from the results, after analyzing them with common sense, that something was way off.
It would be akin to using the Google algorithm/model (he references) to predict the weather.
My retort was not about how the algorithm was chosen. It was about the definition of "algorithm", quite obviously.
Secondly, do you UNDERSTAND the difference between ending a sentence with a period, as opposed to a question mark?
"can you not read, have a coding issue, or are you afflicted with ADD??"
Sorry Bud, I normally like your posts, but this shows me that you are not a scientist. You can't fiddle with your data to make it come out as you like. In this case, it is what it is based on the algorithm. What you can do is just interpret what you get and come up with conclusions as to why you think it comes out as such, which Radicchi does.
If anything, it reflects the current "weak era/strong era" argument being discussed. I think that's why Federer ranks so low on the all-time list. He was beating everyone (not to mention that a lot of top players retired or were injured between 2003 and 2006), which affected his own "prestige" score. He just didn't rack up wins against what would be perceived as "quality" players. Then, as a follow-up, it affects Nadal's ranking, because most of his top-10 winning percentage (which, as many people have pointed out, is higher than others') comes against Federer.
The column that sticks out to me is the Prestige column of Table 2. I'd agree that the players listed there reflect the performances of those years, broadly supporting, though not always agreeing with, the ATP/ITF rankings.
He seems aware of the limitations of his study and is just presenting it as a useful, yet innocuous finding. Does it really matter in the grand scheme of things who the top players are? It's not like he's trying to commit cyber-terrorism to hack government networks or sway public opinion on which stocks to buy to affect global markets.
Eddie Dibbs
Titles: 22
GS Titles: never made it beyond two SF appearances
Highest rank: #5
Versus top 10: 21/51 or 29% (very low)
Career winning %: 584/252 or 70%
Sure, he also deserves to be above Nadal on the list... lol!
Yeah, this points out the flaw.
Using an algorithm is a meaningless endeavor.
A model attempts to predict a known and objective dependent variable using known independent variables. Using past results to predict future results.
In this case, we are predicting a subjective dependent variable. That in and of itself makes the concept flawed.
Bud, I agree with you that Laver should have made the list based on the criteria you posted. You don't even have to compare him to Eddie Dibbs, Ken Rosewall is on the list.
There must have been a cut-off for pre-open era players who played in the open era. Or that Rosewall had more interaction (usable data) with the players who were prominent at the start of the open era. But your numbers do reflect the need for inclusion of Laver.
I think it would be hard to model pre-Open Era data. Not everything was well documented back then.
As for the guy doing this to "brag" that he's a published author, he seems to have credentials that far exceed this publication. I think this was a bit of fun for him and his results are rubbing the pundits the wrong way. If you check his website, the guy just likes to crunch numbers.....
Like I mentioned in my previous post, I think the take-home message of the paper shouldn't be the title of Table 1, "Top 30 players in the history of tennis." That's just bad form. It should be the Prestige ranking in Table 2, "Best players of the year." Guys who show up in that column 3 or more times are players most people would agree were good over an extended period of time.
Although, McEnroe only shows up once in that column.....and Boris Becker not at all.
Yeah, but the list also doesn't seem to account for the importance of tournaments, which is a big mistake. The slams are what counts; the rest is just foreplay.

Apparently you did not read this bit:
"Radicchi ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. He quantified the importance of players and ranked them by a “tennis prestige” score. This score is determined by a player’s competitiveness, the quality of his performance and number of victories."
So the list is based on scientific research, NOT ON PERSONAL OPINION!!!!!!!!!!!!!!
Now that is a good post. I agree completely here.

"Best player of all time" is a matter of opinion, as there is no established criterion to use as a measurement of how great a player is or whether he is the best. The only people for whom this algorithm holds merit are the ones who actually created it, as they designed it according to their own criteria for the best player of all time.
The bias towards total matches played should have been dealt with...

The original can be found at PLoS ONE (Public Library of Science).
This link explains their methodology in detail, not that it makes a whole lot of difference to the flaws already mentioned. Reading it shows there is a significant bias towards total matches played, not the quality of the average match. This means that highly dominant players like Federer, who achieved their slams in a shorter time than their peers (even at 14 slams he was years ahead of Sampras), are effectively penalised.
The research author's email is: f.radicchi@gmail.com
Filippo Radicchi - Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, USA.
Total number of matches played is your explanation. The percentages don't seem too important, just the absolute numbers. It's a big factor in the study, apparently. Big mistake that favored Jimbo.

Check out post #62. I transcribed their OE records. Laver matches or exceeds Dibbs in every single objective category concerning pertinent and measurable results. Yet Dibbs ends up #18 while Laver misses the list?! :lol:
Here it is again:
Eddie Dibbs: #18 on the list
Titles in OE: 22
GS Titles: never made it beyond two SF appearances
Highest Open Era ranking: #5
Versus top 10 in OE: 21/51 (29%)
Career winning % in OE: 584/252 (70%)
Rod Laver: didn't make the list
Titles in OE: 42
GS titles: 4 (won all majors in 1969)
Highest Open Era ranking: #3
Versus top 10 in OE: 8/20 (29%)
Career winning % in OE: 412/106 (80%)
Total number of matches played is your explanation. The percentages don't seem too important, just the absolute numbers. It's a big factor in the study, apparently. Big mistake that favored Jimbo.
Yeah, the slams don't seem to have more importance than the rest, which is a huge flaw.

The author completely discounted Laver's four Open Era GS titles in 1969! He didn't even make the freakin' list :lol:
How is that explained in light of his algorithm?
We identify Rod Laver as the best tennis player between 1968 and 1971, a period in which the ATP ranking was not yet established.
The results are way out in left field and have no bearing in reality when we compare things like total weeks at #1, GS titles, % of wins versus a player's top 10 peers, % of total wins to total losses, etc. So, how could these results be used to "corroborate the accuracy of other well established ranking techniques." (which the author already called inferior to his technique).
I like that he's trying to come up with a different means of analysis. His way takes out the weeks at #1 measure because that's based on tour points and not who you beat. For example, Del Potro shot up the rankings a couple of years back by playing and winning a string of 250/500 level events without meeting the top guys.
Although Rios getting to #1 without winning a major title is reflected in the rankings at the time and Radicchi's prestige score for 1998. Since Rios was beating quality players.
I guess taking out major titles as criteria is good because that minimizes the "cakewalk" draw argument.
And wins against top 10 peers again feeds into the weak/strong era argument because people bring up the complexion of the top 10 of 1992 and the top 10 of 2008 having fewer "accomplished" players in the latter.
Again with most science, it depends on how you look at the data and how you interpret the results.
You hit the nail on the head here.

The results are way out in left field and have no bearing in reality when we compare things like total weeks at #1, GS titles, % of wins versus a player's top 10 peers, % of total wins to total losses, etc. ...
However, I listed numerous criteria in my response and you chose to pick out just one (which also is probably the least significant). My argument was an aggregate of a number of results... not simply total weeks at #1.
What about the declarations he stated and I quoted in post #80? I'll repeat them below.
- - -
Quotes from the author:
"The results presented here indicate once more that ranking techniques based on networks outperform traditional methods."
"The prestige score is in fact more accurate and has higher predictive power than well established ranking schemes adopted in professional tennis."
"Prestige rank represents only a novel method with a different spirit and may be used to corroborate the accuracy of other well established ranking techniques."
^^ Where is the author's evidence to bolster these statements?
Yes, especially when Nadal is second in winning % overall in open era and has a winning record against all main rivals including 14 wins over Fed.
I understand the difference. That question YOU asked was highly insulting to me. ADHD/ADD have nothing to do with learning disabilities; to claim that I cannot read because I have ADHD is demeaning and prejudiced because you may know one or two cases. I understand the difference better than you do, don't try running around it.
I find this thread interesting.
I'm surprised the author of the paper actually responded to Timbo's hopeless slice. I'm curious to know your background.
It's a bit of a biased view to report Filippo Radicchi's response to your emails without sharing the content of what you sent.
Some background: Tennis Open Era began April 28th, 1968. First Open Era GS tournament was the 1968 French Open.
- -
Eddie Dibbs: #18 on the list
Titles in OE: 22
GS Titles: never made it beyond two SF appearances
Highest Open Era ranking: #5
Versus top 10 in OE: 21/51 (29%)
Career winning % in OE: 584/252 (70%)
Rod Laver: didn't make the list
Titles in OE: 42
GS titles: 4 (won all majors in 1969)
Highest Open Era ranking: #3
Versus top 10 in OE: 8/20 (29%)
Career winning % in OE: 412/106 (80%)
Laver's numbers are equal or greater in every single category. So why didn't he make the list?
This is the common sense part of the process the author of this study should have realized and then tweaked the model, accordingly.
The results on a per-surface basis really might add another wrinkle to the mix. On one hand, Sampras would be (and is) considered a great player due to his major title total; on the other hand, half of them came on grass, and all of them came on non-clay surfaces. Perhaps that's why he ends up ranking low: his "prestige" total decreases due to his results on clay against other historical players. Nadal also ranks low because of his dominance on clay, his losses to less accomplished players (as of the paper's writing) on non-clay surfaces, and the fact that his career is only at its midpoint.
Hi VGP
I wrote a fairly detailed initial email querying the validity of the algorithm with specific reference to cases where the data did not appear to support the conclusion. (Rod Laver's open era GS and the relative positions and performance of Tom Okker and Rafael Nadal, for example)
While Mr Radicchi has been unfailingly polite, he seems impervious to the admission of any alternative view.
I am a University Lecturer and have played tennis all my life.
(the 'loose'/'lose' thing is a reference to a different thread, essentially an in-joke, for which I apologize to those unfamiliar with the discussion in question)