OK, this is about the "draws against Djokovic" thread, and the data is taken from there too.
First off, I am not good at math or statistics, but even I cannot stand all the stupidity in that thread.
To start, some people countered that it should be looked at from another angle. I am not sure whether I agree, but I will accept this framing:
"the rankings are circumstantial, what matters is the fact that in all the draws, Federer and Djokovic had pretty much the same probability of being on the same side of the draw, or being on different sides, yet, almost every time, they are on the same half."
So the question is whether something that should have been 50/50 really was, and some people feel it was not.
The thread counts draws since August 2007. That is a problem in itself, because the start date was picked after looking at the data, which makes the result look more skewed than it already is. Otherwise, why not use all the data since he joined the tour and started playing Grand Slams, which is 2005?
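To show why picking the start date after the fact matters, here is a rough simulation sketch in Python. Everything in it is an assumption for illustration: the 24 total draws, the minimum window of 10 draws, the 1% threshold and the function names are all made up, and each draw is treated as an independent fair coin flip. It compares how often a fair sequence looks "suspicious" when you use all the draws versus when you are allowed to choose the most impressive starting point.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def tail(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def best_suffix_p(seq, min_len):
    """Smallest tail probability over every start point that leaves
    at least min_len draws (start 0, the full sequence, is included)."""
    best = 1.0
    for start in range(0, len(seq) - min_len + 1):
        window = seq[start:]
        best = min(best, tail(len(window), sum(window)))
    return best


random.seed(0)          # fixed seed so the run is repeatable
TOTAL_DRAWS = 24        # hypothetical number of draws, not from the thread
MIN_LEN = 10            # hypothetical shortest window someone would quote
TRIALS = 5000
THRESHOLD = 0.01

fixed_hits = 0          # looks suspicious using all draws
picked_hits = 0         # looks suspicious after choosing the best start
for _ in range(TRIALS):
    seq = [random.randint(0, 1) for _ in range(TOTAL_DRAWS)]  # 1 = same half
    if tail(TOTAL_DRAWS, sum(seq)) < THRESHOLD:
        fixed_hits += 1
    if best_suffix_p(seq, MIN_LEN) < THRESHOLD:
        picked_hits += 1

print(f"all draws:       {fixed_hits / TRIALS:.2%} of fair sequences below 1%")
print(f"hand-picked start: {picked_hits / TRIALS:.2%} of fair sequences below 1%")
```

Because the full sequence is one of the windows you may choose, the hand-picked count can never be lower than the all-draws count. How much higher it gets depends on the made-up numbers above.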
Second, and a much bigger problem:
there were 17 draws, and in 15 of the 17 Djokovic was on Federer's side.
What should be calculated is, assuming 50/50, the probability of this happening 15 times or more out of 17 draws. I forgot the closed-form solution and used a numerical method, and it comes out a little above 0.1%. That is still low, but it is nearly ten times the 0.5^13 figure from the thread (0.000122, or about 0.012%). And keep in mind this is still skewed by the hand-picked time period, chosen after looking at the data.
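For anyone who wants to check the numbers, here is a minimal sketch of the closed-form calculation in Python. It assumes each draw is an independent 50/50 coin flip, which is the framing accepted above. The function name binomial_tail is only for illustration.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


draws = 17       # total draws counted in the thread
same_side = 15   # draws with Djokovic in Federer's half

tail = binomial_tail(draws, same_side)   # 15, 16 or 17 out of 17
streak = 0.5 ** 13                       # the figure quoted in the thread

print(f"P(at least {same_side} of {draws}) = {tail:.6f} ({tail:.4%})")
print(f"0.5^13 = {streak:.6f} ({streak:.4%})")
print(f"ratio = {tail / streak:.3f}")
```

The tail is (136 + 17 + 1) / 2^17 = 154/131072, roughly 0.12%, while 0.5^13 is 1/8192, roughly 0.012%. So the honest number for this window is a bit under ten times larger than the one being shouted around.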
Instead, everyone screams 0.5^13 for hard court and grass. But you cannot do that. If you look at the data and then take out the part that doesn't suit your conclusion, your result becomes extremely skewed, and you can draw basically any conclusion from any data, whether or not the real data supports you.
BTW, there may be a more complex method, or an extra step, to control for the fact that both exceptions happened on clay, but simply taking out the data that doesn't suit your conclusion makes the result entirely invalid.