Carl Bialik: Nadal Is 80 Percent Of What He Used To Be

bjsnider

Bialik uses match stats over the years plugged into his advanced stat: Dominance Ratio. Here's his definition: "DR is the ratio of a player’s winning percentage on points when he’s returning serve to the opponent’s return-point winning percentage."
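For concreteness, here is a minimal sketch of how DR falls out of a match's return stats. The point counts below are invented for illustration, not taken from any real match.

```python
# A minimal sketch of Bialik's Dominance Ratio (DR) from one match's
# return stats. The point counts below are invented for illustration.

def dominance_ratio(return_pts_won, return_pts_played,
                    opp_return_pts_won, opp_return_pts_played):
    """DR = own return-point win % / opponent's return-point win %."""
    return (return_pts_won / return_pts_played) / \
           (opp_return_pts_won / opp_return_pts_played)

# Hypothetical match: win 40 of 90 return points while the opponent
# wins 25 of 85 of theirs.
dr = dominance_ratio(40, 90, 25, 85)
print(round(dr, 2))  # > 1.0 means you out-returned the opponent
```

So a DR of 1.25, like the one quoted below for Nadal against Djokovic in 2012, means Nadal won return points at 1.25 times the rate Djokovic did.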

Then, Bialik adjusted for strength of opponent (or tried to). "I ran a regression on his results, controlling for whom he played in each match, to see how his level varied by year. What it showed is that he was at his best in 2010, 2012 and 2013, with a steep dropoff last year and a continued decline this year. He is 20 percent worse now than he was at his best, by this measure."

Here's the data graphed:

[Chart: bialik-datalab-rafa.png]


Bialik's conclusion: "...based on his winning percentage and DR this year, Djokovic is playing at a career-high level on clay. Nadal’s decline since 2012 suggests that a match with the 2012 Djokovic would be a tossup, despite Nadal’s DR of 1.25 against him in 2012. The much-improved Djokovic of 2015 who Nadal might have to face this year would be the big favorite."

I guess that means 90 minutes & straights if that QF match happens?

Full article.
 
Pretty faulty. For example, the adjusted D/R assumes players are at the same level 10 years apart? Not to mention that guys Nadal might have played multiple times in his early years, but hasn't played since his supposed decline, won't be included.
 
I wonder what his stats will be like after he adds Nadal's 2015 Roland Garros performance.....
Because Roland Garros tends to be where Nadal finds his best form - last year he lost to Ferrer in straight sets at Monte Carlo (and to Almagro at Barcelona, and got hammered by Nishikori at Madrid) and then beat him in 4 sets (including 6-0, 6-1) at Roland Garros.
So I think this man did his stats too early ;)
 
Seems like a misuse of statistics by searching for a correlation which lacks causality...


Pretty faulty. For example, the adjusted D/R assumes players are at the same level 10 years apart? Not to mention that guys Nadal might have played multiple times in his early years, but hasn't played since his supposed decline, won't be included.

Why do you guys find it faulty?

Isn't dominance ratio simply a rather good measure of someone's performance relative to the field/some player at a given point in time? Or am I missing something about how this guy uses it?
 
Why do you guys find it faulty?

Isn't dominance ratio simply a rather good measure of someone's performance relative to the field/some player at a given point in time? Or am I missing something about how this guy uses it?

I don't find D/R faulty, in fact I'm using it in a thread I'm putting together. What I find faulty is the Opp adjusted D/R for the reasons I stated.

Controlling for quality of opponent isn’t simple. Tennis has nothing like Sports Reference’s Simple Rating System, a widely accepted measure in several team sports. Without a simple rating, I went for something more complex. Nadal has faced 66 different opponents at least twice on clay in tour-level matches for which we have his DR in the match. If we assume each opponent is playing at the same level each time he plays Nadal, then we can use Nadal’s differing results against the same guy in different years to estimate the peaks and valleys of his game.
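The regression Bialik describes (opponent dummies plus year dummies, with the year coefficients tracing the player's level) can be sketched roughly like this. The opponents, years, and DR values below are invented; this is just the shape of the model, not his actual data or code.

```python
# Rough sketch of the regression Bialik describes: match-level DR on
# opponent dummies plus year dummies, with the year coefficients tracing
# Nadal's level over time. All rows are made up; his real model uses
# 66 opponents across roughly 10 seasons.
import numpy as np

# (opponent, year, Nadal's DR) for a handful of invented clay matches
matches = [
    ("Ferrer",   2010, 1.60), ("Ferrer",   2014, 1.30),
    ("Djokovic", 2010, 1.20), ("Djokovic", 2014, 0.95),
    ("Almagro",  2010, 1.80), ("Almagro",  2014, 1.55),
]

opponents = sorted({m[0] for m in matches})
years = sorted({m[1] for m in matches})

# Design matrix: one dummy per opponent, one per year after the baseline
X = np.zeros((len(matches), len(opponents) + len(years) - 1))
y = np.array([m[2] for m in matches])
for i, (opp, yr, _) in enumerate(matches):
    X[i, opponents.index(opp)] = 1.0
    if yr != years[0]:
        X[i, len(opponents) + years.index(yr) - 1] = 1.0

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
year_effects = dict(zip(years[1:], beta[len(opponents):]))
print(year_effects)  # negative = decline relative to the baseline year
```

With this toy data the 2014 coefficient comes out negative, i.e. a lower level than the 2010 baseline against the same set of opponents, which is exactly the kind of year-over-year signal Bialik's chart graphs.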

Too big of an assumption, and like I said, he probably dominated a lot of players in his early years who might give him more trouble now, e.g. Coria, Gaudio, etc., but those guys are retired and he hasn't played them since his decline, so we can't get an accurate view.
 
I don't find D/R faulty, in fact I'm using it in a thread I'm putting together. What I find faulty is the Opp adjusted D/R for the reasons I stated.



Too big of an assumption, and like I said, he probably dominated a lot of players in his early years who might give him more trouble now, e.g. Coria, Gaudio, etc., but those guys are retired and he hasn't played them since his decline, so we can't get an accurate view.

That's why I wrote that he attempted to adjust for strength of opponent. I'd rather have the attempt than not, even if it is flawed, it's better than not bothering.
 
That's why I wrote that he attempted to adjust for strength of opponent. I'd rather have the attempt than not, even if it is flawed, it's better than not bothering.

Is wrong data better than no data?

Not sure I would agree. Junk in, junk out, as they say.
 
I don't find D/R faulty, in fact I'm using it in a thread I'm putting together. What I find faulty is the Opp adjusted D/R for the reasons I stated.



Too big of an assumption, and like I said, he probably dominated a lot of players in his early years who might give him more trouble now, e.g. Coria, Gaudio, etc., but those guys are retired and he hasn't played them since his decline, so we can't get an accurate view.

Oh, I see, I didn't catch that. I agree, that's an unwarranted assumption. Of course the others won't play at exactly the same level each time they meet Nadal, just as Nadal doesn't.

I think just looking at his average D/R over different clay-seasons (and maybe adjust for average ranking of opposition faced instead, or something) would be more than enough. After all, the interesting thing is just how he performs relative to his field now and before.
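That simpler approach, per-season average D/R alongside the average ranking of the opposition, could be sketched like this. All rows are invented placeholders, not real match data.

```python
# Sketch of the simpler comparison suggested above: average DR per clay
# season, shown next to the average ranking of the opposition faced.
# All rows below are invented placeholders.
from collections import defaultdict

# (season, DR, opponent's ranking) per hypothetical clay match
rows = [
    (2012, 1.45, 5), (2012, 1.70, 28), (2012, 1.30, 9),
    (2015, 1.10, 6), (2015, 1.35, 31), (2015, 0.95, 4),
]

by_season = defaultdict(list)
for season, dr, rank in rows:
    by_season[season].append((dr, rank))

season_summary = {}
for season in sorted(by_season):
    drs = [d for d, _ in by_season[season]]
    ranks = [r for _, r in by_season[season]]
    season_summary[season] = (round(sum(drs) / len(drs), 2),
                              round(sum(ranks) / len(ranks), 1))
    print(season, season_summary[season])  # (mean DR, mean opponent rank)
```

If the mean opponent ranking is similar across seasons, a drop in mean DR is a fairly clean signal without the heavy regression machinery.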
 
Is wrong data better than no data?

Not sure I would agree. Junk in, junk out, as they say.

Well, it's all in the degree of the wrongness. Is it more or less wrong to assume that Djokovic is playing at the same level as Foggy or Federer or Stan or whomever, than it is to assume that each of those players is playing at the same level each time they play Nadal? I think it's more wrong to assume Djokovic's level is the same as Foggy's. It's still wrong, because Djokovic's level was much higher in 2011 than 2012, as his results show. But he was still a lot better than Foggy, or even Ferrer. And those guys are all top quality clay opponents. What about the sub-top 25 players Nadal has faced more than once on clay -- guys who have no hope at all no matter Nadal's level? So, I think the strength of opponent adjustment is terribly wrong, but slightly less wrong than assuming Nadal is just playing against the exact same level every match.
 
Oh, I see, I didn't catch that. I agree, that's an unwarranted assumption. Of course the others won't play at exactly the same level each time they meet Nadal, just as Nadal doesn't.

I think just looking at his average D/R over different clay-seasons (and maybe adjust for average ranking of opposition faced instead, or something) would be more than enough. After all, the interesting thing is just how he performs relative to his field now and before.

Well, it's all in the degree of the wrongness. Is it more or less wrong to assume that Djokovic is playing at the same level as Foggy or Federer or Stan or whomever, than it is to assume that each of those players is playing at the same level each time they play Nadal? I think it's more wrong to assume Djokovic's level is the same as Foggy's. It's still wrong, because Djokovic's level was much higher in 2011 than 2012, as his results show. But he was still a lot better than Foggy, or even Ferrer. And those guys are all top quality clay opponents. What about the sub-top 25 players Nadal has faced more than once on clay -- guys who have no hope at all no matter Nadal's level? So, I think the strength of opponent adjustment is terribly wrong, but slightly less wrong than assuming Nadal is just playing against the exact same level every match.

I think if you're going to do it, go all out. Look at his main opponents each year e.g. those he had multiple meetings with and then look at their D/R on clay for that year. Adjust that way perhaps.

Or look at his D/R versus top 10 ranked opponents or even top 5 of top 20.
 
I think if you're going to do it, go all out. Look at his main opponents each year e.g. those he had multiple meetings with and then look at their D/R on clay for that year. Adjust that way perhaps.

Or look at his D/R versus top 10 ranked opponents or even top 5 of top 20.

Bialik wrote "Djokovic doesn’t have enough clay matches to do the same regression analysis we did for Nadal." This may be a problem for some of the other guys too.
 
Quite a lot of what I see posted on FiveThirtyEight as it relates to tennis is meant to be less a rigorous statistical exercise and more a primer for talk about quantitative results. True to form, very few of the articles on the site are about "statistics" in the sense of having something to say about a functional form of randomness; most have more to do with BI: playing around with numbers until they represent a human-interpretable story, even if it is not statistically sound.

Methodologically speaking, the opponent-adjusted Dominance Ratio is very dubious. Per his footnotes, far too many degrees of freedom are sacrificed by using 66 opponent covariates along with 10 time-dummy covariates. This is an approach I often see from undergraduates who attempt a "catch-all" model and end up with inconsistent, over-fitted bloat. There is also the issue that, in doing this very analysis, the author treats Nadal's level of play (or a metric proxy for it) as inconstant over time, yet does not afford that depth to the rest of the field. If one were truly to represent the state of affairs where Nadal's level of play changes over time, against competition whose level of play ALSO changes over time, there would be no degrees of freedom left and the robustness of the model would go out the window, as the assumption would require you to overfit.
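The degrees-of-freedom worry is easy to put in back-of-envelope terms. The match count here is an assumption (the article doesn't state Nadal's exact clay sample size), so treat this as illustrative arithmetic only.

```python
# Back-of-envelope degrees-of-freedom check for the regression described
# above. The match count is an assumption for illustration; the article
# doesn't state Nadal's exact clay-match sample size.
n_matches = 350        # assumed rough size of the clay-match sample
n_params = 66 + 10     # one dummy per opponent + one per year
residual_df = n_matches - n_params
print(residual_df)
# And since opponents only need to appear twice to be included, some of
# those 66 dummies are each estimated from just two observations.
```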
 
Bialik wrote "Djokovic doesn’t have enough clay matches to do the same regression analysis we did for Nadal." This may be a problem for some of the other guys too.

Why not look at average D/R for the opponents?

For example Federer in 2006 was at 1.29 and Djokovic in 2012 was at 1.27.

Djokovic this year is at an immense 1.44 etc...

I don't know, I just don't think his method really tells us much.
 
Seems like a misuse of statistics by searching for a correlation which lacks causality...

I do not understand what this means. There is no such thing as causality as you would recognize it in statistics. Moreover, a regression is not required to be causally linked to have accurate correlations, or else no models could ever be validated.
 
Seems like a misuse of statistics by searching for a correlation which lacks causality...
That's exactly what I think. For one thing, you can't compare two very different years, where a player is up against very different competition. Even when the competition stays the same, the level of those competitors does not stay the same.

I think this guy drew some very incorrect conclusions.
 
I do not understand what this means. There is no such thing as causality as you would recognize it in statistics. Moreover, a regression is not required to be causally linked to have accurate correlations, or else no models could ever be validated.

Why is it that statistics and scientific statements concerning them are so often misused (it shouldn't be a surprise that they are)? The most common culprit is a complete lack of causality.

I have seen the most ridiculous statements based on statistics comparing two completely unrelated variables, showing some kind of significant relation when in fact there is none. That is because numbers are shown without context or consideration for what they represent. A good example would be a statistic relating skin colour and crime rate in the U.S. Statistics would likely show that black people have a considerably higher crime rate than white people, for instance. Does that mean that black people are more likely to become criminals by nature? No. Heavens no. The problem here is that the statistic shows bare numbers without context. Any scientist worth their salt will tell you that the conclusion based on this statistic is a misrepresentation of reality, since it implies a form of causality where there is none at all (between skin colour and crime rate). Upon further analysis one might see that there is simply a higher percentage of black people in lower social strata (regrettably, I must add), where crimes are more frequent (for obvious reasons).

What the guy quoted in the OP is doing is just that. He is using some arbitrary numbers and assuming them to be an accurate representation of Nadal's dominance, which is simply false (as some posters have already pointed out). He then goes on to make a very precise conclusion (that Nadal is 80% of what he used to be), which is where the problem really begins. You see, a statistic in and of itself is just numbers and cannot be wrong as long as there is no error in the method or calculation (or the like). The moment you decide to draw a conclusion based on it, though, you are implying that there is a causality between the two variables. That is not the case here (and, sadly, in most statistics nowadays), which is why the conclusion (which is what this whole thread is about, from what I can see) is a fallacy for the reason I stated.
 
Why is it that statistics and scientific statements concerning them are so often misused (it shouldn't be a surprise that they are)? The most common culprit is a complete lack of causality.

I have seen the most ridiculous statements based on statistics comparing two completely unrelated variables, showing some kind of significant relation when in fact there is none. That is because numbers are shown without context or consideration for what they represent. A good example would be a statistic relating skin colour and crime rate in the U.S. Statistics would likely show that black people have a considerably higher crime rate than white people, for instance. Does that mean that black people are more likely to become criminals by nature? No. Heavens no. The problem here is that the statistic shows bare numbers without context. Any scientist worth their salt will tell you that the conclusion based on this statistic is a misrepresentation of reality, since it implies a form of causality where there is none at all (between skin colour and crime rate). Upon further analysis one might see that there is simply a higher percentage of black people in lower social strata (regrettably, I must add), where crimes are more frequent (for obvious reasons).

What the guy quoted in the OP is doing is just that. He is using some arbitrary numbers and assuming them to be an accurate representation of Nadal's dominance, which is simply false (as some posters have already pointed out). He then goes on to make a very precise conclusion (that Nadal is 80% of what he used to be), which is where the problem really begins. You see, a statistic in and of itself is just numbers and cannot be wrong as long as there is no error in the method or calculation (or the like). The moment you decide to draw a conclusion based on it, though, you are implying that there is a causality between the two variables. That is not the case here (and, sadly, in most statistics nowadays), which is why the conclusion (which is what this whole thread is about, from what I can see) is a fallacy for the reason I stated.

That's not true at all. Anyone who knows statistical modelling to a moderate degree knows that it is an exercise in association - nothing more, nothing less. You can draw conclusive hypotheses from statistical correlations without encroaching on causal territory - in fact 99% of all models do this. The author has not insinuated that X causes Y, only that X is observed to be functionally associated with Y. There is nothing fundamentally broken about that statement. I don't understand your issue with it at all. There is no parallel between this situation and the racial breakdown of US crime rates, because you have framed the latter in a causal context, whereas here the author has done no such thing.
 
The problem is saying things like Nadal is 80% of ___.

That's just nonsense.

There is so much of that thrown around.

"Nadal's a step slower."

Maybe he is slower, but I'm not going to come to that conclusion based on what I see. I need measurements, and we simply don't have them.

We can tell if his serving is faster or slower, but not his foot speed. Yet you hear people talk about such things all the time as if they are facts.
 
The problem is saying things like Nadal is 80% of ___.

That's just nonsense.

There is so much of that thrown around.

"Nadal's a step slower."

Maybe he is slower, but I'm not going to come to that conclusion based on what I see. I need measurements, and we simply don't have them.

We can tell if his serving is faster or slower, but not his foot speed. Yet you hear people talk about such things all the time as if they are facts.

It is a fact that the greatest sprinters in history recorded faster times at ages younger than Nadal's almost-29. IIRC the cutoff age for fastest times was ~27. Most were younger than that. So, if Nadal is still as fast as he used to be, it says something about his lack of proper training when he was younger. If that's the case, this will be the first I've heard of it.
 
The problem is saying things like Nadal is 80% of ___.

That's just nonsense.

There is so much of that thrown around.

"Nadal's a step slower."

Maybe he is slower, but I'm not going to come to that conclusion based on what I see. I need measurements, and we simply don't have them.

We can tell if his serving is faster or slower, but not his foot speed. Yet you hear people talk about such things all the time as if they are facts.

That is not a causal statement, nor does it imply any sense of causality. You can disagree with his methodology (I certainly do), but he is not making causal ties anywhere in the article. I don't know where you are seeing causality; it looks like you simply object to the proxy he uses as a measurement of Nadal's intangible level of play.
 
80% is rather generous I'd say. Player like Berdych bageled him on slow hard court, inconsistent player like Fognini defeated him twice on clay, Murray thumped him dropping just five games on clay and several other losses.
 
That is not a causal statement, nor does it imply any sense of causality. You can disagree with his methodology (I certainly do), but he is not making causal ties anywhere in the article. I don't know where you are seeing causality; it looks like you simply object to the proxy he uses as a measurement of Nadal's intangible level of play.

It's clear they are simply referring to validity and mixing up the term with causality (which really isn't that weird, as in both instances it's often the case of people making claims that aren't warranted, just in slightly different ways).

It wasn't that hard to understand what they actually meant, at least once they elaborated, so no need to get so hung up on the terminology. Show some goodwill.
 
That's not true at all. Anyone who knows statistical modelling to a moderate degree knows that it is an exercise in association - nothing more, nothing less. You can draw conclusive hypotheses from statistical correlations without encroaching on causal territory - in fact 99% of all models do this. The author has not insinuated that X causes Y, only that X is observed to be functionally associated with Y. There is nothing fundamentally broken about that statement. I don't understand your issue with it at all. There is no parallel between this situation and the racial breakdown of US crime rates, because you have framed the latter in a causal context, whereas here the author has done no such thing.

There is a lot broken with that statement if you decide to see it for what it is.

What you call "functionally associated" is a farce, because it shows a trend with a (more or less) considerable amount of error. I can try and find "functional associations" for nearly anything and I will succeed quite a bit, but that doesn't make conclusions based on those functions, which at the very best can show trends, valid. Functional associations are used when you can see a semblance of a trend in certain variables. If we're speaking truthfully, one should at most be allowed to make vague generalisations when using such statistics. Such statistics are one of the main reasons why we have so much pseudoscience around. I see some general trend within a certain amount of error and make a conclusion based on it, surely I will reach the truth by doing so...

The author of this text makes a few mistakes:

1) his measure for what he calls dominance is laughable.
2) his conclusion is precise way beyond the scope of a mere trend, which means that he sees a strong relationship between these two variables (if he doesn't, then he is not allowed to make the conclusion he makes).

If he used his numbers to say that there is a downwards trend in his dominance, I wouldn't be bothered too much and just point out that his definition and measure of dominance is questionable to say the least.

You seem to have studied some form of statistics or you are involved in the field, which is great, but nowadays statistics are overused and misused, as I said. You speak as somebody who works with statistics, whereas I speak using what I learned through philosophical texts on scientific theory, which are much more focused on the truthfulness of statements and on which conclusions may be formed at all if their truth is our priority. Do we use statistics beyond the scope of what I have said here? Yes, definitely. Does that make statements/conclusions based on them any more truthful? Not really.
 
There is a lot broken with that statement if you decide to see it for what it is.

What you call "functionally associated" is a farce, because it shows a trend with a (more or less) considerable amount of error. I can try and find "functional associations" for nearly anything and I will succeed quite a bit, but that doesn't make conclusions based on those functions, which at the very best can show trends, valid. Functional associations are used when you can see a semblance of a trend in certain variables. If we're speaking truthfully, one should at most be allowed to make vague generalisations when using such statistics. Such statistics are one of the main reasons why we have so much pseudoscience around. I see some general trend within a certain amount of error and make a conclusion based on it, surely I will reach the truth by doing so...

The author of this text makes a few mistakes:

1) his measure for what he calls dominance is laughable.
2) his conclusion is precise way beyond the scope of a mere trend, which means that he sees a strong relationship between these two variables (if he doesn't, then he is not allowed to make the conclusion he makes).

If he used his numbers to say that there is a downwards trend in his dominance, I wouldn't be bothered too much and just point out that his definition and measure of dominance is questionable to say the least.

You seem to have studied some form of statistics or you are involved in the field, which is great, but nowadays statistics are overused and misused, as I said. You speak as somebody who works with statistics, whereas I speak using what I learned through philosophical texts on scientific theory, which are much more focused on the truthfulness of statements and on which conclusions may be formed at all if their truth is our priority. Do we use statistics beyond the scope of what I have said here? Yes, definitely. Does that make statements/conclusions based on them any more truthful? Not really.

I agree with most of what you have said, but this is beside the point. As you say, "a trend in certain variables" is the essence of what the author is trying to capture. No one can prove a causal relationship because, thus far, there is no statistical definition of philosophical causality (see Granger causality for a very, very specific instance). Those "trends" are not wrong simply because they speak to correlation and not to philosophical causation. I can make a model (and have) predicting the count of pneumonia-related deaths from the number of winter clothing sales in Scandinavia. It is nonsense, but it has predictive power, and that's the goal, not making some causal link saying winter clothing sales cause pneumonia. The goal isn't drawing a causal conclusion. Any reasonably trained statistician will tell you a model can be very successful while saying nothing about what caused what, and that definitive conclusions can still be drawn from it.

This is all to say, I see weakness in his methodology (as with many, the issue of compressing an idea of "level of play" into a scalar metric), but given the most rigorous methodology available, his conclusion wouldn't be unwarranted.
 
Hmmm, I don't think the statistics are at fault necessarily. Just some of the assumptions we need to make to accept them.

For example: "DR is the ratio of a player’s winning percentage on points when he’s returning serve to the opponent’s return-point winning percentage." This metric is interesting, but how accurate is it at showing margin of victory, since the percentage of points won doesn't matter so much as winning the critical points? One could hypothetically win only 4 return points in a set yet get a break. On the flip side, you could win nearly 50% of return points yet not break. I'd like to see a regression of DR vs. sets won. I'm sure there is a correlation, but it had better be extremely high to convince me this stat is as valuable as 'matches won/lost' or 'sets won/lost'. Actually, now that I think about it, a ratio of breaks to times broken, rather than percentages of points won, would be more telling.
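The check suggested here is easy to sketch: correlate DR with sets won over a sample of matches. The five "matches" below are invented numbers, just to show the shape of the test, not real data.

```python
# Sketch of the check suggested above: how well does DR track sets won?
# The five "matches" below are invented; a real test would use actual
# match stats over a large sample.
import numpy as np

dr_values = np.array([0.80, 0.95, 1.05, 1.25, 1.60])  # hypothetical DRs
sets_won  = np.array([0,    1,    2,    2,    2])     # sets won per match

r = np.corrcoef(dr_values, sets_won)[0, 1]
print(round(float(r), 2))  # high r would support DR as a margin proxy
```

If r came out only moderately high on real data, that would support the worry that DR misses the critical points (break points), which is exactly the case for a breaks-to-times-broken ratio instead.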

Another example is the way Bialik accounts for opponent strength: "If we assume each opponent is playing at the same level each time he plays Nadal, then we can use Nadal’s differing results against the same guy in different years to estimate the peaks and valleys of his game." That's a huge and inaccurate assumption, not taking into account the "peaks and valleys" of BOTH players. Even day to day, level of play can fluctuate wildly, let alone over the course of years. There are tons of factors involved, including motivation, fatigue, injuries, and just lack of practice/feel. The article mentions Murray. If you look at the two meetings before Madrid 2015: at Rome 2014, Nadal won 1-6, 6-3, 7-5; at RG 2014, Nadal won 6-3, 6-2, 6-1. Two extremely different matches just weeks apart. I'd argue Murray's level changed more than Nadal's, because he was on fire in Rome and almost beat a decent-playing Nadal, then at the FO he just rolled over without much of a fight.
 
The problem is saying things like Nadal is 80% of ___.

That's just nonsense.

There is so much of that thrown around.

"Nadal's a step slower."

Maybe he is slower, but I'm not going to come to that conclusion based on what I see. I need measurements, and we simply don't have them.

We can tell if his serving is faster or slower, but not his foot speed. Yet you hear people talk about such things all the time as if they are facts.

Actually the article itself never says anything ridiculous like that. Only the clickbait title!

Also I have scientifically calculated that Nadal is actually playing at 67.3% his best level.
*science based on okdude's subjective opinions*
 
Hmmm, I don't think the statistics are at fault necessarily. Just some of the assumptions we need to make to accept them.

For example: "DR is the ratio of a player’s winning percentage on points when he’s returning serve to the opponent’s return-point winning percentage." This metric is interesting but how accurate is it in showing margin of victory? Since the percentage of points won doesn't really matter so much as winning the critical points.
I would only worry about BPs faced, BPs saved and the percentage of both, then the ratio of opportunities.

Novak this year is 83/29 on BPs, meaning that he has only faced 29 on his serve and has had 83 opportunities. That means he had 2.86 chances to break for every time he had to save a BP. That's absurd. I've never seen that happen before. Never. That means that so far Novak has set the bar so high on clay that no one else has had a chance. His percentages of winning these points are down a bit: 69/43, meaning he's saved 69% and broken 43%, but with such a huge number of opportunities, he is killing everyone.

Nadal is at 192/82, or 2.34, which is as good as any year he has ever had. That's a whopping 192 chances to break, so what has been killing him is this:

57% of BPs saved, only 41% of BPs converted.

These two sets of stats may or may not be reflected in what other people are showing. But they clearly show he is doing great at getting opportunities to break. He just can't convert this year. So I don't think that is health, or speed, or technique, or anything else. It is lack of belief in his shots, and that's all he's missing. Basically he can't pull the trigger at the right time.
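A quick sanity check of the arithmetic in those break-point ratios (the raw counts are as quoted in the post, not independently verified):

```python
# Checking the break-point arithmetic quoted above (83/29 for Djokovic,
# 192/82 for Nadal; raw counts as given in the post, not independently
# verified).

def bp_opportunity_ratio(bp_chances_for, bp_faced_on_serve):
    """Break chances created per break point faced on own serve."""
    return bp_chances_for / bp_faced_on_serve

djokovic = bp_opportunity_ratio(83, 29)
nadal = bp_opportunity_ratio(192, 82)
print(round(djokovic, 2), round(nadal, 2))  # ~2.86 and ~2.34
```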
 
I would only worry about BPs faced, BPs saved and the percentage of both, then the ratio of opportunities.

Novak this year is 83/29 on BPs, meaning that he has only faced 29 on his serve and has had 83 opportunities. That means he had 2.86 chances to break for every time he had to save a BP. That's absurd. I've never seen that happen before. Never. That means that so far Novak has set the bar so high on clay that no one else has had a chance. His percentages of winning these points are down a bit: 69/43, meaning he's saved 69% and broken 43%, but with such a huge number of opportunities, he is killing everyone.

Nadal is at 192/82, or 2.34, which is as good as any year he has ever had. That's a whopping 192 chances to break, so what has been killing him is this:

57% of BPs saved, only 41% of BPs converted.

These two sets of stats may or may not be reflected in what other people are showing. But they clearly show he is doing great at getting opportunities to break. He just can't convert this year. So I don't think that is health, or speed, or technique, or anything else. It is lack of belief in his shots, and that's all he's missing. Basically he can't pull the trigger at the right time.

Good stuff Gary. I think that's pretty much what I notice when he plays also.
 
I would only worry about BPs faced, BPs saved and the percentage of both, then the ratio of opportunities.

Novak this year is 83/29 on BPs, meaning that he has only faced 29 on his serve and has had 83 opportunities. That means he had 2.86 chances to break for every time he had to save a BP. That's absurd. I've never seen that happen before. Never. That means that so far Novak has set the bar so high on clay that no one else has had a chance. His percentages of winning these points are down a bit: 69/43, meaning he's saved 69% and broken 43%, but with such a huge number of opportunities, he is killing everyone.

Nadal is at 192/82, or 2.34, which is as good as any year he has ever had. That's a whopping 192 chances to break, so what has been killing him is this:

57% of BPs saved, only 41% of BPs converted.

These two sets of stats may or may not be reflected in what other people are showing. But they clearly show he is doing great at getting opportunities to break. He just can't convert this year. So I don't think that is health, or speed, or technique, or anything else. It is lack of belief in his shots, and that's all he's missing. Basically he can't pull the trigger at the right time.

Do the sums again against top 10 players.
 
Good stuff Gary. I think that's pretty much what I notice when he plays also.
I just finished watching the 2006 match against Fed in the RG final. It was repeated on Tennis Channel.

These guys were not perfect gods who did nothing wrong. Nadal lost the first set 1/6 and looked pretty awful. Fed lost the next 1/6, and he looked awful.

Fed made more than 50 UEs. Nadal's net play looked nothing like it does today. The announcers were saying that Fed had a better chance at winning an FO than Nadal had in winning W. Fed was returning way back.

The one thing I saw that Nadal obviously did better back then was flattening out his ground strokes when he pulled the trigger. Those are the shots I have hardly seen at all this year.

But it's not like either of the guys was perfect "back then", and this was in the middle of Fed's peak.

People only remember the best shots and forget about all the rest.

Nadal may lose in the first round or two. Anything can happen. But until he does I don't see that his game is gone. Just his confidence. In a couple weeks we'll know.
 
Do what sums "again"?

Breakpoints/chances faced and percentage won against top 10 players, presumably.

Novak and Nadal have played different tournaments (Nadal has played smaller clay-tourneys, whereas Novak has only played masters), and so using their general BP-stats on clay this season isn't really representative.
 
Hmmm, I don't think the statistics are at fault necessarily. Just some of the assumptions we need to make to accept them.
Agreed. Folks often confuse statistics with a causality theory because everyone knows to say "correlation does not equal causality" when they don't like the correlation the data analysis reveals.

For example: "DR is the ratio of a player’s winning percentage on points when he’s returning serve to the opponent’s return-point winning percentage." This metric is interesting, but how accurately does it reflect margin of victory? The percentage of points won doesn't matter so much as winning the critical points. One could hypothetically win only 4 return points in a set yet get a break; on the flip side, you could win nearly 50% of return points yet never break. I'd like to see a regression of DR against sets won. I'm sure there is a correlation, but it had better be extremely high to convince me this stat is as valuable as matches won/lost or sets won/lost. Actually, now that I think about it, a ratio of breaks to times broken, rather than percentages of points won, would be more telling.
I think the DR is quite interesting indeed. The whole point of it, I guess, is that the amount of "dominance" is not fully reflected in sets won; i.e., a bagel set and a 7-6 tiebreak against the same opponent tell very different stories about the relative level of the two players. So I think the DR is an attempt at quantifying this difference.
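For concreteness, Bialik's DR definition boils down to a one-liner. This sketch (all numbers invented, just plausible) shows the bagel-vs-tiebreak point: both count as "one set won", but the DRs are wildly different:

```python
def dominance_ratio(ret_won, ret_played, opp_ret_won, opp_ret_played):
    """Your return-point win % divided by the opponent's return-point win %."""
    return (ret_won / ret_played) / (opp_ret_won / opp_ret_played)

# Toy point counts: a 6-0 bagel set vs. a tight 7-6 tiebreak set.
bagel = dominance_ratio(12, 18, 2, 16)      # dominant: DR ~5.33
tiebreak = dominance_ratio(14, 40, 13, 42)  # razor thin: DR ~1.13
```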

Another example is the way Bialik accounts for opponent strength: "If we assume each opponent is playing at the same level each time he plays Nadal, then we can use Nadal’s differing results against the same guy in different years to estimate the peaks and valleys of his game." Huge and inaccurate assumption, as it doesn't take into account the "peaks and valleys" of BOTH players. Even day to day, a player's level can fluctuate wildly, let alone over the course of years!! There are tons of factors involved, including motivation, fatigue, injuries, and simple lack of practice/feel. The article mentions Murray... Look at the two meetings before Madrid 2015: Rome 2014, Nadal won 1-6, 6-3, 7-5; RG 2014, Nadal won 6-3, 6-2, 6-1. Two extremely different matches just weeks apart. I'd argue Murray's level changed more than Nadal's, because he was on fire in Rome and almost beat a decent-playing Nadal. Then at the FO he just rolled over without much of a fight.
I don't think the same-opponent tracking method is "hugely inaccurate". You are right that a given player's level fluctuates a lot, but a large group of players (66 is two-thirds of the top 100, which is basically a representative slice of the ATP) is a very stable gauge of relative levels.

For reference, the use of "repeat sales" is the fundamental building block of the most trusted housing price index (http://en.wikipedia.org/wiki/Case–Shiller_index).

One nitpick I have is that the DR should adjust for playing surface, as Nadal's health variations often produce large swings in the share of clay matches among his total matches played. But this omission probably understates his decline, because in bad years he played more exclusively on clay than in good years.

I'd be interested in seeing the updated stats after RG is over!
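For the curious, the "same guy in different years" adjustment Bialik describes can be sketched as a fixed-effects regression: dummy variables for each year and each opponent, solved by least squares. Everything below is invented purely to show the mechanics (toy DR values, a handful of matches), not his actual data:

```python
import numpy as np

# Toy data: (year, opponent, DR in that match). Values are made up.
matches = [
    (2012, "Djokovic", 1.25), (2012, "Ferrer", 1.60),
    (2014, "Djokovic", 1.02), (2014, "Ferrer", 1.41),
    (2015, "Ferrer", 1.20),
]
years = sorted({m[0] for m in matches})
opps = sorted({m[1] for m in matches})

# Design matrix: one dummy column per year, one per opponent.
X = np.zeros((len(matches), len(years) + len(opps)))
y = np.array([m[2] for m in matches])
for i, (yr, opp, _) in enumerate(matches):
    X[i, years.index(yr)] = 1.0
    X[i, len(years) + opps.index(opp)] = 1.0

# Minimum-norm least-squares fit; the year coefficients track the player's
# level over time, holding opponent identity fixed (up to a baseline shift,
# so only the *differences* between years are meaningful).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
level = dict(zip(years, coef[: len(years)]))
print({yr: round(v, 2) for yr, v in level.items()})
```

The absolute year coefficients are arbitrary (the dummy-variable trap makes the fit rank-deficient), but year-to-year differences like `level[2012] - level[2014]` are identified, and that difference is what the "20 percent worse" claim rests on.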
 
That's not true at all. Anyone who knows statistical modelling to a moderate degree knows that it is an exercise in association, nothing more, nothing less. You can draw useful hypotheses from statistical correlations without straying into causal territory; in fact 99% of all models do exactly this. The author has not insinuated that X causes Y, only that X is observed to be associated with Y. There is nothing fundamentally broken about that statement, and I don't understand your issue with it at all. There is no parallel between this situation and racial backgrounds vs. US crime rates, because you framed the latter in a causal context, whereas in this situation the author has done no such thing.

Yes this is correct. However, although "correlation does not equal causality", strong correlation often strongly suggests causality.
 
Agreed. Folks often confuse statistics with a causality theory because everyone knows to say "correlation does not equal causality" when they don't like the correlation the data analysis reveals.

I think the DR is quite interesting indeed. The whole point of it, I guess, is that the amount of "dominance" is not fully reflected in sets won; i.e., a bagel set and a 7-6 tiebreak against the same opponent tell very different stories about the relative level of the two players. So I think the DR is an attempt at quantifying this difference.


I don't think the same-opponent tracking method is "hugely inaccurate". You are right that a given player's level fluctuates a lot, but a large group of players (66 is two-thirds of the top 100, which is basically a representative slice of the ATP) is a very stable gauge of relative levels.

For reference, the use of "repeat sales" is the fundamental building block of the most trusted housing price index (http://en.wikipedia.org/wiki/Case–Shiller_index).

One nitpick I have is that the DR should adjust for playing surface, as Nadal's health variations often produce large swings in the share of clay matches among his total matches played. But this omission probably understates his decline, because in bad years he played more exclusively on clay than in good years.

I'd be interested in seeing the updated stats after RG is over!
No, I realize that that is more than an ample sample size. I was purely reacting to Bialik's premise that an opponent will play the same or a similar level on any given day. For example, Nadal's DR might be particularly high in a given match because his opponent is injured (just an example). So we could observe that Nadal's DR is higher, but it would be faulty to attribute this to his level of play... do you see what I mean? Likewise, if said opponent has a fantastic day, Nadal might not be as dominant, but that doesn't mean he necessarily played worse. Again, it is not an issue with the statistics, just with the conclusions we draw from them. And my opinion is that it is a flaw in the model to try to "adjust" anything, because too many factors vary match to match. That being said, simply categorizing the numbers by surface, opponent's ranking, month, etc. is fine; break down the numbers however you want.

Going back to the theory of DR, I understand the idea just fine. But answer me this: which is more dominant, a match you win 6-1, 6-7, 6-2, or a match you win 6-4, 7-6? Again, I just wonder how strong the correlation between DR and sets/matches won is. I think a much stronger indicator would be breaks over times broken, rather than a ratio of percentages of points won. It happens fairly frequently that a player wins more total points but still loses the match! You could have a ton of break chances, but not break.
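A quick toy example of that "win more points, lose the match" case (all point counts invented, just plausible for the scorelines shown):

```python
# Hypothetical best-of-three: the match winner drops one set 0-6 badly,
# then takes two tight 7-6 tiebreak sets.
# Points won per set, (match_winner, match_loser):
sets = [(6, 24), (44, 40), (44, 40)]

winner_total = sum(w for w, _ in sets)  # 94
loser_total = sum(l for _, l in sets)   # 104

# The match winner took fewer total points. A points-percentage metric
# like DR would favor the loser here; breaks and sets tell the real story.
print(winner_total, loser_total)  # 94 104
```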

Yes this is correct. However, although "correlation does not equal causality", strong correlation often strongly suggests causality.
What? No, a strong correlation could indicate possible causality, but there are plenty of other potential explanations for it. For example, another factor (or multiple factors) could underlie both variables. In some cases, causality may not exist at all.
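To illustrate the confounder point, a small simulation (pure standard library, everything made up): X and Y never influence each other, but a shared hidden factor Z makes them strongly correlated anyway:

```python
import random

random.seed(42)

# Z is the hidden common cause; X and Y are just noisy readings of Z.
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 0.3) for zi in z]
y = [zi + random.gauss(0, 0.3) for zi in z]

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

# Strong correlation (theoretically ~0.92), yet neither X nor Y causes the other.
print(round(pearson(x, y), 2))
```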
 
The one thing I saw that Nadal obviously did better back then was flattening out his ground strokes when he pulled the trigger. Those are the shots I have hardly seen at all this year.

But it's not like either of the guys was perfect "back then", and this was in the middle of Fed's peak.

People only remember the best shots and forget about all the rest.

Nadal may lose in the first round or two. Anything can happen. But until he does I don't see that his game is gone. Just his confidence. In a couple weeks we'll know.
Totally agree, Gary. He has slowed down for sure, but that's not what's causing him to miss routine balls, hit short, double fault, or squander countless BP chances. Tennis is so mental, and Nadal is the definition of a "confidence player".
 
Very interesting, but, like others, I feel that Bialik's making some pretty big leaps in his assumptions.

More than anything, though, I think it reflects the real tragedy of how limited the scope of recorded statistics and data is in tennis. The FiveThirtyEight guys, who are generally great at what they do, have to make such leaps because there's little data available to point them in other directions.
 