Forecasting Nadal's clay season

falstaff78

Hall of Fame
Hello all.

We are a good way into the European clay season. The top question on everyone's mind is which Nadal is going to show up at Madrid, Rome and, most importantly, Paris. This is a series of 4 posts that attempts to answer this question. To start with, please look at the table below, which summarizes Nadal's clay results from 2005 onwards.

rkbfpw.jpg


Despite the fact that Nadal has won a lot on clay in recent years, there is considerable variation in how well he does. In 2008, 2010 and 2012 he was an absolute murderer on clay. However, in 2006 he was vulnerable to Federer at Rome (where he saved match points) and at RG (where he ate a breadstick), in 2009 he lost RG outright, and in 2011 he was pushed to 5 sets by Isner at RG and played 3 close sets against Federer in the final.

The problem that confronts us is that Nadal's results in Barcelona and Monte Carlo alone are not enough to predict how he is going to do for the rest of the season, because he almost always wins both! Furthermore, he did not perform to his usual standard at one of these two events this year, so how can we quantify how much that should affect our appraisal of his chances going forward?

We therefore need some measure of performance beyond just the result, one that can help us predict how the remainder of his season is going to go. This is where the dominance ratio (D/R) comes in. D/R is defined as:

D/R = (% of points won on opponents' serve) / (% of points lost on own serve)

As I show in another thread, D/R is a very powerful summary of a player's performance. Despite the fact that Nadal generally wins at both MC and Barcelona, some years he does so with less dominance than others. These posts are an attempt to

(1) show that dominance ratio has predictive power, for Nadal's performance in particular and for all players in general; and
(2) see what the historical trend suggests for how well Nadal will do going forward, given his dominance ratio at Barcelona and Monte-Carlo this year.
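For anyone who wants to compute the D/R formula above from raw point counts, here is a minimal Python sketch. The function and argument names are my own (hypothetical), not anything from Tennis Abstract, and the match figures in the example are made up. A D/R above 1.0 simply means a player wins a larger share of return points than he loses on serve.

```python
def dominance_ratio(return_pts_won, return_pts_played,
                    serve_pts_lost, serve_pts_played):
    """D/R = (share of return points won) / (share of serve points lost)."""
    if return_pts_played <= 0 or serve_pts_played <= 0:
        raise ValueError("need at least one return point and one serve point")
    if serve_pts_lost == 0:
        # Never losing a point on serve makes the denominator zero.
        raise ValueError("D/R is undefined when no serve points were lost")
    return_win_share = return_pts_won / return_pts_played
    serve_loss_share = serve_pts_lost / serve_pts_played
    return return_win_share / serve_loss_share


# Made-up match: won 30 of 80 return points, lost 24 of 75 serve points.
print(round(dominance_ratio(30, 80, 24, 75), 2))  # 1.17
```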

So, please take a look at the chart below, which is an extended version of the one above.

maz1fk.jpg


The first column of data shows Nadal's dominance ratio (i.e. D/R) in each year at Monte-Carlo and Barcelona combined. The second column shows his dominance ratio after Barcelona - i.e. at Hamburg/Madrid, Rome and then Paris. The third column shows his total winning percentage at these three tournaments. The fourth shows his dominance ratio ONLY at Paris. Finally, there is a Wikipedia-style grid showing results for all three tournaments.

Two observations from this chart.

(1) At a glance, there is a strong correlation between D/R at MC and Barcelona on the one hand, and how well Nadal does for the rest of the clay season on the other. (I will demonstrate this statistically later in the thread.)

Taking my word for the first point for now, let's look at Nadal's clay season so far:

Golden swing: 12-1 record, 1.37 D/R, ave opp. rank: 81.1
MC and Barcelona: 9-1 record, 1.36 D/R, ave opp. rank: 29.9

So although the D/R is essentially unchanged (1.37 vs. 1.36), the much stronger field at MC and Barcelona implies that Nadal's game was probably improving.

The second observation is that his D/R of 1.36 at Monte Carlo and Barcelona is tantalizingly close to what has historically been the critical value for him to win RG. In 2006, with a D/R of 1.35 at MC & Barcelona, he won RG in a close 4-set final, in which he won a tie-break and suffered the ignominy of a breadstick. In 2009, with a D/R of 1.34 at MC & Barcelona, he lost RG. Whenever his D/R has been higher, he has won RG.
 

falstaff78

Hall of Fame
The next step is to try to extend this analysis beyond Nadal. So we take a look at how other players have historically done after posting a dominance ratio at Monte Carlo and Barcelona comparable to the one Nadal achieved this year - namely 1.36.

I assembled the following dataset. I took the 4 best clay players in the world today, plus all multiple RG finalists since 1991, when the data on www.tennisabstract.com begins. For each player I took data for any year where he played 5 or more matches at Barcelona and Monte-Carlo combined, AND 5 or more matches at Rome, Madrid/Hamburg and RG combined. There are a total of 42 data points - see the entire dataset a few posts below.

The chart below shows the same data as the Nadal chart above, but for the 6 player-years where the D/R at Monte Carlo and Barcelona was between 1.31 and 1.41 - i.e. within 0.05 of Nadal's 2013 D/R of 1.36.
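The selection step above (keep player-years whose MC+Barcelona D/R lies in 1.36 ± 0.05) can be sketched in a few lines of Python. The tuple layout and the records below are made-up placeholders for illustration, NOT the 42-point dataset from this thread.

```python
from statistics import mean

# Hypothetical (player, year, D/R at MC+Barcelona) records; illustrative only.
records = [
    ("Player A", 2001, 1.22),
    ("Player B", 2004, 1.33),
    ("Player C", 2007, 1.39),
    ("Player D", 2010, 1.58),
    ("Player E", 2012, 1.35),
]


def within_window(rows, target=1.36, half_width=0.05):
    """Keep player-years whose MC+Barcelona D/R lies in [target - w, target + w]."""
    lo, hi = target - half_width, target + half_width
    return [row for row in rows if lo <= row[2] <= hi]


comparables = within_window(records)
print([name for name, _, _ in comparables])  # ['Player B', 'Player C', 'Player E']
print(round(mean(dr for _, _, dr in comparables), 3))
```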

2eb7g51.jpg


By construction, these 6 players have an average D/R at MC and Barcelona of 1.35 - very similar to Nadal in 2013. After Barcelona they went on to achieve a winning percentage of 90% on clay - which translates to one loss roughly every 1.5 tournaments. In 6 RG appearances they garnered 3 titles, 1 final and 2 early exits. In other words, taking a broader sample of data merely confirms our hypothesis from the Nadal table:

A D/R of 1.36 at Barcelona and MC predicts that you will have a good shot at winning RG - but it is by no means a lock.
 

falstaff78

Hall of Fame
But should we even care about the chart in the second post?

The next table shows a summary of the entire dataset. It breaks the data into 5 bands, depending on the player's D/R at Barcelona and Monte Carlo. The point of this table is to convince you that as a general rule, dominance ratio at Monte Carlo and Barcelona has strong predictive power for the remainder of the season.

291mjwx.jpg


The first column shows how the bands are defined. The next few columns look at the average performance of players in each band later in the clay season. (You will recognize the orange row from the table I showed you in the post above.)

Specifically, we are looking at three metrics:
(1) the dominance ratio achieved for the remainder of the clay season;
(2) the win% achieved for the remainder of the clay season; and
(3) the dominance ratio achieved at the French Open.

For each one of these metrics, there is a smooth increase as we go down the bands. There are no exceptions to this rule! This suggests that D/R at Barcelona and MC is a strong predictor of clay performance in the same season.
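The banding check described above (group player-years by MC+Barcelona D/R, average an outcome within each band, then test whether the averages rise from band to band) might look like this in Python. The band edges and data are made up for illustration; the real band definitions are in the table image above.

```python
from bisect import bisect_right
from statistics import mean

# Made-up (MC+Barcelona D/R, rest-of-clay-season win %) pairs, for illustration only.
data = [(1.05, 0.60), (1.10, 0.65), (1.20, 0.70), (1.25, 0.78),
        (1.32, 0.85), (1.38, 0.88), (1.45, 0.92), (1.55, 0.95),
        (1.62, 0.96), (1.80, 1.00)]

# Hypothetical lower edges of 5 bands; a value equal to an edge goes in the higher band.
edges = [0.0, 1.15, 1.30, 1.45, 1.60]


def band_means(pairs, edges):
    """Average the outcome within each D/R band (D/R >= 0), lowest band first."""
    groups = [[] for _ in edges]
    for dr, outcome in pairs:
        groups[bisect_right(edges, dr) - 1].append(outcome)
    return [mean(g) for g in groups if g]  # empty bands are skipped


means = band_means(data, edges)
rises_every_band = all(a < b for a, b in zip(means, means[1:]))
print([round(m, 3) for m in means], rises_every_band)
```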
 

falstaff78

Hall of Fame
The table above gives us hope that a regression of each of these metrics on D/R at MC & Barcelona will have reasonable precision and predictive power. That is nothing more than a pompous way of saying we can meaningfully extrapolate a historical trend.

That is, we can answer the question: given the historical data, how do we expect someone with a D/R of 1.36 at MC and Barcelona to do for the rest of the clay season?

In the table below, I show the forecasts for the three metrics above using a variety of samples, for robustness. First, I do the extrapolation using data from the entire sample. Then I do the extrapolation using data from the last 5 years. The third row shows the extrapolation from players in the dataset whose D/R at Barcelona and MC was between 1.16 and 1.56 (i.e. within 0.20 of Nadal's 1.36). And finally, I look at Nadal's data only.
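The extrapolation described above amounts to fitting a straight line (ordinary least squares) of an outcome on MC+Barcelona D/R and reading it off at 1.36; the robustness rows just refit on a subsample. A minimal Python sketch, assuming a plain one-variable OLS (I have not seen the exact spreadsheet setup) and using made-up data points:

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def forecast(pairs, x_new=1.36):
    """Fit on (MC+Barcelona D/R, outcome) pairs, then read the line off at x_new."""
    xs, ys = zip(*pairs)
    a, b = ols_fit(xs, ys)
    return a + b * x_new


def near(pairs, center=1.36, half_width=0.20):
    """Robustness subsample: keep pairs whose D/R is within half_width of center."""
    return [(x, y) for x, y in pairs if abs(x - center) <= half_width]


# Made-up pairs lying exactly on y = x - 0.5, so every forecast at 1.36 is 0.86.
pairs = [(1.0, 0.5), (1.2, 0.7), (1.4, 0.9), (1.6, 1.1)]
print(round(forecast(pairs), 2))
print(round(forecast(near(pairs)), 2))
```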


2j2goxd.jpg


Observations from this table.

1) The explanatory variable we have chosen (i.e. D/R at MC and Barc) has a LOT of explanatory power for each of the 3 outcome variables. The R^2 for the full-sample regression is a very respectable 34-37% (which is a fancy way of saying that variation in the explanatory variable can explain about one third of the variation in the outcome variables). In the last 5 years this association has been even stronger.
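To make the R^2 point concrete: for a one-variable regression, R^2 = 1 - (residual sum of squares) / (total sum of squares), i.e. the share of the outcome's variation that the fitted line accounts for. A small Python sketch with toy numbers (not the thread's data):

```python
def r_squared(xs, ys):
    """Share of the variance in ys explained by a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


# Toy data: the fitted line explains 64% of the variation in ys.
print(round(r_squared([1, 2, 3, 4], [1, 3, 2, 4]), 2))  # 0.64
```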

2) Extrapolating only from the 8 data points from Rafa's career gives the most optimistic outlook. However, these estimates are questionable because of the small sample size.

3) The data suggest that for the rest of the clay season Rafa ought to have a winning percentage anywhere from the mid-eighties to the low nineties, which implies 1-2 titles in the next 3 tournaments. Furthermore, his dominance ratio at the French ought to be in the mid-to-high 1.30s, which has historically meant a high chance of reaching the final, and an even chance of winning it.
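One rough way to connect a win percentage to a title count (my own simplifying assumption, not the method behind the table) is to treat every match as independent and won with probability p. Assuming a top seed needs 5 wins at Madrid, 5 at Rome and 7 at Roland Garros, the expected number of titles is p^5 + p^5 + p^7, and since one loss ends a tournament, expected losses are 3 minus that:

```python
def expected_titles(p, matches_per_event=(5, 5, 7)):
    """Expected titles if every match is won independently with probability p."""
    return sum(p ** n for n in matches_per_event)


def expected_losses(p, matches_per_event=(5, 5, 7)):
    """At most one loss per event, so each event contributes 1 - P(title)."""
    return sum(1 - p ** n for n in matches_per_event)


# Win rates spanning "mid eighties to low nineties".
for p in (0.85, 0.90, 0.92):
    print(p, round(expected_titles(p), 2), round(expected_losses(p), 2))
```

With p between 0.85 and 0.92 this gives roughly 1.2 to 1.9 expected titles and 1.1 to 1.8 expected losses, in line with the 1-2 titles above. Real matches are not independent coin flips (opponents get tougher in later rounds), so treat this only as a sanity check.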

Disclaimer: Nadal COULD sweep all before him and win 3 tournies without dropping a set, or he COULD flame out in the semis of all 3 tournies. This is therefore a good time to highlight the difference between saying "I am personally predicting X" and saying "the historical pattern from the data suggests X". Clearly, I am doing the latter.


Anyway hope you found the data interesting. Please give comments, criticisms and suggestions for further investigation.

- F
 

falstaff78

Hall of Fame
Here is the dataset used for the above analysis. The source of this data is the excellent website www.tennisabstract.com. For example, take a look at this link, which shows that last year at Madrid, Rome and Roland Garros, Nadal's win-loss record was 13-1 and his D/R was 1.53.


30rtpwx.jpg
 


Deleted member 307496

Guest
Final at Hamburg, final at Rome, final at Roland Garros.
 

Flash O'Groove

Hall of Fame
I didn't have the time to look at the data set, but I think your model has a problem. Your independent variable and your dependent variable are both closely linked to another, more direct variable: Nadal's clay level and consistency.

I guess predicting Nadal's success at RG is a bit pointless, as long as he is in decent form in the preceding tournaments.
 

monfed

Guest
Well let's hope he doesn't show up at all, that way everyone is happy. :lol:
 

Sorana fan

Banned
I think you didn't take into account Nadal's injury and the fact that he is improving every tournament to (hopefully) reach his former level of play.

If that finally happens, your extrapolation will fail because it assumes a constant level of fitness.
 

Deleted member 307496

Guest
Sorana fan said:
I think you didn't take into account Nadal's injury and the fact that he is improving every tournament to (hopefully) reach his former level of play.

If that finally happens, your extrapolation will fail because it assumes a constant level of fitness.
He hasn't had a constant level of fitness; he played a few tournaments, built himself up to some of his former level, and then took two months off and wasted it all away.
 

falstaff78

Hall of Fame
Flash O'Groove said:
I didn't have the time to look at the data set, but I think your model has a problem. Your independent variable and your dependent variable are both closely linked to another, more direct variable: Nadal's clay level and consistency.

I guess predicting Nadal's success at RG is a bit pointless, as long as he is in decent form in the preceding tournaments.

Hey - good comment. I thought about that issue. I am not trying to isolate a CAUSAL link between D/R at Barcelona and MC and later performance; if I were, I agree I'd have to worry about omitted variables, which would bias the estimates.

Rather, I'm looking at a naive correlation between the two, which is merely an empirical matter: empirically, how does a D/R of 1.36 correlate with future outcomes?

So I should be OK, no?
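To illustrate the point that a correlation can be useful for prediction even when it is not causal, here is a toy Python simulation (the model and all its parameters are hypothetical). A hidden "form" level drives both early-season and later-season performance; neither causes the other, yet early performance still predicts later performance well. This is exactly the situation Flash O'Groove describes, and it does not stop the correlation from being informative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, noise=0.5, seed=0):
    """Toy model: hidden 'form' drives both early and later performance.

    Units are arbitrary standardized scores, not actual D/R values.
    """
    rng = random.Random(seed)
    early, later = [], []
    for _ in range(n):
        form = rng.gauss(0.0, 1.0)
        early.append(form + rng.gauss(0.0, noise))
        later.append(form + rng.gauss(0.0, noise))
    return early, later


early, later = simulate()
# Theory for this model: r = 1 / (1 + noise**2) = 0.8, so expect a value near 0.8.
print(round(pearson(early, later), 2))
```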
 

falstaff78

Hall of Fame
Sorana fan said:
I think you didn't take into account Nadal's injury and the fact that he is improving every tournament to (hopefully) reach his former level of play.

If that finally happens, your extrapolation will fail because it assumes a constant level of fitness.

Did you even bother to read my posts?

(1) If you look at the first post I explicitly mention that Nadal's game has been improving throughout the clay season. In fact I go so far as to provide evidence - by showing that in the first 3 tournaments he played he achieved a D/R of 1.37 vs. players of average rank 80, and in the last two he has achieved the same D/R against players of average rank 30.

(2) Further, I explicitly mentioned in post no. 4 that the explanatory power of D/R at Barcelona and Monte Carlo is a third - or, if we restrict to the last 5 years, two thirds. That still leaves at least a full 1/3 of the variation to be explained by factors including, among others, variations in fitness, variations in age, etc etc.

(3) Finally, an extrapolation by definition cannot fail. A prediction can fail. An extrapolation is merely a representation of what historical data tell us. But if historical data were always followed, records would never break. This is just an indication of the weight of history confronting Nadal. Surprise, surprise, I said this in post no. 4.
 

Flash O'Groove

Hall of Fame
falstaff78 said:
Hey - good comment. I thought about that issue. I am not trying to isolate a CAUSAL link between D/R at Barcelona and MC and later performance; if I were, I agree I'd have to worry about omitted variables, which would bias the estimates.

Rather, I'm looking at a naive correlation between the two, which is merely an empirical matter: empirically, how does a D/R of 1.36 correlate with future outcomes?

So I should be OK, no?

I can't really understand the point of your work. Stats are especially useful for telling us things that are not easy to find out by other means (like your work on the age of slam winners), but in this case you only show that playing well correlates with playing well. The D/R is not needed to identify someone who is playing well.

Your minimal threshold of 1.36 is completely skewed by Nadal, because you identified it by looking at Nadal's threshold in the only year he did not win RG. Nadal is not a good case to gather data from because he is Nadal. Nadal wins MC, he wins Barcelona, he wins RG. We know that.

But it doesn't work for any other RG winner. What are the data for Federer, Gaudio, Ferrero, Kuerten? Fed only reached the 3rd round of MC in 2009. None of the other RG winners showed the same dominance as Nadal before their titles.
 

mightyrick

Legend
I like the modeling. However, I question any kind of necessity or importance of it for these purposes.

In tennis, it's pretty straightforward. If someone does excellently throughout the clay season, then the probability of continuing to do so at the French Open is high. The same can be said of the hardcourt season. There are few variables.

When someone makes all of the finals in the clay court season, then their likelihood of making the final at the French Open is higher than anyone else's.

Again, I think the modeling is great, but I'm not sure it is needed to determine who is likely to go to the finals of the French Open.
 

falstaff78

Hall of Fame
mightyrick said:
I like the modeling. However, I question any kind of necessity or importance of it for these purposes.

In tennis, it's pretty straightforward. If someone does excellently throughout the clay season, then the probability of continuing to do so at the French Open is high. The same can be said of the hardcourt season. There are few variables.

When someone makes all of the finals in the clay court season, then their likelihood of making the final at the French Open is higher than anyone else's.

Again, I think the modeling is great, but I'm not sure it is needed to determine who is likely to go to the finals of the French Open.

Flash O'Groove said:
I can't really understand the point of your work. Stats are especially useful for telling us things that are not easy to find out by other means (like your work on the age of slam winners), but in this case you only show that playing well correlates with playing well. The D/R is not needed to identify someone who is playing well.

Your minimal threshold of 1.36 is completely skewed by Nadal, because you identified it by looking at Nadal's threshold in the only year he did not win RG. Nadal is not a good case to gather data from because he is Nadal. Nadal wins MC, he wins Barcelona, he wins RG. We know that.

But it doesn't work for any other RG winner. What are the data for Federer, Gaudio, Ferrero, Kuerten? Fed only reached the 3rd round of MC in 2009. None of the other RG winners showed the same dominance as Nadal before their titles.

Guys, thanks a lot for the comments. These are both good questions - I thought I had already addressed them in the posts above. The fact that you are asking means it is not clear, and I have to edit my OPs! :)

Anyway, please look at the table below, which summarizes Nadal's clay results from 2005 onwards.

rkbfpw.jpg


Now, despite the fact that Nadal has won a lot on clay in recent years, there is considerable variation in how well he does. In 2008, 2010 and 2012 he was well nigh unbeatable on clay. However, in 2006 he was vulnerable to Federer at Rome and at RG, in 2009 he lost RG outright, and in 2011 he was pushed to 5 sets by Isner at RG, and to 4 by Federer in the final.

The problem that confronts us is that Nadal's results in Barcelona and Monte Carlo alone are not enough to predict how he is going to do for the rest of the season, because he almost always wins!

We therefore need some measure of performance beyond just the result, one that can help us predict how the remainder of his season is going to go. This is where the dominance ratio can help us. Despite the fact that Nadal generally wins at both MC and Barcelona, some years he does so less dominantly than others.

These posts are an attempt to show that the dominance ratio has predictive power, for Nadal's performance in particular and for all players in general, and to see what the historical trend suggests for how well Nadal will do going forward.

Thanks for the suggestion. I will update the OP to make it clearer.

- F
 

Fiji

Legend
Let me check Wikipedia.

Last 5 years. Going further back would be irrelevant since the tour is very different now from what it was in 2006 and 2007.

2008: 4 clay titles
2009: 3 clay titles
2010: 4 clay titles
2011: 3 clay titles
2012: 4 clay titles

Looks like he only wins 3 titles during the European clay season in odd years.

He has already won 1 title during the European clay season, so he might win 1 or 2 more. But he is almost 27 and Djokovic is better than ever on clay now, so Nadal might win just 1 of the next 3 clay titles. This could be his worst European clay season, with just 2 titles. Will that sole title be the FO? Rome? Madrid? Ask Djokovic.
 

Fate Archer

Hall of Fame
First of all, thanks for the post and the references; Tennis Abstract is indeed a great site for tennis stats.

Now, I was in the middle of creating a thread comparing Nadal's Roland Garros 2008 and 2012 runs, and one of the things I look at is the average D/R of both runs.

I summed all the D/Rs and divided by the number of rounds (7), but my numbers don't match the ones in your column (I got 1.70 for 2008 and 1.78 for 2012). I wonder if I'm doing something completely wrong, or if there is something in the formula that rules out this intuitive way of getting averages.
 

falstaff78

Hall of Fame
Fate Archer said:
First of all, thanks for the post and the references; Tennis Abstract is indeed a great site for tennis stats.

Now, I was in the middle of creating a thread comparing Nadal's Roland Garros 2008 and 2012 runs, and one of the things I look at is the average D/R of both runs.

I summed all the D/Rs and divided by the number of rounds (7), but my numbers don't match the ones in your column (I got 1.70 for 2008 and 1.78 for 2012). I wonder if I'm doing something completely wrong, or if there is something in the formula that rules out this intuitive way of getting averages.

Hey, so sorry for not responding for this long. I guess I just didn't notice your comment.

D/R is essentially a ratio of two ratios. So taking simple averages of D/Rs is only going to be a very rough approximation. Hope the following example helps.

Match 1 (3 setter). I win 20 of 60 return points (i.e. 33.3%). I lose 16 of 60 points on my serve (i.e. 26.7%). My D/R is 33.3% / 26.7% = 1.25

Match 2 (5 setter). I win 40 of 120 return points (i.e. 33.3%). I lose 44 of 120 points on serve (i.e. 36.7%). My D/R is 33.3% / 36.7% = 0.91

To calculate my total D/R we'd take my TOTAL winning pct on return points (60 / 180 = 33.3%) divided by my losing pct on serve (also 60 / 180 = 33.3%), so my overall D/R = 33.3% / 33.3% = 1.0

However, a straight average of my D/Rs would be (1.25 + 0.91) / 2 = 1.08!

In general, the more similar the matches are (both in the number of points played and in the serve and return percentages), the better an approximation a straight average will give you! It's always fun to try and play around with hypotheticals like this in a spreadsheet!
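The same comparison can be scripted instead of done in a spreadsheet. This Python sketch reproduces Match 1 and Match 2 from the example above and contrasts the pooled D/R (sum all points first, then take the ratio) with a straight average of per-match D/Rs. The tuple layout and function names are my own.

```python
# Each match: (return pts won, return pts played, serve pts lost, serve pts played)
matches = [
    (20, 60, 16, 60),    # Match 1 (3 setter) from the example above
    (40, 120, 44, 120),  # Match 2 (5 setter) from the example above
]


def match_dr(rw, rp, sl, sp):
    """D/R for a single match."""
    return (rw / rp) / (sl / sp)


def pooled_dr(matches):
    """Pool every point across matches first, then take the ratio."""
    rw, rp, sl, sp = (sum(col) for col in zip(*matches))
    return (rw / rp) / (sl / sp)


def averaged_dr(matches):
    """Straight average of per-match D/Rs (only an approximation)."""
    return sum(match_dr(*m) for m in matches) / len(matches)


print(round(pooled_dr(matches), 2))    # 1.0
print(round(averaged_dr(matches), 2))  # 1.08
```

The gap comes from weighting: the pooled figure weights each match by the points played, while the straight average gives the short 3-setter as much weight as the long 5-setter.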

Tennis Abstract will do the full calculation for whichever subset of matches you tell it to - which is just one of the things that make it so awesome.

Hope this was helpful and my sincere apologies for not replying unprompted!

- F
 

Fate Archer

Hall of Fame
Hi again falstaff, thanks for the reply, and don't worry about the timing; with the traffic this board gets, threads with few replies get buried very quickly.

Your example was pretty clear and helpful, though it's somewhat of a bummer that averaging only works as a rough estimate. I thought I was looking at something quite curious.

Still, I wonder how the dominance ratios compare between the two tournaments I referred to (2008 and 2012 RG) if we ignore the finals (so as to get the average D/R up to the semifinals).

I will try looking that up on Tennis Abstract to see if I can get something that corroborates the original point I was trying to convey.

Thanks again and sorry for the late reply on my part this time. :)
 

Fiji

Legend
Fiji said:
Let me check Wikipedia.

Last 5 years. Going further back would be irrelevant since the tour is very different now from what it was in 2006 and 2007.

2008: 4 clay titles
2009: 3 clay titles
2010: 4 clay titles
2011: 3 clay titles
2012: 4 clay titles

Looks like he only wins 3 titles during the European clay season in odd years.

He has already won 1 title during the European clay season, so he might win 1 or 2 more. But he is almost 27 and Djokovic is better than ever on clay now, so Nadal might win just 1 of the next 3 clay titles. This could be his worst European clay season, with just 2 titles. Will that sole title be the FO? Rome? Madrid? Ask Djokovic.

Nadal won Barcelona, Madrid and Rome, so he has already won his typical 3 titles of the European clay season in odd years. RG goes to Nole.
 

vive le beau jeu !

Talk Tennis Guru
2z7ptgw.jpg


ummm i see... i see another soderlingement day !

i see lots of happiness ! i see people dancing in the street to celebrate the fall of the golden bull !
 

falstaff78

Hall of Fame
Comparing forecasts to actual results

In this thread we used Nadal's performances on European clay at Monte-Carlo and Barcelona to gauge how he might do at Madrid, Rome and Paris. The approach was to use the historical performances of Nadal himself and other clay court greats as a guide.

Now that the clay season has concluded, we can see how he performed relative to the predictions. Turns out the dominance ratio predictions held up well: his D/R at Roland Garros landed inside the predicted range, and his D/R for the rest of the season came in a little above it. The data did suggest it was likely that Nadal would lose 1 or 2 matches at the 3 tournaments - and there were times when he seemed on the verge of losing - but credit to the CGOAT, he pulled through and swept!!

D/R after Monte Carlo and Barcelona - predicted 1.28-1.34
D/R after Monte Carlo and Barcelona - actual 1.38

D/R Roland Garros - predicted 1.32-1.40
D/R Roland Garros - actual 1.38

Number of losses - predicted 1-2
Number of losses - actual 0
 