The Wreck

02-16-2012, 03:16 PM

Had some thoughts rattling around in my head while watching tennis the other day, and I've always been interested in statistics, so I made a quick blog (temporary, really) to house them for the time being. Thought I'd post it here to see what people think and what suggestions you can offer. Thanks.

When watching a tennis match you’ve undoubtedly seen the statistic flashed on the screen: ‘Break Point %’, or something similar. That is, the number of break points a player converted divided by the number of break point opportunities they had. You may even have seen a related statistic, tracked over the course of a season: percentage of return games won. Successfully returning serve is arguably the most important part of winning in tennis, so it’s good that we have metrics that attempt to show return of serve prowess. Unfortunately, the two statistics I’ve mentioned are inherently flawed and can often lead us to false conclusions.

Break point conversion rates sound like a good metric: how often did the player win an important break point when they had the chance? But this percentage on its own can often disguise what really happened. A player who has five break point opportunities and converts on two of them has a break point conversion rate of 40% (the average for most ATP players). But let’s say their opponent in that match had ten break point opportunities and converted on four of them. Their percentage is also 40%, but they broke twice as many times. This is where I feel the statistic is flawed.
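To make the comparison concrete, here is a minimal Python sketch of the two players described above. The function name `conversion_rate` is just something I made up for illustration; it is not an official ATP calculation.

```python
def conversion_rate(converted, opportunities):
    """Break points converted divided by break point opportunities.

    Returns None when there were no opportunities, since the rate is undefined.
    """
    if opportunities == 0:
        return None
    return converted / opportunities


# Player A: 2 of 5 break points converted.
# Player B: 4 of 10 break points converted.
rate_a = conversion_rate(2, 5)
rate_b = conversion_rate(4, 10)

# The two rates are identical even though B broke serve twice as often.
print(f"A: {rate_a:.0%} on 2 breaks, B: {rate_b:.0%} on 4 breaks")
```

The percentage alone throws away the raw count of breaks, which is the part that actually changes the score.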

There may be a game where you get the score to 30-40 in your favor, win the next point, and your break point conversion rate is 100%. Or you could have a game where you go to several deuces, back and forth, and finally convert on your fourth break point opportunity in that game. Your conversion rate is 25%. Why should we care how many tries it took you to break as long as you do actually break serve in the end? The end result is the same. While it’s true a better returner may convert within that one game on fewer opportunities, it’s irrelevant really. You don’t get bonus points in tennis for being efficient.

Percentage of return games won aims to correct for some of the shortcomings of the previous statistic. You get the frequency with which someone breaks and nothing else. Sounds pretty good. As far as measuring just returning ability this statistic works fairly well. But when evaluating a player as a whole, it too can mislead. Say you’re on serve with your opponent at 3-3. You get a break and go up 4-3, but on your very next service game you’re broken back and the score is again tied, 4-4. Yes, you broke your opponent so your percentage of return games won increases. But you were immediately broken back which negated your great return game entirely. A break is only useful if you keep the advantage. So sure, it’d be great to have 50% of return games won, but if you are immediately broken back half those times, you’re not getting ahead of your opponents very often.

With that being said, I’ve created what I think is a more comprehensive (though not perfect) statistic called Effective Break Percentage (or Proportion), or as I’ll refer to it, EBP. The formula for the statistic is as follows:

(Breaks – Breakbacks) / (Games with Break Chances)

That is, the number of games in which the player successfully broke their opponent’s serve less the games in which they were immediately broken back (excluding times when serve was broken to win the set). That number is then divided by the number of games in which the player had at least one break point opportunity. To further clarify: say that over the course of a match you have break point chances in 8 games, you actually break in 3 of them, and after one of those games your opponent immediately breaks you back. Your Effective Break Percentage for that match is (3 - 1) / 8 = 2/8, or 25%.
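The formula above can be sketched in a few lines of Python. The function and argument names are my own placeholders. It returns the unreduced proportion alongside the percentage, so that 2 out of 8 stays visibly different from 1 out of 4.

```python
def effective_break_proportion(breaks, breakbacks, games_with_chances):
    """Return (net_breaks, games_with_chances, percentage) for EBP.

    breaks             -- return games won
    breakbacks         -- breaks that were immediately given back
    games_with_chances -- return games with at least one break point
    """
    if not 0 <= breakbacks <= breaks <= games_with_chances:
        raise ValueError("expected 0 <= breakbacks <= breaks <= games_with_chances")
    if games_with_chances == 0:
        # No break chances at all: the percentage is undefined.
        return (0, 0, None)
    net_breaks = breaks - breakbacks
    return (net_breaks, games_with_chances, 100 * net_breaks / games_with_chances)


# The worked example from the post: chances in 8 games, 3 breaks, 1 breakback.
net, chances, pct = effective_break_proportion(3, 1, 8)
print(f"EBP = {net}/{chances} = {pct:.0f}%")
```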

This statistic isn’t entirely without faults. In percentage form, you still face the issue of 1 out of 4 being treated the same as 3 out of 12. So when looking at an individual match, I believe it’s better to leave it as a proportion. Over the course of an entire season, though, the number of games should be high enough to show more accurately who’s more proficient at earning and keeping breaks. I’m open to suggestions on how to account for the discrepancy in individual matches.

It’s also a difficult stat to actually compute. You can’t really go back and calculate this percentage without point-by-point breakdowns of matches, which are only available for majors, as far as I know. So unless the Tennis Channel decides to adopt this idea, you’ll have to carefully track the points when watching a match.
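If you do track a match by hand, the three counts could be pulled out of a simple game-by-game log. Here's a rough Python sketch of that. The `Game` record and its field names are hypothetical, and it rests on two assumptions of mine: a break that ends the set can't be broken back, and "immediately" means the very next game in the log. Tiebreaks are ignored.

```python
from dataclasses import dataclass


@dataclass
class Game:
    server: str               # who served this game
    winner: str               # who won this game
    break_chance: bool        # returner had at least one break point
    ended_set: bool = False   # this game finished the set


def ebp_counts(games, player):
    """Return (breaks, breakbacks, games_with_chances) for `player`."""
    breaks = breakbacks = chances = 0
    for i, game in enumerate(games):
        if game.server == player:
            continue  # only the player's return games count here
        if game.break_chance:
            chances += 1
        if game.winner != player:
            continue  # the opponent held serve
        breaks += 1
        # Assumption: a set-ending break cannot be given back.
        if game.ended_set or i + 1 >= len(games):
            continue
        nxt = games[i + 1]
        if nxt.server == player and nxt.winner != player:
            breakbacks += 1
    return breaks, breakbacks, chances


log = [
    Game("opp", "me", True),                  # I break
    Game("me", "opp", True),                  # and get broken straight back
    Game("opp", "opp", True),                 # break chance, not converted
    Game("me", "me", False),                  # I hold
    Game("opp", "me", True, ended_set=True),  # I break to take the set
]
print(ebp_counts(log, "me"))
```

For this made-up log the counts work out to 2 breaks, 1 breakback and 3 games with chances, so an EBP of 1/3.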

So why bother? Well, break points have been considered by many to be the most important statistic to focus on in determining the winner of a match. Research has been done to show that that may not actually be the case. I’m not convinced entirely, though. In the future I’d like to test the predictive power of EBP and see if it explains wins any more than break point conversion does.

What’re your thoughts? Is it a useful statistic worth tracking or just as unreliable as break point conversion? As mentioned, any suggestions on how to improve it are much appreciated, and hopefully I can work with the data further in the future.

http://tennismetrics.wordpress.com/
