Tennis match score simulator: % points won -> % matches won

gsung

New User
This morning I was grabbing my coffee and watching a youtube video from TennCom, where he mentioned he usually loses 10-5 to his friend in some practice tiebreaks (i.e. he wins 33% of points). I've been wondering for a long time, if a player has x% chance to win a point, what would be the expected match result? Also I saw this thread when searching for something similar: https://tt.tennis-warehouse.com/index.php?threads/stats-guys-55-games-won-70-matches-won.508622/

Since I've been experimenting a lot with AI coding tools, I finally got off my lazy bum and wrote a simulator for this (or should I say, AI allowed me to remain a lazy bum and still "write" the simulator). Here's what I cooked up this afternoon:


Some interesting results from the simulation -- each row is based on 100,000 simulated matches:
Win probability per point (%) | Match win probability (%) | Expected games per set
50 | 50 (duh) | 4.83 - 4.84
51 | 60 | 5.06 - 4.58
52 | 70 | 5.26 - 4.31
53 | 79 | 5.45 - 4.02
54 | 86 | 5.59 - 3.75
55 | 91 | 5.71 - 3.45
60 | 100 (rounded) | 5.98 - 2.15 <-- 6-2 expected set score
67 <-- winning 10-5 in a tiebreak | 100 | 6.00 - 0.97 <-- 6-1 expected set score
72 | 100 | 6.00 - 0.50 <-- expect to get one game in a match

For those curious, the simulator's code is here: https://github.com/georgesung/tennis-match-simulator
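For readers who want to see the idea without opening the repository, here is a minimal sketch of this kind of simulation. It is not the linked simulator's code: the function names (`race`, `play_set`, `play_match`, `match_win_rate`) are made up for illustration, and it assumes every point is won with the same fixed probability, standard ad scoring, a tiebreak at 6-6, and best-of-3 sets.

```python
import random

def race(p, target, rng):
    """Play points until one side has at least `target` points and leads by 2.
    Returns True if player A wins. Covers games (target=4) and tiebreaks (target=7)."""
    a = b = 0
    while True:
        if rng.random() < p:
            a += 1
        else:
            b += 1
        if max(a, b) >= target and abs(a - b) >= 2:
            return a > b

def play_set(p, rng):
    """Tiebreak set: first to 6 games with a 2-game lead, tiebreak at 6-6."""
    a = b = 0
    while True:
        if a == 6 and b == 6:
            return race(p, 7, rng)
        if race(p, 4, rng):
            a += 1
        else:
            b += 1
        if max(a, b) >= 6 and abs(a - b) >= 2:
            return a > b

def play_match(p, rng, sets_to_win=2):
    """Best-of-3 by default. Returns True if player A wins the match."""
    a = b = 0
    while a < sets_to_win and b < sets_to_win:
        if play_set(p, rng):
            a += 1
        else:
            b += 1
    return a == sets_to_win

def match_win_rate(p, n_matches=20000, seed=0):
    """Fraction of simulated matches won by a player who wins each point with probability p."""
    rng = random.Random(seed)
    wins = sum(play_match(p, rng) for _ in range(n_matches))
    return wins / n_matches

if __name__ == "__main__":
    for pct in (50, 51, 52, 53, 54, 55, 60):
        print(pct, round(100 * match_win_rate(pct / 100), 1))
```

With a few thousand matches per row the estimates are only good to about a percentage point, which is why the table above uses 100,000 matches per row.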
 

esgee48

G.O.A.T.
I recall reading somewhere the point winning probability of 51% is winning set 6,0 or 6,1. And that a 50.1% point winning probability is your 6,4 or 7,5 result. The probability of losing or winning the point changes as the point progresses. The more often the ball crosses the net, the more the advantage increases for the player who had the advantage to start the match with.
 

gsung

New User
From my simulation if the point winning probability is 51%, the opponent is expected to get 4.58 games per set (note this includes 7-5 and 7-6 sets, and sets where the player loses). Here are some results of simulated matches with 51% point winning probability:

Match 1: 6-3 6-3
Match 2: 4-6 6-1 6-4
Match 3: 6-4 3-6 7-6
Match 4: 1-6 6-2 5-7
Match 5: 6-4 6-0
Match 6: 6-1 3-6 4-6
Match 7: 0-6 6-4 6-0
Match 8: 7-6 6-2
Match 9: 6-4 6-3
Match 10: 2-6 7-6 7-5

Update: I just added the option to simulate no-ad scoring, match tiebreaks (10-point tiebreak in lieu of 3rd set), and Fast 4 format. As expected, each of these changes pushes the match win probability closer to 50-50.
 

travlerajm

Talk Tennis Guru
The AI is too dumb. It says 60% win probability means you win 100% of the matches.

But… tennis doesn’t work that way. You might have 60% point win probability one day, and then against the same opponent, 40% tomorrow.

In other words, yes you have zero chance to win today. But tomorrow might be your day!
 

gsung

New User
The AI is too dumb. It says 60% win probability means you win 100% of the matches.

But… tennis doesn’t work that way. You might have 60% point win probability one day, and then against the same opponent, 40% tomorrow.

Yes this assumes every point has the same win probability, so it doesn't account for different win probabilities when serving/receiving, when the score is close and a player is nervous, players tanking when it's a blowout, etc. Also as you mentioned variations in performance across different days, weather conditions, courts, etc. Also the 100% was due to rounding :D It's closer to 99.64% (I updated the code to show a couple more numbers at the end).

Interestingly, you can see in Federer's career he's won 54% of points, and his overall win-loss is 1251 - 275, i.e. 82% match win % (https://www.atptour.com/en/players/roger-federer/f324/player-stats?year=all&surface=all). From the simulation a 54% point win probability should result in 86% match win rate, which is a bit off from Federer's actual win rate. But if you adjust the point win % slightly to 53.4%, you'll get around Federer's 82% match win rate. In any case, this simple simulation is definitely missing some nuance, it was mostly a curious exercise on a Sunday afternoon :)

If there was a takeaway for me playing matches, mentally, I can remind myself in a tight match that if I win just 52 or 53% of the points, I'm in a good position to win the match (70-80% chance). The margins are pretty small.

PS: The simulation script isn't really "AI" as people speak of it today, it's just some simple logic. AI just helped me write it.
 

travlerajm

Talk Tennis Guru
One of the odd things about tennis is that it’s more possible than you would think to score an upset against someone 2 ntrp levels better than you… and also more possible than you would think to lose to someone 2 ntrp levels below you.

The factors that create volatility in your level tend to get underestimated. Things like conditions changes, racquet changes, clothing changes.

If I usually win 55% of points against my hitting partner, I will usually win by a lopsided score. But it doesn’t take much perturbation to change 55% to 49%.
 

gsung

New User
One of the odd things about tennis is that it’s more possible than you would think to score an upset against someone 2 ntrp levels better than you… and also more possible than you would think to lose to someone 2 ntrp levels below you.

The factors that create volatility in your level tend to get underestimated. Things like conditions changes, racquet changes, clothing changes.

If I usually win 55% of points against my hitting partner, I will usually win by a lopsided score. But it doesn’t take much perturbation to change 55% to 49%.

Now that you mention it, I can see why the top pros seem so focused and serious even when playing against opponents ranked much lower than them. Maintaining that edge and winning 53-54% of the points, versus letting up a bit and winning 3-4% fewer points, is the difference between a confident match win % and a 50/50 match.
 

Connor35

Semi-Pro

This would be quick to simulate, so I may do it for fun & validation.

Did you have a different Pr(Win Point) on serve vs. return -- which would mimic real life?
Or just use the same Pr(Win point) for all points?
 

TennisOTM

Professional
There is a technique using Markov chain theory that can calculate those exact probabilities without the need for simulations. Here is one I downloaded and played with: https://github.com/Seb943/Markov4Tennis

It looks like your results for match win probability agree with the results I get from that code.

point win probability | set win probability | match win probability (3 full sets)
0.51 | 57.12% | 60.60%
0.52 | 64.01% | 70.47%
0.53 | 70.47% | 78.99%
0.54 | 76.34% | 85.86%
0.55 | 81.50% | 91.00%
0.60 | 96.34% | 99.61%
0.67 | 99.88% | 99.9996%
0.72 | 99.996% | >99.9999%
 

nolefam_2024

Talk Tennis Guru
Now look at big 3 rivalry.

Nole won under 52% pts vs Rafa. They are so close.
 

TennisOTM

Professional
Now look at big 3 rivalry.

Nole won under 52% pts vs Rafa. They are so close.
The Markov code allows you to put in two point probabilities: one for each player winning their service points. That should reflect real-world data more accurately, especially for the pros. Do you have the Nole vs. Rafa point win % for each of their serves?
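As an illustration of what a calculation with two point probabilities can look like, here is a sketch of an exact (no simulation) set and match calculation. This is not the Markov4Tennis API; the names `game_win`, `tiebreak_win`, `set_win` and `match_win` are hypothetical. It assumes `a` is the favorite's chance of winning a point on their own serve, `b` is their chance of winning a point on the opponent's serve, ad scoring, a first-to-7 tiebreak at 6-6, and the favorite serving first.

```python
from functools import lru_cache

def game_win(p):
    """Chance of winning a game when each point is won with probability p."""
    q = 1.0 - p
    from_deuce = p * p / (1.0 - 2.0 * p * q)
    return p**4 * (1 + 4 * q + 10 * q * q) + 20 * p**3 * q**3 * from_deuce

def tiebreak_win(a, b):
    """First-to-7, win-by-2 tiebreak; A serves point 1, then serve changes every 2 points."""
    @lru_cache(maxsize=None)
    def f(i, j):
        if i == 6 and j == 6:
            # From 6-6 each pair of points has one serve per player.
            # Undefined if a == 1 and b == 0 (the tiebreak never ends).
            return a * b / (a * b + (1 - a) * (1 - b))
        if i == 7:
            return 1.0
        if j == 7:
            return 0.0
        k = i + j                      # points played so far
        p = a if ((k + 1) // 2) % 2 == 0 else b
        return p * f(i + 1, j) + (1 - p) * f(i, j + 1)
    return f(0, 0)

def set_win(a, b):
    """Chance that A wins a tiebreak set, with A serving the first game."""
    hold, brk, tb = game_win(a), game_win(b), tiebreak_win(a, b)
    @lru_cache(maxsize=None)
    def f(i, j):
        if i == 6 and j == 6:
            return tb
        if i >= 6 and i - j >= 2:
            return 1.0
        if j >= 6 and j - i >= 2:
            return 0.0
        g = hold if (i + j) % 2 == 0 else brk
        return g * f(i + 1, j) + (1 - g) * f(i, j + 1)
    return f(0, 0)

def match_win(a, b):
    """Best-of-3, treating sets as independent with the same set-win chance."""
    s = set_win(a, b)
    return s * s * (3 - 2 * s)

if __name__ == "__main__":
    for a, b in ((0.55, 0.55), (0.60, 0.50), (0.70, 0.40)):
        print(a, b, round(set_win(a, b), 4), round(match_win(a, b), 4))
```

Setting `a == b` reduces this to the single-probability case in the table above, so it can be checked against those rows.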
 

Connor35

Semi-Pro
The Markov code allows you to put in two point probabilities: one for each player winning their service points. That should reflect real-world data more accurately, especially for the pros. Do you have the Nole vs. Rafa point win % for each of their serves?

I should've thought of this! I had this class in grad school, but I never use this methodology, so I tend to forget about it. It's actually quite simple matrix multiplication, especially for a single game.
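For anyone who hasn't seen the single-game version, here is a rough sketch of it as an absorbing Markov chain, using plain Python lists instead of a matrix library. The state layout and the names `game_transition_matrix` and `hold_probability` are my own choices for illustration, not taken from any particular package. The start vector is multiplied by the transition matrix repeatedly until essentially all the probability has landed in the two "game over" states.

```python
def game_transition_matrix(p):
    """Transition matrix for one game; p = server's chance of winning a point.
    States: (i, j) points for server/returner with i, j in 0..3 ((3, 3) is deuce),
    plus advantage and absorbing win states."""
    q = 1.0 - p
    states = [(i, j) for i in range(4) for j in range(4)]
    states += ["AdA", "AdB", "WinA", "WinB"]
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    P = [[0.0] * n for _ in range(n)]
    for i in range(4):
        for j in range(4):
            row = P[index[(i, j)]]
            if (i, j) == (3, 3):
                to_a, to_b = "AdA", "AdB"
            else:
                to_a = "WinA" if i == 3 else (i + 1, j)
                to_b = "WinB" if j == 3 else (i, j + 1)
            row[index[to_a]] += p
            row[index[to_b]] += q
    P[index["AdA"]][index["WinA"]] += p
    P[index["AdA"]][index[(3, 3)]] += q
    P[index["AdB"]][index[(3, 3)]] += p
    P[index["AdB"]][index["WinB"]] += q
    P[index["WinA"]][index["WinA"]] = 1.0   # absorbing
    P[index["WinB"]][index["WinB"]] = 1.0   # absorbing
    return states, P

def hold_probability(p, steps=400):
    """Server's chance of winning the game, via repeated vector-matrix products."""
    states, P = game_transition_matrix(p)
    n = len(states)
    v = [0.0] * n
    v[0] = 1.0                               # start at (0, 0), i.e. love-all
    for _ in range(steps):
        v = [sum(v[r] * P[r][c] for r in range(n)) for c in range(n)]
    return v[states.index("WinA")]

if __name__ == "__main__":
    for p in (0.50, 0.55, 0.60, 0.70, 0.90):
        print(p, round(hold_probability(p), 4))
```

A linear solve on the transient states would give the same answer in one step; the repeated multiplication is used here only because it needs nothing beyond the standard library.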
 

zoingy

Rookie
Did you have a different Pr(Win Point) on serve vs. return -- which would mimic real life?
Or just use the same Pr(Win point) for all points?
Yea this is pretty important - both winning 100% of their own serves is gonna result in a pretty different distribution than both winning 50% of their own serves
 

TennisOTM

Professional
It's also interesting to look at the probability distribution of different final scores, and there are some results you might not expect. For example, in the case that every point (and thus every game) of a set is a 50/50 coin flip, what is the most likely final score of a set?

You might guess that 7-6 is the most likely score in this perfectly even matchup, but it's actually 6-4, which is twice as likely as 7-6. Even 6-2 is more likely to happen than 7-6. Here's the breakdown by rank order of likelihood:

1. 6-4 (24.6%)
2. 6-3 (21.9%)
3. 6-2 (16.4%)
t4. 7-6 (12.3%)
t4. 7-5 (12.3%)
6. 6-1 (9.4%)
7. 6-0 (3.1%)
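The breakdown above can be reproduced with a small exact calculation. The sketch below assumes every game (and the tiebreak, treated as one more game) is won by player A with the same probability `g`; the function name `set_score_distribution` is just a label for this example. It walks forward through the possible game scores and collects the probability of each final score.

```python
from collections import defaultdict

def set_score_distribution(g=0.5):
    """Probability of each final set score (A's games, B's games).
    g = chance that A wins any given game, tiebreak included."""
    live = {(0, 0): 1.0}           # scores still in progress
    final = defaultdict(float)     # finished scores
    while live:
        nxt = defaultdict(float)
        for (i, j), pr in live.items():
            for (ni, nj), w in (((i + 1, j), g), ((i, j + 1), 1 - g)):
                # Set ends at 6 with a 2-game lead, at 7-5, or at 7-6 after the tiebreak.
                done = (max(ni, nj) >= 6 and abs(ni - nj) >= 2) or max(ni, nj) == 7
                (final if done else nxt)[(ni, nj)] += pr * w
        live = nxt
    return dict(final)

def by_scoreline(dist):
    """Merge mirror-image scores, e.g. 6-4 and 4-6, into one entry."""
    merged = defaultdict(float)
    for (i, j), pr in dist.items():
        merged[(max(i, j), min(i, j))] += pr
    return dict(merged)

if __name__ == "__main__":
    merged = by_scoreline(set_score_distribution(0.5))
    for score, pr in sorted(merged.items(), key=lambda kv: -kv[1]):
        print(f"{score[0]}-{score[1]}: {100 * pr:.1f}%")
```

At `g = 0.5` the 6-4 entry is 2 * C(9,4) / 2^10 = 252/1024, which is where the 24.6% comes from.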
 

tennis3

Hall of Fame
One of the odd things about tennis is that it’s more possible than you would think to score an upset against someone 2 ntrp levels better than you… and also more possible than you would think to lose to someone 2 ntrp levels below you.

The factors that create volatility in your level tend to get underestimated. Things like conditions changes, racquet changes, clothing changes.
You're saying that if I change my socks, I might beat Federer?
 

tennis3

Hall of Fame
No. But you might have better shot to take out Frank your weekend warrior nemesis.
jesus-frank-always-sunny.gif
 

gsung

New User
This would be quick to simulate, so I may do it for fun & validation.

Did you have a different Pr(Win Point) on serve vs. return -- which would mimic real life?
Or just use the same Pr(Win point) for all points?

In my original post I assumed Pr(win point) was the same for all points.

I just added an option to have separate P(win serve point) and P(win return point) today (under advanced settings), but tbh not sure what to make of it yet. Feel free to experiment.
 

gsung

New User
Yea this is pretty important - both winning 100% of their own serves is gonna result in a pretty different distribution than both winning 50% of their own serves

Both players winning 100% of their own serves will result in an infinitely long match :D Without thinking through I just tried that, and my simulator went into an infinite loop.
 

zoingy

Rookie
Both players winning 100% of their own serves will result in an infinitely long match :D Without thinking through I just tried that, and my simulator went into an infinite loop.
Good thing everything's client side!

Not to mention it's completely symmetric anyway hah

That being said, the spread seems to make very little difference (normalized for serve + return win pct) across normal ranges...tho maybe some interesting boundary effects but idk if I've made any sense of them
 

TennisOTM

Professional
Good thing everything's client side!

Not to mention it's completely symmetric anyway hah

That being said, the spread seems to make very little difference (normalized for serve + return win pct) across normal ranges...tho maybe some interesting boundary effects but idk if I've made any sense of them
What I'm seeing, across normal ranges as you say, is that the underdog has a slightly higher chance of an upset as the favorite's serve / return probabilities get higher / lower, keeping the average the same.

For example if the favorite has a 55% chance of winning an average point, that could be 55% for both serve and return, or could be 60%/50% serve/return, 70/40, 80/30, etc.

Here are chances that the favorite wins a set, for each combo:

55/55 -> 81.5%
60/50 -> 81.3%
70/40 -> 79.5%
80/30 -> 76.8%
90/20 -> 76.2%

However if you go even more extreme into non-normal (even for John Isner) ranges, there's a point where the trend reverses direction and it becomes less likely for an upset! For example:

95/15 -> 80.8%
99/11 -> 93.4%

Does the simulation agree with this?
 

gsung

New User
That's really interesting. The simulation currently doesn't count set win % yet (it would be an easy change to do when I have time), so I simulated the entire match and observed the same trend as you mentioned.

Here are results of different serve/return win probabilities, each run over 500k simulations:

54/54 - 0.8587
64/44 - 0.8495
74/34 - 0.8242
84/24 - 0.7982
94/14 - 0.8466
99/9 - 0.9800

Note this assumes the player has a 50/50 chance to serve first at the start of the match. This may not hold true sometimes, for example if you have a player who always elects to serve first, playing against someone who always elects to return, then there is 100% chance the first player will serve first in that match. I'll update the simulation to have a custom probability for serve-first-in-the-match for Player A, could be interesting.
 

TennisOTM

Professional
Note this assumes the player has a 50/50 chance to serve first at the start of the match. This may not hold true sometimes, for example if you have a player who always elects to serve first, playing against someone who always elects to return, then there is 100% chance the first player will serve first in that match. I'll update the simulation to have a custom probability for serve-first-in-the-match for Player A, could be interesting.
I had not thought of that - all the calculations I did using the Markov model assumed that the favorite served first. However, I just tried the reverse and it did not actually matter. When the favorite receives in the first game, their probability of winning the set or match is always exactly the same as when they serve first.

It does matter for the probability of different set scores. When the favorite serves first you are more likely to get a set score of 6-3, and when the underdog serves first you are more likely to get a set score of 6-4. But the probability of winning the set (by any score) is the same in both cases.
 

onehandbh

G.O.A.T.
What I'm seeing, across normal ranges as you say, is that the underdog has a slightly higher chance of an upset as the favorite's serve / return probabilities get higher / lower, keeping the average the same.

For example if the favorite has a 55% chance of winning an average point, that could be 55% for both serve and return, or could be 60%/50% serve/return, 70/40, 80/30, etc.

Here are chances that the favorite wins a set, for each combo:

55/55 -> 81.5%
60/50 -> 81.3%
70/40 -> 79.5%
80/30 -> 76.8%
90/20 -> 76.2%

However if you go even more extreme into non-normal (even for John Isner) ranges, there's a point where the trend reverses direction and it becomes less likely for an upset! For example:

95/15 -> 80.8%
99/11 -> 93.4%

Does the simulation agree with this?
I'm guessing that at that extreme, the likelihood of reaching a tiebreak becomes high and super server has a big advantage on serves in the tiebreak where who serves switches more frequently.
 

schmke

Legend
I couldn't help myself and had to join the party and wrote my own simulator for games/sets/matches. My approach is to do a simulation with many (typically a million) iterations to see how things play out on average. My simulation allows for any number of formats (single set, best of 3, best of 5, best of N) and I can specify who serves first. I can also use no-ad scoring.

I use a random number generator to get a random distribution of point winners across the specified point win percentage which allows for point winning streaks either way, but over many iterations the points won percentage evens out to the specified percentages.

Here are some observations.

If both players have an expected point win rate of 50% on their serve each player wins the set 49.9% to 50.1% of the time across different 1M iteration simulations, pretty much like you'd expect. As I think was noted earlier, the most likely scores are:
  • 6-4 or 4-6 - 12.2% to 12.3% each
  • 6-3 or 3-6 - 10.9% to 11.0% each
  • 6-2 or 2-6 - 8.2% each
  • 7-6 or 6-7 or 7-5 or 5-7 - 6.1% to 6.2% each
  • 6-1 or 1-6 - 4.6% to 4.7% each
  • 6-0 or 0-6 - 1.6% each
It is a little surprising the closer scores were down on the list, but group them together and the total is around 24.6% which is right about the same or a bit higher than the two 6-4 scores together.

If player1's point win rate goes to 52%, they end up winning about 57.1%-57.2% of the sets if they serve first. If they serve second it is perhaps a smidge lower at 57.0%-57.2%. In the case of serving first, the most common scores are:
  • 6-3 - 13.1%
  • 6-4 - 12.9%
  • 4-6 - 11.5%
  • 6-2 - 9.7%
  • 3-6 - 8.8%
  • 2-6 - 6.8%
  • 7-5 - 6.7%
  • 7-6 - 6.5%
  • 6-1 - 6.2%
In the case of serving second, the most common scores are:
  • 6-4 - 14.1%
  • 6-3 - 12.0%
  • 4-6 - 10.5%
  • 6-2 - 10.2%
  • 3-6 - 9.8%
  • 7-5 - 6.7%
  • 7-6 - 6.4%
  • 2-6 - 6.4%
If rather than a single set, the match is best 2 out of 3, the serving first win percentage goes to 60.7% and the serving second to 60.5%.

Going to best of 5 it goes to 63.3% and 63.2%.

So the longer the match, the more likely the stronger player wins.
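The step from a set win percentage to a best-of-3 or best-of-5 match win percentage can also be written down directly. The sketch below assumes each set is independent and won with the same probability `s`; `best_of` is a name chosen for this example. For best-of-3 it reduces to s^2 * (3 - 2s).

```python
from math import comb

def best_of(n_sets, s):
    """Chance of winning a best-of-n_sets match when each set is won with probability s."""
    need = n_sets // 2 + 1
    # Win in exactly need + k sets: win the last set, lose k of the earlier need - 1 + k sets.
    return sum(comb(need - 1 + k, k) * s**need * (1 - s)**k for k in range(need))

if __name__ == "__main__":
    s = 0.5712   # set win chance close to the 57.1%-57.2% quoted above
    print(round(best_of(1, s), 4), round(best_of(3, s), 4), round(best_of(5, s), 4))
```

Plugging in a set win chance of about 57.1% gives roughly 60.6% for best-of-3 and 63.2% for best-of-5, in line with the simulated figures.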

Going back to a single set, but using no-ad scoring, when serving first the win rate drops to 56.3% and serving second, 56.2%. So no-ad scoring does affect things and give the underdog a slightly better chance.

Last (for now), looking at the serve/return splits, here is @TennisOTM's data with my simulation's numbers (favorite serving first) after:

55/55 -> 81.5% - 81.6%
60/50 -> 81.3% - 81.3%
70/40 -> 79.5% - 79.5%
80/30 -> 76.8% - 76.8%
90/20 -> 76.2% - 76.2%
95/15 -> 80.8% - 80.8%
99/11 -> 93.4% - 93.4%

So agree more or less completely here.

Happy to run any other scenarios you might be interested in.

@TennisOTM @gsung
 

anarosevoli

Semi-Pro
Very interesting. Only criticism is that you sound like the win probability could be different serving first or second. That's definitely not the case, it's absolutely the same (these minimal differences you found can only be statistical errors). The probabilities of the individual scores depend from serving first or second as you pointed out but not the overall win probability. The sum of the percentages of the individual winning and losing scores must always be the same for serving first and second.

For every serve game there's a return game, it's absolutely symmetrical. Only the last serve game of the second serving player will not be played if he can't win anymore. It's obvious that this has no influence on win probability, only on the score (6-3 with one break vs. 6-4 with one break). Tie-break is also absolutely symmetrical (for every serve point there's a return point, only the last serve of the second serving player is not being played if he lost already, which obviously has no influence on winning or losing anymore). No chance that anywhere in normal tennis there is a mathematical advantage serving first or second. It can only happen in tournaments with RR mode where exact score counts in the table, here serving first could be of advantage.
 

schmke

Legend
You are right, any differences I noticed are in the statistical noise. If I do more runs of a simulation the slight variation between them is basically the same.

But yes, the more interesting part is the most likely score does change as you'd expect.
 

TennisOTM

Professional
I'm guessing that at that extreme, the likelihood of reaching a tiebreak becomes high and super server has a big advantage on serves in the tiebreak where who serves switches more frequently.
Yes, at very high serving strength for both players, the tiebreak becomes virtually guaranteed. Once a server's point win rate rises past 91%, their probability of holding serve in a regular game passes 99.9%. If both players are getting close to that high, then the probability of winning a set is pretty much equal to the probability of winning a tiebreak.

The chances of winning the tiebreak is maybe not well characterized by the favorite's average point win probability. Perhaps a better way to look at it is the ratio of probabilities of winning a return point:

In the 95/15 case (favorite's serve/return win %), the favorite has a 15% chance of winning a return point and the underdog has a 5% chance, so a 3:1 ratio. Whereas in the 99/11 case, it's an 11:1 ratio. So even though both cases give an average 55% chance that the favorite wins a point, the second case makes it much more likely that the favorite will get the first mini-break needed to win the tiebreak.

Interesting, though, that in the less extreme cases where the tiebreak is less likely to be reached, the trend goes in the other direction. Not sure I have my head around that one yet...
 

TennisOTM

Professional
Last (for now), looking at the serve/return splits, here is @TennisOTM's data with my simulation's numbers (favorite serving first) after:

55/55 -> 81.5% - 81.6%
60/50 -> 81.3% - 81.3%
70/40 -> 79.5% - 79.5%
80/30 -> 76.8% - 76.8%
90/20 -> 76.2% - 76.2%
95/15 -> 80.8% - 80.8%
99/11 -> 93.4% - 93.4%

So agree more or less completely here.

Happy to run any other scenarios you might be interested in.
Awesome, seems the Markov-chain probability code is working!

One scenario that would be interesting is re-running the quoted simulations using old-school sets where no tiebreaker is used - just keep simulating regular games until someone is up by two. I wonder if the down-then-up trend would still be there, or if that result is just because of the tiebreaker?
 

zoingy

Rookie
Interesting, though, that in the less extreme cases where the tiebreak is less likely to be reached, the trend goes in the other direction. Not sure I have my head around that one yet...
I cobbled together a quick sim to break down set wins - there appears to be a bit of a Simpson's paradox-esque effect happening. Seems like the crucial bit is that an advantage in overall point win percentage is less realizable in TBs than in set-games, at all levels of serve/return win% spread. Or in other words, the underdog wins relatively more in TB sets than in non-TB-reaching sets, regardless of servebotness.

So there's a spot where an increase in servebotness increases the chances of reaching a TB, but the TB win% hasn't gone up enough to compensate.

Chart plots various proportions vs serve winrate, with serve + return winrate = 1.1.

qF0M47g.png
 

TennisOTM

Professional
I cobbled together a quick sim to break down set wins - there appears to be a bit of a Simpson's paradox-esque effect happening. Seems like the crucial bit is that an advantage in overall point win percentage is less realizable in TBs than in set-games, at all levels of serve/return win% spread. Or in other words, the underdog wins relatively more in TB sets than in non-TB-reaching sets, regardless of servebotness.

So there's a spot where an increase in servebotness increases the chances of reaching a TB, but the TB win% hasn't gone up enough to compensate.

Chart plots various proportions vs serve winrate, with serve + return winrate = 1.1.

qF0M47g.png
Ahh yeah that's it. Here's another version of the prior result that illustrates the same thing:

prob(win srv/ret point) | prob(no tb) | prob(win set : no tb) | prob(tb) | prob(win set : tb) | prob(win set)
55/55 | 91.5% | 83.0% | 8.5% | 65.4% | 81.5%
60/50 | 90.6% | 82.9% | 9.4% | 65.5% | 81.3%
70/40 | 81.0% | 82.6% | 19.0% | 66.3% | 79.5%
80/30 | 50.2% | 85.2% | 49.8% | 68.3% | 76.8%
90/20 | 13.1% | 94.1% | 86.9% | 73.5% | 76.2%
95/15 | 4.3% | 98.7% | 95.7% | 80.0% | 80.8%
99/11 | 1.3% | 99.99% | 98.7% | 93.3% | 93.4%

You can see that all the columns except for the last one (overall set win %) trend in the same direction. As the "servebotness" increases, the favorite's chances of winning non-tiebreak sets go up, and their chances of winning tiebreaks go up too. But because the underdog has a better chance in TB sets across every row, they can benefit overall if there's a higher chance of getting there (to a point).
 

schmke

Legend
Awesome, seems the Markov-chain probability code is working!

One scenario that would be interesting is re-running the quoted simulations using old-school sets where no tiebreaker is used - just keep simulating regular games until someone is up by two. I wonder if the down-then-up trend would still be there, or if that result is just because of the tiebreaker?
I can do that! I have an option to not use tie-break sets.

Just for fun, I did a simulation where each server won 90% of their service points. The results vary, but hundreds of games to even thousands occur in the set in some cases. At 80% each there are a few normal looking sets but quite a few still in the 20s or higher. The highest I saw was 77-75.

But taking your table again and changing my column to be with no tie-breaks:

55/55 -> 81.5% - 82.2%
60/50 -> 81.3% - 82.1%
70/40 -> 79.5% - 81.4%
80/30 -> 76.8% - 84.2%
90/20 -> 76.2% - 93.9%
95/15 -> 80.8% - 98.7%
99/11 -> 93.4% - 99.9923%

So the inflection point moves, but still goes down initially then up. And in the last case, there are a lot of sets that go to the many thousands of games, but the most common single set score is still 6-3.
 

schmke

Legend
It is worth noting that in your scenarios, while averaging the points won percentage for each server comes out to 0.55, the points won/lost doesn't necessarily come out to 55% due to there being more/fewer points played on each player's serve in the different scenarios.

Going back to using tie-breaks and adding the actual points won percentage:

55/55 -> 81.5% - 55.0%
60/50 -> 81.3% - 55.1%
70/40 -> 79.5% - 54.9%
80/30 -> 76.8% - 54.0%
90/20 -> 76.2% - 53.3%
95/15 -> 80.8% - 53.2%
99/11 -> 93.4% - 53.3%

This makes sense as player 1's games are ever so slightly easier as the percentages go up so slightly fewer points are played on their serve on average and more on player 2's, so the percentage of points won overall by player 1 drops slightly.
 

Moon Shooter

Hall of Fame
It's also interesting to look at the probability distribution of different final scores, and there are some results you might not expect. For example, in the case that every point (and thus every game) of a set is a 50/50 coin flip, what is the most likely final score of a set?

You might guess that 7-6 is the most likely score in this perfectly even matchup, but it's actually 6-4, which is twice as likely as 7-6. Even 6-2 is more likely to happen than 7-6. Here's the breakdown by rank order of likelihood:

1. 6-4 (24.6%)
2. 6-3 (21.9%)
3. 6-2 (16.4%)
t4. 7-6 (12.3%)
t4. 7-5 (12.3%)
6. 6-1 (9.4%)
7. 6-0 (3.1%)

I almost hate to say it but this seems to give some weight to WTN’s claim counting sets is just as accurate as counting games. In players of absolute equal ability a more lopsided score of 6-2 is more likely than a score of 7-6. Or am I reading too much into that?
 

schmke

Legend
I almost hate to say it but this seems to give some weight to WTN’s claim counting sets is just as accurate as counting games. In players of absolute equal ability a more lopsided score of 6-2 is more likely than a score of 7-6. Or am I reading too much into that?
Sure, you can hand pick two set results to compare and make a true statement, but I don't think that reflects the whole.

Yes, 6-2 slots in a little higher than 7-6, but that is simply because 7-6 requires threading the needle to be tied at 6-6 to have the tie-break. With even players, there are lots of ways to finish the set before that.

I think it is more valuable to look at the sum of the chances of close sets versus not close, and I'll use one break of serve as indicating close.

In the case of player 1 serving first, that is any score of 6-3, 6-4, 4-6, 7-5, 5-7, 7-6, 6-7, and using my numbers for this scenario (which match @TennisOTM's) that is over 60% of the sets. So you'd expect sets between equal players to be close more often than not, so when they aren't, that can be meaningful, especially if they aren't consistently.

It is also important to note that for the most part, matches aren't a single set. There are a lot more permutations of "close" in this case to do a simple analysis, but perhaps I'll work something up to see what percentage are close in an actual full match.
 

TennisOTM

Professional
I almost hate to say it but this seems to give some weight to WTN’s claim counting sets is just as accurate as counting games. In players of absolute equal ability a more lopsided score of 6-2 is more likely than a score of 7-6. Or am I reading too much into that?
It is also important to note that for the most part, matches aren't a single set. There are a lot more permutations of "close" in this case to do a simple analysis, but perhaps I'll work something up to see what percentage are close in an actual full match.
Yes, I think the game score will be more informative for a multi-set match and certainly for the sum over multiple matches. In the coinflip matchup a 6-2 set is just as likely as 2-6. Put another way, the loser of the first set in the coinflip matchup has a 50% chance or higher of doing better in the second set.
 
I think it's still informative, because ability levels are wide enough that there are a lot of 6-0, 6-1, and 6-2 sets in USTA. If someone joins and loses their first three matches in straight sets, that might not tell you how good they are (just that they're worse than their opponents). The game scores tell you more: if they lost those sets 6-0, 6-2, 6-1, 6-2, 6-0, you know they're way worse than all their opponents, but if their losses were 6-3, 7-6, 7-6, 6-4, 7-5, 6-2, then you know they're probably pretty close and at the right level, and just need a bit more match practice.

I agree that this means that a single 6-3 set isn't that informative - maybe a 6-3 set is two very even players with someone winning a set by one break, maybe a 6-3 set is a mismatch where the lower player got a lucky start - but the rating system should accumulate information a good bit faster if it includes games.
 

schmke

Legend
I did a million-iteration simulation of best-2-of-3-set matches between two players with 50% point win percentages.

Overall there were 1,470 different score combinations.

Some observations about the matches being close:
  • 27.8% had at least one set that was 7-6
  • 27.9% had at least one set that was 7-5
  • 50.1% had at least one set that was 7-6 or 7-5
  • 50.1% had at least one set that was 6-4
  • 80.5% had at least one set that was 7-6, 7-5, or 6-4
  • 94.6% had at least one set that was 7-6, 7-5, 6-4, or 6-3
What about more lopsided set scores?
  • 35.9% had at least one set that was 6-2
  • 21.8% had at least one set that was 6-1
  • 7.6% had at least one set that was 6-0
  • 56.8% had at least one set that was 6-2, 6-1, or 6-0
 

jm1980

Talk Tennis Guru
FYI there is an excellent page here showing the pure math of how the probability of winning points correlates with the probability of winning games, sets, and matches. No massive simulations required:

 

schmke

Legend
FYI there is an excellent page here showing the pure math of how the probability of winning points correlates with the probability of winning games, sets, and matches. No massive simulations required:

Agree, excellent page looking at the math of it.

I happen to like the simulations as I can record the different ways one gets to winning a match and look at all of those very easily. You could do every scenario with the mathematical approach too, but that would be a lot of work looking at all the permutations.
 

schmke

Legend
I did a million iteration simulation of best 2 of 3 set matches between two players with 50% point win percentages.

Overall there were 1,470 different score combinations

Some observations about the matches being close:
  • 27.8% had at least one set that was 7-6
  • 27.9% had at least one set that was 7-5
  • 50.1% had at least one set that was 7-6 or 7-5
  • 50.1% had at least one set that was 6-4
  • 80.5% had at least one set that was 7-6, 7-5, or 6-4
  • 94.6% had at least one set that was 7-6, 7-5, 6-4, or 6-3
What about more lopsided set scores?
  • 35.9% had at least one set that was 6-2
  • 21.8% had at least one set that was 6-1
  • 7.6% had at least one set that was 6-0
  • 56.8% had at least one set that was 6-2, 6-1, or 6-0
I forgot another measure of close, whether the match goes three sets or not. That number is 49.9%. Adding these to the 7-6, 7-5, 6-4, or 6-3 grouping inches that up to 95.8%.
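
For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch. It is not the code behind the numbers above. It assumes one point-win probability `p` for every point (no serve advantage), a tiebreak at 6-6 in every set including the third, and that a score such as 7-6 counts regardless of which player won that set. All function names are made up for illustration.

```python
import random
from collections import Counter

def race(p, target, rng):
    """Play points until someone has `target` points and leads by two.
    Returns True if the player (point-win probability p) wins the race."""
    a = b = 0
    while True:
        if rng.random() < p:
            a += 1
        else:
            b += 1
        if max(a, b) >= target and abs(a - b) >= 2:
            return a > b

def play_set(p, rng):
    """Return the final set score as (player_games, opponent_games)."""
    a = b = 0
    while True:
        if a == 6 and b == 6:
            # Assumption: first-to-7, win-by-two tiebreak at 6-6.
            return (7, 6) if race(p, 7, rng) else (6, 7)
        if race(p, 4, rng):  # a game is first to 4 points, win by two
            a += 1
        else:
            b += 1
        if max(a, b) >= 6 and abs(a - b) >= 2:
            return (a, b)

def play_match(p, rng):
    """Best 2 of 3 sets; returns the list of set scores."""
    sets = []
    while True:
        sets.append(play_set(p, rng))
        won = sum(a > b for a, b in sets)
        if won == 2 or len(sets) - won == 2:
            return sets

def close_match_rates(n_matches=20000, p=0.5, seed=0):
    """Share of matches containing at least one set of each score,
    plus the share of matches that went three sets."""
    rng = random.Random(seed)
    has_score = Counter()
    three_setters = 0
    for _ in range(n_matches):
        sets = play_match(p, rng)
        # Count each score once per match, ignoring who won the set.
        for score in {(max(s), min(s)) for s in sets}:
            has_score[score] += 1
        three_setters += len(sets) == 3
    rates = {score: n / n_matches for score, n in has_score.items()}
    return rates, three_setters / n_matches

rates, three_set_rate = close_match_rates()
for score in sorted(rates, reverse=True):
    print(score, round(100 * rates[score], 1))
print("three sets:", round(100 * three_set_rate, 1))
```

With `p=0.5` the shares should land close to the percentages listed above, within sampling noise. Raising `n_matches` tightens the estimates at the cost of run time.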
 

TennisOTM

Professional
Agree, excellent page looking at the math of it.

I happen to like the simulations as I can record the different ways one gets to winning a match and look at all of those very easily. You could do every scenario with the mathematical approach too, but that would be a lot of work looking at all the permutations.
The Markov chain model does that too without needing simulations. In a split second it generates a matrix of probabilities of reaching every possible final score from every possible starting score. Beautiful math at work! The guy who wrote the code explains it a bit here:


The only thing that takes a bit more work is setting up the inputs to handle different kinds of scoring formats.
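
To give a feel for how that works at the smallest scale, here is a rough Python sketch for a single game. It is not the linked author's matrix code: it solves the same chain by recursion over the point score, and it assumes the point-win probability `p` stays constant. The function name and the `start` parameter are made up for illustration. Points are counted 0 to 3 for love, 15, 30, and 40, so `(3, 3)` is deuce.

```python
from functools import lru_cache

def game_win_prob(p, start=(0, 0)):
    """Chance of winning a game from the point score `start`.

    `start` must have both counts between 0 and 3; advantage scores
    are folded into the deuce state (3, 3)."""
    q = 1.0 - p
    # From deuce: win two in a row, lose two in a row, or return to deuce.
    deuce = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def win(a, b):
        if a == 3 and b == 3:
            return deuce
        if a == 4:
            return 1.0
        if b == 4:
            return 0.0
        # One step of the chain: win the next point or lose it.
        return p * win(a + 1, b) + q * win(a, b + 1)

    return win(*start)

for p in (0.50, 0.55, 0.60):
    print(p, round(game_win_prob(p), 3))

# Chance of still winning the game from 0-40 down at 60% per point.
print(round(game_win_prob(0.60, start=(0, 3)), 3))
```

Winning 55% of points already comes out to winning roughly 62% of games, which is the amplification the whole thread is about. Sets and matches stack the same kind of recursion on top of the game result.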
 

travlerajm

Talk Tennis Guru
I had not thought of that - all the calculations I did using the Markov model assumed that the favorite served first. However, I just tried the reverse and it did not actually matter. When the favorite receives in the first game, their probability of winning the set or match is always exactly the same as when they serve first.

It does matter for the probability of different set scores. When the favorite serves first you are more likely to get a set score of 6-3, and when the underdog serves first you are more likely to get a set score of 6-4. But the probability of winning the set (by any score) is the same in both cases.
There was an article published that found the first game of a match has a significantly higher chance of being a break compared to the rest of the games. Given that, the conclusion was that electing to receive is a smart choice giving a slight statistical edge.
 

TennisOTM

Professional
There was an article published that found the first game of a match has a significantly higher chance of being a break compared to the rest of the games. Given that, the conclusion was that electing to receive is a smart choice giving a slight statistical edge.
Yeah, while the model says there is no difference, there are lots of reasons why the assumptions of the model, like the service point winning probability being constant throughout the match, might not be true. Could be different situational patterns for different players, like different psychological effects of playing while ahead or behind in the score.
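
The model's "no difference" result is easy to check with a forward pass over set scores. This is a rough Python sketch, not the linked Markov code. It takes game-level hold probabilities and a tiebreak win probability as given inputs, pushes probability mass from 0-0 through every reachable score, and collects the final scores. The function names and the 0.80 / 0.70 / 0.55 inputs are arbitrary illustrations, not numbers from any real matchup.

```python
from collections import defaultdict

def set_score_distribution(hold_a, hold_b, tiebreak_a, a_serves_first=True):
    """Return {(games_a, games_b): probability} over final set scores.

    hold_a, hold_b: chance each player holds serve (assumed constant).
    tiebreak_a: chance player A wins a tiebreak played at 6-6."""
    live = {(0, 0): 1.0}          # probability mass on unfinished scores
    final = defaultdict(float)    # probability mass on finished sets
    while live:
        nxt = defaultdict(float)
        for (a, b), mass in live.items():
            # The first server serves the even-numbered games (0, 2, 4, ...).
            a_serving = ((a + b) % 2 == 0) == a_serves_first
            win_a = hold_a if a_serving else 1 - hold_b
            for score, m in (((a + 1, b), mass * win_a),
                             ((a, b + 1), mass * (1 - win_a))):
                if score == (6, 6):
                    final[(7, 6)] += m * tiebreak_a
                    final[(6, 7)] += m * (1 - tiebreak_a)
                elif max(score) >= 6 and abs(score[0] - score[1]) >= 2:
                    final[score] += m
                else:
                    nxt[score] += m
        live = nxt
    return dict(final)

def set_win_prob(dist):
    """Total probability of the scores where player A won the set."""
    return sum(p for (a, b), p in dist.items() if a > b)

fav_first = set_score_distribution(0.80, 0.70, 0.55, a_serves_first=True)
dog_first = set_score_distribution(0.80, 0.70, 0.55, a_serves_first=False)
print("set win:", set_win_prob(fav_first), set_win_prob(dog_first))
print("6-3:", fav_first[(6, 3)], dog_first[(6, 3)])
print("6-4:", fav_first[(6, 4)], dog_first[(6, 4)])
```

The two set-win totals agree, while 6-3 gets more weight when the favorite serves first and 6-4 gets more when the underdog does. That only holds under the constant-probability assumption, which is exactly the part real matches may violate.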

This article describes a finding that pro players won slightly more often when serving first, though the reason is unclear:

 

socallefty

G.O.A.T.
At rec level, there are usually multiple breaks in a set and rarely are sets decided by one break. So while there may be a higher chance of being broken in the first game, most likely you will break back and keep the set even till the late stages against a closely matched opponent. Then the question becomes whether you want to serve first at 4-4, 5-5 or second at 4-5, 5-6. I would prefer serving first in the late stages of a set as if you serve 2nd, there is a lot more pressure if you are down 0-15, 15-30 as you know a couple of lost points can cost you the set and any breakpoint is a set point against you.

Also if you are serving first and there are no breaks, it seems mentally easier to be up 1-0, 2-1, 3-2, 4-3 etc. and try to break serve rather than be down 0-1, 1-2, 2-3, 3-4 etc. and try to hold serve.
 

schmke

Legend
So if I could improve to win an extra 5% of points I would probably move up more than a level. Is that right? I would win 81% of sets and 91% of matches against my current/old self?
Yep, if you win 55% of your service points and 55% of return points, you win a best-2-of-3-set match 91% of the time.

If you instead assume that against yourself there is some advantage to serving, say you were winning 60% of service points so the change is to 65% serving and 45% receiving, it drops slightly to 90%.
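
For anyone who wants to try other serve/return splits, here is a minimal Python sketch of the exact calculation. It is not the code used for the numbers above. It assumes constant point-win probabilities on serve (`ps`) and on return (`pr`), a first-to-7 tiebreak at 6-6 in every set, a best-2-of-3 match, and that the set win probability is the same whoever serves first (which is what the model says, as noted earlier in the thread). All function names are made up for illustration.

```python
from functools import lru_cache

def game(p):
    """Chance of winning a game when winning each point with probability p."""
    q = 1 - p
    deuce = p * p / (p * p + q * q)
    return p**4 * (1 + 4 * q + 10 * q * q) + 20 * p**3 * q**3 * deuce

def tiebreak(ps, pr):
    """Chance of winning a tiebreak when serving its first point."""
    @lru_cache(maxsize=None)
    def win(a, b):
        if a == 6 and b == 6:
            # From 6-6 each pair of points has one serve apiece.
            return ps * pr / (ps * pr + (1 - ps) * (1 - pr))
        if a == 7:
            return 1.0
        if b == 7:
            return 0.0
        n = a + b
        # Serve order: one point, then alternate every two points.
        serving = ((n + 1) // 2) % 2 == 0
        p = ps if serving else pr
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)
    return win(0, 0)

def set_win(ps, pr):
    """Chance of winning a set when serving the first game."""
    hold, brk, tb = game(ps), game(pr), tiebreak(ps, pr)

    @lru_cache(maxsize=None)
    def win(a, b):
        if a == 6 and b == 6:
            return tb
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        g = hold if (a + b) % 2 == 0 else brk
        return g * win(a + 1, b) + (1 - g) * win(a, b + 1)
    return win(0, 0)

def match_win(ps, pr):
    """Best 2 of 3: win in straight sets, or split the first two and win the third."""
    s = set_win(ps, pr)
    return s * s * (3 - 2 * s)

for ps, pr in ((0.55, 0.55), (0.65, 0.45)):
    print(ps, pr, round(set_win(ps, pr), 3), round(match_win(ps, pr), 3))
```

The two printed rows can be compared against the figures quoted above. A match tiebreak in place of a third set would need a different last step in `match_win`.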
 