# Statistics Project, Need Help!

Discussion in 'Archive Reports & Interviews' started by Filipinotennis, Apr 10, 2016.

1. ### Filipinotennis (New User)

Joined:
Mar 30, 2011
Messages:
53
Hey everyone, my statistics professor is assigning a project where I can choose any topic I like. For my project I am trying to determine whether or not racquet brand affects the number of unforced errors hit in an ATP match. I need help finding old match statistics. I know the ATP website puts up archived stats like 1st serve percentage, aces, etc. but unforced errors are not listed. Do you guys have any suggestions? Any help would be greatly appreciated!

2. ### teekaywhy (Semi-Pro)

Joined:
Jun 19, 2013
Messages:
780
I think you may have chosen to explore a tough one.
You've got several issues to deal with and I'm not sure how you would word your null hypothesis in such a way as to not introduce an easy way to refute your analysis.
#1. Tennis is a two-person sport (or four if you play doubles). Other factors are involved, not the least of which is the opponent, who can induce unforced errors (hereafter referred to as UFE).
#2. Players don't always use the brand the paintjob suggests, and I don't think the ATP tracks that sort of thing anyway.
#3. Court surface. Pro players play on all 4 major surfaces. Would you limit yourself to one surface? This is significant because grass has its own timing challenges, as does clay.
#4. Your sample size may not be helpful even if you could find a player who switched racquet brands halfway through their career. The chance that differences in the player themselves go unaccounted for is too great.

I think I'd approach your experiment with either a different sport (baseball is the first that comes to mind) or a narrower, more manageable hypothesis, like service game hold percentage of lefties vs. righties on clay, to support or refute a correlation between lefty serve efficacy and one particular surface. That could then be extrapolated into any number of conclusions.

Good luck!
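For anyone who wants to try the lefty vs. righty hold percentage idea, here is a minimal sketch of a two-proportion z-test in plain Python. The counts are made up purely for illustration (they are not real ATP numbers), and the function name `two_proportion_z_test` is just a placeholder. It also treats every service game as independent, which is a simplification since games by the same player are related.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under the null hypothesis both groups share one hold rate,
    # so the standard error uses the pooled proportion.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: service games held / service games played on clay.
lefty_held, lefty_games = 410, 500
righty_held, righty_games = 780, 1000

z, p_value = two_proportion_z_test(lefty_held, lefty_games,
                                   righty_held, righty_games)
print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
```

The null hypothesis here would be "lefties and righties hold serve at the same rate on clay", and a small p-value would count as evidence against it.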

3. ### MathGeek (Professional)

Joined:
Jul 28, 2014
Messages:
1,366
Don't pick a project until you secure access to the data to do it.

You can never be sure of getting the data you need until you have it.

4. ### young prodigy (Rookie)

Joined:
Sep 11, 2016
Messages:
154
Also, racket brand is categorical data, so it can't necessarily be compared to unforced errors, which are quantitative. And correlation doesn't imply causation.

5. ### MathGeek (Professional)

Joined:
Jul 28, 2014
Messages:
1,366
It is always true that correlation does not imply causation.

But most of the time the scientific method works by disproving hypotheses, not by proving them. The longer a hypothesis survives and the more tests fail to disprove it, the more likely there is to be something to it.

So while correlation does not imply causation, the absence of correlation does disprove a hypothesis of causation in a properly designed experiment. Experiments that disprove or fail to disprove an interesting hypothesis are themselves relevant and interesting and commonly published. In contrast, I almost always recommend rejection of papers that claim to have proved something. Failing to disprove something is support, not proof.

In most cases, a strong correlation can be said to support a hypothesis in a well-designed experiment.

Racquet brand is categorical data, and in most cases of sporting equipment, it is better to use exact make and model. However, a lot of pro equipment is so specialized that make and model are no longer meaningful, which would then require an experimental design that defined some class features that could be known and quantified. But in most cases, the data will not be available for this design.

Most well-designed stats experiments hope that potential confounding factors (like athlete ability in this case) cancel each other out through equivalent pools and good sample sizes. I doubt that will be the case with sporting equipment, because in most cases one manufacturer spends a lot more money paying athletes to use its equipment, so the pool of players using one brand is not comparable to the pool of players using another brand.
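To make the non-comparable pools point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the brands "X" and "Y", the 0 to 9 skill scale, and the error formula are invented, and brand is given no effect on errors at all. Brand X simply signs more of the stronger players.

```python
from statistics import mean

def simulated_ufe(skill):
    # Toy model: unforced errors depend on skill only, never on brand.
    return 40 - 2 * skill

players = []
for skill in range(10):
    # Two players per skill level. Hypothetical sponsorship pattern:
    # brand X signs both top players, splits the middle, skips the bottom.
    if skill >= 7:
        pair = ("X", "X")
    elif skill >= 3:
        pair = ("X", "Y")
    else:
        pair = ("Y", "Y")
    for brand in pair:
        players.append({"brand": brand, "skill": skill,
                        "ufe": simulated_ufe(skill)})

def mean_ufe(rows, brand):
    return mean(r["ufe"] for r in rows if r["brand"] == brand)

# Naive comparison: average errors per brand, ignoring skill.
naive_gap = mean_ufe(players, "X") - mean_ufe(players, "Y")

# Stratified comparison: only compare players of equal skill.
within_skill_gaps = []
for skill in range(10):
    stratum = [r for r in players if r["skill"] == skill]
    if {r["brand"] for r in stratum} == {"X", "Y"}:
        within_skill_gaps.append(mean_ufe(stratum, "X") - mean_ufe(stratum, "Y"))

print(f"naive gap (X minus Y): {naive_gap:.1f}")
print(f"gaps within equal-skill pairs: {within_skill_gaps}")
```

The naive comparison makes brand X look like it reduces errors, while the gap disappears once players of equal skill are compared. With real data, skill is not cleanly observable, which is exactly the problem.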

6. ### Topspin Shot (Legend)

Joined:
Jul 24, 2009
Messages:
6,465
I know this is an old thread, but you can definitely compare categorical to continuous data. Look up dummy variables. Also, the big issue with the OP's idea is less the correlation-causation thing and more that there's no way racket brand affects unforced errors.
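For reference, a minimal sketch of the dummy variable idea in plain Python. The brand labels and error counts are invented for illustration (not real match data), and the helper names are placeholders. Each non-baseline brand gets its own 0/1 column, and ordinary least squares is fit through the normal equations.

```python
# Made-up illustrative numbers, NOT real ATP data.
brands = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
ufe = [30.0, 34.0, 32.0, 28.0, 26.0, 30.0, 35.0, 37.0, 33.0]

def dummy_encode(labels, baseline):
    """Return the non-baseline levels and rows of [intercept, dummies...]."""
    levels = sorted(set(labels) - {baseline})
    rows = [[1.0] + [1.0 if lab == lev else 0.0 for lev in levels]
            for lab in labels]
    return levels, rows

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

levels, X = dummy_encode(brands, baseline="A")
coef = ols(X, ufe)

print(f"baseline (brand A) mean UFE: {coef[0]:.2f}")
for level, c in zip(levels, coef[1:]):
    print(f"brand {level} minus brand A: {c:+.2f}")
```

With a single categorical predictor, the intercept is the baseline brand's mean and each dummy coefficient is that brand's mean minus the baseline mean, so the regression amounts to a comparison of group means. A fit like this only describes the data; it does nothing about the comparability issues raised earlier in the thread.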