money_ball
Rookie
So I recently found this GitHub repository that contains CSV files for the data from the ATP website:
The CSV files currently in that repo have data from 1877 to 2016. I also wanted the 2017 data, so I downloaded the Python scripts and scraped that data myself, and then loaded all the CSV files into a local PostgreSQL database for my querying needs.
- Tournaments data (1877-2017): 4,407 records
- Match scores data (1877-2017): 188,384 records
- Match stats data (1991-2017): 91,956 records
- Rankings data (1973-2017): 2,606,950 records
- Players vital stats data (1877-2017): 10,912 records
Who knows how messed up the rest of the ATP data is?
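
For anyone who wants to sanity-check record counts like the ones above, here is a rough sketch of the load-and-count step. It is not the exact code I ran: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere, it assumes the CSV files have no header row, and the table name, column names, and sample rows are made up for illustration.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file, columns):
    """Load CSV rows into `table`; return (loaded, skipped) row counts.

    Every column is stored as TEXT. Rows whose field count does not match
    `columns` are skipped and counted rather than loaded.
    """
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)

    good, skipped = [], 0
    for row in csv.reader(csv_file):
        if len(row) == len(columns):
            good.append(row)
        else:
            skipped += 1

    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', good)
    conn.commit()
    return len(good), skipped


def count_rows(conn, table):
    """Return the number of rows currently in `table`."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Made-up sample rows (NOT real ATP data); the last line is deliberately short.
sample = io.StringIO(
    "1877,wimbledon,Grass\n"
    "1878,wimbledon,Grass\n"
    "1879,wimbledon\n"
)

conn = sqlite3.connect(":memory:")
loaded, skipped = load_csv(conn, "tournaments", sample, ["year", "slug", "surface"])
print(loaded, skipped, count_rows(conn, "tournaments"))
```

Counting the skipped rows separately is deliberate: with data this messy, a nonzero skip count is a quick hint about which files need a closer look.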