Going for Gold

The Olympics is one of the largest intercultural events in the world. With its long history and wide popularity, there is plenty of data to explore; our dataset has over 120 years’ worth of athlete observations. We want to explore how certain athlete characteristics relate to winning gold medals. For example, do characteristics such as an athlete’s sex, weight, and country of origin affect the likelihood of winning a gold medal? Do these relationships vary by country, sport, or geographic location? Can we link a country’s weather and culture to its success in certain sports? And how do these findings change over time? We intend to investigate these questions to see whether we can discover a pattern among the types of athletes who win gold medals.

Our dataset consists of 13 variables: 3 quantitative and 10 categorical. The variables include the athlete’s ID, name, sex, age, height, weight, country, sport, event, and which medal, if any, they won. We also have the year, season (winter/summer), and host city of the Olympics in which each athlete competed.
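To make the variable breakdown concrete, here is a minimal sketch of one grouping that is consistent with the counts above. The text only gives the totals (3 and 10), so the assignment of each variable to a group is an assumption: it treats age, height, and weight as quantitative and everything else, including year, as categorical. The snake_case variable names and the `variable_kind` helper are hypothetical and are not the column names in the Kaggle file.

```python
# Assumed grouping: the write-up states only the counts (3 quantitative,
# 10 categorical), so which variable lands in which group is a guess.
QUANTITATIVE = ["age", "height", "weight"]
CATEGORICAL = [
    "id", "name", "sex", "country", "sport", "event",
    "medal", "year", "season", "host_city",
]


def variable_kind(name):
    """Return 'quantitative' or 'categorical' for a known variable name."""
    if name in QUANTITATIVE:
        return "quantitative"
    if name in CATEGORICAL:
        return "categorical"
    raise KeyError(f"unknown variable: {name}")


total_variables = len(QUANTITATIVE) + len(CATEGORICAL)
```

Under this reading, year and ID are labels that identify a Games or an athlete rather than measurements, which is why they sit with the categorical variables.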

We obtained this dataset from Kaggle.com, and we have not found anything in it that needs cleaning: all of the entries are consistent with the variables they are recorded under. Two variables will not be used in our analysis because they will not help us answer our questions. We are keeping the ID variable because it simplifies our analysis. Athletes’ names could make the analysis messy and difficult to read, whereas ID keeps it clean; if we need to know an athlete’s name, we can match the ID back to the athlete.
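The idea of working with ID and looking names up only when needed can be sketched as follows. This is an illustration under stated assumptions, not our actual analysis code: the rows are made up, the column names (`ID`, `Name`, `Medal`, and so on) are assumed rather than copied from the Kaggle file, and `split_names` is a hypothetical helper.

```python
import csv
import io

# Made-up rows for illustration only; they are not taken from the real dataset.
SAMPLE_CSV = """ID,Name,Sex,Weight,Country,Sport,Medal
1,Athlete A,F,60,Norway,Cross Country Skiing,Gold
1,Athlete A,F,60,Norway,Biathlon,NA
2,Athlete B,M,82,Kenya,Athletics,Gold
3,Athlete C,M,75,Canada,Ice Hockey,Silver
"""


def split_names(rows):
    """Return (id_to_name, analysis_rows), with Name removed from analysis_rows."""
    id_to_name = {}
    analysis_rows = []
    for row in rows:
        # Keep one name per ID so it can be recovered later.
        id_to_name[row["ID"]] = row["Name"]
        # Copy every column except Name into the analysis table.
        analysis_rows.append({k: v for k, v in row.items() if k != "Name"})
    return id_to_name, analysis_rows


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
id_to_name, analysis_rows = split_names(rows)

# Work with IDs only, then match back to names when a result needs them.
gold_ids = sorted({row["ID"] for row in analysis_rows if row["Medal"] == "Gold"})
gold_names = [id_to_name[i] for i in gold_ids]
```

The analysis table never carries the name column, so output stays compact, while the separate lookup keeps every name recoverable from its ID.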

https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results