3. Overview
Dataset: Olympics dataset covering 120 years
This dataset contains 271,116 samples in total.
Each sample includes age, height, weight, year, sport, country, medal, and other attributes.
Client: SportsStats, a sports analysis firm
Based on our analysis, the client may be able to obtain interesting findings about Olympic
medalists and athletes. This information should provide insight into how the athletic
organization in each country can improve the performance of its members in order to
achieve good results at the Olympics.
4. Questions
1. What results has each country achieved?
   1. Number of athletes from each country
   2. Number of medalists from each country
   3. Ratio of medalists to athletes in each country
2. How have the Olympics changed over time?
   1. Number of athletes in each Olympic Games
   2. Ratio of female to male athletes in each Games
   3. Number of sports in each Games
3. To what extent do physical characteristics influence each sport?
   1. Age
   2. Height
   3. Weight
5. Initial Hypotheses
1. What results has each country achieved?
   1. I think the United States or China has the largest number of both athletes and medalists.
   2. The ratio of medalists to athletes should not differ greatly between countries, although the ratio could be close to zero for a number of countries.
2. How have the Olympics changed over time?
   1. Number of athletes in each Olympic Games: This should have increased since the first Games, mainly due to population growth.
   2. Ratio of female to male athletes in each Games: This should also have increased.
   3. Number of sports in each Games: This might not have changed drastically, because some sports have been excluded while others have been newly adopted.
3. To what extent do physical characteristics influence each sport?
   1. Age: I believe that no particular age has an impact on performance.
   2. Height: This would positively affect outcomes in some sports, such as basketball.
   3. Weight: I believe that competitions are divided by weight in most cases, so we might not be able to get interesting findings.
6. Approach
All the questions can be answered with simple SQL queries using aggregate functions.
COUNT will be used for questions 1 and 2.
AVG and VAR/STDEV will be used for question 3 -> if the variance/standard deviation for a
certain sport is small compared with other sports, it might be possible to argue that the
corresponding factor affects performance in that sport.
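As a minimal sketch of the question 3 approach, the snippet below runs an AVG/variance query through Python's built-in `sqlite3` module on a few made-up rows. The table name `athlete_events` and its columns (`ID`, `Sex`, `Sport`, `Age`) are assumptions about how the dataset is loaded, not a documented schema. SQLite has no built-in STDEV function, so the query computes the population variance as AVG(x*x) - AVG(x)*AVG(x) and the square root is taken in Python.

```python
import math
import sqlite3

# Made-up rows for illustration only: (ID, Sex, Sport, Age).
# The table and column names are assumptions, not the dataset's documented schema.
rows = [
    (1, "M", "Archery", 20), (2, "M", "Archery", 40), (3, "M", "Archery", 60),
    (4, "M", "Boxing", 22), (5, "M", "Boxing", 24), (6, "M", "Boxing", 26),
    (7, "F", "Archery", 30), (8, "F", "Archery", 34),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete_events (ID INTEGER, Sex TEXT, Sport TEXT, Age REAL)")
conn.executemany("INSERT INTO athlete_events VALUES (?, ?, ?, ?)", rows)

# Population variance via E[x^2] - E[x]^2, grouped by sex and sport,
# ordered so that the sports with the widest spread come first.
query = """
SELECT Sex, Sport,
       AVG(Age) AS mean_age,
       AVG(Age * Age) - AVG(Age) * AVG(Age) AS var_age
FROM athlete_events
WHERE Age IS NOT NULL
GROUP BY Sex, Sport
ORDER BY var_age DESC
"""
stats = {}
for sex, sport, mean_age, var_age in conn.execute(query):
    # Clamp tiny negative values that floating-point rounding can produce.
    stats[(sex, sport)] = (mean_age, math.sqrt(max(var_age, 0.0)))

for (sex, sport), (mean_age, sd_age) in stats.items():
    print(f"{sex} {sport}: mean={mean_age:.1f}, sd={sd_age:.1f}")
```

Swapping `Age` for `Height` or `Weight` gives the other two characteristics. If an athlete appears in several rows (for example, one per event), you may want to deduplicate before computing the spread so that multi-event athletes are not over-weighted.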
8. Descriptive Stats
Below are basic information about the main table and descriptive statistics for the columns with
numerical values.
-> These are the first things we should look at to get an overview of the data we chose.
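The same kind of summary (count, mean, standard deviation, quartiles) can be reproduced for any numerical column with Python's standard `statistics` module. This is a sketch on made-up ages, not the figures reported in this deck; `method="inclusive"` interpolates linearly between data points, which is only one of several common percentile conventions.

```python
import statistics

# Made-up values standing in for one numerical column (e.g. Age); not real data.
ages = [18, 21, 22, 24, 25, 26, 27, 29, 31, 37]

# Quartiles (25th, 50th, 75th percentiles) using linear interpolation.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "std": statistics.stdev(ages),  # sample standard deviation (n - 1)
    "min": min(ages),
    "25%": q1,
    "50%": q2,
    "75%": q3,
    "max": max(ages),
}
for name, value in summary.items():
    print(f"{name:>5}: {value}")
```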
9. Numbers by countries
The following are the top 20 countries in terms of medals and athletes, and descriptive statistics
for those categories.
-> Results by country are the most important outcome of the Olympics, and they relate directly to
the hypotheses set up above.
[Tables: Top 20 countries | Descriptive stats]
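A sketch of the COUNT queries behind the per-country numbers, again using `sqlite3` with made-up rows. The columns `ID`, `NOC`, and `Medal` are assumed, and non-medal rows are assumed to store NULL in `Medal`; if the raw file uses a placeholder string instead, it would need converting first.

```python
import sqlite3

# Made-up rows: (ID, NOC, Medal); Medal is NULL (None) for non-medalists.
# Column names are assumptions, not the dataset's documented schema.
rows = [
    (1, "USA", "Gold"), (2, "USA", None), (3, "USA", "Silver"),
    (4, "SWE", None), (5, "SWE", "Bronze"),
    (6, "NOR", None),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete_events (ID INTEGER, NOC TEXT, Medal TEXT)")
conn.executemany("INSERT INTO athlete_events VALUES (?, ?, ?)", rows)

# COUNT(DISTINCT ...) counts each athlete once even if they appear in several rows;
# the CASE expression yields NULL for non-medal rows, and COUNT ignores NULLs.
query = """
SELECT NOC,
       COUNT(DISTINCT ID) AS athletes,
       COUNT(DISTINCT CASE WHEN Medal IS NOT NULL THEN ID END) AS medalists
FROM athlete_events
GROUP BY NOC
ORDER BY medalists DESC, athletes DESC
LIMIT 20
"""
top = conn.execute(query).fetchall()
for noc, athletes, medalists in top:
    print(noc, athletes, medalists)
```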
10. Initial Findings
1. As for age, the mean is 25.6 years and the standard deviation is 6.4 years, so the bulk of athletes
(within one standard deviation of the mean) are roughly 19 to 32 years old. Therefore, it may be
possible to argue that athletes are most likely to perform at their best in their 20s.
2. Regarding year, the 25th, 50th, and 75th percentiles are 1960, 1988, and 2002, even though the data
covers 120 years from 1896. In other words, half of the samples come from 1988 or later, which
suggests that the number of athletes has grown substantially over time.
3. The US is by far the strongest in all categories, with Russia and Germany completing the top
three. Moreover, since several countries from colder regions such as Russia, Norway, and
Sweden appear, different results are expected for the Summer and Winter Olympics.
4. In terms of distribution, the standard deviations for silver and bronze medals are nearly
identical, whereas that for gold medals differs significantly. In addition, more than half of the
countries have won no gold medals and at most one silver and one bronze medal.
11. What we learned about the hypotheses
1. What results has each country achieved?
   1. The US is clearly the strongest country in the world, but China is not as dominant as I expected. China may have achieved notable results only recently, possibly for political or economic reasons.
   2. Ratio of medalists to athletes: At first glance, the number of medals appeared almost proportional to the number of athletes. However, the ratios of the US and Russia are clearly higher than those of other countries. This ratio should be calculated directly and examined further.
2. How have the Olympics changed over time?
   1. Number of athletes in each Olympic Games: Based on the percentiles of the year, this figure has increased, as I expected.
   2. Ratio of female to male athletes in each Games: Further calculation is needed.
   3. Number of sports in each Games: Further calculation is needed.
3. To what extent do physical characteristics influence each sport?
   1. Age: It is still unclear whether a specific age has an impact on performance, but it at least appears likely that athletes perform best in their 20s.
   2. Height: Further calculation is needed.
   3. Weight: Further calculation is needed.
13. Correlation between Athletes and Medals
The following table shows the correlation coefficients between the number of athletes and
each type of medal.
We tend to focus on the number of medals to measure the outcome for each country, but such
results are largely determined before the Olympic Games start, because they clearly correlate with
how many athletes each country can send to the Games.
Interestingly, the coefficient for gold medals is smaller than those for the other medals, so obtaining
a gold medal may require something more than simply sending more athletes.
14. Ratio of medalists to number of athletes
As a new metric, I calculated the ratio of medalists to athletes for each country, excluding
countries without any medals, so that we can compare the level of athletes across countries.
The maximum values are 3 to 5 times as high as the mean values.
-> Athletes from some countries are clearly more likely to obtain medals than those from other
countries.
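A sketch of the ratio calculation under the same assumptions as before (hypothetical `athlete_events` table with `ID`, `NOC`, `Medal`; NULL for no medal; made-up rows). The CAST avoids SQLite's integer division, and the WHERE clause drops countries without medals, as described above.

```python
import sqlite3
import statistics

# Made-up rows: (ID, NOC, Medal); Medal is NULL (None) for non-medalists.
rows = [
    (1, "USA", "Gold"), (2, "USA", "Silver"), (3, "USA", None), (4, "USA", None),
    (5, "SWE", "Bronze"), (6, "SWE", None), (7, "SWE", None), (8, "SWE", None),
    (9, "NOR", None), (10, "NOR", None),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete_events (ID INTEGER, NOC TEXT, Medal TEXT)")
conn.executemany("INSERT INTO athlete_events VALUES (?, ?, ?)", rows)

query = """
WITH per_country AS (
    SELECT NOC,
           COUNT(DISTINCT ID) AS athletes,
           COUNT(DISTINCT CASE WHEN Medal IS NOT NULL THEN ID END) AS medalists
    FROM athlete_events
    GROUP BY NOC
)
SELECT NOC, CAST(medalists AS REAL) / athletes AS ratio  -- avoid integer division
FROM per_country
WHERE medalists > 0  -- exclude countries without any medals
ORDER BY ratio DESC
"""
ratios = conn.execute(query).fetchall()
values = [ratio for _, ratio in ratios]

# How far the best country sits above the average ratio.
max_over_mean = max(values) / statistics.mean(values)
print(ratios)
print(f"max / mean = {max_over_mean:.2f}")
```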
15. Time Series - Summer
The number of athletes increased rapidly until the 1980s and has remained flat since then.
The number of sports has also increased, but not as fast as the number of athletes.
This suggests that the Summer Olympics may not grow any further. One reason might be the
physical limits of setting up venues.
[Figure panels: Number of athletes | Number of sports]
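The time series for both seasons can be produced by one parameterized GROUP BY query. This sketch assumes hypothetical columns `ID`, `Year`, `Season` (with values "Summer" and "Winter"), and `Sport`, and uses made-up rows; counting DISTINCT values avoids double-counting athletes who entered several events.

```python
import sqlite3

# Made-up rows: (ID, Year, Season, Sport); column names are assumptions.
rows = [
    (1, 1896, "Summer", "Athletics"), (2, 1896, "Summer", "Swimming"),
    (3, 1900, "Summer", "Athletics"), (4, 1900, "Summer", "Rowing"),
    (5, 1900, "Summer", "Rowing"), (6, 1924, "Winter", "Skiing"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete_events (ID INTEGER, Year INTEGER, Season TEXT, Sport TEXT)")
conn.executemany("INSERT INTO athlete_events VALUES (?, ?, ?, ?)", rows)


def series(season):
    """Return (year, athletes, sports) for every Games of the given season."""
    query = """
    SELECT Year, COUNT(DISTINCT ID) AS athletes, COUNT(DISTINCT Sport) AS sports
    FROM athlete_events
    WHERE Season = ?
    GROUP BY Year
    ORDER BY Year
    """
    return conn.execute(query, (season,)).fetchall()


summer = series("Summer")
winter = series("Winter")
print(summer)
print(winter)
```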
16. Time Series - Winter
Although the Winter Games are less than half the size of the Summer Games, they are still growing.
The number of sports has remained almost unchanged since the beginning.
The content of the Winter Games might not change drastically, but their size is likely to keep expanding.
[Figure panels: Number of athletes | Number of sports]
17. Time Series - Gender
The figures show the ratio of female to male athletes in each Games (%).
There appear to have been turning points in the 1930s and 1990s, when greater participation by
women was promoted.
It may also be possible to argue that women have more opportunities in the Winter Games.
[Figure panels: Summer | Winter]
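A sketch of the female-to-male ratio per Games, expressed as a percentage. It assumes hypothetical columns `ID`, `Sex` ("M"/"F"), `Year`, and `Season`, with made-up rows. NULLIF turns a zero male count into NULL, so such a Games yields a NULL ratio instead of a division by zero.

```python
import sqlite3

# Made-up rows: (ID, Sex, Year, Season); column names are assumptions.
rows = [
    (1, "M", 1900, "Summer"), (2, "M", 1900, "Summer"), (3, "M", 1900, "Summer"),
    (4, "M", 1900, "Summer"), (5, "F", 1900, "Summer"),
    (6, "M", 1924, "Winter"), (7, "M", 1924, "Winter"), (8, "F", 1924, "Winter"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete_events (ID INTEGER, Sex TEXT, Year INTEGER, Season TEXT)")
conn.executemany("INSERT INTO athlete_events VALUES (?, ?, ?, ?)", rows)

# Female athletes per 100 male athletes in each Games (distinct athletes only).
query = """
SELECT Year, Season,
       100.0 * COUNT(DISTINCT CASE WHEN Sex = 'F' THEN ID END)
             / NULLIF(COUNT(DISTINCT CASE WHEN Sex = 'M' THEN ID END), 0) AS female_pct
FROM athlete_events
GROUP BY Year, Season
ORDER BY Season, Year
"""
ratios = conn.execute(query).fetchall()
for year, season, pct in ratios:
    print(season, year, pct)
```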
18. Standard Deviation - Age
[Figure panels: Male | Female]
Even excluding old sports, people across a broader range of ages can play an active role in
sports such as Archery, Golf, and Shooting.
On the other hand, participants in Football, Boxing, Swimming, and similar sports fall within a
narrower age range.
19. Standard Deviation - Height
[Figure panels: Male | Female]
Apart from Basketball, sports that are divided into classes tend to have high standard deviations.
As is generally accepted, certain heights appear to give an advantage in Gymnastics.
20. Standard Deviation - Weight
[Figure panels: Male | Female]
Apart from sports that are divided into classes, those that do not require much movement allow
a wider range of weights.
In addition to Gymnastics, a particular weight range appears to give an advantage in some winter
sports.
21. Conclusions on the Hypotheses
1. What results has each country achieved?
   1. The US is clearly the strongest country in the world, but China is not as dominant as I expected. China may have achieved notable results only recently, possibly for political or economic reasons.
   2. Ratio of medalists to athletes: The number of medals is almost proportional to the number of athletes. However, the ratios of some countries are 3 to 5 times as high as the average.
2. How have the Olympics changed over time?
   1. Number of athletes in each Olympic Games: Growth has stopped for the Summer Games, but the number is still increasing for the Winter Games.
   2. Ratio of female to male athletes in each Games: The ratio is still increasing, and there were turning points in the 1930s and 1990s, when greater participation by women was promoted.
   3. Number of sports in each Games: For the Summer Games, the number of sports has increased, although not as fast as the number of athletes. For the Winter Games, it has remained almost unchanged since the beginning.
3. To what extent do physical characteristics influence each sport?
   1. Age
   2. Height
   3. Weight
   For all three characteristics, some sports allow a wide range of people to participate, while others do not.
22. Extra Analysis
The table below shows the correlation coefficients among the number of athletes, GDP per
capita, and population of each country.
Since the coefficient between athletes and GDP per capita is higher than that between athletes
and population, it may be possible to argue that GDP per capita is the more important factor.
In other words, the resources that a country can spare are likely to be more important than the
size of its population.
23. Summary
If a country wants to achieve good results at the Olympic Games, we can advise it to:
1. Focus on increasing the number of athletes it can send to the Games, although special efforts
may be necessary to win gold medals.
2. Train more female athletes.
3. Recognize that devoting more resources is likely to contribute directly to results.
4. Concentrate its resources on athletes whose physical characteristics suit their sports.