Abstract
Topic:
The relation between smoking and state of health/lifestyle.
Authors:
Alex Luojos, Axel Linnovaara, Ashwath Venkatasubramanian, Sergei Gordienko.
Aim of the project:
The aim of our project is to investigate the relationship between smoking and health state.
Hypotheses:
1. Currently smoking people are sick more often than former smokers and former smokers are
sick more often than never smokers.
2. Currently smoking people are the least physically active, former smokers are more physically
active but never-smoking people are the most physically active
3. Current smokers consume the most alcohol, former smokers consume the intermediate amount
and never smokers consume the least alcohol.
These hypotheses are based on everyday observations of smokers and non-smokers.
Our target group is people aged 18-30. The data will be collected with a questionnaire spread
through social networks, and we expect to collect data from approximately 100 people.
We will collect data regarding:
1. Age
2. Sex
3. Smoking habits
4. Weight/ height
5. Sport activity (h/week)
6. Cases of being sick (last 6 months)
7. Alcohol consumption (restaurant portions/week)
8. Personal judgment of health state (1-5)
9. Duration of smoking habit
The survey is included in the appendix of this project.
Introduction
Smoking is the single most important preventable health risk in the developed world. It is
also an important cause of premature death worldwide. Smoking causes a wide range of diseases,
including cancers, chronic obstructive pulmonary disease, coronary heart disease and stroke.
There is also great interest in studying unhealthy lifestyles and their causes, because health
problems related to smoking and alcohol are a major economic and social burden on modern
society. In our research we studied the relation between health state and smoking habits.
The main aim of the project was to understand the relationship between smoking and the health
state of an individual. We also studied whether smokers consume more alcohol. The study was
carried out using an internet survey that was sent to Erasmus and medical school students in
Tartu, and the responses were analyzed with the program “Stata”. To characterize the lifestyle
and health state of our sample, this information was collected through the same questionnaire.
Course of work and methods
To collect data, it was decided that an online survey would be the most appropriate as
many people can have access to it. Furthermore, opening the survey to a wide range of people
was intended to reduce bias by introducing a certain degree of randomization. The survey
comprised 10 questions designed to obtain the relevant information about the lifestyle and
smoking habits of the subjects. The survey was made and launched on the “surveymonkey.com”
platform and spread through social networks. Overall, 104 exchange and degree students of
Tartu University of different ages and nationalities took part in the survey. The raw data
were later processed, as described in the next section. Before that, however, the sample was
profiled. See figures 1-6.
Figure 1. Gender distribution of the sample.
Figure 2. Age distribution of the sample
Figure 3. Weight (in kg) distribution of the sample
Figure 4. Height (in cm) distribution of the sample
Figure 5. Smoking habit distribution of the sample
Figure 6. Health self-evaluation distribution amongst the sample
Health state evaluation by subjects.
1. My health state is dangerously poor (no results for this category).
2. My health state is poor but it is not a danger, however it still needs improvement.
3. My health state is not too bad however it is not too good either.
4. My health state is quite good.
5. My health state is extremely good, it can’t get much better.
After looking through the general profile of the sample, a more detailed analysis was
conducted in order to check the 3 hypotheses. Given the pattern of the data obtained, the main
methods applied were Fisher’s exact test and the Kruskal-Wallis test. More detailed information
about the data processing, including graphs and test results, can be found in the next section
of the report.
Data processing and results
To process our data we used the program “Stata”, which allowed us to process a large
amount of data quickly. Our first step was to determine which statistical tests suited our
data. We therefore created histograms of the variables that are part of our hypotheses to
inspect their distributions, since approximately normally distributed data would point to the
use of a t-test. Refer to figures 7, 8 and 9.
Figure 7. Histogram showing distribution of amount of times sick in the last 6 months
Figure 8. Histogram showing distribution of hours of exercise per week
Exercising habits of subjects.
1. 0-2 hours per week
2. 3-5 hours per week
3. 6-8 hours per week
4. 9-11 hours per week
5. More than 11 hours per week
Figure 9. Histogram showing distribution of alcohol consumption (portions per week)
As one can see from the graphs above, none of the variables is normally distributed, so a
t-test is not appropriate. It was therefore decided to use the Kruskal-Wallis test, along with
Fisher’s exact test for tabulated data. We use Fisher’s exact test rather than the chi-squared
test because our sample is small and the chi-squared approximation is unreliable when expected
cell counts are low.
The first hypothesis to be tested is that currently smoking people are sick more often than
former smokers and former smokers are sick more often than never smokers. For this the
Kruskal-Wallis test can be used. It tests whether several samples originate from the same
distribution, which here shows whether smoking status has a significant effect on the results.
It is similar to a t-test in that we seek to reject the null hypothesis in favor of the
alternative hypothesis. We reject the null hypothesis only when the p-value (the resulting
value from the test) is equal to or less than 0.05.
H0 (null hypothesis) - The hypothesis that there is no significant difference between
specified populations, any observed difference being due to sampling or experimental
error.
H1 (alternative hypothesis) - The hypothesis that the observations are the result of a
real effect. There is significance of the variable in question.
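As an illustration of how the Kruskal-Wallis statistic is obtained, the following Python sketch ranks the pooled observations, computes H with the usual correction for ties, and converts it to a p-value. It is not the Stata procedure used in this project: the function names and the sample values are hypothetical, and the closed-form p-value exp(-H/2) applies only when three groups are compared (2 degrees of freedom).

```python
import math
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Give each distinct value the average (mid) rank of its block of ties.
    midrank = {}
    start = 0
    while start < n:
        end = start
        while end < n and pooled[end] == pooled[start]:
            end += 1
        midrank[pooled[start]] = (start + 1 + end) / 2  # mean of ranks start+1..end
        start = end

    # H = 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), where R_i is a group's rank sum.
    rank_term = sum(
        sum(midrank[value] for value in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def p_value_three_groups(h):
    """Chi-squared upper tail with 2 degrees of freedom, which equals exp(-h / 2)."""
    return math.exp(-max(h, 0.0) / 2)


# Hypothetical counts of "times sick" for three groups (not the survey data).
never = [0, 1, 1, 2, 3]
former = [1, 1, 2, 2, 4]
current = [0, 2, 2, 3, 5]

h = kruskal_wallis_h([never, former, current])
p = p_value_three_groups(h)
print(f"H = {h:.3f}, p = {p:.3f}")
```

A p-value above 0.05 from such a calculation means the null hypothesis H0 cannot be rejected; a value at or below 0.05 means at least one group differs from the others.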
The p-value found by using the Kruskal-Wallis test of sickness by smoking was 0.3768.
Since the p-value > 0.05, we do not have the evidence to reject the null hypothesis: no
significant effect of smoking on the number of times fallen sick was found. A box and whisker
plot can also be generated to show the relationship between the two variables. See figure 10.
Figure 10. Box and whisker plot showing relationship between smoking and the number of times
fallen sick in the last 6 months.
X-axis (smoking): 0 - Never smokers, 1 - Former smokers, 2 - Current smokers
This box and whisker plot also shows no clear relationship, as the medians of all smoking
categories are about the same. If the hypothesis held, one would expect to see never smokers
with the lowest median, former smokers with the second highest median and current smokers
with the highest median.
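The quantities that a box and whisker plot displays can also be computed directly, which makes the comparison of medians explicit. The sketch below is a minimal illustration in Python: the group values are hypothetical (not the survey data), and `box_summary` is an assumed helper name.

```python
import statistics


def box_summary(values):
    """Five-number summary as drawn in a box and whisker plot."""
    # method="inclusive" interpolates between the observed values;
    # other software may use a slightly different quartile convention.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Hypothetical "times sick" values per smoking group (not the survey data).
times_sick = {
    "never smokers": [0, 1, 1, 2, 3],
    "former smokers": [0, 1, 1, 2, 2],
    "current smokers": [0, 0, 1, 2, 4],
}

for group, values in times_sick.items():
    print(group, box_summary(values))
```

When the medians of all groups coincide, as in this made-up example, the plot gives no support to an ordering between the groups.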
To make sure, we also used Fisher’s exact test on this hypothesis. However, as Fisher’s
exact test can be used only with categorical variables, we had to recode our sick variable
into two categories: fallen sick in the last 6 months and not fallen sick in the last 6
months. After this new variable was generated we tabulated the data. See table 1.
Table 1. Table showing frequency of falling sick amongst current smokers, former smokers and
never smokers.
Smoking               Not fallen sick, n (row %)   Fallen sick, n (row %)   Total, n (row %)
0 (never smokers)     15 (25.86)                   43 (74.14)                58 (100.00)
1 (former smokers)     3 (20.00)                   12 (80.00)                15 (100.00)
2 (current smokers)    8 (25.81)                   23 (74.19)                31 (100.00)
Total                 26 (25.00)                   78 (75.00)               104 (100.00)
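The recoding and tabulation steps can be sketched in Python as follows. The respondent records are hypothetical and the function names are assumptions made for illustration; only the counts in `table1` are taken from Table 1, and the row percentages computed from them match the ones shown there.

```python
from collections import Counter


def crosstab_sick(records):
    """Count respondents by (smoking code, fallen sick yes/no)."""
    counts = Counter()
    for smoking, times_sick in records:
        fallen_sick = 1 if times_sick > 0 else 0  # dichotomize the count
        counts[(smoking, fallen_sick)] += 1
    return counts


def row_percentages(row):
    """Express each cell of a row as a percentage of the row total."""
    total = sum(row)
    return [round(100 * n / total, 2) for n in row]


# Hypothetical records: (smoking code, times sick in the last 6 months).
records = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 3), (0, 1)]
print(crosstab_sick(records))

# Counts from Table 1: (not fallen sick, fallen sick).
table1 = {"never": (15, 43), "former": (3, 12), "current": (8, 23)}
for label, row in table1.items():
    print(label, row_percentages(row))
```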
Fisher’s exact test is similar to the Kruskal-Wallis test in that it is based upon a null
and an alternative hypothesis. The same rule applies: the p-value has to be equal to or less
than 0.05 for us to reject the null hypothesis and conclude that there is a statistically
significant effect.
The p-value from Fisher’s exact test is 0.952. This means that we do not have enough
evidence to reject the null hypothesis. Once again, therefore, no statistically significant
effect of smoking on the number of times one falls sick was found.
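Fisher’s exact test for a table with more than two rows can be sketched in Python as follows. This is an illustrative implementation of the standard exact test for an r x 2 table, which sums the probabilities of all tables with the same margins that are no more likely than the observed one. It is not the Stata routine used in this project, and the function name `fisher_exact_rx2` is hypothetical. The counts are those of Table 1, so the resulting p-value can be compared with the reported 0.952.

```python
from math import comb


def fisher_exact_rx2(rows):
    """Two-sided exact p-value for an r x 2 table given as (col0, col1) per row."""
    row_totals = [a + b for a, b in rows]
    col0_total = sum(a for a, _ in rows)

    def weight(col0_counts):
        # Unnormalized probability of a table with fixed margins (exact integers).
        w = 1
        for total, k in zip(row_totals, col0_counts):
            w *= comb(total, k)
        return w

    def tables(index, remaining):
        # Enumerate every way to spread `remaining` first-column counts over rows.
        if index == len(row_totals) - 1:
            if remaining <= row_totals[index]:
                yield [remaining]
            return
        for k in range(min(row_totals[index], remaining) + 1):
            for rest in tables(index + 1, remaining - k):
                yield [k] + rest

    observed = weight([a for a, _ in rows])
    extreme = sum(w for w in map(weight, tables(0, col0_total)) if w <= observed)
    return extreme / comb(sum(row_totals), col0_total)


# Table 1 counts: (not fallen sick, fallen sick) for never, former, current smokers.
p_table1 = fisher_exact_rx2([(15, 43), (3, 12), (8, 23)])
print(f"p = {p_table1:.3f}")
```

Integer arithmetic is used for the table weights so that tables exactly as likely as the observed one are counted reliably, without rounding problems.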
The next hypothesis that needs to be tested is that currently smoking people are the least
physically active, former smokers are more physically active but never-smoking people are the
most physically active. An immediate problem that we saw with the data about sport activity is
that only one person exercised more than 11 hours per week (5th category). Therefore, the 4th
and 5th categories were joined together, giving the following new categories for sport
activity:
1. 0-2 hours per week
2. 3-5 hours per week
3. 6-8 hours per week
4. 9 or more hours per week
See table 2 below.
Table 2. Table showing amount of sport activity amongst current smokers, former smokers and
never smokers.
                   Sport activity (hours per week)
Smoking            0-2    3-5    6-8    9+    Total
Never smokers       25     21      7     5       58
Former smokers       8      3      2     2       15
Current smokers     17      8      4     2       31
Total               50     32     13     9      104
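The merging of the two highest activity categories, and a consistency check of the margins of Table 2, can be sketched in Python. The function name `collapse_top_categories` is an assumption made for illustration; the counts are those of Table 2.

```python
def collapse_top_categories(category):
    """Map original activity codes 1-5 to 1-4 by merging codes 4 and 5."""
    return min(category, 4)


# Counts from Table 2, columns: 0-2, 3-5, 6-8, 9+ hours per week.
table2 = {
    "never": [25, 21, 7, 5],
    "former": [8, 3, 2, 2],
    "current": [17, 8, 4, 2],
}

row_totals = {group: sum(counts) for group, counts in table2.items()}
column_totals = [sum(column) for column in zip(*table2.values())]
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals, grand_total)
```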
With this data we can carry out Fisher’s exact test to check whether the data support our
hypothesis or whether the null hypothesis, that sport activity does not differ with smoking,
cannot be rejected. The p-value from Fisher’s exact test is 0.840. This means that we do not
have enough evidence to reject the null hypothesis. Once again, this can be shown on a box and
whisker plot. See figure 11.
Figure 11. Box and whisker plot showing relationship between smoking and the amount of hours
of sport activity in a week.
X-axis (smoking): 0 - Never smokers, 1 - Former smokers, 2 - Current smokers
Y-axis (sport activity): 1: 0-2 hours per week, 2: 3-5 hours per week, 3: 6-8 hours per week,
4: 9-11 hours per week, 5: More than 11 hours per week
This box and whisker plot also shows no clear relationship, as the medians are all in
about the same range. If the hypothesis were true, the median of the never smokers group would
be the highest, former smokers in the middle and current smokers the lowest.
The third and final hypothesis that has to be investigated is that current smokers consume
the most alcohol, former smokers consume the intermediate amount and never smokers consume
the least alcohol. Alcohol consumption was recorded as a number of portions per week and takes
many different values, so it cannot readily be tabulated for Fisher’s exact test. Instead we
return to the Kruskal-Wallis test. The p-value of this test was 0.0001. This means that there
is enough evidence to reject the null hypothesis, which is that current smokers, former
smokers and never smokers consume around the same amount of alcohol. The Kruskal-Wallis test
only shows that at least one group differs from the others; the ordering stated in our
hypothesis, with current smokers consuming the most alcohol, former smokers an intermediate
amount and never smokers the least, has to be judged from the group medians.
These can be compared using a box and whisker plot once again. See figure 12 below.
Figure 12. Box and whisker plot showing relationship between smoking and the consumption of
alcohol (restaurant portions per week)
X-axis (smoking): 0 - Never smokers, 1 - Former smokers, 2 - Current smokers
From this graph, it can be observed that there is a very clear pattern as stated in our
hypothesis. The median of the never smoking group is the lowest, the former smoking group
comes next, and the current smoking group has the highest median. This clear pattern
supports our hypothesis.
Conclusions
The study was concentrated around three hypotheses that predicted the associations
between smoking and health, or smoking and lifestyle. In these hypotheses the assumptions were
that people who smoke are sick more often, smokers are less physically active than non-smokers,
and smokers consume more alcohol than non-smokers.
When studying the assumption that smokers are sick more often, no consistent
association was found between susceptibility to infections and smoking by using Fisher’s exact
test (p=0.952, therefore H0 couldn’t be rejected). Also no difference was found between groups
when comparing the self-evaluation of their health.
With the assumption that smokers are less physically active than non-smokers, no reliable
association was found with Fisher’s exact test (p-value=0.840, H0 couldn’t be rejected). With
this hypothesis, however, the results could have been affected by our population, sample size
and wide categories for physical activity. None of the groups was physically very active, and
with our broad physical activity categories, about half of the answers (50 of 104) fell in the
lowest category of exercise (0-2h a week).
Unlike in the previous two hypotheses, a considerable association was found between the
volume of alcohol consumed and smoking within the sample. The Kruskal-Wallis test, whose
statistic is compared against a chi-squared distribution, showed a difference in the volume of
alcohol consumed between the smoking groups (p-value=0.0001). Even though causality cannot be
determined from this study, it would seem that in this sample of students, the habits of
smoking and drinking alcohol were often concentrated in the same subjects.
When interpreting the validity of the results, a few other concerns should be taken into
account. As the study was conducted via social media groups, it was probably answered mostly
by Erasmus students, and also to some degree by medical students. It can therefore be assumed
that the sample consists mainly of people who in general are very healthy (young age, short
smoking time, ability to acquire higher education and travel abroad). As the sample of the
study was very small and was taken from a very specific population with a presumably similar
lifestyle, these results probably shouldn’t be extrapolated to any wider groups of people.
What might also be interpreted from our results is that within these student groups of
young and generally healthy people, the primary health effects of smoking were not yet
visible. If a follow-up study could be conducted, it would be interesting and perhaps
more fruitful to see if the association would be visible in these samples after 10 or 20 years.
Appendix
Survey questions:
1. How old are you?
2. Male/Female?
3. Are you a smoker? Yes, I currently smoke/No, I used to smoke/No, I never smoked
4. For how many years or months have you smoked for continuously? (Continuous smoking
defined as at least a pack of cigarettes a week) **Only answer if you are a current smoker or
former smoker**
4. How much do you smoke? (Packs per week)
a) 1
b) 2
c) 3
d) 4
e) 5
f) 6
g) 7
5. What is your weight? (in Kg)
6. What is your height? (in cm)
7. How many times have you fallen sick (flu or cold), in the last 6 months?
8. How many portions of alcohol do you consume in a week? *One portion is defined as a
restaurant portion such as 500 ml of beer (standard glass of beer), 175 ml of wine (standard glass
of wine), 45 ml of hard liquor (standard shot).*
9. How would you grade your state of health from 1 to 5?
1: My health state is dangerously poor.
2: My health state is poor but it is not a danger, however it still needs improvement.
3: My health state is not too bad however it is not too good either.
4: My health state is quite good.
5: My health state is extremely good, it can’t get much better.
10. How many hours of sporting activity do you do in a week (slow walking does not count!)?
a) 0-2 hours a week
b) 3-5 hours a week
c) 6-8 hours a week
d) 9-11 hours a week
e) More than 11 hours a week

DCP EPIDEMIOLOGY PRJECT

  • 1. Abstract Topic: The relation between smoking and state of health/lifestyle. Authors: Alex Luojos, Axel Linnovaara, Ashwath Venkatasubramanian, Sergei Gordienko. Aim of the project: The aim of our project is to investigate the relationship between smoking and health state. Hypotheses: 1. Currently smoking people are sick more often than former smokers and former smokers are sick more often than never smokers. 2. Currently smoking people are the least physically active, former smokers are more physically active but never-smoking people are the most physically active 3. Current smokers consume the most alcohol, former smokers consume the intermediate amount and never smokers consume the least alcohol. These theses were made based on everyday observations of smokers and non-smokers. Our target is people aged 18-30, the data will be collected with by a questionnaire by spreading it through social networks. Additionally, we expect to collect data from approximately 100 people. We will collect data regarding: 1. Age 2. Sex 3. Smoking habits 4. Weight/ height 5. Sport activity (h/week)
  • 2. 6. Cases of being sick (last 6 months) 7. Alcohol consumption (restaurant portions/week) 8. Personal judgment of health state (1-5) 9. Duration of smoking habit The survey is included in the appendix of this project. Introduction Smoking is the single most important preventable health risk in developed world. It is also an important cause of premature death worldwide. Smoking cause a wide range of diseases, including cancers, chronic obstructive pulmonary disease, coronary heart disease and stroke. There is also a great interest for studying unhealthy lifestyles and their causes, because smoking and alcohol related health problems are a major burden for the modern society both economically and socially. In our research we studied the relation between health state and smoking habits. The main aim of the project was to understand the relationship between smoking and health state of an individual. We also studied whether or not smoking people use alcohol more often. The study was carried out by using of internet survey that was sent to Erasmus and medical school students in Tartu. The study material was analyzed by help of “Stata” program. To be able to understand the lifestyle and health state of our sample we collected this data through questionnaire as well. Course of work and methods To collect data, it was decided that an online survey would be the most appropriate as many people can have access to it. Furthermore, to avoid bias, there is a certain degree of randomization when opening the survey to a wide range of people. The survey comprised of 10 questions capable of obtaining the relevant information about lifestyle and smoking habits of the subjects. The survey was made and launched on the “surveymonkey.com” platform and spread
  • 3. through social networks. Overall 104 random exchange and degree students of Tartu University of different ages and nationalities took part in the survey. The raw data collected was later processed and this process can be seen in the next section. However, before that the sample was profiled. See figures 1-6. Figure 1. Gender distribution of the sample.
  • 4. Figure 2. Age distribution of the sample Figure 3. Weight (in Kg) distribution of the sample
  • 5. Figure 4. Height(in cm) distribution of the sample Figure 5. Smoking habit distribution of the sample
  • 6. Figure 6. Health self-evaluation distribution amongst the sample Health state evaluation by subjects. 1. My health state is dangerously poor (no results for this category). 2. My health state is poor but it is not a danger, however it still needs improvement. 3. My health state is not too bad however it is not too good either. 4. My health state is quite good. 5. My health state is extremely good, it can’t get much better. After looking through the general profile of the sample, more detailed analysis in order to check 3 hypotheses was conducted. Due to the pattern of data obtained the main methods applied were Fisher’s exact test and Kruskal-Wallis test. More complex information about the course of data processing including graphs and results of test can be found in the next section of the report.
  • 7. Data processing and results To process our data we used the program “Stata”. With this program we were able to process a large amount of data quickly. Our first step was to determine what statistical tests should be used according to our data. Therefore, we decided to create histograms of the variables that are part of our hypotheses. This is to show the distribution of our data as a normal distribution can point to the use of a t-test. Refer to figures 7, 8 and 9. Figure 7. Histogram showing distribution of amount of times sick in the last 6 months
  • 8. Figure 8. Histogram showing distribution of hours of exercise per week Exercising habits of subjects. 1. 0-2 hours per week 2. 3-5 hours per week 3. 6-8 hours per week 4. 9-11 hours per week 5. More than 11 hours per week
  • 9. Figure 9. Histogram showing distribution of alcohol consumption (portions per week) As one can see from the graphs above, the data in all of the variables is not normally distributed. Therefore, it is certain that the t-test cannot be used. As a result, it was decided that the Kruskal-Wallis test can be used, along with the fisher exact test for tabulated data. The reason we will not use the chi-squared test instead of the fisher exact test is because we do not have a very large amount of data. The first hypothesis to be tested is that currently smoking people are sick more often than former smokers and former smokers are sick more often than never smokers. For this the Kruskal-Wallis test can be used. It is a test that shows whether samples originate from the same distribution, essentially showing if a variable has a significant effect on the results. It is similar to a t-test in the way that in the test we seek to reject the null hypothesis and therefore accept the alternative hypothesis. We can reject the null hypothesis only when the p-value (resulting value form the test) is equal to or less than 0.05.
  • 10. H0 (null hypothesis) - The hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error. H1 (alternative hypothesis) - The hypothesis that the observations are the result of a real effect. There is significance of the variable in question. The resulting p-value that was found by using the Kruskal-Wallis test of sickness by smoking was 0.3768. Therefore, p-value > 0.05, and this means that we have to accept the null hypothesis as we don’t have the evidence to reject it and there is no significant effect by smoking on the amount of times fallen sick. A box and whisker plot can also be generated to show the relationship between the two variables. See figure 10. Figure 10. Box and whisker plot showing relationship between smoking and the amount of times fallen sick in the past year. X-axis (smoking): 0 - Neversmokers, 1 - Former smokers, 2 - Current smokers
  • 11. Even this box and whisker plot shows no clear relationship, as the medians of all smoking categories are about the same. If the hypothesis held, one would expect never smokers to have the lowest median, former smokers the middle median and current smokers the highest median. To make sure, we also used the Fisher exact test on this hypothesis. However, as the Fisher exact test requires two categorical variables, we had to recode our sick variable into two categories: fallen sick in the last 6 months and not fallen sick in the last 6 months. After this new variable was generated we tabulated the data. See table 1.
    Table 1. Frequency of falling sick amongst current smokers, former smokers and never smokers (count and row percentage).
    Smoking               Not fallen sick   Fallen sick    Total
    0 (never smokers)     15 (25.86%)       43 (74.14%)     58 (100.00%)
    1 (former smokers)     3 (20.00%)       12 (80.00%)     15 (100.00%)
    2 (current smokers)    8 (25.81%)       23 (74.19%)     31 (100.00%)
    Total                 26 (25.00%)       78 (75.00%)    104 (100.00%)
    The Fisher exact test resembles the Kruskal-Wallis test in that it is also framed in terms of a null and an alternative hypothesis. The same rule applies: the p-value has to be equal to or less than 0.05 for us to reject the null hypothesis and conclude that there is a statistically significant effect. The p-value of the Fisher exact test is 0.952. This means that we do not have enough evidence to reject the null hypothesis. Once again, therefore, no evidence was found that smoking affects the number of times one falls sick. The next hypothesis to be tested is that current smokers are the least physically active, former smokers are more physically active and never smokers are the most physically active. An immediate problem that we saw with the data about sport activity is that only one person exercised more than 11 hours per week (5th category). 
Therefore,
  • 12. the 4th and 5th categories were merged. The new categories for sport activity are: 1. 0-2 hours per week 2. 3-5 hours per week 3. 6-8 hours per week 4. 9 or more hours per week See table 2 below.
    Table 2. Amount of sport activity (hours per week) amongst current smokers, former smokers and never smokers.
    Smoking            0-2   3-5   6-8    9+   Total
    Never smokers       25    21     7     5      58
    Former smokers       8     3     2     2      15
    Current smokers     17     8     4     2      31
    Total               50    32    13     9     104
    With this data we can carry out a Fisher exact test to check whether we can reject the null hypothesis, which is that smoking has no effect on sport activity. The p-value of the Fisher exact test is 0.840. This means that we do not have enough evidence to reject the null hypothesis. Once again, this can be shown on a box and whisker plot. See figure 11.
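The Fisher exact test on a table larger than 2x2 can be sketched as follows. This is a minimal illustration of the Freeman-Halton extension, assuming that the p-value is the total probability of all tables with the same margins that are no more likely than the observed one. The function names are hypothetical, and this is not necessarily how the statistics package used for this project computes the test. The example uses the counts from Table 1, and its result can be compared with the p-value of 0.952 reported above.

```python
from math import exp, lgamma


def log_factorial(n):
    """Return log(n!) using the log-gamma function."""
    return lgamma(n + 1)


def enumerate_column(col_total, capacities):
    """Yield every split of col_total over the rows within their capacities."""
    if len(capacities) == 1:
        if col_total <= capacities[0]:
            yield (col_total,)
        return
    for first in range(min(col_total, capacities[0]) + 1):
        for rest in enumerate_column(col_total - first, capacities[1:]):
            yield (first,) + rest


def fisher_exact_rxc(table):
    """Two-sided exact p-value for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Part of the log-probability that depends only on the fixed margins.
    const = (sum(log_factorial(r) for r in row_totals)
             + sum(log_factorial(c) for c in col_totals)
             - log_factorial(n))
    observed = const - sum(log_factorial(x) for row in table for x in row)
    p_value = 0.0

    def recurse(col_index, remaining, log_cells):
        nonlocal p_value
        if col_index == len(col_totals) - 1:
            # The last column is forced by what is left of each row total.
            log_p = const - log_cells - sum(log_factorial(x) for x in remaining)
            if log_p <= observed + 1e-7:  # small tolerance for rounding
                p_value += exp(log_p)
            return
        for col in enumerate_column(col_totals[col_index], remaining):
            left = [r - x for r, x in zip(remaining, col)]
            recurse(col_index + 1, left,
                    log_cells + sum(log_factorial(x) for x in col))

    recurse(0, row_totals, 0.0)
    return min(p_value, 1.0)


# Counts from Table 1: rows are never, former and current smokers;
# columns are "not fallen sick" and "fallen sick".
table_1 = [[15, 43], [3, 12], [8, 23]]
print("Table 1 exact p-value:", round(fisher_exact_rxc(table_1), 3))
```

The counts of Table 2 can be passed in the same way, but the number of tables to enumerate grows quickly with table size, so a 3x4 table takes noticeably longer.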
  • 13. Figure 11. Box and whisker plot showing relationship between smoking and the number of hours of sport activity in a week. X-axis (smoking): 0 - Never smokers, 1 - Former smokers, 2 - Current smokers Y-axis (sport activity): 1: 0-2 hours per week, 2: 3-5 hours per week, 3: 6-8 hours per week, 4: 9-11 hours per week, 5: More than 11 hours per week This box and whisker plot also shows no clear relationship, as the medians are all in about the same range. If the hypothesis were true, the median of the never smokers group would be the highest, that of former smokers in the middle and that of current smokers the lowest. The third and final hypothesis to be investigated is that current smokers consume the most alcohol, former smokers consume an intermediate amount and never smokers consume the least alcohol. The alcohol data take a wide range of values, and therefore the Fisher exact test can’t be used on them directly. Instead we return to the Kruskal-Wallis test. The p-value of this test was 0.0001. This means that there is enough evidence to reject the null hypothesis, which is that current smokers, former smokers and never smokers consume around the
  • 14. same amount of alcohol. Rejecting the null hypothesis means that at least one smoking group differs from the others in alcohol consumption; the test alone does not establish the ordering stated in our hypothesis, which is that current smokers consume the most alcohol, former smokers an intermediate amount and never smokers the least. We can examine this ordering using a box and whisker plot once again. See figure 12 below. Figure 12. Box and whisker plot showing relationship between smoking and the consumption of alcohol (restaurant portions per week). X-axis (smoking): 0 - Never smokers, 1 - Former smokers, 2 - Current smokers From this graph, a very clear pattern can be observed, as stated in our hypothesis. The median of the never smoking group is the lowest, next comes the former smoking group, and the median of the current smoking group is the highest. This clear pattern supports our hypothesis.
  • 15. Conclusions The study concentrated on three hypotheses about the associations between smoking and health, or smoking and lifestyle. The hypotheses assumed that people who smoke are sick more often, that smokers are less physically active than non-smokers, and that smokers consume more alcohol than non-smokers. For the assumption that smokers are sick more often, no consistent association was found between susceptibility to infections and smoking using Fisher’s exact test (p-value=0.952, so H0 couldn’t be rejected). Also, no difference was found between the groups when comparing the self-evaluation of their health. For the assumption that smokers are less physically active than non-smokers, no reliable association was found with Fisher’s exact test (p-value=0.840, H0 couldn’t be rejected). With this hypothesis, however, the results could have been affected by our population, sample size and wide categories for physical activity. None of the groups was physically very active, and with our broad physical activity categories, nearly half of the answers (50 of 104) fell in the same category of exercise (0-2h a week). Unlike in the previous two hypotheses, a considerable association was found between the volume of alcohol consumed and smoking within the sample. The Kruskal-Wallis test was used to test the equality of populations, and it showed an association between smoking and the volume of alcohol consumed (p-value=0.0001). Even though causality cannot be determined from this study, it would seem that in this sample of students, the habits of smoking and drinking alcohol were often found in the same subjects. When interpreting the validity of the results, a few other concerns should be taken into account. As the study was conducted via social media groups, it was probably answered mostly by Erasmus students, and also to some degree by medical students. 
Therefore it could be assumed that the sample consists mainly of people who in general are very healthy (young age, short smoking time, ability to acquire higher education and travel abroad). As the sample of the study was very small and was taken from a very specific population
  • 16. with a presumably similar lifestyle, these results probably shouldn’t be extrapolated to any wider groups of people. What might also be interpreted from our results is that within these student groups of young and generally healthy people, the primary health effects of smoking were not yet visible. If a follow-up study could be conducted, it would be interesting and perhaps more fruitful to see whether the association would be visible in this sample after 10 or 20 years. Appendix Survey questions: 1. How old are you? 2. Male/Female? 3. Are you a smoker? Yes, I currently smoke/No, I used to smoke/No, I never smoked 4. For how many years or months have you smoked continuously? (Continuous smoking defined as at least a pack of cigarettes a week) **Only answer if you are a current smoker or former smoker** 4. How much do you smoke? (Packs per week) a) 1 b) 2 c) 3 d) 4 e) 5 f) 6 g) 7 5. What is your weight? (in Kg)
  • 17. 6. What is your height? (in cm) 7. How many times have you fallen sick (flu or cold), in the last 6 months? 8. How many portions of alcohol do you consume in a week? *One portion is defined as a restaurant portion such as 500 ml of beer (standard glass of beer), 175 ml of wine (standard glass of wine), 45 ml of hard liquor (standard shot).* 9. How would you grade your state of health from 1 to 5? 1: My health state is dangerously poor. 2: My health state is poor but it is not a danger, however it still needs improvement. 3: My health state is not too bad however it is not too good either. 4: My health state is quite good. 5: My health state is extremely good, it can’t get much better. 10. How many hours of sporting activity do you do in a week (slow walking does not count!)? a) 0-2 hours a week b) 3-5 hours a week c) 6-8 hours a week d) 9-11 hours a week e) More than 11 hours a week