The document summarizes information from several data analysis studies:
1. A survey of 1691 UK residents analyzed smoking habits through categorical variables such as sex and marital status, and discrete numerical variables such as the number of cigarettes smoked on weekends and on weekdays.
2. UN voting patterns of several countries over time were described through categorical variables such as country, and numerical variables such as year (discrete) and percentage of votes (continuous).
3. A study of course satisfaction across sections was identified as observational, and stratified random sampling was suggested for carrying it out.
MGT assignment 1.docx
1.10 smoking habits of UK residents.
What does each row of the data matrix represent?
Each row of the data matrix represents a single observation, which is one UK resident. Because the survey was conducted to study the smoking habits of UK residents, each row records the information about one resident, male or female, whether or not that person smokes cigarettes.
How many participants were included in the survey?
A total of 1691 participants were included in the survey.
Indicate whether each variable in the study is numerical or categorical. If numerical,
identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
Sex: is categorical data (not ordinal).
Age: can be treated as either discrete or continuous. In this case we consider it discrete numerical data, because ages are recorded as whole numbers such as 44 and 53. Exact age, however, is continuous: once months and days are counted, someone might be 53.342 years old.
Marital status: is categorical data (not ordinal).
Gross income: is ordinal categorical data, because its categories have a natural order from lower to higher income.
Smoke: is categorical data (not ordinal).
Amount weekends: is discrete numerical data (a count of cigarettes).
Amount weekdays: is discrete numerical data (a count of cigarettes).
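The difference between whole-year age and exact age, and between ordinal and non-ordinal categories, can be sketched in Python. This is a minimal illustration, not part of the survey analysis: the birth date, survey date, income bracket names, and marital status values are hypothetical, and dividing elapsed days by 365.25 is one assumed convention for fractional age.

```python
from datetime import date
from enum import Enum, IntEnum


class IncomeBracket(IntEnum):
    # Hypothetical brackets: IntEnum members compare by value,
    # so the low-to-high order of an ordinal variable is preserved.
    LOW = 1
    MIDDLE = 2
    HIGH = 3


class MaritalStatus(Enum):
    # Hypothetical nominal categories: plain Enum members support ==
    # but not < or >, because the categories have no natural order.
    SINGLE = "single"
    MARRIED = "married"
    DIVORCED = "divorced"


def age_in_whole_years(birth: date, on: date) -> int:
    """Discrete age: completed years only."""
    years = on.year - birth.year
    # Subtract one if the birthday has not yet occurred in that year.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def exact_age(birth: date, on: date) -> float:
    """Continuous age: elapsed days divided by an average year of 365.25 days."""
    return (on - birth).days / 365.25


birth, surveyed = date(1970, 1, 1), date(2023, 5, 5)
print(age_in_whole_years(birth, surveyed))     # 53
print(round(exact_age(birth, surveyed), 2))    # 53.34
print(IncomeBracket.LOW < IncomeBracket.HIGH)  # True
```

Comparing `MaritalStatus.SINGLE < MaritalStatus.MARRIED` raises a `TypeError`, which mirrors the fact that a non-ordinal categorical variable has no ordering.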
1.12 UN votes.
List the variables used in creating this visualization.
The variables used in creating this visualization are country, year, percentage, and issue. The issue variable takes the values arms control and disarmament, colonialism, economic development, human rights, nuclear weapons and materials, and Palestinian conflict, with one chart for each issue.
Indicate whether each variable in the study is numerical or categorical. If numerical,
identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
Country: is categorical data (not ordinal); it includes the US, Canada, and Mexico.
Issue: is categorical data (not ordinal); colonialism, human rights, and the other issues have no natural order.
Year: is discrete numerical data; the years are shown on the X axis of the charts.
Percentage: is continuous numerical data; the percentages are shown as decimal numbers on the Y axis of the charts.
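One record behind such a chart can be sketched as a Python dataclass, which makes the types of country, year, and percentage explicit. This is a hypothetical sketch: the field names and vote counts are made up, and computing the percentage as the share of roll calls on which a country voted yes is an assumption about how the chart's Y values were produced.

```python
from dataclasses import dataclass


@dataclass
class VoteRecord:
    country: str     # categorical, not ordinal (e.g. "US", "Canada", "Mexico")
    issue: str       # the chart this record belongs to (e.g. "Human rights")
    year: int        # discrete numerical: whole years on the X axis
    yes_votes: int   # hypothetical count of roll calls voted "yes"
    roll_calls: int  # hypothetical total number of roll calls

    @property
    def percentage(self) -> float:
        """Continuous numerical (Y axis): assumed share of yes votes, in percent."""
        return 100 * self.yes_votes / self.roll_calls


record = VoteRecord("Canada", "Human rights", 1990, 7, 12)
print(record.year, round(record.percentage, 1))  # 1990 58.3
```

The year can only take whole values, while the percentage can fall anywhere between 0 and 100, which is why one is discrete and the other continuous.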
1.18 cats on YouTube.
Percentage of all videos on YouTube that are cat videos: is a population parameter.
2%: is a sample statistic.
A video in your sample: is an observation.
Whether or not a video is a cat video: is a variable.
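The distinction between a population parameter and a sample statistic can be illustrated with a small simulation. This is a hypothetical sketch: the population of 100,000 videos, its true 2.5% share of cat videos, and the sample size of 1,000 are made-up values, and the seed only makes the run repeatable.

```python
import random

random.seed(1)

# Hypothetical population: True marks a cat video (the variable of interest).
population_size = 100_000
population = [i < 2_500 for i in range(population_size)]  # exactly 2.5% cat videos

# Population parameter: computed from every video, usually unknown in practice.
parameter = sum(population) / len(population)

# Sample statistic: computed from a random sample; each sampled video is an observation.
sample = random.sample(population, 1_000)
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.3f}")  # parameter = 0.025
print(f"statistic = {statistic:.3f}")  # varies from sample to sample
```

The parameter is fixed for the population, while the statistic changes with each new sample and is used to estimate the parameter.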
1.19 course satisfaction across sections.
What type of study is this?
It is an observational study, because the students are only surveyed and no treatment is assigned to them.
Suggest a sampling strategy for carrying out this study.
I suggest stratified random sampling for this study, because the students are already divided into 4 subgroups (sections). We can randomly select students from each of these subgroups and survey them. This type of sampling makes sure that every section is represented in the sample, while the students within each section are still selected randomly.
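Stratified random sampling can be sketched in Python as drawing a simple random sample separately within each stratum. The roster is hypothetical: it assumes 4 sections of 40 students each and draws 10 students per section; these numbers are illustrative only.

```python
import random


def stratified_sample(strata: dict[str, list[str]], per_stratum: int,
                      rng: random.Random) -> dict[str, list[str]]:
    """Draw a simple random sample of `per_stratum` units from every stratum."""
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}


# Hypothetical roster: 4 sections (strata) of 40 students each.
sections = {
    f"section {s}": [f"student {s}-{i}" for i in range(1, 41)]
    for s in range(1, 5)
}

sample = stratified_sample(sections, per_stratum=10, rng=random.Random(42))
for name, students in sample.items():
    print(name, len(students))  # every section contributes 10 students
```

Because every section is sampled, no section can be left out, and the section effect the study is interested in can be compared across all of them.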
1.27 sampling strategies.
He randomly samples 40 students from the study’s population, gives them the survey,
asks them to fill it out and bring it back the next day.
The sampling method proposed for this survey is a simple random sample, because he randomly selects 40 students from the study's population. The expected bias for this survey is nonresponse bias: some students may be too busy to fill out the survey, or may not manage to complete and return it by the next day.
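Drawing a simple random sample of 40 students can be sketched with Python's `random.sample`, which selects without replacement so that every group of 40 students is equally likely to be chosen. The population size of 500 and the student IDs are hypothetical.

```python
import random

# Hypothetical sampling frame: IDs for the study's population of students.
population = [f"S{n:04d}" for n in range(1, 501)]

rng = random.Random(2024)  # seeded only so the draw can be reproduced
sample = rng.sample(population, 40)  # without replacement: no student twice

print(len(sample), len(set(sample)))  # 40 40
```

The random draw itself is unbiased; nonresponse bias enters afterward, if some of the 40 selected students do not return the survey.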
He gives out the survey only to his friends, making sure each one of them fills out the
survey.
The sampling method proposed for this survey is convenience sampling, because the people filling out the survey are his friends, who are easy for him to reach. The bias for this survey is selection bias: since he chose only his friends, the sample may not be representative of the whole population.
He posts a link to an online survey on Facebook and asks his friends to fill out the
survey.
The proposed sampling method for this survey is again convenience sampling, because he asks his friends to fill out the survey. The possible biases are selection bias and nonresponse bias. First, the people who fill out the survey are his Facebook friends, so the sample may not be representative of the larger population. Second, some of his friends may not fill out the survey at all, since he does not follow up to make sure everyone responds.
He randomly samples 5 classes and asks a random sample of students from those
classes to fill out the survey.
The proposed sampling method for this survey is multistage sampling: he first randomly samples 5 classes, which act as clusters, and then randomly samples students within each of those classes. It is not stratified sampling, because only 5 classes are selected rather than sampling from every class. The possible bias for this survey is response bias, because some students may fill out the survey inaccurately for personal reasons, for example to protect personal information or to avoid embarrassment.
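The procedure in this last scenario, randomly choosing 5 classes and then a random sample of students from each chosen class, can be sketched as a two-stage draw. The number of classes (20), the class size (30), and the 8 students drawn per class are hypothetical; only the choice of 5 classes comes from the scenario.

```python
import random


def two_stage_sample(classes: dict[str, list[str]], n_classes: int,
                     n_students: int, rng: random.Random) -> dict[str, list[str]]:
    """Stage 1: randomly choose classes. Stage 2: randomly choose students in each."""
    chosen = rng.sample(sorted(classes), n_classes)
    return {c: rng.sample(classes[c], n_students) for c in chosen}


# Hypothetical school: 20 classes of 30 students each.
classes = {
    f"class {c}": [f"student {c}-{i}" for i in range(1, 31)]
    for c in range(1, 21)
}

sample = two_stage_sample(classes, n_classes=5, n_students=8, rng=random.Random(7))
print(sorted(sample))  # the 5 randomly chosen classes
```

Classes not picked in the first stage contribute no students, so the design works best when the classes are similar to one another in student composition.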