Statistical Analysis 2021/22
Lecture 1
About the Course
Introduction to Statistics & Data
What You Need to Know Before Your Course Begins
• We begin on time!
• So please, be there on time!
• Tuesday, 08:30 – 11:00
• We take a break after 90 minutes at most (usually after 75 minutes), so hold off on food or other addictions until the break!
• Any special treatment / status you need (early exams, longer absences…)
• Send an email or come to my office before October 21 and we'll try to make some arrangements.
• After that, the rules are set for all!
Course Outline
1) Data management
2) Visualization
3) Descriptive statistics
4) Statistical distributions
5) Confidence intervals
6) Hypothesis testing of one population
7) Comparing two populations
8) Non-parametric testing
9) Simple linear and multiple regression
10) Regression model building
11) Time series.
Teaching (Lectures + Exercises + Assignments)
• Offline, IRL, synchronous
• Lectures: every Tuesday (08:30 – 11:00), M. Pahor
• they will be recorded and the recordings posted on Canvas
• Exercises: every Friday (09:00 – 11:00), M. Pahor
• “computer lab”
• exercises, examples, Google Sheets and R Studio Cloud, etc.
• Google Sheets and R Studio Cloud
• You'll need a Google account and an R Studio Cloud account (both free)
Teaching (Lectures + Exercises + Assignments)
• Online, Canvas, asynchronous
• DIY Exercises, K. Dvorski
• Recap quizzes with explanations are available at all times for every lecture, so you can practice and prepare for the exams successfully
• Extra assignments (10% of your final grade), K. Dvorski
• You will have to solve and submit a number of assignments by set dates over the course of the semester (no make-up tests and no deadline extensions)
• The details will be announced in a timely manner, and you will have enough time and the TA's support for every assignment.
Teaching (Mid-Terms)
• Canvas platform
• On-line tests (midterms) are part of the grade (30%) → There will be no re-do tests!
• Three (3) on-line tests will be given on the dates and times specified.
• Each mid-term makes up 10% of your final grade
• Mid-terms are here to help you learn and pass the course! Please, don't stress!
• DIY Exercises, assignments, exercises and lectures help you work towards your grade.
• Don't worry: if you take an active part in the course, you will have no problem passing the mid-terms and the final exam!
Exam & Grading
• Course grading: Your course grade will be based on the maximum of the following decompositions:
• Graded tests (midterms) on Canvas (30 percent)
• Bonus assignments during semester on Canvas (10 percent)
• Final exam (60 percent)
• You must answer more than 50% of the questions correctly in order to pass the final exam (e.g., 25 MCQs; 13 correct answers!)
• The final exam might be on-line
• All the important details on exams and assignments will be announced shortly!
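The weights and the pass rule can be checked with a few lines of Python. This is only a sketch: the function names and the example scores are invented for illustration, each component is assumed to be scored from 0 to 100, and "more than 50%" is read as strictly more than half of the questions.

```python
import math

# Weights taken from the slide: midterms 30%, bonus assignments 10%, final exam 60%.
WEIGHTS = {"midterms": 0.30, "assignments": 0.10, "final_exam": 0.60}

def course_score(scores):
    """Weighted sum of component scores, each on a 0-100 scale (assumption)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def min_correct_to_pass(n_questions):
    """Smallest number of correct answers that is strictly more than 50%."""
    return math.floor(n_questions / 2) + 1

# Hypothetical student, for illustration only.
example = {"midterms": 80, "assignments": 100, "final_exam": 70}
print(course_score(example))      # weighted course score
print(min_correct_to_pass(25))    # 13, as in the slide's example
```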
Time & Space
• IRL at SEB:
• Lectures: Tuesdays, 08:30 – 11:00
• Exercises: Fridays, 09:00 – 10:30
• Instructor: Prof. Marko Pahor
• email: marko.pahor@ef.uni-lj.si
• Office hours
• Wednesdays, 2 p.m. on Zoom (check link)
• Assistant: Ms. Katarina Dvorski
• will try to answer your questions and provide support within 24 hours of receiving your email.
• email: katdvorski@gmail.com
Learning Objectives
After this lecture you should be able to:
LO 1: Describe the importance of statistics.
LO 2: Differentiate between descriptive statistics and
inferential statistics.
LO 3: Explain the various data types.
LO 4: Describe variables and types of measurement
scales.
Statistics or Sadistics?
• “There are three types of lies: lies, damn lies, and statistics.”
Benjamin Disraeli
• “A single death is a tragedy; a million deaths is a statistic.”
Joseph Stalin
• “Most people use statistics like a drunk man uses a lamppost; more for support than illumination.”
Andrew Lang
What is Statistics?
Statistics
• the branch of mathematics that examines ways to process and analyse data. It provides procedures to collect and transform data in ways that are useful to business decision makers.
• is the science of data. This involves collecting, classifying, summarizing, organizing, analysing, presenting, and interpreting numerical data.
• In the broadest sense, we may define the study of statistics as the methodology of extracting useful information from a data set.
Statistic
• is a numerical measure that describes a characteristic of a sample.
Statistical Analysis
• used to manipulate, summarize, and investigate data for the purpose of useful decision-making.
Statistics & The Real World
• Insurance companies use data on homeowners, drivers and many more to set insurance premiums.
• Accountants use sample data concerning a company's actual sales revenues to assess whether the company's claimed sales revenues are valid.
• Marketing experts help businesses decide which products to develop and offer by using data that reveal consumer preferences.
• Politicians focus on public opinion polls to formulate legislation and to create campaign strategies.
• Scientists use data on the effectiveness of drugs and vaccines to improve our health and advance knowledge.
…
• Finance – correlation and regression, index numbers, time series analysis, …
• Marketing – hypothesis testing, chi-square tests, nonparametric statistics, …
• HRM – hypothesis testing, chi-square tests, nonparametric tests, …
• Operations Management – hypothesis testing, estimation, analysis of variance, time series analysis, …
Types (Branches) of Statistics
• Descriptive statistics
• utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present that information in a convenient form;
• focuses on collecting, summarizing and presenting a set of data.
• Inferential statistics
• utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data;
• uses sample data to draw conclusions about a population.
…
Descriptive Statistics
• describe, organize, summarize
• common terminology: mode, median, mean, averages
• data presentation (tables, graphs)
• standard tools: central tendency, dispersion, skewness
Inferential Statistics
• generalize findings from samples to populations
• estimation, predictions, assessing relationships between variables
• common terminology: margin of error, statistically significant
• standard tools: hypothesis testing, confidence intervals, regression analysis
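A short Python sketch can make the contrast concrete. The exam scores are hypothetical, and the interval assumes a random sample from a roughly normal population; the critical value 2.262 is the two-sided 95% t value for 9 degrees of freedom taken from a t table.

```python
import math
import statistics

# Hypothetical exam scores of 10 sampled students (illustrative values only).
scores = [55, 62, 62, 70, 71, 74, 78, 80, 85, 93]

# Descriptive statistics: describe and summarize the sample itself.
mean = statistics.mean(scores)       # central tendency
median = statistics.median(scores)
mode = statistics.mode(scores)
std_dev = statistics.stdev(scores)   # dispersion (sample standard deviation)

# Inferential statistics: generalize to the population, with a margin of error.
T_CRIT_95_DF9 = 2.262
margin = T_CRIT_95_DF9 * std_dev / math.sqrt(len(scores))
ci = (mean - margin, mean + margin)

print(f"mean={mean:.2f}, median={median:.2f}, mode={mode}, sd={std_dev:.2f}")
print(f"95% confidence interval for the population mean: {ci[0]:.2f} to {ci[1]:.2f}")
```

The first block only describes the ten scores; the second makes a claim about all students, which is why it comes with a margin of error.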
…
Four Elements of Descriptive Statistical
Problems
1. The population or sample of interest
2. One or more variables (characteristics
of the population or sample units) that
are to be investigated
3. Tables, graphs, or numerical summary
tools
4. Identification of patterns in the data
Five Elements of Inferential Statistical
Problems
1. The population of interest
2. One or more variables (characteristics
of the population units) that are to be
investigated
3. The sample of population units
4. The inference about the population
based on information contained in the
sample
5. A measure of the reliability of the
inference
Statistical Inference
• Estimation
• e.g., estimate the population mean weight using the sample mean weight
• Hypothesis testing
• e.g., test the claim that the population mean weight is 70 kg
• Statistical inference
• process of drawing conclusions or making decisions about a population based on sample results (data).
• making the inference is only part of the story. We need to know its reliability – how good the inference is.
Statistical Inference: an estimate, prediction, or some
other generalization about a population based on
information contained in a sample.
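As a rough illustration of estimation and hypothesis testing for the 70 kg claim, the sketch below uses a hypothetical sample of ten weights and a one-sample t statistic. The data are invented, the population is assumed to be roughly normal, and the critical value 2.262 (two-sided, 5% level, 9 degrees of freedom) comes from a t table.

```python
import math
import statistics

# Hypothetical sample of weights in kg (illustrative values only).
weights = [62.0, 70.5, 68.0, 75.0, 81.5, 66.0, 72.5, 69.0, 77.0, 68.0]
n = len(weights)

# Estimation: the sample mean is the point estimate of the population mean.
x_bar = statistics.mean(weights)
std_error = statistics.stdev(weights) / math.sqrt(n)

# Hypothesis testing: H0 states that the population mean weight is 70 kg.
MU_0 = 70.0
t_stat = (x_bar - MU_0) / std_error

# Reliability: compare |t| with the critical value for n - 1 = 9 degrees of freedom.
T_CRIT = 2.262
reject_h0 = abs(t_stat) > T_CRIT

print(f"estimate = {x_bar:.2f} kg, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With these made-up numbers the sample mean is close to 70 kg, so the claim is not rejected.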
Population – Sample – Variable
Population: all items of interest in a statistical problem, all elements under study.
Sample: subset of population (should be representative of the population)
Unit of observation: an entity about which information is collected
Data set: all the data collected in a particular study
Elements: individual entities of a data set (individuals)
Observation: the set of measurements obtained for a particular element
Individuals: every individual element in the population
Variables: characteristics of interest for the elements (individuals)
Population & Sample
• Population
• the entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest;
• A population parameter is a number calculated using the population measurements that describes some aspect of the population. That is, a population parameter is a descriptive measure of the population.
• Sample
• a portion, or part, of the population of interest; it should be representative of the population;
• A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample. That is, a sample statistic is a descriptive measure of the sample.
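The difference between a population parameter and a sample statistic can be sketched with simulated data. Everything here is hypothetical: the population is generated at random, and the population size, sample size, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population of 10,000 measurements.
population = [random.gauss(100, 15) for _ in range(10_000)]

# Population parameter: calculated from every unit in the population.
mu = statistics.mean(population)

# Sample statistic: calculated from a random sample of 50 units only.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean) = {mu:.2f}")
print(f"statistic (sample mean)     = {x_bar:.2f}")
```

In practice only the sample statistic is available; the parameter is what we try to learn about.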
Variable(s)
Independent Variable
• also known as the explanatory or predictor variable
• it explains variations in the response variable
• in a study, it is manipulated by the researcher.
Dependent Variable
• also known as the response or outcome variable
• its value is predicted or its variation is explained by the explanatory variable
• in a study, this is the outcome that is measured following manipulation of the explanatory variable.
Confounding Variable
• a variable, other than the independent variable of interest, that may affect the dependent variable.
• an unforeseen or unaccounted-for factor that may call into question the finding of a relationship between two other factors or variables.
…
Inferential Statistics: The Need for Sampling
Resource constraints (e.g., time, money)
• obtaining information on the entire population is expensive / time-consuming
• the monthly unemployment rate in the U.S. is calculated by the Bureau of Labor Statistics (BLS): Is it reasonable to assume that the BLS counts every unemployed person each month?
It is impossible to examine every member of the population
• Suppose we are interested in the average life of a VARTA AAA battery. If we tested the life of every VARTA AAA battery, then in the end all batteries would be dead and the answer to the initial question would be useless.
Recap: Example 1
• “Cola wars” is the popular term for the intense competition between Coca-Cola and Pepsi displayed in their marketing campaigns. Their campaigns have featured claims of consumer preference based on taste tests.
• In 2013, the Huffington Post conducted a blind taste test of 9 cola brands that included Coca-Cola and Pepsi. (Pepsi finished 1st and Coke finished 5th.)
• Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are given a blind taste test (i.e., a taste test in which the two brand names are disguised). Each consumer is asked to state a preference for brand A or brand B.
a) Describe the population.
b) Describe the variable of interest.
c) Describe the sample.
d) Describe the inference.
Source: McClave, J. T., Benson, G. P., & Sincich, T. (2017). Statistics for Business and Economics. Harlow, UK: Pearson Education Limited, p. 32.
…
a) Because we are interested in the responses of cola consumers in a taste
test, a cola consumer (individual) is the experimental unit. Thus, the
population of interest is the collection or set of all cola consumers.
b) The characteristic that Pepsi wants to measure is the consumer's cola
preference as revealed under the conditions of a blind taste test, so cola
preference is the variable of interest.
c) The sample is the 1,000 cola consumers selected from the population of all
cola consumers.
d) The inference of interest is the generalization of the cola preferences of the
1,000 sampled consumers to the population of all cola consumers. The
preferences of the consumers in the sample can be used to estimate the
percentage of all cola consumers who prefer each brand.
Source: McClave, J. T., Benson, G. P., & Sincich, T. (2017). Statistics for Business and Economics. Harlow, UK: Pearson Education Limited, p. 32.
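The inference in d) could be sketched as follows. The count of 560 consumers preferring brand A is purely hypothetical (the source gives no results), and the interval uses the usual normal approximation for a proportion, which assumes a random sample.

```python
import math

n = 1000          # sample size from the example
prefer_a = 560    # HYPOTHETICAL count preferring brand A (not from the source)

# Sample statistic: the proportion of sampled consumers who prefer brand A.
p_hat = prefer_a / n

# Measure of reliability: approximate 95% margin of error.
std_error = math.sqrt(p_hat * (1 - p_hat) / n)
Z_95 = 1.96       # normal critical value for 95% confidence
margin = Z_95 * std_error

print(f"estimated share preferring brand A: {p_hat:.1%}")
print(f"approximate 95% interval: {p_hat - margin:.1%} to {p_hat + margin:.1%}")
```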
Recap: Example 2
• Let's say you want to find out how alcohol consumption affects mortality. You decide to compare the mortality rates between two groups – one consisting of heavy users of alcohol, one consisting of teetotallers (people who never drink alcohol).
• What would be your independent and dependent variables in this case?
• If you find that people who consume more alcohol are more likely to die, it might seem intuitive to conclude that alcohol use increases the risk of death. In reality, however, the situation might be more complex. It is possible that alcohol use is not the only mortality-affecting factor that differs between the two groups.
• What would be possible confounding variables?
…
• alcohol consumption is the independent variable
• mortality is the dependent variable
• age, sex, ethnicity, diet, BMI are possible confounding variables
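To see how a confounder such as age can distort a comparison, consider the sketch below. All counts are invented for illustration: within each age group the two drinking groups have the same death rate, yet the crude comparison makes heavy drinkers look far worse simply because they are older.

```python
# HYPOTHETICAL counts, invented to illustrate confounding by age.
# Key: (age group, drinking group) -> (number of people, number of deaths)
data = {
    ("young", "heavy"): (200, 4),
    ("young", "teetotal"): (800, 16),
    ("old", "heavy"): (800, 160),
    ("old", "teetotal"): (200, 40),
}

def death_rate(rows):
    """Deaths divided by people, summed over the given (people, deaths) rows."""
    people = sum(p for p, _ in rows)
    deaths = sum(d for _, d in rows)
    return deaths / people

# Crude comparison: ignore age and compare the two drinking groups directly.
crude = {
    group: death_rate([v for (age, g), v in data.items() if g == group])
    for group in ("heavy", "teetotal")
}

# Stratified comparison: compare the groups within each age group.
stratified = {key: death_rate([value]) for key, value in data.items()}

print("crude rates:", crude)
print("rates within age groups:", stratified)
```

In this made-up example the whole crude difference is due to age, not alcohol; real data would rarely be this clean.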
WHAT ARE DATA?
WHERE DO WE GET DATA AND HOW?
Data: A Quick Intro
• Let's say you are interested in the state of the COVID pandemic…
• What data will you look for?
• Where will you look for this data?
Data: A Definition
Data
• facts, opinions, and figures from which conclusions can be drawn
• typically in numerical form
• data becomes information when it informs the decision making of the user
• DIKW pyramid
• “Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.”
Clifford Stoll
Basic Concepts of Data
• variables are characteristics of items or individuals
• data are the observed values of variables
• all variables should have an operational definition – a universally accepted meaning that is clear to all associated with an analysis
• e.g., ‘country of birth of person’, which is the country identified as being the one in which the person was born.
• statistical techniques are processes that convert data into information
• univariate data sets (one variable) ↔ univariate techniques
• bivariate data sets (two variables) ↔ bivariate techniques
• multivariate data sets (more than two variables) ↔ multivariate techniques
Source: Berenson, M. L. et al. (2019). Basic Business Statistics: Concepts and Applications. Melbourne, AU: Pearson Australia, p. 6.
…
• Cross-sectional data contain values of a characteristic of many subjects at the same point or approximately the same point in time.
• e.g., the variety of ice cream flavours at a particular store, heart rates of 100 patients at the beginning of the same procedure
• Time series data contain values of a characteristic of a subject over time.
• e.g., a patient's ECG heart data, daily closing prices over one year for a single financial security
• Panel data (longitudinal data) contain observations about different cross sections across time.
• examples of groups that may make up panel data series include countries, firms, individuals, or demographic groups.
• e.g., GDP per capita for Sub-Saharan African (SSA) countries over the period 1960 – 2020 and stock prices of listed companies in Slovenia over the period 2000 – 2020.
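One way to picture the three structures is a small panel stored as (entity, year, value) records. The firm names and values below are hypothetical; the point is only that fixing the year gives a cross section and fixing the entity gives a time series.

```python
# Each record: (entity, year, value). HYPOTHETICAL values, for illustration only.
panel = [
    ("Firm A", 2018, 10.0), ("Firm A", 2019, 11.5), ("Firm A", 2020, 9.8),
    ("Firm B", 2018, 20.0), ("Firm B", 2019, 21.0), ("Firm B", 2020, 19.5),
    ("Firm C", 2018, 5.0), ("Firm C", 2019, 5.5), ("Firm C", 2020, 6.1),
]

# Cross-sectional data: many subjects at one point in time.
cross_section_2019 = {entity: value for entity, year, value in panel if year == 2019}

# Time series data: one subject over time.
time_series_firm_a = [(year, value) for entity, year, value in panel if entity == "Firm A"]

print(cross_section_2019)
print(time_series_firm_a)
```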
…
• Structured data generally refers to data that has a well-defined length and format.
• data reside in a predefined row-column format
• fits nicely into a spreadsheet or relational database
• numbers, dates, and groups of words and numbers called strings
• Unstructured data (unmodeled data) do not conform to a predefined row-column format.
• textual (e.g., e-mail or open-ended survey responses), multimedia contents (e.g., photographs, videos, and audio data)
• social media data, such as those that appear on LinkedIn, Twitter, YouTube are examples of unstructured data
• AI is changing the landscape of unstructured data
…
• The term big data is used to describe a massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data processing tools.
• The availability of big data, however, does not necessarily imply complete (population) data.
…
• outliers: values that appear to be excessively large or small compared with most values observed.
• missing values: cases where no data value is stored for one or more variables in an observation.
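A minimal sketch of spotting both problems in a column of numbers is shown below. The values are hypothetical, missing values are represented by None, and the 1.5 × IQR rule is just one common convention for flagging outliers; the slides do not prescribe a specific rule.

```python
import statistics

# Hypothetical observations; None marks a missing value.
values = [12.1, 11.8, None, 12.4, 12.0, 48.0, 11.9, None, 12.3, 12.2]

# Missing values: count the observations with no stored value.
missing_count = sum(v is None for v in values)
observed = [v for v in values if v is not None]

# Outliers: flag values more than 1.5 * IQR outside the quartiles (assumed rule).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < low or v > high]

print(f"missing values: {missing_count}")
print(f"possible outliers: {outliers}")
```

Flagged values are candidates for a closer look, not automatic deletions.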
Primary vs. Secondary Data
Sources of Data
• data distributed by an organisation or an individual
• a designed experiment
• a survey
• an observational study
• data collected by ongoing business activities.
• official statistics offices, e.g.:
• www.stat.si
• ec.europa.eu/Eurostat
• https://sdw.ecb.europa.eu
• …
• statistics departments of international organizations
• stats.oecd.org/
• unstats.un.org/
• …
• Other sources:
• https://www.dataquest.io/blog/free-datasets-for-projects/
• …
Primary sources: provide first-hand data collected by the data analyst.
Secondary sources: provide data already collected by another person or organization.
Methods and Properties of Data Collection
• The reliability and accuracy of the data affect the validity of the results in a statistical analysis.
• The reliability and accuracy of the data depend on the method(s) of data collection.
• Three of the most popular sources of statistical data are:
• published data
• observational studies
• experimental studies.
Reliability, Validity, Accuracy
Reliability
• Definition: the consistency of repeated assessments.
• Example: measuring someone's weight on a weighing scale would give consistent results for the same person.
Validity
• Definition: how well the assessment measures the concept it intends to measure.
• Example: the heavier a person is, the more likely they are to be overweight.
Accuracy
• Definition: how well an assessment measures what (i.e., the variable) it is supposed to measure.
• Example: a properly calibrated weighing scale would accurately measure kilograms.
Variables & Data: Qualitative vs. Quantitative
Qualitative
• also known as descriptive or attributive
• labels or names are used to categorize the distinguishing characteristics of a qualitative variable
• observed values (data) not numerical in nature
• can be quantified in the translation process ↔ attributes may be coded into numbers for purposes of data processing
Quantitative
• also known as numerical
• a quantitative variable assumes meaningful numerical values (data), and can be further categorized as either discrete or continuous
• a discrete variable assumes a countable number of values
• a continuous variable is characterized by uncountable values within an interval
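Coding attributes into numbers can be sketched in a few lines of Python. The list of majors is hypothetical and the integer codes are arbitrary identifiers, so counting them is meaningful while averaging them is not.

```python
from collections import Counter

# Hypothetical observed values of a qualitative variable.
majors = ["Psychology", "Spanish", "Religion", "Psychology", "Philosophy"]

# Code each label as an integer for data processing (alphabetical order, arbitrary).
codes = {label: i for i, label in enumerate(sorted(set(majors)), start=1)}
coded = [codes[m] for m in majors]

# Frequencies remain meaningful for coded qualitative data; a mean of the codes would not be.
frequencies = Counter(majors)

print(codes)
print(coded)
print(frequencies.most_common(1))
```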
VARIABLES AND SCALES OF MEASUREMENT
Scales of Variable Measurement
• Variables are measured using an instrument, device, or computer.
• The scale of the variable measured drastically affects the type of analytical techniques that can be used on the data, and what conclusions can be drawn from the data.
• There are four scales of measurement: nominal, ordinal, interval, and ratio.
Measurement
• Nominal measurement reflects classification of objects (e.g., codes A, N and P to represent aggressive, normal, and passive drivers); the order has no meaning, and the difference between identifiers is meaningless. In practice it is often useful to assign numbers instead of letters to represent nominal scale variables, but the numbers should not be treated as ordinal, interval, or ratio scale variables (e.g., bar codes).
• Ordinal measurement reflects rank (e.g., 1 = use often; 2 = use sometimes; 3 = never use); order matters, but the difference between responses is not consistent across the scale or individuals.
• Interval measurement enables meaningful interpretation of numbers assigned to objects (e.g., temperature in Celsius or Fahrenheit, time); the difference between measurements is the same anywhere along the scale and consistent across measurements. Ratios of interval scale variables have limited meaning because there is no absolute zero for interval scale variables.
• Ratio measurement has all the attributes of interval scale variables and one additional attribute: an absolute “zero” point. For example, traffic density (vehicles per kilometre) represents a ratio scale. The density of a link is defined as zero when there are no vehicles in a link.
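The limited meaning of ratios on an interval scale can be checked numerically. The sketch below uses the standard temperature conversions; the two temperatures are arbitrary example values.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

a_c, b_c = 10.0, 20.0   # two example temperatures in degrees Celsius

# On an interval scale the ratio depends on where the arbitrary zero sits.
ratio_celsius = b_c / a_c                                                   # 2.0
ratio_fahrenheit = celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c)  # about 1.36

# Kelvin has an absolute zero, so it is a ratio scale and this ratio is meaningful.
ratio_kelvin = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)              # about 1.035

print(ratio_celsius, ratio_fahrenheit, ratio_kelvin)
```

So 20 °C is not "twice as hot" as 10 °C: the same two temperatures give a different ratio in Fahrenheit, and only the Kelvin ratio has a physical interpretation.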
Measurement Levels & Variables
Measurement Levels & Variables: An
Alternative Approach
Measurement & Variables: A Comparison
Provides                                     Nominal  Ordinal  Interval  Ratio
Categorizes and labels values                   ✓        ✓        ✓        ✓
The order of values is known                    -        ✓        ✓        ✓
Counts, aka frequency distribution              ✓        ✓        ✓        ✓
Mode                                            ✓        ✓        ✓        ✓
Median                                          -        ✓        ✓        ✓
Mean                                            -        -        ✓        ✓
Can quantify the difference between values      -        -        ✓        ✓
Can add or subtract values                      -        -        ✓        ✓
Can multiply or divide values                   -        -        -        ✓
Has true zero                                   -        -        -        ✓
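The comparison table can be turned into a small lookup, sketched here in Python. The dictionary keys and the function name are made up for illustration; the rule encoded is that each operation becomes meaningful at a certain scale and stays meaningful on every higher one.

```python
# Measurement scales ordered from lowest to highest.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Lowest scale at which each operation is meaningful, following the comparison table.
MIN_SCALE = {
    "counts": "nominal",
    "mode": "nominal",
    "median": "ordinal",
    "mean": "interval",
    "add_subtract": "interval",
    "multiply_divide": "ratio",
}

def is_meaningful(operation, scale):
    """Return True if the operation is meaningful for a variable on the given scale."""
    return SCALES.index(scale) >= SCALES.index(MIN_SCALE[operation])

print(is_meaningful("median", "nominal"))    # False
print(is_meaningful("mean", "interval"))     # True
```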
Types of Variables with Respect to Data
Variables
• Categorical (qualitative)
• Nominal
• Ordinal
• Numerical (quantitative)
• Interval: discrete or continuous
• Ratio: discrete or continuous
…
Discrete vs. Continuous Variables
As a general rule, counts are discrete and measurements are continuous.
But… think of a bucket filled with identical grains of sand. Although they are countable, it might be easier to devise a weighing device with a scale indicating the number of grains of sand.
Recap: Example 3
For this problem, state whether the variables included are cross-sectional or time series:
• current GPAs of Purdue Statistics Graduate Students
• GPA of Marko during his time at Purdue
• value of Jordan Belfort's portfolio over the last 3 years before pleading guilty to fraud in 1999.
• value of all portfolios at Charles Schwab in January 2019
• total salary of the Dallas Mavericks throughout the 1990s
• salaries of all Vuelta cycling teams in 2021.
…
• cross-sectional
• time series
• time series
• cross-sectional
• time series
• cross-sectional
Recap: Example 4
• Labor Market Data of Cornwell and Rupert (1988) consists of the following variables for 595 individuals over 7 years:
• EXP = Work experience
• WKS = Weeks worked
• OCC = Occupation, 1 if blue collar
• IND = 1 if manufacturing industry
• SOUTH = 1 if resides in south
• SMSA = 1 if resides in a city (SMSA)
• MS = 1 if married
• FEM = 1 if female
• UNION = 1 if wage set by union contract
• ED = Years of education
• BLK = 1 if individual is black
• LWAGE = Log of wage
• This is an example of a panel data set.
• What analysis can be made by using this data?
• e.g., Returns to Schooling
Recap: Example 5
• How many elements are in the data set?
• How many variables are in the data set?
• What type of variable is each variable in the data set (be sure to answer both qualitative or quantitative as well as nominal, ordinal, interval, or ratio)? Define the quantitative variables as discrete or continuous.
Grade      Major       GPA   Credit Hours
Sophomore  Psychology  3.14  30
Senior     Spanish     2.89  105
Senior     Religion    3.01  99
Freshman   Philosophy  2.45  12
Source: Huang, W. (2014). Lecture 25: Types of Data, Sampling. PowerPoint Presentation. Purdue University.
…
• 4 elements in total
• 4 variables in this data set. They are Grade, Major, Credit Hours, and GPA (grade point average)
• Grade: qualitative (ordinal); Major: qualitative (nominal); GPA: quantitative (interval, continuous); Credit hours: quantitative (ratio, discrete)
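The same data set can be written down as records to count its elements and variables. The representation as a list of dictionaries is just one possible choice, and detecting quantitative variables by their Python type is a shortcut that works for this small table only.

```python
# The data set from the example, one dictionary per element (student).
rows = [
    {"Grade": "Sophomore", "Major": "Psychology", "GPA": 3.14, "Credit Hours": 30},
    {"Grade": "Senior", "Major": "Spanish", "GPA": 2.89, "Credit Hours": 105},
    {"Grade": "Senior", "Major": "Religion", "GPA": 3.01, "Credit Hours": 99},
    {"Grade": "Freshman", "Major": "Philosophy", "GPA": 2.45, "Credit Hours": 12},
]

n_elements = len(rows)        # one element per row
n_variables = len(rows[0])    # one variable per column

# Shortcut: values stored as numbers are the quantitative variables here.
quantitative = [name for name, value in rows[0].items() if isinstance(value, (int, float))]
qualitative = [name for name in rows[0] if name not in quantitative]

print(n_elements, n_variables)
print("quantitative:", quantitative)
print("qualitative:", qualitative)
```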

Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSĆ©rgio Sacani
Ā 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
Ā 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
Ā 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
Ā 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
Ā 

Recently uploaded (20)

Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Ā 
Lucknow šŸ’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow šŸ’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow šŸ’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow šŸ’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Ā 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
Ā 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Ā 
Nanoparticles synthesis and characterizationā€‹ ā€‹
Nanoparticles synthesis and characterizationā€‹  ā€‹Nanoparticles synthesis and characterizationā€‹  ā€‹
Nanoparticles synthesis and characterizationā€‹ ā€‹
Ā 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
Ā 
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreams
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreamsAhmedabad Call Girls Service 9537192988 can satisfy every one of your dreams
Ahmedabad Call Girls Service 9537192988 can satisfy every one of your dreams
Ā 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Ā 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
Ā 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
Ā 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
Ā 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Ā 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)
Ā 
Call Girls in Munirka Delhi šŸ’ÆCall Us šŸ”9953322196šŸ” šŸ’ÆEscort.
Call Girls in Munirka Delhi šŸ’ÆCall Us šŸ”9953322196šŸ” šŸ’ÆEscort.Call Girls in Munirka Delhi šŸ’ÆCall Us šŸ”9953322196šŸ” šŸ’ÆEscort.
Call Girls in Munirka Delhi šŸ’ÆCall Us šŸ”9953322196šŸ” šŸ’ÆEscort.
Ā 
CALL ON āž„8923113531 šŸ”Call Girls Kesar Bagh Lucknow best Night Fun service šŸŖ”
CALL ON āž„8923113531 šŸ”Call Girls Kesar Bagh Lucknow best Night Fun service  šŸŖ”CALL ON āž„8923113531 šŸ”Call Girls Kesar Bagh Lucknow best Night Fun service  šŸŖ”
CALL ON āž„8923113531 šŸ”Call Girls Kesar Bagh Lucknow best Night Fun service šŸŖ”
Ā 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Ā 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Ā 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
Ā 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
Ā 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Ā 

Lekcija 1 - Uvod.pdf

  • 1. Statistical Analysis 2021/22 Lecture 1 About the Course Introduction to Statistics & Data
  • 2. What You Need to Know Before Your Course Begins • We begin on time! • So please, be there on time! • Tuesday, 08:30 – 11:00 • We take a break after 90 minutes at most (usually after 75 minutes), so hold off on food or other addictions until the break! • Any special treatment / status you need (early exams, longer absences…) • Send an email or come to my office before October 21 and we’ll try to make some arrangements. • After that, the rules are set for all!
  • 3. Course Outline 1) Data management 2) Visualization 3) Descriptive statistics 4) Statistical distributions 5) Confidence intervals 6) Hypothesis testing of one population 7) Comparing two populations 8) Non-parametric testing 9) Simple linear and multiple regression 10) Regression model building 11) Time series.
  • 4. Teaching (Lectures + Exercises + Assignments) • Offline, IRL, synchronous • Lectures: every Tuesday (08:30 – 11:00), M. Pahor • they will be recorded and the recordings posted on Canvas • Exercises: every Friday (09:00 – 11:00), M. Pahor • “computer lab” • exercises, examples, Google Sheets and R Studio Cloud, etc. • Google Sheets and R Studio Cloud • You’ll need a Google account and an R Studio Cloud account (free)
  • 5. Teaching (Lectures + Exercises + Assignments) • Online, Canvas, asynchronous • DIY Exercises, K. Dvorski • You will have recap quizzes with explanations for every lecture available at all times for you to practice and prepare for the exams successfully • Extra assignments (10% of your final grade), K. Dvorski • You will have to solve and submit a number of assignments by a certain date over the course of the semester (no make-up tests and no deadline extensions) • The details will be announced in a timely manner and you will have enough time and TA’s support for every assignment.
  • 6. Teaching (Mid-Terms) • Canvas platform • On-line tests (midterms) are a part of the grade (30%) → There will be no re-do tests! • Three (3) on-line tests will be given on the dates and times specified. • Each mid-term accounts for 10% of your final grade • Mid-terms are here to help you learn and pass the course! Please, don’t stress! • DIY Exercises, assignments, exercises and lectures help you work towards your grade. • Don’t worry, if you take an active part in the course, you will have no problem passing the mid-terms and the final exam!
  • 7. Exam & Grading • Course grading: Your course grade will be based on the maximum of the following decompositions: • Graded tests (midterms) on Canvas (30 percent) • Bonus assignments during semester on Canvas (10 percent) • Final exam (60 percent) • You must answer more than 50% of questions correctly in order to pass the final exam (e.g., 25 MCQs; 13 correct answers!) • The final exam might be on-line • All the important details on exams and assignments will be announced shortly!
  • 8. Time & Space • IRL at SEB: • Lectures: Tuesdays, 8:30 – 11:00 • Exercises: Friday, 09:00 – 10:30 • Instructor: Prof. Marko Pahor • email: marko.pahor@ef.uni-lj.si • Office hours • Wednesdays, 2 p.m. on Zoom (check link) • Assistant: Ms. Katarina Dvorski • will try to answer your questions and provide support within 24 hours from receiving your email. • email: katdvorski@gmail.com
  • 9. Learning Objectives After this lecture you should be able to: LO 1: Describe the importance of statistics. LO 2: Differentiate between descriptive statistics and inferential statistics. LO 3: Explain the various data types LO 4: Describe variables and types of measurement scales.
  • 10. Statistics or Sadistics? • “There are three types of lies -- lies, damn lies, and statistics.” Benjamin Disraeli • “A single death is a tragedy; a million deaths is a statistic.” Joseph Stalin • “Most people use statistics like a drunk man uses a lamppost; more for support than illumination” Andrew Lang
  • 11. What is Statistics? Statistics • the branch of mathematics that examines ways to process and analyse data. It provides procedures to collect and transform data in ways that are useful to business decision makers. • is the science of data. This involves collecting, classifying, summarizing, organizing, analysing, presenting, and interpreting numerical data. • In the broadest sense, we may define the study of statistics as the methodology of extracting useful information from a data set. Statistic • is a numerical measure that describes a characteristic of a sample. Statistical Analysis • used to manipulate, summarize, and investigate data for the purposes of useful decision-making.
  • 12. Statistics & The Real World • Insurance companies use data on homeowners, drivers and many more to define the insurance premium. • Accountants use sample data concerning a company’s actual sales revenues to assess whether the company’s claimed sales revenues are valid. • Marketing experts help businesses decide which products to develop and offer by using data that reveal consumer preferences. • Politicians focus on public opinion polls to formulate legislation and to create campaign strategies. • Scientists use data on the effectiveness of drugs and vaccines to improve our health and advance knowledge.
  • 13. … • Finance – correlation and regression, index numbers, time series analysis, … • Marketing – hypothesis testing, chi-square tests, nonparametric statistics, … • HRM – hypothesis testing, chi-square tests, nonparametric tests, … • Operations Management – hypothesis testing, estimation, analysis of variance, time series analysis, …
  • 14. Types (Branches) of Statistics • Descriptive statistics • utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present that information in a convenient form; • focuses on collecting, summarizing and presenting a set of data. • Inferential statistics • utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data; • uses sample data to draw conclusions about a population.
  • 15. … Descriptive Statistics • describe, organize, summarize • common terminology: mode, median, mean, averages • data presentation (tables, graphs) • standard tools: central tendency, dispersion, skewness Inferential Statistics • generalize findings from samples to populations • estimation, predictions, assessing relationships between variables • common terminology: margin of error, statistically significant • standard tools: hypothesis testing, confidence intervals, regression analysis
  • 16. … Four Elements of Descriptive Statistical Problems 1. The population or sample of interest 2. One or more variables (characteristics of the population or sample units) that are to be investigated 3. Tables, graphs, or numerical summary tools 4. Identification of patterns in the data Five Elements of Inferential Statistical Problems 1. The population of interest 2. One or more variables (characteristics of the population units) that are to be investigated 3. The sample of population units 4. The inference about the population based on information contained in the sample 5. A measure of the reliability of the inference
  • 17. Statistical Inference • Estimation • e.g., estimate the population mean weight using the sample mean weight • Hypothesis testing • e.g., test the claim that the population mean weight is 70 kg • Statistical inference • process of drawing conclusions or making decisions about a population based on sample results (data). • making the inference is only part of the story. We need to know its reliability – how good the inference is. Statistical Inference: an estimate, prediction, or some other generalization about a population based on information contained in a sample.
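A minimal Python sketch of the two tasks named on this slide: estimating the population mean weight from a sample mean, and testing the claim that the population mean weight is 70 kg. The eight weights are made-up illustration values, not course data, and the one-sample t statistic is one common choice that the slide itself does not prescribe.

```python
import math
import statistics

# Hypothetical sample of body weights in kg (illustrative values only).
sample_weights = [68.0, 72.5, 71.0, 69.5, 74.0, 66.5, 70.5, 73.0]

n = len(sample_weights)
sample_mean = statistics.mean(sample_weights)   # point estimate of the population mean
sample_sd = statistics.stdev(sample_weights)    # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)       # a measure of how reliable the estimate is

# Hypothesis test of the claim "the population mean weight is 70 kg".
claimed_mean = 70.0
t_statistic = (sample_mean - claimed_mean) / standard_error

print(f"estimate = {sample_mean:.2f} kg, standard error = {standard_error:.2f} kg")
print(f"t statistic for H0: mean = 70 kg -> {t_statistic:.2f}")
```

The standard error is what the slide calls the reliability of the inference: the smaller it is, the closer repeated samples would tend to land to the same estimate.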
  • 18. Population – Sample – Variable Population: all items of interest in a statistical problem, all elements under study. Sample: subset of population (should be representative of the population) Unit of observation: an entity about which information is collected Data set: all the data collected in a particular study Elements: individual entities of a data set (individuals) Observation: the set of measurements obtained for a particular element Individuals: every individual element in the population Variables: characteristics of interest for the elements (individuals)
  • 19. Population & Sample • Population • the entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest; • A population parameter is a number calculated using the population measurements that describes some aspect of the population. That is, a population parameter is a descriptive measure of the population. • Sample • a portion, or part, of the population of interest; it should be representative of the population; • A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample. That is, a sample statistic is a descriptive measure of the sample.
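The distinction between a population parameter and a sample statistic can be sketched in a few lines of Python. The population here is simulated (1,000 generated values with an assumed mean of 70 and spread of 10), so every number is hypothetical; the point is only which data each quantity is computed from.

```python
import random
import statistics

# Hypothetical population: 1,000 generated measurements (not real data).
rng = random.Random(42)                      # fixed seed so the run is repeatable
population = [rng.gauss(70, 10) for _ in range(1000)]

# Population parameter: computed from ALL population measurements.
population_mean = statistics.mean(population)

# Sample statistic: computed from a subset drawn at random without replacement.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"population parameter (mean): {population_mean:.2f}")
print(f"sample statistic (mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown and only the statistic can be computed, which is why the sample needs to be representative.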
  • 20. Variable(s) Independent Variable • also known as the explanatory or predictor variable • it explains variations in the response variable • in a study, it is manipulated by the researcher. Dependent Variable • also known as the response or outcome variable • its value is predicted or its variation is explained by the explanatory variable • in a study, this is the outcome that is measured following manipulation of the explanatory variable. Confounding Variable • a variable, other than the independent variable of interest, that may affect the dependent variable. • an unforeseen or unaccounted-for factor that may call into question the finding of a relationship between two other factors or variables.
  • 22. Inferential Statistics: The Need for Sampling Resource constraints (e.g., time, money) • obtaining information on the entire population is expensive / time-consuming • the monthly unemployment rate in the U.S. is calculated by the Bureau of Labor Statistics (BLS): Is it reasonable to assume that the BLS counts every unemployed person each month? It is impossible to examine every member of the population • Suppose we are interested in the average length of life of a VARTA AAA battery. If we tested the duration of each VARTA AAA battery, then in the end, all batteries would be dead and the answer to the initial question would be useless.
  • 23. Recap: Example 1 • “Cola wars” is the popular term for the intense competition between Coca-Cola and Pepsi displayed in their marketing campaigns. Their campaigns have featured claims of consumer preference based on taste tests. • In 2013, the Huffington Post conducted a blind taste test of 9 cola brands that included Coca-Cola and Pepsi. (Pepsi finished 1st and Coke finished 5th). • Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are given a blind taste test (i.e., a taste test in which the two brand names are disguised). Each consumer is asked to state a preference for brand A or brand B. a) Describe the population. b) Describe the variable of interest. c) Describe the sample. d) Describe the inference. Source: McClave, J. T., Benson, G. P., & Sincich, T. (2017). Statistics for Business and Economics. Harlow, UK: Pearson Education Limited, p. 32.
  • 24. … a) Because we are interested in the responses of cola consumers in a taste test, a cola consumer (individual) is the experimental unit. Thus, the population of interest is the collection or set of all cola consumers. b) The characteristic that Pepsi wants to measure is the consumer’s cola preference as revealed under the conditions of a blind taste test, so cola preference is the variable of interest. c) The sample is the 1,000 cola consumers selected from the population of all cola consumers. d) The inference of interest is the generalization of the cola preferences of the 1,000 sampled consumers to the population of all cola consumers. The preferences of the consumers in the sample can be used to estimate the percentage of all cola consumers who prefer each brand. Source: McClave, J. T., Benson, G. P., & Sincich, T. (2017). Statistics for Business and Economics. Harlow, UK: Pearson Education Limited, p. 32.
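Part d) can be illustrated with a short Python sketch that turns sample preferences into an estimate of the population percentage. The count of 540 consumers preferring brand A is invented for illustration (the example gives no results), and the 1.96 multiplier assumes the usual normal approximation for an approximate 95% margin of error.

```python
import math

# Sample size from the example; the preference count is a made-up illustration.
n_consumers = 1000
prefer_brand_a = 540          # hypothetical result, not from the source

# Sample proportion: the estimate of the share of ALL cola consumers preferring brand A.
p_hat = prefer_brand_a / n_consumers

# Reliability of the estimate (normal approximation, assumed here).
standard_error = math.sqrt(p_hat * (1 - p_hat) / n_consumers)
margin_of_error = 1.96 * standard_error

print(f"estimated share preferring brand A: {p_hat:.1%}")
print(f"approximate 95% margin of error:    {margin_of_error:.1%}")
```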
  • 25. Recap: Example 2 • Let’s say you want to find out how alcohol consumption affects mortality. You decide to compare the mortality rates between two groups – one consisting of heavy users of alcohol, one consisting of teetotallers (people who never drink alcohol). • What would be your independent and dependent variables in this case? • If you find that people who consume more alcohol are more likely to die, it might seem intuitive to conclude that alcohol use increases the risk of death. In reality, however, the situation might be more complex. It is possible that alcohol use is not the only mortality-affecting factor that differs between the two groups. • What would be possible confounding variables?
  • 26. … • alcohol consumption is the independent variable • mortality is the dependent variable • age, sex, ethnicity, diet, BMI are possible confounding variables
  • 27. WHAT ARE DATA? WHERE DO WE GET DATA AND HOW?
  • 28. Data: A Quick Intro • Let’s say you are interested in the state of the COVID pandemic… • What data will you look for? • Where will you look for this data?
  • 29. Data: A Definition Data • facts, opinions, and figures from which conclusions can be drawn • typically in numerical form • data becomes information when it informs the decision making of the user • DIKW pyramid • “Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.” Clifford Stoll
  • 30. Basic Concepts of Data • variables are characteristics of items or individuals • data are the observed values of variables • all variables should have an operational definition – a universally accepted meaning that is clear to all associated with an analysis • e.g., ‘country of birth of person’, which is the country identified as being the one in which the person was born. • statistical techniques are processes that convert data into information • univariate data sets (one variable) ↔ univariate techniques • bivariate data sets (two variables) ↔ bivariate techniques • multivariate data sets (more than two variables) ↔ multivariate techniques Source: Berenson, M. L. et al. (2019). Basic Business Statistics: Concepts and Applications. Melbourne, AU: Pearson Australia, p. 6.
  • 31. … • Cross-sectional data contain values of a characteristic of many subjects at the same point or approximately the same point in time. • e.g., variations of ice cream flavours at a particular store, heart rates of 100 patients at the beginning of the same procedure • Time series data contain values of a characteristic of a subject over time. • e.g., a patient’s ECG heart data, daily closing prices over one year for a single financial security • Panel data (longitudinal data) contain observations about different cross sections across time. • examples of groups that may make up panel data series include countries, firms, individuals, or demographic groups. • e.g., GDP per capita for Sub-Saharan African (SSA) countries over the period 1960 – 2020 and stock prices of listed companies in Slovenia over the period 2000 – 2020.
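The three data shapes differ only in how many subjects and how many time points they cover, which a small Python sketch can make concrete. All names and numbers below are placeholders chosen to mirror the slide's examples, and `classify` is a hypothetical helper, not a library function.

```python
# All numbers below are made-up placeholders used only to show the shapes.

# Cross-sectional: many subjects, one point in time.
heart_rates_at_start = {"patient_1": 72, "patient_2": 88, "patient_3": 65}

# Time series: one subject, many points in time.
closing_prices = {"2021-01-04": 101.2, "2021-01-05": 100.7, "2021-01-06": 102.3}

# Panel (longitudinal): many subjects, each observed at many points in time.
gdp_per_capita = {
    ("country_A", 2019): 1500.0,
    ("country_A", 2020): 1450.0,
    ("country_B", 2019): 2300.0,
    ("country_B", 2020): 2250.0,
}


def classify(subjects: set, periods: set) -> str:
    """Label a data set by how many subjects and time periods it covers."""
    if len(subjects) > 1 and len(periods) > 1:
        return "panel"
    if len(periods) > 1:
        return "time series"
    return "cross-sectional"


panel_subjects = {subject for subject, _ in gdp_per_capita}
panel_periods = {year for _, year in gdp_per_capita}

print(classify(panel_subjects, panel_periods))
print(classify({"security_X"}, set(closing_prices)))
print(classify(set(heart_rates_at_start), {"start of procedure"}))
```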
  • 32. … • Structured data generally refers to data that has a well-defined length and format. • data reside in a predefined row-column format • fits nicely into a spreadsheet, relational database • numbers, dates, and groups of words and numbers called strings • Unstructured data (unmodeled data) do not conform to a predefined row-column format. • textual (e.g., e-mail or open-ended survey responses), multimedia contents (e.g., photographs, videos, and audio data) • social media data, such as those that appear on LinkedIn, Twitter, YouTube are examples of unstructured data • AI is changing the landscape of unstructured data
  • 33. … • The term big data is used to describe a massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data processing tools. • The availability of big data, however, does not necessarily imply complete (population) data.
  • 34. … • outliers: values that appear to be excessively large or small compared with most values observed. • missing values: cases where no data value is stored for one or more variables in an observation.
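Both ideas can be sketched in Python on a small made-up list. Missing values are represented here by `None`, and outliers are flagged with the 1.5 × IQR rule; that rule is an assumed convention for this sketch, since the slide only says "excessively large or small" and names no threshold.

```python
import statistics

# Hypothetical observations; None marks a missing value.
values = [12.0, 14.5, None, 13.2, 15.1, 98.0, 13.8, None, 14.0, 12.9]

missing_positions = [i for i, v in enumerate(values) if v is None]
observed = [v for v in values if v is not None]

# Assumed convention: flag points more than 1.5 * IQR below the first
# quartile or above the third quartile.
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < lower or v > upper]

print("missing at positions:", missing_positions)
print("flagged as outliers:", outliers)
```

Flagging is only the first step; whether an outlier is an error or a genuine extreme value has to be judged from the context of the data.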
  • 36. Sources of Data • data distributed by an organisation or an individual • a designed experiment • a survey • an observational study • data collected by ongoing business activities • official statistics offices, e.g.: • www.stat.si • ec.europa.eu/Eurostat • https://sdw.ecb.europa.eu • … • statistics departments of international organizations • stats.oecd.org/ • unstats.un.org/ • … • Other sources: • https://www.dataquest.io/blog/free-datasets-for-projects/ • … • Primary sources: provide first-hand data collected by the data analyser. • Secondary sources: provide (already collected) data collected by another person or organization.
  • 37. Methods and Properties of Data Collection • The reliability and accuracy of the data affect the validity of the results in a statistical analysis. • The reliability and accuracy of the data depend on the method(s) of data collection. • Three of the most popular sources of statistical data are: • published data • observational studies • experimental studies. • Reliability. Definition: The consistency of repeated assessments. Example: Measurement of someone's weight using a weighing scale would give consistent results in the same person. • Validity. Definition: How well the assessment measures the concept of what it intends to measure. Example: The heavier a person is the more likely they are to be overweight. • Accuracy. Definition: How well an assessment measures what (i.e. the variable) it is supposed to measure. Example: A properly calibrated weighing scale would accurately measure kilograms.
  • 38. Variables & Data: Qualitative vs. Quantitative Qualitative • also known as descriptive or attributive • labels or names are used to categorize the distinguishing characteristics of a qualitative variable • observed values (data) not numerical in nature • can be quantified in the translation process ↔ attributes may be coded into numbers for purposes of data processing Quantitative • also known as numerical • a quantitative variable assumes meaningful numerical values (data), and can be further categorized as either discrete or continuous • a discrete variable assumes a countable number of values • a continuous variable is characterized by uncountable values within an interval
  • 39. VARIABLES AND SCALES OF MEASUREMENT
  • 40. Scales of Variable Measurement • Variables are measured using an instrument, device, or computer. • The scale of the variable measured drastically affects the type of analytical techniques that can be used on the data, and what conclusions can be drawn from the data. • There are four scales of measurement: nominal, ordinal, interval, and ratio.
  • 41. Measurement • Nominal measurement reflects classification of objects (e.g., codes A, N and P to represent aggressive, normal, and passive drivers); the order has no meaning, and the difference between identifiers is meaningless. In practice it is often useful to assign numbers instead of letters to represent nominal scale variables, but the numbers should not be treated as ordinal, interval, or ratio scale variables (e.g., bar codes). • Ordinal measurement reflects rank (e.g., 1 = use often; 2 = use sometimes; 3 = never use); order matters, but the difference between responses is not consistent across the scale or individuals. • Interval measurement enables meaningful interpretation of numbers assigned to objects (e.g., temperature in Celsius or Fahrenheit, time); difference between measurements is the same anywhere along the scale and consistent across measurements. Ratios of interval scale variables have limited meaning because there is not an absolute zero for interval scale variables. • Ratio measurement has all the attributes of interval scale variables and one additional attribute: an absolute “zero” point. For example, traffic density (vehicles per kilometre) represents a ratio scale. The density of a link is defined as zero when there are no vehicles in a link.
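Why ratios of interval-scale values have limited meaning can be checked with a short Python sketch using the slide's temperature example. The specific temperatures are arbitrary, and the Kelvin scale is brought in here as a standard ratio-scale counterpart; the slide's own ratio-scale example is traffic density.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    return c + 273.15


a_c, b_c = 10.0, 20.0   # two arbitrary temperatures in degrees Celsius

# Interval scale: the ratio depends on where the arbitrary zero sits,
# so it changes when the same temperatures are expressed in another unit.
ratio_celsius = b_c / a_c
ratio_fahrenheit = celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c)

# Differences stay consistent along the scale: each 10 °C step is the same step in °F.
step_1 = celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
step_2 = celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0)

# Ratio scale: Kelvin has an absolute zero, so this ratio does not depend on an arbitrary origin.
ratio_kelvin = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(f"ratio in Celsius:    {ratio_celsius:.2f}")
print(f"ratio in Fahrenheit: {ratio_fahrenheit:.2f}")
print(f"ratio in Kelvin:     {ratio_kelvin:.4f}")
print(f"equal steps in Fahrenheit: {step_1} and {step_2}")
```

The two interval-scale ratios disagree, so "20 °C is twice as hot as 10 °C" is not a meaningful statement, while the equal differences are.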
  • 42. Measurement Levels & Variables
  • 43. Measurement Levels & Variables: An Alternative Approach
  • 44. Measurement & Variables: A Comparison
      Provides                                          Nominal  Ordinal  Interval  Ratio
      Categorizes and labels values                     ✓        ✓        ✓         ✓
      The order of values is known                               ✓        ✓         ✓
      Counts (aka frequency distribution)               ✓        ✓        ✓         ✓
      Mode                                              ✓        ✓        ✓         ✓
      Median                                                     ✓        ✓         ✓
      Mean                                                                ✓         ✓
      Can quantify the difference between each value                      ✓         ✓
      Can add or subtract values                                          ✓         ✓
      Can multiply or divide values                                                 ✓
      Has true zero                                                                 ✓
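The Mode, Median and Mean rows of the comparison table above can be encoded as a lookup, so that only the summaries a scale supports are ever computed. This is a sketch: `ALLOWED` and `summarize` are hypothetical names, and the two small data lists are invented to match the driver-code and usage-rank examples from the measurement slide.

```python
import statistics

# Summary measures each scale supports, following the comparison table.
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}

FUNCTIONS = {
    "mode": statistics.mode,
    "median": statistics.median,
    "mean": statistics.mean,
}


def summarize(values, scale):
    """Return only the summaries that are meaningful for the given scale."""
    return {name: FUNCTIONS[name](values) for name in sorted(ALLOWED[scale])}


# Hypothetical data: driver codes (nominal) and usage ranks (ordinal, 1 = use often).
driver_types = ["A", "N", "N", "P", "N", "A"]
usage_ranks = [1, 2, 2, 3, 1, 2, 3]

print(summarize(driver_types, "nominal"))
print(summarize(usage_ranks, "ordinal"))
```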
  • 45. Types of Variables with Respect to Data Variables • Categorical (qualitative): Nominal; Ordinal • Numerical (quantitative): Interval (discrete or continuous); Ratio (discrete or continuous)
  • 47. Discrete vs. Continuous Variables As a general rule, counts are discrete and measurements are continuous. But… think of a bucket filled with identical grains of sand. Although they are countable, it might be easier to devise a weighing device that will have a scale indicating the numbers of grains of sand.
  • 48. Recap: Example 3 For this problem, state whether the variables included are cross-sectional or time series: • current GPAs of Purdue Statistics Graduate Students • GPA of Marko during his time at Purdue • value of Jordan Belfort’s portfolio over the last 3 years before pleading guilty to fraud in 1999 • value of all portfolios at Charles Schwab in January 2019 • total salary of the Dallas Mavericks throughout the 1990s • salaries of all Vuelta cycling teams in 2021.
  • 49. … • cross-sectional • time series • time series • cross-sectional • time series • cross-sectional
  • 50. Recap: Example 4 • Labor Market Data of Cornwell and Rupert (1988) consists of the following variables for 595 individuals over 7 years: • EXP = Work experience • WKS = Weeks worked • OCC = Occupation, 1 if blue collar • IND = 1 if manufacturing industry • SOUTH = 1 if resides in south • SMSA = 1 if resides in a city (SMSA) • MS = 1 if married • FEM = 1 if female • UNION = 1 if wage set by union contract • ED = Years of education • BLK = 1 if individual is black • LWAGE = Log of wage • This is an example of a panel data set. • What analysis can be made by using this data? • e.g., Returns to Schooling
  • 51. Recap: Example 5 • How many elements are in the data set? • How many variables are in the data set? • What type of variable is each variable in the data set (be sure to answer both qualitative or quantitative as well as nominal, ordinal, interval, or ratio)? Define the quantitative variables as discrete or continuous.
      Grade      Major       GPA   Credit Hours
      Sophomore  Psychology  3.14  30
      Senior     Spanish     2.89  105
      Senior     Religion    3.01  99
      Freshman   Philosophy  2.45  12
    Source: Huang, W. (2014). Lecture 25: Types of Data, Sampling. PowerPoint Presentation. Purdue University.
  • 52. … • 4 elements in total • 4 variables in this data set. They are Grade, Major, Credit Hours, and GPA (grade point average) • Grade: qualitative (ordinal); Major: qualitative (nominal); GPA: quantitative (interval, continuous); Credit hours: quantitative (ratio, discrete)
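The answers to Example 5 can be reproduced with a small Python sketch that stores the four rows of the table and counts elements and variables. The variable classifications are copied from the answer slide; the class-year ordering, including "Junior", which does not appear in the data, is the usual US convention and is assumed here only to show what "ordinal" buys you.

```python
# The four rows from the table in Example 5.
students = [
    {"Grade": "Sophomore", "Major": "Psychology", "GPA": 3.14, "Credit Hours": 30},
    {"Grade": "Senior", "Major": "Spanish", "GPA": 2.89, "Credit Hours": 105},
    {"Grade": "Senior", "Major": "Religion", "GPA": 3.01, "Credit Hours": 99},
    {"Grade": "Freshman", "Major": "Philosophy", "GPA": 2.45, "Credit Hours": 12},
]

# Classification of each variable, as given on the answer slide.
variable_types = {
    "Grade": ("qualitative", "ordinal"),
    "Major": ("qualitative", "nominal"),
    "GPA": ("quantitative", "interval", "continuous"),
    "Credit Hours": ("quantitative", "ratio", "discrete"),
}

n_elements = len(students)        # one element per row
n_variables = len(students[0])    # one variable per column

# Ordinal data can be ranked; the order below is an assumed convention.
class_order = ["Freshman", "Sophomore", "Junior", "Senior"]
ranked = sorted(students, key=lambda row: class_order.index(row["Grade"]))

print(f"{n_elements} elements, {n_variables} variables")
print([row["Grade"] for row in ranked])
```

Sorting by Major in the same way would only give alphabetical order, which carries no meaning for a nominal variable.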