SlideShare a Scribd company logo
1 of 51
Week04 : Exploratory Data Analysis
ICT 706 Data Analytics
2
Learning Outcomes
• Understand key elements in exploratory data analysis (EDA)
• Explain and use common summary statistics for EDA
• Plot and explain common graphs and charts for EDA
3
Data Analytics
• What is Data Analytics?
• Study data to answer questions
• Why?
• More organizations are collecting more data than ever before
• Business, Government, Healthcare, Charity – everything!
• This data holds many insights into their operations and beyond
• Data Analytics is required to extract these insights
• Requires analytical, statistical and computational skills
4
To Answer Key Questions
• What movies (or books) customers would like to watch (or read)?
• What movies to order from studio and how many?
• Who are our best customers?
• When is there a flu epidemic in region in the country?
• Which customers are most likely not to have an accident?
• When a customer is likely to jump ship & go to a competitor?
• What is the one item you want to have in your store in case of a hurricane?
• What is the one thing that will improve a lawyer’s chance to win a case?
• What are some questions one can answer with a loyalty card?
• What is the number one reason for the success of cricket player?
• … …
5
Data Analytic Tasks
• Exploratory Data Analysis (EDA)  our focus this week!
• A preliminary exploration of the data to better understand its characteristics
• Descriptive Modelling
• Find human-interpretable patterns that describe the data. E.g.
• Association Rule Discovery
• Clustering
• Predictive Modelling
• Use some attributes to predict unknown or future values of other attributes.
E.g.
• Classification
• Regression
6
EDA
• If you are going to find out anything about a data set you must first
understand the data
• but most interesting data sets are too big to inspect manually
• EDA:
• Helps to select the right tool for preprocessing or analysis
• Makes use of humans’ abilities to recognize patterns
• Focuses on:
• Summary statistics
• Visualization
• Clustering and anomaly detection
7
1. Summary Statistics
• Getting an overall sense of the data set
• Numbers that summarize properties of the data
• Frequency
• frequency, mode, percentiles
• Location
• Min & Max, Mean & Median
• Spread
• Variance & Standard Deviation, Distribution
8
9
10
11
12
13
14
Credit card company
15
16
17
18
19
Mean or Median?
20
Mean or Median ??
21
Question: Mean or Median?
22
23
24
Frequency and Mode
• Frequency: the number of times the value occurs
• Relative frequency: the percentage that the value occurs
• Mode: the most frequent attribute value
• (3, 5, 3, 10, 4)
• Frequency and mode are typically used with categorical data
Category Apple Orange Melon Kiwi Plum Total
Frequency 10 15 8 10 7 50
Relative
Frequency
20% 30% 16% 20% 14% 100%
25
Percentiles
• For continuous data, a percentile is more useful.
• Given an ordinal or continuous attribute x, its pth percentile ([0, 100]) is
• a value xp such that p% of the observed values are less than xp.
• E.g. the 20th percentile is the value below which 20% of all values of the
observations may be found.
26
Quartiles
• Values that divide a list of numbers into quarters Q1, Q2, Q3
• Percentiles 25th,50th, 75th
• Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8
• Put the list of numbers in order
• Then cut the list into four equal parts
• If a quartile is half way, use the average
27
Quartile 1 (Q1) = 3
Quartile 2 (Q2) = 5.5
Quartile 3 (Q3) = 7
Mean and Median
• Measure of the locations of a set of points
• Mean
• average of all values
• very sensitive to outliers
• Median
• Q2 or 50th percentile: the value whose occurrence lies in the middle of a set of ordered
values
• median(1, 20, 25, 30, 31) = ?
• median(1, 2, 5, 92) = ?
𝑚𝑒𝑎𝑛 1,2,5,92 =
1 + 2 + 5 + 92
4
= 25
28
Extreme example
• Income in small town of 6 people
• $25,000 $27,000 $29,000
• $35,000 $37,000 $38,000
• Mean is $31,830 and median is $32,000
• Bill Gates moves to town
• $25,000 $27,000 $29,000
• $35,000 $37,000 $38,000 $40,000,000
• Mean is $5,741,571 median is $35,000
• Mean is pulled by the outlier while the median is not. The median is
a better of measure of center for data with extreme values.
• E.g. house prices, incomes, student exam marks, ...
29
Range and Variance
• Measures of data spread
• Range: the difference between the max and min
• (1, 2, 5, 70, 92): range = 92 – 1 = 91
• Variance: The average of the squared differences from the Mean
• Standard deviation: the square root of Variance
• Sensitive to outliers
• Inter-quartile range: Q3 – Q1
𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑥 = 𝑠 𝑥
2
=
1
𝑛
𝑖=1
𝑛
(𝑥𝑖 − 𝑥)2
Value 2 3 3 4 Mean = 3
Diff 1 0 0 1
Diff2 1 0 0 1 Variance = 2/4 = 0.5
30
2. Visualization
• Convert data into a visual or tabular format for easier understanding
• One of the most powerful and appealing techniques for data
exploration
• Humans have a well developed ability to analyze large amounts of
information that is presented visually
• Can detect general patterns and trends
• Can detect outliers and unusual patterns
31
Visualization
• Data objects, their attributes, and the relationships are translated into
graphical elements such as points, lines, shapes, and colors
• Example:
• Objects are often represented as points
• Their attribute values can be represented as the position of the points or the
characteristics of the points, e.g., color, size, and shape
• If position is used, then the relationships of points, i.e., whether they form
groups or a point is an outlier, is easily perceived.
32
Visualization Example
• Sea Surface Temperature for July 1982
• Tens of thousands of data points are summarized in a single figure
33
EDA Strategy
• Examine variables one by one, then look at the relationships among
the different variables
• Start with graphs, then add numerical summaries of specific aspects
of the data
• Be aware of attribute types
• Categorical vs. Numeric
34
EDA: One Variable
• Summary statistics
• Categorical: frequency table
• Numeric: mean, median, standard deviation, range, etc.
• Graphical displays
• Categorical: bar chart, pie chart, etc.
• Numeric: histogram, boxplot, etc.
• Probability models
• Categorical: Binomial distribution (won’t cover)
• Numeric: Normal curve (won’t cover)
35
Example Scatter Chart: Age
0
10
20
30
40
50
60
70
80
90
Age(inyears),
84 = Maximum
2 = Minimum
28 = Mode (Occurs twice)
33 = Mean
36 = Median
Summary Statistics
Example Scatter Chart 2: Age
0
10
20
30
40
50
60
70
80
90
Age(inyears).
Example Scatter Chart 1: Age
0
10
20
30
40
50
60
70
80
90
Age(inyears),
SD = 7.6 SD = 20.4
36
Frequency Table
Class Frequency Relative Frequency
Highest Degree Obtained Number of CEOs Proportion
None 1 0.04
Bachelors 7 0.28
Masters 11 0.44
Doctorate / Law 6 0.24
Totals 25 1.00
37
Bar Graph
• The bar graph quickly compares the
degrees of the four groups
• The heights of the four bars show the
counts for the four degree categories
38
Pie Chart
• A pie chart helps us see what part of
the whole group forms
• To make a pie chart, you must
include all the categories that make
up a whole
• WARNING: pie charts are often a
poor choice – they show only relative
data. A bar chart is often better.
39
Histogram
• Split data range into equal-sized bins.
• Count the number of points in each bin.
40
The bins are:
3.0 ≤ rate < 4.0
4.0 ≤ rate < 5.0
5.0 ≤ rate < 6.0
6.0 ≤ rate < 7.0
7.0 ≤ rate < 8.0
8.0 ≤ rate < 9.0
9.0 ≤ rate < 10.0
10.0 ≤ rate < 11.0
11.0 ≤ rate < 12.0
12.0 ≤ rate < 13.0
13.0 ≤ rate < 14.0
14.0 ≤ rate < 15.0
Histograms The bins are:
3.0 ≤ rate < 4.0
4.0 ≤ rate < 5.0
5.0 ≤ rate < 6.0
6.0 ≤ rate < 7.0
7.0 ≤ rate < 8.0
8.0 ≤ rate < 9.0
9.0 ≤ rate < 10.0
10.0 ≤ rate < 11.0
11.0 ≤ rate < 12.0
12.0 ≤ rate < 13.0
13.0 ≤ rate < 14.0
14.0 ≤ rate < 15.0
41
Histograms
The bins are:
2.0 ≤ rate < 4.0
4.0 ≤ rate < 6.0
6.0 ≤ rate < 8.0
8.0 ≤ rate < 10.0
10.0 ≤ rate < 12.0
12.0 ≤ rate < 14.0
14.0 ≤ rate < 16.0
16.0 ≤ rate < 18.0
42
Histograms
• Where did the bins come from?
• They were chosen rather arbitrarily
• Does choosing other bins change the picture?
• Yes!! And sometimes dramatically
• What do we do about this?
• Some pretty smart people have come up with some “optimal” bin widths
• we will rely on their suggestions
43
Frequency Distribution Example 2
0
2
4
6
8
10
12
14
1 2 3 4 5 6 7 8 9 10 11
Age (years)
Frequency
Distribution for variable AGE
Histogram
• The histogram can illustrate features related to
the distribution of the data, e.g.,
• Shape
• Symmetric, skewed right, skewed left, bell shaped
• presence of outliers
• presence of multiple modes
• "a camel with two humps" is a
common grade distribution for
ICT courses!
44
Box Plots
• Box Plots
• Show data distribution
• A graph of five numbers (often called the
five number summary)
• Minimum
• 1st quartile
• Median
• 3rd quartile
• Maximum
• Various definitions for min-max
• Min & Max of all data
• 9th and 91st percentile
• 2nd and 98th percentile
• 1.5 Inter-quartile range
outlier
9th percentile
25th percentile
75th percentile
50th percentile
91st percentile
45
EDA: Two Variables
• Looking at TWO variables at once helps us to explore relationships
between different attributes.
• sometimes we can see clear relationships and clusters
• next week we will look at clustering in more detail
• this week we focus on exploratory analysis of two variables
• Three combinations of variables
• 2 categorical variables
• Contingency tables
• 1 categorical and 1 numeric variable
• Side-by-side box plots, counts, etc.
• 2 numeric variables
• Scatter plots, correlations, regressions
46
Contingency Tables
• Show the frequency distribution of one variable in rows and another in
columns
• E.g. a study of sex differences in handedness with 100 samples
• Sex (male, female) and Handedness (right, left)
47
Marginal totals
Grand total
Side-by-side box plots
• Side-by-side box plots are graphical summaries of data when one
variable is categorical and the other numeric
• Used to compare the distributions associated with the numeric variable
across the levels of the categorical variable
48
Scatter Plots
• One variable on x-axis and another on y-axis
• Attributes values determine the position
• Arrays of scatter plots can compactly summarize the relationships of pairs of
attributes
49
Describing scatter plots
• Form
• Linear, quadratic, exponential
• Direction
• Positive association
• An increase in one variable is accompanied by an
increase in the other
• Negatively associated
• A decrease in one variable is accompanied by an
increase in the other
• Strength
• How closely the points follow a clear form
50
• Form: Linear
• Direction: Positive
• Strength: Strong
Course Reflection (in census week)
• How is your learning journey going ?
• Your feedback
• Areas of improvement
51

More Related Content

What's hot

Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientistsAjay Ohri
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendencyChie Pegollo
 
Data collection and presentation
Data collection and presentationData collection and presentation
Data collection and presentationferdaus44
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisYabebal Ayalew
 
Categorical and Numerical Variables
Categorical and Numerical VariablesCategorical and Numerical Variables
Categorical and Numerical VariablesTanirikaGodiyal
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Data Analysis with SPSS PPT.pdf
Data Analysis with SPSS PPT.pdfData Analysis with SPSS PPT.pdf
Data Analysis with SPSS PPT.pdfThanavathi C
 
Basic business statistics 2
Basic business statistics 2Basic business statistics 2
Basic business statistics 2Anwar Afridi
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statisticsAndi Koentary
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsBurak Mızrak
 
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxSTATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxMuhammadNafees42
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Edureka!
 
Survey and correlational research (1)
Survey and correlational research (1)Survey and correlational research (1)
Survey and correlational research (1)zuraiberahim
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfAmmarAhmedSiddiqui2
 

What's hot (20)

Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Data Analysis, Intepretation
Data Analysis, IntepretationData Analysis, Intepretation
Data Analysis, Intepretation
 
Data collection and presentation
Data collection and presentationData collection and presentation
Data collection and presentation
 
Bivariate data
Bivariate dataBivariate data
Bivariate data
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Categorical and Numerical Variables
Categorical and Numerical VariablesCategorical and Numerical Variables
Categorical and Numerical Variables
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Data Analysis with SPSS PPT.pdf
Data Analysis with SPSS PPT.pdfData Analysis with SPSS PPT.pdf
Data Analysis with SPSS PPT.pdf
 
Basic business statistics 2
Basic business statistics 2Basic business statistics 2
Basic business statistics 2
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxSTATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
 
Survey and correlational research (1)
Survey and correlational research (1)Survey and correlational research (1)
Survey and correlational research (1)
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdf
 
Statistics an introduction (1)
Statistics  an introduction (1)Statistics  an introduction (1)
Statistics an introduction (1)
 
DATA ANALYSIS
DATA ANALYSISDATA ANALYSIS
DATA ANALYSIS
 

Similar to Exploratory Data Analysis week 4

Exploring Data (1).pptx
Exploring Data (1).pptxExploring Data (1).pptx
Exploring Data (1).pptxgina458018
 
Lecture 4 - Charts and graphs.pptx
Lecture 4 - Charts and graphs.pptxLecture 4 - Charts and graphs.pptx
Lecture 4 - Charts and graphs.pptxaazmeerrahman
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statisticsAmira Talic
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingTony Nguyen
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingHarry Potter
 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptxImXaib
 
EXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSISEXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSISBabasID2
 
3. Statistical Analysis.pptx
3. Statistical Analysis.pptx3. Statistical Analysis.pptx
3. Statistical Analysis.pptxjeyanthisivakumar
 
Biostatistics CH Lecture Pack
Biostatistics CH Lecture PackBiostatistics CH Lecture Pack
Biostatistics CH Lecture PackShaun Cochrane
 
Data Wrangling_1.pptx
Data Wrangling_1.pptxData Wrangling_1.pptx
Data Wrangling_1.pptxPallabiSahoo5
 
Statistics for analytics
Statistics for analyticsStatistics for analytics
Statistics for analyticsdeepika4721
 
datamining_LectureTwo(Data Pipeline) 2.pdf
datamining_LectureTwo(Data Pipeline) 2.pdfdatamining_LectureTwo(Data Pipeline) 2.pdf
datamining_LectureTwo(Data Pipeline) 2.pdfssuser3e6464
 
Measure of Variability Report.pptx
Measure of Variability Report.pptxMeasure of Variability Report.pptx
Measure of Variability Report.pptxCalvinAdorDionisio
 
T7 data analysis
T7 data analysisT7 data analysis
T7 data analysiskompellark
 
OER Descriptive Statistics (University of Edinburgh)
OER Descriptive Statistics (University of Edinburgh)OER Descriptive Statistics (University of Edinburgh)
OER Descriptive Statistics (University of Edinburgh)LeonardHo7
 

Similar to Exploratory Data Analysis week 4 (20)

Exploring Data (1).pptx
Exploring Data (1).pptxExploring Data (1).pptx
Exploring Data (1).pptx
 
Lecture 4 - Charts and graphs.pptx
Lecture 4 - Charts and graphs.pptxLecture 4 - Charts and graphs.pptx
Lecture 4 - Charts and graphs.pptx
 
Eda sri
Eda sriEda sri
Eda sri
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptx
 
EXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSISEXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSIS
 
EDA.ppt
EDA.pptEDA.ppt
EDA.ppt
 
3. Statistical Analysis.pptx
3. Statistical Analysis.pptx3. Statistical Analysis.pptx
3. Statistical Analysis.pptx
 
Biostatistics CH Lecture Pack
Biostatistics CH Lecture PackBiostatistics CH Lecture Pack
Biostatistics CH Lecture Pack
 
2. basic statistics
2. basic statistics2. basic statistics
2. basic statistics
 
IV STATISTICS I.pdf
IV STATISTICS I.pdfIV STATISTICS I.pdf
IV STATISTICS I.pdf
 
Data Wrangling_1.pptx
Data Wrangling_1.pptxData Wrangling_1.pptx
Data Wrangling_1.pptx
 
Statistics for analytics
Statistics for analyticsStatistics for analytics
Statistics for analytics
 
datamining_LectureTwo(Data Pipeline) 2.pdf
datamining_LectureTwo(Data Pipeline) 2.pdfdatamining_LectureTwo(Data Pipeline) 2.pdf
datamining_LectureTwo(Data Pipeline) 2.pdf
 
Intro to Statistics.pptx
Intro to Statistics.pptxIntro to Statistics.pptx
Intro to Statistics.pptx
 
Measure of Variability Report.pptx
Measure of Variability Report.pptxMeasure of Variability Report.pptx
Measure of Variability Report.pptx
 
T7 data analysis
T7 data analysisT7 data analysis
T7 data analysis
 
OER Descriptive Statistics (University of Edinburgh)
OER Descriptive Statistics (University of Edinburgh)OER Descriptive Statistics (University of Edinburgh)
OER Descriptive Statistics (University of Edinburgh)
 

More from Manzur Ashraf

2022-03-22 15.25.49.pdf
2022-03-22 15.25.49.pdf2022-03-22 15.25.49.pdf
2022-03-22 15.25.49.pdfManzur Ashraf
 
Dhumlin upohar ramadan
Dhumlin upohar ramadanDhumlin upohar ramadan
Dhumlin upohar ramadanManzur Ashraf
 
Java related basic tutorial
Java related basic tutorialJava related basic tutorial
Java related basic tutorialManzur Ashraf
 
‘WHAT ARE JAMAAYA, LOYALTY TO LEADERS & THEIR LINK TO RELIGIOUS EXTREMISM?
‘WHAT ARE JAMAAYA, LOYALTY TO LEADERS & THEIR LINK TO RELIGIOUS EXTREMISM?‘WHAT ARE JAMAAYA, LOYALTY TO LEADERS & THEIR LINK TO RELIGIOUS EXTREMISM?
‘WHAT ARE JAMAAYA, LOYALTY TO LEADERS & THEIR LINK TO RELIGIOUS EXTREMISM?Manzur Ashraf
 
Can Science Come Back to Islam? from New Scientist 23 Oct 1980
Can Science Come Back to Islam?  from New Scientist 23 Oct 1980Can Science Come Back to Islam?  from New Scientist 23 Oct 1980
Can Science Come Back to Islam? from New Scientist 23 Oct 1980Manzur Ashraf
 
Kakadu report-web-optimization
Kakadu report-web-optimizationKakadu report-web-optimization
Kakadu report-web-optimizationManzur Ashraf
 
Speed and Performance Optimization Technique of a website
Speed and Performance Optimization Technique of a websiteSpeed and Performance Optimization Technique of a website
Speed and Performance Optimization Technique of a websiteManzur Ashraf
 
SEO for Businesses - BUET ALUMNI Melbourne Chapter Talk Nov 2015
SEO for Businesses - BUET ALUMNI Melbourne Chapter Talk Nov 2015SEO for Businesses - BUET ALUMNI Melbourne Chapter Talk Nov 2015
SEO for Businesses - BUET ALUMNI Melbourne Chapter Talk Nov 2015Manzur Ashraf
 
website status and seo assessment altec 2015
website status and seo assessment altec 2015website status and seo assessment altec 2015
website status and seo assessment altec 2015Manzur Ashraf
 
Hajj: an experience and a guide from Dr Manzur Ashraf
Hajj: an experience and a guide from Dr Manzur AshrafHajj: an experience and a guide from Dr Manzur Ashraf
Hajj: an experience and a guide from Dr Manzur AshrafManzur Ashraf
 
Selective Duas from the book muslim-devotion
Selective Duas from the book muslim-devotionSelective Duas from the book muslim-devotion
Selective Duas from the book muslim-devotionManzur Ashraf
 
SEO Marketing / Competitive Analysis Report (sample)
  SEO Marketing / Competitive Analysis Report (sample)  SEO Marketing / Competitive Analysis Report (sample)
SEO Marketing / Competitive Analysis Report (sample)Manzur Ashraf
 
Characteristics of a Muslim (Bangla)
Characteristics of a Muslim (Bangla)Characteristics of a Muslim (Bangla)
Characteristics of a Muslim (Bangla)Manzur Ashraf
 
Bani israel 14_principles
Bani israel 14_principlesBani israel 14_principles
Bani israel 14_principlesManzur Ashraf
 
Seo basis-seminar-2015
Seo basis-seminar-2015Seo basis-seminar-2015
Seo basis-seminar-2015Manzur Ashraf
 

More from Manzur Ashraf (18)

2022-03-22 15.25.49.pdf
2022-03-22 15.25.49.pdf2022-03-22 15.25.49.pdf
2022-03-22 15.25.49.pdf
 
Dhumlin upohar ramadan
Dhumlin upohar ramadanDhumlin upohar ramadan
Dhumlin upohar ramadan
 
Java related basic tutorial
Java related basic tutorialJava related basic tutorial
Java related basic tutorial
 
‘WHAT ARE JAMAAYA, LOYALTY TO LEADERS & THEIR LINK TO RELIGIOUS EXTREMISM?
‘WHAT ARE JAMAAYA, LOYALTY TO LEADERS & THEIR LINK TO RELIGIOUS EXTREMISM?‘WHAT ARE JAMAAYA, LOYALTY TO LEADERS & THEIR LINK TO RELIGIOUS EXTREMISM?
‘WHAT ARE JAMAAYA, LOYALTY TO LEADERS & THEIR LINK TO RELIGIOUS EXTREMISM?
 
Can Science Come Back to Islam? from New Scientist 23 Oct 1980
Can Science Come Back to Islam?  from New Scientist 23 Oct 1980Can Science Come Back to Islam?  from New Scientist 23 Oct 1980
Can Science Come Back to Islam? from New Scientist 23 Oct 1980
 
Kakadu report-web-optimization
Kakadu report-web-optimizationKakadu report-web-optimization
Kakadu report-web-optimization
 
Speed and Performance Optimization Technique of a website
Speed and Performance Optimization Technique of a websiteSpeed and Performance Optimization Technique of a website
Speed and Performance Optimization Technique of a website
 
SEO for Businesses - BUET ALUMNI Melbourne Chapter Talk Nov 2015
SEO for Businesses - BUET ALUMNI Melbourne Chapter Talk Nov 2015SEO for Businesses - BUET ALUMNI Melbourne Chapter Talk Nov 2015
SEO for Businesses - BUET ALUMNI Melbourne Chapter Talk Nov 2015
 
website status and seo assessment altec 2015
website status and seo assessment altec 2015website status and seo assessment altec 2015
website status and seo assessment altec 2015
 
Hajj: an experience and a guide from Dr Manzur Ashraf
Hajj: an experience and a guide from Dr Manzur AshrafHajj: an experience and a guide from Dr Manzur Ashraf
Hajj: an experience and a guide from Dr Manzur Ashraf
 
Selective Duas from the book muslim-devotion
Selective Duas from the book muslim-devotionSelective Duas from the book muslim-devotion
Selective Duas from the book muslim-devotion
 
SEO Marketing / Competitive Analysis Report (sample)
  SEO Marketing / Competitive Analysis Report (sample)  SEO Marketing / Competitive Analysis Report (sample)
SEO Marketing / Competitive Analysis Report (sample)
 
Cclc manzur
Cclc manzurCclc manzur
Cclc manzur
 
Characteristics of a Muslim (Bangla)
Characteristics of a Muslim (Bangla)Characteristics of a Muslim (Bangla)
Characteristics of a Muslim (Bangla)
 
Sura Mulk
Sura MulkSura Mulk
Sura Mulk
 
Bani israel 14_principles
Bani israel 14_principlesBani israel 14_principles
Bani israel 14_principles
 
Sura kahaf-2015
Sura kahaf-2015Sura kahaf-2015
Sura kahaf-2015
 
Seo basis-seminar-2015
Seo basis-seminar-2015Seo basis-seminar-2015
Seo basis-seminar-2015
 

Recently uploaded

原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证rjrjkk
 
Classical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithClassical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithAdamYassin2
 
PMFBY , Pradhan Mantri Fasal bima yojna
PMFBY , Pradhan Mantri  Fasal bima yojnaPMFBY , Pradhan Mantri  Fasal bima yojna
PMFBY , Pradhan Mantri Fasal bima yojnaDharmendra Kumar
 
NO1 Certified Ilam kala Jadu Specialist Expert In Bahawalpur, Sargodha, Sialk...
NO1 Certified Ilam kala Jadu Specialist Expert In Bahawalpur, Sargodha, Sialk...NO1 Certified Ilam kala Jadu Specialist Expert In Bahawalpur, Sargodha, Sialk...
NO1 Certified Ilam kala Jadu Specialist Expert In Bahawalpur, Sargodha, Sialk...Amil Baba Dawood bangali
 
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...First NO1 World Amil baba in Faisalabad
 
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...Amil Baba Dawood bangali
 
(中央兰开夏大学毕业证学位证成绩单-案例)
(中央兰开夏大学毕业证学位证成绩单-案例)(中央兰开夏大学毕业证学位证成绩单-案例)
(中央兰开夏大学毕业证学位证成绩单-案例)twfkn8xj
 
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...amilabibi1
 
The Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarThe Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarHarsh Kumar
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfMichael Silva
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdfHenry Tapper
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Sonam Pathan
 
Quantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector CompaniesQuantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector Companiesprashantbhati354
 
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...yordanosyohannes2
 
Managing Finances in a Small Business (yes).pdf
Managing Finances  in a Small Business (yes).pdfManaging Finances  in a Small Business (yes).pdf
Managing Finances in a Small Business (yes).pdfmar yame
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...Henry Tapper
 
Stock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfStock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfMichael Silva
 
Vp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppVp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppmiss dipika
 

Recently uploaded (20)

原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
 
Classical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam SmithClassical Theory of Macroeconomics by Adam Smith
Classical Theory of Macroeconomics by Adam Smith
 
PMFBY , Pradhan Mantri Fasal bima yojna
PMFBY , Pradhan Mantri  Fasal bima yojnaPMFBY , Pradhan Mantri  Fasal bima yojna
PMFBY , Pradhan Mantri Fasal bima yojna
 
NO1 Certified Ilam kala Jadu Specialist Expert In Bahawalpur, Sargodha, Sialk...
NO1 Certified Ilam kala Jadu Specialist Expert In Bahawalpur, Sargodha, Sialk...NO1 Certified Ilam kala Jadu Specialist Expert In Bahawalpur, Sargodha, Sialk...
NO1 Certified Ilam kala Jadu Specialist Expert In Bahawalpur, Sargodha, Sialk...
 
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
Authentic No 1 Amil Baba In Pakistan Authentic No 1 Amil Baba In Karachi No 1...
 
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
 
(中央兰开夏大学毕业证学位证成绩单-案例)
(中央兰开夏大学毕业证学位证成绩单-案例)(中央兰开夏大学毕业证学位证成绩单-案例)
(中央兰开夏大学毕业证学位证成绩单-案例)
 
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
Amil Baba In Pakistan amil baba in Lahore amil baba in Islamabad amil baba in...
 
🔝+919953056974 🔝young Delhi Escort service Pusa Road
🔝+919953056974 🔝young Delhi Escort service Pusa Road🔝+919953056974 🔝young Delhi Escort service Pusa Road
🔝+919953056974 🔝young Delhi Escort service Pusa Road
 
The Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarThe Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh Kumar
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdf
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
 
Quantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector CompaniesQuantitative Analysis of Retail Sector Companies
Quantitative Analysis of Retail Sector Companies
 
Q1 2024 Newsletter | Financial Synergies Wealth Advisors
Q1 2024 Newsletter | Financial Synergies Wealth AdvisorsQ1 2024 Newsletter | Financial Synergies Wealth Advisors
Q1 2024 Newsletter | Financial Synergies Wealth Advisors
 
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
AfRESFullPaper22018EmpiricalPerformanceofRealEstateInvestmentTrustsandShareho...
 
Managing Finances in a Small Business (yes).pdf
Managing Finances  in a Small Business (yes).pdfManaging Finances  in a Small Business (yes).pdf
Managing Finances in a Small Business (yes).pdf
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
 
Stock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfStock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdf
 
Vp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsAppVp Girls near me Delhi Call Now or WhatsApp
Vp Girls near me Delhi Call Now or WhatsApp
 

Exploratory Data Analysis week 4

  • 1. Week04 : Exploratory Data Analysis ICT 706 Data Analytics
  • 2. 2
  • 3. Learning Outcomes • Understand key elements in exploratory data analysis (EDA) • Explain and use common summary statistics for EDA • Plot and explain common graphs and charts for EDA 3
  • 4. Data Analytics • What is Data Analytics? • Study data to answer questions • Why? • More organizations are collecting more data than ever before • Business, Government, Healthcare, Charity – everything! • This data holds many insights into their operations and beyond • Data Analytics is required to extract these insights • Requires analytical, statistical and computational skills 4
  • 5. To Answer Key Questions • What movies (or books) customers would like to watch (or read)? • What movies to order from studio and how many? • Who are our best customers? • When is there a flu epidemic in region in the country? • Which customers are most likely not to have an accident? • When a customer is likely to jump ship & go to a competitor? • What is the one item you want to have in your store in case of a hurricane? • What is the one thing that will improve a lawyer’s chance to win a case? • What are some questions one can answer with a loyalty card? • What is the number one reason for the success of cricket player? • … … 5
  • 6. Data Analytic Tasks • Exploratory Data Analysis (EDA)  our focus this week! • A preliminary exploration of the data to better understand its characteristics • Descriptive Modelling • Find human-interpretable patterns that describe the data. E.g. • Association Rule Discovery • Clustering • Predictive Modelling • Use some attributes to predict unknown or future values of other attributes. E.g. • Classification • Regression 6
  • 7. EDA • If you are going to find out anything about a data set you must first understand the data • but most interesting data sets are too big to inspect manually • EDA: • Helps to select the right tool for preprocessing or analysis • Makes use of humans’ abilities to recognize patterns • Focuses on: • Summary statistics • Visualization • Clustering and anomaly detection 7
  • 8. 1. Summary Statistics • Getting an overall sense of the data set • Numbers that summarize properties of the data • Frequency • frequency, mode, percentiles • Location • Min & Max, Mean & Median • Spread • Variance & Standard Deviation, Distribution 8
  • 9. 9
  • 10. 10
  • 11. 11
  • 12. 12
  • 13. 13
  • 14. 14
  • 16. 16
  • 17. 17
  • 18. 18
  • 19. 19
  • 21. Mean or Median ?? 21
  • 22. Question: Mean or Median? 22
  • 23. 23
  • 24. 24
  • 25. Frequency and Mode • Frequency: the number of times the value occurs • Relative frequency: the percentage that the value occurs • Mode: the most frequent attribute value • (3, 5, 3, 10, 4) • Frequency and mode are typically used with categorical data Category Apple Orange Melon Kiwi Plum Total Frequency 10 15 8 10 7 50 Relative Frequency 20% 30% 16% 20% 14% 100% 25
  • 26. Percentiles • For continuous data, a percentile is more useful. • Given an ordinal or continuous attribute x, its pth percentile ([0, 100]) is • a value xp such that p% of the observed values are less than xp. • E.g. the 20th percentile is the value below which 20% of all values of the observations may be found. 26
  • 27. Quartiles • Values that divide a list of numbers into quarters Q1, Q2, Q3 • Percentiles 25th,50th, 75th • Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8 • Put the list of numbers in order • Then cut the list into four equal parts • If a quartile is half way, use the average 27 Quartile 1 (Q1) = 3 Quartile 2 (Q2) = 5.5 Quartile 3 (Q3) = 7
  • 28. Mean and Median • Measure of the locations of a set of points • Mean • average of all values • very sensitive to outliers • Median • Q2 or 50th percentile: the value whose occurrence lies in the middle of a set of ordered values • median(1, 20, 25, 30, 31) = ? • median(1, 2, 5, 92) = ? 𝑚𝑒𝑎𝑛 1,2,5,92 = 1 + 2 + 5 + 92 4 = 25 28
  • 29. Extreme example • Income in small town of 6 people • $25,000 $27,000 $29,000 • $35,000 $37,000 $38,000 • Mean is $31,830 and median is $32,000 • Bill Gates moves to town • $25,000 $27,000 $29,000 • $35,000 $37,000 $38,000 $40,000,000 • Mean is $5,741,571 median is $35,000 • Mean is pulled by the outlier while the median is not. The median is a better of measure of center for data with extreme values. • E.g. house prices, incomes, student exam marks, ... 29
  • 30. Range and Variance • Measures of data spread • Range: the difference between the max and min • (1, 2, 5, 70, 92): range = 92 – 1 = 91 • Variance: The average of the squared differences from the Mean • Standard deviation: the square root of Variance • Sensitive to outliers • Inter-quartile range: Q3 – Q1 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑥 = 𝑠 𝑥 2 = 1 𝑛 𝑖=1 𝑛 (𝑥𝑖 − 𝑥)2 Value 2 3 3 4 Mean = 3 Diff 1 0 0 1 Diff2 1 0 0 1 Variance = 2/4 = 0.5 30
  • 31. 2. Visualization • Convert data into a visual or tabular format for easier understanding • One of the most powerful and appealing techniques for data exploration • Humans have a well developed ability to analyze large amounts of information that is presented visually • Can detect general patterns and trends • Can detect outliers and unusual patterns 31
  • 32. Visualization • Data objects, their attributes, and the relationships are translated into graphical elements such as points, lines, shapes, and colors • Example: • Objects are often represented as points • Their attribute values can be represented as the position of the points or the characteristics of the points, e.g., color, size, and shape • If position is used, then the relationships of points, i.e., whether they form groups or a point is an outlier, is easily perceived. 32
  • 33. Visualization Example • Sea Surface Temperature for July 1982 • Tens of thousands of data points are summarized in a single figure 33
  • 34. EDA Strategy • Examine variables one by one, then look at the relationships among the different variables • Start with graphs, then add numerical summaries of specific aspects of the data • Be aware of attribute types • Categorical vs. Numeric 34
  • 35. EDA: One Variable • Summary statistics • Categorical: frequency table • Numeric: mean, median, standard deviation, range, etc. • Graphical displays • Categorical: bar chart, pie chart, etc. • Numeric: histogram, boxplot, etc. • Probability models • Categorical: Binomial distribution (won’t cover) • Numeric: Normal curve (won’t cover) 35
  • 36. Example Scatter Chart: Age 0 10 20 30 40 50 60 70 80 90 Age(inyears), 84 = Maximum 2 = Minimum 28 = Mode (Occurs twice) 33 = Mean 36 = Median Summary Statistics Example Scatter Chart 2: Age 0 10 20 30 40 50 60 70 80 90 Age(inyears). Example Scatter Chart 1: Age 0 10 20 30 40 50 60 70 80 90 Age(inyears), SD = 7.6 SD = 20.4 36
  • 37. Frequency Table Class Frequency Relative Frequency Highest Degree Obtained Number of CEOs Proportion None 1 0.04 Bachelors 7 0.28 Masters 11 0.44 Doctorate / Law 6 0.24 Totals 25 1.00 37
  • 38. Bar Graph • The bar graph quickly compares the degrees of the four groups • The heights of the four bars show the counts for the four degree categories 38
  • 39. Pie Chart • A pie chart helps us see what part of the whole group forms • To make a pie chart, you must include all the categories that make up a whole • WARNING: pie charts are often a poor choice – they show only relative data. A bar chart is often better. 39
  • 40. Histogram • Split data range into equal-sized bins. • Count the number of points in each bin. 40 The bins are: 3.0 ≤ rate < 4.0 4.0 ≤ rate < 5.0 5.0 ≤ rate < 6.0 6.0 ≤ rate < 7.0 7.0 ≤ rate < 8.0 8.0 ≤ rate < 9.0 9.0 ≤ rate < 10.0 10.0 ≤ rate < 11.0 11.0 ≤ rate < 12.0 12.0 ≤ rate < 13.0 13.0 ≤ rate < 14.0 14.0 ≤ rate < 15.0
  • 41. Histograms The bins are: 3.0 ≤ rate < 4.0 4.0 ≤ rate < 5.0 5.0 ≤ rate < 6.0 6.0 ≤ rate < 7.0 7.0 ≤ rate < 8.0 8.0 ≤ rate < 9.0 9.0 ≤ rate < 10.0 10.0 ≤ rate < 11.0 11.0 ≤ rate < 12.0 12.0 ≤ rate < 13.0 13.0 ≤ rate < 14.0 14.0 ≤ rate < 15.0 41
  • 42. Histograms The bins are: 2.0 ≤ rate < 4.0 4.0 ≤ rate < 6.0 6.0 ≤ rate < 8.0 8.0 ≤ rate < 10.0 10.0 ≤ rate < 12.0 12.0 ≤ rate < 14.0 14.0 ≤ rate < 16.0 16.0 ≤ rate < 18.0 42
  • 43. Histograms • Where did the bins come from? • They were chosen rather arbitrarily • Does choosing other bins change the picture? • Yes!! And sometimes dramatically • What do we do about this? • Some pretty smart people have come up with some “optimal” bin widths • we will rely on their suggestions 43
  • 44. Frequency Distribution Example 2 0 2 4 6 8 10 12 14 1 2 3 4 5 6 7 8 9 10 11 Age (years) Frequency Distribution for variable AGE Histogram • The histogram can illustrate features related to the distribution of the data, e.g., • Shape • Symmetric, skewed right, skewed left, bell shaped • presence of outliers • presence of multiple modes • "a camel with two humps" is a common grade distribution for ICT courses! 44
  • 45. Box Plots • Box Plots • Show data distribution • A graph of five numbers (often called the five number summary) • Minimum • 1st quartile • Median • 3rd quartile • Maximum • Various definitions for min-max • Min & Max of all data • 9th and 91st percentile • 2nd and 98th percentile • 1.5 Inter-quartile range outlier 9th percentile 25th percentile 75th percentile 50th percentile 91st percentile 45
  • 46. EDA: Two Variables • Looking at TWO variables at once helps us to explore relationships between different attributes. • sometimes we can see clear relationships and clusters • next week we will look at clustering in more detail • this week we focus on exploratory analysis of two variables • Three combinations of variables • 2 categorical variables • Contingency tables • 1 categorical and 1 numeric variable • Side-by-side box plots, counts, etc. • 2 numeric variables • Scatter plots, correlations, regressions 46
  • 47. Contingency Tables • Show the frequency distribution of one variable in rows and another in columns • E.g. a study of sex differences in handedness with 100 samples • Sex (male, female) and Handedness (right, left) 47 Marginal totals Grand total
  • 48. Side-by-side box plots • Side-by-side box plots are graphical summaries of data when one variable is categorical and the other numeric • Used to compare the distributions associated with the numeric variable across the levels of the categorical variable 48
  • 49. Scatter Plots • One variable on x-axis and another on y-axis • Attributes values determine the position • Arrays of scatter plots can compactly summarize the relationships of pairs of attributes 49
  • 50. Describing scatter plots • Form • Linear, quadratic, exponential • Direction • Positive association • An increase in one variable is accompanied by an increase in the other • Negatively associated • A decrease in one variable is accompanied by an increase in the other • Strength • How closely the points follow a clear form 50 • Form: Linear • Direction: Positive • Strength: Strong
  • 51. Course Reflection (in census week) • How is your learning journey going ? • Your feedback • Areas of improvement 51