AI/ML APPLICATIONS
Exploratory Data Analysis
Prof. Rahul Borate
Table of Contents
• Understand the ML best practice and project roadmap
• Identify the data source(s) and Data Collection
• Machine Learning process
• Exploratory Data Analysis (EDA)
Understand the ML best practice and project roadmap
• A customer usually decides to implement ML (Machine Learning) for the identified business problem(s) after multiple discussions with stakeholders from both sides: Business, Architect, Infrastructure, Operations, and others.
Identify the data source(s) and Data Collection
• The organization's key application(s): these may be internal or external applications or websites.
• Streaming data from the web (Twitter, Facebook, or any other social media).
• Once you're comfortable with the available data, you can start work on the rest of the Machine Learning process model.
Machine Learning process
In the data preparation phase, EDA takes most of the effort and is an unavoidable step.
What is Exploratory Data Analysis?
EDA is an approach to data analysis that uses a variety of techniques to gain insights about the data.
Basic steps in any exploratory data analysis:
• Cleaning and preprocessing
• Statistical analysis
• Visualization for trend analysis, anomaly detection, and outlier detection (and removal)
Importance of EDA
EDA helps you understand the given dataset and clean it up.
It gives you a clear picture of the features and the relationships between them.
It uncovers errors, outliers, and missing values in the data.
It reveals patterns when you visualize the data in graphs such as bar graphs, scatter plots, heatmaps, and histograms.
EDA using Pandas
Import data into the workspace (Jupyter Notebook, Google Colab, or a Python IDE)
Descriptive statistics
Removal of nulls
Visualization
1. Packages and data import
• Step 1: Import pandas into the workspace.
• import pandas
• Step 2: Read the dataset into a pandas dataframe. Supported input formats include:
• Excel: read_excel
• CSV: read_csv
• JSON: read_json
• HTML (read_html) and many more
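The two steps above can be sketched as follows. This is a minimal illustration, not the deck's own demo: the CSV text, its column names, and its values are made up, and an in-memory string stands in for a real file path so the snippet runs anywhere.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """make,horsepower,price
audi,102,13950
bmw,101,16430
toyota,62,5348
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first rows of the dataframe
print(df.shape)   # (number of rows, number of columns)
```

With a real dataset, the same call would take the file name instead, for example pd.read_csv("some_file.csv"), and read_excel or read_json follow the same pattern.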
2. Descriptive Stats (Pandas)
• Used to make preliminary assessments about the population distribution of a variable.
• Commonly used statistics:
1. Central tendency:
• Mean – The average value of all the data points: dataframe.mean()
• Median – The middle value when all the data points are put in an ordered list: dataframe.median()
• Mode – The data point that occurs most often in the dataset: dataframe.mode()
2. Spread: A measure of how far the data points lie from the mean or median
• Variance – The mean of the squared deviations from the mean: dataframe.var(). By default pandas computes the sample variance, dividing by n - 1 rather than n (ddof=1).
• Standard deviation – The square root of the variance: dataframe.std()
3. Skewness: A measure of the asymmetry of the distribution: dataframe.skew()
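As a small sketch of these statistics, the snippet below computes each one on a single made-up column (the name value and its numbers are assumptions for illustration). It also shows the difference between the pandas default sample variance (ddof=1) and the population variance (ddof=0).

```python
import pandas as pd

# Made-up data with one large value to make the distribution right-skewed.
df = pd.DataFrame({"value": [1, 2, 2, 3, 12]})
col = df["value"]

mean = col.mean()                 # 4.0
median = col.median()             # 2.0
mode = col.mode()                 # a Series, because ties are possible
sample_var = col.var()            # divides by n - 1 (ddof=1, the default)
population_var = col.var(ddof=0)  # divides by n
std = col.std()                   # square root of the sample variance
skew = col.skew()                 # positive for a right-skewed column

print(mean, median, mode.tolist(), sample_var, population_var, std, skew)
```

The mean (4.0) sits well above the median (2.0) here, which is the typical signature of a right-skewed column.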
Descriptive Stats (contd.)
Other methods to get a quick look at the data:
• describe(): Summarizes the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.
• Syntax: dataframe.describe()
• info(): Prints a concise summary of the dataframe, including the index dtype, the column dtypes, non-null counts and memory usage.
• Syntax: dataframe.info()
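A short sketch of both methods on a made-up dataframe (the column names and values are assumptions). Note that for a dataframe with mixed column types, describe() summarizes only the numeric columns by default.

```python
import pandas as pd

# Made-up data: one numeric column with a missing value, one text column.
df = pd.DataFrame(
    {
        "horsepower": [102.0, 101.0, None, 62.0],
        "make": ["audi", "bmw", "bmw", "toyota"],
    }
)

# count, mean, std, min, quartiles and max; NaN values are excluded.
summary = df.describe()
print(summary)

# Index range, column dtypes, non-null counts and memory usage.
df.info()
```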
3. Null values
Detecting null values:
• isnull(): An alias for dataframe.isna(). It returns a dataframe of the same shape with boolean values indicating which entries are missing.
• Syntax: dataframe.isnull()
Handling null values:
• Dropping rows with null values: the dropna() function deletes rows or columns that contain null values.
• Replacing missing values: the fillna() function fills the missing values with a specified value, such as the mean or median.
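The detection and handling steps can be sketched on a small made-up dataframe (column names and values are assumptions). Whether to drop or fill depends on the dataset; the snippet shows both options side by side.

```python
import pandas as pd

# Made-up data with one missing value in each column.
df = pd.DataFrame(
    {
        "horsepower": [102.0, None, 62.0, 110.0],
        "price": [13950.0, 16430.0, None, 5348.0],
    }
)

# Detect: isnull() gives a boolean dataframe; sum() counts True per column.
null_counts = df.isnull().sum()

# Option 1: drop every row that contains at least one null value.
dropped = df.dropna()

# Option 2: fill each column's nulls with that column's mean.
filled = df.fillna(df.mean())

print(null_counts)
print(dropped)
print(filled)
```

Passing df.median() to fillna() instead would fill with the column medians, which is less sensitive to outliers.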
4. Visualization
• Univariate: Looking at one variable/column at a time
• Bar graph
• Histogram
• Boxplot
• Pie plot
• Multivariate: Looking at the relationship between two or more variables
• Scatter plot
• Heatmap (seaborn)
Bar Graph, Histogram and Boxplot
• Bar graph: Presents data with rectangular bars whose lengths are proportional to the values they represent.
• Boxplot: Depicts numerical data graphically through its quartiles. The box extends from the Q1 to the Q3 quartile values of the data, with a line at the median (Q2).
• Histogram: Represents the distribution of numerical data by grouping values into bins and showing how many values fall into each bin.
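The three univariate plots can be sketched with the pandas plotting interface, which draws through matplotlib. The data and column names are made up, and the Agg backend is an assumption so the snippet runs without a display; in a notebook that line can be removed.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: a categorical column and a numeric column.
df = pd.DataFrame(
    {
        "make": ["audi", "bmw", "audi", "toyota", "bmw", "audi"],
        "price": [13950, 16430, 17450, 5348, 20970, 15250],
    }
)

fig, (ax_bar, ax_hist, ax_box) = plt.subplots(1, 3, figsize=(12, 4))

# Bar graph: one bar per category, bar length = number of rows.
counts = df["make"].value_counts()
counts.plot.bar(ax=ax_bar, title="Cars per make")

# Histogram: prices grouped into 4 bins.
df["price"].plot.hist(bins=4, ax=ax_hist, title="Price distribution")

# Boxplot: box from Q1 to Q3 with a line at the median.
df["price"].plot.box(ax=ax_box, title="Price quartiles")

fig.tight_layout()
```

Calling fig.savefig with a file name of your choice writes the figure to disk.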
Scatterplot, Pie plot
• Scatterplot: Shows the data as a collection of points, with one column on each axis.
• Syntax: dataframe.plot.scatter(x='x_column_name', y='y_column_name')
• Pie plot: Proportional representation of the numerical data in a column.
• Syntax: dataframe.plot.pie(y='column_name')
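A minimal sketch of both calls on made-up data (the dataframes, column names, and values are assumptions). The pie plot uses the dataframe index as the slice labels, so the example sets a meaningful index. As before, the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; drop this in a notebook
import pandas as pd

# Made-up numeric data for the scatterplot.
cars = pd.DataFrame(
    {
        "horsepower": [62, 101, 102, 110, 121],
        "price": [5348, 16430, 13950, 17450, 20970],
    }
)
ax_scatter = cars.plot.scatter(x="horsepower", y="price", title="Price vs. horsepower")

# Made-up counts for the pie plot; index values become the slice labels.
sales = pd.DataFrame({"units": [30, 50, 20]}, index=["audi", "bmw", "toyota"])
ax_pie = sales.plot.pie(y="units", autopct="%1.0f%%", legend=False)
```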
Outlier detection
• An outlier is a data point, or a set of data points, that lies far away from the rest of the values in the dataset.
• Outliers are often easy to identify by visualizing the data.
• For example:
• In a boxplot, the data points that lie outside the upper and lower bounds can be considered outliers.
• In a scatterplot, the data points that lie outside the groups of data points can be considered outliers.
Outlier removal
• Use the IQR method as follows:
Calculate the first and third quartiles (Q1 and Q3)
Calculate the interquartile range, IQR = Q3 - Q1
Find the lower bound, which is Q1 - 1.5*IQR
Find the upper bound, which is Q3 + 1.5*IQR
Replace the data points that lie outside this range.
They can be replaced by the mean or median.
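The IQR procedure can be sketched as below, using the standard bounds Q1 - 1.5*IQR and Q3 + 1.5*IQR. The series values are made up, and replacing with the median is just one of the two options mentioned; the multiplier 1.5 is the conventional choice, not a fixed rule.

```python
import pandas as pd

# Made-up data with one obvious outlier (100.0).
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 100.0])

# Quartiles and interquartile range.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Bounds beyond which a point is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag the outliers and replace them with the median of the column.
is_outlier = (prices < lower) | (prices > upper)
cleaned = prices.mask(is_outlier, prices.median())

print(lower, upper)
print(cleaned.tolist())
```

The median is used here because the mean is itself pulled toward the outlier being replaced.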
Hands-on Demonstration
With these ideas in place, let's implement all of them using the Automobile – Predictive Analysis dataset.
