SlideShare a Scribd company logo
1/13
Exploratory Data Analysis - A Comprehensive Guide to
EDA
leewayhertz.com/what-is-exploratory-data-analysis
In today’s data-driven world, the ability to effectively analyze data is a key factor in the
success of many enterprises. By leveraging data analysis tools and techniques, businesses
can gain insights, identify trends, and confidently make informed decisions based on their
data, improving efficiency and gaining an edge in the highly competitive business landscape.
Exploratory data analysis (EDA), which is a preliminary method used for interpreting data
before undertaking any formal modeling or hypothesis testing, is one of the most crucial
procedures involved in data analysis.
Exploratory Data Analysis is the process of detailing the key features of a dataset, frequently
employing visual techniques. It entails exploring and analyzing data to comprehend its
underlying patterns, connections and trends. EDA is important because it helps identify any
issues or anomalies in the data that could affect the reliability of the subsequent analysis.
Many industries benefit from EDA, including finance, healthcare, retail, and marketing, since
it serves as a foundation for data analysis, pinpoints potential shortcomings in the data, and
provides insightful analysis of customer behavior, market trends and business performance.
In data analysis, EDA can assist data analysts in identifying missing or incomplete data,
outliers and inconsistencies that can impact the statistical analysis of data. Conducting an
EDA can also help determine which variables are crucial to explaining the outcome variable
2/13
and which ones can be excluded. Hence, EDA often serves as the first step in developing a
data model because it provides insights into the characteristics of the data.
In this article, we will explore EDA and its importance, process, tools, techniques and more.
What is Exploratory Data Analysis (EDA)?
Importance of EDA in data science
EDA methods and techniques
The EDA process
Types of EDA techniques
Visualization techniques in EDA
Exploratory data analysis tools
What is Exploratory Data Analysis (EDA)?
EDA or Exploratory Data Analysis is a method of examining and understanding data using
multiple techniques like visualization, summary statistics and data transformation to abstract
its core characteristics. EDA is done to get a sense of data and discover any potential
problems or issues which need to be addressed and is generally performed before formal
modeling or hypothesis testing. It aims to identify patterns, relationships, and trends in the
data and use this information to facilitate further analysis or decision-making. Data of
different types, including numerical, categorical and text, can be analyzed using EDA. It is
typically done before data analysis to identify and correct the errors in data and visualize the
key attributes of the data.
3/13
Business
Question/
Problem
Data
Collection
& Storage
EDA
Data
Analysis
Data
Visualisation &
Storytelling
Business
Decision
LeewayHertz
EDA is a scientific approach to understanding the storing of data. Data scientists can use it
to discover patterns, spot anomalies, test hypotheses, or verify assumptions by manipulating
data sources effectively.
Importance of EDA in data science
Exploratory data analysis is an important phase in the data science process because it
enables data scientists to comprehend the data they are working with on a deeper level. Let
us find out why EDA is important in data science by defining its objectives:
1. Conducting an EDA can confirm if the collected data is practicable in the context of the
business problem at hand. If not, the data or the strategy adopted by the data analysts
needs to be changed.
2. It can reveal and resolve data quality issues, like duplicates, missing data, incorrect
values and data types and anomalies.
3. Exploratory data analysis plays a vital role in extracting meaningful insights from data
by revealing key statistical measures such as mean, median, and standard deviation.
4/13
4. Oftentimes, some values deviate significantly from the standard set of values; these
are anomalies that must be verified before the data is analyzed. If unchecked, they can
create havoc in the analysis, leading to miscalculations. As such, one of the objectives
of EDA is to locate outliers and anomalies in the data.
5. EDA unveils the behavior of variables when clubbed together, assisting data scientists
in finding patterns, correlations, and interactions between these variables by visualizing
and analyzing the data. This information is helpful in creating AI models.
6. EDA helps find and drop unwanted columns and derive new variables. It can, thus,
assist in determining which features are most crucial for forecasting the target variable,
assisting in the choice of features to be included in modeling.
7. Based on the characteristics of the data, EDA can assist in identifying appropriate
modeling techniques.
EDA methods and techniques
Some of the common techniques and methods used in Exploratory Data Analysis include the
following:
Data visualization
Data visualization involves generating visual representations of the data using graphs,
charts, and other graphical techniques. Data visualization enables a quick and easy
understanding of patterns and relationships within data. Visualization techniques include
scatter plots, histograms, heatmaps and box plots.
Correlation analysis
Using correlation analysis, one can analyze the relationships between pairs of variables to
identify any correlations or dependencies between them. Correlation analysis helps in
feature selection and in building predictive models. Common correlation techniques include
Pearson’s correlation coefficient, Spearman’s rank correlation coefficient and Kendall’s tau
correlation coefficient.
Dimensionality reduction
In dimensionality reduction, techniques like principal component analysis (PCA) and linear
discriminant analysis (LDA) are used to decrease the number of variables in the data while
keeping as many details as possible.
Descriptive statistics
It involves calculating summary statistics such as mean, median, mode, standard deviation
and variance to gain insights into the distribution of data. The mean is the average value of
the data set and provides an idea of the central tendency of the data. The median is the mid-
5/13
value in a sorted list of values and provides another measure of central tendency. The mode
is the most common value in the data set.
Clustering
Clustering techniques such as K-means clustering, hierarchical clustering, and DBSCAN
clustering help identify patterns and relationships within a dataset by grouping similar data
points together based on their characteristics.
Outlier detection
Outliers are data points that vary or deviate significantly from the rest of the data and can
have a crucial impact on the accuracy of models. Identifying and removing outliers from data
using methods like Z-score, interquartile range (IQR) and box plots method can help improve
the data quality and the models’ accuracy.
The EDA process
Conducting EDA requires expertise in multiple tools and programming languages. In the
example below, we would perform EDA using Python on the open-source web application,
Jupyter Notebook.
The EDA process can be summed up in three steps, which are:
1. Understanding the data
2. Cleaning the data
3. Analysis of the relationship between variables
Let us understand the process of Exploratory Data Analysis (EDA) step-by-step:
Understanding the data
Import necessary libraries
The first step is to import the required libraries. In this code block, the Pandas library is used
to read and manipulate data and the Pandas-profiling library is used for EDA. The datasets
module from the scikit-learn library is used to load the Iris dataset.
import pandas as pd
import pandas_profiling
from sklearn import datasets
Loading the dataset
6/13
Next, we have to load the dataset. Here, we will be using the multivariate dataset named the
Iris dataset.
iris = datasets.load_iris()
Converting to Pandas DataFrame
The scikit-learn dataset is loaded as a Bunch object, similar to a dictionary. To use this
dataset with Pandas, we must convert it to a Pandas DataFrame.
iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_data['target'] = iris['target']
Checking data attributes
It is always a good approach to check the attributes of the data, like its shape or the number
of rows and columns in the dataset. To check the shape of the data, run the following code:
iris_data.shape
To check the column names of the DataFrame, we use the columns attribute.
iris_data.columns
If your dataset is big, you can view the first few records of the DataFrame by running the
below code:
iris_data.head()
Cleaning the data
Once you are done with scanning the attributes of the data, you can make any necessary
modifications in the dataset, like changing the name of the columns or the raws. Remember
not to change the dataset’s variables, which can significantly impact the final result.
Check for null values
To clean the data, first, you must check for any null values in the variables. If any of the
variables in a dataset have null values, it can affect the analysis results. If your dataset has
missing data, handle them through approaches like imputation, deletion of observations or
variables, or using models that can handle missing data.
Dropping the redundant data and removing outliers
Next, if you find any redundant data in your dataset that does not add value to the output,
you can also remove them from the table. All the columns and rows are important in the iris
dataset we have taken. So we would not be dropping the data. In this step, we must also find
7/13
any outliers in the data.
Analysis of the relationship between variables
The final step in the process of EDA is to analyze the relationship between variables. It
involves the following:
Correlation analysis: The analyst computes the correlation matrix between variables
to identify which variables are strongly correlated with each other.
Visualization: The data analyst creates visualizations to explore the relationship
between variables. This includes scatter plots, heatmaps, etc.
Hypothesis testing: The analyst performs statistical tests to test hypotheses about the
relationship between variables.
Run the following code to generate a report that includes various relationship analyses
between variables.
pandas_profiling.ProfileReport(iris_data)
You can view the output of the above code from this Github repository.
Types of EDA techniques
Several types of exploratory data analysis techniques can be used to gain insights into data.
Some common types of EDA include:
Univariate non-graphical
Univariate non-graphical exploratory data analysis is a simple yet fundamental method for
examining information that includes utilizing only one variable to analyze the data. Univariate
non-graphical EDA focuses on figuring out the underlying distribution or pattern in the data
and mentions objective facts about the population. This procedure includes the examination
of the attributes of the population distribution, including spread, central tendency, skewness
and kurtosis.
An average or middle value of a distribution is called the central tendency. A common
measure of central tendency is the mean, followed by the median and mode. As a
measure of central tendency, the median may be preferred if the distribution is skewed
or concerns are raised about outliers.
Spread shows how far off the information values are from the central tendency. The
standard deviation and variance are two valuable proportions of the spread. The
variance is the mean of the square of the individual deviations, and the standard
deviation is the foundation of the variance.
8/13
Skewness and kurtosis are two more helpful univariate descriptors of the distribution.
Skewness is a metric of the asymmetry of the distribution, while kurtosis is a proportion
of the peakedness of the distribution contrasted with an ordinary dispersion.
Outlier detection is also important in univariate non-graphical EDA, as outliers can
significantly impact the distribution and distort statistical analysis results.
Multivariate non-graphical
Multivariate non-graphical EDA is a technique used to explore the relationship between two
or more variables through cross-tabulation or statistics. It is useful for identifying patterns and
relationships between variables. This analysis is particularly useful when multiple variables
exist in a dataset, and you want to see how they relate.
Cross-tabulation is a helpful extension of tabulation for categorical data. Cross-tabulation is
preferable when there are two variables involved. To do this, create a two-way table with
column headings corresponding to the number of one variable and row headings
corresponding to the number of the other two variables. Next, fill the counts with all subjects
with the same pair of levels.
We produce statistics for quantitative variables individually for each level of each categorical
variable and one quantitative variable, and then we compare the statistics across all
categorical variables. The purpose of multivariate non-graphical EDA is to identify
relationships between variables and understand how they are related. Examining the
relationship between variables makes it possible to discover patterns and trends that may
not be immediately obvious from examining individual variables in isolation.
Univariate graphical
A univariate graphical EDA technique employs a variety of graphs to gain insight into a single
variable’s distribution. These graphical techniques enable us to gain a quick understanding
of shapes, central tendencies, spreads, modalities, skewnesses, and outliers of the data we
are studying. The following are some of the most commonly used univariate graphical EDA
techniques:
1. Histogram: This is one of the most basic graphs used in EDA. A histogram is a bar
plot that displays the frequency or proportion of cases in each of several intervals (bins)
of a variable’s values. The height of each bar represents the count or proportion of
observations that fall within each interval. Histograms provide an intuitive sense of the
shape and spread of the distribution, as well as any outliers.
9/13
2. Stem-and-leaf plots: A stem-and-leaf plot is an alternative to a histogram that displays
each data value along with its magnitude. In a stem-and-leaf plot, each data value is
split into a stem and leaf, with the stem representing the leading digits and the leaf
representing the trailing digits. This type of plot provides a visual representation of the
data’s distribution and can highlight features such as symmetry and skewness.
3. Boxplots: Boxplots, also known as box-and-whisker plots, provide a visual summary of
the distribution’s central tendency, spread and outliers. The box in a boxplot represents
the data’s interquartile range (IQR), with the median line inside the box. The whiskers
extend from the box to the smallest and largest observations within 1.5 times the IQR
from the box. Data points outside of the whiskers are considered outliers.
4. Quantile-normal plots: A quantile-normal plot, also known as a Q-Q plot, assesses
the data distribution by comparing the observed values to the expected values from a
normal distribution. In a Q-Q plot, the observed data is plotted against the quantiles of
a normal distribution. The points should lie along a straight line if the data is normally
distributed. If the data deviates from normality, the plot will reveal any skewness,
kurtosis, or outliers.
Multivariate graphical
A multivariate graphical EDA displays relationships between two or more data sets using
graphics. When examining relationships between variables beyond two, this technique is
used to gain a more comprehensive understanding of the data. A grouped barplot is one of
the most commonly used multivariate graphical techniques, with each group representing
one level of one variable and each bar representing its amount.
Multivariate graphics can also be represented in scatterplots, run charts, heat maps,
multivariate charts, and bubble charts.
Scatterplots are graphical representations displaying the relationship between two
quantitative/numerical variables. It consists of plotting one variable on the x-axis and
another on the y-axis. On the plot, each point represents an observation. Scatterplots
make it possible to identify outliers or patterns in the data and the direction and
strength of the relationship between any two variables.
A run chart is a line graph that shows how data changes over time. It is a simple but
powerful tool for tracking changes and monitoring trends in data. Run charts can be
used to detect trends, cycles, or shifts in a process over time.
A multivariate chart illustrates the relationship between factors and responses. It is a
type of scatterplot that depicts relationships between several variables simultaneously.
A multivariate chart depicts the relationship between variables and identifies patterns or
clusters in the data.
10/13
Bubble chart is a data visualization that displays multiple circles (bubbles) in a two-
dimensional plot. The size of each circle represents a value of a third variable. Bubble
charts are often used to compare data sets with three variables, as they provide an
easy way to visualize the relationships between these variables.
Visualization techniques in EDA
Scatterplot
1.2
1.0
0.8
0.6
0.4
0.2
0.0
-0.2
-0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2
Histogram
25
20
15
10
5
0
-3 -2 -1 0 1 2 3
Line Chart
JAN
100
90
80
70
60
50
40
30
20
10
0 FEB MAR APR MAY JUN
Pie Chart
68%
18%
18%
45%
35%
Bubble Chart
10 20 30 40 50
50
40
30
20
10
0
Bar Chart
600
500
400
300
200
100
0
Heatmap
1
1 2 3 4 5
2
3
4
5
Boxplot
Jan 2015 Jan 2016 Jan 2017 Jan 2018
100
80
60
40
20
LeewayHertz
Visualization techniques play an essential role in EDA, enabling us to explore and
understand complex data structures and relationships visually. Some common visualization
techniques used in EDA are:
1. Histograms: Histograms are graphical representations that show the distribution of
numerical variables. They help understand the central tendency and spread of the data
by visualizing the frequency distribution.
2. Boxplots: A boxplot is a graph showing the distribution of a numerical variable. This
visualization technique helps identify any outliers and understand the spread of the
data by visualizing its quartiles.
3. Heatmaps: They are graphical representations of data in which colors represent
values. They are often used to display complex data sets, providing a quick and easy
way to visualize patterns and trends in large amounts of data.
4. Bar charts: A bar chart is a graph that shows the distribution of a categorical variable.
It is used to visualize the frequency distribution of the data, which helps to understand
the relative frequency of each category.
11/13
5. Line charts: A line chart is a graph that shows the trend of a numerical variable over
time. It is used to visualize the changes in the data over time and to identify any
patterns or trends.
6. Pie charts: Pie charts are a graph that showcases the proportion of a categorical
variable. It is used to visualize each category’s relative proportion and understand the
data distribution.
Exploratory data analysis tools
Spreadsheet software
Due to its simplicity, familiar interface and basic statistical analysis capabilities, spreadsheet
software such as Microsoft Excel, Google Sheets, or LibreOffice Calc is commonly used for
EDA. Using them, users can sort, filter, manipulate data and perform basic statistical
analysis, like calculating the mean, median and standard deviation.
Statistical software
Specialized statistical software such as R or Python and their various libraries and packages
offer more advanced statistical analysis tools, including regression analysis, hypothesis
testing, and time series analysis. This software allows users to write customized functions
and perform complex statistical analyses on large datasets.
Data visualization software
Visualization software like Tableau, Power BI, or QlikView enables users to create interactive
and dynamic data visualizations. These tools help users to identify patterns and relationships
in the data, allowing for more informed decision-making. They also offer various types of
charts and graphs, as well as the ability to create dashboards and reports. The software
allows data to be easily shared and published, making it useful for collaborative projects or
presentations.
Programming languages
Programming languages such as R, Python, Julia and MATLAB offer powerful numerical
computing capabilities and provide access to various statistical analysis tools. These
languages can be used to write customized functions for specific analysis needs and are
particularly useful when working with large datasets. They also enable the automation of
repetitive tasks, besides bringing flexibility in data handling and manipulation.
Business Intelligence (BI) tools
12/13
BI tools like SAP BusinessObjects, IBM Cognos or Oracle BI offer a range of functionalities,
including data exploration, dashboards and reports. They allow users to visualize and
analyze data from various sources, including databases and spreadsheets. They provide
data preparation tools and quality management tools that can be used in business settings to
help organizations make data-driven decisions.
Data mining tools
Data mining tools such as KNIME, RapidMiner or Weka provide a range of functionalities,
including data preprocessing, clustering, classification and association rule mining. These
tools are particularly useful for identifying patterns and relationships in large datasets and
building predictive models. Data mining tools are used in various industries, including
finance, healthcare and retail.
Cloud-based tools
Cloud-based platforms such as Google Cloud, Amazon Web Services (AWS) and Microsoft
Azure offer a range of tools and services for data analysis. They provide a scalable and
flexible infrastructure for storing and processing data and offer a range of data analysis and
visualization tools. Cloud-based tools are particularly useful for working with large and
complex datasets, as they offer high-performance computing resources and the ability to
scale up or down depending on the project’s needs.
Text analytics tools
Text analytics tools like RapidMiner and SAS Text Analytics are used to analyze unstructured
data, such as text documents or social media posts. They use natural language processing
(NLP) techniques to extract insights from text data, such as sentiment analysis, entity
recognition and topic modeling. Text analytics tools are used in a range of industries,
including marketing, customer service and political analysis.
Geographic Information System (GIS) tools
GIS tools such as ArcGIS and QGIS are used to analyze and visualize geospatial data. They
allow users to map data and perform spatial analysis, such as identifying patterns and trends
in geographical data or performing spatial queries. GIS tools are used in a range of
industries, including urban planning, environmental management and transportation.
Endnote
Exploratory data analysis, or EDA, is an essential step that must be conducted before
moving forward with data analysis. It helps data scientists and analysts to understand and
gain insights into the data they are working on. It helps discover missing or wrong data that
may lead to bias or fault in the final analysis. Analysts can guarantee that the data used for
13/13
analysis is accurate and reliable by cleaning and preprocessing the data during the EDA
process. EDA methods can also facilitate feature selection, identifying the vital features to be
included in machine learning models and improving model performance. Overall, EDA allows
for detecting anomalies, patterns and relationships within the data, which can help
businesses make informed decisions and acquire a competitive edge in the fast-evolving
tech sphere.
Take data analysis to the next level! Contact our Data Scientists and Data Analysts to
perform a comprehensive Exploratory Data Analysis (EDA) and unlock valuable insights from
your data.

More Related Content

What's hot

Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
Eva Durall
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
DATAVERSITY
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Edureka!
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
Tuba Yaman Him
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptx
Vrishit Saraswat
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
Minhazul Arefin
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptx
SadhanaParameswaran
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
Mohammed Barakat
 
DATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSINGDATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSING
Ahtesham Ullah khan
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
Umair Shafique
 
Data science
Data scienceData science
Data science
Sreejith c
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data Visualization
Centerline Digital
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
Devakumar Jain
 
Data science
Data science Data science
Data science
SouravSadhukhan6
 
Data Quality
Data QualityData Quality
Data Quality
Vijaya K
 
Big Data Presentation
Big  Data PresentationBig  Data Presentation
Big Data Presentation
Ritika Barethia
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
Edureka!
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
Harsh Kishore Mishra
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Edureka!
 
Data Visualization - What can you see? #baai17
Data Visualization - What can you see? #baai17Data Visualization - What can you see? #baai17
Data Visualization - What can you see? #baai17
Eugene O'Loughlin
 

What's hot (20)

Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptx
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptx
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
 
DATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSINGDATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSING
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
Data science
Data scienceData science
Data science
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data Visualization
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Data science
Data science Data science
Data science
 
Data Quality
Data QualityData Quality
Data Quality
 
Big Data Presentation
Big  Data PresentationBig  Data Presentation
Big Data Presentation
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Visualization - What can you see? #baai17
Data Visualization - What can you see? #baai17Data Visualization - What can you see? #baai17
Data Visualization - What can you see? #baai17
 

Similar to Exploratory Data Analysis - A Comprehensive Guide to EDA.pdf

Python for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuidePython for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive Guide
Aivada
 
Exploratory Data Analysis_ Uncovering Patterns in Data.pdf
Exploratory Data Analysis_ Uncovering Patterns in Data.pdfExploratory Data Analysis_ Uncovering Patterns in Data.pdf
Exploratory Data Analysis_ Uncovering Patterns in Data.pdf
archijain931
 
Data Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data QualityData Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data Quality
priyanka rajput
 
EXPLORATORY DATA ANALYSIS IN STATISTICAL MODeLING.pptx
EXPLORATORY DATA ANALYSIS IN STATISTICAL MODeLING.pptxEXPLORATORY DATA ANALYSIS IN STATISTICAL MODeLING.pptx
EXPLORATORY DATA ANALYSIS IN STATISTICAL MODeLING.pptx
rakeshreghu98
 
Basics of Data Science Foundation Explained | IABAC
Basics of Data Science Foundation Explained | IABACBasics of Data Science Foundation Explained | IABAC
Basics of Data Science Foundation Explained | IABAC
shanithava
 
data science course in hyderabad
data science course in hyderabaddata science course in hyderabad
data science course in hyderabad
maneesha2312
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdf
ShaikSikindar1
 
EDA-Unit 1.pdf
EDA-Unit 1.pdfEDA-Unit 1.pdf
EDA-Unit 1.pdf
Nirmalavenkatachalam
 
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdfData Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Neha Singh
 
Data analytics Course for Beginners (1).pptx
Data analytics Course for Beginners (1).pptxData analytics Course for Beginners (1).pptx
Data analytics Course for Beginners (1).pptx
SumitAgarwal65690
 
Unit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptxUnit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptx
JANNU VINAY
 
Moh.Abd-Ellatif_DataAnalysis1.pptx
Moh.Abd-Ellatif_DataAnalysis1.pptxMoh.Abd-Ellatif_DataAnalysis1.pptx
Moh.Abd-Ellatif_DataAnalysis1.pptx
AbdullahEmam4
 
Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...
ertekg
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
Sandeep Garg
 
Data Processing & Explain each term in details.pptx
Data Processing & Explain each term in details.pptxData Processing & Explain each term in details.pptx
Data Processing & Explain each term in details.pptx
PratikshaSurve4
 
Master Data Analyst Course in Bangalore with ProITBridge's Expert Course.pdf
Master Data Analyst Course in Bangalore with ProITBridge's Expert Course.pdfMaster Data Analyst Course in Bangalore with ProITBridge's Expert Course.pdf
Master Data Analyst Course in Bangalore with ProITBridge's Expert Course.pdf
proitbridgePvtLtd
 
Pandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptxPandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptx
bajajrishabh96tech
 
Defining Data Science: A Comprehensive Overview
Defining Data Science: A Comprehensive OverviewDefining Data Science: A Comprehensive Overview
Defining Data Science: A Comprehensive Overview
IABAC
 
data wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjhdata wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjh
VISHALMARWADE1
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
Sukirti Garg
 

Similar to Exploratory Data Analysis - A Comprehensive Guide to EDA.pdf (20)

Python for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuidePython for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive Guide
 
Exploratory Data Analysis_ Uncovering Patterns in Data.pdf
Exploratory Data Analysis_ Uncovering Patterns in Data.pdfExploratory Data Analysis_ Uncovering Patterns in Data.pdf
Exploratory Data Analysis_ Uncovering Patterns in Data.pdf
 
Data Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data QualityData Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data Quality
 
EXPLORATORY DATA ANALYSIS IN STATISTICAL MODeLING.pptx
EXPLORATORY DATA ANALYSIS IN STATISTICAL MODeLING.pptxEXPLORATORY DATA ANALYSIS IN STATISTICAL MODeLING.pptx
EXPLORATORY DATA ANALYSIS IN STATISTICAL MODeLING.pptx
 
Basics of Data Science Foundation Explained | IABAC
Basics of Data Science Foundation Explained | IABACBasics of Data Science Foundation Explained | IABAC
Basics of Data Science Foundation Explained | IABAC
 
data science course in hyderabad
data science course in hyderabaddata science course in hyderabad
data science course in hyderabad
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdf
 
EDA-Unit 1.pdf
EDA-Unit 1.pdfEDA-Unit 1.pdf
EDA-Unit 1.pdf
 
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdfData Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
 
Data analytics Course for Beginners (1).pptx
Data analytics Course for Beginners (1).pptxData analytics Course for Beginners (1).pptx
Data analytics Course for Beginners (1).pptx
 
Unit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptxUnit 2_ Descriptive Analytics for MBA .pptx
Unit 2_ Descriptive Analytics for MBA .pptx
 
Moh.Abd-Ellatif_DataAnalysis1.pptx
Moh.Abd-Ellatif_DataAnalysis1.pptxMoh.Abd-Ellatif_DataAnalysis1.pptx
Moh.Abd-Ellatif_DataAnalysis1.pptx
 
Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
 
Data Processing & Explain each term in details.pptx
Data Processing & Explain each term in details.pptxData Processing & Explain each term in details.pptx
Data Processing & Explain each term in details.pptx
 
Master Data Analyst Course in Bangalore with ProITBridge's Expert Course.pdf
Master Data Analyst Course in Bangalore with ProITBridge's Expert Course.pdfMaster Data Analyst Course in Bangalore with ProITBridge's Expert Course.pdf
Master Data Analyst Course in Bangalore with ProITBridge's Expert Course.pdf
 
Pandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptxPandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptx
 
Defining Data Science: A Comprehensive Overview
Defining Data Science: A Comprehensive OverviewDefining Data Science: A Comprehensive Overview
Defining Data Science: A Comprehensive Overview
 
data wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjhdata wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjh
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 

More from JamieDornan2

Shaping the future of ai for management consulting.pdf
Shaping the future of ai for management consulting.pdfShaping the future of ai for management consulting.pdf
Shaping the future of ai for management consulting.pdf
JamieDornan2
 
Blockchain in Healthcare - An Overview.pdf
Blockchain in Healthcare - An Overview.pdfBlockchain in Healthcare - An Overview.pdf
Blockchain in Healthcare - An Overview.pdf
JamieDornan2
 
Blockchain Use Cases and Applications by Industry.pdf
Blockchain Use Cases and Applications by Industry.pdfBlockchain Use Cases and Applications by Industry.pdf
Blockchain Use Cases and Applications by Industry.pdf
JamieDornan2
 
Web3 Use Cases in Real-World Applications.pdf
Web3 Use Cases in Real-World Applications.pdfWeb3 Use Cases in Real-World Applications.pdf
Web3 Use Cases in Real-World Applications.pdf
JamieDornan2
 
How Does Blockchain Identity Management Revolutionise Financial Sectors.pdf
How Does Blockchain Identity Management Revolutionise Financial Sectors.pdfHow Does Blockchain Identity Management Revolutionise Financial Sectors.pdf
How Does Blockchain Identity Management Revolutionise Financial Sectors.pdf
JamieDornan2
 
Blockchain in Identity Management - An Overview.pdf
Blockchain in Identity Management - An Overview.pdfBlockchain in Identity Management - An Overview.pdf
Blockchain in Identity Management - An Overview.pdf
JamieDornan2
 
AI use cases in legal research - An Overview.pdf
AI use cases in legal research - An Overview.pdfAI use cases in legal research - An Overview.pdf
AI use cases in legal research - An Overview.pdf
JamieDornan2
 
The impact of AI in construction - An Overview.pdf
The impact of AI in construction - An Overview.pdfThe impact of AI in construction - An Overview.pdf
The impact of AI in construction - An Overview.pdf
JamieDornan2
 
AI in Project Management.pdf
AI in Project Management.pdfAI in Project Management.pdf
AI in Project Management.pdf
JamieDornan2
 
AI in Market Research.pdf
AI in Market Research.pdfAI in Market Research.pdf
AI in Market Research.pdf
JamieDornan2
 
Conversational AI Transforming human-machine interaction.pdf
Conversational AI Transforming human-machine interaction.pdfConversational AI Transforming human-machine interaction.pdf
Conversational AI Transforming human-machine interaction.pdf
JamieDornan2
 
generative AI in healthcare.pdf
generative AI in healthcare.pdfgenerative AI in healthcare.pdf
generative AI in healthcare.pdf
JamieDornan2
 
Generative AI in telecom.pdf
Generative AI in telecom.pdfGenerative AI in telecom.pdf
Generative AI in telecom.pdf
JamieDornan2
 
A comprehensive guide to prompt engineering.pdf
A comprehensive guide to prompt engineering.pdfA comprehensive guide to prompt engineering.pdf
A comprehensive guide to prompt engineering.pdf
JamieDornan2
 
How AI in business process automation is changing the game (1).pdf
How AI in business process automation is changing the game (1).pdfHow AI in business process automation is changing the game (1).pdf
How AI in business process automation is changing the game (1).pdf
JamieDornan2
 
AI in trade promotion optimization.pdf
AI in trade promotion optimization.pdfAI in trade promotion optimization.pdf
AI in trade promotion optimization.pdf
JamieDornan2
 
Decision Transformers Model.pdf
Decision Transformers Model.pdfDecision Transformers Model.pdf
Decision Transformers Model.pdf
JamieDornan2
 
Exploratory Data Analysis - A Comprehensive Guide to EDA.pdf
Exploratory Data Analysis - A Comprehensive Guide to EDA.pdfExploratory Data Analysis - A Comprehensive Guide to EDA.pdf
Exploratory Data Analysis - A Comprehensive Guide to EDA.pdf
JamieDornan2
 
How to build an AI-powered chatbot.pdf
How to build an AI-powered chatbot.pdfHow to build an AI-powered chatbot.pdf
How to build an AI-powered chatbot.pdf
JamieDornan2
 
Action Transformer.pdf
Action Transformer.pdfAction Transformer.pdf
Action Transformer.pdf
JamieDornan2
 

More from JamieDornan2 (20)

Shaping the future of ai for management consulting.pdf
Shaping the future of ai for management consulting.pdfShaping the future of ai for management consulting.pdf
Shaping the future of ai for management consulting.pdf
 
Blockchain in Healthcare - An Overview.pdf
Blockchain in Healthcare - An Overview.pdfBlockchain in Healthcare - An Overview.pdf
Blockchain in Healthcare - An Overview.pdf
 
Blockchain Use Cases and Applications by Industry.pdf
Blockchain Use Cases and Applications by Industry.pdfBlockchain Use Cases and Applications by Industry.pdf
Blockchain Use Cases and Applications by Industry.pdf
 
Web3 Use Cases in Real-World Applications.pdf
Web3 Use Cases in Real-World Applications.pdfWeb3 Use Cases in Real-World Applications.pdf
Web3 Use Cases in Real-World Applications.pdf
 
How Does Blockchain Identity Management Revolutionise Financial Sectors.pdf
How Does Blockchain Identity Management Revolutionise Financial Sectors.pdfHow Does Blockchain Identity Management Revolutionise Financial Sectors.pdf
How Does Blockchain Identity Management Revolutionise Financial Sectors.pdf
 
Blockchain in Identity Management - An Overview.pdf
Blockchain in Identity Management - An Overview.pdfBlockchain in Identity Management - An Overview.pdf
Blockchain in Identity Management - An Overview.pdf
 
AI use cases in legal research - An Overview.pdf
AI use cases in legal research - An Overview.pdfAI use cases in legal research - An Overview.pdf
AI use cases in legal research - An Overview.pdf
 
The impact of AI in construction - An Overview.pdf
The impact of AI in construction - An Overview.pdfThe impact of AI in construction - An Overview.pdf
The impact of AI in construction - An Overview.pdf
 
AI in Project Management.pdf
AI in Project Management.pdfAI in Project Management.pdf
AI in Project Management.pdf
 
AI in Market Research.pdf
AI in Market Research.pdfAI in Market Research.pdf
AI in Market Research.pdf
 
Conversational AI Transforming human-machine interaction.pdf
Conversational AI Transforming human-machine interaction.pdfConversational AI Transforming human-machine interaction.pdf
Conversational AI Transforming human-machine interaction.pdf
 
generative AI in healthcare.pdf
generative AI in healthcare.pdfgenerative AI in healthcare.pdf
generative AI in healthcare.pdf
 
Generative AI in telecom.pdf
Generative AI in telecom.pdfGenerative AI in telecom.pdf
Generative AI in telecom.pdf
 
A comprehensive guide to prompt engineering.pdf
A comprehensive guide to prompt engineering.pdfA comprehensive guide to prompt engineering.pdf
A comprehensive guide to prompt engineering.pdf
 
How AI in business process automation is changing the game (1).pdf
How AI in business process automation is changing the game (1).pdfHow AI in business process automation is changing the game (1).pdf
How AI in business process automation is changing the game (1).pdf
 
AI in trade promotion optimization.pdf
AI in trade promotion optimization.pdfAI in trade promotion optimization.pdf
AI in trade promotion optimization.pdf
 
Decision Transformers Model.pdf
Decision Transformers Model.pdfDecision Transformers Model.pdf
Decision Transformers Model.pdf
 
Exploratory Data Analysis - A Comprehensive Guide to EDA.pdf
Exploratory Data Analysis - A Comprehensive Guide to EDA.pdfExploratory Data Analysis - A Comprehensive Guide to EDA.pdf
Exploratory Data Analysis - A Comprehensive Guide to EDA.pdf
 
How to build an AI-powered chatbot.pdf
How to build an AI-powered chatbot.pdfHow to build an AI-powered chatbot.pdf
How to build an AI-powered chatbot.pdf
 
Action Transformer.pdf
Action Transformer.pdfAction Transformer.pdf
Action Transformer.pdf
 

Recently uploaded

Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
ssuser1915fe1
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
CEPTES Software Inc
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
HackersList
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
Jimmy Lai
 
Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technologyTypes of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology
ldtexsolbl
 
Figma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdfFigma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdf
Management Institute of Skills Development
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
digitalxplive
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
moinahousna
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Nicolás Lopéz
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
LINUS PROJECTS (INDIA)
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
313mohammedarshad
 

Recently uploaded (20)

Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
 
Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technologyTypes of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology
 
Figma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdfFigma AI Design Generator_ In-Depth Review.pdf
Figma AI Design Generator_ In-Depth Review.pdf
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
 

Exploratory Data Analysis - A Comprehensive Guide to EDA.pdf

  • 1. 1/13 Exploratory Data Analysis - A Comprehensive Guide to EDA leewayhertz.com/what-is-exploratory-data-analysis In today’s data-driven world, the ability to effectively analyze data is a key factor in the success of many enterprises. By leveraging data analysis tools and techniques, businesses can gain insights, identify trends, and confidently make informed decisions based on their data, improving efficiency and gaining an edge in the highly competitive business landscape. Exploratory data analysis (EDA), which is a preliminary method used for interpreting data before undertaking any formal modeling or hypothesis testing, is one of the most crucial procedures involved in data analysis. Exploratory Data Analysis is the process of detailing the key features of a dataset, frequently employing visual techniques. It entails exploring and analyzing data to comprehend its underlying patterns, connections and trends. EDA is important because it helps identify any issues or anomalies in the data that could affect the reliability of the subsequent analysis. Many industries benefit from EDA, including finance, healthcare, retail, and marketing, since it serves as a foundation for data analysis, pinpoints potential shortcomings in the data, and provides insightful analysis of customer behavior, market trends and business performance. In data analysis, EDA can assist data analysts in identifying missing or incomplete data, outliers and inconsistencies that can impact the statistical analysis of data. Conducting an EDA can also help determine which variables are crucial to explaining the outcome variable
  • 2. 2/13 and which ones can be excluded. Hence, EDA often serves as the first step in developing a data model because it provides insights into the characteristics of the data. In this article, we will explore EDA and its importance, process, tools, techniques and more. What is Exploratory Data Analysis (EDA)? Importance of EDA in data science EDA methods and techniques The EDA process Types of EDA techniques Visualization techniques in EDA Exploratory data analysis tools What is Exploratory Data Analysis (EDA)? EDA or Exploratory Data Analysis is a method of examining and understanding data using multiple techniques like visualization, summary statistics and data transformation to abstract its core characteristics. EDA is done to get a sense of data and discover any potential problems or issues which need to be addressed and is generally performed before formal modeling or hypothesis testing. It aims to identify patterns, relationships, and trends in the data and use this information to facilitate further analysis or decision-making. Data of different types, including numerical, categorical and text, can be analyzed using EDA. It is typically done before data analysis to identify and correct the errors in data and visualize the key attributes of the data.
  • 3. 3/13 Business Question/ Problem Data Collection & Storage EDA Data Analysis Data Visualisation & Storytelling Business Decision LeewayHertz EDA is a scientific approach to understanding the storing of data. Data scientists can use it to discover patterns, spot anomalies, test hypotheses, or verify assumptions by manipulating data sources effectively. Importance of EDA in data science Exploratory data analysis is an important phase in the data science process because it enables data scientists to comprehend the data they are working with on a deeper level. Let us find out why EDA is important in data science by defining its objectives: 1. Conducting an EDA can confirm if the collected data is practicable in the context of the business problem at hand. If not, the data or the strategy adopted by the data analysts needs to be changed. 2. It can reveal and resolve data quality issues, like duplicates, missing data, incorrect values and data types and anomalies. 3. Exploratory data analysis plays a vital role in extracting meaningful insights from data by revealing key statistical measures such as mean, median, and standard deviation.
  • 4. 4/13 4. Oftentimes, some values deviate significantly from the standard set of values; these are anomalies that must be verified before the data is analyzed. If unchecked, they can create havoc in the analysis, leading to miscalculations. As such, one of the objectives of EDA is to locate outliers and anomalies in the data. 5. EDA unveils the behavior of variables when clubbed together, assisting data scientists in finding patterns, correlations, and interactions between these variables by visualizing and analyzing the data. This information is helpful in creating AI models. 6. EDA helps find and drop unwanted columns and derive new variables. It can, thus, assist in determining which features are most crucial for forecasting the target variable, assisting in the choice of features to be included in modeling. 7. Based on the characteristics of the data, EDA can assist in identifying appropriate modeling techniques. EDA methods and techniques Some of the common techniques and methods used in Exploratory Data Analysis include the following: Data visualization Data visualization involves generating visual representations of the data using graphs, charts, and other graphical techniques. Data visualization enables a quick and easy understanding of patterns and relationships within data. Visualization techniques include scatter plots, histograms, heatmaps and box plots. Correlation analysis Using correlation analysis, one can analyze the relationships between pairs of variables to identify any correlations or dependencies between them. Correlation analysis helps in feature selection and in building predictive models. Common correlation techniques include Pearson’s correlation coefficient, Spearman’s rank correlation coefficient and Kendall’s tau correlation coefficient. Dimensionality reduction In dimensionality reduction, techniques like principal component analysis (PCA) and linear discriminant analysis (LDA) are used to decrease the number of variables in the data while keeping as many details as possible. Descriptive statistics It involves calculating summary statistics such as mean, median, mode, standard deviation and variance to gain insights into the distribution of data. The mean is the average value of the data set and provides an idea of the central tendency of the data. The median is the mid-
  • 5. 5/13 value in a sorted list of values and provides another measure of central tendency. The mode is the most common value in the data set. Clustering Clustering techniques such as K-means clustering, hierarchical clustering, and DBSCAN clustering help identify patterns and relationships within a dataset by grouping similar data points together based on their characteristics. Outlier detection Outliers are data points that vary or deviate significantly from the rest of the data and can have a crucial impact on the accuracy of models. Identifying and removing outliers from data using methods like Z-score, interquartile range (IQR) and box plots method can help improve the data quality and the models’ accuracy. The EDA process Conducting EDA requires expertise in multiple tools and programming languages. In the example below, we would perform EDA using Python on the open-source web application, Jupyter Notebook. The EDA process can be summed up in three steps, which are: 1. Understanding the data 2. Cleaning the data 3. Analysis of the relationship between variables Let us understand the process of Exploratory Data Analysis (EDA) step-by-step: Understanding the data Import necessary libraries The first step is to import the required libraries. In this code block, the Pandas library is used to read and manipulate data and the Pandas-profiling library is used for EDA. The datasets module from the scikit-learn library is used to load the Iris dataset. import pandas as pd import pandas_profiling from sklearn import datasets Loading the dataset
  • 6. 6/13 Next, we have to load the dataset. Here, we will be using the multivariate dataset named the Iris dataset. iris = datasets.load_iris() Converting to Pandas DataFrame The scikit-learn dataset is loaded as a Bunch object, similar to a dictionary. To use this dataset with Pandas, we must convert it to a Pandas DataFrame. iris_data = pd.DataFrame(iris.data, columns=iris.feature_names) iris_data['target'] = iris['target'] Checking data attributes It is always a good approach to check the attributes of the data, like its shape or the number of rows and columns in the dataset. To check the shape of the data, run the following code: iris_data.shape To check the column names of the DataFrame, we use the columns attribute. iris_data.columns If your dataset is big, you can view the first few records of the DataFrame by running the below code: iris_data.head() Cleaning the data Once you are done with scanning the attributes of the data, you can make any necessary modifications in the dataset, like changing the name of the columns or the raws. Remember not to change the dataset’s variables, which can significantly impact the final result. Check for null values To clean the data, first, you must check for any null values in the variables. If any of the variables in a dataset have null values, it can affect the analysis results. If your dataset has missing data, handle them through approaches like imputation, deletion of observations or variables, or using models that can handle missing data. Dropping the redundant data and removing outliers Next, if you find any redundant data in your dataset that does not add value to the output, you can also remove them from the table. All the columns and rows are important in the iris dataset we have taken. So we would not be dropping the data. In this step, we must also find
  • 7. 7/13 any outliers in the data. Analysis of the relationship between variables The final step in the process of EDA is to analyze the relationship between variables. It involves the following: Correlation analysis: The analyst computes the correlation matrix between variables to identify which variables are strongly correlated with each other. Visualization: The data analyst creates visualizations to explore the relationship between variables. This includes scatter plots, heatmaps, etc. Hypothesis testing: The analyst performs statistical tests to test hypotheses about the relationship between variables. Run the following code to generate a report that includes various relationship analyses between variables. pandas_profiling.ProfileReport(iris_data) You can view the output of the above code from this Github repository. Types of EDA techniques Several types of exploratory data analysis techniques can be used to gain insights into data. Some common types of EDA include: Univariate non-graphical Univariate non-graphical exploratory data analysis is a simple yet fundamental method for examining information that includes utilizing only one variable to analyze the data. Univariate non-graphical EDA focuses on figuring out the underlying distribution or pattern in the data and mentions objective facts about the population. This procedure includes the examination of the attributes of the population distribution, including spread, central tendency, skewness and kurtosis. An average or middle value of a distribution is called the central tendency. A common measure of central tendency is the mean, followed by the median and mode. As a measure of central tendency, the median may be preferred if the distribution is skewed or concerns are raised about outliers. Spread shows how far off the information values are from the central tendency. The standard deviation and variance are two valuable proportions of the spread. The variance is the mean of the square of the individual deviations, and the standard deviation is the foundation of the variance.
  • 8. 8/13 Skewness and kurtosis are two more helpful univariate descriptors of the distribution. Skewness is a metric of the asymmetry of the distribution, while kurtosis is a proportion of the peakedness of the distribution contrasted with an ordinary dispersion. Outlier detection is also important in univariate non-graphical EDA, as outliers can significantly impact the distribution and distort statistical analysis results. Multivariate non-graphical Multivariate non-graphical EDA is a technique used to explore the relationship between two or more variables through cross-tabulation or statistics. It is useful for identifying patterns and relationships between variables. This analysis is particularly useful when multiple variables exist in a dataset, and you want to see how they relate. Cross-tabulation is a helpful extension of tabulation for categorical data. Cross-tabulation is preferable when there are two variables involved. To do this, create a two-way table with column headings corresponding to the number of one variable and row headings corresponding to the number of the other two variables. Next, fill the counts with all subjects with the same pair of levels. We produce statistics for quantitative variables individually for each level of each categorical variable and one quantitative variable, and then we compare the statistics across all categorical variables. The purpose of multivariate non-graphical EDA is to identify relationships between variables and understand how they are related. Examining the relationship between variables makes it possible to discover patterns and trends that may not be immediately obvious from examining individual variables in isolation. Univariate graphical A univariate graphical EDA technique employs a variety of graphs to gain insight into a single variable’s distribution. These graphical techniques enable us to gain a quick understanding of shapes, central tendencies, spreads, modalities, skewnesses, and outliers of the data we are studying. The following are some of the most commonly used univariate graphical EDA techniques: 1. Histogram: This is one of the most basic graphs used in EDA. A histogram is a bar plot that displays the frequency or proportion of cases in each of several intervals (bins) of a variable’s values. The height of each bar represents the count or proportion of observations that fall within each interval. Histograms provide an intuitive sense of the shape and spread of the distribution, as well as any outliers.
  • 9. 9/13 2. Stem-and-leaf plots: A stem-and-leaf plot is an alternative to a histogram that displays each data value along with its magnitude. In a stem-and-leaf plot, each data value is split into a stem and leaf, with the stem representing the leading digits and the leaf representing the trailing digits. This type of plot provides a visual representation of the data’s distribution and can highlight features such as symmetry and skewness. 3. Boxplots: Boxplots, also known as box-and-whisker plots, provide a visual summary of the distribution’s central tendency, spread and outliers. The box in a boxplot represents the data’s interquartile range (IQR), with the median line inside the box. The whiskers extend from the box to the smallest and largest observations within 1.5 times the IQR from the box. Data points outside of the whiskers are considered outliers. 4. Quantile-normal plots: A quantile-normal plot, also known as a Q-Q plot, assesses the data distribution by comparing the observed values to the expected values from a normal distribution. In a Q-Q plot, the observed data is plotted against the quantiles of a normal distribution. The points should lie along a straight line if the data is normally distributed. If the data deviates from normality, the plot will reveal any skewness, kurtosis, or outliers. Multivariate graphical A multivariate graphical EDA displays relationships between two or more data sets using graphics. When examining relationships between variables beyond two, this technique is used to gain a more comprehensive understanding of the data. A grouped barplot is one of the most commonly used multivariate graphical techniques, with each group representing one level of one variable and each bar representing its amount. Multivariate graphics can also be represented in scatterplots, run charts, heat maps, multivariate charts, and bubble charts. Scatterplots are graphical representations displaying the relationship between two quantitative/numerical variables. It consists of plotting one variable on the x-axis and another on the y-axis. On the plot, each point represents an observation. Scatterplots make it possible to identify outliers or patterns in the data and the direction and strength of the relationship between any two variables. A run chart is a line graph that shows how data changes over time. It is a simple but powerful tool for tracking changes and monitoring trends in data. Run charts can be used to detect trends, cycles, or shifts in a process over time. A multivariate chart illustrates the relationship between factors and responses. It is a type of scatterplot that depicts relationships between several variables simultaneously. A multivariate chart depicts the relationship between variables and identifies patterns or clusters in the data.
  • 10. 10/13 Bubble chart is a data visualization that displays multiple circles (bubbles) in a two- dimensional plot. The size of each circle represents a value of a third variable. Bubble charts are often used to compare data sets with three variables, as they provide an easy way to visualize the relationships between these variables. Visualization techniques in EDA Scatterplot 1.2 1.0 0.8 0.6 0.4 0.2 0.0 -0.2 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 Histogram 25 20 15 10 5 0 -3 -2 -1 0 1 2 3 Line Chart JAN 100 90 80 70 60 50 40 30 20 10 0 FEB MAR APR MAY JUN Pie Chart 68% 18% 18% 45% 35% Bubble Chart 10 20 30 40 50 50 40 30 20 10 0 Bar Chart 600 500 400 300 200 100 0 Heatmap 1 1 2 3 4 5 2 3 4 5 Boxplot Jan 2015 Jan 2016 Jan 2017 Jan 2018 100 80 60 40 20 LeewayHertz Visualization techniques play an essential role in EDA, enabling us to explore and understand complex data structures and relationships visually. Some common visualization techniques used in EDA are: 1. Histograms: Histograms are graphical representations that show the distribution of numerical variables. They help understand the central tendency and spread of the data by visualizing the frequency distribution. 2. Boxplots: A boxplot is a graph showing the distribution of a numerical variable. This visualization technique helps identify any outliers and understand the spread of the data by visualizing its quartiles. 3. Heatmaps: They are graphical representations of data in which colors represent values. They are often used to display complex data sets, providing a quick and easy way to visualize patterns and trends in large amounts of data. 4. Bar charts: A bar chart is a graph that shows the distribution of a categorical variable. It is used to visualize the frequency distribution of the data, which helps to understand the relative frequency of each category.
  • 11. 11/13 5. Line charts: A line chart is a graph that shows the trend of a numerical variable over time. It is used to visualize the changes in the data over time and to identify any patterns or trends. 6. Pie charts: Pie charts are a graph that showcases the proportion of a categorical variable. It is used to visualize each category’s relative proportion and understand the data distribution. Exploratory data analysis tools Spreadsheet software Due to its simplicity, familiar interface and basic statistical analysis capabilities, spreadsheet software such as Microsoft Excel, Google Sheets, or LibreOffice Calc is commonly used for EDA. Using them, users can sort, filter, manipulate data and perform basic statistical analysis, like calculating the mean, median and standard deviation. Statistical software Specialized statistical software such as R or Python and their various libraries and packages offer more advanced statistical analysis tools, including regression analysis, hypothesis testing, and time series analysis. This software allows users to write customized functions and perform complex statistical analyses on large datasets. Data visualization software Visualization software like Tableau, Power BI, or QlikView enables users to create interactive and dynamic data visualizations. These tools help users to identify patterns and relationships in the data, allowing for more informed decision-making. They also offer various types of charts and graphs, as well as the ability to create dashboards and reports. The software allows data to be easily shared and published, making it useful for collaborative projects or presentations. Programming languages Programming languages such as R, Python, Julia and MATLAB offer powerful numerical computing capabilities and provide access to various statistical analysis tools. These languages can be used to write customized functions for specific analysis needs and are particularly useful when working with large datasets. They also enable the automation of repetitive tasks, besides bringing flexibility in data handling and manipulation. Business Intelligence (BI) tools
  • 12. 12/13 BI tools like SAP BusinessObjects, IBM Cognos or Oracle BI offer a range of functionalities, including data exploration, dashboards and reports. They allow users to visualize and analyze data from various sources, including databases and spreadsheets. They provide data preparation tools and quality management tools that can be used in business settings to help organizations make data-driven decisions. Data mining tools Data mining tools such as KNIME, RapidMiner or Weka provide a range of functionalities, including data preprocessing, clustering, classification and association rule mining. These tools are particularly useful for identifying patterns and relationships in large datasets and building predictive models. Data mining tools are used in various industries, including finance, healthcare and retail. Cloud-based tools Cloud-based platforms such as Google Cloud, Amazon Web Services (AWS) and Microsoft Azure offer a range of tools and services for data analysis. They provide a scalable and flexible infrastructure for storing and processing data and offer a range of data analysis and visualization tools. Cloud-based tools are particularly useful for working with large and complex datasets, as they offer high-performance computing resources and the ability to scale up or down depending on the project’s needs. Text analytics tools Text analytics tools like RapidMiner and SAS Text Analytics are used to analyze unstructured data, such as text documents or social media posts. They use natural language processing (NLP) techniques to extract insights from text data, such as sentiment analysis, entity recognition and topic modeling. Text analytics tools are used in a range of industries, including marketing, customer service and political analysis. Geographic Information System (GIS) tools GIS tools such as ArcGIS and QGIS are used to analyze and visualize geospatial data. They allow users to map data and perform spatial analysis, such as identifying patterns and trends in geographical data or performing spatial queries. GIS tools are used in a range of industries, including urban planning, environmental management and transportation. Endnote Exploratory data analysis, or EDA, is an essential step that must be conducted before moving forward with data analysis. It helps data scientists and analysts to understand and gain insights into the data they are working on. It helps discover missing or wrong data that may lead to bias or fault in the final analysis. Analysts can guarantee that the data used for
  • 13. 13/13 analysis is accurate and reliable by cleaning and preprocessing the data during the EDA process. EDA methods can also facilitate feature selection, identifying the vital features to be included in machine learning models and improving model performance. Overall, EDA allows for detecting anomalies, patterns and relationships within the data, which can help businesses make informed decisions and acquire a competitive edge in the fast-evolving tech sphere. Take data analysis to the next level! Contact our Data Scientists and Data Analysts to perform a comprehensive Exploratory Data Analysis (EDA) and unlock valuable insights from your data.