This document provides an overview of exploratory data analysis (EDA) for machine learning applications. It discusses identifying data sources, collecting data, and the machine learning process. The main part of EDA involves cleaning, preprocessing, and visualizing data to gain insights through descriptive statistics and data visualizations like histograms, scatter plots, and boxplots. This allows discovering patterns, errors, outliers and missing values to understand the dataset before building models.
2. Table of Contents
• Understand the ML best practice and project roadmap
• Identify the data source(s) and Data Collection
• Machine Learning process
• Exploratory Data Analysis (EDA)
3. Understand the ML best practice and project roadmap
• The starting point: a customer wants to implement ML (Machine Learning) for the identified business problem(s), following multiple discussions with stakeholders from both sides: Business, Architect, Infrastructure, Operations, and others.
4. Identify the data source(s) and Data Collection
• The organization's key application(s): these may be internal or external applications or websites.
• Streaming data from the web (Twitter, Facebook, or any other social media).
• Once you're comfortable with the available data, you can start work on the rest of the Machine Learning process model.
7. What is Exploratory Data Analysis?
EDA is an approach to data analysis that uses a variety of techniques to gain insights about the data.
Basic steps in any exploratory data analysis:
• Cleaning and preprocessing
• Statistical analysis
• Visualization for trend analysis, anomaly detection, and outlier detection (and removal)
8. Importance of EDA
Helps you understand the given dataset and clean it up.
Gives you a clear picture of the features and the relationships between them.
Helps you discover errors, outliers, and missing values in the data.
Helps you identify patterns by visualizing data in graphs such as bar graphs, scatter plots, heatmaps and histograms.
9. EDA using Pandas
Import data into the workspace (Jupyter Notebook, Google Colab, Python IDE)
Descriptive statistics
Removal of nulls
Visualization
10. 1. Packages and data import
• Step 1: Import pandas into the workspace.
• "import pandas"
• Step 2: Read the data/dataset into a pandas DataFrame. Different input formats include:
• Excel: read_excel
• CSV: read_csv
• JSON: read_json
• HTML (read_html) and many more
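The two steps above can be sketched as follows. This is a minimal illustration, not the deck's own demo: the CSV text and its column names (make, horsepower, price) are made up for the example, and an in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """make,horsepower,price
audi,102,13950
bmw,101,16430
honda,76,7129
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # inferred type of each column
```

With a real dataset you would pass the file path instead, for example pd.read_csv("your_file.csv"), where the file name is a placeholder.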
11. 2. Descriptive Stats (Pandas)
• Used to make preliminary assessments about the population distribution of a variable.
• Commonly used statistics:
1. Central tendency:
• Mean – The average value of all the data points: dataframe.mean()
• Median – The middle value when all the data points are put in an ordered list: dataframe.median()
• Mode – The data point which occurs most often in the dataset: dataframe.mode()
2. Spread: A measure of how far the data points are from the mean or median.
• Variance – The mean of the squared deviations from the mean: dataframe.var(). By default pandas computes the sample variance, dividing by n - 1 rather than n.
• Standard deviation – The square root of the variance: dataframe.std()
3. Skewness: A measure of asymmetry: dataframe.skew()
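As a rough illustration of these statistics, the sketch below computes each of them on a single made-up column (the name horsepower and its values are assumptions for the example). It also shows the difference between the sample variance pandas returns by default and the population variance obtained with ddof=0.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({"horsepower": [70, 90, 90, 110, 140]})
col = df["horsepower"]

# Central tendency
mean = col.mean()
median = col.median()
mode = col.mode()  # returns a Series because several values can tie

# Spread
sample_var = col.var()            # divides by n - 1 (pandas default, ddof=1)
population_var = col.var(ddof=0)  # divides by n
std = col.std()                   # square root of the sample variance

# Shape
skew = col.skew()  # positive here: the long tail is on the right

print(mean, median, mode.tolist(), sample_var, population_var, std, skew)
```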
12. Descriptive Stats (contd.)
Other methods to get a quick look at the data:
• describe(): Summarizes the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.
• Syntax: pandas.DataFrame.describe()
• info(): Prints a concise summary of the dataframe, including the index dtype and columns, non-null values and memory usage.
• Syntax: pandas.DataFrame.info()
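A short sketch of both methods on a small made-up dataframe (column names and values are assumptions). Note that describe() summarizes only numeric columns by default and leaves the missing price out of the count. Here info() writes into a buffer so its text can be inspected; calling df.info() with no arguments prints directly.

```python
import io

import pandas as pd

# Made-up data with one missing price.
df = pd.DataFrame(
    {
        "make": ["audi", "bmw", "honda", "honda"],
        "price": [13950.0, 16430.0, None, 7129.0],
    }
)

# Numeric columns only by default; NaN values are excluded.
summary = df.describe()
print(summary)

# info() prints to stdout by default; a buffer captures the same text.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```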
13. 3. Null values
Detecting null values:
• isnull(): An alias for dataframe.isna(). It returns a dataframe of the same shape with boolean values indicating missing values.
• Syntax: dataframe.isnull()
Handling null values:
• Dropping the rows with null values: the dropna() function is used to delete rows or columns with null values.
• Replacing missing values: the fillna() function can fill the missing values with a special value like the mean or median.
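The detection and handling options above might look like this in practice. The dataframe is made up for illustration, and filling with the column median is just one of the choices the slide mentions.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column.
df = pd.DataFrame(
    {
        "horsepower": [70.0, np.nan, 110.0, 140.0],
        "price": [7000.0, 9000.0, np.nan, 15000.0],
    }
)

# Detect: boolean mask of the same shape, then count nulls per column.
null_mask = df.isnull()
null_counts = null_mask.sum()

# Option 1: drop every row that contains at least one null.
dropped = df.dropna()

# Option 2: fill each column's nulls with that column's median.
filled = df.fillna(df.median(numeric_only=True))

print(null_counts)
print(dropped)
print(filled)
```

Both dropna() and fillna() return new dataframes here, so the original df is left unchanged.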
14. 4. Visualization
• Univariate: Looking at one variable/column at a time
• Bar graph
• Histogram
• Boxplot
• Multivariate: Looking at the relationship between two or more variables
• Scatter plots
• Pie plots
• Heatmaps (seaborn)
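Heatmaps are not covered further in these slides, so here is a hedged sketch of one common use: plotting the correlation matrix of the numeric columns. The slide names seaborn; to keep dependencies small this example swaps in matplotlib's imshow, which is an assumption of the sketch rather than the deck's method. The column names and values are made up.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numeric data for illustration.
df = pd.DataFrame(
    {
        "horsepower": [70, 90, 110, 140, 160],
        "price": [7000, 9000, 13000, 16000, 21000],
        "city_mpg": [38, 31, 27, 23, 19],
    }
)

# Pairwise Pearson correlations between the columns.
corr = df.corr()

fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax)

# Render to an in-memory buffer; use a file name to save to disk instead.
png = io.BytesIO()
fig.savefig(png, format="png")
```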
15. Bar Graph, Histogram and Boxplot
• Bar graph: A plot that presents data with rectangular bars whose lengths are proportional to the values they represent.
• Boxplot: Depicts numerical data graphically through its quartiles. The box extends from the Q1 to the Q3 quartile values of the data, with a line at the median (Q2).
• Histogram: A representation of the distribution of data, showing how many values fall into each bin.
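The three univariate plots can be drawn with pandas' built-in plotting, which relies on matplotlib. This is a sketch on made-up data (the make and price columns are assumptions), using a non-interactive backend so it also runs outside a notebook.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame(
    {
        "make": ["audi", "bmw", "honda", "honda", "bmw", "honda"],
        "price": [13950, 16430, 7129, 7295, 20970, 6479],
    }
)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Bar graph: one bar per category, height = number of rows.
df["make"].value_counts().plot.bar(ax=axes[0], title="Cars per make")

# Histogram: the price range is split into 5 equal-width bins.
df["price"].plot.hist(bins=5, ax=axes[1], title="Price distribution")

# Boxplot: box from Q1 to Q3 with a line at the median.
df["price"].plot.box(ax=axes[2], title="Price boxplot")

fig.tight_layout()

# Render to an in-memory buffer; use a file name to save to disk instead.
png = io.BytesIO()
fig.savefig(png, format="png")
```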
16. Scatterplot, Pie plot
• Scatterplot: Shows the data as a collection of points.
• Syntax: dataframe.plot.scatter(x='x_column_name', y='y_column_name')
• Pie plot: Proportional representation of the numerical data in a column.
• Syntax: dataframe.plot.pie(y='column_name')
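Both calls can be tried on small made-up dataframes, as sketched below. The column names (horsepower, price, cars) and the values are assumptions for the example. For the pie plot, the dataframe index supplies the wedge labels.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for the scatter plot.
cars = pd.DataFrame(
    {
        "horsepower": [70, 90, 110, 140, 160],
        "price": [7000, 9000, 13000, 16000, 21000],
    }
)

# Made-up counts per make for the pie plot; the index becomes the labels.
counts = pd.DataFrame({"cars": [3, 2, 1]}, index=["honda", "bmw", "audi"])

fig, axes = plt.subplots(1, 2, figsize=(9, 4))

# Scatter plot: one point per row.
cars.plot.scatter(x="horsepower", y="price", ax=axes[0])

# Pie plot: each wedge is proportional to the value in the "cars" column.
counts.plot.pie(y="cars", ax=axes[1], legend=False)

fig.tight_layout()

# Render to an in-memory buffer; use a file name to save to disk instead.
png = io.BytesIO()
fig.savefig(png, format="png")
```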
17. Outlier detection
• An outlier is a data point, or set of data points, that lies far away from the rest of the values in the dataset.
• Outliers can often be identified by visualizing the data.
• For example:
• In a boxplot, the data points which lie beyond the whiskers (outside the upper and lower bounds) can be considered outliers.
• In a scatterplot, the data points which lie outside the groups of data points can be considered outliers.
18. Outlier removal
• Use the IQR to find and treat outliers as follows:
Calculate the first and third quartiles (Q1 and Q3).
Calculate the interquartile range, IQR = Q3 - Q1.
Find the lower bound, which is Q1 - 1.5 * IQR.
Find the upper bound, which is Q3 + 1.5 * IQR.
Replace the data points which lie outside this range. They can be replaced by the mean or median.
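A minimal sketch of the IQR procedure, using the conventional bounds Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The price values are made up, and replacing outliers with the median is one of the two options mentioned above, not the only reasonable choice.

```python
import pandas as pd

# Made-up prices with one obvious outlier (45000).
prices = pd.Series(
    [7000, 7500, 8000, 8500, 9000, 9500, 45000], name="price", dtype="float64"
)

# Quartiles and interquartile range.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Conventional 1.5 * IQR bounds.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Flag values outside the bounds and replace them with the median.
is_outlier = (prices < lower_bound) | (prices > upper_bound)
cleaned = prices.mask(is_outlier, prices.median())

print(lower_bound, upper_bound)
print(cleaned.tolist())
```

The median is computed from the original series, so the replacement value is not itself pulled toward the outlier the way the mean would be.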
19. Now that you have some idea of the process, let's implement all of this using the Automobile – Predictive Analysis dataset.
Hands-on Demonstration