DATA EXPLORATION METHODS & PRACTICES
Martin Bago | Instarea
8.10.2018
2nd Data Science Club, 18/19 Winter
TABLE OF CONTENTS
INTRO
FIRST DIVE INTO THE DATASET
GOING DEEPER
CORRELATIONS
BONUS
DATA SCIENCE CLUB
Martin Bago
Data Scientist | Instarea
Ing. @ Process Automation and Informatization in Industry (2016, MTF STU BA)
Bc. @ Applied Informatics (2014, FEI STU BA)
2017- now Data Scientist, Instarea s.r.o., Market Locator
2015-2016 Head of Analyst, News and Media Holding a.s.
2014-2015 SEO Analyst, Centrum Holdings a.s.
2011-2014 Automix.sk, Centrum Holdings a.s.
2010-2013 Editor-in-chief OKO Casopis (FEI STU BA)
Passionate driver, beer&coffee&football lover
Something for you
Download this presentation +
source code here:
http://bit.ly/2QybvNV
The Data journey…always the
same
Dataset
How to find & use
>> library(datasets) # part of base R and attached by default, so install.packages() is not needed
>> data(package = "datasets") # list the datasets it contains
For practice, R ships with the datasets package, a collection of many real-life example datasets (from Monthly Airline Passenger Numbers, through Weight versus Age of Chicks on Different Diets, to Monthly Deaths from Lung Diseases in the UK).
This presentation uses the mtcars dataset.
Baby steps
head(), tail(), nrow() and ncol()
To understand what you are working with, first look at a few rows and at the dimensions of the dataset (the number of rows and columns).
>> head(mtcars) # first 6 rows
>> tail(mtcars) # last 6 rows
>> head(mtcars, 25) # first 25 rows
>> nrow(mtcars) # number of rows
>> ncol(mtcars) # number of columns
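As a small complement to these calls, dim() returns both dimensions at once and names() lists the columns. This is a minimal sketch that assumes only base R and the built-in mtcars data; the variable names dims and col_names are illustrative.

```r
# Built-in dataset; no download needed.
data(mtcars)

dims <- dim(mtcars)          # c(number of rows, number of columns)
col_names <- names(mtcars)   # column names

cat("rows:", dims[1], "columns:", dims[2], "\n")
print(col_names)

# head() and tail() take the number of rows as a second argument.
print(head(mtcars, 3))
print(tail(mtcars, 3))
```

For mtcars this reports 32 rows and 11 columns.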
Deeper insight
str(), summary()
For a deeper understanding of the dataset, use the detailed views of its structure and summary statistics.
>> str(mtcars) # column types and first values
>> summary(mtcars) # min, quartiles, median, mean and max per column
Always check data types!
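One way to check the data types programmatically, rather than reading them off str(), is to collect the class of each column. A minimal sketch, assuming base R and the built-in mtcars data; col_types and non_numeric are illustrative names.

```r
# Class of every column as a named character vector.
col_types <- sapply(mtcars, class)
print(col_types)

# Columns that are not numeric would need attention before hist() or cor().
non_numeric <- names(col_types)[col_types != "numeric"]
cat("non-numeric columns:", length(non_numeric), "\n")

# cyl is stored as a number although it takes only a few distinct values.
print(table(mtcars$cyl))
```

In mtcars every column is numeric, including coded ones such as cyl, vs and am, which is exactly the kind of thing a type check should bring to your attention.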
Unique and missing values
unique(), is.na()
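It is crucial to find out how many values are missing from the dataset. If two thirds of them are missing, you have the wrong dataset.
>> unique(mtcars$cyl) # distinct values of a column
>> is.na(mtcars) # TRUE wherever a value is missing
>> colSums(is.na(mtcars)) # number of missing values per column
If something is missing, you can use the good old method of filling the gaps with the column mean (smt below is a placeholder for a real column name).
>> mtcars$smt[is.na(mtcars$smt)] <- mean(mtcars$smt, na.rm = TRUE)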
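Because mtcars itself has no missing values, the mean-filling step can be tried on a copy with a few gaps introduced on purpose. This is a sketch under that assumption; cars_na, the choice of the hp column and the two row positions are made up for illustration.

```r
# Work on a copy so the original dataset stays untouched.
cars_na <- mtcars
cars_na$hp[c(3, 10)] <- NA            # artificially introduced gaps

missing_before <- sum(is.na(cars_na$hp))

# Mean of the observed values only (na.rm = TRUE skips the gaps).
fill_value <- mean(cars_na$hp, na.rm = TRUE)
cars_na$hp[is.na(cars_na$hp)] <- fill_value

missing_after <- sum(is.na(cars_na$hp))
cat("missing before:", missing_before, "after:", missing_after, "\n")
```

Keep in mind that mean filling lowers the variance of the column, so it is best suited to cases where only a small share of values is missing.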
Histograms
hist()
The best way to learn and understand is visually.
>> hist(mtcars$mpg)
>> hist(mtcars$hp)
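A histogram can also be inspected as numbers, and its appearance can be tuned. A minimal sketch using base R only; the title, axis label, colour and the suggested number of bins are illustrative choices, not taken from the slides.

```r
# Compute the histogram without drawing it to inspect the bins.
h <- hist(mtcars$mpg, plot = FALSE)
print(h$breaks)   # bin boundaries
print(h$counts)   # number of cars per bin

# Draw it with a suggested number of bins and readable labels.
# Note: breaks is a suggestion; R picks "pretty" boundaries near it.
hist(mtcars$mpg, breaks = 10,
     main = "Fuel economy in mtcars",
     xlab = "Miles per gallon", col = "grey")
```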
Transforming and recalculating
Often you need to calculate your own metrics. In R, it's really easy.
>> mtcars2 <- mtcars
>> mtcars2$disp_l <- mtcars$disp/61.024 # displacement: cubic inches to liters
>> mtcars2$kml <- 235/mtcars$mpg # fuel consumption: miles per US gallon to liters per 100 km
>> hist(mtcars2$disp_l)
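The two derived metrics can be wrapped in small helper functions, which makes the units explicit. This sketch assumes that disp_l is meant to hold the engine displacement (the disp column, in cubic inches) converted to liters, and that kml holds fuel consumption in liters per 100 km; the helper names are hypothetical and the factors are the rounded ones used in the slides.

```r
# Rounded conversion factors.
CUBIC_INCHES_PER_LITER <- 61.024
MPG_TO_L_PER_100KM <- 235          # about 235.2 for US gallons

# Hypothetical helpers, named here for illustration only.
to_liters <- function(cubic_inches) cubic_inches / CUBIC_INCHES_PER_LITER
to_l_per_100km <- function(mpg) MPG_TO_L_PER_100KM / mpg

mtcars2 <- mtcars
mtcars2$disp_l <- to_liters(mtcars$disp)
mtcars2$kml <- to_l_per_100km(mtcars$mpg)

# Original and derived values side by side for one car.
print(round(mtcars2["Mazda RX4", c("disp", "disp_l", "mpg", "kml")], 2))
```

For the Mazda RX4 (160 cubic inches, 21 mpg) this gives roughly 2.62 liters of displacement and 11.19 liters per 100 km.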
Understand the scope of variables
boxplot()
>> boxplot(mtcars)
>> boxplot(mtcars2$disp_l, mtcars2$kml)
>> boxplot(mtcars2$kml, main = "mtcars dataset",
           xlab = "Consumption per 100 km", ylab = "Liters")
How to read a boxplot?
boxplot()
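A boxplot can also be read numerically: boxplot.stats() returns the five values the plot is drawn from, plus the points shown as outliers. A minimal sketch with base R; the labels attached to the values are added here for readability.

```r
# The same statistics that boxplot() draws for one variable.
bs <- boxplot.stats(mtcars$mpg)

names(bs$stats) <- c("lower whisker", "lower hinge (~Q1)", "median",
                     "upper hinge (~Q3)", "upper whisker")
print(bs$stats)

# Points beyond the whiskers, drawn as individual dots.
print(bs$out)

# Height of the box: the middle half of the data lies inside it.
box_height <- unname(bs$stats[4] - bs$stats[2])
cat("box height (approx. IQR):", box_height, "\n")
```

The thick line is the median, the box spans the hinges (approximately the first and third quartile), and by default each whisker reaches the most extreme data point that lies no more than 1.5 times the box height away from the box.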
Does it correlate?
library(corrplot), cor()
>> install.packages("corrplot") # one-time installation
>> library(corrplot)
>> # general form: cor(x, method = "pearson", use = "complete.obs")
>> cor(mtcars)
The raw output is not very intuitive…
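Before plotting, the correlation matrix becomes easier to digest when it is narrowed down to one variable of interest and sorted. A minimal sketch with base R only; focusing on mpg is an illustrative choice.

```r
# Full Pearson correlation matrix, using only complete rows.
res <- cor(mtcars, method = "pearson", use = "complete.obs")

# Correlations of every other variable with mpg.
mpg_cor <- res["mpg", colnames(res) != "mpg"]
print(round(sort(mpg_cor), 2))

# Variable with the strongest linear relationship to mpg (by absolute value).
strongest <- names(which.max(abs(mpg_cor)))
cat("strongest linear relationship with mpg:", strongest, "\n")
```

Sorting by absolute value matters here because a strong negative correlation is just as informative as a strong positive one.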
Does it correlate?
library(corrplot), cor()
>> res <- cor(mtcars)
>> round(res, 2)
>> corrplot(res, type = "upper", order = "hclust",
            tl.col = "black", tl.srt = 25)
! Be careful !
Correlation is not causation
Does it correlate?
Heatmap of the correlation matrix
>> res <- cor(mtcars)
>> col <- colorRampPalette(c("blue", "white", "red"))(20)
>> heatmap(x = res, col = col, symm = TRUE) # heatmap() is part of base R (stats), so corrplot is not required here
Or even deeper insight…
>> require(graphics)
>> pairs(mtcars2, main = "mtcars2 data", gap = 1/4) # scatterplot matrix of all variable pairs
>> coplot(kml ~ disp_l | as.factor(cyl), data = mtcars2,
          panel = panel.smooth, rows = 1) # one panel per cylinder count
>> ## possibly more meaningful, e.g., for summary() or bivariate plots:
>> mtcars2 <- within(mtcars2, {
     vs <- factor(vs, labels = c("V", "S"))
     am <- factor(am, labels = c("automatic", "manual"))
     cyl <- ordered(cyl)
     gear <- ordered(gear)
     carb <- ordered(carb)
   })
>> summary(mtcars2)
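The effect of converting coded columns to factors is easiest to see on a small example: summary() then reports counts per category instead of quartiles, and group-wise statistics become straightforward. A minimal sketch on a fresh copy of mtcars; cars_f and mpg_by_am are illustrative names.

```r
# Convert two coded columns on a copy of the data.
cars_f <- within(mtcars, {
  am <- factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
  cyl <- ordered(cyl)
})

# For factors, summary() shows counts per level.
print(summary(cars_f[, c("am", "cyl")]))

# Mean fuel economy per transmission type.
mpg_by_am <- tapply(cars_f$mpg, cars_f$am, mean)
print(round(mpg_by_am, 1))
```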
Or even deeper insight…
library(PerformanceAnalytics)
>> install.packages("PerformanceAnalytics") # one-time installation
>> library(PerformanceAnalytics)
>> chart.Correlation(mtcars, histogram = TRUE, pch = 19)
>> mtcars_small <- mtcars[, 1:4] # mpg, cyl, disp, hp
>> chart.Correlation(mtcars_small, histogram = TRUE, pch = 19)
Bonus - anomaly detection
AnomalyDetectionTs()
The input is a time series or a vector covering at least two periods.
Made by Twitter
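To illustrate why at least two periods are needed, here is a simplified stand-in written in base R. It does not call AnomalyDetectionTs() and does not reproduce that package's algorithm: it estimates the repeating pattern as the median per position within the period and flags points that deviate strongly from it. The data are synthetic, and the period length, noise level, spike position and the threshold of 5 median absolute deviations are all assumptions made for this sketch.

```r
set.seed(42)

# Synthetic series: three periods of a sine wave plus noise (assumed data).
period <- 24
n_periods <- 3
idx <- seq_len(period * n_periods)
x <- sin(2 * pi * idx / period) + rnorm(length(idx), sd = 0.1)
x[40] <- x[40] + 5                      # injected anomaly

# Seasonal pattern: median of each position across the periods.
m <- matrix(x, nrow = period)           # one column per period
seasonal <- apply(m, 1, median)
resid <- x - rep(seasonal, times = n_periods)

# Flag residuals far from the typical residual (robust threshold).
threshold <- 5 * mad(resid)
flagged <- which(abs(resid - median(resid)) > threshold)
print(flagged)
```

With a single period there would be nothing to compare each position against, which is why the seasonal pattern can only be estimated from repeated cycles.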
What next?
To create customizable dashboards, try Shiny.
For Tableau-like drag-and-drop GUI visualization in R, use esquisse.
Stay in touch
Instarea s.r.o.
29. Augusta 36/A
811 09 Bratislava
www.instarea.com
Martin Bago
Data Scientist
Instarea
martin.bago@instarea.com
+421 905 255 852
https://www.linkedin.com/in/martinbago/
Thank you!

Exploratory data analysis in R - Data Science Club
