R and Python, A Code Demo

A side-by-side code comparison: a basic introduction for anyone
switching from one language to the other
Vineet Jaiswal
https://www.linkedin.com/in/jaiswalvineet/
R Programming
R is a programming language and free
software environment for statistical computing
and graphics
https://en.wikipedia.org/wiki/R_(programming_language)
Vineet | 2
Python Programming
Python is an open source interpreted high-level
programming language for general-purpose
programming.
https://en.wikipedia.org/wiki/Python_(programming_language)
About this presentation
I am skipping the theory, which is important for a basic
understanding and which I recommend learning separately.
The main focus of this presentation is practical, real-life
problems. It assumes that you already know at least one of
the two languages.
IDE
R: RStudio, Visual Studio, Jupyter, Eclipse
Python: Spyder, Jupyter, PyCharm, Rodeo, Visual Studio, Eclipse
Key notes:
RStudio is the clear winner for R and one of the best IDEs for any
language. Jupyter is very innovative, runs in the browser, and
suits both languages.
Data Operations
Arrays
Indexing, Slicing, and Iterating
Reshaping
Shallow vs deep copy
Broadcasting
Indexing (advanced)
Matrices
Matrix decompositions
R: R is designed specifically for statistical computing, so it
supports all of these operations natively.
Python: NumPy + SciPy
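Two of the topics listed above, shallow vs deep copy and broadcasting, are easy to confuse when coming from R, so here is a minimal NumPy sketch. The array values and variable names are made up for illustration and are not taken from the slides.

```python
import numpy as np

# A small 2x3 array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

view = a[0]        # basic indexing returns a view that shares memory with `a`
deep = a.copy()    # .copy() returns an independent (deep) copy

view[0] = 99       # this write is visible through `a`, but not through `deep`

# Broadcasting: the 1-D row is stretched across both rows of `deep`
row = np.array([10, 20, 30])
broadcast_sum = deep + row

print(a[0, 0], deep[0, 0])
print(broadcast_sum)
```

R vectors are copied on modification, so the view behaviour is the part most likely to surprise someone switching to NumPy.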
Data Manipulation
Reading data
Selecting columns and rows
Filtering
Vectorized string operations
Missing values
Handling time
Time series
R: Base R supports these operations natively, but a few packages
make them much easier: plyr, data.table, dplyr
Python: pandas (extraordinary), built on top of NumPy
Data Visualization
Creating a Graph
Histograms and Density Plots
Dot Plots
Bar Plots
Line Charts
Pie Charts
Boxplots
Scatterplots
And many more
R: Base R graphics support all of these, but the style is
classical. The most used packages are ggplot2 (the best), lattice,
ggvis, and GGally.
Python: Matplotlib, Bokeh, Plotly, Google Visualization, D3
Key notes:
Most of these libraries are available in both languages,
especially those that render in the browser. Still, R is
generally preferred for data visualization.
Machine Learning
Feature extraction
Classification
Regression
Clustering
Dimension reduction
Model selection
Decision Making
Deep learning
And a lot of mathematics
R: caret, mlr, XGBoost, gbm, H2O, CNTK
Python: TensorFlow, scikit-learn, Keras, Theano, CNTK
Key notes:
TensorFlow, a Google product, has a Python-first API but also
supports R. In my view it is the best choice for machine
learning and deep learning, a winner without a second thought.
Text Mining
Natural language Processing
Recommendation Engine
Talking robot
Web Scraping
Chatbot
Or anything else that deals with language
Next level: computer vision
R: tidytext, openNLP, tm, RWeka
Python: NLTK, TextBlob, textmining, spaCy
Key notes:
NLTK is the best choice for understanding the basics.
Most of the big players offer ready-to-use, API-based services.
Install and use libraries/packages, and working directories
R
install.packages('[package name]')    # run in RStudio
library([package name])    # attaches the full package
dplyr::select    # uses one function without attaching the package
Working directory
getwd()
setwd('C:/file/path')
Python
python -m pip install [package-name]    # run in Windows CMD
import pandas as pd    # imports the full package
from textblob.classifiers import NaiveBayesClassifier    # imports a single name
Working directory
import os
os.getcwd()    # current working directory; os.path.abspath('') returns the same path
os.chdir('C:/file/path')    # changes the working directory
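As a rough Python counterpart to getwd() and setwd(), the sketch below changes into a temporary directory and back again. The temporary directory is only there to keep the example self-contained; in practice you would pass your own project path.

```python
import os
import tempfile

start_dir = os.getcwd()              # like getwd() in R

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)                    # like setwd() in R
    # samefile() compares the two paths even if one goes through a symlink
    inside_tmp = os.path.samefile(os.getcwd(), tmp)
    os.chdir(start_dir)              # go back before the directory is removed

print(inside_tmp)
```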
Assignment and getting help
R
'<-' is recommended, but '=' also works
one <- 1
?mean    # help for a function, here from base R
help.search('weighted mean')    # keyword search across help pages
help(package = 'dplyr')    # help index for a package
??mean    # global search, shorthand for help.search('mean')
Python
'='
one = 1
help(len)    # help on the built-in function len
dir(len)    # lists the attributes of an object, not its help text
len?    # help on len; works in IPython and Jupyter only
help(pd.unique)    # help on a package function; here pd is pandas
Vector, List and Array
R
Vector
> c(1,2,3,4,5)
[1] 1 2 3 4 5
> c('a','b', 'c','d')
[1] "a" "b" "c" "d"
Multidimensional (a list of vectors)
> two <- list(c(1.5,2,3), c(4,5,6))
> two
[[1]]
[1] 1.5 2.0 3.0
[[2]]
[1] 4 5 6
Python
import numpy as np
Array
np.array([1,2,3,4,5])
np.array(['a','b','c','d'])
List
[1,2,3,4,5]
['a','b','c','d']
Multidimensional
two = np.array([(1.5,2,3), (4,5,6)], dtype = float)
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
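The slide shows both a Python list and a NumPy array without saying how they differ. A small sketch of the most visible difference, arithmetic, might look like this; the variable names are hypothetical.

```python
import numpy as np

values = [1, 2, 3, 4, 5]          # plain Python list
arr = np.array(values)            # NumPy array built from the list

doubled_list = values * 2         # repeats the list: 10 elements
doubled_arr = arr * 2             # elementwise multiplication, like an R vector

# The two-dimensional array from the slide
two = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

print(doubled_list)
print(doubled_arr)
print(two.shape)
```

For someone coming from R, the NumPy array is the closer match to an R vector, because operators apply element by element.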
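Multidimensional list | Subsetting
R
> two[1]    # single brackets return a sub-list
[[1]]
[1] 1.5 2.0 3.0
> two[[2:3]]    # recursive indexing: third value of the second element
[1] 6
> lapply(two, `[[`, 1)    # first value of every element
[[1]]
[1] 1.5
[[2]]
[1] 4
Python
two[1]    # second row
array([4., 5., 6.])
two[1,2]    # second row, third column
6.0
two[0:2,0:1]    # first two rows, first column, kept two-dimensional
array([[1.5],
       [4. ]])
Must remember: R indexing starts at 1 and Python indexing starts at 0.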
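To make the index offset concrete, the following sketch repeats the Python subsetting calls from this slide on the same array and notes the 1-based R position each one corresponds to.

```python
import numpy as np

two = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

second_row = two[1]          # index 1 is the SECOND row (R would use 2)
element = two[1, 2]          # second row, third column (R: row 2, column 3)
first_column = two[0:2, 0:1] # the stop index is exclusive, so 0:2 means rows 0 and 1

print(second_row)
print(element)
print(first_column.shape)
```

Note that Python slices exclude the stop index, while an R range such as 1:2 includes both ends.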
Data Frame, the most used one
R
> name <- c('vineet', 'jaiswal')
> id <- c(28,30)
> df = data.frame(id, name)
> df
  id    name
1 28  vineet
2 30 jaiswal
Python
import pandas as pd
From a list
df = pd.DataFrame(
    [[4, 7],
     [5, 8]],
    index=[1, 2],
    columns=['a', 'b'])
From a dictionary
data = {'Name':['Vineet', 'Jaiswal'],'Id':[28,30]}
df = pd.DataFrame(data)
pandas also supports multi-index DataFrames. I have not found a
direct equivalent in R, although workarounds are available.
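The multi-index DataFrame mentioned above can be sketched as follows. The level names, keys, and values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical two-level row index: (group, id)
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)],
    names=["group", "id"],
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

group_x = df.loc["x"]                 # all rows whose first level is "x"
single = df.loc[("x", 2), "value"]    # one cell addressed by both levels

print(group_x)
print(single)
```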
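Common commands and missing values
R
summary(df)
str(df)
dim(df)
head(df)
tail(df)
length(df)    # number of columns; use nrow(df) for the number of rows
Missing values are represented as NA
is.na(x)
is.null(x)    # for NULL values
Python
df.describe()
df.info()
df.shape
df.head()
df.tail()
len(df)    # number of rows; use len(df.columns) for the number of columns
Missing values are represented as NaN
math.isnan(x)    # for a single float; requires import math
df.isnull().any()    # per column: does it contain any missing value?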
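A short, self-contained sketch of the Python missing-value checks listed above, using a made-up data frame with one NaN:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data: one missing score
df = pd.DataFrame({"id": [28, 30, 32], "score": [1.5, np.nan, 3.0]})

single = math.isnan(df.loc[1, "score"])        # scalar check on one cell
per_column = df.isnull().any()                 # which columns contain NaN
missing_count = int(df["score"].isna().sum())  # how many NaN values in a column
cleaned = df.dropna()                          # drop rows with any NaN

print(df.shape, len(df))
print(per_column)
print(len(cleaned))
```

Here len(df) counts rows, whereas length(df) in R counts columns, so the two are not interchangeable.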
Querying, Filtering and Sampling
R (dplyr)
slice(df, 1:10)
filter(df, col1 == 1, col2 == 1)
df[df$col1 == 1 & df$col2 == 1,]    # base R equivalent of the filter above
select(df, col1, col2)
select(df, col1:col3)
select(df, -(col1:col3))
distinct(select(df, col1))
distinct(select(df, col1, col2))
sample_n(df, 10)
sample_frac(df, 0.01)
Python (pandas)
df.iloc[:10]    # first 10 rows; the stop index is exclusive
df.query('col1 == 1 & col2 == 1')
df[(df.col1 == 1) & (df.col2 == 1)]
df[['col1', 'col2']]
df.loc[:, 'col1':'col3']
df.drop(cols_to_drop, axis=1)    # cols_to_drop: a list of the column names to drop
df[['col1']].drop_duplicates()
df[['col1', 'col2']].drop_duplicates()
df.sample(n=10)
df.sample(frac=0.01)
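The pandas calls above can be tried on a small frame. This is only a sketch: the column names follow the slide, while the values and the random_state seed are assumptions added to keep the example reproducible.

```python
import pandas as pd

# Hypothetical data using the column names from the slide
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 1],
    "col2": [1, 0, 1, 1, 1],
    "col3": [10, 20, 30, 40, 50],
})

first_rows = df.iloc[:3]                               # first three rows
by_query = df.query("col1 == 1 & col2 == 1")           # string-based filter
by_mask = df[(df.col1 == 1) & (df.col2 == 1)]          # boolean-mask filter
subset = df.loc[:, "col1":"col2"]                      # label slice, inclusive
dropped = df.drop(["col1", "col2"], axis=1)            # remove columns by name
distinct = df[["col1", "col2"]].drop_duplicates()      # unique combinations
sampled = df.sample(n=2, random_state=0)               # two random rows

print(by_query)
print(distinct)
```

Unlike positional slices, label slices with .loc include the end label, which matches the behaviour of col1:col3 in dplyr.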
Sorting, Transforming, Grouping and Summarizing
R (dplyr)
arrange(df, col1, col2)
arrange(df, desc(col1))
select(df, col_one = col1)
rename(df, col_one = col1)
mutate(df, c=a-b)
gdf <- group_by(df, col1)
summarise(gdf, avg=mean(col1, na.rm=TRUE))
summarise(gdf, total=sum(col1))
Python (pandas)
df.sort_values(['col1', 'col2'])
df.sort_values('col1', ascending=False)
df.rename(columns={'col1': 'col_one'})['col_one']
df.rename(columns={'col1': 'col_one'})
df.assign(c=df.a-df.b)
gdf = df.groupby('col1')
df.groupby('col1').agg({'col1': 'mean'})
df.groupby('col1').sum()
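Here is a runnable sketch of the pandas side of this slide. The data is invented, and the summary uses pandas named aggregation, which is one possible way to get both an average and a total in a single call rather than the form shown on the slide.

```python
import pandas as pd

# Hypothetical data: a grouping column and two numeric columns
df = pd.DataFrame({
    "col1": ["a", "b", "a", "b"],
    "a": [5, 7, 9, 11],
    "b": [1, 2, 3, 4],
})

# Sort by group ascending, then by column a descending
sorted_df = df.sort_values(["col1", "a"], ascending=[True, False])

renamed = df.rename(columns={"col1": "col_one"})   # like rename() in dplyr
with_c = df.assign(c=df.a - df.b)                  # like mutate(df, c = a - b)

# Like group_by() followed by summarise(avg = mean(c), total = sum(c))
summary = with_c.groupby("col1").agg(avg=("c", "mean"), total=("c", "sum"))

print(sorted_df)
print(summary)
```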
Apply and big operations
R
Apply a function over a matrix, list, or vector:
apply(), lapply(), sapply(), vapply(), mapply(),
rapply(), and tapply()
The most popular way to express larger data operations
in an SQL-like style is piping:
iris %>%
  group_by(Species) %>%
  summarise(avg = mean(Sepal.Width)) %>%
  arrange(avg)
Python
Because Python is a general-purpose programming
language, you can perform any kind of collection or
generic operation, but most of the time we use
lambda, map, reduce, filter, and list comprehensions.
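The Python tools named above can be sketched in a few lines. The second half shows pandas method chaining as a rough analogue of the R pipe; its tiny data frame is made up and is not the real iris data set.

```python
from functools import reduce

import pandas as pd

values = [1, 2, 3, 4, 5]

squares = list(map(lambda v: v * v, values))           # apply a function to each item
evens = list(filter(lambda v: v % 2 == 0, values))     # keep items that pass a test
total = reduce(lambda acc, v: acc + v, values, 0)      # fold the list into one value
squares_comp = [v * v for v in values]                 # list comprehension, same as map

# Method chaining reads top to bottom, much like %>% in R.
# Hypothetical stand-in for the iris columns used on the slide.
flowers = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "sepal_width": [3.0, 4.0, 2.0, 3.0],
})
piped = (
    flowers
    .groupby("species", as_index=False)
    .agg(avg=("sepal_width", "mean"))
    .sort_values("avg")
)

print(squares, evens, total)
print(piped)
```

List comprehensions are usually considered the more idiomatic choice over map and filter with a lambda, though both produce the same result here.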
It is advisable to learn both
R and Python
All images, external links, and similar material are taken from
the internet for non-business purposes to help explain the
concepts; all credit goes to their respective owners.
All content is based on the best of my knowledge. Please validate
it before use, and let me know about any errors or feedback.
Thanks!
Contact me:
Vineet Jaiswal
https://www.linkedin.com/in/jaiswalvineet/