1. R and Python, A Code Demo
A side-by-side code comparison: a basic introduction for anyone switching from one language to the other
Vineet Jaiswal
https://www.linkedin.com/in/jaiswalvineet/
2. R Programming
R is a programming language and free software environment for statistical computing and graphics.
https://en.wikipedia.org/wiki/R_(programming_language)
Vineet | 2
3. Python Programming
Python is an open source interpreted high-level
programming language for general-purpose
programming.
https://en.wikipedia.org/wiki/Python_(programming_language)
4. About this presentation
I am skipping the theoretical background. It is important for a basic understanding, and I recommend learning it.
This presentation focuses on practical, real-life problems and assumes you already know at least one of the two languages.
5. IDE
R: RStudio, Visual Studio, Jupyter, Eclipse
Python: Spyder, Jupyter, PyCharm, Rodeo, Visual Studio, Eclipse
Key Notes:
RStudio is the clear winner for R and one of the best IDEs for any language. Jupyter is very innovative: it runs in the browser and suits both languages.
6. Data Operations
Arrays
Indexing, Slicing, and Iterating
Reshaping
Shallow vs deep copy
Broadcasting
Indexing (advanced)
Matrices
Matrix decompositions
R: R was built specifically for statistical computing, so it supports all of these natively.
Python: NumPy + SciPy
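The following is a minimal NumPy sketch of the operations listed above. It assumes NumPy is installed, and the array values are arbitrary. It covers reshaping, indexing and slicing, iteration, the view-versus-copy (shallow vs deep) distinction, broadcasting, advanced indexing, and one matrix decomposition (QR via `np.linalg.qr`).

```python
import numpy as np

a = np.arange(12)                # 1-D array: 0, 1, ..., 11
m = a.reshape(3, 4)              # reshaping: 3 rows x 4 columns

row = m[1]                       # indexing: second row (Python counts from 0)
block = m[0:2, 1:3]              # slicing: rows 0-1, columns 1-2
total = sum(x for x in m[2])     # iterating over the elements of a row

deep = m[0].copy()               # deep copy: independent data
shallow = m[0]                   # basic slicing returns a view (shallow) onto m
shallow[0] = 100                 # this also changes m[0, 0]; deep is untouched

scaled = m * np.array([1, 10, 100, 1000])   # broadcasting: the 1-D array is applied to every row
picked = m[[0, 2], [1, 3]]       # advanced indexing: elements (0, 1) and (2, 3)

mat = np.array([[4.0, 2.0], [1.0, 3.0]])
q, r = np.linalg.qr(mat)         # a matrix decomposition: mat == q @ r
```

Python's `m[1]` picks the second row, whereas R would use `m[2, ]`.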
7. Data Manipulation
Reading data
Selecting columns and rows
Filtering
Vectorized string operations
Missing values
Handling time
Time series
R: R supports this natively, but a few packages are excellent: plyr, data.table, dplyr
Python: pandas (extraordinary), built on top of NumPy
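The following pandas sketch touches each item on this slide. It assumes pandas is installed. The small CSV is hypothetical sample data held in memory so that no file on disk is needed.

```python
import io
import pandas as pd

# Hypothetical sample data; in practice you would pass a file path to read_csv
csv = io.StringIO(
    "name,score,date\n"
    "vineet,28,2017-01-05\n"
    "jaiswal,,2017-02-10\n"
    "alex,35,2017-03-15\n"
)
df = pd.read_csv(csv, parse_dates=["date"])   # reading data; 'date' becomes a datetime column

names = df["name"]                  # select one column
first_two = df.iloc[0:2]            # select rows by position
high = df[df["score"] > 30]         # filter rows with a boolean mask
upper = df["name"].str.upper()      # vectorized string operation
missing = df["score"].isna()        # the empty score was read as NaN
filled = df["score"].fillna(0)      # replace missing values
months = df["date"].dt.month        # handling time through the .dt accessor
```

The R (dplyr) counterparts are roughly `select()`, `slice()`, `filter()` and `is.na()`.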
8. Data Visualization
Creating a Graph
Histograms and Density Plots
Dot Plots
Bar Plots
Line Charts
Pie Charts
Boxplots
Scatterplots
And much more
R: R supports plotting natively with its classic base graphics; the most used packages are ggplot2 (the best), lattice, ggvis and GGally
Python: Matplotlib, Bokeh, Plotly, Google Visualization, D3
Key Notes:
Most of these libraries are available in both languages, especially the ones that render in the browser. Still, R is the most preferred choice for data visualization.
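The following Matplotlib sketch draws two of the plot types listed above, a histogram and a boxplot. It assumes Matplotlib and NumPy are installed. The data are random numbers generated for illustration. The off-screen "Agg" backend is chosen so the script runs without a display, and the figure is written to an in-memory PNG instead of being shown.

```python
import io
import matplotlib
matplotlib.use("Agg")               # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=0, scale=1, size=500)   # synthetic data for illustration

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(values, bins=20)           # histogram, like hist(values) in R
ax1.set_title("Histogram")
ax2.boxplot(values)                 # boxplot, like boxplot(values) in R
ax2.set_title("Boxplot")

buf = io.BytesIO()
fig.savefig(buf, format="png")      # write the figure to an in-memory PNG
plt.close(fig)
```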
9. Machine Learning
Feature extraction
Classification
Regression
Clustering
Dimension reduction
Model selection
Decision Making
Deep learning
And a lot of mathematics
R: caret, mlr, XGBoost, gbm, H2O, CNTK
Python: TensorFlow, scikit-learn, Keras, Theano, CNTK
Key Notes:
TensorFlow, developed by Google, is designed primarily around Python but also supports R. It is the best choice for machine learning and deep learning: a winner without a second thought.
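The following scikit-learn sketch shows the basic workflow of a classification task. It assumes scikit-learn is installed. The iris data ship with the library. The decision tree and the 70/30 split are arbitrary choices for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # features and labels of the iris data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y   # hold out 30% for evaluation
)
model = DecisionTreeClassifier(random_state=0)   # a simple classifier
model.fit(X_train, y_train)                      # learn from the training split
accuracy = model.score(X_test, y_test)           # mean accuracy on unseen data
```

In R, caret's `train()` plays a similar role.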
10. Text Mining
Natural Language Processing
Recommendation Engine
Talking robot
Web Scraping
Chatbot
Or anything else that deals with language
Next level: Computer Vision
R: tidytext, openNLP, tm, RWeka
Python: NLTK, TextBlob, textmining, spaCy
Key Notes:
NLTK is the best for understanding all the basics.
Most of the big players offer ready-to-use, API-based services.
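The following is a plain-Python sketch of the first step in most text-mining pipelines: tokenizing text and counting word frequencies (a bag of words). It uses only the standard library and is not NLTK or spaCy. The regex tokenizer and the tiny stop-word list are simplified, hypothetical stand-ins for what those libraries provide.

```python
import re
from collections import Counter

text = "R and Python both do text mining. Python has NLTK; R has tm and tidytext."

tokens = re.findall(r"[a-z]+", text.lower())    # naive tokenizer: lowercase alphabetic runs
stopwords = {"and", "both", "do", "has"}        # hypothetical, tiny stop-word list
words = [t for t in tokens if t not in stopwords]
freq = Counter(words)                            # term frequencies (bag of words)
```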
11. Installing and using libraries/packages, and working directories
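R:
install.packages('[Package Name]')   install a package (e.g. from RStudio)
library([Package Name])   load the full package
dplyr::select   use one function from a package without attaching the whole package
Working directory
getwd()
setwd('C:/file/path')
Python:
python -m pip install [package-name]   install a package (e.g. from the Windows CMD prompt)
import pandas as pd   import the full package
from textblob.classifiers import NaiveBayesClassifier   import only what you need
Working directory
import os
os.getcwd()
os.chdir('C:/file/path')
cd C:\file\path   (in CMD, before starting Python)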
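The following standard-library sketch shows Python's counterparts of R's `getwd()`/`setwd()` and the two import styles. It uses a temporary directory so that it can change folders safely and then return to the original one.

```python
import math                                  # full module: call as math.sqrt
import os
import tempfile
from math import sqrt                        # selective import: call as sqrt

original = os.getcwd()                       # like getwd() in R
with tempfile.TemporaryDirectory() as target:    # throwaway folder for this demo
    os.chdir(target)                         # like setwd() in R
    inside = os.getcwd()
    os.chdir(original)                       # restore before the folder is deleted
```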
12. Assignment and getting help
R:
'<-' is recommended, but '=' also works
one <- 1
?mean   help on mean from base R
help.search('weighted mean')   keyword search
help(package = 'dplyr')   help for a whole package
??mean   global search across installed packages
Python:
'='
one = 1
help(len)   help on the built-in function len
dir(len)   list the attributes and methods of len
len?   help on len (IPython/Jupyter only)
help(pd.unique)   help on a package function; here pd is pandas
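The following standard-library sketch retrieves the same help text programmatically instead of printing it. `pydoc.render_doc` returns the text that `help()` would display, and the plain-text renderer is chosen to avoid terminal bold codes.

```python
import pydoc

one = 1                                   # assignment uses '='

doc = len.__doc__                         # the docstring that help(len) displays
attrs = dir(len)                          # names of the object's attributes and methods
help_text = pydoc.render_doc(len, renderer=pydoc.plaintext)   # full help text as a string
```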
13. Vector, List and Array
R:
Vector
> c(1,2,3,4,5)
[1] 1 2 3 4 5
> c('a','b','c','d')
[1] "a" "b" "c" "d"
Multidimensional (a list of vectors)
> two <- list(c(1.5,2,3), c(4,5,6))
> two
[[1]]
[1] 1.5 2.0 3.0
[[2]]
[1] 4 5 6
Python:
import numpy as np
Array
np.array([1,2,3,4,5])
np.array(['a','b','c','d'])
List
[1,2,3,4,5]
['a','b','c','d']
Multidimensional
two = np.array([(1.5,2,3), (4,5,6)], dtype=float)
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
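The following sketch illustrates the practical difference between a Python list and a NumPy array. It assumes NumPy is installed. Arithmetic on an array is element-wise, like an R vector, whereas the same operator on a list does something else. Also like an R vector, an array holds a single type.

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array([1, 2, 3])

doubled_list = lst * 2        # list repetition: the list is concatenated with itself
doubled_arr = arr * 2         # element-wise, like c(1,2,3) * 2 in R

mixed = np.array([1, "a"])    # one dtype per array, so 1 is converted to the string '1'
two = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)   # 2 rows x 3 columns
```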
14. Multidimensional list | Subsetting
R:
> two[1]   # single brackets return a sub-list
[[1]]
[1] 1.5 2.0 3.0
> two[[2:3]]   # recursive indexing, same as two[[2]][[3]]
[1] 6
> lapply(two, `[[`, 1)   # first element of each vector
[[1]]
[1] 1.5
[[2]]
[1] 4
Python:
two[1]
array([4., 5., 6.])
two[1,2]
6.0
two[0:2,0:1]
array([[1.5],
       [4. ]])
Must remember: R indexing starts at 1, and Python indexing starts at 0.
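The following sketch shows Python counterparts of `lapply(two, `[[`, 1)` for both a NumPy array and a plain nested list, plus one more indexing difference worth remembering. It assumes NumPy is installed.

```python
import numpy as np

two = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
nested = [[1.5, 2, 3], [4, 5, 6]]          # plain nested list, closer to R's list()

first_col = two[:, 0]                      # first element of every row
first_each = [row[0] for row in nested]    # same idea on a nested list
last = two[1, -1]                          # a negative index counts from the end in Python
                                           # (in R, a negative index drops that element instead)
```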
15. Data Frame, the most used one
R:
> name <- c('vineet', 'jaiswal')
> id <- c(28,30)
> df <- data.frame(id, name)
> df
  id    name
1 28  vineet
2 30 jaiswal
Python:
import pandas as pd
From a list
df = pd.DataFrame(
    [[4, 7],
     [5, 8]],
    index=[1, 2],
    columns=['a', 'b'])
From a dictionary
data = {'Name': ['Vineet', 'Jaiswal'], 'Id': [28, 30]}
df = pd.DataFrame(data)
pandas also supports MultiIndex DataFrames. I haven't found a direct equivalent in R, but workarounds are available.
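The following sketch builds a two-level row index (MultiIndex) in pandas. It assumes pandas is installed. The "Alex" row and the "Team" column are hypothetical values added so that the grouping has something to show.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Vineet", "Jaiswal", "Alex"],   # "Alex" and "Team" are hypothetical sample data
    "Id": [28, 30, 35],
    "Team": ["A", "A", "B"],
})
indexed = df.set_index(["Team", "Id"])       # two-level MultiIndex on the rows
team_a = indexed.loc["A"]                    # every row under the outer key 'A'
one_row = indexed.loc[("A", 30)]             # one row selected by the full (Team, Id) key
```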
16. Common commands and missing values
R:
summary(df)
str(df)
dim(df)
head(df)
tail(df)
length(df)   number of columns; use nrow(df) for the number of rows
Missing values are represented as NA
is.na(x)
is.null(x)   for NULL values
Python:
df.describe()
df.info()
df.head()
df.tail()
df.shape
len(df)   number of rows
Missing values are represented as NaN
math.isnan(x)   for a single value (after import math)
df.isnull().any()   which columns of a data frame contain missing values
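The following sketch checks for missing values on a single number and on a whole data frame. It assumes pandas is installed. The third row is hypothetical sample data containing the gaps.

```python
import math
import pandas as pd

x = float("nan")
is_missing = math.isnan(x)            # True; note that NaN never compares equal, even to itself

df = pd.DataFrame({"id": [28, 30, None], "name": ["vineet", None, "alex"]})  # hypothetical data
per_column = df.isnull().any()        # does each column contain at least one missing value?
counts = df.isnull().sum()            # how many missing values per column
rows, cols = df.shape                 # like dim(df) in R
```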
19. Apply and big operations
R:
Apply a function over a matrix, list or vector:
apply(), lapply(), sapply(), vapply(), mapply(), rapply(), and tapply()
The most popular way to perform larger data operations in an SQL-like style is piping (%>%, used with dplyr):
library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise(avg = mean(Sepal.Width)) %>%
  arrange(avg)
Python:
Because Python is a general-purpose programming language, you can perform any kind of collection or generic operation. Most of the time, though, we use lambda, map, reduce, filter and list comprehensions.
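The following sketch shows the Python tools named above and a pandas method chain that mirrors the dplyr pipeline. It assumes pandas is installed. The five-row "iris" frame is a hypothetical stand-in for R's built-in iris data and contains only the two columns the pipeline uses.

```python
from functools import reduce
import pandas as pd

nums = [1, 2, 3, 4, 5]
squares = list(map(lambda n: n * n, nums))          # like sapply(nums, function(n) n^2)
evens = list(filter(lambda n: n % 2 == 0, nums))    # like Filter(function(n) n %% 2 == 0, nums)
total = reduce(lambda a, b: a + b, nums)            # like Reduce(`+`, nums)
odd_squares = [n * n for n in nums if n % 2 == 1]   # list comprehension with a condition

# Hypothetical stand-in for R's iris data
iris = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
    "Sepal.Width": [3.5, 3.0, 3.2, 2.8, 3.3],
})
result = (
    iris.groupby("Species", as_index=False)         # group_by(Species)
    .agg(avg=("Sepal.Width", "mean"))               # summarise(avg = mean(Sepal.Width))
    .sort_values("avg")                             # arrange(avg)
)
```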
21. All images, external links, etc. are taken from the internet for non-business purposes to help explain the concepts; all credit goes to their respective owners.
All content is based on my best knowledge. Please validate it before use, and let me know about any errors or feedback.