R and Python, A Code Demo


A side-by-side code comparison: a basic introduction for anyone switching from one language to the other

Published in: Technology

  1. R and Python, A Code Demo. A side-by-side code comparison and basic introduction for anyone switching from one language to the other. Vineet Jaiswal
  2. R Programming: R is a programming language and free software environment for statistical computing and graphics. Vineet | 2
  3. Python Programming: Python is an open-source, interpreted, high-level programming language for general-purpose programming.
  4. About this presentation: I am skipping the theory, which is important for a basic understanding and which I recommend you learn. This presentation focuses on practical, real-life problems and assumes that you already know one of the two languages.
  5. IDE
      R: RStudio, Visual Studio, Jupyter, Eclipse
      Python: Spyder, Jupyter, PyCharm, Rodeo, Visual Studio, Eclipse
      Key notes: RStudio is the clear winner for R and one of the best IDEs for any language. Jupyter is very innovative, runs in the browser, and suits both languages.
  6. Data Operations: arrays; indexing, slicing, and iterating; reshaping; shallow vs. deep copy; broadcasting; advanced indexing; matrices; matrix decompositions
      R: R is designed specifically for statistical computing, so it supports all of these natively.
      Python: NumPy + SciPy
  7. Data Manipulation: reading data; selecting columns and rows; filtering; vectorized string operations; missing values; handling time; time series
      R: supported natively, but a few packages are excellent: plyr, data.table, dplyr
      Python: pandas (extraordinary), built on top of NumPy
  8. Data Visualization: creating a graph; histograms and density plots; dot plots; bar plots; line charts; pie charts; boxplots; scatterplots; and many more
      R: supported natively, although base graphics are classical; the most used packages are ggplot2 (the best), lattice, ggvis, GGally
      Python: Matplotlib, Bokeh, Google Visualization, D3
      Key notes: Most of these libraries are available in both languages, especially those that render in the browser. Still, R is generally preferred for data visualization.
  9. Machine Learning: feature extraction; classification; regression; clustering; dimension reduction; model selection; decision making; deep learning; and a lot of mathematics
      R: caret, mlr, XGBoost, gbm, H2O, CNTK
      Python: TensorFlow, scikit-learn, Keras, Theano, CNTK
      Key notes: TensorFlow, a Google product, natively supports a Python-style syntax but also supports R. It is the best for machine learning and deep learning: a winner without a second thought.
  10. Text Mining: natural language processing; recommendation engines; talking robots; web scraping; chatbots; or anything that deals with language. Next level: computer vision
      R: tidytext, openNLP, tm, RWeka
      Python: NLTK, TextBlob, textmining, spaCy
      Key notes: NLTK is the best for understanding all the basics. Most of the big players offer ready-to-use, API-based services.
  11. Install and use libraries/packages, and working directories
      R:
        install.packages('[package name]')    install a package (in RStudio)
        library([package name])               load the full package
        dplyr::select                         use one function from a package without loading all of it
        Working directory: getwd() returns it; setwd('C:/file/path') changes it
      Python:
        python -m pip install [package-name]  install a package (in the Windows CMD)
        import pandas as pd                   import the full package
        from textblob.classifiers import NaiveBayesClassifier    import selectively
        Working directory: import os, then os.path.abspath('') returns it; cd C:\File\Path changes it from the command prompt (os.chdir does the same inside Python)
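The working-directory calls on the slide above can be tried safely with only the standard library. This is a minimal sketch, not the author's code: the helper name `working_directory_roundtrip` is made up for illustration, and a temporary directory stands in for a real project path.

```python
import os
import tempfile


def working_directory_roundtrip():
    """Move into a temporary directory and back; return both locations."""
    original = os.getcwd()                  # comparable to R's getwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)                       # comparable to R's setwd(tmp)
        inside = os.getcwd()
        os.chdir(original)                  # go back before tmp is deleted
    return original, inside


original, inside = working_directory_roundtrip()
print("started in:", original)
print("moved to:  ", inside)
# os.path.abspath('') resolves the empty path against the current directory.
print("abspath(''):", os.path.abspath(''))
```

Restoring the original directory before the temporary one is removed keeps the process from ending up in a directory that no longer exists.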
  12. Assignment and getting help
      R:
        one <- 1                  '<-' is recommended for assignment, but '=' also works
        ?mean                     help for a function, here mean from base R
        a 'like'-style keyword search for 'weighted mean', e.g. help.search('weighted mean')
        help(package = 'dplyr')   help for a package
        ??mean                    global search
      Python:
        one = 1                   '=' is the assignment operator
        help(len)                 help on the built-in function len
        dir(len)                  list the attributes of len
        len?                      help on len (IPython and Jupyter only)
        help(pd.unique)           help for a package function; here pd is pandas
  13. Vector, List and Array
      R:
        Vector
          > c(1,2,3,4,5)
          [1] 1 2 3 4 5
          > c('a','b', 'c','d')
          [1] "a" "b" "c" "d"
        Multidimensional
          > two <- list(c(1.5,2,3), c(4,5,6))
          > two
          [[1]]
          [1] 1.5 2.0 3.0
          [[2]]
          [1] 4 5 6
      Python:
        import numpy as np
        Array
          np.array([1,2,3,4,5])
          np.array(['a','b','c','d'])
        List
          [1,2,3,4,5]
          ['a','b','c','d']
        Multidimensional
          two = np.array([(1.5,2,3), (4,5,6)], dtype = float)
          array([[1.5, 2. , 3. ],
                 [4. , 5. , 6. ]])
  14. Multidimensional list | Subsetting
      R:
        > two[1]
        [[1]]
        [1] 1.5 2.0 3.0
        > two[[2:3]]
        [1] 6
        > lapply(two, `[[`, 1)
        [[1]]
        [1] 1.5
        [[2]]
        [1] 4
      Python:
        two[1]          gives array([4., 5., 6.])
        two[1,2]        gives 6.0
        two[0:2,0:1]    gives array([[1.5], [4. ]])
      Must remember: R indexing starts at 1 and Python indexing starts at 0.
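The subsetting expressions above can be checked in one short script. This is a sketch that assumes NumPy is installed and reuses the `two` array from the previous slide. The R equivalents in the comments are approximate, since the R `two` is a list of vectors rather than a matrix.

```python
import numpy as np

# Same 2x3 array as on the "Vector, List and Array" slide.
two = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

second_row = two[1]               # roughly R's two[[2]]
element = two[1, 2]               # roughly R's two[[2:3]], i.e. two[[2]][[3]]
first_column = two[0:2, 0:1]      # rows 0-1, column 0; the stop index is exclusive
firsts = [row[0] for row in two]  # roughly R's lapply(two, `[[`, 1)

print(second_row)
print(element)
print(first_column)
print(firsts)
```

Because slice stop indices are exclusive in Python, `0:2` selects two rows, whereas R's `1:2` includes both endpoints.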
  15. Data Frame, the most used structure
      R:
        > name <- c('vineet', 'jaiswal')
        > id <- c(28,30)
        > df = data.frame(id, name)
        > df
          id    name
        1 28  vineet
        2 30 jaiswal
      Python:
        From a list
          df = pd.DataFrame([[4, 7], [5, 8]], index=[1, 2], columns=['a', 'b'])
        From a dictionary
          data = {'Name':['Vineet', 'Jaiswal'],'Id':[28,30]}
          df = pd.DataFrame(data)
      pandas also supports multi-index DataFrames. I have not found a direct equivalent in R, although workarounds are available.
  16. Common commands and missing values
      R: summary(df), str(df), dim(df), head(df), tail(df), length(df)
        Missing values are represented as NA. is.na(x) tests for NA; is.null(x) tests for NULL.
      Python: df.describe(), df.head(), df.tail(), df.shape, len(df)
        Missing values are represented as NaN. math.isnan(x) tests a single value; df.isnull().any() reports, per column, whether a data frame contains nulls.
      Note: length(df) in R counts columns, whereas len(df) in pandas counts rows.
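The NaN check mentioned above behaves differently from an ordinary comparison, which a small standard-library sketch can show. The sample values are made up for illustration.

```python
import math

# Hypothetical measurements; the middle one is missing.
values = [28.0, float("nan"), 30.0]

# NaN is never equal to anything, including itself, so '==' cannot detect it.
nan_equals_itself = values[1] == values[1]

# math.isnan is the reliable test for a single float.
missing_flags = [math.isnan(v) for v in values]

# Drop missing values before summarising, similar to na.rm=TRUE in R.
clean = [v for v in values if not math.isnan(v)]
mean_clean = sum(clean) / len(clean)

print(nan_equals_itself)
print(missing_flags)
print(mean_clean)
```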
  17. Querying, Filtering and Sampling
      R  =>  Python (pandas)
      slice(df, 1:10)  =>  df.iloc[:10]
      filter(df, col1 == 1, col2 == 1)  =>  df.query('col1 == 1 & col2 == 1')
      df[df$col1 == 1 & df$col2 == 1,]  =>  df[(df.col1 == 1) & (df.col2 == 1)]
      select(df, col1, col2)  =>  df[['col1', 'col2']]
      select(df, col1:col3)  =>  df.loc[:, 'col1':'col3']
      select(df, -(col1:col3))  =>  df.drop(cols_to_drop, axis=1), where cols_to_drop lists the column names
      distinct(select(df, col1))  =>  df[['col1']].drop_duplicates()
      distinct(select(df, col1, col2))  =>  df[['col1', 'col2']].drop_duplicates()
      sample_n(df, 10)  =>  df.sample(n=10)
      sample_frac(df, 0.01)  =>  df.sample(frac=0.01)
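A few of the pandas expressions above, run on a tiny frame. This is a sketch that assumes pandas is installed. The data and the `random_state` seed are made up for illustration; the column names follow the slide.

```python
import pandas as pd

# Hypothetical data using the slide's column names.
df = pd.DataFrame({
    "col1": [1, 1, 2, 1, 3],
    "col2": [1, 2, 1, 1, 1],
    "col3": [10, 20, 30, 40, 50],
})

first_two = df.iloc[:2]                          # like slice(df, 1:2)
by_query = df.query("col1 == 1 & col2 == 1")     # like filter(df, col1 == 1, col2 == 1)
by_mask = df[(df.col1 == 1) & (df.col2 == 1)]    # boolean-mask form of the same filter
col_range = df.loc[:, "col1":"col2"]             # label slices include both ends
without_col3 = df.drop(["col3"], axis=1)         # like select(df, -col3)
unique_pairs = df[["col1", "col2"]].drop_duplicates()
sampled = df.sample(n=2, random_state=0)         # seed fixed only for repeatability

print(by_query)
print(unique_pairs)
```

`df.iloc[:n]` returns the first n rows because the stop position is exclusive, while the label-based `df.loc` slice includes its end label.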
  18. Sorting, Transforming, Grouping and Summarizing
      R  =>  Python (pandas)
      arrange(df, col1, col2)  =>  df.sort_values(['col1', 'col2'])
      arrange(df, desc(col1))  =>  df.sort_values('col1', ascending=False)
      select(df, col_one = col1)  =>  df.rename(columns={'col1': 'col_one'})['col_one']
      rename(df, col_one = col1)  =>  df.rename(columns={'col1': 'col_one'})
      mutate(df, c=a-b)  =>  df.assign(c=df.a-df.b)
      gdf <- group_by(df, col1)  =>  gdf = df.groupby('col1')
      summarise(gdf, avg=mean(col1, na.rm=TRUE))  =>  df.groupby('col1').agg({'col1': 'mean'})
      summarise(gdf, total=sum(col1))  =>  df.groupby('col1').sum()
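The sort, mutate, and group-and-summarise steps above can be run end to end on a small frame. This is a sketch that assumes pandas is installed. The `group` column and its values are made up for illustration; `a`, `b`, and `c` follow the slide's mutate example.

```python
import pandas as pd

# Hypothetical data: two groups with two rows each.
df = pd.DataFrame({
    "group": ["x", "y", "x", "y"],
    "a": [10, 20, 30, 40],
    "b": [1, 2, 3, 4],
})

sorted_df = df.sort_values("a", ascending=False)   # like arrange(df, desc(a))
renamed = df.rename(columns={"a": "a_renamed"})    # like rename(df, a_renamed = a)
with_c = df.assign(c=df.a - df.b)                  # like mutate(df, c = a - b)

grouped = with_c.groupby("group")                  # like group_by(df, group)
avg = grouped.agg({"c": "mean"})                   # like summarise(gdf, avg = mean(c))
total = grouped["c"].sum()                         # like summarise(gdf, total = sum(c))

print(avg)
print(total)
```

None of these calls modify `df` in place; each returns a new object, which is why the results are assigned to new names.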
  19. Apply and big operations
      R: Apply a function over a matrix or list with the apply family: apply(), lapply(), sapply(), vapply(), mapply(), rapply(), and tapply(). The most popular way to perform bigger operations on data, in an SQL-like syntax, is piping:
        iris %>% group_by(Species) %>% summarise(avg = mean(Sepal.Width)) %>% arrange(avg)
      Python: As Python is a general-purpose programming language, you can perform any kind of collection or generic operation, but most of the time we use lambda, map, reduce, filter, and list comprehensions.
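The Python tools named above can be compared on one small list. This is a minimal standard-library sketch; the numbers are made up for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# map applies a function to every element, much like sapply() in R.
squares = list(map(lambda x: x * x, numbers))

# filter keeps the elements for which the function returns True.
evens = list(filter(lambda x: x % 2 == 0, numbers))

# reduce folds the list into a single value, here a running sum.
total = reduce(lambda acc, x: acc + x, numbers, 0)

# A list comprehension combines the filter and map steps in one expression.
even_squares = [x * x for x in numbers if x % 2 == 0]

print(squares)
print(evens)
print(total)
print(even_squares)
```

In everyday code, the comprehension is usually preferred over chaining `map` and `filter` with lambdas, because it reads closer to the description of the result.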
  20. It is advisable to learn both R and Python.
  21. All images, external links, etc. are taken from the internet for non-business purposes to help explain the concepts; all credit goes to their respective owners. All content reflects my best knowledge. Please validate it before use and let me know about any errors or feedback.
  22. Thanks! Contact me: Vineet Jaiswal walvineet/