This document provides an overview of a presentation on advanced analytics, big data, and being a data scientist. The presentation agenda includes an introduction to data science, why the presenter became a data scientist, definitions of data science, data science skillsets, the data science process for one-off projects versus production pipelines, various data science tools, and a question and answer section. The document outlines each section in detail with examples.
This document provides an overview of data science including:
- Definitions of data science and the motivations for its increasing importance due to factors like big data, cloud computing, and the internet of things.
- The key skills required of data scientists and an overview of the data science process.
- Descriptions of different types of databases like relational, NoSQL, and data warehouses versus data lakes.
- An introduction to machine learning, data mining, and data visualization.
- Details on courses for learning data science.
This document provides an introduction to data science. It discusses why data science is important and covers key techniques like statistics, data mining, and visualization. It also reviews popular tools and platforms for data science like R, Hadoop, and real-time systems. Finally, it discusses how data science can be applied across different business domains such as financial services, telecom, retail, and healthcare.
Big Data [sorry] & Data Science: What Does a Data Scientist Do? - Data Science London
What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process look like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure London 25/01/13
This document provides an introduction to data science and analytics. It discusses why data science jobs are in high demand, what skills are needed for these roles, and common types of analytics including descriptive, predictive, and prescriptive. It also covers topics like machine learning, big data, structured vs unstructured data, and examples of companies that utilize data and analytics like Amazon and Facebook. The document is intended to explain key concepts in data science and why attending a talk on this topic would be beneficial.
Data Scientist has been regarded as the sexiest job of the twenty-first century. As data in every industry keeps growing, the need to organize, explore, analyze, predict and summarize is insatiable. Data Science is creating new paradigms in data-driven business decisions. As the field emerges out of its infancy, a wide range of skill sets are becoming an integral part of being a Data Scientist. In this talk I will discuss the different data-driven roles and the expertise required to be successful in them. I will highlight some of the unique challenges and rewards of working in a young and dynamic field.
Session 01 designing and scoping a data science project - bodaceacat
This document provides an overview of the first session in a data science training series. It discusses designing and scoping a data science project. Key points include: defining data science and the data science process; describing the roles of problem owners and competitors; reviewing examples of data science competitions from Kaggle, DrivenData, and DataKind; and providing guidance on writing an effective problem statement by specifying the context, needs, vision, and intended outcomes of a project. The document also briefly covers data science ethics considerations like ensuring privacy and minimizing risks. Exercises are included to help participants practice asking interesting questions, identifying relevant data sources, and designing communications for target audiences.
Introduction to Data Science (Data Summit, 2017) - Caserta
This document summarizes an introduction to data science presentation by Joe Caserta and Bill Walrond of Caserta Concepts. Caserta Concepts is an internationally recognized data innovation and engineering consulting firm. The agenda covers why data science is important, challenges of working with big data, governing big data, the data pyramid, what data scientists do, standards for data science, and a demonstration of data analysis. Popular machine learning algorithms like regression, decision trees, k-means clustering and collaborative filtering are also discussed.
Data science vs. Data scientist by Jothi Periasamy - Peter Kua
This document discusses data science vs data scientists and outlines key competencies for data scientists. It defines data science as modernizing existing analytics and data solutions using new data sources, formats, architectures, and techniques. The document compares traditional and modern approaches to data and analytics. It also discusses the skills required of entry-level vs senior data scientists, noting that enterprise data scientists require strong industry and business process skills while focusing on data, analytics, communication and technical abilities. The document provides an overview of the roles, responsibilities and deliverables of data scientists on enterprise projects.
A brief introduction to data science and machine learning, with an emphasis on application scenarios, from the traditional to the most innovative. The overview covers the basic definition of data science, an overview of machine learning, and examples of traditional scenarios, recommender systems and social network analysis, IoT, and deep learning.
Presentation at Data ScienceTech Institute campuses, Paris and Nice, May 2016, including Intro, Data Science History and Terms; 10 Real-World Data Science Lessons; Data Science Now: Polls & Trends; Data Science Roles; Data Science Job Trends; and Data Science Future
A Practical-ish Introduction to Data Science - Mark West
In this talk I will share insights and knowledge that I have gained from building up a Data Science department from scratch. This talk will be split into three sections:
1. I'll begin by defining what Data Science is, how it is related to Machine Learning and share some tips for introducing Data Science to your organisation.
2. Next up we'll run through some commonly used Machine Learning algorithms used by Data Scientists, along with examples of use cases where these algorithms can be applied.
3. The final third of the talk will be a demonstration of how you can quickly get started with Data Science and Machine Learning using Python and the Open Source scikit-learn Library.
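As a hedged illustration of that third section, here is a minimal scikit-learn getting-started sketch; the dataset and classifier choice are assumptions for illustration and are not taken from the talk's own demo.

```python
# Minimal scikit-learn sketch: train and evaluate a decision tree classifier.
# The iris dataset and DecisionTreeClassifier are illustrative choices only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))
```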
The document outlines the typical lifecycle of a data science project, including business requirements, data acquisition, data preparation, hypothesis and modeling, evaluation and interpretation, and deployment. It discusses collecting data from various sources, cleaning and integrating data in the preparation stage, selecting and engineering features, building and validating models, and ultimately deploying results.
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science - Mark West
This document provides an introduction to data science. It begins with defining data science and its interdisciplinary nature, drawing from fields like computer science, mathematics, statistics, and domain-specific knowledge. It then discusses machine learning as a tool in data science and provides examples of common machine learning algorithms like linear regression, decision trees, and k-means clustering. It also outlines different roles required for data science projects. The document aims to give a practical overview of key concepts in data science.
This document outlines a data science enablement roadmap created by the Advanced Center of Excellence at Modern Renaissance Corporation. The roadmap consists of 1 introductory course and 3 advanced courses that can earn a student a master's level certificate in data science. The introductory course provides a broad overview of topics like algorithms, statistics, machine learning, and big data platforms. The advanced courses focus on specific skills like machine learning with R, modern data platforms using Hadoop, and advanced big data analytics techniques. The goal is to give students a versatile, practical skill set for a career in data science or big data engineering.
This document provides an overview of the data science process and tools for a data science project. It discusses identifying important business questions to answer with data, extracting relevant data from sources, cleaning and sampling the data, analyzing samples to create models and check hypotheses, applying results to full data sets, visualizing findings, automating and deploying solutions, and continuously learning and improving through an iterative process. Key tools mentioned include Hadoop, R, Python, Excel, and various data wrangling, analysis, and visualization tools.
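A minimal sketch of the clean-and-sample steps described above, using pandas; the inline records and column names are hypothetical stand-ins for data extracted from a real source.

```python
# Hypothetical extract / clean / sample step with pandas. The inline records
# stand in for data pulled from a real source (database, CSV, API).
import pandas as pd

raw = pd.DataFrame([
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 20.0},    # duplicate row
    {"customer_id": 2, "amount": None},    # missing value
    {"customer_id": 3, "amount": 75.5},
    {"customer_id": 4, "amount": 12.0},
])

# Clean: drop exact duplicates and rows missing the fields we model on.
clean = raw.drop_duplicates().dropna(subset=["amount"])

# Sample: analyze a subset first, then apply results to the full data set.
sample = clean.sample(frac=0.5, random_state=7)
print(len(raw), "raw rows ->", len(clean), "clean rows ->", len(sample), "sampled rows")
```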
This document discusses the rise of big data and data science. It notes that while data volumes are growing exponentially, data alone is just an asset - it is data scientists that create value by building data products that provide insights. The document outlines the data science workflow and highlights both the tools used and challenges faced by data scientists in extracting value from big data.
An introduction to data science: from the very beginning of the data science idea to the latest designs, changing trends, and the technologies behind them, through to applications that are already in real-world use today.
Big Data and Data Science: The Technologies Shaping Our Lives - Rukshan Batuwita
Big Data and Data Science have become increasingly important areas in both industry and academia, to the extent that every company wants to hire a Data Scientist and every university wants to start dedicated degree programs and centres of excellence in Data Science. Big Data and Data Science have led to technologies that have already shaped different aspects of our lives such as learning, working, travelling, purchasing, social relationships, entertainment, physical activities, and medical treatments. This talk will attempt to cover the landscape of some of the important topics in these exponentially growing areas of Data Science and Big Data, including the state-of-the-art processes, commercial and open-source platforms, data processing and analytics algorithms (especially large-scale Machine Learning), application areas in academia and industry, the best industry practices, business challenges, and what it takes to become a Data Scientist.
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne... - Edureka!
These Edureka Data Science course slides will take you through the basics of Data Science - why Data Science, what is Data Science, use cases, BI vs Data Science, Data Science tools and the Data Science lifecycle process. This is ideal for beginners to get started with learning data science.
You can read the blog here: https://goo.gl/OoDCxz
You can also take a complete structured training, check out the details here: https://goo.gl/AfxwBc
Here are some key points to consider when designing visuals:
- Who is your audience? What information do they need?
- What insights or messages do you want to convey?
- Consider different visualisation types and choose those best suited to your data and goals
- Use visual hierarchy, layout and formatting to guide the eye and message
- Iteratively sketch, test and refine your designs with your intended users
- Balance simplicity and clarity with including all necessary information
The design process is iterative. Start broadly and refine based on testing with intended users. Focus on conveying the most important insights as simply as possible.
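As one small, hedged example of "conveying the most important insights as simply as possible": a chart can state its message in the title, order the data so the eye lands on the key comparison, and strip non-essential decoration. The data and wording below are invented for illustration.

```python
# Illustrative only: a bar chart whose title states the insight directly,
# sorted so the largest category is read first. Data is made up.
import matplotlib.pyplot as plt

regions = ["North", "South", "East", "West"]
revenue = [42, 71, 55, 63]

pairs = sorted(zip(revenue, regions), reverse=True)
values, labels = zip(*pairs)

fig, ax = plt.subplots()
ax.barh(labels, values)
ax.invert_yaxis()  # largest bar on top
ax.set_title("South leads revenue; North trails by a wide margin")
ax.set_xlabel("Revenue (EUR thousands)")
for side in ("top", "right"):          # remove chart junk
    ax.spines[side].set_visible(False)
plt.tight_layout()
plt.show()
```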
The document describes a 10 module data science course covering topics such as introduction to data science, machine learning techniques using R, Hadoop architecture, and Mahout algorithms. The course includes live online classes, recorded lectures, quizzes, projects, and a certificate. Each module covers specific data science topics and techniques. The document provides details on the course content, objectives, and topics covered in module 1 which includes an introduction to data science, its components, use cases, and how to integrate R and Hadoop. Examples of data science applications in various domains like healthcare, retail, and social media are also presented.
This document discusses the need for a new paradigm in big data analytics using algorithms. It begins by describing the limitations of traditional analytics approaches like statistical analysis, data mining, visualization and business intelligence tools when applied to big data. These approaches are query-based and labor intensive. Emerging big data tools like Hadoop and in-memory databases help with storage and queries but do not provide automated insights. The document argues that the new paradigm should focus on algorithms that can automatically surface insights from data in seconds, replacing the need for data analysts to manually query databases. This represents a shift from humans digging for insights to algorithms surfacing insights for humans to evaluate.
My presentation on Data Mining, Lessons from Competitions, and Public Data looks at the Data Mining/Data Science/Big Data evolution, reviews lessons from KDD Cup 1997, Netflix Prize, and Kaggle, presents a big list of Public and Government data APIs, Marketplaces, Portals, and Platforms, and examines Big Data Hype. This talk was given at BPDM-2013, (Broadening Participation in Data Mining), Aug 10, 2013 held at KDD-2013, Chicago.
This document provides an overview of data science and its applications. It discusses:
1) Industries that are being disrupted by data science like telecom, banking, retail, and healthcare.
2) How companies like Amazon, Netflix, and Google were able to disrupt their industries through their ability to analyze patterns in data faster than competitors.
3) The factors driving more companies to adopt data science including competitive advantages, revenue growth, and cost optimization.
A look back at how the practice of data science has evolved over the years, modern trends, and where it might be headed in the future. Starting from before anyone had the title "data scientist" on their resume, to the dawn of the cloud and big data, and the new tools and companies trying to push the state of the art forward. Finally, some wild speculation on where data science might be headed.
Presentation given to Seattle Data Science Meetup on Friday July 24th 2015.
Introduction to Big Data and its Trends - Jongwook Woo
Big Data has been popular for the last 10 years, using Hadoop and Spark for data analysis and prediction on large-scale data sets in distributed parallel computing systems. Its platform has expanded to include NoSQL databases and search engines, and has grown more popular alongside cloud computing. Deep Learning has become a buzzword over the past several years, built on GPUs and Big Data; it lets even small companies and labs own supercomputers on a small budget, a "dream come true" for IT and business. In this talk, the history and trends of Big Data and AI platforms are introduced and Big Data predictive analysis is presented.
This presentation is prepared by one of our renowned tutors, "Suraj".
If you are interested in learning more about Big Data, Hadoop, or Data Science, join our free Introduction class on 14 Jan at 11 AM GMT. To register your interest, email us at info@uplatz.com
This document provides an overview of data science including what is big data and data science, applications of data science, and system infrastructure. It then discusses recommendation systems in more detail, describing them as systems that predict user preferences for items. A case study on recommendation systems follows, outlining collaborative filtering and content-based recommendation algorithms, and diving deeper into collaborative filtering approaches of user-based and item-based filtering. Challenges with collaborative filtering are also noted.
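A hedged sketch of the item-based collaborative filtering idea mentioned above: score an unseen item for a user by averaging the user's existing ratings, weighted by cosine similarity between item rating vectors. The tiny ratings matrix is invented for illustration.

```python
# Item-based collaborative filtering sketch with cosine similarity.
# The ratings matrix is a toy example; 0 means "not rated".
import numpy as np

ratings = np.array([            # rows = users, columns = items
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

n_items = ratings.shape[1]
item_sim = np.array([[cosine_sim(ratings[:, i], ratings[:, j])
                      for j in range(n_items)] for i in range(n_items)])

def predict(user, item):
    """Weighted average of the user's ratings, weighted by item similarity."""
    rated = ratings[user] > 0
    weights = item_sim[item][rated]
    if weights.sum() == 0:
        return 0.0
    return ratings[user][rated] @ weights / weights.sum()

print("Predicted rating of item 2 for user 0:", round(predict(0, 2), 2))
```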
The document summarizes key trends from the 2015 Internet Trends report by Mary Meeker. It outlines that while global internet and smartphone user growth is still solid, the growth rate is slowing as adoption increases. It also notes that incremental users will be harder to obtain as adoption depends more on developing markets. Internet usage and engagement growth remains strong, especially for mobile video. Mobile advertising is growing faster than desktop but still lags in share of total internet advertising spending. The document also highlights new advertising formats and payment options optimized for mobile usage as well as the rise of vertical video viewing. Finally, it discusses how enterprise technology startups are reimagining business processes by addressing prior pain points in areas like communications, payments, analytics and
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14... - Lucas Jellema
This presentation gives a brief overview of the history of relational databases, ACID and SQL, and presents some of their key strengths and potential weaknesses. It introduces the rise of NoSQL - why it arose, what it entails, when to use it. The presentation focuses on MongoDB as a prime example of a NoSQL document store and shows how to interact with MongoDB from JavaScript (NodeJS) and Java.
Our data should be secure, and our environment too. What can we do to maximize security in a hybrid environment, where SQL Server exists in two forms, on-premises and cloud? How do we organize our work and control our data if we use Windows Azure SQL Database, the cloud database? Physical security, policy-based management, auditing, encryption, federation, access and authorization: all of those subjects will be covered during my session.
MongoDB NoSQL database a deep dive - MyWhitePaper - Rajesh Kumar
This document provides an overview of MongoDB, a popular NoSQL database. It discusses why NoSQL databases were created, the different types of NoSQL databases, and focuses on MongoDB. MongoDB is a document-oriented database that stores data in JSON-like documents with dynamic schemas. It provides horizontal scaling, high performance, and flexible data models. The presentation covers MongoDB concepts like databases, collections, documents, CRUD operations, indexing, sharding, replication, and use cases. It provides examples of modeling data in MongoDB and considerations for data and schema design.
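The CRUD operations summarized above can be sketched with Python's pymongo driver; the connection URI, database name, and document fields are assumptions, and a local MongoDB instance is presumed to be running.

```python
# Minimal pymongo CRUD sketch. Connection URI, database name, and document
# shape are illustrative assumptions; requires a running MongoDB instance.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]
products = db["products"]

# Create
products.insert_one({"name": "notebook", "price": 3.50, "tags": ["paper", "office"]})

# Read
doc = products.find_one({"name": "notebook"})
print("Found:", doc)

# Update
products.update_one({"name": "notebook"}, {"$set": {"price": 2.99}})

# Index a frequently filtered field to speed up queries
products.create_index("name")

# Delete
products.delete_one({"name": "notebook"})

client.close()
```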
The document discusses various options for funding an independent film, including competitions, crowd funding, bank loans, funding from organizations like the British Film Institute, deferrals, and conclusions. It analyzes the advantages and disadvantages of each approach. The author concludes that crowd funding would be the most suitable option for their film, as it allows gathering an audience and investment from interested supporters while maintaining creative control, though advertising costs must also be considered.
2017 iosco research report on financial technologies (fintech) - Ian Beckett
This document provides an overview of financial technologies (Fintech) and their intersection with securities markets regulation. It examines alternative financing platforms, retail trading/investment platforms, institutional trading platforms, and distributed ledger technologies. The report finds that Fintech is transforming traditional financial services through new business models and technologies. This raises regulatory questions around benefits/risks and implications for investor protection, market integrity, and stability. The document incorporates survey responses from global regulators on their experiences with Fintech.
Cloud computing gives you a number of advantages, such as the ability to scale your web application or website on demand. If you have a new web application and want to use cloud computing, you might be asking yourself, "Where do I start?" Join us in this session to understand best practices for scaling your resources from zero to millions of users. We show you how to best combine different AWS services, how to make smarter decisions for architecting your application, and how to scale your infrastructure in the cloud.
This document summarizes a legal research paper about regulating corporate venture capital (CVC). It finds that CVC has grown dramatically since 2008 and now plays an important role in startup financing and the rise of "unicorns" (private companies valued over $1 billion). However, CVC faces little regulation. The paper aims to address this by analyzing the legal implications of CVC in two areas: securities regulation and conflicts of interest. It examines case studies of several prominent CVC firms like GV and Intel Capital to understand current disclosure practices and argues more transparency is needed given CVC's influence on private markets and company boards.
The report offers marketers 25 open, click-through, list churn and mobile metrics to help you see where you rank, delivers more visuals so you can better understand the data, and shares more observations to help you improve your marketing programs.
This document discusses databases, including the definition of a database, types of databases, and the differences between relational and non-relational (NoSQL) databases. A database is described as a collection of information stored systematically so that information can be retrieved, while a relational database stores data in interrelated tables and NoSQL simplifies the database process by eliminating data redundancy.
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION - Elvis Muyanja
Today, data science is enabling companies, governments, research centres and other organisations to turn their volumes of big data into valuable and actionable insights. It is important to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. According to the McKinsey Global Institute, the U.S. alone could face a shortage of about 190,000 data scientists and 1.5 million managers and analysts who can understand and make decisions using big data by 2018. In coming years, data scientists will be vital to all sectors —from law and medicine to media and nonprofits. Has the African continent planned to train the next generation of data scientists required on the continent?
Europa AI startup scaleups report 2016 - Ian Beckett
- €1.8 billion was invested across 306 deals in 22 European countries in 2016 for artificial intelligence and data analytics startups. The UK received the most funding and had the most deals.
- Common business models included content-driven platforms and marketplaces. Most companies pursued B2B models.
- Advertising/marketing was the top industry for investment, followed by fintech and business intelligence. London, Paris, and Berlin were leading cities.
Meetup sthlm - introduction to Machine Learning with demo cases - Zenodia Charpy
This document provides an agenda and overview of topics related to data science and machine learning. It discusses data science processes including data preparation, algorithm selection, model deployment, and performance measurement. It also distinguishes machine learning from artificial intelligence and describes common machine learning approaches such as supervised and unsupervised learning. Examples of supervised and unsupervised learning applications are presented along with generic workflows. Machine learning algorithm selection and example cases are also summarized.
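As a small, hedged example of the unsupervised side mentioned above, here is a k-means clustering sketch with scikit-learn; the synthetic blob data is generated purely for illustration.

```python
# Unsupervised learning sketch: k-means clustering on synthetic 2-D data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster centres:\n", kmeans.cluster_centers_)
print("First ten cluster assignments:", labels[:10])
```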
The document provides a description of data scientist positions at three levels - Data Scientist I, II, and III. It outlines the general characteristics and responsibilities expected for each level, with level III involving the most complex work, responsibilities for leading projects, and experience/education qualifications. Key responsibilities include data analysis, modeling, collaborating with stakeholders, and communicating results.
Data Science - An emerging Stream of Science with its Spreading Reach & Impact - Dr. Sunil Kr. Pandey
This is my presentation on the topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected different statistics and data from different sources. This may be useful for students and those who might be interested in this field of study.
TechWise with Eric Kavanagh, Dr. Robin Bloor and Dr. Kirk Borne
Live Webcast on July 23, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=59d50a520542ee7ed00a0c38e8319b54
Analytical applications are everywhere these days, and for good reason. Organizations large and small are using analytics to better understand any aspect of their business: customers, processes, behaviors, even competitors. There are several critical success factors for using analytics effectively: 1) know which kind of apps make sense for your company; 2) figure out which data sets you can use, both internal and external; 3) determine optimal roles and responsibilities for your team; 4) identify where you need help, either by hiring new employees or using consultants; 5) manage your program effectively over time.
Register for this episode of TechWise to learn from two of the most experienced analysts in the business: Dr. Robin Bloor, Chief Analyst of The Bloor Group, and Dr. Kirk Borne, Data Scientist, George Mason University. Each will provide their perspective on how companies can address each of the key success factors in building, refining and using analytics to improve their business. There will then be an extensive Q&A session in which attendees can ask detailed questions of our experts and get answers in real time. Registrants will also receive a consolidated deck of slides, not just from the main presenters, but also from a variety of software vendors who provide targeted solutions.
Visit InsideAnalysis.com for more information.
3. Josep Curto - Turning your organization into a Big Data company - antishmanti
The document discusses transforming an organization into a Big Data company. It outlines the challenges of digital disruption and how companies like Amazon, Apple, Google and Netflix understand customers through their digital footprints. It then discusses six challenges of Big Data including data capture, storage, analysis, visualization, IT dependence, and creating a new culture. The remainder of the document focuses on business models for Big Data and implementing Big Data strategies and projects within an organization.
The document outlines an agenda for a presentation on big data. It discusses key topics like the state of big data adoption, a holistic approach to big data, five high value use cases, technical components, and the future of big data and cloud. The presentation aims to provide an overview of big data and how organizations can take a comprehensive approach to leveraging their data assets.
Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN) - Denodo
Watch full webinar here: https://bit.ly/3uqcAN0
Self-service is a major goal of modern data strategists. A successfully implemented self-service initiative means that business users have access to holistic and consistent views of data regardless of its location, source or type. As data unification and data collaboration become key critical success factors for organizations, data catalogs play a key role as the perfect companion for a virtual layer to fully empower those self-service initiatives and build a self-service data marketplace requiring minimal IT intervention.
Denodo’s Data Catalog is a key piece in Denodo’s portfolio to bridge the gap between the technical data infrastructure and business users. It provides documentation, search, governance and collaboration capabilities, and data exploration wizards. It provides business users with the tool to generate their own insights with proper security, governance, and guardrails.
In this session we will cover:
- The role of a virtual semantic layer in self-service initiatives
- Key ingredients of a successful self-service data marketplace
- Self-service (consumption) vs. inventory catalogs
- Best practices and advanced tips for successful deployment
- A Demonstration: Product Demo
- Examples of customers using Denodo’s Data Catalog to enable self-service initiatives
Data mining involves extracting useful patterns from large amounts of data. It involves defining a problem, preparing data, exploring data, building models, and deploying models. Some common applications of data mining include analyzing customer purchasing patterns, detecting fraud, predicting disease outbreaks, and analyzing financial/business data. While data warehousing provides insights into past trends, data mining can discover hidden patterns to predict future trends and behaviors from data.
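One hedged sketch of the "detecting fraud" use case above: unsupervised anomaly detection with scikit-learn's IsolationForest. The transaction amounts are synthetic; a real project would use richer features.

```python
# Anomaly detection sketch for a fraud-like use case with IsolationForest.
# The transaction data is synthetic and one-dimensional for clarity.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=50, scale=10, size=(500, 1))      # typical amounts
suspicious = rng.normal(loc=500, scale=50, size=(5, 1))   # unusually large amounts
amounts = np.vstack([normal, suspicious])

detector = IsolationForest(contamination=0.01, random_state=0)
flags = detector.fit_predict(amounts)   # -1 = anomaly, 1 = normal

print("Flagged transaction amounts:", amounts[flags == -1].ravel().round(1))
```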
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed along with resources for learning including podcasts, websites, communities, books, and TV shows.
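A hedged sketch of the model building and validation steps with the libraries named above; the built-in dataset and model choice are illustrative rather than taken from the document.

```python
# Model building and validation sketch: a scikit-learn Pipeline scored with
# k-fold cross-validation. Dataset and model choice are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print("Cross-validated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```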
This talk is an introduction to Data Science. It explains Data Science from two perspectives - as a profession and as a discipline. While covering the benefits of Data Science for business, it explains how to get started with embracing data science in business.
Abstract:
Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
This document summarizes an introductory presentation on data science. It introduces the presenter and their background in data and analytics. The goals of the presentation are to define what a data scientist is, how the field has emerged, and how to become one. It discusses the growing demand and salaries for data scientists. Examples are given of how data science has been applied at companies like LinkedIn and Netflix. The presentation covers big data, Hadoop, data processing techniques, machine learning algorithms, and tools used in data science. Finally, attendees are encouraged to consider Thinkful's data science bootcamp program.
This document discusses big data and data science. It addresses three main points:
1) Big data methods and algorithms can be useful for smaller datasets as well as large ones.
2) Successfully extracting insights from data requires a team with a variety of skills, including business and domain knowledge.
3) For HR in particular, big data can help determine the optimal time to approach potential candidates by analyzing patterns in their job seeking activities online.
Tips and Tricks to be an Effective Data Scientist - Lisa Cohen
Data Science is an evolving field that requires a diverse skill set. From analytical techniques to career advice, this talk is full of practical tips that you can apply immediately to your job.
Data Science and AI in Biomedicine: The World has Changed - Philip Bourne
This document discusses the changing landscape of data science and AI in biomedicine. Some key points:
- We are at a tipping point where data science is becoming a driver of biomedical research rather than just a tool. Biomedical researchers need to become data scientists.
- Data science is interdisciplinary and touches every field due to the rise of digital data. It requires openness, translation of findings, and consideration of responsibilities like algorithmic bias.
- Advances like AlphaFold2 show the power of large collaborative efforts combining data, computing resources, engineering, and domain expertise. This points to the need for public-private partnerships and new models of open data sharing.
- The definition of
DAS Slides: Graph Databases — Practical Use Cases - DATAVERSITY
Graph databases are seeing a spike in popularity as their value in leveraging large data sets for key areas such as fraud detection, marketing, and network optimization become increasingly apparent. With graph databases, it’s been said that ‘the data model and the metadata are the database’. What does this mean in a practical application, and how can this technology be optimized for maximum business value?
This document provides an introduction to data mining and data warehousing. It discusses how the volume of data being collected is growing exponentially in many fields due to advances in data collection technologies. It also describes how data mining can be used to extract useful knowledge and patterns from large datasets to help solve important problems. The document outlines some key techniques in data mining including classification, clustering, and association rule mining. It discusses how data mining draws from fields like machine learning, statistics, and databases to analyze large and complex datasets.
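A hedged, from-scratch sketch of the association rule mining technique listed above: compute support and confidence for simple one-to-one rules over toy transactions (a real project would typically use a library such as mlxtend; the data and thresholds here are invented).

```python
# Tiny association-rule sketch using only the standard library: support and
# confidence for single-item rules over toy market-basket transactions.
from itertools import permutations

transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs", "butter"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

items = sorted({i for t in transactions for i in t})
for a, b in permutations(items, 2):
    supp = support({a, b})
    conf = supp / support({a}) if support({a}) else 0.0
    if supp >= 0.4 and conf >= 0.7:
        print(f"{a} -> {b}: support={supp:.2f}, confidence={conf:.2f}")
```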
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We... - Sri Ambati
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/xc3j20Om3UM
Description:
Data science is indeed one of the sexy jobs of the 21st century. But it is also a lot of hard work. And the hard work is seldom about the math or the algorithms. It is about building relevant machine learning products for the real world. We will go over some of the must-haves as you take your machine learning model out of the sandbox and make it work in the big, bad world outside.
Speaker's Bio:
Krish Swamy is an experienced professional with deep skills in applying analytics and BigData capabilities to challenging business problems and driving customer insights. Krish's analytic experience includes marketing and pricing, credit risk, digital analytics and most recently, big data analytics and data transformation. His key experiences lie in banking and financial services, the digital customer experience domain, with a background in management consulting. Other key skills include influencing organizational change towards a data and analytics driven culture, and building teams of analysts, statisticians and data scientists.
Here is the analysis using Bonferroni's principle:
- Assume equal probability of each color selling (1/6 probability per sale)
- Calculate the probability of observing X or fewer sales for each color
- If the probability is very low (e.g. p < 0.05), then we can say that color is "not selling"
This approach avoids falsely claiming a color is not selling, when the low sales could just be due to chance in a small sample. It provides a rigorous statistical test to help make the decision.
Bonferroni's Principle; Task Example
Red: 2 sales
P(X ≤ 2) = Σ_{k=0..2} C(n, k) (1/6)^k (5/6)^(n-k), where n is the total number of sales observed
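A hedged worked example of the calculation above using scipy's binomial CDF; the total sales count n = 30 is an assumption made only for illustration, since n is not given in the original example.

```python
# Bonferroni-style check: is "Red: 2 sales" surprisingly low, or just chance?
# Assumes each of 6 colors is equally likely per sale (p = 1/6) and a
# hypothetical total of n = 30 sales; n is NOT given in the original example.
from scipy.stats import binom

n, p = 30, 1 / 6
observed = 2  # red's sales

p_value = binom.cdf(observed, n, p)   # P(X <= 2)
print("P(X <= %d) = %.4f" % (observed, p_value))
if p_value < 0.05:
    print("Red's low sales are unlikely to be chance alone.")
else:
    print("Too early to say red is 'not selling'; it could just be chance.")
```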
1) Jordan Engbers is a chief scientist and CTO who has experience in bioinformatics, neuroscience, clinical data science, and founding two data science companies.
2) Data science is a multidisciplinary field that uses techniques from many areas like statistics, computer science, and domain knowledge to understand data and help improve decision making.
3) The impact of data science comes from developing data products - tools that deliver insights from data to drive better decisions. This requires both scientific rigor and software engineering practices.
Introduction to Data Analytics and data analytics life cycle - Dr. Radhey Shyam
The document provides an overview of data analytics and big data concepts. It discusses the characteristics of big data, including the four V's of volume, velocity, variety and veracity. It also describes different types of data like structured, semi-structured and unstructured data. The document then introduces big data platforms and tools like Hadoop, Spark and Cassandra. Finally, it discusses the need for data analytics in business, including enabling better decision making and improving efficiency.
DeepLearning Experiments in Medical Image show case Zenodia Charpy
1. The document discusses assessing and selecting a successful computer vision proof-of-concept (POC) project by defining the problem, ensuring properly annotated data, and evaluating the solvability and value of the problem.
2. It explains the different types of data, labels, and annotations needed for classification, segmentation, object detection, and other computer vision models.
3. The complexity and time required to develop each model is considered, with simpler problems having the shortest timelines.
4. Examples of computer vision models and their inputs and outputs are provided for classification, segmentation, object detection, and image translation tasks.
how to build a Length of Stay model for a ProofOfConcept project - Zenodia Charpy
A walk-through, end to end and in detail, of how a machine learning process works on a healthcare-related model (here I picked the LengthOfStay problem) as a touch point to start the discussion; the scope is set to POC.
This document discusses using neural networks to build models for classifying pneumonia, performing video semantic segmentation, object detection in video, segmenting organs in CT scans, and learning to play games. It describes getting data, preprocessing, choosing a framework, designing the neural network architecture, training and validating the model, and making predictions in a Jupyter notebook demo. The concepts can apply to other scenarios involving images, video, or other input data.
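A hedged, minimal Keras sketch of the design, train, validate, predict loop described above; the random arrays stand in for real image data (such as pneumonia X-rays) and the tiny architecture is purely illustrative.

```python
# Minimal Keras sketch of the train/validate/predict loop. Random arrays
# stand in for real image data; the tiny architecture is illustrative only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x_train = np.random.rand(200, 64, 64, 1).astype("float32")  # fake grayscale images
y_train = np.random.randint(0, 2, size=(200,))               # fake binary labels

model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=2, batch_size=32, validation_split=0.2)
predictions = model.predict(x_train[:5])
print("Example predicted probabilities:", predictions.ravel().round(3))
```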
This document outlines several case-based scenarios for demonstrating data science activities using Azure services. Six cases are described:
1) A playground for citizen data scientists to gain an end-to-end understanding of the data science process using a simple UI.
2) Using SQL databases and services for machine learning tasks when all data resides in SQL.
3) Parallel training of models on multiple datasets to automate and scale the training process.
4) Using GPU-enabled environments for training deep learning models requiring GPU acceleration.
5) Leveraging high-speed data processing services when working with large datasets over 1GB.
6) A basic sandbox environment for data scientists, engineers, and analysts providing pre-
Zenodia TechDays talks Oct 24-25 Stockholm Kistamässan - Zenodia Charpy
TechDays Stockholm presentation:
- understand how AI learns
- train an AI model to identify pneumonia
- deploy an AI model on Azure as a web service with Databricks
- consume the model
- generalize the model to other scenarios (flappy bird, CT scan segmentation, etc.)
A demo on your own dataset (csv, dicom, image, etc.) for each service: how to apply data science in practice with the various Azure machine learning services, and when each service should be used, for which scenarios and datasets. Demoed Azure services include:
- Azure T-SQL in-database analytics
- Azure Batch service for parallel model training on multiple datasets
- Azure Batch AI service for deep learning models with GPU acceleration
- Azure Databricks for deep learning + OpenCV (computer vision tasks) + sklearn (classical machine learning models)
- Azure Data Science Virtual Machine - a sandbox and shared environment for data science experiments
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
2. 1. Introduction to data science – where did it come from
2. Why did I become a data scientist ?
3. Definition of data science
4. Data science skillset map
5. Data science process – one-off vs. production pipeline
6. Data science process breakdown – a bit more detail
7. Various Data Science tools
8. Q&A
Agenda of today
4. Google Trends – what people are searching for
[Trend chart comparing four search terms: (1) cloud computing, (2) virtualization, (3) big data, (4) data science]
Source : https://www.google.com/trends/explore?q=cloud%20computing,virtualization,big%20data,data%20science
7. What people are searching for – top 5 keywords
[Trend chart legend: Cloud computing, Virtualization, Data Science, Big Data]
Source : https://www.google.com/trends/explore?q=cloud%20computing,virtualization,big%20data,data%20science
8. Examples of what makes the data so big
Source: http://cloud-dba-journey.blogspot.se/2013/10/demystifying-hadoop-for-data-architects.html
9. Data Science can help to reveal these insights
Data value from the business's perspective
13. WHY?
As an analyst for many years…
I realised …
14. Insight to action – too slow!
[Diagram: marketers request insights; the analysts monitor, analyse and answer, usually in a dashboard/report format; acting on the customer takes weekly, monthly and even +6-month cycles]
Issues discovered
1. Data is not centralized / synchronized
2. Data quality is bad
3. The organization's hierarchy slows down the decision-making process
4. NO common KPIs (isolated measurement)
5. Marketing strategy strongly depends on gut feelings (historical reasons)
6. Knowledge gaps & misconceptions (focus on visualization, not necessarily facts)
7. Insufficient information (insufficient data sources to answer the given question)
15. How did it happen?
Fragmented data view
1. Focus on the database as the only truth
2. Limited data sources (mostly DB + clickstreams)
3. A central data repository did not exist
4. A common definition of a customer did not exist
5. Customers' ever-changing behavior (historical vs real-time behavioural data)
6. Marketers' beliefs vs. real evidence about the customers
18. Data Science can at least answer SOME of those concerns!
But . . .
it heavily depends on how mature the organization is
19. Organization maturity vs. data maturity
Organization maturity stages: resistance to change → isolated acceptance → growing importance → embracing throughout business disciplines → data-driven product & organization
Data maturity stages: fragmented data (ad-hoc reports focused) → central data lake (exploratory analysis) → 360 data view in real time (predictive analytics) → data governance (data quality control) → data-driven enterprise strategy (recommender systems)
Source : https://datafloq.com/read/five-levels-big-data-maturity-organisation/259
21. Data science is a "concept to unify statistics, data analysis and their
related methods" in order to "understand and analyze actual phenomena"
with data. It employs techniques and theories drawn from many fields
within the broad areas of mathematics, statistics, information science,
and computer science, in particular from the subdomains of machine
learning, classification, cluster analysis, data mining, databases,
and visualization.
Short definition (wikipedia)
22. Data Science synonyms … what includes what
Machine Learning (ML) – typical characteristics:
• Is question specific
• Bias-variance trade-off + over-/under-fitting (see the small sketch after this slide)
• Split data into training and testing (validation) sets
• Can be combined with other algorithms
• Can utilize parallelization
• Deals with all kinds of data (incl. unstructured)
• Data mining techniques (for big data) are applied
Predictive analytics (Supervised Learning) – typical characteristics:
• Focus on feature engineering (variable selection)
• Exploration vs. exploitation
• Prediction performance decays quickly over time
• Mostly ad-hoc | one-off based
• Deals with all kinds of data (when applying machine learning), or else mostly structured | semi-structured data
Inferential + Exploratory + Descriptive – typical characteristics:
• Ad-hoc based
• Limited data blending
• Mostly structured data (from a database)
• Focus on historical statistical models
• Modelling focuses on finding correlations or describing existing datasets
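The bias-variance point above can be made concrete with a tiny experiment. This is my own minimal sketch in Python/scikit-learn (not taken from the slides): as a decision tree is allowed to grow deeper, training accuracy keeps climbing while test accuracy flattens or drops, which is the over-fitting side of the trade-off.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deeper trees: training accuracy rises, test accuracy stops improving (over-fitting).
for depth in (1, 2, 4, 8, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth,
          round(model.score(X_train, y_train), 2),
          round(model.score(X_test, y_test), 2))
```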
26. Data Scientist – the skillset map
Unicorn version vs. your own path!
27. Not on the map but equally important
Teamwork essentials –
• Story-telling
• Visualization
• Cooperation / team building
• Inter-personal skills / inspiration coach
• Open mind
• Knowledge sharing
Personality traits –
• Extreme curiosity
• Detective spirit
• Naive and stupid
• Strong ethics (data protection / privacy law)
28. My journey – my own version (a tree metaphor)
Roots – your initial foundation:
• Math (university)
• Statistics (university)
• Computer Science (master's)
• Analyst (work experience)
The ground – the data science threshold
Tree trunk – skillsets yet to be acquired:
• Programming: R & Python
• Machine learning algorithms
• Data mining techniques
• Cloud services (virtualization concepts)
• Big data ecosystems
• Bayesian statistics
• Graph theory (optional)
• Text mining techniques (optional)
Tree branches & leaves – specialized interests / areas of further development:
• Leadership / team building
• Recommender systems
• Experimental design
• Game theory
• Story-telling / presentation skills
• New model development
• Deep learning / artificial intelligence
Motivation is the key!
31. What motivates you?
What would your path look like?
(15 mins break)
32. Refresh our memory from the previous section –
• The relationship between data science and big data
• What motivated me to become a data scientist
• The definition of data science and its closely related synonyms
• The skillset map for becoming a data scientist (unicorn version vs. your own)
• Motivation is the key!
41. Where did these two approaches come from?
Due to organization maturity . . .
42. Organization maturity – from traditional BI to a data-driven organization & products
Organization maturity phases:
• Phase 1 (Infancy): traditional BI; data silos – fragmented data views; resistance to change
• Phase 2 (Technical adoption): data lake acquisition; isolated acceptance
• Phase 3 (Business adoption): data quality and governance; growing importance
• Phase 4 (Data & Analytics as a Service): automated data management & administration; embraced throughout business disciplines; data-driven organization & products
Phase components shown across the phases:
• Visualization of deliveries: real-time dashboard(s) → algorithm-embedded dashboard(s) → algorithm performance dashboard(s)
• Possible type of ML used in each phase: pattern detection → unsupervised learning → supervised learning → recommender system(s) → deep learning
• Platform maturity (data + technology): data exploration → experimental design → map data sources vs. customer touch points → acquire a solution for the architecture → control data quality → merge data sources and automate processing → design experiments – extract preference data → pipeline data processing & application flow
43. The same maturity matrix as on the previous slide, now overlaid with where the two delivery modes fit: One-off (Proof of Concept = POC) and Production Pipeline.
45. Roles involved: business knowledge, data scientist, data engineer, IT support
One-off iterations:
business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deliverables
Production pipelines:
business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deployment → apply to application
plus: performance optimization, enable automation
47. The same two flows (one-off iterations vs. production pipelines), with the roles business knowledge, data scientist, data engineer and IT support, annotated with rough shares of effort: 70-80% vs. 10%~20%
48. Comparison of the two approaches
One-off:
• Organization maturity: phase 1 – phase 2
• What they are looking for: to understand how data science works (baby steps)
• Project scope: small, 4-8 weeks
• Platform & technology: do not change anything existing in-house
• Data source availability: mainly the DB + 1 or 2 additional data sources
• Data quality: poor, needs lots of cleaning
• Deliverables: focus on interpretation (visualized)
Production pipeline:
• Organization maturity: phase 2 and onward
• What they are looking for: participate in the data science process
• Project scope: at least 3 months and above
• Platform & technology: considering, or already migrating to, a new platform/technology
• Data source availability: start to map out all available data sources
• Data quality: start to sort out data quality
• Deliverables: focus on code (hence limitations on programming language)
49. Data Science Process – boxed-in activities overview
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deploy/deliverables]
50. Activities per stage
Define the business question: define the goal; decompose the question; verify understanding; project scoping; map data sources; establish a performance measure; data scientist workspace; task force; business limitations; define the project scope
Data acquisition & preparation: environment set-up; languages: SQL, R, Python, etc.; data sources merging; data pre-scan Q&A; data quality review
Descriptive statistics (data exploration): explore data (plots); data manipulation; outliers/NAs; summary statistics; data exploration review
Feature engineering: establish a performance threshold; feature engineering; algorithm selection; business sign-off
Model building & validation: types of models; model selection criteria; build and validate the model; review results
Deploy / deliverables: to whom; on what platform; update frequency; performance review; infographic (visualization); deployment review
51. Step-wise Data Science Process: starting from business question scoping
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deploy/deliverables]
52. Question scoping loop
Define the scope: 1. thresholds; 2. data scope; 3. resources; 4. task force; 5. limitations; 6. budget & timeline, etc.
Specify the questions and check whether the scope is ready or not; iterate until it is done.
Then move on to the data: how to get the data (access), the data lake, environment set-up issues, extraction.
Next: about data
53. Step-wise Data Science Process: data acquisition → data preparation
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deploy/deliverables]
54. Acquire data – merge the data sources (the GOAL: a single joined view)
Main table (PK = Transaction ID, FK = StoreID), joined with:
1. Customer purchase information (database; PK = CID, FK = Transaction ID) – joined by Transaction ID
2. Website browsing: pages viewed, avg. time on site, products browsed, etc. (clickstream; PK = CookieID, FK = Transaction ID) – joined by Transaction ID
3. Promotions: campaign name, campaign duration, in which store, discount level, etc. (campaign tool; PK = CampaignID, FK = StoreID) – joined by StoreID
4. Store survey: questions, scale of satisfaction, product rating, etc. (survey tool; PK = SurveyID, FK = StoreID) – joined by StoreID
5. Store geo info: location, km to center, km to customer's address, km to competitor's store in the same postcode region, etc. (API calls; PK = StoreID) – joined by StoreID
6. Customer interests (social; PK = email address) – joined by email
Customer database (database; PK = CID, FK = email) – joined by CID
(A small merge sketch in code follows below.)
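To make the join order concrete, here is a minimal pandas sketch of the merges described on the slide. The DataFrame and column names are hypothetical stand-ins; only the keys (TransactionID, StoreID, CID, email) come from the slide.

```python
import pandas as pd

# Hypothetical frames standing in for the sources named on the slide;
# column names follow the PK/FK hints (TransactionID, StoreID, CID, email).
transactions = pd.DataFrame(columns=["TransactionID", "StoreID", "amount"])
purchases    = pd.DataFrame(columns=["CID", "TransactionID"])
clickstream  = pd.DataFrame(columns=["CookieID", "TransactionID", "pages_viewed"])
promotions   = pd.DataFrame(columns=["CampaignID", "StoreID", "discount_level"])
survey       = pd.DataFrame(columns=["SurveyID", "StoreID", "satisfaction"])
store_geo    = pd.DataFrame(columns=["StoreID", "km_to_center"])
customers    = pd.DataFrame(columns=["CID", "email"])
interests    = pd.DataFrame(columns=["email", "interest_topic"])

# Follow the join order sketched on the slide: transaction-level joins first,
# then store-level joins, then customer / email-level joins.
df = (transactions
      .merge(purchases,   on="TransactionID", how="left")  # 1. by Transaction ID
      .merge(clickstream, on="TransactionID", how="left")  # 2. by Transaction ID
      .merge(promotions,  on="StoreID",       how="left")  # 3. by StoreID
      .merge(survey,      on="StoreID",       how="left")  # 4. by StoreID
      .merge(store_geo,   on="StoreID",       how="left")  # 5. by StoreID
      .merge(customers,   on="CID",           how="left")  # joined by CID
      .merge(interests,   on="email",         how="left")) # 6. by email
```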
55. Step-wise Data Science Process: descriptive statistics
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deploy/deliverables]
56. A flower called iris – three species: Setosa, Virginica, Versicolor
Source: https://en.wikipedia.org/wiki/Iris_flower_data_set
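As a quick illustration of the descriptive-statistics step on this example dataset, here is a minimal sketch (my own, not from the slides) that loads iris and prints summary statistics overall and per species.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the classic iris data set referenced on the slide.
iris = load_iris(as_frame=True)
df = iris.frame  # four measurements per flower plus the species label ("target")

# Basic descriptive statistics: counts, mean, std, quartiles per column.
print(df.describe())

# The same measurements averaged per species, to spot separability between classes.
print(df.groupby("target").mean())
```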
62. Step-wise Data Science Process: feature engineering
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deploy/deliverables]
63. Feature selection (things to consider)
- Observations from the descriptive statistics
- Remove highly correlated columns/parameters (example slides further down the presentation; see the sketch after this list)
- Candidate models' requirements?
- Some models require One-Hot-Encoding (e.g. neural networks, PCA, K-means clustering)
- Outlier sensitive or not? (e.g. regression models are more sensitive to outliers than tree models)
- Forward stepwise / backward stepwise / shrinkage selection concepts vs. black-box models that rank feature importance?
- Computing time vs. response
- Business limitations (e.g. the business requires shrinking the features to <= 20)
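One of the bullets above, dropping highly correlated columns, can be sketched in a few lines of pandas. This is a minimal illustration under my own assumptions (the 0.9 threshold is arbitrary), not a full feature-selection procedure.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column of every pair whose absolute Pearson correlation
    exceeds the threshold. A simple sketch, not a complete selection step."""
    corr = df.corr().abs()
    # Look only at the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```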
64. Example (justifying selected features)
Background:
You've done an exploratory analysis of correlations; you have the result, and now you need to explain it in a way a five-year-old can understand, and use the exploratory results to do your feature selection!
66. Explaining correlation with a metaphor, continued
Observation interval of the distance from A to B, direction to the right:
Highly correlated (0.75 ~ 1): the Tesla and the Volvo move at almost the same speed and in the same direction
Positively correlated (0.5 ~ 0.75): the Tesla moves a bit faster than the Volvo, but they are still both heading in the same direction
Negatively correlated (< 0): the Tesla and the Volvo move in different directions
67. Linear correlation
In the following slides, for intuitive convenience, we rescale and map the correlation coefficient into a % format, e.g. a strong positive correlation of 1 becomes 100%.
Pearson's correlation:
r(X, Y) = cov(X, Y) / (σ_X · σ_Y)
where cov(X, Y) is the covariance of the variables X and Y, and σ_X and σ_Y are the standard deviations of X and Y.
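A minimal sketch of that formula in Python, with toy speed numbers for the two cars in the metaphor (the numbers are invented for illustration only):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation: cov(X, Y) / (std(X) * std(Y))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

speed_tesla = [60, 62, 61, 63, 64]   # toy numbers, just for illustration
speed_volvo = [59, 61, 60, 64, 65]

r = pearson_r(speed_tesla, speed_volvo)
print(f"r = {r:.2f}  (~{r:.0%} in the slide's percentage view)")
```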
68. The result of the analysis
[Correlation matrix of the sensor variables: external sheet temperature of the exhaust pipe, actual exhaust temperature of the exhaust pipe, process value of the regulator under pressure, process value of the regulator hood damper, negative pressure of the exhaust pipe, regulator value of the hood damper, regulator value of the exhaust damper, actual value of the exhaust-pipe damper, and the regulator process value]
69. Before we leave this metaphor – one last thing:
"Correlation does not imply causation!"
70. Correlation does not imply causation!
Question: why did these two cars (the Tesla and the Volvo) move in the same direction in the first place?
Guess 1: husband and wife ("I drive the Tesla", "I drive the Volvo")
Guess 2: a racing track from A to B
Guess 3: coincidence
71. Before diving into training your model(s) … ask yourself: what type of model should I use?
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deployment]
72. What types of models are suitable?
Question: do you have the correct answers (labels) for the given business question?
YES → supervised learning (regressions, classes)
NO → unsupervised learning / deep learning (clustering, association analysis)
73. Before diving into training your model(s) … the models landscape: 1. supervised, 2. unsupervised, 3. deep learning
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deployment]
74. Supervised Learning
Regressions: linear regression; stepwise regression; piecewise polynomials and splines; smoothing splines; logistic regression; multivariate adaptive regression splines; least absolute shrinkage and selection operator (LASSO); ridge regression; linear discriminant analysis (LDA)
Trees: decision trees; gradient boosted regression trees; adaptive boosting trees (AdaBoost); conditional inference trees (CI trees); bootstrap aggregation (bagging) trees; gradient boosted machines (GBM); random forest (RF)
Support Vector Machines (SVM): support vector classifier (two class); support vector classifier (multiclass); kernels and support vector machines
Unsupervised Learning
Dimensionality reduction: principal component analysis (PCA); singular value decomposition (SVD); MinHash; locality sensitive hashing (LSH); t-distributed stochastic neighbor embedding (t-SNE)
Clustering: K-means clustering; hierarchical clustering; Bradley-Fayyad-Reina (BFR) clustering; Clustering Using REpresentatives (CURE) clustering; Bayesian networks; topic modelling
Market basket: Apriori (association rules); Park, Chen and Yu algorithm (PCY); Savasere, Omiecinski and Navathe (SON); Toivonen's algorithm
Stream analysis: Bloom filters; Flajolet-Martin algorithm; Alon-Matias-Szegedy; Datar-Gionis-Indyk-Motwani algorithm
Neural network families / Deep Learning: perceptrons; simple neural networks (fully connected); deep Boltzmann machines; convolutional neural networks; recurrent neural networks; hierarchical temporal memory
Recommender systems: content-based recommender; user-user recommender; item-item recommender; hybrid recommender; latent Dirichlet allocation recommender
Others: genetic algorithm (chromosome); multi-arm bandit; k-nearest neighbors (KNN)
75. Data Science Process: model training → model validation (example: supervised learning)
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deployment]
76. Training and validation flow
Split the pre-processed data into a training set, a validation set and a test set.
Train the ML models on the training set and check them on the validation set; from the models that pass the test set, select one winning model.
Monitor the winning model's performance and decide whether to re-train the models (yes/no).
If we want to be REALLY picky: live-test the winning model on samples drawn from live data streams.
(A code sketch of this flow follows below.)
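The following is a minimal Python/scikit-learn sketch of that flow under my own assumptions (a 60/20/20 split, two arbitrary candidate models); the slides do not prescribe these choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Split into train / validation / test (60 / 20 / 20 here, an assumption).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Train on the training set and compare on the validation set...
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = model.score(X_val, y_val)

# ...then report the winning model once on the held-out test set.
winner = max(val_scores, key=val_scores.get)
print(winner, candidates[winner].score(X_test, y_test))
```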
77. Data Science Process: model selection criteria
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deploy/deliverables]
78. Example (justifying how you select the model)
Background:
You built a prediction model (say, to classify customer purchase = Yes/No); now you need to explain why you picked THAT algorithm in the first place!
79. Construct the criteria for model selection – input from the business as well as from the data characteristics (note: none of the numerical data is normally distributed). A small weighted-scoring sketch follows the table.

Criteria                        | logistic | trees  | RF     | GBM    | weight
Performance = accuracy          | 86.5%    | 86.7%  | 86.8%  | 85.8%  | 10%
Sensitivity                     | 4.6%     | 12.5%  | 8.4%   | 21.4%  | 20%
Interpretability                | 1        | 0.8    | 0.4    | 0.2    | 30%
Time to compute                 | 1        | 0.8    | 0.2    | 0.2    | 20%
# of parameters                 | 2.4      | 2.4    | 1.89   | 2.38   | 10%
Conflict with using regression  | Yes      | partial| minimum| minimum| 10%
Ranking                         | 1.016    | 1.063  | 0.625  | 0.894  | 100%

Performance = (true positives + true negatives) / test-set population: how often the model correctly predicts whether you are a purchaser or a non-purchaser.
Sensitivity = true positives / all positives in the test set: how often the model correctly predicts that you are going to purchase.
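A minimal sketch of the weighted scoring idea behind the Ranking row, showing only two of the models. The numeric mapping of the qualitative "conflict" row is my own assumption for illustration; the slide does not state how its Ranking values were computed.

```python
# Weighted model-selection score: each criterion gets a weight, each model a value.
# Mapping "conflict to use regression" to 0/1 is an assumption, not from the slide.
weights = {"accuracy": 0.10, "sensitivity": 0.20, "interpretability": 0.30,
           "time_to_compute": 0.20, "n_parameters": 0.10, "conflict": 0.10}

models = {
    "logistic": {"accuracy": 0.865, "sensitivity": 0.046, "interpretability": 1.0,
                 "time_to_compute": 1.0, "n_parameters": 2.4, "conflict": 0.0},
    "gbm":      {"accuracy": 0.858, "sensitivity": 0.214, "interpretability": 0.2,
                 "time_to_compute": 0.2, "n_parameters": 2.38, "conflict": 1.0},
}

# Weighted sum per model: higher means a better overall fit to the criteria.
scores = {name: sum(weights[c] * vals[c] for c in weights) for name, vals in models.items()}
print(scores)
```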
80. Data Science Process: explain your model
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deploy/deliverables]
81. Example (explaining the selected model)
Background:
Now that I have selected a model called a recursive partitioning tree (rpart), the stakeholders ask me to explain how this model works …
82. Recursive Partitioning Tree (rpart) – how does it work?
Explained at two levels:
High level – conceptually
Medium level – a bit more detail
84. High level – how does rpart work?
Starting from the parent node, for every parameter Pi the tree checks, for each candidate value Xi:
Criterion 1: does splitting on Pi at value Xi give me more information?
Criterion 2: does splitting on Pi at value Xi give me better prediction accuracy?
Both criteria are used to decide whether to split or not; if the split is made, each child node (child node 2.1, child node 2.2) repeats the same procedure.
Note: information is defined by information theory, with the options of the Gini index and information gain (link).
Hyper-parameters:
• minsplit – the minimum number of observations that must exist in a node for a split to be attempted
• minbucket – the minimum number of observations in a terminal node (= minsplit / 3)
• cp – complexity parameter; penalizes the model if too many parameters are used without much increase in accuracy/information
(A scikit-learn analogue of these knobs follows below.)
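As a rough illustration of those hyper-parameters, here is a minimal Python sketch using scikit-learn's decision tree. This is an analogy, not R's rpart itself: criterion plays the role of the Gini/information choice, min_samples_split of minsplit, min_samples_leaf of minbucket, and ccp_alpha of cp; the concrete values are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Rough scikit-learn analogues of the rpart knobs described above.
tree = DecisionTreeClassifier(
    criterion="entropy",      # use information gain; "gini" is the other option
    min_samples_split=20,     # ~minsplit: a node needs at least 20 observations to split
    min_samples_leaf=7,       # ~minbucket: roughly minsplit / 3, as on the slide
    ccp_alpha=0.01,           # ~cp: prune splits that add complexity without enough gain
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```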
85. Medium Level – a bit more detail
1) information gain 2) accuracy improvement
86. 1) Information gain: checking the impurity of the end nodes, calculated by entropy
Scenario 1: if the end nodes give a 50-50 percent chance of a class being Purchaser or noPurchaser, the split is as good as a guess; the node is said to reach maximum impurity (entropy = 1).
Calculation: -P1(Purchase)·log(P1(Purchase)) - P1(noPurchase)·log(P1(noPurchase)) - P2(Purchase)·log(P2(Purchase)) - P2(noPurchase)·log(P2(noPurchase))
= 0 - (1/2)·log2(1/2) - (1/2)·log2(1/2) + 0 = 1 → maximum impurity
[Diagram, scenario 1: a parent node with 10 data points (5 Purchase + 5 noPurchase) split by condition 1 into end node 1 (0 Purchase + 5 noPurchase) and end node 2 (5 Purchase + 0 noPurchase); P1(Purchase) = 0, P1(noPurchase) = 5/10 = 1/2, P2(Purchase) = 5/10, P2(noPurchase) = 0/10]
Scenario 2: if the end nodes give a 100 percent chance of a class being Purchaser or noPurchaser, the classification is perfect; the node is said to reach minimum impurity (entropy = 0).
Calculation: -P1(Purchase)·log(P1(Purchase)) - P1(noPurchase)·log(P1(noPurchase)) - P2(Purchase)·log(P2(Purchase)) - P2(noPurchase)·log(P2(noPurchase))
= 0 - (1)·log2(1) + 0 = 0 → minimum impurity
[Diagram, scenario 2: a parent node with 10 data points (0 Purchase + 10 noPurchase) split by condition 1 into end node 1 (0 Purchase + 10 noPurchase) and end node 2 (0 Purchase + 0 noPurchase); P1(Purchase) = 0, P1(noPurchase) = 10/10 = 1, P2(Purchase) = 0, P2(noPurchase) = 0]
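A tiny code check of the entropy values quoted above, using the standard per-node entropy over class counts (my own helper, consistent with the numbers on the slide):

```python
import math

def entropy(counts):
    """Entropy (in bits) of a node given class counts, e.g. [purchase, no_purchase]."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # skip zero counts: 0*log(0) := 0
    return -sum(p * math.log2(p) for p in probs)

print(entropy([5, 5]))   # 1.0 -> maximum impurity, the 50-50 "as good as a guess" node
print(entropy([0, 10]))  # 0.0 -> minimum impurity, a perfectly classified node
```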
87. 2) How rpart calculates the misclassification rate for parameter Pi with value Xi
The rpart model asks, for each and every value Xi of a parameter Pi: was it a good idea (via the misclassification rate) to split on this value? It does so for all parameters Pi and all possible values Xi associated with Pi (see the example below).
[Diagram: a tree on 20 data points, with candidate splits on "Age < 45?", "cntTotal < 110?" and "cntTotal < 75?", leading to leaves that predict noPurchase (7 points), Purchase (3 points), noPurchase (5 points) and Purchase (5 points), with correctly classified rates of 1/7, 1/3, 1/5 and 1/5 respectively.]
Overall correctly classified rate = (true Purchase + true noPurchase) / total population = 4/20 = 20%
Misclassification rate = 1 - 20% = 80%
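The arithmetic at the end, written out as a minimal snippet (the counts are read off the slide's example as I understand it):

```python
# Of 20 observations, 4 end up correctly classified across the four leaves
# (one correct in each leaf: 1/7, 1/3, 1/5, 1/5 -> 4 correct in total).
total = 20
correctly_classified = 4

accuracy = correctly_classified / total    # 0.20 -> 20 %
misclassification_rate = 1 - accuracy      # 0.80 -> 80 %
print(accuracy, misclassification_rate)
```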
88. Data Science Process: deployment
[Process flow: business question understanding → data sources scoping → data acquisition → data preparation → descriptive statistics → feature engineering → model training → model validation → deploy/deliverables]
89. Deliverables: One-off (POC)
The data scientist delivers processed data for visualization plus model performance metrics & output predictions, and the result has to pass the business owner's vision.
Audience: board members / CTO, CEO, CFO, etc.; marketing directors; marketers.
Focus: interpretability; lessons learned – final reports or prototype dashboards for internal sales; WoW-effect visualization.
90. Deployment: Production Pipeline
The data scientist delivers processed data for visualization, code for embedding into applications, and model performance metrics & output predictions, and the result has to pass the integration tests.
Audience: IT + content creators + marketers.
Focus: reproducibility and process efficiency; add to the organization-wide dashboards & reporting pipeline (automated); embed code directly into applications (content recommender, product-mix vs. customer-segment matching, etc.); use the output of the model predictions for further marketing purposes (such as segmentation, customer profiling, etc.).
92. Refresh our memory from the previous sections
• The relationship between data science and big data
• What motivated me to become a data scientist
• The definition of data science and its closely related synonyms
• The skillset map for becoming a data scientist (unicorn version vs. your own)
o Why a teamwork approach
o Dream teammates
o Data science process: the two approaches (why, comparison, boxed-in activities)
o Data science process breakdown in detail (step-wise)
Source : https://www.google.com/trends/explore?q=cloud%20computing,virtualization,big%20data,data%20science
Cloud computing: cloud computing is a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand.
Virtualization refers to the act of creating a virtual (rather than actual) version of something, including virtual computer hardware platforms, storage devices, and computer network resources.
Now I want you to spend some time reading this slide, so I can drink some water, because I am thirsty .. :P
Okay, so motivation is very personal.. you need to find yours.. Here are mine…
I am extremely attracted to knowledge.. In fact, every time I find something interesting.. I can't just let it pass; I need to stay with it until I know more, or enough to satisfy my hunger for knowledge.. … I don't know about you, but for me.. this need to know more drives me to go further….
Secondly, it sounds a bit cliché to say that the beautiful thing about learning is that no one can take it away from you ….
Okay, so if you really think about it, it is true in that, well, in this world we are all alone; we can try to keep those we care about close, we can try to build the most secure locker in the world…
Eventually, things and people leave us… the only thing you are stuck with is yourself and the knowledge you have… in a way, it is both sad and nice..
So the third picture is quite curious… does anyone know who made this art?
https://en.wikipedia.org/wiki/Waterfall_(M._C._Escher)
So, anyone want to guess why I chose this picture?
Things are not always what they seem at first glance; when you look a bit longer, you will realize something is off… then you will ask yourself why that is..
This is exactly the point: it challenges you to think outside the box. We live in a world with conditions… everything comes with conditions that we are not even consciously aware of… for example, we restrict ourselves to thinking in no more than 3 dimensions.. so we get confused when the number of dimensions grows higher than 3… what if we were allowed to go to the 4th or 5th dimension, what would happen?
Another way to think about it is that we assume gravity exists even in pictures.. Okay, so who says it SHOULD exist at all costs? What if we try the surreal ..
This concept of challenging your fundamental ''bias'' extends to everything you do as a data scientist.. Remember that I said in the data science skillset map that you need to be naive and stupid? Ask questions about why things are the way they are.. asking why it is done like this is actually important; it sometimes reveals a hidden truth.
So, can anyone tell me what the difference is between these two pictures?
We have two cars here, one Tesla and one Volvo.
During the interval of this distance (from point A to point B), we know that these two cars are both moving to the right at almost the same speed.
We know that when we observe these two cars from point A to point B, we can see that they will arrive at approximately the same place, and they move along the path almost simultaneously, synchronized.
Now, this could be because a husband and a wife (each owning a car) were driving home together, or it could be that these two cars were on a racing track.
It could be completely coincidental: two strangers just happened to join this road in the same direction within the observed path from A to B.
Since we have not been given enough information, we have no idea which of these scenarios it is.. The only valid conclusion we can draw from this is that
when we observe the Tesla and the Volvo, we know that these two cars move together, almost synchronized in speed and time (which translates to the distance they cover being quite similar as well).
So if we know that we will eventually see the Tesla when standing at point B, we know that we will also have the Volvo there when we see the Tesla.
Now we only need to know one of the cars (either the Tesla or the Volvo) when we are at point B to determine how much distance these two cars covered (since they arrive at the same point B at almost the same time.. so we can actually just pick one).
This means that these two cars are positively correlated, and their correlation is quite strong, approaching 1, since they move in the same direction almost simultaneously.
Now think about the fact that we do not know whether these two cars happened to move in the same direction simultaneously by accident, or whether there is some scenario behind the scenes yet to be discovered.. which means that correlation (either positive or negative) does not mean causation.
So why is it important for feature engineering to know this?
Okay, so let's say that we want to know fuel-consumption efficiency for cars; we then should NOT take the Tesla into consideration, since the Tesla does not even use fuel.
It would just confuse the model I build; the model could not possibly know why the Tesla has only zeros as values through and through when it comes to fuel consumption.
Hence it is actually harmful not to carefully select your features.