By Rahul Gulab Singh
• What data science is, and its applications
• Stages of data science and project roles
• Modelling methods, namely classification, decision trees and random forests
• Demo of the technology using RStudio and R programming
• Data science is the management of the process that transforms hypotheses and data into actionable predictions.
Acquire data → Manage data → Choose modelling method → Write code → Verify result
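The five steps above can be traced in a toy Python pipeline. This is a sketch for illustration only, not the deck's R demo. The records, the income field and the one-threshold rule are all hypothetical, chosen so that each step stays visible. The verification step scores the model on the same data it was fitted on, which gives an optimistic result. A held-out test set is the usual remedy.

```python
# Hypothetical toy records: (income in thousands, defaulted?).
# None marks a missing income value.
RAW_RECORDS = [
    (20, True), (25, True), (None, False), (40, False),
    (55, False), (30, True), (60, False), (35, False),
]


def acquire_data():
    """Step 1, acquire data: here just a copy of the hard-coded records."""
    return list(RAW_RECORDS)


def manage_data(records):
    """Step 2, manage data: drop rows whose income is missing."""
    return [r for r in records if r[0] is not None]


def choose_method():
    """Step 3, choose a modelling method: a one-threshold rule that
    predicts default when income is below the threshold."""
    return lambda income, threshold: income < threshold


def fit_threshold(records, rule):
    """Step 4, write the fitting code: try every observed income as the
    threshold and keep the first one with the fewest training errors."""
    best_threshold, best_errors = None, len(records) + 1
    for threshold, _ in records:
        errors = sum(rule(x, threshold) != y for x, y in records)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def verify_result(records, rule, threshold):
    """Step 5, verify the result: fraction of records predicted correctly."""
    correct = sum(rule(x, threshold) == y for x, y in records)
    return correct / len(records)


if __name__ == "__main__":
    data = manage_data(acquire_data())
    rule = choose_method()
    threshold = fit_threshold(data, rule)
    print(threshold, verify_result(data, rule, threshold))  # prints: 35 1.0
```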
Applications of data science:
• Amazon’s product recommendation systems
• Google’s advertisement valuation systems
• LinkedIn’s contact recommendation system
• Twitter’s trending topics
• Walmart’s consumer demand projection systems
• Math and theory: statistics, linear algebra, optimization, time series, etc.
• Applied algorithms: machine learning, data structures, parallel algorithms, etc.
• Technologies: storage and computing platforms, statistical tools, etc.
• Domain expertise: text, finance, images, econometrics, etc.
• Art: visualization, infographics
Project roles:
• Project sponsor: represents the business interests
• Client: represents the end users’ interests
• Data scientist: sets and executes the analytic strategy
• Data architect: manages data and data storage
• Operations: manages infrastructure
Stages of a data science project:
Define the goal → Collect and manage data → Build the model → Evaluate the model → Present results → Deploy the model
• Predicting customer buying patterns
• Identifying fraudulent transactions
• Determining price elasticity
• Finding the best way to present product listings when a customer searches
• Customer segmentation
• Evaluating marketing campaigns
• Organizing new products into a product catalog
• Linear Discriminant Analysis (LDA)
• Classification and Regression Trees (CART)
• k-Nearest Neighbors (kNN)
• Support Vector Machines (SVM) with a linear kernel
• Random Forest (RF)
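Of the five methods listed, kNN is the simplest to write out by hand. The sketch below is a plain-Python k-nearest-neighbours classifier that uses Euclidean distance and a majority vote. It is an illustration only: the deck's demo would rely on R packages, and the sample points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points (Euclidean distance). On a tied vote, Counter.most_common
    returns the label that appears first among the sorted neighbours."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 2-D points forming two well-separated groups.
    points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
    labels = ["A", "A", "A", "B", "B", "B"]
    print(knn_predict(points, labels, (1.5, 1.5)))  # prints: A
    print(knn_predict(points, labels, (5, 5)))      # prints: B
```

The main choice is k. A small k follows the training data closely, while a larger k smooths the decision over more neighbours.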
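Training, Test and Validation
• Loan application prediction example
[Figure: the data goes through a test/train split into training data and test data. The training data feeds the training process, which produces a model. The model is then applied to the test data to produce predictions.]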
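The split in the figure can be sketched in Python with a third, validation, slice. In the usual arrangement, the model is fitted on the training set, models or settings are compared on the validation set, and the test set is used once at the end to estimate performance on unseen data. The 60/20/20 proportions, the function name and the stand-in application IDs are assumptions for illustration.

```python
import random


def train_validation_test_split(rows, validation_fraction=0.2,
                                test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and cut it into (train, validation, test).

    The fractions are assumptions (60/20/20 by default); a fixed seed makes
    the split reproducible.
    """
    if (validation_fraction < 0 or test_fraction < 0
            or validation_fraction + test_fraction >= 1):
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


if __name__ == "__main__":
    # Hypothetical IDs standing in for 1,000 loan applications.
    applications = list(range(1000))
    train, validation, test = train_validation_test_split(applications)
    print(len(train), len(validation), len(test))  # prints: 600 200 200
```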
• Example: finding bad loan applications
• Input variables: age, salary, other existing loans, address, other income, education, background data
• 1,000 applications exist, of which 200 have defaulted
• Decision tree for identifying potential defaulters
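[Figure: decision tree for identifying potential defaulters. Split conditions: Duration > 50, Amount > 4 million, Amount > 1 million, Amount < 5 million, Duration > 120. Leaf predictions with their probabilities: Bad (0.68), Good (0.75), Good (0.56), Bad (1.0), Good (0.61), Bad (0.88).]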
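Split conditions such as "Duration > 50" come from trying candidate thresholds and keeping the one that leaves the purest child nodes. The deck does not say which purity measure its tree used. This sketch assumes Gini impurity, 1 − Σ p², as used in CART. At the root of the loan example, 200 of the 1,000 applications are bad, so the impurity is 1 − (0.2² + 0.8²) = 0.32. Labelling every application "good" would already be 80% accurate, so a useful tree has to beat that. The durations and outcomes passed to `best_threshold` are hypothetical, not the deck's data.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the split `value > threshold`
    with the lowest size-weighted Gini impurity of the two children.
    Candidate thresholds are the distinct observed values (a simplification;
    CART implementations usually use midpoints between neighbouring values)."""
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # a useful split must send rows to both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


if __name__ == "__main__":
    # Root node of the loan example: 200 bad and 800 good applications.
    root = ["bad"] * 200 + ["good"] * 800
    print(round(gini(root), 2))  # prints: 0.32

    # Hypothetical loan durations and outcomes, not the deck's data.
    durations = [12, 24, 36, 48, 60, 72, 90, 120]
    outcomes = ["good", "good", "good", "good", "bad", "good", "bad", "bad"]
    print(best_threshold(durations, outcomes))  # prints: (48, 0.1875)
```

A full tree repeats this search over every input variable at each node and recurses into the two children until a stopping rule is met.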
[Figure: random forest. The input variables are fed to Tree 1, Tree 2 and Tree 3. Each tree makes its own prediction, and the random forest combines the predictions of all the trees into its final prediction.]
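The figure's idea can be sketched in plain Python. Each tree is trained on a bootstrap sample (rows drawn with replacement) using a randomly chosen subset of the features, and the forest returns the majority vote of its trees. To keep the sketch short, each "tree" is a one-split stump. The default of three trees only mirrors the figure; practical forests use many more. In Breiman's random forests, features are resampled at every split. Because a stump has only one split, choosing the features once per tree amounts to the same thing here. All names and data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class shares."""
    n = len(labels)
    return 0.0 if n == 0 else 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Fit a one-split tree on the given feature indices.
    Returns (feature, threshold, label_if_le, label_if_gt)."""
    best, best_score = None, float("inf")
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t, majority(left), majority(right)), score
    if best is None:  # no valid split: predict the overall majority everywhere
        label = majority(labels)
        best = (features[0], float("inf"), label, label)
    return best


def predict_stump(stump, row):
    f, t, label_if_le, label_if_gt = stump
    return label_if_le if row[f] <= t else label_if_gt


def fit_forest(rows, labels, n_trees=3, n_features=1, seed=0):
    """Bagging plus random feature choice: each tree sees a bootstrap sample
    and a random subset of `n_features` feature indices."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Each tree votes; the forest returns the majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])


if __name__ == "__main__":
    # Hypothetical (duration in months, amount in millions) loan records.
    rows = [(12, 0.5), (24, 1.0), (18, 0.8), (30, 1.2), (20, 0.6),
            (90, 4.5), (120, 6.0), (100, 5.0), (80, 4.0), (110, 5.5)]
    labels = ["good"] * 5 + ["bad"] * 5
    forest = fit_forest(rows, labels)
    print(predict_forest(forest, (10, 0.4)), predict_forest(forest, (130, 7.0)))
```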
Applications where the random forest algorithm is widely used:
• Banking: distinguishing loyal customers from fraudulent ones
• Medicine: identifying diseases from patients’ medical records
• Stock market: stock behavior, losses and profits
• E-commerce: finding similar customers and customer segmentation
• Example: male/female distribution
[Figure: scatter plot of hair length (cm, 0 to 60) against height (cm, 140 to 200) for males and females.]
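One simple way to separate the two groups in the height/hair-length plot is to compute the average point (centroid) of each group and assign a new person to the closer one. This nearest-centroid rule is a swapped-in simplification, not necessarily the classifier used on the slide. It is what LDA reduces to when both classes have equal priors and share the same spherical covariance. The measurements below are hypothetical and were not read from the plot. With real data, height and hair length would usually be standardised first so that neither scale dominates the distance.

```python
import math
from statistics import mean

# Hypothetical (height_cm, hair_length_cm) samples, not read from the slide.
SAMPLES = {
    "male": [(175, 5), (180, 8), (170, 4), (185, 6)],
    "female": [(160, 40), (165, 35), (155, 50), (170, 30)],
}


def centroids(samples):
    """Mean (height, hair length) of each class."""
    return {
        label: (mean(h for h, _ in points), mean(hair for _, hair in points))
        for label, points in samples.items()
    }


def classify(point, class_centroids):
    """Assign `point` to the class whose centroid is nearest (Euclidean)."""
    return min(class_centroids,
               key=lambda label: math.dist(point, class_centroids[label]))


if __name__ == "__main__":
    cents = centroids(SAMPLES)
    print(classify((178, 7), cents), classify((158, 45), cents))  # prints: male female
```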
• Installing the R platform
• Loading the dataset
• Summarizing the dataset
• Visualizing the dataset
• Evaluating some algorithms
• Making some predictions
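"Evaluating some algorithms" usually means scoring each candidate model with the same resampling scheme and comparing the results. In R this is typically done with a package such as caret. The sketch below is a plain-Python stand-in that uses k-fold cross-validation to compare a majority-class baseline with a 1-nearest-neighbour classifier. The data, the fold count and the helper names are hypothetical.

```python
import math
from collections import Counter


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1.
    Contiguous folds assume the rows are already in random order; shuffle first otherwise."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, fit_predict, k=5):
    """Mean accuracy over k folds; each fold is held out once as the test set.
    `fit_predict(train_X, train_y, test_X)` must return one label per test row."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [label for i, label in enumerate(y) if i not in held_out]
        preds = fit_predict(train_X, train_y, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_baseline(train_X, train_y, test_X):
    """Predict the most common training label for every test row."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


def one_nearest_neighbour(train_X, train_y, test_X):
    """Predict the label of the closest training row (Euclidean distance)."""
    return [
        train_y[min(range(len(train_X)), key=lambda i: math.dist(train_X[i], q))]
        for q in test_X
    ]


if __name__ == "__main__":
    # Hypothetical two-cluster data, interleaved so every fold holds both classes.
    X = [(1, 1), (8, 8), (1, 2), (9, 8), (2, 1), (8, 9), (2, 2), (9, 9), (1.5, 1.5), (8.5, 8.5)]
    y = ["a", "b"] * 5
    for name, model in [("majority", majority_baseline), ("1-NN", one_nearest_neighbour)]:
        print(name, cross_val_accuracy(X, y, model))  # prints: majority 0.5, then 1-NN 1.0
```

Comparing against a trivial baseline shows whether a model has learned anything beyond the class proportions.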
• Practical Data Science with R
• Demo commands
• R and RStudio installation files

Data scientist Methods | Artificial Intelligence | Rahul Gulab Singh
