Machine Learning
By
B.JAYARAM
Assistant Professor
Department of Computer Science and Engineering
Malla Reddy Institute of Technology
Hyderabad - 500055
5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 1
Contents
• Machine Learning.
• Usage of Machine Learning.
• Supervised vs Unsupervised Learning.
• Classification.
• Regression Models.
• Decision trees.
• Random Forest.
• Logistic Regression.
Machine Learning
• What is data?
• Where is data available?
• Types of data.
• What is data analytics?
• How is data related to machine learning?
• Architecture of a machine learning model.
What is Data?
• A collection of information, typically stored in a file or database.
– Structured form
• Data organized in a relational structure, where relations between attributes are defined. Eg: tables in relational database systems (Oracle, MySQL, etc.) queried with SQL.
– Unstructured form
• Any data that does not have a predefined structure. Eg: videos, images, comments, posts, and web content such as blogs and Wikipedia.
Machine Learning
• Where is data available?
– Many sources of data are available.
– Primary sources of data
• Eg: data created by an individual or a business on its own.
– Secondary sources of data
• Eg: data extracted from cloud servers and websites (Kaggle, UCI, AWS, Google Cloud, Twitter, Facebook, YouTube, GitHub, etc.)
Machine Learning
• Types of data
Qualitative Data
• Nominal data.
– The values of the attribute have no natural ordering.
– Eg: color, gender, nouns (name, place, animal, thing)
• Ordinal data.
– The values of the attribute have a natural ordering.
– Eg: size (S, M, L, XL, XXL), rating (poor, average, good, excellent)
Quantitative Data
• Discrete attribute:
– Takes only countable numerical values (typically integers).
– Eg: number of buttons, number of days for product delivery, etc.
• Continuous attribute:
– Can take any value within a range, including fractional values.
– Eg: price, discount, height, weight, length, temperature, speed, etc.
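The four data types above map naturally onto R's basic types. The following is a minimal sketch; the variable names and values are hypothetical and chosen only for illustration.

```r
# Nominal: categories with no natural order
color <- factor(c("red", "green", "blue", "red"))

# Ordinal: categories with a natural order (sizes as listed on the slide)
size <- factor(c("M", "S", "XL", "L"),
               levels = c("S", "M", "L", "XL", "XXL"),
               ordered = TRUE)

# Discrete: integer counts (hypothetical button counts)
buttons <- c(4L, 6L, 5L, 4L)

# Continuous: real-valued measurements (hypothetical prices)
price <- c(499.0, 259.5, 899.99, 349.0)

is.ordered(color)   # FALSE: nominal values cannot be ranked
is.ordered(size)    # TRUE: ordinal values can be compared
size[1] < size[3]   # TRUE: "M" comes before "XL"
```

Declaring the level order explicitly matters: without `levels`, R would sort the size labels alphabetically.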
Sample Dataset
• COVID-19 dataset (state-wise, India)
Machine Learning
• What is data analytics?
– Data analytics is the science of analyzing raw data in order to draw conclusions about that information. ... This information can then be used to optimize processes and increase the overall efficiency of a business or system.
Types:
– Descriptive analytics: summarizes what has happened. Eg: observation, case studies, surveys.
– Predictive analytics: estimates what is likely to happen. Eg: healthcare, sports, weather, insurance, social media analysis.
– Prescriptive analytics: recommends what action to take. Eg: healthcare, banking.
Machine Learning (cont.)
• How is data related to machine learning?
• Architecture of a machine learning model.
• Data analytics is a subcomponent of machine learning.
Machine Learning
• Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed.
• Machine learning focuses on developing computer programs that can access data and use it to learn for themselves.
Assumptions in Machine Learning
• If the assumptions behind a model are not met, the model may reflect the data inaccurately and is likely to produce inaccurate predictions.
• The main aspects to check are
– Diagnostics.
– Multicollinearity.
– Dataset distributions.
– Outliers.
Diagnostics
• Diagnostics are used to evaluate the model assumptions and to find out whether any observations have a large, undue influence on the analysis.
• They are mainly used in regression analysis (how the dependent variable Y changes when one of the independent X variables changes).
Multicollinearity
• Multicollinearity occurs when a dataset's features (the X variables) are not independent of each other.
• It is a major problem in regression analysis, because correlated predictors make individual coefficient estimates unstable and hard to interpret.
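One common way to check for multicollinearity is to look at pairwise correlations and at the variance inflation factor (VIF) of a predictor. The sketch below uses R's built-in `mtcars` data as a stand-in dataset (an assumption; it is not part of the original slides) and computes the VIF by hand with base R only.

```r
# Built-in mtcars data: weight (wt), displacement (disp), horsepower (hp)
predictors <- mtcars[, c("wt", "disp", "hp")]

# Pairwise correlations: values near +1 or -1 signal related predictors
cor_matrix <- cor(predictors)
print(round(cor_matrix, 2))

# Variance inflation factor for wt: regress wt on the other predictors,
# then VIF = 1 / (1 - R^2)
r_squared <- summary(lm(wt ~ disp + hp, data = predictors))$r.squared
vif_wt <- 1 / (1 - r_squared)
print(vif_wt)
```

A VIF of 1 means the predictor is unrelated to the others; larger values indicate stronger multicollinearity (thresholds such as 5 or 10 are common rules of thumb, not fixed rules).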
Dataset Distribution
• The distribution of a dataset shows the possible values of a characteristic of a population and how often each occurs.
• The normal distribution is the one most commonly assumed.
Sample Normal Distribution
[Figure: bell-shaped normal density curve (Series1); x-axis from -4 to 4, y-axis from 0 to 0.45]
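A curve like the one on this slide can be reproduced with a few lines of base R. This is a sketch assuming a standard normal distribution (mean 0, standard deviation 1), which matches the axis ranges of the figure; the step size of 0.1 is an arbitrary choice.

```r
# Evaluate the standard normal density on a grid from -4 to 4
x <- seq(-4, 4, by = 0.1)
dens <- dnorm(x, mean = 0, sd = 1)

# Draw the bell curve
plot(x, dens, type = "l",
     xlab = "x", ylab = "Density",
     main = "Sample Normal Distribution")

# The curve peaks at the mean, at a height of 1 / sqrt(2 * pi)
peak <- max(dens)
print(peak)
```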
Outliers
• Outliers can greatly influence a model and reduce its effectiveness.
• The mean is more sensitive to outliers than the median.
• Outliers can be identified using a box plot.
• Eg:
– Series 1: 3, 5.0, 5.1, 5.2, 5.3, 5.3, 5.4, 5.7, 5.8, 5.9,
– Series 2: 2.1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
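The effect of an outlier on the mean and the median can be checked directly. The sketch below uses the ten Series 1 values listed on the slide and base R's `boxplot.stats`, which flags points lying more than 1.5 times the interquartile range beyond the hinges (the same rule a box plot uses for its whiskers).

```r
# Series 1 values as listed on the slide
series1 <- c(3, 5.0, 5.1, 5.2, 5.3, 5.3, 5.4, 5.7, 5.8, 5.9)

# Points a box plot would draw as outliers
flagged <- boxplot.stats(series1)$out
print(flagged)

# Compare the mean and median with and without the flagged points
trimmed <- series1[!series1 %in% flagged]
cat("mean:  ", mean(series1), "->", mean(trimmed), "\n")
cat("median:", median(series1), "->", median(trimmed), "\n")

# The same information, drawn as a box plot
boxplot(series1, horizontal = TRUE, main = "Series 1")
```

Here the value 3 is flagged. Removing it shifts the mean while the median stays the same, which illustrates why the mean is described as more sensitive to outliers.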
BoxPlot
[Figure: plot of Series 1 and Series 2 over observations 1 to 11; y-axis from -5 to 30]
Usage of Machine Learning
• Virtual personal assistants:
– AI-based assistants. Eg: Google Home (speaker), Google Allo (mobile app).
• Predictions in commuting: Eg: traffic predictions.
• CCTV surveillance: Eg: helping to detect theft.
• Social media services:
– Eg: friend suggestions based on mutual friends, face recognition, etc.
Usage of Machine Learning
• Email spam and malware filtering.
– Uses ML classification algorithms. Eg: decision trees, multilayer perceptrons, etc.
• Online customer support
– Uses AI-based chatbots built with ML; such chatbots can also be created using AWS.
– Eg: banking, insurance
• Search engine result refining.
– Eg: Google uses many algorithms and updates, such as Google Panda, Google Penguin, Google Hummingbird, Google Pigeon, Google Mobile, Google RankBrain, Google Possum, and Google Fred.
Supervised Learning
• Supervised learning algorithms require a data analyst with machine learning skills to provide both the inputs and the desired outputs, and to give feedback on the accuracy of the predictions.
• Supervised learning algorithms work on labelled data.
• The workflow has 3 parts
– Extraction
– Training
– Prediction
Supervised Learning Workflow
List of Supervised Algorithms
• A few supervised algorithms are listed below.
– Decision trees
– Naive Bayes classification
– Support vector machines for classification problems
– Random forest for classification and regression problems
– Linear regression for regression problems
– Ordinary least squares regression
– Logistic regression
– Ensemble methods
Unsupervised Learning
• Unsupervised learning algorithms do not need to be trained with desired outcome data; instead, they examine the data and find structure in it (such as groups or patterns) on their own.
• Unsupervised learning works on unlabelled data.
• Used in applications such as image processing and speech-to-text conversion, often together with neural networks.
Unsupervised Learning Workflow
List of Unsupervised Learning Algorithms
• Some common unsupervised algorithms are listed below
– K-means for clustering problems
– Apriori algorithm for association rule learning problems
– Principal Component Analysis.
– Singular Value Decomposition.
– Independent Component Analysis.
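As an illustration of the first algorithm in the list, here is a minimal k-means sketch in base R. It assumes the built-in `iris` data as a stand-in dataset and a choice of 3 clusters; neither comes from the original slides. The species labels are not given to the algorithm and are used only afterwards to inspect the clusters.

```r
# Fix the random seed so that the random starting centers are repeatable
set.seed(1)

# Use only the four numeric measurements; the Species label is withheld
features <- iris[, 1:4]

# Fit k-means with 3 clusters, trying 10 random starts and keeping the best
fit <- kmeans(features, centers = 3, nstart = 10)

# Compare the discovered clusters with the withheld labels
print(table(cluster = fit$cluster, species = iris$Species))
```

Cluster numbers are arbitrary identifiers, so only the grouping matters, not which cluster is called 1, 2 or 3.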
Supervised vs Unsupervised Learning Algorithms
Aspect                   | Supervised learning           | Unsupervised learning
Input data               | Labelled data                 | Unlabelled data
Computational complexity | Very high                     | Lower
Real-time usage          | Off-line analysis             | Real-time analysis
Number of classes        | Known (fixed)                 | Unknown
Accuracy of results      | Accurate and reliable         | Moderately accurate and reliable
Category                 | Classification and regression | Clustering and association rule mining
Classification
• Classification is a technique in which we categorize data into a given number of classes.
• Classification-based machine learning algorithms include
– Decision trees.
– Bayesian classifiers.
– Neural networks.
– K-nearest neighbors.
– Support vector machines.
– Logistic regression.
Working in RStudio (COVID-19 dataset)
# Read the state-wise COVID-19 CSV (adjust the path to your download location)
covid_india <- read.csv("C:/Users/Admin/Downloads/covid19-in-india/covid_19_india.csv", header = TRUE)
# Count the number of records for each state / union territory
state <- table(covid_india$State.UnionTerritory)
# Draw the counts as a bar plot
barplot(state)
State-wise count (obtained from RStudio)
[Figure: bar plot of record counts per state/union territory]
Decision Trees
• A decision tree is a tree with the following properties.
– An inner node represents an attribute.
– An edge represents an outcome of the test on the attribute of its parent node.
– A leaf represents one of the classes.
• A decision tree is constructed from training data.
Types of Decision Tree
• Binary variable decision tree: a decision tree with a binary target variable. Eg: will you play chess? (Yes/No)
• Continuous variable decision tree: a decision tree with a continuous target variable. Eg: predicting the premium amount that a customer of an insurance company will pay.
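R code for creating a decision tree
# creditcard_data is assumed to be a data frame already loaded, with a 'Class' label column
library(rpart)        # recursive partitioning (decision trees)
library(rpart.plot)   # plotting of rpart trees
# Fit a classification tree predicting Class from all other columns
decisionTree_model <- rpart(Class ~ ., creditcard_data, method = 'class')
# Predicted class labels and class probabilities for the same data
predicted_val <- predict(decisionTree_model, creditcard_data, type = 'class')
probability <- predict(decisionTree_model, creditcard_data, type = 'prob')
# Draw the fitted tree
rpart.plot(decisionTree_model)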
Decision Tree for credit card dataset
Advantages of decision trees
• Very easy to understand.
• Easy data exploration.
• Less data cleaning is required.
• Accepts all data types (qualitative or quantitative).
Disadvantage of Decision Trees
• Overfitting.
• Less suited to continuous variables.
• The random forest algorithm is used to overcome these drawbacks.
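Overfitting shows up as a gap between accuracy on the training data and accuracy on data the tree has not seen. The sketch below illustrates one way to check this with a simple hold-out split. It assumes the built-in `iris` data as a stand-in (the credit card dataset from the earlier slide is not available here) and an arbitrary split that holds out every third row.

```r
library(rpart)

# Hold out every third row for testing; train on the rest
test_rows <- seq(3, nrow(iris), by = 3)
train <- iris[-test_rows, ]
test  <- iris[test_rows, ]

# Fit a classification tree on the training rows only
model <- rpart(Species ~ ., data = train, method = "class")

# Accuracy on seen (training) and unseen (test) data
train_acc <- mean(predict(model, train, type = "class") == train$Species)
test_acc  <- mean(predict(model, test,  type = "class") == test$Species)

cat("training accuracy:", train_acc, "\n")
cat("test accuracy:    ", test_acc, "\n")
```

A training accuracy that is much higher than the test accuracy would suggest that the tree has memorized the training data rather than learned a general rule.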
Random Forest Algorithm
• To be discussed in tomorrow's session.
Regression Models
• Logistic Regression.
• Linear Regression.
Regression
• Regression is a supervised machine learning technique in which the output variable is continuous.
• Ex: predicting product sales, stock price, temperature, house price, etc.
What is Linear Regression:
– It is a way of finding the relationship between a single continuous variable, called the dependent or target variable, and one or more other variables (continuous or not), called independent variables.
– For a single independent variable the model is $y = a + bx$
• where y is the dependent variable
• x is the independent variable
• b is the slope: how much the line rises for each unit increase in x
• a is the intercept: the value of y when x = 0.
Simple Linear Regression:
• When there is a single independent variable, we call it simple linear regression.
• Ex: height (input) --> weight; experience (input) --> salary
Multiple Linear Regression:
• When there are multiple independent variables, we call it multiple linear regression.
• Ex: sq ft, number of bedrooms, location, brand, floor rise, etc. --> predict house price
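Both kinds of model can be fitted with R's `lm` function. The sketch below mirrors the height --> weight example using the built-in `women` dataset, and uses the built-in `mtcars` dataset for the multiple-predictor case; both datasets are stand-ins chosen for illustration and are not from the original slides.

```r
# Simple linear regression: weight explained by height
simple_fit <- lm(weight ~ height, data = women)
a <- coef(simple_fit)[["(Intercept)"]]  # intercept: value of y when x = 0
b <- coef(simple_fit)[["height"]]       # slope: change in y per unit of x
cat("weight =", a, "+", b, "* height\n")

# Predict the weight for a new height of 66 (same units as the data)
pred <- predict(simple_fit, newdata = data.frame(height = 66))
print(pred)

# Multiple linear regression: fuel economy explained by weight and horsepower
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(multi_fit))
```

The prediction is simply the intercept plus the slope times the new height, which is the straight-line relationship described by the slope and intercept above.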
Estimate beta coefficients
Ordinary Least Squares (OLS):
• The objective of OLS is to minimize the sum of squared residuals: $\sum \text{error}^2 = \sum (Y_{act} - Y_{pred})^2$
• $\hat{\beta} = (X^T X)^{-1} X^T Y$; the related hat matrix $H = X (X^T X)^{-1} X^T$ maps Y to the fitted values, $\hat{Y} = HY$
• This makes use of linear algebra (matrices)
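The matrix formula for the coefficients can be checked against `lm` directly. This is a sketch using the built-in `women` dataset as a stand-in; the column of ones in `X` represents the intercept. Explicitly inverting the matrix is done here only to mirror the formula; `lm` itself uses a more numerically stable decomposition.

```r
# Response vector and design matrix (column of 1s for the intercept)
y <- women$weight
X <- cbind(1, women$height)

# Beta = Inverse(Xtranspose * X) * Xtranspose * Y
XtX_inv <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y

# Hat matrix: turns the observed Y into the fitted values
H <- X %*% XtX_inv %*% t(X)
y_hat <- H %*% y

# Compare with R's built-in least-squares fit
fit <- lm(weight ~ height, data = women)
print(cbind(by_hand = as.numeric(beta_hat), by_lm = unname(coef(fit))))
```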
Variable Selection Methods (for regression only):
• Forward selection: starts with a single variable, then adds other variables one at a time based on AIC values (AIC: Akaike Information Criterion, a model performance measure).
• Backward elimination: starts with all variables, then iteratively removes variables of low importance based on AIC values.
• Stepwise regression (bidirectional): runs in both directions.
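All three strategies are available through base R's `step` function, which compares candidate models by AIC. The sketch below uses the built-in `mtcars` dataset as a stand-in. One difference from the description above is an assumption of this example: the forward and bidirectional searches start from an intercept-only model rather than from a single chosen variable.

```r
# Largest and smallest models considered
full <- lm(mpg ~ ., data = mtcars)   # all predictors
null <- lm(mpg ~ 1, data = mtcars)   # intercept only

# Backward elimination: start from the full model and drop terms
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the null model and add terms
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)

# Stepwise (both directions): terms may be added or dropped at each step
both <- step(null, scope = formula(full), direction = "both", trace = 0)

# Inspect which variables each strategy kept
print(formula(backward))
print(formula(forward))
print(formula(both))
```

The three strategies do not necessarily end at the same model, since each one explores a different path through the candidate models.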
How to find the best regression line, the line of best fit:
• The regression line establishes a relationship between the independent and dependent variables.
• The line that explains this relationship best is said to be the best-fit line.
• In other words, the best-fit line tends to return the most accurate value of Y based on X, i.e. it gives the minimum difference between the actual and predicted values of Y (the lowest prediction error).
Assumptions in regression:
• Regression is a parametric approach: it makes assumptions about the data for the purpose of analysis.
• The relationship is linear and additive (the effect of one variable x1 on Y is independent of the other variables).
• There should be no correlation between the residual terms; such correlation is called autocorrelation (common in time series).
• The independent variables should not be correlated with each other; such correlation is called multicollinearity.
• Error terms must have constant variance.
– Constant variance --> homoscedasticity;
– Non-constant variance --> heteroscedasticity
• Error terms must be normally distributed.
Errors
• Sum of all errors: $\sum \text{error} = \sum (Y - \hat{Y})$, where each error = actual - predicted
• Sum of absolute values of all errors: $\sum |\text{error}|$
• Sum of squares of all errors: $\sum \text{error}^2$
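The three error totals behave quite differently, which a short calculation makes visible. This sketch fits a line to the built-in `women` dataset (a stand-in chosen for illustration) and computes each total from the residuals.

```r
# Fit a least-squares line and compute the errors (actual - predicted)
fit <- lm(weight ~ height, data = women)
errors <- women$weight - fitted(fit)

# Positive and negative errors cancel out in the plain sum
sum_errors <- sum(errors)
# Absolute values and squares do not cancel
sum_abs <- sum(abs(errors))
sum_sq  <- sum(errors^2)

cat("sum of errors:         ", sum_errors, "\n")
cat("sum of absolute errors:", sum_abs, "\n")
cat("sum of squared errors: ", sum_sq, "\n")
```

For a least-squares fit with an intercept, the plain sum of errors is zero up to rounding, so it says nothing about fit quality. That is why the absolute or squared totals are used instead.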
Logistic Regression
• The logistic regression technique is borrowed by machine learning from the field of statistics.
• It is the go-to method for binary classification (2 class values: S/F, Y/N, ...).
• Logistic regression (also called logit regression or the logit model) is a regression model in which the dependent variable is categorical.
• It measures the relationship between a categorical dependent variable (DV) and one or more independent variables by estimating probabilities using a logistic function.
• It is used to predict a binary outcome given a set of independent variables.
• Logistic regression can be seen as a special case of GLM (Generalized Linear Models) and is thus similar to linear regression.
• The key differences are:
– Predicted values are probabilities and are therefore restricted to (0, 1) through the logistic distribution function.
– The conditional distributions P(Y=0 | X) and P(Y=1 | X) follow a Bernoulli distribution rather than a Gaussian distribution.
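In R, logistic regression is fitted with `glm` and the binomial family, which reflects its place among generalized linear models. The sketch below uses the built-in `mtcars` dataset as a stand-in, predicting the binary transmission type `am` (0 or 1) from car weight; the 0.5 cut-off for turning probabilities into classes is a conventional choice, not a fixed rule.

```r
# Fit a logistic regression: binary outcome am explained by weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit))

# Predicted values are probabilities, so they lie strictly between 0 and 1
prob <- predict(fit, type = "response")
print(range(prob))

# Convert probabilities to class labels with a 0.5 threshold
pred_class <- ifelse(prob > 0.5, 1, 0)
print(table(actual = mtcars$am, predicted = pred_class))
```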
Applications
• Email: spam/No spam
• Online transaction: F/NF
• Customer churn: (R/E)
• HR status: J/NJ
• Credit scoring: D/ND
Advantages
• Highly interpretable
• Outputs are well-calibrated predicted probabilities
• Model training and prediction are fast
• Features don't need scaling
• Can perform well with a small number of observations
Probability to log of odds:
• Let Y be the primary outcome variable, with two values: S/F, 1/0, ...
• Let P be the probability that Y is 1, P(Y=1); then the probability that Y is 0 is P(Y=0) = 1 - P
• Let X1, X2, ..., Xn be the set of predictor variables
• Let B1, B2, ..., Bn be the model coefficients
Logit function:
• Logistic regression estimates the logit function.
• The logit function is simply the log of the odds in favour of the event: $\text{logit}(P) = \ln\frac{P}{1 - P}$
• Its inverse, the logistic function, produces an S-shaped curve for the probability estimate.
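The step from probability to log of odds and back can be written out as follows. This is a sketch of the standard derivation; the intercept term $B_0$ is an assumption added here, since the slide lists only the coefficients B1 to Bn.

```latex
\begin{align}
\text{odds} &= \frac{P}{1 - P} \\
\operatorname{logit}(P) &= \ln\frac{P}{1 - P}
  = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \\
P &= \frac{1}{1 + e^{-(B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n)}}
\end{align}
```

The last line is obtained by solving the second for P. It is this logistic function of the linear predictor that traces the S-shaped curve, keeping P between 0 and 1 for any values of the predictors.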
In general, we can use the following to evaluate classification models:
• Confusion matrix (sensitivity, specificity, F1, ...)
• K-fold cross-validation
• AUC-ROC (Area Under the Curve, Receiver Operating Characteristic): the closer this score is to 1, the better
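The confusion-matrix measures can be computed by hand from a table of actual against predicted labels. The sketch below uses ten made-up labels (hypothetical values, chosen only so the arithmetic is easy to follow), with 1 as the positive class.

```r
# Hypothetical actual and predicted labels (1 = positive, 0 = negative)
actual    <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
predicted <- c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0)

# Confusion matrix: rows are actual classes, columns are predicted classes
cm <- table(actual = actual, predicted = predicted)
print(cm)

tp <- cm["1", "1"]  # true positives
tn <- cm["0", "0"]  # true negatives
fp <- cm["0", "1"]  # false positives
fn <- cm["1", "0"]  # false negatives

sensitivity <- tp / (tp + fn)  # share of actual positives that were found
specificity <- tn / (tn + fp)  # share of actual negatives that were found
precision   <- tp / (tp + fp)  # share of predicted positives that are correct
f1 <- 2 * precision * sensitivity / (precision + sensitivity)

cat("sensitivity:", sensitivity, "\n")
cat("specificity:", specificity, "\n")
cat("F1 score:   ", f1, "\n")
```

With these labels there are 4 true positives, 4 true negatives, 1 false positive and 1 false negative, so sensitivity, specificity and F1 all come out at 0.8.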
Queries & Suggestions
• Feel free to mail me at jayaramb05@gmail.com.
Thank you
5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 72

More Related Content

What's hot

What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
Simplilearn
 

What's hot (20)

Datascience and python
Datascience and pythonDatascience and python
Datascience and python
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Ai lecture 06(unit-02)
Ai lecture 06(unit-02)Ai lecture 06(unit-02)
Ai lecture 06(unit-02)
 
Machine Learning for Improving Disaster Management and Response (WPS313) - AW...
Machine Learning for Improving Disaster Management and Response (WPS313) - AW...Machine Learning for Improving Disaster Management and Response (WPS313) - AW...
Machine Learning for Improving Disaster Management and Response (WPS313) - AW...
 
Covering (Rules-based) Algorithm
Covering (Rules-based) AlgorithmCovering (Rules-based) Algorithm
Covering (Rules-based) Algorithm
 
Local beam search example
Local beam search exampleLocal beam search example
Local beam search example
 
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data science
Data scienceData science
Data science
 
Artificial Intelligence Overview Powerpoint Presentation Slides
Artificial Intelligence Overview Powerpoint Presentation SlidesArtificial Intelligence Overview Powerpoint Presentation Slides
Artificial Intelligence Overview Powerpoint Presentation Slides
 
Data science
Data scienceData science
Data science
 
Text to speech with Google Cloud
Text to speech with Google CloudText to speech with Google Cloud
Text to speech with Google Cloud
 
Semantic AI
Semantic AISemantic AI
Semantic AI
 
Data science
Data scienceData science
Data science
 
Artificial intelligence & machine learning landscape
Artificial intelligence & machine learning landscapeArtificial intelligence & machine learning landscape
Artificial intelligence & machine learning landscape
 
FDP on AI and ML by R. Rajkumar
FDP on AI and ML by  R. RajkumarFDP on AI and ML by  R. Rajkumar
FDP on AI and ML by R. Rajkumar
 
Natural language Processing.pptx
Natural language Processing.pptxNatural language Processing.pptx
Natural language Processing.pptx
 
Artificial intelligence and expert system.ppt
Artificial intelligence and expert system.pptArtificial intelligence and expert system.ppt
Artificial intelligence and expert system.ppt
 
Q-learning
Q-learningQ-learning
Q-learning
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 

Similar to Fdp ppt

ML UNIT-I.ppt
ML UNIT-I.pptML UNIT-I.ppt
ML UNIT-I.ppt
Gskeitb
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
javed75
 

Similar to Fdp ppt (20)

Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Week 2 lecture
Week 2 lectureWeek 2 lecture
Week 2 lecture
 
1-Intro to MIS
1-Intro to MIS1-Intro to MIS
1-Intro to MIS
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makers
 
From DevOps to MLOps: practical steps for a smooth transition
From DevOps to MLOps: practical steps for a smooth transitionFrom DevOps to MLOps: practical steps for a smooth transition
From DevOps to MLOps: practical steps for a smooth transition
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 
ML UNIT-I.ppt
ML UNIT-I.pptML UNIT-I.ppt
ML UNIT-I.ppt
 
Machine Learning using Big data
Machine Learning using Big data Machine Learning using Big data
Machine Learning using Big data
 
Oct2019 - What is machine learning?
Oct2019 - What is machine learning?Oct2019 - What is machine learning?
Oct2019 - What is machine learning?
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
 
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
 
AI in Business: Opportunities & Challenges
AI in Business: Opportunities & ChallengesAI in Business: Opportunities & Challenges
AI in Business: Opportunities & Challenges
 
Wrap up
Wrap upWrap up
Wrap up
 
Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
Test-Driven Machine Learning
Test-Driven Machine LearningTest-Driven Machine Learning
Test-Driven Machine Learning
 
Essential concepts for machine learning
Essential concepts for machine learning Essential concepts for machine learning
Essential concepts for machine learning
 
8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
 

Recently uploaded

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Recently uploaded (20)

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

Fdp ppt

  • 1. Machine Learning By B.JAYARAM Assistant Professor Department of Computer Science and Engineering Malla Reddy Institute of Technology Hyderabad - 500055 5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 1
  • 2. 5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 2
  • 3. Contents • Machine Learning. • Usage of Machine Learning. • Supervised vs Unsupervised Learning. • Classification. • Regression Models. • Decision trees. • Random Forest. • Logistic Regression. 5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 3
  • 4. Machine Learning • What is data? • Where the data is available? • Types of Data. • What is data analytics? • What way data is related to machine learning?. • Architecture of Machine learning Model. 5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 4
  • 5. What is Data? • Collection of information stored in a particular file. – Structured form • Any form of relational database structure where relation between attributes is possible. Eg: using database programming languages (SQL, Oracle, Mysql etc). – Unstructured form. • Any form of data that does not have predefined structure. Eg: video, images, Comments, posts, few websites such as blogs and wikipedia 5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 5
  • 6. Machine Learning • Where the data is available? – There are lot of sources of data available. – Primary source of data • Eg: data created by individual or a business concern on their own. – Secondary source of data • Eg: data can be extracted from cloud servers, website sources (kaggle, UCI, AWS, google cloud, Twitter, Facebook, youtube, Github etc..) 5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 6
  • 7. Machine Learning • What is data? • Where the data is available? • Types of Data. • What is data analytics? • What way data is related to machine learning?. • Architecture of Machine learning Model. 5/16/2020 FDP ON HADOOP AND MACHINE LEARNING 7
  • 8. Machine Learning • Types of data
  • 9. Qualitative Data • Nominal data. – The values of the attribute have no natural ordering. – Eg: color, gender, nouns (name, place, animal, thing) • Ordinal data. – The values of the attribute have a natural ordering. – Eg: size (S, M, L, XL, XXL), rating (excellent, good, average, poor)
  • 10. Quantitative Data • Discrete attribute: – Takes only a finite or countable set of numerical values (typically integers). – Eg: number of buttons, number of days for product delivery, etc. • Continuous attribute: – Can take any real (fractional) value within a range, so infinitely many values are possible. – Eg: price, discount, height, weight, length, temperature, speed, etc.
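The four data types on the two slides above map onto R's basic column types. The sketch below uses made-up toy values (the variable names and numbers are assumptions, not from the deck) to show how each type might be stored.

```r
# Hypothetical toy values; only the four data types come from the slides.
color   <- factor(c("red", "green", "red"))          # nominal: levels have no order
size    <- factor(c("M", "S", "XL"),
                  levels = c("S", "M", "L", "XL", "XXL"),
                  ordered = TRUE)                     # ordinal: levels are ordered
buttons <- c(4L, 6L, 5L)                              # discrete: integer counts
price   <- c(499.99, 1250.50, 89.00)                  # continuous: real values

is.ordered(color)   # nominal factors are unordered
is.ordered(size)    # ordinal factors are ordered
size < "L"          # comparisons are only meaningful for ordered factors
max(size)           # largest level present in the data
```

Declaring the level order explicitly matters: without `levels =`, R would sort the sizes alphabetically.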
  • 11. Sample Dataset • Covid-19 dataset (state-wise in India)
  • 13. Machine Learning • What is data analytics? – Data analytics is the science of analyzing raw data in order to draw conclusions about that information. ... This information can then be used to optimize processes to increase the overall efficiency of a business or system. Types: – Descriptive analytics. Eg: observation, case study, surveys. – Predictive analytics. Eg: healthcare, sports, weather, insurance, social media analysis. – Prescriptive analytics. Eg: healthcare, banking.
  • 15. Machine Learning Cont… • How is data related to machine learning? • Architecture of a machine learning model. • Data analytics is a subcomponent of machine learning.
  • 16. Machine Learning • Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. • Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
  • 17. Assumptions in Machine Learning • If the assumptions are not met, the model may reflect the data inaccurately and is likely to give inaccurate predictions. • Areas to check: – Diagnostics. – Multicollinearity. – Dataset distributions. – Outliers.
  • 18. Diagnostics • Diagnostics are used to evaluate the model assumptions and to find out whether any observations have a large, undue influence on the analysis. • They are mainly used in regression analysis (how the dependent variable Y changes when one of the independent X variables changes).
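One common way to look for observations with undue influence is Cook's distance. The sketch below is an illustration on made-up data (the `x` and `y` values are assumptions); it fits a line, finds the most influential point, and refits without it.

```r
# Hypothetical data: ten points close to y = 2x, plus one point far off the line.
d <- data.frame(
  x = c(1:10, 30),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9, 10)
)

fit   <- lm(y ~ x, data = d)
cooks <- cooks.distance(fit)                 # influence of each observation on the fit
most_influential <- unname(which.max(cooks))

# Refit without the most influential observation and compare the slopes.
fit_without <- lm(y ~ x, data = d[-most_influential, ])
coef(fit)
coef(fit_without)
```

A large change in the coefficients after dropping a single row is the sign of undue influence that the slide describes.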
  • 19. Multicollinearity • Multicollinearity occurs when a dataset's features (X variables) are not independent of each other. • It is a major problem in regression analysis, because the effects of correlated features on Y cannot be separated reliably.
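Multicollinearity can be checked with a correlation matrix or with the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the other features. The sketch below computes VIF by hand in base R; the choice of the built-in `mtcars` dataset and of these four columns is an assumption made only for illustration.

```r
# Illustrative feature set from the built-in mtcars data.
predictors <- mtcars[, c("disp", "hp", "wt", "drat")]

# Pairwise correlations between the features.
round(cor(predictors), 2)

# VIF for each feature: regress it on all the others and use that R^2.
vif <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```

A VIF of 1 means a feature is uncorrelated with the rest; larger values indicate stronger multicollinearity.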
  • 20. Dataset Distribution • The distribution of a dataset shows the possible values of a characteristic of a population and how often each occurs. • The normal distribution is the one most commonly assumed.
  • 21. Sample Normal Distribution [Figure: bell-shaped normal curve (Series1); x-axis from -4 to 4, y-axis from 0 to 0.45]
  • 22. Outliers • Outliers can greatly influence a model and alter its effectiveness. • The mean is more sensitive to outliers than the median. • Outliers can be identified using a box plot. • Eg: – Series 1: 3, 5.0, 5.1, 5.2, 5.3, 5.3, 5.4, 5.7, 5.8, 5.9 – Series 2: 2.1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
  • 23. BoxPlot [Figure: plot of Series1 and Series2; x-axis from 1 to 11, y-axis from -5 to 30]
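The two series from the Outliers slide can be checked in R. This is a sketch using base R's `boxplot.stats()`, which applies the usual 1.5 x IQR box-plot rule; choosing that rule is an assumption, since the slides do not say how the box plot was configured.

```r
series1 <- c(3, 5.0, 5.1, 5.2, 5.3, 5.3, 5.4, 5.7, 5.8, 5.9)
series2 <- c(2.1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)

# The mean is pulled towards the low value; the median is not.
mean(series1)
median(series1)

# Points beyond the box-plot whiskers (1.5 x IQR rule).
out1 <- boxplot.stats(series1)$out
out2 <- boxplot.stats(series2)$out
out1
out2

# Mean of series 1 after dropping the flagged points.
mean(series1[!series1 %in% out1])
```

Whether a low first value counts as an outlier depends on the spread of the rest of the series, which is why the two series are worth comparing.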
  • 25. Usage of Machine Learning • Virtual personal assistants: – AI-based assistants. Eg: Google Home (speaker), Google Allo (mobile app), Amazon Alexa. • Predictions in commuting: Eg: traffic predictions. • CCTV surveillance: Eg: helping to detect theft. • Social media services: – Eg: mutual friends, face recognition, etc.
  • 26. Usage of Machine Learning • Email spam and malware filtering. – Uses ML classifiers alongside rule-based filters. Eg: decision trees, multilayer perceptrons, etc. • Online customer support – AI-based chatbots built with ML; these can also be created using AWS. – Eg: banking, insurance. • Search engine result refining. – Eg: Google's ranking algorithms and updates, such as Google Panda, Google Penguin, Google Hummingbird, Google Pigeon, Google Mobile, Google RankBrain, Google Possum, Google Fred.
  • 29. Supervised Learning • Supervised learning algorithms require a data analyst with machine learning skills to provide both the input and the desired output, and to give feedback on the accuracy of the predictions. • Supervised learning algorithms use labelled data. • It has 3 parts: – Extraction – Training – Prediction
  • 30. Supervised Learning Workflow
  • 31. List of Supervised Algorithms • A few supervised algorithms are listed below. – Decision trees – Naive Bayes classification – Support vector machines for classification problems – Random forest for classification and regression problems – Linear regression for regression problems – Ordinary least squares regression – Logistic regression – Ensemble methods
  • 32. Unsupervised Learning • Unsupervised learning algorithms do not need to be trained with desired outcome data; instead they look for structure in the data themselves (deep learning is one such approach). • Unsupervised learning uses unlabelled data. • Used in applications such as image processing and speech-to-text conversion, often through neural networks.
  • 33. Unsupervised Learning Workflow
  • 34. List of Unsupervised Learning Algorithms • Some common unsupervised algorithms are listed below. – K-means for clustering problems – Apriori algorithm for association rule learning problems – Principal component analysis – Singular value decomposition – Independent component analysis
  • 35. Supervised vs Unsupervised Learning Algorithms
      Criterion | Supervised learning algorithm | Unsupervised learning algorithm
      Input data | Labelled data | Unlabelled data
      Computational complexity | Very high | Lower
      Real-time usage | Uses off-line analysis | Uses real-time analysis
      Number of classes | Known (fixed) | Unknown
      Accuracy of results | Accurate and reliable | Moderate and reliable
      Category | Classification and regression | Clustering and association rule mining
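The difference in input data can be seen in a few lines of R. This is a minimal sketch on the built-in `iris` dataset (the dataset, the chosen columns and `centers = 3` are assumptions for illustration): the supervised model is given a target column, while k-means only sees the features.

```r
features <- iris[, 1:4]

# Supervised: the target (Petal.Width) is supplied during training.
supervised <- lm(Petal.Width ~ Petal.Length, data = iris)
predict(supervised, newdata = data.frame(Petal.Length = 4.5))

# Unsupervised: only the features are supplied; the Species labels are not used.
set.seed(1)  # k-means starts from random centres
clusters <- kmeans(features, centers = 3, nstart = 10)

# The labels are used only afterwards, to inspect what the clusters found.
table(cluster = clusters$cluster, species = iris$Species)
```

The cluster numbers themselves are arbitrary; only the grouping is meaningful.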
  • 37. Classification • Classification is a technique where we categorize data into a given number of classes. • Classification-based machine learning algorithms are – Decision trees. – Bayesian classifiers. – Neural networks. – K-nearest neighbor. – Support vector machines. – Logistic regression.
  • 38. Working using RStudio (Covid-19 dataset) • covid_india <- read.csv("C:/Users/Admin/Downloads/covid19-in-india/covid_19_india.csv", header = TRUE) • state <- table(covid_india$State.UnionTerritory) • barplot(state)
  • 39. State-wise count (obtained from RStudio)
  • 40. Decision Trees • A decision tree is a tree with the following properties. – An inner node represents an attribute. – An edge represents an outcome of the test on the attribute of its parent node. – A leaf represents one of the classes. • Construction of the decision tree is based on training data.
  • 41. Types of Decision Tree • Binary variable decision tree: a decision tree which has a binary target variable. Eg: will you play chess? (Yes/No) • Continuous variable decision tree: a decision tree which has a continuous target variable. Eg: predicting a customer's income, which an insurance company can then use to judge whether the customer will pay the premium.
  • 42. R code for creating decision tree • library(rpart) • library(rpart.plot) • decisionTree_model <- rpart(Class ~ ., creditcard_data, method = 'class') • predicted_val <- predict(decisionTree_model, creditcard_data, type = 'class') • probability <- predict(decisionTree_model, creditcard_data, type = 'prob') • rpart.plot(decisionTree_model)
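The slide's code assumes a `creditcard_data` data frame that is not shown. As a self-contained variant, the sketch below runs the same `rpart` calls on the built-in `iris` dataset and, as an addition not on the slide, holds out part of the data so the accuracy is measured on rows the tree has not seen. The dataset, the 100/50 split and the seed are all assumptions.

```r
library(rpart)

set.seed(42)                                  # reproducible train/test split
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Same pattern as the slide: target ~ all other columns, classification tree.
tree_model <- rpart(Species ~ ., data = train, method = "class")

pred_class <- predict(tree_model, test, type = "class")  # predicted labels
pred_prob  <- predict(tree_model, test, type = "prob")   # class probabilities

# Share of held-out rows classified correctly.
accuracy <- mean(pred_class == test$Species)
accuracy
```

Evaluating on held-out rows rather than on the training data gives a fairer picture when a tree overfits.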
  • 43. Decision Tree for credit card dataset
  • 44. Advantages of decision trees • Very easy to understand. • Easy data exploration. • Less data cleaning is required. • All data types are accepted (qualitative or quantitative).
  • 45. Disadvantages of Decision Trees • Overfitting. • Less suited to continuous variables. • We use the random forest algorithm to overcome these drawbacks.
  • 47. Random Forest Algorithm • To be discussed in tomorrow's session.
  • 49. Regression Models • Logistic Regression. • Linear Regression.
  • 50. Regression • Regression is a supervised machine learning technique where the output variable is continuous. • Ex: predicting product sales, stock price, temperature, house price, etc. What is Linear Regression: – It is a way of finding a relationship between a single continuous variable, called the dependent or target variable, and one or more other variables (continuous or not), called independent variables.
  • 52. Simple linear regression fits the line $y = a + bx$, where • y is the dependent variable • x is the independent variable • b is the slope: how much the line rises for each unit increase in x • a is the intercept: the value of y when x = 0. Simple Linear Regression: when there is a single independent variable, we call it simple linear regression. • Ex: Height (input) --> Weight; Experience (input) --> Salary
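The slope and intercept described above can be estimated with R's `lm()`. The sketch below uses the Height --> Weight example with made-up numbers (the five data points are assumptions, not from the deck).

```r
# Hypothetical heights (cm) and weights (kg).
people <- data.frame(
  height = c(150, 160, 170, 180, 190),
  weight = c(50, 59, 65, 75, 81)
)

fit <- lm(weight ~ height, data = people)

a <- coef(fit)[["(Intercept)"]]  # intercept: value of y when x = 0
b <- coef(fit)[["height"]]       # slope: change in y per unit increase in x
c(intercept = a, slope = b)

# Predicted weight for a new height of 175 cm, i.e. a + b * 175.
predict(fit, newdata = data.frame(height = 175))
```

Here the intercept has no physical meaning (nobody has height 0); it only positions the line.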
  • 54. Multiple Linear Regression: • When there are multiple independent variables, we call it multiple linear regression. • Ex: sqft, number of bedrooms, location, brand, floor rise, etc. --> predict house price
  • 55. Estimating beta coefficients with Ordinary Least Squares (OLS): • The objective of OLS is to minimize the sum of squared residuals, $\sum \text{error}^2 = \sum (Y_{act} - Y_{pred})^2$. • $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$; the hat matrix $H = X(X^{T}X)^{-1}X^{T}$ then gives the fitted values $\hat{Y} = HY$. • We make use of linear algebra (matrices).
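The matrix formula for the OLS coefficients can be checked directly against `lm()`. This sketch uses the built-in `mtcars` data with `wt` and `hp` as predictors of `mpg`; the dataset and column choice are assumptions for illustration.

```r
# Design matrix with a column of ones for the intercept.
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

# beta = (X'X)^-1 X'y, solved as a linear system instead of inverting explicitly.
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Hat matrix H = X (X'X)^-1 X' maps observed y to fitted y.
H <- X %*% solve(t(X) %*% X) %*% t(X)
y_fitted <- H %*% y

# Reference fit from R's built-in OLS routine.
fit <- lm(mpg ~ wt + hp, data = mtcars)
cbind(by_hand = as.vector(beta_hat), lm = unname(coef(fit)))
```

In practice `lm()` uses a QR decomposition rather than this formula, which is numerically more stable, but the estimates agree.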
  • 56. Variable Selection Methods (for regression only): • Forward selection: starts with a single variable, then adds other variables one at a time based on AIC values (AIC: Akaike Information Criterion, a model performance measure; lower is better). • Backward elimination: starts with all variables and iteratively removes the variables of low importance based on AIC values. • Stepwise regression (bi-directional): runs in both directions.
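Base R's `step()` function performs AIC-based selection in all three directions. The sketch below is one possible setup on the built-in `mtcars` data; the four candidate predictors are an assumption, and `step()` here starts forward selection from an intercept-only model rather than from a single variable.

```r
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # all candidate variables
null <- lm(mpg ~ 1, data = mtcars)                      # intercept only
search_scope <- list(lower = ~ 1, upper = ~ wt + hp + qsec + drat)

# Backward elimination: start from the full model and drop variables.
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the null model and add variables.
forward <- step(null, scope = search_scope, direction = "forward", trace = 0)

# Stepwise: variables may be added or dropped at each step.
both <- step(null, scope = search_scope, direction = "both", trace = 0)

formula(backward)
formula(forward)
c(full = AIC(full), backward = AIC(backward), forward = AIC(forward), both = AIC(both))
```

Setting `trace = 1` prints the AIC of every candidate model at each step, which is useful when presenting the method.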
  • 57. How to find the best regression line (the line of best fit): • The regression line establishes a relationship between the independent and dependent variables. • The line that explains this relationship best is the best fit line. • In other words, the best fit line returns the most accurate value of Y based on X, i.e. it gives the minimum difference between the actual and predicted values of Y (lowest prediction error).
  • 58. Assumptions in regression: • Regression is a parametric approach: it makes assumptions about the data for the purpose of analysis. • Linear and additive: the effect of one variable x1 on Y is independent of the other variables. • There should be no correlation between the residual terms; such correlation is called autocorrelation (common in time series). • Independent variables should not be correlated with each other; such correlation is called multicollinearity. • Error terms must have constant variance. – Constant --> homoscedasticity; – non-constant --> heteroscedasticity. • Error terms must be normally distributed.
  • 59. Errors • Sum of all errors: $\sum \text{error} = \sum (\text{Actual} - \text{Predicted}) = \sum (Y - \hat{Y})$ • Sum of absolute values of all errors: $\sum |\text{error}|$ • Sum of squares of all errors: $\sum \text{error}^2$
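The three error sums behave differently, which a small numeric example makes visible. The actual and predicted values below are made up for illustration (they are assumptions, not from the deck).

```r
# Hypothetical actual values and predictions from a fitted line.
actual    <- c(50, 59, 65, 75, 81)
predicted <- c(50.4, 58.2, 66.0, 73.8, 81.6)

errors <- actual - predicted

sum_errors     <- sum(errors)       # positive and negative errors cancel out
sum_abs_errors <- sum(abs(errors))  # total size of the errors
sum_sq_errors  <- sum(errors^2)     # the quantity OLS minimizes

c(sum = sum_errors, absolute = sum_abs_errors, squared = sum_sq_errors)
```

Because positive and negative errors cancel, the plain sum can be close to zero even when individual predictions are off, which is why the absolute and squared versions are used to judge a fit.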
  • 60. Logistic Regression • The logistic regression technique is borrowed by machine learning from the field of statistics. • It is the go-to method for binary classification (2 class values: S/F, Y/N, ...). • Logistic regression (also called logit regression or the logit model) is a regression model where the dependent variable is categorical.
  • 61. Logistic Regression • Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function. • It is used to predict a binary outcome given a set of independent variables.
  • 62. Logistic Regression • Logistic regression can be seen as a special case of GLM (Generalized Linear Models) and is thus similar to linear regression. • Key differences: – Predicted values are probabilities and are therefore restricted to the interval (0, 1) by the logistic function. – The conditional distribution of Y given X, i.e. P(Y=0 | X) and P(Y=1 | X), is a Bernoulli distribution rather than a Gaussian distribution.
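In R, logistic regression is fitted as a GLM with a binomial family. The sketch below predicts the binary `am` column (automatic or manual transmission) from car weight in the built-in `mtcars` data; the dataset, the predictor and the 0.5 cut-off are assumptions for illustration.

```r
# Binary outcome am (0/1) modelled from car weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

linear_predictor <- predict(fit, type = "link")      # b0 + b1 * wt (log odds)
prob             <- predict(fit, type = "response")  # P(am = 1), between 0 and 1

# The response is the logistic function applied to the linear predictor.
all.equal(prob, plogis(linear_predictor))

# Turn probabilities into class labels with a 0.5 cut-off.
pred_class <- as.integer(prob > 0.5)
table(actual = mtcars$am, predicted = pred_class)
```

The cut-off need not be 0.5; it can be moved to trade sensitivity against specificity.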
  • 63. Applications • Email: spam/no spam • Online transaction: F/NF • Customer churn: R/E • HR status: J/NJ • Credit scoring: D/ND
  • 64. Advantages • Highly interpretable • Outputs are well-calibrated predicted probabilities • Model training and prediction are fast • Features don't need scaling • Can perform well with a small number of observations
  • 65. Probability to log of odds ratio: • Let Y be the primary outcome variable, indicating S/F, 1/0, ... • Let P be the probability that Y is 1, P(Y=1); then 1 - P is the probability that Y is 0, P(Y=0). • Let X1, X2, ..., Xn be the set of predictor variables. • Let B1, B2, ..., Bn be the model coefficients.
  • 66. Probability to log of odds ratio
  • 67. Logit Function: • Logistic regression estimates the logit function. • The logit function is simply the log of the odds in favour of the event. • Its inverse, the logistic function, maps the linear predictor to an S-shaped curve of probability estimates.
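The step from probability to log odds can be written out using the symbols defined on slide 65. This is the standard textbook form; the intercept $B_0$ is an assumption, since the slide lists only $B_1, \dots, B_n$.

```latex
\begin{align*}
\text{odds} &= \frac{P}{1 - P} \\
\operatorname{logit}(P) &= \ln\!\left(\frac{P}{1 - P}\right)
  = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \\
P &= \frac{1}{1 + e^{-(B_0 + B_1 X_1 + \dots + B_n X_n)}}
\end{align*}
```

The last line is the logit solved for $P$, which is what produces the S-shaped curve.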
  • 69. In general, we can use the following to evaluate classification models: • Confusion matrix (sensitivity, specificity, F1, ...) • K-fold cross-validation • AUC-ROC (Area Under the Curve of the Receiver Operating Characteristic) --> this score should be as close to 1 as possible
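The confusion-matrix measures and AUC named above can be computed in a few lines of base R. The labels and scores below are made up for illustration (they are assumptions), and the AUC uses the rank-based (Mann-Whitney) formula rather than a dedicated package.

```r
# Hypothetical true labels (1 = positive) and predicted probabilities.
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
score  <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05)
predicted <- as.integer(score > 0.5)

# Confusion matrix and the counts behind it.
table(actual = actual, predicted = predicted)
tp <- sum(actual == 1 & predicted == 1)
fn <- sum(actual == 1 & predicted == 0)
fp <- sum(actual == 0 & predicted == 1)
tn <- sum(actual == 0 & predicted == 0)

sensitivity <- tp / (tp + fn)   # share of positives that were found
specificity <- tn / (tn + fp)   # share of negatives that were found
precision   <- tp / (tp + fp)
f1 <- 2 * precision * sensitivity / (precision + sensitivity)

# AUC: probability that a random positive scores higher than a random negative.
n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)
auc <- (sum(rank(score)[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

c(sensitivity = sensitivity, specificity = specificity, f1 = f1, auc = auc)
```

Unlike the confusion-matrix measures, AUC does not depend on the 0.5 cut-off, since it uses only the ordering of the scores.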
  • 71. Queries & Suggestions • Feel free to mail me at jayaramb05@gmail.com.
  • 72. Thank you

Editor's Notes

  1. Descriptive analytics: Business domain
  2. A virtual assistant provides various services to entrepreneurs or businesses from a remote location. Traffic predictions are used in cab services (Ola, Uber, etc.).